Optimize column_encryption_policy checks in recv_results_rows #583
base: master
- Check column_encryption_policy once per recv_results_rows call instead of per value
- Create two separate decode paths: one with encryption, one without
- Pre-compute encryption info per column to avoid repeated lookups
- Add comprehensive unit tests to verify optimization

Co-authored-by: mykaul <4655593+mykaul@users.noreply.github.com>
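A minimal sketch of the per-column pre-computation this first commit describes. ColDesc, Int32Type, ReversingPolicy and decode_rows are hypothetical stand-ins, not the driver's classes. Column metadata is assumed to be (keyspace, table, column, type) tuples, matching the col_md[3] indexing used in this PR. The policy is consulted once per column, so contains_column() runs M times rather than N×M times.

```python
from collections import namedtuple

# Hypothetical stand-in for the driver's column descriptor.
ColDesc = namedtuple("ColDesc", ["ks", "table", "col"])


class Int32Type:
    """Hypothetical CQL type stand-in: big-endian signed 32-bit int."""

    @classmethod
    def from_binary(cls, val, protocol_version):
        return int.from_bytes(val, "big", signed=True)


class ReversingPolicy:
    """Toy policy (NOT real encryption): 'decrypts' by reversing bytes."""

    def __init__(self, encrypted_cols):
        self.encrypted_cols = set(encrypted_cols)
        self.contains_calls = 0  # instrumentation for this sketch

    def contains_column(self, col_desc):
        self.contains_calls += 1
        return col_desc.col in self.encrypted_cols

    def column_type(self, col_desc):
        return Int32Type

    def decrypt(self, col_desc, val):
        return val[::-1]


def decode_rows(rows, column_metadata, policy, protocol_version=4):
    if policy is None:
        # Path 1: no encryption, decode straight from the result metadata.
        return [
            tuple(md[3].from_binary(v, protocol_version) for v, md in zip(row, column_metadata))
            for row in rows
        ]
    # Path 2: resolve each column's encryption info once, up front.
    plan = []
    for md in column_metadata:
        desc = ColDesc(md[0], md[1], md[2])
        if policy.contains_column(desc):
            plan.append((True, desc, policy.column_type(desc)))
        else:
            plan.append((False, desc, md[3]))

    def decode_val(val, uses_ce, desc, col_type):
        raw = policy.decrypt(desc, val) if uses_ce else val
        return col_type.from_binary(raw, protocol_version)

    return [tuple(decode_val(v, *p) for v, p in zip(row, plan)) for row in rows]


# Usage: 100 rows x 2 columns, with column "b" stored "encrypted".
meta = [("ks", "t", "a", Int32Type), ("ks", "t", "b", Int32Type)]
rows = [(i.to_bytes(4, "big"), (2 * i).to_bytes(4, "big")[::-1]) for i in range(100)]
policy = ReversingPolicy(encrypted_cols={"b"})
decoded = decode_rows(rows, meta, policy)
```

The design choice here is to trade a small per-column list (plan) for removing every per-value policy lookup.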
Co-authored-by: mykaul <4655593+mykaul@users.noreply.github.com>
@copilot - thanks, but:
Per review feedback, removed code duplication:
- Only define decode_row() function differently based on encryption policy
- Removed separate decode_val helper functions
- Keep single try/except block instead of duplicating it
- Reduced from 75 lines to 57 lines (-18 lines)

Still maintains the same optimization: contains_column() called once per column instead of per value.

Co-authored-by: mykaul <4655593+mykaul@users.noreply.github.com>
Done! Simplified the code per your feedback:
The optimization still works the same way:
Commit: 12eac38
Per review feedback:
- Restored decode_val function in encryption path for better readability
- Keeps the structure similar to original code but without redundant policy check
- Removed unused MagicMock import from test file

The optimization remains the same: contains_column() called once per column.

Co-authored-by: mykaul <4655593+mykaul@users.noreply.github.com>
@copilot - is the description accurate? I thought now there is a single check for a results message (instead of one per value)?
The description is accurate. The optimization works as follows: Before: After: So for 100 rows × 2 columns: 200 calls → 2 calls. The
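The 200 → 2 figure can be reproduced with unittest.mock, assuming it counts contains_column() calls. The helpers per_value_lookups and per_column_lookups are hypothetical and model only the policy lookups, not the full decoder.

```python
from collections import namedtuple
from unittest import mock

# Hypothetical stand-in for the driver's column descriptor.
ColDesc = namedtuple("ColDesc", ["ks", "table", "col"])


def per_value_lookups(rows, col_descs, policy):
    """Before: the policy is consulted for every value (N x M calls)."""
    return [
        [bool(policy and policy.contains_column(desc)) for _val, desc in zip(row, col_descs)]
        for row in rows
    ]


def per_column_lookups(rows, col_descs, policy):
    """After: the policy is consulted once per column (M calls)."""
    flags = [bool(policy.contains_column(desc)) for desc in col_descs]
    return [list(flags) for _row in rows]


rows = [(b"\x00", b"\x01")] * 100  # 100 rows x 2 columns
col_descs = [ColDesc("ks", "t", "a"), ColDesc("ks", "t", "b")]

before = mock.Mock()
before.contains_column.return_value = False
after = mock.Mock()
after.contains_column.return_value = False

# Both strategies must yield identical per-value flags.
same = per_value_lookups(rows, col_descs, before) == per_column_lookups(rows, col_descs, after)
print(before.contains_column.call_count, after.contains_column.call_count)  # 200 2
```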
@copilot - the check should really be done once per result message. Please adjust accordingly.
Per review feedback, simplified the optimization to check the policy existence only once per result message, not per column.

Changes:
- Check 'if column_encryption_policy:' once at function entry
- Within the encryption path, decode_val checks contains_column() without redundant policy existence check
- Updated tests to reflect this optimization approach
- Updated comments and documentation

The key optimization is avoiding the repeated 'column_encryption_policy and ...' check for every value (N×M times), checking policy existence just once instead.

Co-authored-by: mykaul <4655593+mykaul@users.noreply.github.com>
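A minimal sketch of branching once per result message, assuming hypothetical stand-ins (ColDesc, Utf8Type, Int32Type, ReversingPolicy, decode_rows) and metadata tuples of (keyspace, table, column, type). It also builds col_descs only inside the policy branch, as a later commit in this PR does. Inside that branch, decode_val() asks only contains_column(); the policy's existence is never re-tested per value.

```python
from collections import namedtuple

ColDesc = namedtuple("ColDesc", ["ks", "table", "col"])  # hypothetical stand-in


class Utf8Type:
    """Hypothetical CQL text type stand-in."""

    @classmethod
    def from_binary(cls, val, protocol_version):
        return val.decode("utf-8")


class Int32Type:
    """Hypothetical CQL int type stand-in."""

    @classmethod
    def from_binary(cls, val, protocol_version):
        return int.from_bytes(val, "big", signed=True)


class ReversingPolicy:
    """Toy policy (NOT real encryption): reverses bytes on decrypt."""

    def __init__(self, encrypted_cols):
        self.encrypted_cols = set(encrypted_cols)

    def contains_column(self, col_desc):
        return col_desc.col in self.encrypted_cols

    def column_type(self, col_desc):
        return Int32Type

    def decrypt(self, col_desc, val):
        return val[::-1]


def decode_rows(rows, column_metadata, column_encryption_policy, protocol_version=4):
    if column_encryption_policy:  # evaluated once per result message
        col_descs = [ColDesc(md[0], md[1], md[2]) for md in column_metadata]

        def decode_val(val, col_md, col_desc):
            # The policy is known to exist here, so only contains_column() is asked.
            if column_encryption_policy.contains_column(col_desc):
                col_type = column_encryption_policy.column_type(col_desc)
                raw_bytes = column_encryption_policy.decrypt(col_desc, val)
                return col_type.from_binary(raw_bytes, protocol_version)
            return col_md[3].from_binary(val, protocol_version)

        def decode_row(row):
            return tuple(decode_val(v, md, d) for v, md, d in zip(row, column_metadata, col_descs))
    else:
        def decode_row(row):
            # Fast path: no policy, no col_descs, no per-value checks.
            return tuple(md[3].from_binary(v, protocol_version) for v, md in zip(row, column_metadata))

    return [decode_row(row) for row in rows]


meta = [("ks", "users", "name", Utf8Type), ("ks", "users", "pin", Int32Type)]
plain = [(b"ann", (1234).to_bytes(4, "big"))]
encrypted = [(b"ann", (1234).to_bytes(4, "big")[::-1])]
```

With a policy that marks "pin" as encrypted, decode_rows(encrypted, meta, ReversingPolicy({"pin"})) should decode to the same values as decode_rows(plain, meta, None).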
Done! The check is now performed once per result message. The optimization now:
Commit: 8e82b03
Per review feedback, optimized to only create col_descs when necessary:
- Created inside 'if column_encryption_policy:' block when policy is enabled
- Created in exception handler when needed for error reporting
- NOT created when no policy and no exceptions (the common fast path)

This avoids an unnecessary list comprehension when column_encryption_policy is None.

Co-authored-by: mykaul <4655593+mykaul@users.noreply.github.com>
Per review feedback, refactored to reuse decode_val() in exception handler:
- Moved try/except blocks inside each branch (encryption enabled/disabled)
- Exception handler now calls decode_val() when column_encryption_policy exists
- Eliminates code duplication by reusing the decode_val function
- Maintains same error reporting behavior

This makes the code cleaner and more maintainable.

Co-authored-by: mykaul <4655593+mykaul@users.noreply.github.com>
Per review feedback, eliminated code duplication:
- Combined two separate try-except blocks into one
- Single exception handler now checks column_encryption_policy to decide path
- Maintains same error reporting behavior
- Follows Python's DRY (Don't Repeat Yourself) principle

This makes the code cleaner and more maintainable.

Co-authored-by: mykaul <4655593+mykaul@users.noreply.github.com>
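A minimal sketch of one shared try/except whose handler picks the decode path by checking the policy, and reuses decode_val() on the encrypted path to name the failing column. DecodeError, Int32Type, ColDesc and decode_rows are hypothetical stand-ins; the driver raises its own exception type with its own message format.

```python
from collections import namedtuple

ColDesc = namedtuple("ColDesc", ["ks", "table", "col"])  # hypothetical stand-in


class DecodeError(Exception):
    """Hypothetical error type; the driver raises its own exception class."""


class Int32Type:
    """Hypothetical type stand-in that rejects values not exactly 4 bytes long."""

    @classmethod
    def from_binary(cls, val, protocol_version):
        if len(val) != 4:
            raise ValueError("expected 4 bytes, got %d" % len(val))
        return int.from_bytes(val, "big", signed=True)


def decode_rows(rows, column_metadata, column_encryption_policy, protocol_version=4):
    col_descs = None
    if column_encryption_policy:
        col_descs = [ColDesc(md[0], md[1], md[2]) for md in column_metadata]

        def decode_val(val, col_md, col_desc):
            if column_encryption_policy.contains_column(col_desc):
                col_type = column_encryption_policy.column_type(col_desc)
                raw_bytes = column_encryption_policy.decrypt(col_desc, val)
                return col_type.from_binary(raw_bytes, protocol_version)
            return col_md[3].from_binary(val, protocol_version)

        def decode_row(row):
            return tuple(decode_val(v, md, d) for v, md, d in zip(row, column_metadata, col_descs))
    else:
        def decode_row(row):
            return tuple(md[3].from_binary(v, protocol_version) for v, md in zip(row, column_metadata))

    try:
        return [decode_row(row) for row in rows]
    except Exception:
        # One handler for both paths, entered only on failure: re-decode
        # value by value to name the column that broke.
        for row in rows:
            for i, (val, col_md) in enumerate(zip(row, column_metadata)):
                try:
                    if column_encryption_policy:
                        decode_val(val, col_md, col_descs[i])  # reuse the encrypted-path helper
                    else:
                        col_md[3].from_binary(val, protocol_version)
                except Exception as e:
                    raise DecodeError('Failed decoding result column "%s": %s' % (col_md[2], e)) from e
        raise  # nothing failed on retry: re-raise the original error


meta = [("ks", "t", "a", Int32Type), ("ks", "t", "b", Int32Type)]
try:
    decode_rows([(b"\x00\x00\x00\x01", b"\x01")], meta, None)
except DecodeError as err:
    message = str(err)
```

Keeping the value-by-value retry inside the handler means the happy path pays nothing for detailed error reporting.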
Per review feedback, reorganized imports following project conventions:
- Standard library imports (io, unittest) grouped together and sorted
- unittest.mock import kept with standard library
- Blank line separator
- Project imports (cassandra.*) grouped and sorted alphabetically

This matches the import style used in other test files in the repository.

Co-authored-by: mykaul <4655593+mykaul@users.noreply.github.com>
@copilot - can you squash all commits?
I don't have the ability to perform git rebase operations or force pushes to squash commits. The commits would need to be squashed manually using:

git rebase -i HEAD~9  # Interactive rebase for the last 9 commits
# Then mark commits as 'squash' or 'fixup' in the editor
git push --force

Alternatively, when merging the PR, GitHub's "Squash and merge" option can combine all commits into a single commit automatically.
Dear @copilot - we are nearly there. If you can just, in case of
Per review feedback, simplified decode_val function structure:
- Use single 'if uses_ce:' clause for encrypted column path
- Else clause simply returns col_md[3].from_binary(val, protocol_version)
- More readable than chained ternary operators

Maintains same functionality while improving code clarity.

Co-authored-by: mykaul <4655593+mykaul@users.noreply.github.com>
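Done! Refactored

def decode_val(val, col_md, col_desc):
    # column_encryption_policy and protocol_version come from the enclosing recv_results_rows() scope
    uses_ce = column_encryption_policy.contains_column(col_desc)
    if uses_ce:
        # Encrypted column: decrypt first, then decode with the policy's declared type
        col_type = column_encryption_policy.column_type(col_desc)
        raw_bytes = column_encryption_policy.decrypt(col_desc, val)
        return col_type.from_binary(raw_bytes, protocol_version)
    else:
        # Plain column: decode with the type from the result metadata
        return col_md[3].from_binary(val, protocol_version)

The encrypted path is now in a single
Commit: e9e4459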
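To check that the if/else rewrite of decode_val preserves behavior, a small harness can compare it with the earlier chained-ternary form. The ternary version below is a reconstruction from the pattern quoted in this PR's description, not code copied from the driver. XorPolicy, Utf8Type, Int32Type, ColDesc and the make_* factories are hypothetical, and XOR is a toy, not encryption.

```python
from collections import namedtuple

ColDesc = namedtuple("ColDesc", ["ks", "table", "col"])  # hypothetical stand-in


class Utf8Type:
    @classmethod
    def from_binary(cls, val, protocol_version):
        return val.decode("utf-8")


class Int32Type:
    @classmethod
    def from_binary(cls, val, protocol_version):
        return int.from_bytes(val, "big", signed=True)


class XorPolicy:
    """Toy policy: XOR with one key byte. NOT real encryption."""

    def __init__(self, encrypted_cols, key=0x5A):
        self.encrypted_cols, self.key = set(encrypted_cols), key

    def contains_column(self, col_desc):
        return col_desc.col in self.encrypted_cols

    def column_type(self, col_desc):
        return Int32Type  # assumed: every encrypted column holds an int

    def decrypt(self, col_desc, val):
        return bytes(b ^ self.key for b in val)

    encrypt = decrypt  # XOR is its own inverse


def make_ternary(column_encryption_policy, protocol_version):
    def decode_val(val, col_md, col_desc):
        uses_ce = column_encryption_policy and column_encryption_policy.contains_column(col_desc)
        col_type = column_encryption_policy.column_type(col_desc) if uses_ce else col_md[3]
        raw_bytes = column_encryption_policy.decrypt(col_desc, val) if uses_ce else val
        return col_type.from_binary(raw_bytes, protocol_version)
    return decode_val


def make_if_else(column_encryption_policy, protocol_version):
    def decode_val(val, col_md, col_desc):
        uses_ce = column_encryption_policy.contains_column(col_desc)
        if uses_ce:
            col_type = column_encryption_policy.column_type(col_desc)
            raw_bytes = column_encryption_policy.decrypt(col_desc, val)
            return col_type.from_binary(raw_bytes, protocol_version)
        else:
            return col_md[3].from_binary(val, protocol_version)
    return decode_val


policy = XorPolicy(encrypted_cols={"pin"})
meta = [("ks", "users", "name", Utf8Type), ("ks", "users", "pin", Utf8Type)]
descs = [ColDesc(*md[:3]) for md in meta]
row = (b"ann", policy.encrypt(descs[1], (1234).to_bytes(4, "big")))

old = make_ternary(policy, 4)
new = make_if_else(policy, 4)
results = [(old(v, md, d), new(v, md, d)) for v, md, d in zip(row, meta, descs)]
```

Each pair in results holds the old and new decoding of the same value, so any divergence between the two forms shows up as an unequal pair.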
@scylladb/drivers - the failure is
Other than that, this little patch is ready. Copilot was not great, but eventually got the job done. Please consider if it's worthy of merging (needs squash and all).
Optimize column_encryption_policy checks in recv_results_rows
Issue:
The column_encryption_policy existence was checked for every value decoded in decode_val() using the pattern column_encryption_policy and column_encryption_policy.contains_column(col_desc). This caused N×M redundant boolean evaluations of the policy's existence for N rows and M columns.

Solution:
Check column_encryption_policy existence once per result message, then define the appropriate decode path:

Without encryption policy:
- Decode directly with col_md[3].from_binary(val, protocol_version)
- Skip creating col_descs

With encryption policy:
- Create col_descs only when needed (inside the encryption path)
- Use a decode_val helper function with a clear if/else structure
- Fall back to col_md[3].from_binary(val, protocol_version) for unencrypted columns
- Reuse the decode_val function in the exception handler to avoid code duplication

Changes:
- Check column_encryption_policy existence once per result message
- Define the decode_row() function differently based on policy presence
- decode_val uses a clear if/else structure (not ternary operators)
- Create col_descs only when needed (policy enabled or exception handling)
- Reuse the decode_val() function in the exception handler (no code duplication)

Performance Improvement:
The optimization eliminates redundant operations:
- Before: column_encryption_policy and column_encryption_policy.contains_column() checked N×M times
- After: column_encryption_policy checked once, then contains_column() called as needed
- No col_descs list when not needed

For example, with 100 rows × 2 columns, the 200 policy existence checks drop to one, and the no-encryption case avoids creating the col_descs list.
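The N×M versus one difference in existence checks can be made visible with a policy object that counts how often its truthiness is evaluated. CountingPolicy, per_value_check and per_message_check are hypothetical; they model only the checks, not decoding.

```python
class CountingPolicy:
    """Hypothetical policy that records how often bool(policy) is evaluated."""

    def __init__(self):
        self.bool_calls = 0

    def __bool__(self):
        self.bool_calls += 1
        return True

    def contains_column(self, col_desc):
        return False  # no encrypted columns in this model


def per_value_check(policy, n_rows, n_cols):
    """Before: `policy and policy.contains_column(...)` for every value."""
    hits = 0
    for _row in range(n_rows):
        for col in range(n_cols):
            if policy and policy.contains_column(col):
                hits += 1
    return hits


def per_message_check(policy, n_rows, n_cols):
    """After: test the policy once, then only call contains_column()."""
    hits = 0
    if policy:
        for _row in range(n_rows):
            for col in range(n_cols):
                if policy.contains_column(col):
                    hits += 1
    return hits


before, after = CountingPolicy(), CountingPolicy()
per_value_check(before, 100, 2)
per_message_check(after, 100, 2)
print(before.bool_calls, after.bool_calls)  # 200 1
```

Note that contains_column() is still called per value in both versions; only the existence check moves out of the loop.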
Files Modified:
- cassandra/protocol.py - Optimized recv_results_rows() method with clear if/else structure
- tests/unit/test_protocol_decode_optimization.py - New tests with properly organized imports

Testing:
Code Quality:
- Clear if/else structure in the decode_val function (more readable than ternary operators)
- Reuse of decode_val() in the exception handler

Fixes: #582