@PhilTaken
Member

The most relevant commit is 8d569b2.

The previous implementation relied heavily on temporary files (especially when using age-diffable), and the I/O overhead of reading from and writing to these files was large enough to severely slow down secret handling operations.

This PR implements in-memory handling of deployment secrets via bindings to the rage library, a Rust implementation of the age spec. This massively improves secret handling performance.
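A minimal sketch of what in-memory handling can look like with `pyrage`, the Python bindings to rage. This is not the code from this PR: the helper name `roundtrip` and the throwaway x25519 identity are assumptions made for illustration, and the import is guarded so the snippet still runs where pyrage is not installed.

```python
from typing import Optional

try:
    # Python bindings to the rage library.
    from pyrage import decrypt, encrypt, x25519
    HAVE_PYRAGE = True
except ImportError:
    # No wheel installed (and possibly no Rust toolchain to build one).
    HAVE_PYRAGE = False


def roundtrip(plaintext: bytes) -> Optional[bytes]:
    """Encrypt and decrypt `plaintext` without touching the filesystem.

    Hypothetical helper for illustration. Returns None when pyrage is
    not available.
    """
    if not HAVE_PYRAGE:
        return None
    # Throwaway key pair, generated only for this example.
    identity = x25519.Identity.generate()
    recipient = identity.to_public()
    # Both calls take and return bytes; no temporary files are involved.
    ciphertext = encrypt(plaintext, [recipient])
    return decrypt(ciphertext, [identity])
```

Since the ciphertext only ever exists as a `bytes` object, there is nothing to write to or read back from disk between the two steps.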

While this PR does introduce some added requirements on external libraries, it also relaxes the dependence on the (r)age cli.

While implementing this PR, I discovered a few bugs in the test implementation: calls to git were not properly isolated, gpg had issues creating an agent socket in nested folder structures, and a number of tests depended on a very specific terminal width and failed because of very strict string comparisons. I have added fixes for these to this PR.
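Two of these test fixes can be sketched roughly as follows. This is an illustration under assumptions, not the code in this PR: the helper names `isolated_git_env` and `short_gnupg_home` are hypothetical, and `GIT_CONFIG_GLOBAL` requires git 2.32 or newer.

```python
import os
import shutil
import tempfile
from pathlib import Path


def isolated_git_env(base_env=None):
    """Return an environment in which git ignores user and system config.

    Hypothetical helper: pass the result as `env=` to subprocess calls.
    """
    env = dict(os.environ if base_env is None else base_env)
    env["GIT_CONFIG_GLOBAL"] = os.devnull  # skip the user's global config
    env["GIT_CONFIG_NOSYSTEM"] = "1"       # skip the system-wide config
    return env


def short_gnupg_home(source):
    """Copy a GnuPG home directory into a fresh temporary directory.

    The copy sits directly below the system temp dir, so the path to the
    agent socket stays short however deeply the repository is nested.
    """
    target = Path(tempfile.mkdtemp(prefix="gpg-")) / "home"
    shutil.copytree(source, target)
    os.chmod(target, 0o700)  # gpg complains about permissive home directories
    return target
```

A test fixture could then export the returned directory as `GNUPGHOME` and run git with the isolated environment.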

N.B.: Due to limitations around the handling of RSA keys in the cryptography library that rage depends on, RSA keys with moduli larger than 4096 bits are (currently) not supported via the bindings. In these cases, the implementation attempts to fall back to calling the age binary; see commit 000b98a. Since rage is incompatible with these RSA keys anyway, the previous command discovery implementation could be simplified.
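The fallback can be pictured roughly like this. It is a sketch, not the code from commit 000b98a: `BindingError` is a hypothetical stand-in for whatever error the bindings raise for an unsupported recipient, and the function names are assumptions. The CLI helper expects an `age` binary on `PATH`.

```python
import subprocess
from typing import Callable, Sequence


class BindingError(Exception):
    """Hypothetical stand-in for the bindings' unsupported-recipient error."""


def encrypt_with_age_cli(data: bytes, recipients: Sequence[str]) -> bytes:
    """Encrypt by piping the data through the `age` binary."""
    cmd = ["age", "--encrypt"]
    for recipient in recipients:
        cmd += ["--recipient", recipient]
    result = subprocess.run(cmd, input=data, capture_output=True, check=True)
    return result.stdout


def encrypt_with_fallback(
    data: bytes,
    recipients: Sequence[str],
    primary: Callable[[bytes, Sequence[str]], bytes],
    fallback: Callable[[bytes, Sequence[str]], bytes] = encrypt_with_age_cli,
) -> bytes:
    """Try the in-memory bindings first, then fall back to the CLI."""
    try:
        return primary(data, recipients)
    except BindingError:
        # For example, an RSA recipient that the bindings cannot handle.
        return fallback(data, recipients)
```

Passing both strategies in as callables keeps the fast path and the CLI path testable independently of each other.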

@PhilTaken PhilTaken self-assigned this Feb 17, 2025
@PhilTaken PhilTaken requested a review from zagy as a code owner February 17, 2025 10:32
@PhilTaken PhilTaken requested a review from ctheune February 17, 2025 10:32
@elikoga
Member

elikoga commented Feb 17, 2025

Calling out to the rage library is probably the right call for performance here. I distinctly remember being very frustrated during development of the age binary integration.

@PhilTaken
Member Author

The largest performance hog is now the call out to op to read the password. At around 300-600 ms per call, the impact is pretty significant if you want to use batou secrets decrypttostdout in git diffs: it currently takes around 1 second in total for each side of the diff for each commit.

Certainly usable but could be significantly improved imho.

Member

This needs to note that 3.7 is no longer supported.

Member Author

Added in eb1bd5f

"""\
batou/2... (cpython 3...)
================================== Preparing ===================================
... Preparing ...
Member

It's not quite clear why this changes when changing the encryption?

Member Author

This is not directly related to the encryption but was required so I can run the test suite on my machine to test all other changes. The specific commit is d4cacca. I can move these out to a separate PR now if you want.

@zagy
Member

zagy commented Feb 27, 2025

I also get an error in our standard deploy container, meaning there is another dependency which might not be available everywhere (think switches). While a remote system does not need to decrypt (since that happens locally), the dependency would still be required. We need to rethink this, I suppose.

  Installing build dependencies: finished with status 'done'
  Getting requirements to build wheel: started
  Getting requirements to build wheel: finished with status 'done'
  Preparing metadata (pyproject.toml): started
  Preparing metadata (pyproject.toml): finished with status 'error'
  error: subprocess-exited-with-error

  × Preparing metadata (pyproject.toml) did not run successfully.
  │ exit code: 1
  ╰─> [6 lines of output]

      Cargo, the Rust package manager, is not installed or is not on PATH.
      This package requires Rust and Cargo to compile extensions. Install it through
      the system's package manager or via https://rustup.rs/

      Checking for Rust toolchain....
      [end of output]

  note: This error originates from a subprocess, and is likely not a problem with pip.
error: metadata-generation-failed

× Encountered error while generating package metadata.
╰─> See above for output.

note: This is an issue with the package mentioned above, not pip.
hint: See above for details.

@PhilTaken PhilTaken force-pushed the phil/speedup-batou-age-interaction branch from 7b7b6f5 to ba432a8 Compare February 27, 2025 08:41
@PhilTaken
Member Author

Yes, the added dependency on pyrage does require a Rust compiler if there is no wheel available for that platform.

We could gate the dependency on pyrage behind a feature flag that is enabled by default, so it can be turned off for some systems 🤔
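One possible shape for such a gate, as a sketch only: the environment variable name `BATOU_USE_PYRAGE` and the function `pyrage_enabled` are hypothetical and do not exist in batou.

```python
import importlib.util
import os

# Hypothetical flag name, chosen for this example only.
FLAG = "BATOU_USE_PYRAGE"


def pyrage_enabled(env=None) -> bool:
    """Use the bindings unless they are switched off or not installed."""
    env = os.environ if env is None else env
    # Enabled by default; only an explicit "off" value disables the flag.
    if env.get(FLAG, "1").lower() in ("0", "false", "no", "off"):
        return False
    # find_spec returns None when the top-level module cannot be found.
    return importlib.util.find_spec("pyrage") is not None
```

Checking for the module as well as the flag means a system without a pyrage wheel degrades to the CLI path instead of failing at import time.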

key_content = f.read()
try:
    priv_key = serialization.load_ssh_private_key(key_content, None)
except ValueError:
Member

This actually should be

Suggested change
- except ValueError:
+ except (ValueError, TypeError):

I ran into an issue where batou secrets edit staging failed with this error message, raised inside cryptography:

Key is password-protected, but password was not provided.

That is a TypeError.
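To illustrate why both exception types matter, here is a sketch under assumptions: it assumes the handler retries with a password, which the excerpt above does not show. `loader` stands in for `serialization.load_ssh_private_key`, and `load_private_key` and `ask_password` are hypothetical names.

```python
from typing import Any, Callable, Optional


def load_private_key(
    key_content: bytes,
    loader: Callable[[bytes, Optional[bytes]], Any],
    ask_password: Callable[[], bytes],
) -> Any:
    """Try to load the key without a password, then retry with one."""
    try:
        return loader(key_content, None)
    except (ValueError, TypeError):
        # A password-protected key loaded without a password surfaced as
        # a TypeError in the case reported above; catching only ValueError
        # would let that error escape instead of prompting for a password.
        return loader(key_content, ask_password())
```

Injecting the loader keeps the retry logic testable without a real SSH key.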

Member Author

Will add, thanks for the review. I did not consider that case.

If only it was clear what kinds of exceptions could be thrown when calling a function...

@PhilTaken PhilTaken force-pushed the phil/speedup-batou-age-interaction branch from eb1bd5f to 844647f Compare June 12, 2025 11:46
@PhilTaken
Member Author

PhilTaken commented Jun 12, 2025

I will attempt to loosen the dependency on pyrage by adding some conditional logic to the encryption code, so that systems that either don't need pyrage at all or don't have pre-built pyrage wheels available are not left unsupported by this PR.

@PhilTaken PhilTaken force-pushed the phil/speedup-batou-age-interaction branch from 54e8738 to f7d4c98 Compare August 15, 2025 09:03
Massively improve secret handling performance by avoiding large
amounts of writing to / reading from temporary files.

This solves a lot of complexity around the age call because it
does not require piping the ssh key's password into a subprocess' stdin
conditionally.

Instead, this can be handled using `cryptography` to handle the
password-protected ssh key and `pyrage`, python bindings to the rage
library which supports de/encrypting in-memory.

Another benefit of this implementation is a relaxed requirement (now
optional) on the `age` CLI since the bindings to the rage library
provide a full implementation out of the box.

The test suite provides some git configuration; additional
custom git configuration should not cause test cases to fail.

GPG has a maximum length for the path to the GPG agent socket.
Copying the entire gpg directory to a temporary directory
ensures that the path is not too long when the repo is cloned into a somewhat
nested directory structure.

Recipient errors can occur when using RSA keys with a modulus > 4096 bits as
recipients with rage; age can handle those.
@PhilTaken PhilTaken force-pushed the phil/speedup-batou-age-interaction branch from 679e1cb to 664b79c Compare August 22, 2025 09:08