refactor: defer zip manifest building to execution phase to improve analysis phase performance #3381
+35
−24

When py_binary/py_test targets were built, they flattened the runfiles
depsets at analysis time in order to create the zip file mapping manifest for
their implicit zipapp outputs. This flattening was necessary because the
original main executable, which doesn't belong in the zipapp, had to be
filtered out of the runfiles. The flattening is expensive for large builds, in
some cases adding over 400 seconds of build time and significant memory overhead.

To fix, have the zip file manifest use the `runfiles_with_exe` object, which is
the runfiles, but pre-filtered for the files zip building doesn't want. This
then allows passing the depsets directly to `Args.add_all` and using `map_each`
to transform them.
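
As a rough illustration of the per-file transformation described above, here is a minimal sketch in plain Python so that it can run outside Bazel. The `File` namedtuple is a stand-in for Bazel's `File` object. `WORKSPACE_NAME`, the `runfiles/` prefix, and the `zip_path=file_path` line format are assumptions made for illustration, not taken from this PR. In a real rule, a function like this would be passed as `map_each` to `Args.add_all` and applied to each depset element when the command line is expanded, rather than being called in a list comprehension.

```python
from collections import namedtuple

# Stand-in for Bazel's File object so the sketch runs outside Bazel.
# Real File objects expose `path` and `short_path` attributes.
File = namedtuple("File", ["path", "short_path"])

# Hypothetical workspace name, for illustration only.
WORKSPACE_NAME = "my_workspace"


def map_zip_runfiles(file):
    """Turn one file into a 'zip_path=file_path' manifest line (assumed format)."""
    short_path = file.short_path
    if short_path.startswith("../"):
        # Files from external repositories have short paths of the form
        # "../<repo>/<path>"; drop the leading "../".
        zip_path = "runfiles/" + short_path[3:]
    else:
        zip_path = "runfiles/" + WORKSPACE_NAME + "/" + short_path
    # Plain concatenation rather than "{}={}".format(...).
    return zip_path + "=" + file.path


# Outside Bazel, apply the function eagerly to a small sample.
files = [
    File(path="bazel-out/bin/pkg/lib.py", short_path="pkg/lib.py"),
    File(path="external/dep/mod.py", short_path="../dep/mod.py"),
]
manifest_lines = [map_zip_runfiles(f) for f in files]
```

Because the function only looks at one file at a time, nothing in it requires the full file list to exist at analysis time.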

Additionally, pass `runfiles.empty_filenames` using a lambda. Accessing that
attribute implicitly flattens the runfiles.
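
The effect of wrapping the attribute access in a lambda can be modelled in plain Python. This is only a sketch of the deferral idea, not Bazel code. `FakeRunfiles`, `build_manifest_lines`, and the `empty:` line prefix are hypothetical names invented for the example, and the counter exists solely to make the cost visible.

```python
class FakeRunfiles:
    """Stand-in for a runfiles object whose attribute access is expensive."""

    def __init__(self, names):
        self._names = names
        self.flatten_count = 0  # how many times the expensive step ran

    @property
    def empty_filenames(self):
        # Models an attribute that materializes its contents on access.
        self.flatten_count += 1
        return list(self._names)


def build_manifest_lines(get_names):
    """Runs later (the "execution phase" of this model) and calls the lambda."""
    return ["empty:" + name for name in get_names()]  # hypothetical line format


runfiles = FakeRunfiles(["pkg/__init__.py"])

# "Analysis phase": only a callable is recorded; nothing is flattened yet.
deferred = lambda: runfiles.empty_filenames
count_after_analysis = runfiles.flatten_count

# "Execution phase": the callable is invoked and the cost is paid once.
lines = build_manifest_lines(deferred)
count_after_execution = runfiles.flatten_count
```

In this model the expensive access happens only when the consumer calls the lambda, which is the property the change relies on.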

Finally, because the original profiles indicated `str.format()` accounted for a
non-trivial amount of time (46 seconds / 15% of build time), switch to using `+`
instead.

This is a more incremental alternative to #3380, which achieves most of the
same optimization with only Starlark changes, as opposed to introducing an
external script written in C++.

Starlark CPU profile of a large build. It shows an overall build time of 305
seconds. 46 seconds (15%) are spent in `map_zip_runfiles`, half of which is in
`str.startswith()` and the other half in `str.format()`.