-
Notifications
You must be signed in to change notification settings - Fork 1.2k
Fm/stg 956 make ci faster #1246
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
🦋 Changeset detectedLatest commit: 8202603 The changes in this PR will be included in the next version bump. This PR includes changesets to release 2 packages
Not sure what this means? Click here to learn what changesets are. Click here if you're a maintainer who wants to add another changeset to this PR |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Greptile Overview
Greptile Summary
This PR significantly optimizes CI performance by implementing parallel eval execution, build artifact caching, and smart test skipping. The changes reduce total CI time from ~270 minutes to ~90 minutes (67% reduction) through several key improvements:
- Parallel Execution: Eval jobs now run in parallel after regression completes, instead of sequentially
- Build Artifact Caching: Build artifacts are uploaded once and reused across all eval jobs, eliminating 6+ redundant builds
- Smart Skipping: New
skip-evalsandskip-regression-evalslabels plus automatic skipping for docs-only changes - Dependency Optimization: Uses
pnpm/action-setup@v4with caching and--frozen-lockfilefor faster, more deterministic installs - Test Parallelization: Increased worker counts for better resource utilization (CI: 3-4 workers, local: 5-6 workers)
The implementation properly handles dependencies between jobs, with all eval jobs depending on run-build to ensure artifacts are available. E2E tests still build independently to avoid artifact download overhead for their simpler needs.
Confidence Score: 4/5
- This PR is safe to merge with careful monitoring of the first few CI runs
- The changes are well-architected and achieve significant CI performance improvements. The parallel execution strategy is sound, build artifact caching is implemented correctly, and dependency management is improved. However, there's one potential edge case with the docs-only filter that could cause markdown files in
packages/to incorrectly skip evals when they shouldn't. The increased test parallelization may also need monitoring for flakiness. - Monitor
.github/workflows/ci.ymlcarefully during the first few CI runs to ensure artifact sharing works correctly and the docs-only filter behaves as expected
Important Files Changed
File Analysis
| Filename | Score | Overview |
|---|---|---|
| .github/workflows/ci.yml | 4/5 | Implements parallel eval execution, build artifact caching, smart skipping logic, and dependency optimization. Changes reduce CI time from ~270 to ~90 minutes through parallelization and artifact reuse. |
| packages/core/lib/v3/tests/v3.bb.playwright.config.ts | 5/5 | Increases Browserbase test parallelization from 2 to 3 workers locally while keeping CI at 2 workers for resource management. |
| packages/core/lib/v3/tests/v3.local.playwright.config.ts | 5/5 | Increases local test parallelization from 2 to 3 workers in CI and 5 locally for improved performance. |
| packages/core/lib/v3/tests/v3.playwright.config.ts | 5/5 | Increases test parallelization from 2 to 4 workers in CI and 6 locally for improved test execution speed. |
Sequence Diagram
sequenceDiagram
participant DetermineChanges
participant DetermineEvals
participant RunLint
participant RunBuild
participant E2ELocal
participant E2EBB
participant Regression
participant Combination
participant Act
participant Extract
participant Observe
participant TargetedExtract
participant Agent
DetermineChanges->>DetermineEvals: outputs (core, evals, docs-only)
DetermineChanges->>RunLint: if core or evals changed
DetermineChanges->>RunBuild: if core or evals changed
RunLint->>E2ELocal: needs
RunBuild->>E2ELocal: needs
RunBuild-->>RunBuild: Upload build artifacts
RunLint->>E2EBB: needs
RunBuild->>E2EBB: needs
E2ELocal->>Regression: needs
E2EBB->>Regression: needs
RunBuild->>Regression: needs (for artifacts)
DetermineEvals->>Regression: check skip flags
Regression-->>Regression: Download build artifacts
Note over Combination,Agent: Parallel execution after regression
Regression->>Combination: needs
RunBuild->>Combination: needs (for artifacts)
DetermineEvals->>Combination: check labels
Combination-->>Combination: Download build artifacts
Regression->>Act: needs
RunBuild->>Act: needs (for artifacts)
DetermineEvals->>Act: check labels
Act-->>Act: Download build artifacts
Regression->>Extract: needs
RunBuild->>Extract: needs (for artifacts)
DetermineEvals->>Extract: check labels
Extract-->>Extract: Download build artifacts
Regression->>Observe: needs
RunBuild->>Observe: needs (for artifacts)
DetermineEvals->>Observe: check labels
Observe-->>Observe: Download build artifacts
Regression->>TargetedExtract: needs
RunBuild->>TargetedExtract: needs (for artifacts)
DetermineEvals->>TargetedExtract: check labels
TargetedExtract-->>TargetedExtract: Download build artifacts
Regression->>Agent: needs
RunBuild->>Agent: needs (for artifacts)
DetermineEvals->>Agent: check labels
Agent-->>Agent: Download build artifacts
6 files reviewed, 1 comment
| docs-only: | ||
| - '**/*.md' | ||
| - 'examples/**' | ||
| - '!packages/**/*.md' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
style: Check that the negation pattern !packages/**/*.md works as intended. Verify that changes to files like packages/core/README.md or packages/evals/CHANGELOG.md don't incorrectly set docs-only=true and skip evals when they shouldn't.
Prompt To Fix With AI
This is a comment left during a code review.
Path: .github/workflows/ci.yml
Line: 48:51
Comment:
**style:** Check that the negation pattern `!packages/**/*.md` works as intended. Verify that changes to files like `packages/core/README.md` or `packages/evals/CHANGELOG.md` don't incorrectly set `docs-only=true` and skip evals when they shouldn't.
How can I resolve this? If you propose a fix, please make it concise.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
💡 Codex Review
Here are some automated review suggestions for this pull request.
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| run-combination-evals: | ||
| needs: [run-regression-evals, determine-evals] | ||
| needs: [run-regression-evals, run-build, determine-evals] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
skip-regression-evals label never lets other evals run
The new skip-regression-evals label is intended to bypass only the regression run, but every subsequent eval job still declares needs: [run-regression-evals, …]. In GitHub Actions a job whose dependency is skipped is itself skipped, so tagging a PR with skip-regression-evals will prevent run-combination-evals, run-act-evals, run-extract-evals, etc. from executing even when their labels are present. This makes the label ineffective and blocks targeted eval runs. Consider removing the hard dependency on run-regression-evals (or guarding it with if: always()) so other eval jobs can proceed when regression is intentionally skipped.
Useful? React with 👍 / 👎.
why
CI was slow due to sequential eval jobs, repeated builds, and inefficient dependency management.
what changed
Parallel Execution:
Build Caching:
Smart Skipping:
Dependency Optimization:
Test Parallelization:
test plan