@iitslamaa
Contributor

Summary

Introduces an inline diff viewer for comparing two eval runs (via eval IDs or local JSON). Adds a structured comparison UI, a drawer view for detailed diffs, and minor styling updates to the existing Results page.

Changes
• Inline Compare Mode
  • Automatically enabled when both the baseline and current query params are present (/eval?baseline=…&current=…)
  • Displays an aggregated comparison summary and per-row deltas
• Results Table
  • Added Status, Pass Δ, and Score Δ columns
  • Zebra striping and a sticky header for readability
  • Clicking a row opens the detailed diff drawer
  • “same” status renamed to “unchanged”
• Diff Drawer
  • Shows baseline and current outputs
  • Unified diff rendering with clear additions/removals
  • Graceful empty state when no textual change is detected
• UI Improvements
  • “Show only changed” implemented as an on/off switch
  • Added spacing around the summary chip section
  • Minor adjustments for consistent chip layout
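The drawer's unified diff can be illustrated with a line-level diff built on a longest-common-subsequence (LCS) table. This is a minimal sketch, not the code in DiffDrawer.tsx or InlineDiff.tsx: the names diffLines, formatUnified, and hasTextualChange are hypothetical, and the output uses unified-style "+", "-", and " " prefixes without hunk headers.

```typescript
// Hypothetical line-level diff sketch (LCS-based); not the PR's actual implementation.
type DiffLine = { kind: 'add' | 'remove' | 'same'; text: string };

function diffLines(baseline: string, current: string): DiffLine[] {
  const a = baseline.split('\n');
  const b = current.split('\n');
  // lcs[i][j] = length of the longest common subsequence of a[i..] and b[j..]
  const lcs: number[][] = Array.from({ length: a.length + 1 }, () =>
    new Array<number>(b.length + 1).fill(0),
  );
  for (let i = a.length - 1; i >= 0; i--) {
    for (let j = b.length - 1; j >= 0; j--) {
      lcs[i][j] =
        a[i] === b[j] ? lcs[i + 1][j + 1] + 1 : Math.max(lcs[i + 1][j], lcs[i][j + 1]);
    }
  }
  // Walk the table forward, emitting shared lines, removals, and additions.
  const out: DiffLine[] = [];
  let i = 0;
  let j = 0;
  while (i < a.length && j < b.length) {
    if (a[i] === b[j]) {
      out.push({ kind: 'same', text: a[i] });
      i++;
      j++;
    } else if (lcs[i + 1][j] >= lcs[i][j + 1]) {
      out.push({ kind: 'remove', text: a[i] });
      i++;
    } else {
      out.push({ kind: 'add', text: b[j] });
      j++;
    }
  }
  // Whatever is left on either side is a pure removal or addition.
  while (i < a.length) out.push({ kind: 'remove', text: a[i++] });
  while (j < b.length) out.push({ kind: 'add', text: b[j++] });
  return out;
}

// Renders the diff with "+" for additions, "-" for removals, " " for shared lines.
function formatUnified(lines: DiffLine[]): string {
  const prefix = { add: '+', remove: '-', same: ' ' } as const;
  return lines.map((l) => prefix[l.kind] + l.text).join('\n');
}

// False when nothing was added or removed, i.e. the drawer would show its empty state.
function hasTextualChange(lines: DiffLine[]): boolean {
  return lines.some((l) => l.kind !== 'same');
}
```

The LCS table costs O(n·m) memory in the number of lines, which is acceptable for typical eval outputs; for very long outputs a dedicated library such as the `diff` package on npm would be a better fit.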

Implementation Details
• New files under src/app/src/pages/eval/diff/:
  • components/ResultsTable.tsx
  • components/DiffDrawer.tsx
  • components/InlineDiff.tsx
  • hooks/useRunSummary.ts
  • hooks/useRunComparison.ts
  • lib/match.ts, match.test.ts, types.ts
• Integrated into ResultsView.tsx to conditionally render inline diff mode when the baseline and current params are detected.
• Includes basic sample data under public/diff-sample/ for local testing.
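A minimal sketch of the kind of row matching lib/match.ts might perform, assuming each row carries a stable id shared by both runs plus pass, score, and output fields. The EvalRow shape, the compareRuns name, and the rule that any difference in pass, score, or output counts as “changed” are assumptions for illustration, not the PR's actual types.ts.

```typescript
// Hypothetical row shape; the real types.ts may key and shape rows differently.
interface EvalRow {
  id: string; // assumed stable key shared by both runs
  pass: boolean;
  score: number;
  output: string;
}

type Status = 'added' | 'removed' | 'changed' | 'unchanged';

interface ComparedRow {
  id: string;
  status: Status;
  baseline?: EvalRow;
  current?: EvalRow;
  passDelta: number | null; // +1 newly passing, -1 newly failing, 0 same; null if unmatched
  scoreDelta: number | null; // current.score - baseline.score; null if unmatched
}

function compareRuns(baseline: EvalRow[], current: EvalRow[]): ComparedRow[] {
  const baseById = new Map(baseline.map((r) => [r.id, r] as const));
  const curById = new Map(current.map((r) => [r.id, r] as const));
  const rows: ComparedRow[] = [];

  for (const cur of current) {
    const base = baseById.get(cur.id);
    if (!base) {
      // Present only in the current run.
      rows.push({ id: cur.id, status: 'added', current: cur, passDelta: null, scoreDelta: null });
      continue;
    }
    const passDelta = Number(cur.pass) - Number(base.pass);
    const scoreDelta = cur.score - base.score;
    const changed = passDelta !== 0 || scoreDelta !== 0 || cur.output !== base.output;
    rows.push({
      id: cur.id,
      status: changed ? 'changed' : 'unchanged',
      baseline: base,
      current: cur,
      passDelta,
      scoreDelta,
    });
  }
  // Rows present only in the baseline were removed in the current run.
  for (const base of baseline) {
    if (!curById.has(base.id)) {
      rows.push({ id: base.id, status: 'removed', baseline: base, passDelta: null, scoreDelta: null });
    }
  }
  return rows;
}
```

Iterating the current run first keeps its row order in the table, with removed rows appended at the end.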

Testing
1. Place test JSONs in public/diff-sample/:
   • baseline.json
   • current.json
2. Run the dev server and open
   /eval?baseline=/diff-sample/baseline.json&current=/diff-sample/current.json
3. Verify:
   • Correct summary chip counts
   • Table rows reflect the expected added/removed/changed items
   • Drawer shows the unified diff output
   • “Show only changed” filters rows correctly
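To cross-check the verification steps by hand, compare-mode detection, the summary chip counts, and the “Show only changed” filter reduce to a few small functions. This is a hypothetical sketch: isCompareMode, summarize, and visibleRows are illustrative names, and treating added and removed rows as “changed” for filtering is an assumption.

```typescript
type Status = 'added' | 'removed' | 'changed' | 'unchanged';

// Minimal row shape for this sketch; only the status matters here.
interface RowWithStatus {
  id: string;
  status: Status;
}

type Summary = Record<Status, number>;

// Compare mode is on only when both query params are present.
function isCompareMode(search: string): boolean {
  const params = new URLSearchParams(search);
  return params.has('baseline') && params.has('current');
}

// Counts rows per status, one count per summary chip.
function summarize(rows: RowWithStatus[]): Summary {
  const counts: Summary = { added: 0, removed: 0, changed: 0, unchanged: 0 };
  for (const row of rows) counts[row.status] += 1;
  return counts;
}

// "Show only changed" (assumed semantics): hide only rows whose status is "unchanged".
function visibleRows<T extends RowWithStatus>(rows: T[], showOnlyChanged: boolean): T[] {
  return showOnlyChanged ? rows.filter((r) => r.status !== 'unchanged') : rows;
}
```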

Follow-ups
• Add color-coded summary chips for improved readability
• Add word-level diff highlighting
• Add copy buttons in drawer
• Add scroll-into-view behavior on row selection
