-
Notifications
You must be signed in to change notification settings - Fork 57
fix(vespa): fix search on participant fields #1166
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
WalkthroughThree Vespa search schema files were updated to remove the "index" directive from multiple field indexing configurations. The fields affected span across event, mail, and user schemas, reducing BM25 full-text search capability while maintaining attribute and summary indexing. Changes
Estimated code review effort🎯 2 (Simple) | ⏱️ ~8 minutes
Suggested reviewers
Poem
Pre-merge checks and finishing touches❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✨ Finishing touches
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
Summary of ChangesHello @junaid-shirur, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request refines the indexing strategy for various participant-related fields within the Vespa schemas for events, mail, and users. By removing the Highlights
Using Gemini Code AssistThe full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips. Invoking Gemini You can request assistance from Gemini at any point by creating a comment using either
Customization To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a Limitations & Feedback Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here. You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension. Footnotes
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code Review
This pull request aims to fix search behavior on participant fields by modifying Vespa schemas. The changes involve removing the index property from the indexing configuration for several fields in event.sd, mail.sd, and user.sd. While this correctly alters the indexing strategy, it leaves behind orphaned index configuration blocks and properties. My review highlights these redundant configurations, which should be removed to ensure schema clarity, improve maintainability, and prevent potential deployment issues.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 6
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
server/vespa/schemas/user.sd (1)
15-20: Critical: Inconsistent indexing configuration will break search functionality.The
indexdirective has been removed from theindexingline, but theindex { enable-bm25 }block is still present. Without theindexdirective in the indexing pipeline, no text index will be created, rendering the BM25 configuration non-functional.This field is actively used in rank profiles:
- Line 205:
bm25(email)incombined_bm25()- Line 216:
nativeRank(email)incombined_nativeRank()- Multiple match-features references (lines 230, 246, 300, 330, 341)
Apply this diff to restore the index directive:
field email type string { - indexing: attribute | summary + indexing: index | attribute | summary index { enable-bm25 } }
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (3)
server/vespa/schemas/event.sd(1 hunks)server/vespa/schemas/mail.sd(1 hunks)server/vespa/schemas/user.sd(1 hunks)
| field attendeesNames type array<string> { | ||
| indexing: index | attribute | summary | ||
| indexing: attribute | summary | ||
| index: enable-bm25 | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Critical: Inconsistent indexing configuration will break search functionality.
The index directive has been removed from the indexing line, but index: enable-bm25 is still present. In Vespa, the index directive in the indexing pipeline is required to create a text index. Without it, the field won't be indexed for text search, making the enable-bm25 setting non-functional.
This field is actively used in multiple rank profiles:
- Line 195:
bm25(attendeesNames)incombined_bm25() - Line 216:
nativeRank(attendeesNames)incombined_nativeRank() - Lines 245, 304, 338, 367, 381: Match features
Removing the index directive will cause these rank expressions to fail or return zero scores.
Apply this diff to restore the index directive:
field attendeesNames type array<string> {
- indexing: attribute | summary
+ indexing: index | attribute | summary
index: enable-bm25
}📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| field attendeesNames type array<string> { | |
| indexing: index | attribute | summary | |
| indexing: attribute | summary | |
| index: enable-bm25 | |
| } | |
| field attendeesNames type array<string> { | |
| indexing: index | attribute | summary | |
| index: enable-bm25 | |
| } |
🤖 Prompt for AI Agents
In server/vespa/schemas/event.sd around lines 82 to 85, the field attendeesNames
has removed the "index" directive from its indexing pipeline but still declares
"index: enable-bm25", which leaves the field without a text index; restore the
indexing pipeline to include "index" (e.g., change the indexing line to include
index between attribute and summary) so the field is actually text-indexed and
the existing bm25/nativeRank/match features using attendeesNames continue to
work.
| field from type string { | ||
| indexing: attribute | summary | index | ||
| indexing: attribute | summary | ||
| attribute: fast-search | ||
| index: enable-bm25 | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Critical: Inconsistent indexing configuration will break search functionality.
The index directive has been removed from the indexing line, but index: enable-bm25 is still present. Without the index directive in the indexing pipeline, no text index will be created, rendering the BM25 configuration non-functional.
This field is actively used in rank profiles:
- Line 175:
matches(from)inmatchedFieldCount() - Lines 180, 184:
nativeRank(from)inpeopleRank()andsimplePeopleRank()
Apply this diff to restore the index directive:
field from type string {
- indexing: attribute | summary
+ indexing: index | attribute | summary
attribute: fast-search
index: enable-bm25
}📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| field from type string { | |
| indexing: attribute | summary | index | |
| indexing: attribute | summary | |
| attribute: fast-search | |
| index: enable-bm25 | |
| } | |
| field from type string { | |
| indexing: index | attribute | summary | |
| attribute: fast-search | |
| index: enable-bm25 | |
| } |
🤖 Prompt for AI Agents
In server/vespa/schemas/mail.sd around lines 60-64, the field `from` currently
omits the `index` directive in its `indexing:` entry while still specifying
`index: enable-bm25`, which prevents creation of a text index and breaks BM25
usage in rank profiles (matches(from), nativeRank(from)). Restore the `index`
directive in the `indexing:` line so it includes `index` (e.g., `indexing: index
| attribute | summary`), keeping `index: enable-bm25` as-is; ensure whitespace
and order match project conventions.
| field to type array<string> { | ||
| indexing: attribute | summary | index | ||
| indexing: attribute | summary | ||
| attribute: fast-search | ||
| index: enable-bm25 | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Critical: Inconsistent indexing configuration will break search functionality.
The index directive has been removed from the indexing line, but index: enable-bm25 is still present. Without the index directive in the indexing pipeline, no text index will be created, rendering the BM25 configuration non-functional.
This field is actively used in rank profiles:
- Line 175:
matches(to)inmatchedFieldCount() - Lines 180, 184:
nativeRank(to)inpeopleRank()andsimplePeopleRank()
Apply this diff to restore the index directive:
field to type array<string> {
- indexing: attribute | summary
+ indexing: index | attribute | summary
attribute: fast-search
index: enable-bm25
}📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| field to type array<string> { | |
| indexing: attribute | summary | index | |
| indexing: attribute | summary | |
| attribute: fast-search | |
| index: enable-bm25 | |
| } | |
| field to type array<string> { | |
| indexing: index | attribute | summary | |
| attribute: fast-search | |
| index: enable-bm25 | |
| } |
🤖 Prompt for AI Agents
In server/vespa/schemas/mail.sd around lines 66 to 70, the field definition
removed the text index from the indexing directive but left `index:
enable-bm25`, which is inconsistent and prevents creation of a text index needed
by BM25 and rank profiles; restore a text index by adding the appropriate
`index` token back into the `indexing` line (e.g., include `index` alongside
`attribute | summary` or replace the `indexing` value with `index | attribute |
summary`) so the `index: enable-bm25` setting has a corresponding text index to
apply to.
| field cc type array<string> { | ||
| indexing: attribute | summary | index | ||
| indexing: attribute | summary | ||
| attribute: fast-search | ||
| index: enable-bm25 | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Critical: Inconsistent indexing configuration will break search functionality.
The index directive has been removed from the indexing line, but index: enable-bm25 is still present. Without the index directive in the indexing pipeline, no text index will be created, rendering the BM25 configuration non-functional.
This field is actively used in rank profiles:
- Line 175:
matches(cc)inmatchedFieldCount() - Lines 180, 184:
nativeRank(cc)inpeopleRank()andsimplePeopleRank()
Apply this diff to restore the index directive:
field cc type array<string> {
- indexing: attribute | summary
+ indexing: index | attribute | summary
attribute: fast-search
index: enable-bm25
}🤖 Prompt for AI Agents
In server/vespa/schemas/mail.sd around lines 72 to 76, the field "cc" has
removed the text index from its indexing pipeline but still keeps "index:
enable-bm25", which is useless without a text index; restore the text index by
adding "index: (enable-bm25)" back into the indexing directive (or include
"index" in the indexing line, e.g. "index: enable-bm25" within the indexing
pipeline) so a text index is created and BM25-based functions like matches(cc)
and nativeRank(cc) work again.
| field bcc type array<string> { | ||
| indexing: attribute | summary | index | ||
| indexing: attribute | summary | ||
| attribute: fast-search | ||
| index: enable-bm25 | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Critical: Inconsistent indexing configuration will break search functionality.
The index directive has been removed from the indexing line, but index: enable-bm25 is still present. Without the index directive in the indexing pipeline, no text index will be created, rendering the BM25 configuration non-functional.
This field is actively used in rank profiles:
- Line 175:
matches(bcc)inmatchedFieldCount() - Lines 180, 184:
nativeRank(bcc)inpeopleRank()andsimplePeopleRank()
Apply this diff to restore the index directive:
field bcc type array<string> {
- indexing: attribute | summary
+ indexing: index | attribute | summary
attribute: fast-search
index: enable-bm25
}📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| field bcc type array<string> { | |
| indexing: attribute | summary | index | |
| indexing: attribute | summary | |
| attribute: fast-search | |
| index: enable-bm25 | |
| } | |
| field bcc type array<string> { | |
| indexing: index | attribute | summary | |
| attribute: fast-search | |
| index: enable-bm25 | |
| } |
🤖 Prompt for AI Agents
In server/vespa/schemas/mail.sd around lines 78–82, the bcc field's indexing
pipeline accidentally dropped the index directive so the existing "index:
enable-bm25" has no effect; restore the indexing pipeline to include the index
directive (so a text index is created) while keeping the attribute and summary
options and preserving "index: enable-bm25" and "attribute: fast-search" so
matches(bcc) and nativeRank(bcc) used in the rank profiles continue to work.
| field name type string { | ||
| indexing: index | attribute | summary | ||
| indexing: attribute | summary | ||
| index { | ||
| enable-bm25 | ||
| } | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Critical: Inconsistent indexing configuration will break search functionality.
The index directive has been removed from the indexing line, but the index { enable-bm25 } block is still present. Without the index directive in the indexing pipeline, no text index will be created, rendering the BM25 configuration non-functional.
This field is actively used in rank profiles:
- Line 205:
bm25(name)incombined_bm25() - Line 216:
nativeRank(name)incombined_nativeRank() - Multiple match-features references (lines 230, 246, 299, 329, 341)
Apply this diff to restore the index directive:
field name type string {
- indexing: attribute | summary
+ indexing: index | attribute | summary
index {
enable-bm25
}
}📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| field name type string { | |
| indexing: index | attribute | summary | |
| indexing: attribute | summary | |
| index { | |
| enable-bm25 | |
| } | |
| } | |
| field name type string { | |
| indexing: index | attribute | summary | |
| index { | |
| enable-bm25 | |
| } | |
| } |
🤖 Prompt for AI Agents
In server/vespa/schemas/user.sd around lines 7 to 12 the field "name" has an
indexing pipeline that omits the "index" directive while an index { enable-bm25
} block remains; restore the text index by updating the indexing line to include
"index" (e.g., indexing: index | attribute | summary) so the BM25 configuration
becomes effective and rank profile calls like bm25(name) / nativeRank(name) and
other match-feature references will work correctly.
Description
Removed the index property from the participants fields to enable partial (substring) search using
matches.Even without the index property, the field remains searchable via userInput and our existing rank profiles.
For more details, refer to the Vespa documentation.
Testing
Additional Notes