Skip to content

Conversation

@sallybg
Copy link
Contributor

@sallybg sallybg commented Oct 29, 2025

No description provided.

Copy link
Collaborator

@bencap bencap left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for working on the scripts, and let me know if you have any questions about refactoring to share these methods with a worker process.

Comment on lines 570 to 582
value = str(mapping.hgvs_g) if mapping and mapping.hgvs_g else ""
elif column_key == "post_mapped_hgvs_p":
hgvs_str = get_hgvs_from_post_mapped(mapping.post_mapped) if mapping and mapping.post_mapped else None
if hgvs_str is not None and is_hgvs_p(hgvs_str):
value = hgvs_str
value = str(mapping.hgvs_p) if mapping and mapping.hgvs_p else ""
elif column_key == "post_mapped_hgvs_c":
value = str(mapping.hgvs_c) if mapping and mapping.hgvs_c else ""
elif column_key == "post_mapped_hgvs_at_assay_level":
value = str(mapping.hgvs_assay_level) if mapping and mapping.hgvs_assay_level else ""
elif column_key == "vep_functional_consequence":
vep_functional_consequence = mapping.vep_functional_consequence if mapping else None
if vep_functional_consequence is not None:
value = vep_functional_consequence
else:
value = ""
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of the empty string to represent null values here we should use the na_rep.


from mavedb.scripts.environment import script_environment, with_database_session

CLINGEN_API_URL = "https://reg.test.genome.network/allele"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should import this from /src/mavedb/lib/clingen, and should also probably change that one to not be the test site (https://ldh.genome.network/ldh/srvc).

Probably reworking the environment variable CAR_SUBMISSION_ENDPOINT would be the best thing to do, so we can have the URL be more configurable and use the test one in dev environments.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could just add an issue for reworking the env var though, to keep this task clean.

)

# for variant_urn, post_mapped, clingen_allele_id in variant_info:
for variant_urn, mapped_variant in variant_info:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since score sets may have a lot of variants and it can be frustrating to invoke the routine and not have insight into how far we've gotten, it may be worthwhile to convert that variant_info result into a list and log our progress every 10% of variants.

Suggested change
for variant_urn, mapped_variant in variant_info:
variant_info_list = variant_info.all()
num_variants = len(variant_info_list)
# for variant_urn, post_mapped, clingen_allele_id in variant_info:
for v_idx, (variant_urn, mapped_variant) in enumerate(variant_info_list):
if (v_idx + 1) % ((num_variants + 9) // 10) == 0:
logger.info(
f"Processing variant {v_idx+1}/{num_variants} ({variant_urn}) for score set {score_set.urn} ({idx+1}/{len(urns)})."
)

hgvs_p: Optional[str] = None

# NOTE: if no clingen allele id, could consider searching clingen using hgvs_assay_level. for now, skipping variant if no clingen allele id in db
# TODO skipping multi-variants for now
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add TODO as issue.

hgvs_p = allele["hgvs"][0]
break

# TODO should we check that assay level hgvs mtches either g. or p.?
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could be a good check if we want to fail the population if they don't match. If we are fine populating when they don't match, we can skip it.

Comment on lines +221 to +262
queue = []
variant_map = {}
for mapped_variant in mapped_variants:
hgvs_string = mapped_variant.post_mapped.get("expressions", {})[0].get("value") # type: ignore
if not hgvs_string:
logger.warning(f"No HGVS string found in post_mapped for variant {mapped_variant.id}.")
continue
queue.append(hgvs_string)
variant_map[hgvs_string] = mapped_variant

if len(queue) == 200:
consequences = get_functional_consequence(queue)
for hgvs, consequence in consequences.items():
mapped_variant = variant_map[hgvs]
if consequence:
mapped_variant.vep_functional_consequence = consequence
mapped_variant.vep_access_date = date.today()
db.add(mapped_variant)
else:
logger.warning(f"Could not retrieve functional consequence for HGVS {hgvs}.")
db.commit()
queue.clear()
variant_map.clear()

# Process any remaining variants in the queue
if queue:
consequences = get_functional_consequence(queue)
for hgvs, consequence in consequences.items():
mapped_variant = variant_map[hgvs]
if consequence:
mapped_variant.vep_functional_consequence = consequence
mapped_variant.vep_access_date = date.today()
db.add(mapped_variant)
else:
logger.warning(f"Could not retrieve functional consequence for HGVS {hgvs}.")
db.commit()

except Exception as e:
logger.error(
f"Failed to populate functional consequence predictions for score set {score_set.urn}: {str(e)}"
)
db.rollback()
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You could simplify this queue logic by building the strings up front and then batching the list with the batched function from mavedb/lib/utils.py.

In pseudocode, you would:

hgvs_strings = []
for mv in mapped_variants:
    append hgvs to hgvs_strings, warn on missing

for hgvs_batch in batched(hgvs_strings, 200):
    get_functional_consequence(hgvs_batch)
    existing logic ...

Implemented like this, we don't have to handle a half filled queue or worry about checking the queue length at all. We just generate the list of items we'd like to annotate, batch them, and handle all requests and results identically.

@bencap bencap linked an issue Nov 10, 2025 that may be closed by this pull request
@bencap bencap merged commit aeeddd8 into release-2025.5.0 Nov 14, 2025
6 checks passed
@bencap bencap deleted the store-all-hgvs branch November 14, 2025 02:20
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

3 participants