Populate and surface post-mapped HGVS expressions and VEP functional consequence, and surface gnomAD AF #553

sallybg · 2025-10-29T21:15:26Z

No description provided.

bencap

Thanks for working on the scripts, and let me know if you have any questions about refactoring to share these methods with a worker process.

bencap · 2025-10-31T16:03:32Z

src/mavedb/lib/score_sets.py

+            value = str(mapping.hgvs_g) if mapping and mapping.hgvs_g else ""
        elif column_key == "post_mapped_hgvs_p":
-            hgvs_str = get_hgvs_from_post_mapped(mapping.post_mapped) if mapping and mapping.post_mapped else None
-            if hgvs_str is not None and is_hgvs_p(hgvs_str):
-                value = hgvs_str
+            value = str(mapping.hgvs_p) if mapping and mapping.hgvs_p else ""
+        elif column_key == "post_mapped_hgvs_c":
+            value = str(mapping.hgvs_c) if mapping and mapping.hgvs_c else ""
+        elif column_key == "post_mapped_hgvs_at_assay_level":
+            value = str(mapping.hgvs_assay_level) if mapping and mapping.hgvs_assay_level else ""
+        elif column_key == "vep_functional_consequence":
+            vep_functional_consequence = mapping.vep_functional_consequence if mapping else None
+            if vep_functional_consequence is not None:
+                value = vep_functional_consequence
            else:
                value = ""


Instead of using the empty string to represent null values here, we should use the na_rep.
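A minimal sketch of what that could look like, assuming the caller's `na_rep` is available wherever the column value is built. The helper name `format_mapped_value`, the `SimpleNamespace` stand-in for a mapped variant, and the placeholder HGVS string are all hypothetical and not part of the PR.

```python
from types import SimpleNamespace
from typing import Any, Optional


def format_mapped_value(mapping: Optional[Any], attribute: str, na_rep: str = "NA") -> str:
    """Return the attribute as a string, or na_rep when it is missing or empty."""
    value = getattr(mapping, attribute, None) if mapping is not None else None
    return str(value) if value else na_rep


# Stand-in for a mapped variant row (hypothetical; the real model lives in mavedb).
mapping = SimpleNamespace(hgvs_p="NP_000000.1:p.Ala2Val", hgvs_c=None)

print(format_mapped_value(mapping, "hgvs_p", na_rep="NA"))  # populated attribute
print(format_mapped_value(mapping, "hgvs_c", na_rep="NA"))  # attribute is None
print(format_mapped_value(None, "hgvs_g", na_rep="NA"))     # no mapping at all
```

Routing every branch through one helper like this would also remove the repeated `if mapping and mapping.<attr> else ""` expressions in the hunk above.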

bencap · 2025-10-31T16:09:33Z

src/mavedb/scripts/populate_mapped_hgvs.py

+
+from mavedb.scripts.environment import script_environment, with_database_session
+
+CLINGEN_API_URL = "https://reg.test.genome.network/allele"


We should import this from /src/mavedb/lib/clingen, and should also probably change that one to not be the test site (https://ldh.genome.network/ldh/srvc).

Reworking the CAR_SUBMISSION_ENDPOINT environment variable would probably be the best approach, so that the URL is more configurable and dev environments can use the test one.

We could just add an issue for reworking the env var though, to keep this task clean.
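As a rough illustration of the configurable-URL idea, the sketch below resolves the endpoint from the environment and falls back to the constant currently in the diff. It assumes `CAR_SUBMISSION_ENDPOINT` would hold the full allele registry URL, which may not match how the variable is used today; the function name `clingen_api_url` is hypothetical.

```python
import os
from typing import Mapping, Optional

# Default copied from the constant in the diff above. Whether the test site
# should remain the default is exactly what the comment proposes to revisit.
DEFAULT_CLINGEN_API_URL = "https://reg.test.genome.network/allele"


def clingen_api_url(environ: Optional[Mapping[str, str]] = None) -> str:
    """Resolve the allele registry URL from the environment, else use the default."""
    env = os.environ if environ is None else environ
    # An unset or empty variable falls through to the default.
    return env.get("CAR_SUBMISSION_ENDPOINT") or DEFAULT_CLINGEN_API_URL
```

Passing the environment as an argument keeps the lookup easy to exercise in tests without touching the real process environment.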

bencap · 2025-10-31T20:16:34Z

src/mavedb/scripts/populate_mapped_hgvs.py

+            )
+
+            # for variant_urn, post_mapped, clingen_allele_id in variant_info:
+            for variant_urn, mapped_variant in variant_info:


Score sets may have a lot of variants, and it can be frustrating to invoke the routine without any insight into how far it has gotten. It may be worthwhile to convert the variant_info result into a list and log progress at every 10% of variants.

Suggested change

-            for variant_urn, mapped_variant in variant_info:
+            variant_info_list = variant_info.all()
+            num_variants = len(variant_info_list)
+            # for variant_urn, post_mapped, clingen_allele_id in variant_info:
+            for v_idx, (variant_urn, mapped_variant) in enumerate(variant_info_list):
+                if (v_idx + 1) % ((num_variants + 9) // 10) == 0:
+                    logger.info(
+                        f"Processing variant {v_idx+1}/{num_variants} ({variant_urn}) for score set {score_set.urn} ({idx+1}/{len(urns)})."
+                    )
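The interval arithmetic in the suggestion can be checked in isolation. The sketch below reproduces only the `(num_variants + 9) // 10` step and reports which 1-based positions would trigger a log line; the function name `progress_checkpoints` is hypothetical and no database or logger is involved.

```python
from typing import List


def progress_checkpoints(num_items: int, parts: int = 10) -> List[int]:
    """Return the 1-based item counts at which a progress message would be logged."""
    if num_items <= 0:
        return []
    # Ceiling division: with parts=10 this equals (num_items + 9) // 10,
    # and it is never zero for a non-empty list.
    step = (num_items + parts - 1) // parts
    return [n for n in range(1, num_items + 1) if n % step == 0]


print(progress_checkpoints(100))
print(progress_checkpoints(25))
print(progress_checkpoints(5))
```

Because the step is rounded up, a score set emits at most ten messages, and for counts that are not a multiple of the step the final variant is not itself a checkpoint.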

bencap · 2025-10-31T20:17:38Z

src/mavedb/scripts/populate_mapped_hgvs.py

+                hgvs_p: Optional[str] = None
+
+                # NOTE: if no clingen allele id, could consider searching clingen using hgvs_assay_level. for now, skipping variant if no clingen allele id in db
+                # TODO skipping multi-variants for now


Add TODO as issue.

bencap · 2025-10-31T20:20:01Z

src/mavedb/scripts/populate_mapped_hgvs.py

+                                    hgvs_p = allele["hgvs"][0]
+                                    break
+
+                # TODO should we check that assay level hgvs mtches either g. or p.?


Could be a good check if we want to fail the population if they don't match. If we are fine populating when they don't match, we can skip it.
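If the check is wanted, it could be as small as the sketch below. This assumes plain string equality is an adequate comparison, which may not hold if the expressions need normalizing first; the function name `assay_level_matches` and the placeholder HGVS strings are hypothetical.

```python
from typing import Optional


def assay_level_matches(
    hgvs_assay_level: Optional[str],
    hgvs_g: Optional[str],
    hgvs_p: Optional[str],
) -> bool:
    """True when the assay-level expression equals the g. or the p. expression."""
    if not hgvs_assay_level:
        return False
    # Ignore missing expressions so that None never counts as a match.
    candidates = {h for h in (hgvs_g, hgvs_p) if h}
    return hgvs_assay_level in candidates


# Placeholder strings for illustration only.
print(assay_level_matches("NP_000000.1:p.Ala2Val", None, "NP_000000.1:p.Ala2Val"))
print(assay_level_matches("NP_000000.1:p.Ala2Val", "NC_000001.11:g.100A>T", None))
```

Whether a `False` result should fail the population or only emit a warning is the open question raised in the comment.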

bencap · 2025-11-03T19:27:57Z

src/mavedb/scripts/vep_functional_consequence.py

+            queue = []
+            variant_map = {}
+            for mapped_variant in mapped_variants:
+                hgvs_string = mapped_variant.post_mapped.get("expressions", {})[0].get("value")  # type: ignore
+                if not hgvs_string:
+                    logger.warning(f"No HGVS string found in post_mapped for variant {mapped_variant.id}.")
+                    continue
+                queue.append(hgvs_string)
+                variant_map[hgvs_string] = mapped_variant
+
+                if len(queue) == 200:
+                    consequences = get_functional_consequence(queue)
+                    for hgvs, consequence in consequences.items():
+                        mapped_variant = variant_map[hgvs]
+                        if consequence:
+                            mapped_variant.vep_functional_consequence = consequence
+                            mapped_variant.vep_access_date = date.today()
+                            db.add(mapped_variant)
+                        else:
+                            logger.warning(f"Could not retrieve functional consequence for HGVS {hgvs}.")
+                    db.commit()
+                    queue.clear()
+                    variant_map.clear()
+
+            # Process any remaining variants in the queue
+            if queue:
+                consequences = get_functional_consequence(queue)
+                for hgvs, consequence in consequences.items():
+                    mapped_variant = variant_map[hgvs]
+                    if consequence:
+                        mapped_variant.vep_functional_consequence = consequence
+                        mapped_variant.vep_access_date = date.today()
+                        db.add(mapped_variant)
+                    else:
+                        logger.warning(f"Could not retrieve functional consequence for HGVS {hgvs}.")
+                db.commit()
+
+        except Exception as e:
+            logger.error(
+                f"Failed to populate functional consequence predictions for score set {score_set.urn}: {str(e)}"
+            )
+            db.rollback()


You could simplify this queue logic by building the strings up front and then batching the list with the batched function from mavedb/lib/utils.py.

In pseudocode, you would:

    hgvs_strings = []
    for mv in mapped_variants:
        append hgvs to hgvs_strings, warn on missing

    for hgvs_batch in batched(hgvs_strings, 200):
        get_functional_consequence(hgvs_batch)
        existing logic ...

Implemented like this, we don't have to handle a half-filled queue or check the queue length at all. We just generate the list of items we'd like to annotate, batch them, and handle all requests and results identically.
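A runnable sketch of the batching approach described above. The real `batched` helper in mavedb/lib/utils.py and the real `get_functional_consequence` are not shown in this thread, so both are replaced here by stand-ins with assumed signatures; the `annotate` function and its dict input are likewise hypothetical simplifications of the script's database loop.

```python
import logging
from itertools import islice
from typing import Dict, Iterable, Iterator, List, Optional

logger = logging.getLogger(__name__)


def batched(items: Iterable[str], size: int) -> Iterator[List[str]]:
    """Stand-in for the mavedb helper: yield lists of at most `size` items."""
    iterator = iter(items)
    while batch := list(islice(iterator, size)):
        yield batch


def get_functional_consequence(hgvs_batch: List[str]) -> Dict[str, Optional[str]]:
    """Stub for the VEP lookup; returns a placeholder consequence per string."""
    return {hgvs: "missense_variant" for hgvs in hgvs_batch}


def annotate(hgvs_by_variant_id: Dict[int, Optional[str]], batch_size: int = 200) -> Dict[str, str]:
    # Build the full list up front, warning on variants with no HGVS string.
    hgvs_strings = []
    for variant_id, hgvs in hgvs_by_variant_id.items():
        if not hgvs:
            logger.warning("No HGVS string found in post_mapped for variant %s.", variant_id)
            continue
        hgvs_strings.append(hgvs)

    # Every batch, including a short final one, goes through the same code path.
    results: Dict[str, str] = {}
    for hgvs_batch in batched(hgvs_strings, batch_size):
        for hgvs, consequence in get_functional_consequence(hgvs_batch).items():
            if consequence:
                results[hgvs] = consequence
            else:
                logger.warning("Could not retrieve functional consequence for HGVS %s.", hgvs)
    return results
```

In the actual script, the body of the inner loop would set the fields on the mapped variant and commit once per batch, as the existing code does; the sketch only collects results so it can run without a database.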

sallybg requested a review from bencap October 29, 2025 21:15

This was linked to issues Oct 30, 2025

Calculated Mutational Consequence (using VEP standards) #429

Closed

Return p./g./c. HGVS Strings with all Mapping Jobs #468

Closed

bencap reviewed Nov 3, 2025

bencap linked an issue Nov 10, 2025 that may be closed by this pull request

Populate additional HGVS columns via CAID #470

Closed

sallybg mentioned this pull request Nov 12, 2025

Simplify queue logic in hgvs population script #580

Open

sallybg added 18 commits November 12, 2025 15:00

Populate functional consequences for variants via script

970ea81

Batch initial requests to VEP

2f86e1b

Batch requests to Variant Recoder

ee17f93

Run post-variant-recoder vep as a batch

087035f

Add VEP functional consequence to variants csv

4f8e0d8

Retrieve and store hgvs representations of mapped variants

83a5c67

Include post-mapped hgvs in variants data csv

fffde3d

Fix get_hgvs_from_post_mapped imports

ba7f99c

Assert type for mypy

5a9c887

Update variants csv tests to reflect mapped hgvs changes

00f4429

Resolve alembic merge conflict

6bfaaca

Use provided na_rep string to represent null values

f88a72c

Use production clingen API

e04d305

Update progress while populating mapped hgvs

fc404d0

Update todo issue links

41c223b

Fix worker import for hgvs extraction

1403c06

Add vep to csv namespace options

ae5ab83

Fix csv tests

4fdad00

sallybg force-pushed the store-all-hgvs branch from 5422a27 to 4fdad00 Compare November 13, 2025 00:54

Add gnomad af to csv

21a771f

sallybg force-pushed the store-all-hgvs branch from 857b15c to 21a771f Compare November 14, 2025 01:51

bencap merged commit aeeddd8 into release-2025.5.0 Nov 14, 2025
6 checks passed

bencap deleted the store-all-hgvs branch November 14, 2025 02:20

3 participants

