RGD automated import pipeline for ClinVar variants, variant-to-disease annotations and gene-to-disease annotations
RGD ID: 8554872
Variants with MedGen condition and gene associations are downloaded from NCBI's ClinVar FTP site at ftp://ftp.ncbi.nlm.nih.gov/pub/clinvar/xml/ClinVarFullRelease_00-latest.xml.gz. These variants are imported into the RGD database and associated with RGD genes based on the data contained in that file. Annotations are then assigned to both the variants and their associated genes, based on the MedGen conditions, with clinical significance, that ClinVar assigns to the variants. The ClinVar FTP file ftp://ftp.ncbi.nlm.nih.gov/pub/clinvar/gene_condition_source_id is used to map NCBI MedGen condition IDs to OMIM phenotype IDs. Because RGD's Disease Ontology uses OMIM IDs as aliases for disease terms, OMIM IDs from the ClinVar data can in turn be matched to RGD Disease Ontology (RDO/MEDIC) terms, which are then assigned to RGD variant and gene records as RDO annotations. These annotations are given the evidence code "IEA" ("inferred from electronic annotation") for human (the source species) and are automatically propagated to the orthologous rat and mouse genes with the evidence code "ISS" ("inferred from sequence or structural similarity"). The source of all these annotations is listed as "ClinVar".
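The mapping chain described above (MedGen condition ID to OMIM phenotype ID to RDO term, followed by propagation to orthologs) can be illustrated with a minimal Python sketch. It is not the RGD pipeline's actual code. The column names ConceptID and DiseaseMIM are assumptions about the header of the gene_condition_source_id file and should be checked against the real file. The function names, the in-memory dictionaries, and all IDs in the sample data are hypothetical placeholders.

```python
import csv
import io


def load_medgen_to_omim(handle):
    """Read a tab-separated mapping file and return {MedGen ID: {OMIM IDs}}.

    Assumes columns named "ConceptID" (MedGen) and "DiseaseMIM" (OMIM);
    rows lacking either value are skipped.
    """
    mapping = {}
    for row in csv.DictReader(handle, delimiter="\t"):
        medgen = (row.get("ConceptID") or "").strip()
        omim = (row.get("DiseaseMIM") or "").strip()
        if medgen and omim:
            mapping.setdefault(medgen, set()).add(omim)
    return mapping


def build_annotations(variant_conditions, variant_genes,
                      medgen_to_omim, omim_to_rdo, orthologs):
    """Return a set of (object_type, object_id, species, term, evidence, source).

    Human variants and genes receive "IEA"; orthologous rat and mouse genes
    receive "ISS". All inputs are hypothetical in-memory dictionaries.
    """
    annotations = set()
    for variant_id, medgen_ids in variant_conditions.items():
        for medgen_id in medgen_ids:
            for omim_id in medgen_to_omim.get(medgen_id, ()):
                term = omim_to_rdo.get(omim_id)
                if term is None:
                    continue  # no disease term carries this OMIM ID as an alias
                annotations.add(
                    ("variant", variant_id, "human", term, "IEA", "ClinVar"))
                for gene in variant_genes.get(variant_id, ()):
                    annotations.add(
                        ("gene", gene, "human", term, "IEA", "ClinVar"))
                    for species, ortholog in orthologs.get(gene, ()):
                        annotations.add(
                            ("gene", ortholog, species, term, "ISS", "ClinVar"))
    return annotations


# Made-up sample data; none of these IDs are real records.
SAMPLE_FILE = (
    "#GeneID\tAssociatedGenes\tConceptID\tDiseaseName\tDiseaseMIM\n"
    "1\tGENE_A\tC0000001\tExample disease\t100001\n"
    "2\tGENE_B\tC0000002\tDisease without OMIM ID\t\n"
)

medgen_to_omim = load_medgen_to_omim(io.StringIO(SAMPLE_FILE))
annotations = build_annotations(
    variant_conditions={"VAR1": ["C0000001"], "VAR2": ["C0000002"]},
    variant_genes={"VAR1": ["GENE_A"], "VAR2": ["GENE_B"]},
    medgen_to_omim=medgen_to_omim,
    omim_to_rdo={"100001": "RDO:EXAMPLE1"},
    orthologs={"GENE_A": [("rat", "Gene_a_rat"), ("mouse", "Gene_a_mouse")]},
)
for annotation in sorted(annotations):
    print(annotation)
```

In this sketch, a variant whose MedGen condition has no OMIM ID, or whose OMIM ID matches no disease term, simply yields no annotation. How the real pipeline handles such cases is not stated in this record.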