Gene-related Tracks
Depending on which “instance” of JBrowse you are viewing (e.g. the instance for the rat RGSC v3.4, v5.0 or v6.0 assembly, the human build 36.3 or 37 assembly, or the mouse build 37 assembly), there may be additional types of gene tracks, but every JBrowse instance includes at least three tracks for RGD gene features: RGD Genes, RGD Genes and Transcripts, and RGD Transcripts. In addition, some instances include disease-related tracks, which show genes associated with one or more diseases in categories such as “Cardiovascular Diseases”, and tracks for gene-chemical interaction data. The examples given here use the JBrowse instance for rat RGSC v5.0 as of January 2015.
“RGD Genes” and “RGD Genes and Transcripts” Tracks
Both the “RGD Genes” track and the “RGD Genes and Transcripts” track are based on RGD gene records. The “RGD Genes” track shows each gene as a single rectangular box, the “box glyph”, representing the entire region covered by any transcript of that gene (e.g. Chrdl2 in the gene displayed here). This means that if, as a simplified example, one transcript starts at 100 and ends at 1500, and a second transcript for the same gene starts at 150 and ends at 3000, the box representing that gene in the “RGD Genes” track would run from 100 to 3000.
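The way the gene box is derived from its transcripts can be illustrated with a minimal Python sketch. The function name `gene_extent` and the tuple representation of transcripts are assumptions made for illustration; this is not RGD's or JBrowse's actual code.

```python
from typing import Iterable, Tuple


def gene_extent(transcripts: Iterable[Tuple[int, int]]) -> Tuple[int, int]:
    """Return the (start, end) region covered by any transcript of one gene.

    Each transcript is a (start, end) pair; this representation is hypothetical.
    """
    spans = list(transcripts)
    if not spans:
        raise ValueError("a gene needs at least one transcript")
    start = min(s for s, _ in spans)  # leftmost transcript start
    end = max(e for _, e in spans)    # rightmost transcript end
    return start, end


# The simplified example from the text: two transcripts of the same gene.
print(gene_extent([(100, 1500), (150, 3000)]))  # (100, 3000)
```

The box spans the smallest start and the largest end, so it covers every transcript even when no single transcript covers the whole region.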
The RGD Genes and Transcripts track shows all of the individual transcripts assigned to each gene. To the left of each transcript is the NCBI RefSeq ID for that transcript. Below the cluster of transcripts is the gene symbol.
Mousing over the cluster highlights the entire group and brings up a “tool tip” which gives the gene symbol. This is especially helpful when the browser is zoomed in to show only part of a given gene: the gene symbol that displays “in line” on the browser is located at the far left of the gene display, so if only the 3′ end of the gene is showing, the gene symbol label is not visible. In that case, mouse over anywhere in the cluster of transcripts to see the gene symbol. Clicking the gene glyph in the RGD Genes track, or clicking anywhere in the highlighted cluster of transcripts in the RGD Genes and Transcripts track, brings up an informative popup, “RGD Feature Data for…”.
Click here to view the RGD gene and transcript-related track examples in JBrowse.
RGD Gene Track Popups
The RGD Genes and RGD Genes and Transcripts tracks both use the same informational popups to display and link to additional information about each gene. As shown to the left, the “RGD Feature Data” popup shows much of the general information that RGD maintains about the gene, such as the gene symbol and name, the gene type, NCBI’s RefSeq status for the gene, the species, the description, and several types of external database identifiers for the gene and its associated protein(s). A number of these are links to web pages with more information. For example, the gene symbol links to the RGD gene report page, as does the link for “RGD Gene Report for…”. The Location is a link to the JBrowse display for that gene: if you are zoomed out, it zooms the browser in to just that gene, and if you are zoomed in so that only a portion of the gene is visible, it zooms the browser out to display the entire gene. “RGD Gene Annotation Analysis for…” links to the RGD Gene Annotator (GA) Tool for that gene, showing all the annotations for the gene and for its human and mouse orthologs where applicable, as well as all of the external database identifiers for the gene being displayed. External database IDs for Ensembl, NCBI’s Gene database, UniGene and/or UniProt link to the corresponding records at those databases.
RGD Transcript Tracks
Unlike the RGD Genes and Transcripts track, which is gene-based, the RGD Transcripts track is focused on individual transcripts. The track does not cluster the transcripts by the gene to which they are assigned, nor does it show the gene symbol in line with the transcripts for that gene. Instead, each transcript is treated as an individual object. Each transcript is shown as a “line and box” glyph representing the introns (lines) and exons (boxes) for that transcript. The transcript’s RefSeq ID is shown just below the glyph. Mousing over a transcript highlights just that transcript, and clicking on it brings up an “RGD Feature Data” popup for that transcript.
RGD Transcript Track Popups
The RGD Transcripts popup gives information such as the class of transcript (e.g. mRNA), its genomic location and length, the gene with which it is associated, the RefSeq status of the transcript (distinct from the RefSeq status of the gene) and whether or not the transcript codes for a protein. If the transcript is protein-coding, as is usually the case, the “Is Non-Coding” field displays “N” for “No”. The Transcript ID links to the GenBank record at NCBI for that RNA.
Click here to view the RGD transcript-related track examples in JBrowse.
RGD Ontology-based Gene Tracks
The core of RGD’s data is the set of associations between data objects, such as genes, and terms from controlled vocabularies (ontologies) that describe disease associations, gene function, pathway involvement, etc. Two of these types of associations have been captured in JBrowse tracks as “Disease Related Tracks” and “Gene-Chemical Interaction Tracks”.
Disease Related Tracks are based on RGD’s annotations to terms from the RGD Disease ontology. Objects such as genes are tagged with disease terms based on published data suggesting an association, or based on associations imported from other databases such as OMIM and ClinVar (for more on RGD ontology annotations and the data that can be extracted from them, please see the “Introduction to Biomedical Ontologies” video tutorial series and/or the ontology help pages). The JBrowse Disease Related Tracks list disease categories, such as “Cardiovascular Diseases” and “Endocrine System Diseases”. The disease categories represent the highest-level (i.e. the most general) terms in the disease ontology. The tool uses the hierarchical structure of the ontology to find all of the more specific terms under each of those high-level terms, and then selects and displays all of the genes annotated to one or more of those specific terms.
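The category lookup described above can be sketched in Python as a traversal of the ontology followed by a filter on gene annotations. The term hierarchy, gene names and function names below are hypothetical and purely illustrative; they are not taken from the RGD Disease ontology or from RGD's implementation.

```python
from collections import deque

# Hypothetical toy ontology: parent term -> more specific child terms.
CHILDREN = {
    "Cardiovascular Diseases": ["Heart Diseases", "Vascular Diseases"],
    "Heart Diseases": ["Heart Failure"],
    "Vascular Diseases": ["Hypertension"],
}

# Hypothetical annotations: gene -> disease terms it is annotated to.
ANNOTATIONS = {
    "GeneA": {"Heart Failure"},
    "GeneB": {"Hypertension", "Obesity"},
    "GeneC": {"Obesity"},
}


def descendants(term, children):
    """Return the term itself plus every more specific term beneath it."""
    seen = {term}
    queue = deque([term])
    while queue:
        current = queue.popleft()
        for child in children.get(current, []):
            if child not in seen:
                seen.add(child)
                queue.append(child)
    return seen


def genes_in_category(category, children, annotations):
    """Return genes annotated to the category or to any term under it."""
    terms = descendants(category, children)
    return sorted(g for g, annotated in annotations.items() if annotated & terms)


print(genes_in_category("Cardiovascular Diseases", CHILDREN, ANNOTATIONS))
# ['GeneA', 'GeneB']
```

In this sketch a gene appears in the category as soon as one of its terms falls anywhere under the high-level term, which is why GeneB is included even though it also carries an unrelated annotation.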
Select a category to see all the genes in your region of interest that are annotated to any term that falls in that category. In the example to the left, Ucp3 and Ucp2 are both annotated to “Cardiovascular Diseases”, whereas Dnajb13 is not.
Gene-chemical interaction tracks work in much the same way. The track categories are based on the “Biological Role” branch of the ChEBI (Chemical Entities of Biological Interest) vocabulary. The example to the left shows that, of the genes in the displayed region, Ucp2 and Dnajb13 interact with chemicals with a role of “Epitope”, but Ucp3 does not.
Click here to view the Ontology-based gene track examples in JBrowse.
RGD Ontology-based Gene Track Popups
Click the box representing the gene to bring up an informational popup for that gene. In addition to general information about the gene, such as its location, gene symbol and name, gene type and description (see above for more information about these), the popups for disease gene tracks list all of the disease ontology terms associated with that gene that fall within the selected category. For instance, in the example to the left, the rat gene Ucp2 has been associated with the following cardiovascular disease terms: Stroke; Myocardial Reperfusion Injury; Heart Failure; Hypertrophy, Right Ventricular; Carotid Artery Diseases; Hypertension; Liver Reperfusion Injury; and Ischemic Attack, Transient. The RDO ID which follows the term name is a stable identifier for that concept. The ID links to the RGD ontology report for that term, where you can find a definition of the term, term synonyms, links to MeSH and/or OMIM records for the term, and a list of RGD genes, QTL and/or strains annotated to the term.
As with the disease tracks, clicking a gene in a gene-chemical interaction track brings up an informational popup. This popup displays the name of the chemical with which that gene or gene product interacts, along with a phrase describing the type of interaction. The example to the left shows that the expression of the gene Ucp2 is decreased by the chemical beta-D-glucosamine. The chemical name in the popup links to the RGD ontology report page for that ChEBI term. The description of the interaction links to the RGD Gene-chemical interactions report, which gives more information about the chemical and the type of interaction(s) between the chemical and the gene or gene product.
In some cases, the interaction type given in the popup is listed as “multiple interactions”, denoting a more complex group of interactions, possibly involving additional genes or chemicals. In these cases, follow the link to the RGD Gene-chemical interactions report, which gives additional details. For instance, for the interaction between the gene Ucp2 and capsaicin in the “Biophysical Role” track, the popup lists “multiple interactions”. In the Gene-chemical interactions report, the Notes field expands this to “Capsaicin inhibits the reaction [Dietary Fats results in decreased expression of UCP2 mRNA]”, as shown in the image to the left.
Gene-related Histogram Displays
Because large regions of a chromosome contain many genes, the display becomes harder to read as you zoom out. Labels for individual genes, for instance, are no longer displayed, and the gene and transcript glyphs are placed very close together and must be stacked vertically so that all of the genes can be displayed. Depending on the size and resolution of your screen, at a zoom level that shows approximately 10-30 Mb of the chromosome in the window, JBrowse converts the Genes, Genes-and-Transcripts and Transcripts track displays to “feature density” histograms. When this happens, the tool splits the displayed region into smaller subregions, counts the number of “features” in each subregion (i.e. the number of genes in the RGD Genes and RGD Genes-and-Transcripts tracks, and the number of transcripts in the RGD Transcripts track) and displays the results as a histogram. In the example to the left, each subregion is approximately 2 Mb in length and the highest number of genes in a subregion is about 160, according to the scale displayed to the right of the image.
Notice that although the histogram for the RGD Transcripts track is similar to those of the RGD Genes and RGD Genes-and-Transcripts tracks, it is not the same, and its scale is different. Because each transcript is counted separately for that track, the total number of transcripts in a subregion may be approximately the same as the number of genes or quite different, depending on the number of transcripts per gene in that subregion.
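The idea behind a feature density histogram, and the reason the gene and transcript histograms differ, can be shown with a short Python sketch. It assumes fixed-width subregions and assigns each feature to a subregion by its start position; both choices, along with the function name and the toy coordinates, are assumptions for illustration and may not match how JBrowse computes its histograms.

```python
def feature_density(starts, region_start, region_end, bin_size):
    """Count features per fixed-width subregion, binning each by its start."""
    n_bins = -(-(region_end - region_start) // bin_size)  # ceiling division
    counts = [0] * n_bins
    for s in starts:
        if region_start <= s < region_end:
            counts[(s - region_start) // bin_size] += 1
    return counts


# Hypothetical transcripts as (gene, start position in bp) pairs.
transcripts = [
    ("geneA", 100_000), ("geneA", 120_000),
    ("geneB", 2_500_000),
    ("geneC", 2_600_000), ("geneC", 2_650_000), ("geneC", 2_700_000),
]

# One start per gene: the leftmost start among its transcripts.
gene_starts = {}
for gene, start in transcripts:
    gene_starts[gene] = min(start, gene_starts.get(gene, start))

region = (0, 4_000_000)  # a 4 Mb window split into 2 Mb subregions
gene_counts = feature_density(gene_starts.values(), *region, bin_size=2_000_000)
transcript_counts = feature_density([s for _, s in transcripts], *region, bin_size=2_000_000)

print(gene_counts)        # [1, 2]
print(transcript_counts)  # [2, 4]
```

Both histograms have the same overall shape here, but the transcript counts are higher and would need a different scale, because genes with several transcripts contribute more than once.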
Click here to view this gene-related histogram view in JBrowse.
RNA-Seq Gene Prediction Tracks
RGD accepts data submitted by researchers and can display such data in JBrowse once the researchers approve its public release. One such set of tracks is the RNA-Seq-based gene prediction data submitted by the Liu lab at the Medical College of Wisconsin. These researchers used RNA-Seq data from brain, bone marrow and kidney, with and without the inclusion of EST data, to annotate the rat v5.0 genome assembly, including prediction of known and novel transcripts/isoforms and prediction of novel genes. These gene predictions are shown in the Gene Models → RNA-Seq Predicted Gene Models → Cancer Center, Medical College of Wisconsin tracks. As shown in the example to the left, where the RNA-Seq and/or EST data supported prediction of a gene with its intron/exon structure, this structure is displayed. If a gene that is displayed in the RGD Genes track is not displayed in an RNA-Seq gene prediction track, either that gene is not expressed in the corresponding tissue, or the RNA sequencing in that region was not of sufficient quality or quantity to support a gene prediction. Because the second explanation is always possible, the absence of a gene prediction at a specific location in a specific tissue cannot be taken as proof that the corresponding gene is not expressed in that tissue.
Click here to view the RNA-Seq Gene prediction track example in JBrowse.
More information about the data in these tracks can be found in the paper “Improved rat genome gene prediction by integration of ESTs with RNA-Seq information”, Li et al., Bioinformatics. 2015 Jan 1;31(1):25-32. PMID: 25217576 (http://www.ncbi.nlm.nih.gov/pubmed/?term=25217576). For convenience, a link to the PubMed record for this article is available in the “About this track” information accessible from the track title’s drop-down menu, as shown in the image to the left. For questions about this data or how it was derived, please contact the authors of the paper.
Popups for RNA-Seq gene predictions contain the information submitted by the researchers. This includes the source, which gives the tissue and method from which the data were derived; the genomic location (with a link which opens JBrowse at that specific position); the length of the predicted gene/transcript; the “name” or designation of the predicted gene/transcript as submitted by the researcher; and two arbitrary identifiers, the “Load ID” and the “Primary ID”, assigned by the computer program that originally loaded the data into the genome browser. Note that the name of the predicted gene/transcript is a unique identifier assigned by the researchers to that transcript from that tissue and does not indicate what known gene, if any, that predicted gene could be. For example, as shown in the examples to the left, a transcript in brain is located at the same genomic position and has the same basic intron/exon structure as the known gene C2cd3, but the name of that predicted transcript is g2062, not C2cd3.