Gene Annotation-Independent scRNA-Seq Analysis

By Conor León
February 3, 2022

Genome Annotation and scRNA-seq Data

Single-cell RNA sequencing (scRNA-seq) has revolutionized the field of transcriptomics, allowing researchers to measure which genes are actively expressed in each cell across a population of cells.

Thus far, scRNA-seq and its downstream analyses have relied on high-quality, annotated genomes. “Annotations” are essentially notes that researchers have attached to the genome describing where genes lie and what they do, and bioinformaticians use them in scRNA-seq analysis to identify cell types in their data. Genome annotation is typically produced through homology analysis, followed by manual curation or validation for commonly studied species. This raises the question: how can researchers use scRNA-seq for less-studied organisms and for genomic regions with little or no functional annotation?

Computational Tool in R

In a recently published Nature Communications article, Wang et al. propose borrowing a trick from plant geneticists to solve this issue. The proposed procedure analyzes scRNA-seq data from a cell population of interest without relying on gene annotation, using a computational tool in R named groHMM.

This tool identifies transcriptionally active regions (TARs) in single-cell data. The TARs found by groHMM can then be compared with existing gene annotations and split into transcribed regions that are already annotated (aTARs) and transcribed regions that lack annotation (uTARs). Comparing the two groups reveals how much important information lies outside of known gene annotations for the organism of interest.
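The aTAR/uTAR split comes down to an interval-overlap check between TARs and annotated genes. The Python sketch below is illustrative only: it is not the authors' pipeline and does not call groHMM. The `Interval` class, the `classify_tars` function, the coordinates, and the strand-aware "any overlap" rule are all assumptions made for the example; a real analysis may use different overlap criteria.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Interval:
    """A genomic interval (hypothetical helper type for this sketch)."""
    chrom: str
    start: int  # 0-based, inclusive
    end: int    # exclusive
    strand: str = "+"


def overlaps(a: Interval, b: Interval) -> bool:
    """True if a and b share at least one base on the same chromosome and strand."""
    return (
        a.chrom == b.chrom
        and a.strand == b.strand
        and a.start < b.end
        and b.start < a.end
    )


def classify_tars(tars, genes):
    """Label each TAR 'aTAR' if it overlaps any annotated gene, else 'uTAR'.

    Assumption: any same-strand overlap counts as 'annotated'.
    """
    return {
        tar: "aTAR" if any(overlaps(tar, gene) for gene in genes) else "uTAR"
        for tar in tars
    }


if __name__ == "__main__":
    # Made-up coordinates, for illustration only.
    genes = [Interval("chr1", 100, 500, "+")]
    tars = [
        Interval("chr1", 450, 600, "+"),  # overlaps the annotated gene
        Interval("chr1", 900, 1200, "+"),  # lies outside any annotation
    ]
    for tar, label in classify_tars(tars, genes).items():
        print(tar.chrom, tar.start, tar.end, tar.strand, label)
```

The linear scan over `genes` keeps the sketch short; for a whole genome, an interval tree or sorted sweep would be the usual choice.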

Using uTARs to Distinguish Between Cell Types

Cells of the same type are usually grouped into clusters by co-expression of well-characterized genes. Certain uTARs were found to correlate strongly with cell types. Wang et al. correctly identified embryonic chicken heart cells using 18 associated, differentially expressed uTARs found with groHMM. This opens up the exciting possibility that a refined version of this procedure could be used to distinguish between cell types in transcriptomic datasets.
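To make "differentially expressed" concrete, the sketch below computes a simple log2 fold change for each feature (for example, a uTAR) in one cluster versus all other cells. This is a minimal illustration, not the statistical test used by Wang et al.; the function name `log2_fold_changes`, the pseudocount, and the toy counts are assumptions made for the example.

```python
import math


def log2_fold_changes(counts, clusters, pseudocount=1.0):
    """Log2 fold change of mean expression: each cluster versus all other cells.

    counts:   dict mapping feature name -> list of per-cell counts
    clusters: list of cluster labels, one per cell (same order as the counts)
    Returns a dict keyed by (feature, cluster).
    """
    result = {}
    for feature, values in counts.items():
        if len(values) != len(clusters):
            raise ValueError(f"{feature}: expected {len(clusters)} cells")
        for label in sorted(set(clusters)):
            inside = [v for v, c in zip(values, clusters) if c == label]
            outside = [v for v, c in zip(values, clusters) if c != label]
            mean_in = sum(inside) / len(inside)
            mean_out = sum(outside) / len(outside) if outside else 0.0
            # The pseudocount avoids division by zero and log of zero.
            result[(feature, label)] = math.log2(
                (mean_in + pseudocount) / (mean_out + pseudocount)
            )
    return result


if __name__ == "__main__":
    # Toy data: four cells in two clusters, two hypothetical uTARs.
    clusters = ["A", "A", "B", "B"]
    counts = {"uTAR_1": [7, 7, 0, 0], "uTAR_2": [1, 1, 1, 1]}
    for key, lfc in log2_fold_changes(counts, clusters).items():
        print(key, round(lfc, 2))
```

A feature with a large positive value in one cluster and values near zero elsewhere is a candidate marker for that cluster; a real analysis would add a significance test and multiple-testing correction.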

Potential Applications

The most obvious application is a more robust analysis of scRNA-seq data from organisms lacking significant gene annotation. However, the approach also has exciting potential for developing tissues, which often lack the quality of annotation found in adult tissues. Because expression patterns during development are transient, a better understanding of the developmental transcriptome would be extremely valuable.

Future Directions

The field of scRNA-seq data analysis is ever-evolving. The authors of this paper hope that their proposed procedure can be used in conjunction with additional custom bioinformatic tools and pipelines to further current understanding of transcriptional dynamics in cells.



Conor León, Geneticist & Content Writer, Bridge Informatics

Conor is a Content Writer at Bridge Informatics, a professional services firm that helps biotech customers implement advanced techniques in management and analysis of genomic data. Bridge Informatics focuses on data mining, machine learning, and various bioinformatic techniques to discover biomarkers and companion diagnostics. If you’re interested in reaching out, please email daniel.dacey@old.bridgeinformatics.com or dan.ryder@old.bridgeinformatics.com.

Sources:

https://www.nature.com/articles/s41467-021-22496-3

https://bmcbioinformatics.biomedcentral.com/articles/10.1186/s12859-015-0656-3
