Beyond Bacteria: Why Viruses in the Gut Matter for Human Health


The viral component of the human gut microbiome has historically been understudied, and its role in human health is therefore largely unknown. In a recent study published in Nature Microbiology, Shah et al. comprehensively characterized the “virome” of healthy infant guts. Surprisingly, their results turned up thousands of new viruses, most of them bacteriophages, indicating that the viral component of the gut microbiome may be more important to human health than previously known.

The Gut Virome in Human Health

The gut microbiome is essential for maturation of the immune system, development and general health. Studies show that the loss of gut microbiome diversity can lead to chronic immune diseases, such as asthma and allergies. Although most metagenomic research has focused on bacteria, the viral or phage component of the gut is equally important. Studies show that the healthy gut viral content, or the “virome”, is important not only for proper immune function but also for the prevention of chronic metabolic diseases, such as obesity and type 2 diabetes.

Phages emerge in the gut microbiome within the first few months after birth and can be subdivided into two main categories: virulent and temperate. Virulent phages undergo lytic life cycles, in which they multiply by killing infected bacterial host cells. Temperate phages, which represent the largest component of the gut virome, integrate into the genomes of gut bacteria and are then termed prophages. Characterization of the human gut virome has been challenging due to uncharted viral diversity, the lack of universal viral marker genes and the lack of standardized methods for de novo viral genome assembly.

In their Nature Microbiology paper, Shah et al. characterized the fecal viromes of 647 infants using standards published by the International Committee on Taxonomy of Viruses (ICTV) and identified hundreds of viral clades comprising thousands of newly identified viral species.

Bioinformatics Methods for Taxonomic Classification of Viral Families and Species in the Infant Gut

For the taxonomic classification of the infant gut virome, the authors isolated DNA from infant fecal matter, generated libraries and sequenced them on the Illumina HiSeq X platform at a depth of 3 Gb per sample (150 x 150 bp paired-end reads). The paired reads were trimmed and deduplicated, then used for de novo metagenome assembly with SPAdes. The assembled genomes were annotated using Prodigal, followed by the construction of aggregate protein similarity (APS) trees to identify bacterial clades.
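To make the read-level steps concrete, here is a minimal Python sketch of what "deduplicating paired reads" and "3 Gb at 150 x 150 bp" mean in practice. It is an illustration only, not the authors' pipeline: the function names are hypothetical, and real workflows use dedicated tools that also handle quality scores and FASTQ parsing.

```python
from typing import Iterable, Iterator, Tuple

ReadPair = Tuple[str, str]  # (forward read sequence, reverse read sequence)


def deduplicate_pairs(pairs: Iterable[ReadPair]) -> Iterator[ReadPair]:
    """Yield each distinct read pair once, keeping first-seen order.

    Hypothetical helper: two pairs count as duplicates only when both
    mates are identical (case-insensitive exact match).
    """
    seen = set()
    for fwd, rev in pairs:
        key = (fwd.upper(), rev.upper())
        if key not in seen:
            seen.add(key)
            yield fwd, rev


def expected_read_pairs(total_bases: float, read_length: int = 150) -> float:
    """Approximate number of read pairs behind a given sequencing depth."""
    # Each pair contributes two reads of `read_length` bases.
    return total_bases / (2 * read_length)


if __name__ == "__main__":
    reads = [("ACGT", "TTGA"), ("ACGT", "TTGA"), ("ACGT", "CCCC")]
    print(list(deduplicate_pairs(reads)))
    print(expected_read_pairs(3e9))  # read pairs for 3 Gb at 150 x 150 bp
```

Under these assumptions, 3 Gb per sample corresponds to roughly 10 million read pairs before trimming and deduplication.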

The assembled metagenomes were mined with CRISPRDetect for CRISPR spacers, the short virus-derived sequences that bacteria store in their CRISPR arrays (the matching regions in viral genomes are called protospacers), to help build the viromes. BLAT was then used for species-level deduplication of the viromes into viral operational taxonomic units (vOTUs), followed by construction of APS trees to identify viral families. The relative abundance of vOTUs was determined by mapping sample reads to the virome contigs using the BWA-MEM aligner and msamtools.
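As a rough illustration of the last step, the sketch below turns mapped read counts into relative abundances. It assumes one common convention, length-normalized read counts scaled to sum to 1; the article does not state which normalization the authors used, and the function name and example vOTU identifiers are hypothetical.

```python
from typing import Dict


def relative_abundance(
    read_counts: Dict[str, int], contig_lengths: Dict[str, int]
) -> Dict[str, float]:
    """Length-normalized relative abundance per vOTU (values sum to 1).

    Assumption: abundance is proportional to mapped reads per base, so a
    long contig is not favored simply because it attracts more reads.
    """
    density = {}
    for votu, count in read_counts.items():
        length = contig_lengths[votu]
        if length <= 0:
            raise ValueError(f"contig length for {votu} must be positive")
        density[votu] = count / length

    total = sum(density.values())
    if total == 0:
        # No reads mapped anywhere: report zero abundance for every vOTU.
        return {votu: 0.0 for votu in density}
    return {votu: value / total for votu, value in density.items()}


if __name__ == "__main__":
    counts = {"vOTU_a": 300, "vOTU_b": 100}
    lengths = {"vOTU_a": 30_000, "vOTU_b": 10_000}
    print(relative_abundance(counts, lengths))
```

In this toy example both vOTUs have the same read density, so each receives half of the total abundance even though one has three times as many mapped reads.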

Characterizing the Infant Gut Virome

The authors identified over 10,000 unique viral species belonging to 248 viral family-level clades. Surprisingly, over 90% of the viral families were previously unidentified, and these new families comprised over 7,000 unique vOTUs. The viruses identified included both bacteriophages and eukaryotic viruses, which can infect a range of host organisms. Interestingly, the study found that viral diversity in the infant gut increases from birth to six months of age, and then plateaus between six and twelve months.
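For readers less familiar with how "diversity" is quantified, the sketch below computes two widely used measures from a sample's vOTU abundances: richness (how many vOTUs are present) and the Shannon index (which also accounts for evenness). The article does not specify which metric the authors used, so treat this as a generic example with hypothetical function names.

```python
import math
from typing import Sequence


def richness(abundances: Sequence[float]) -> int:
    """Number of vOTUs with non-zero abundance in a sample."""
    return sum(1 for a in abundances if a > 0)


def shannon_index(abundances: Sequence[float]) -> float:
    """Shannon diversity H = -sum(p * ln p) over vOTUs present in a sample."""
    positive = [a for a in abundances if a > 0]
    total = sum(positive)
    if total == 0:
        return 0.0
    return -sum((a / total) * math.log(a / total) for a in positive)


if __name__ == "__main__":
    even_sample = [1, 1, 1, 1]        # four equally abundant vOTUs
    skewed_sample = [97, 1, 1, 1]     # same richness, one dominant vOTU
    print(richness(even_sample), shannon_index(even_sample))
    print(richness(skewed_sample), shannon_index(skewed_sample))
```

Both toy samples contain four vOTUs, but the evenly distributed one has the higher Shannon index, which is why the two measures are often reported together.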

The identification of such a diverse viral community in the healthy infant gut suggests that viruses may play a more significant role in the gut microbiota than previously anticipated. By expanding our picture of viral diversity in the healthy infant gut, the study underscores the need for further research into how the gut microbiota interacts with its viral component and how those interactions affect human health.

Outsourcing Bioinformatics Analysis: How Bridge Informatics Can Help

Studies like these are made possible by technological advances that make biological data generation, storage and analysis faster and more accessible than ever before. From pipeline development and software engineering to deploying existing bioinformatics tools, Bridge Informatics can help you at every step of your research journey.
As experts across data types from leading sequencing platforms, we can help you tackle the challenging computational tasks of storing, analyzing and interpreting genomic and transcriptomic data. Bridge Informatics’ bioinformaticians are trained bench biologists, so they understand the biological questions driving your computational analysis. Click here to schedule a free introductory call with a member of our team.

Haider M. Hassan, Data Scientist, Bridge Informatics

Haider is one of our premier data scientists. He provides bioinformatic services to clients, including high throughput sequencing, data pre-processing, analysis, and custom pipeline development. Drawing on his rich experience with a variety of high-throughput sequencing technologies, Haider analyzes transcriptional (spatial and single-cell), epigenetic, and genetic landscapes.

Before joining Bridge Informatics, Haider was a Postdoctoral Associate at the London Regional Cancer Centre in Ontario, Canada. During his postdoc, he investigated the epigenetics of late-onset liver cancer using murine and human models. Haider holds a Ph.D. in biochemistry from Western University, where he studied the molecular mechanisms behind oncogenesis. Haider still lives in Ontario and enjoys spending his spare time visiting local parks. If you’re interested in reaching out, please email or
