Chapter 3 Genomics

3.1 **Scientific Programming: Bioinformatics & Computational Biology**

3.1.1 Genomics

National Center for Biotechnology Information (NCBI)

The Bacterial and Viral Bioinformatics Resource Center (BV-BRC)

European Molecular Biology Laboratory (EMBL) European Bioinformatics Institute (EBI)

QIAGEN’s Knowledge Hub, Bench Guide, and Digital Insights

Swiss Institute of Bioinformatics (SIB), through which Geert van Geest introduced me to the SIB’s AWS-Docker for getting RStudio Server, Jupyter and VSCode running on an AWS EC2 instance using Docker

Thermo Fisher’s Learning Centers and Education Connect


The user manuals for the k-mer trees and SNP trees are relatively straightforward **WHEN USING WORKFLOWS**. Nonetheless, their visualization could use improvement; I naturally turn to **python** for bioinformatics and **R** for visualization.
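To make the idea behind a k-mer tree concrete, here is a minimal Python sketch that compares sequences by the k-mers they share. It is an illustration only: the Jaccard distance, the function names, and the toy sequences are my assumptions, not the method documented in the user manuals mentioned above. A distance table like this is the kind of input a tree-building step would consume.

```python
from itertools import combinations


def kmer_set(seq, k=3):
    """Return the set of all overlapping k-mers in a sequence."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def jaccard_distance(a, b):
    """Jaccard distance between two k-mer sets: 1 - |A & B| / |A | B|."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


def distance_matrix(seqs, k=3):
    """Pairwise Jaccard distances keyed by (name_1, name_2), names sorted."""
    profiles = {name: kmer_set(s, k) for name, s in seqs.items()}
    return {
        (x, y): jaccard_distance(profiles[x], profiles[y])
        for x, y in combinations(sorted(profiles), 2)
    }


# Toy sequences, invented for illustration only.
toy = {
    "genome_a": "ACGTACGTAC",
    "genome_b": "ACGTACGTTC",
    "genome_c": "TTTTGGGGCC",
}
dists = distance_matrix(toy, k=3)
for pair, d in dists.items():
    print(pair, round(d, 3))
```

Real genomes would call for a much larger k and a memory-efficient k-mer counter; the sets above are only practical for toy input.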

3.1.2 Metagenomics

16S rRNA gene sequencing with Illumina, which feeds into either the current gold-standard open-source (python) tool QIIME2 by Bolyen *et al.* 2019, or the superseded (C++) gold-standard tool, Mothur by Schloss *et al.* 2009, which has a 16S rRNA gene sequencing tutorial

16S rRNA gene sequencing with CLC, with an associated white paper. The CLC workflow for 16S follows an amplicon-based OTU clustering workflow with these steps: read trimming with their ‘clc_quality_trim’ program, but **I would rather use** trimmomatic by Bolger, Lohse & Usadel, 2014; filtering samples based on the number of reads; *de novo* or reference-based OTU clustering; removal of low abundance OTUs; and OTU abundance analysis, **but I prefer R for this.**
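Two of the steps above, filtering samples by read count and removing low abundance OTUs, are easy to sketch in plain Python. This is not the CLC implementation: the table layout (sample to OTU counts), the function names, and the thresholds are assumptions chosen for illustration.

```python
def filter_samples(otu_table, min_reads):
    """Keep samples whose total read count is at least min_reads."""
    return {
        sample: counts
        for sample, counts in otu_table.items()
        if sum(counts.values()) >= min_reads
    }


def drop_low_abundance_otus(otu_table, min_total):
    """Remove OTUs whose combined count across all samples is below min_total."""
    totals = {}
    for counts in otu_table.values():
        for otu, n in counts.items():
            totals[otu] = totals.get(otu, 0) + n
    keep = {otu for otu, n in totals.items() if n >= min_total}
    return {
        sample: {otu: n for otu, n in counts.items() if otu in keep}
        for sample, counts in otu_table.items()
    }


# Toy table: sample -> {OTU id -> read count}; all values are invented.
table = {
    "s1": {"OTU1": 120, "OTU2": 30, "OTU3": 1},
    "s2": {"OTU1": 80, "OTU2": 55, "OTU3": 0},
    "s3": {"OTU1": 3, "OTU2": 1, "OTU3": 0},
}

# Hypothetical thresholds: at least 100 reads per sample, 10 reads per OTU.
kept_samples = filter_samples(table, min_reads=100)
filtered = drop_low_abundance_otus(kept_samples, min_total=10)
print(filtered)
```

Filtering samples first matters here: OTU totals are computed only over the samples that survive, so a shallow sample cannot rescue an otherwise rare OTU.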

OTU nucleotide alignment with MUSCLE by Edgar, 2004 is used to generate a maximum likelihood phylogenetic tree, the input for the alpha- and beta-diversity workflow, **but I prefer vegan for this.**
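For readers who want to see what alpha- and beta-diversity boil down to, here is a small Python sketch of one common measure of each: the Shannon index and Bray-Curtis dissimilarity. Neither uses the phylogenetic tree, and the choice of these two measures is my assumption rather than what the CLC workflow necessarily reports. The function names and count vectors are invented.

```python
import math


def shannon(counts):
    """Shannon index H = -sum(p_i * ln(p_i)) over non-zero counts."""
    total = sum(counts)
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)


def bray_curtis(u, v):
    """Bray-Curtis dissimilarity: sum(|u_i - v_i|) / sum(u_i + v_i)."""
    if len(u) != len(v):
        raise ValueError("count vectors must have the same length")
    denom = sum(u) + sum(v)
    if denom == 0:
        return 0.0
    return sum(abs(a - b) for a, b in zip(u, v)) / denom


# Toy OTU count vectors (same OTU order in both samples); values are invented.
sample_a = [10, 10, 0]
sample_b = [0, 10, 10]

alpha_a = shannon(sample_a)               # within-sample (alpha) diversity
beta_ab = bray_curtis(sample_a, sample_b)  # between-sample (beta) diversity
print(alpha_a, beta_ab)
```

In R, vegan's `diversity()` and `vegdist()` cover the same ground and default to the Shannon index and Bray-Curtis respectively, which is why I would normally reach for them instead.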

The microbial […], PICRUSt2, and the Integrative Human Microbiome Project (iHMP)

Illumina SGS

In CLC, whole metagenome shotgun sequencing functional analysis starts with the user assembling a metagenome *de novo*, followed by annotation of the coding sequence (CDS) track with BLAST, Pfam domains, and/or Gene Ontology (GO). Then you map the original reads back to the annotated contigs using ‘Map Reads to Reference’ in the Build Functional Profile tool. The resulting output can be visualized using stacked bar charts and sunburst plots in Visualization of the OTU abundance table, **but as you guessed, I prefer R for this.** As you might expect, **I might use** the open-source Linux (python) tool PICRUSt2 by Douglas *et al.* 2020 to do this too.
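The "map reads back to annotated contigs" step amounts to summing read counts per annotation term. Here is a rough Python sketch of that bookkeeping, assuming you already have a per-contig read count and a per-contig list of terms. The data layout, the function names, the contig names, and the rule that a multi-term contig counts fully toward each term are all my assumptions, not the behaviour of the Build Functional Profile tool. The GO identifiers are used only as opaque labels.

```python
from collections import Counter


def functional_profile(reads_per_contig, contig_terms):
    """Sum mapped-read counts per annotation term.

    A contig carrying several terms contributes its full read count to each.
    Contigs without any annotation are tallied under "unannotated".
    """
    profile = Counter()
    for contig, n_reads in reads_per_contig.items():
        terms = contig_terms.get(contig) or ["unannotated"]
        for term in terms:
            profile[term] += n_reads
    return profile


def relative_abundance(profile):
    """Scale a profile so its values sum to 1 (empty dict if no reads)."""
    total = sum(profile.values())
    if total == 0:
        return {}
    return {term: n / total for term, n in profile.items()}


# Toy inputs, invented for illustration.
reads = {"contig_1": 40, "contig_2": 10, "contig_3": 50}
annotations = {
    "contig_1": ["GO:0008152"],
    "contig_2": ["GO:0008152", "GO:0006412"],
}

profile = functional_profile(reads, annotations)
fractions = relative_abundance(profile)
print(dict(profile))
```

The resulting term-by-count table is the shape of data that a stacked bar chart or sunburst plot would be drawn from, whether in CLC or in R.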