Chapter 3 Genomics
3.1 Scientific Programming: Bioinformatics & Computational Biology
3.1.1 Genomics
- [National Center for Biotechnology Information (NCBI)](https://ncbiinsights.ncbi.nlm.nih.gov/)
- [Bacterial and Viral Bioinformatics Resource Center (BV-BRC)](https://www.bv-brc.org/)
- [European Molecular Biology Laboratory (EMBL)](https://www.embl.org/) and its [European Bioinformatics Institute (EBI)](https://www.ebi.ac.uk/research)
- [QIAGEN’s Knowledge Hub](https://www.qiagen.com/us/knowledge-and-support/knowledge-hub), [Bench Guide](https://www.qiagen.com/us/knowledge-and-support/knowledge-hub/bench-guide), and [Digital Insights](https://digitalinsights.qiagen.com/)
- [Swiss Institute of Bioinformatics (SIB)](https://www.sib.swiss/), where [Geert van Geest](https://github.com/GeertvanGeest) introduced me to the SIB’s [AWS-Docker](https://github.com/sib-swiss/AWS-docker) repository for running RStudio Server, Jupyter, and VS Code on an AWS EC2 instance using Docker
- [Thermo Fisher’s Learning Centers](https://www.thermofisher.com/us/en/home/technical-resources/learning-centers.html) and [Education Connect](https://www.thermofisher.com/us/en/home/digital-science/thermo-fisher-connect.html)
- [Illumina](https://www.illumina.com/science/education.html)
The user manuals for [k-mer trees](https://resources.qiagenbioinformatics.com/manuals/clcmgm/300/index.php?manual=Create_K_mer_Tree.html) and [SNP trees](https://resources.qiagenbioinformatics.com/manuals/clcmgm/300/index.php?manual=Create_SNP_Tree.html) are relatively straightforward **WHEN USING WORKFLOWS**. Nonetheless, the resulting tree visualizations could use improvement; I naturally turn to **Python** for bioinformatics and **R** for visualization.
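A k-mer tree is built from pairwise distances between samples, so a minimal sketch of that distance step may help. It assumes each sample is a single sequence string and uses the Jaccard distance between k-mer sets; CLC's own tool may use a different k-mer distance, and the function names here are hypothetical.

```python
from itertools import combinations


def kmer_set(seq: str, k: int) -> set[str]:
    """Return the set of all length-k substrings (k-mers) of seq."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def jaccard_distance(a: set[str], b: set[str]) -> float:
    """1 - |A & B| / |A | B|; 0.0 means identical k-mer content."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


def kmer_distance_matrix(seqs: dict[str, str], k: int = 4) -> dict[tuple[str, str], float]:
    """Pairwise Jaccard distances between each sample's k-mer set."""
    kmers = {name: kmer_set(s, k) for name, s in seqs.items()}
    return {
        (x, y): jaccard_distance(kmers[x], kmers[y])
        for x, y in combinations(sorted(seqs), 2)
    }


if __name__ == "__main__":
    # Hypothetical toy sequences, one per sample.
    toy = {"A": "ACGTACGTGG", "B": "ACGTACGTGC", "C": "TTTTGGGGCC"}
    for pair, d in kmer_distance_matrix(toy, k=3).items():
        print(pair, round(d, 3))
```

A distance matrix like this is the kind of input a clustering method such as UPGMA or neighbor joining would turn into a tree, which can then be drawn in R.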
3.1.2 Metagenomics
16S rRNA gene sequencing with [Illumina](https://www.illumina.com/areas-of-interest/microbiology/microbial-sequencing-methods/16s-rrna-sequencing.html) feeds into either the current gold-standard open-source (Python) tool [QIIME2](https://qiime2.org/) by [Bolyen *et al.* 2019](https://www.nature.com/articles/s41587-019-0209-9), or the superseded (C++) gold-standard tool [Mothur](https://github.com/mothur/mothur) by [Schloss *et al.* 2009](https://journals.asm.org/doi/10.1128/AEM.01541-09), which has a [16S rRNA gene sequencing tutorial](https://training.galaxyproject.org/archive/2021-10-01/topics/metagenomics/tutorials/mothur-miseq-sop/tutorial.html) on the Galaxy Training Network.
16S rRNA gene sequencing with [CLC](https://resources.qiagenbioinformatics.com/manuals/clcmgm/300/index.php?manual=Introduction_Metagenomics.html) has an associated [white paper](https://digitalinsights.qiagen.com/wp-content/uploads/2016/05/Characterizing-the-Microbiome-through-Targeted-Sequencing-of-Bacterial-16S-rRNA-and-Fungal-ITS-Regions_White-Paper_QIAGEN-Bioinformatics_0518_ww.pdf). The CLC workflow for 16S follows an [amplicon-based OTU clustering workflow](https://resources.qiagenbioinformatics.com/manuals/clcmgm/300/index.php?manual=Amplicon_based_OTU_clustering.html) consisting of read trimming with their [‘clc_quality_trim’ program](https://resources.qiagenbioinformatics.com/manuals/clcassemblycell/400/index.php?manual=Quality_trimming.html) (**I would rather use** [Trimmomatic](http://www.usadellab.org/cms/?page=trimmomatic) by [Bolger, Lohse & Usadel, 2014](https://pubmed.ncbi.nlm.nih.gov/24695404/)); [filtering samples based on the number of reads](https://resources.qiagenbioinformatics.com/manuals/clcmgm/300/index.php?manual=Filter_Samples_Based_on_Number_Reads.html); *de novo* or reference-based [OTU clustering](https://resources.qiagenbioinformatics.com/manuals/clcmgm/300/index.php?manual=OTU_clustering_parameters.html); [removal of low-abundance OTUs](https://resources.qiagenbioinformatics.com/manuals/clcmgm/300/index.php?manual=Remove_OTUs_with_Low_Abundance.html); and [OTU abundance analysis](https://resources.qiagenbioinformatics.com/manuals/clcmgm/300/index.php?manual=Abundance_analysis.html) (**but I prefer R for this**).
Next, [OTU nucleotide alignment with MUSCLE](https://resources.qiagenbioinformatics.com/manuals/clcmgm/300/index.php?manual=Align_OTUs_with_MUSCLE.html) by [Edgar, 2004](https://academic.oup.com/nar/article/32/5/1792/2380623?login=true) is used to generate a [maximum likelihood phylogenetic tree](http://resources.qiagenbioinformatics.com/manuals/clcgenomicsworkbench/current/index.php?manual), which serves as input for the [alpha- and beta-diversity workflow](https://resources.qiagenbioinformatics.com/manuals/clcmgm/300/index.php?manual=Estimate_Alpha_Beta_Diversities_workflow.html) (**but I prefer [vegan](https://vegandevs.github.io/vegan/index.html) for this**).
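Several of the steps above are simple enough to sketch in plain Python. The following is a minimal illustration, assuming an OTU table stored as a dict mapping each sample to `{OTU id: read count}`: a simplified sliding-window trim in the spirit of Trimmomatic's `SLIDINGWINDOW` step (not its exact rule), sample filtering by total reads, removal of low-abundance OTUs, the Shannon index for alpha diversity, and Bray-Curtis dissimilarity for beta diversity. All function names and thresholds are hypothetical choices, not CLC's parameters; in R, vegan's `diversity()` (Shannon by default) and `vegdist()` (Bray-Curtis by default) cover the last two.

```python
import math

OtuTable = dict[str, dict[str, int]]  # sample -> {OTU id -> read count}


def sliding_window_cut(quals: list[int], window: int = 4, min_avg: float = 20.0) -> int:
    """Return the number of 5' bases to keep for one read.

    Scans windows from the 5' end and cuts at the start of the first window
    whose mean Phred quality falls below min_avg. Simplified: this is not
    Trimmomatic's exact SLIDINGWINDOW rule.
    """
    for start in range(len(quals) - window + 1):
        if sum(quals[start:start + window]) / window < min_avg:
            return start
    return len(quals)  # no failing window (or read shorter than window)


def filter_samples(table: OtuTable, min_reads: int) -> OtuTable:
    """Drop samples whose total read count is below min_reads."""
    return {s: dict(c) for s, c in table.items() if sum(c.values()) >= min_reads}


def remove_low_abundance_otus(table: OtuTable, min_total: int) -> OtuTable:
    """Drop OTUs whose count summed over all remaining samples is below min_total."""
    totals: dict[str, int] = {}
    for counts in table.values():
        for otu, n in counts.items():
            totals[otu] = totals.get(otu, 0) + n
    keep = {otu for otu, n in totals.items() if n >= min_total}
    return {s: {o: n for o, n in c.items() if o in keep} for s, c in table.items()}


def shannon(counts: dict[str, int]) -> float:
    """Alpha diversity: Shannon index H' = -sum(p_i * ln p_i)."""
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return -sum((n / total) * math.log(n / total) for n in counts.values() if n > 0)


def bray_curtis(a: dict[str, int], b: dict[str, int]) -> float:
    """Beta diversity: Bray-Curtis dissimilarity sum|a_i - b_i| / sum(a_i + b_i)."""
    otus = set(a) | set(b)
    num = sum(abs(a.get(o, 0) - b.get(o, 0)) for o in otus)
    den = sum(a.get(o, 0) + b.get(o, 0) for o in otus)
    return num / den if den else 0.0


if __name__ == "__main__":
    # Hypothetical toy OTU table; S3 is too shallow to keep.
    table = {
        "S1": {"OTU1": 50, "OTU2": 30, "OTU3": 1},
        "S2": {"OTU1": 10, "OTU2": 60},
        "S3": {"OTU1": 2},
    }
    table = filter_samples(table, min_reads=20)
    table = remove_low_abundance_otus(table, min_total=5)
    print(table)
    print({s: round(shannon(c), 3) for s, c in table.items()})
    print(round(bray_curtis(table["S1"], table["S2"]), 3))
```

The order of the two filtering steps matters: OTU totals are computed only over the samples that survive sample filtering, so swapping the steps can change which OTUs are kept.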
The microbial , [PICRUSt2](https://github.com/picrust/picrust2), and the [Integrative Human Microbiome Project (iHMP)](https://portal.hmpdacc.org/)
[Illumina shotgun sequencing (SGS)](https://www.illumina.com/areas-of-interest/microbiology/microbial-sequencing-methods/shotgun-metagenomic-sequencing.html)
In CLC, [whole metagenome shotgun sequencing functional analysis](https://resources.qiagenbioinformatics.com/manuals/clcmgm/300/index.php?manual=Functional_analysis.html) starts with the user [*de novo* assembling a metagenome](https://resources.qiagenbioinformatics.com/manuals/clcmgm/300/index.php?manual=De_Novo_Assemble_Metagenome.html#sec:de_novo_assemble_metagenome), followed by annotating the coding sequence (CDS) track with [BLAST](https://resources.qiagenbioinformatics.com/manuals/clcmgm/300/index.php?manual=Annotate_CDS_with_Best_BLAST_Hit.html#sec:annotate_cds_with_blast), [Pfam domains](https://resources.qiagenbioinformatics.com/manuals/clcmgm/300/index.php?manual=Annotate_CDS_with_Pfam_Domains.html#sec:annotate_cds_with_pfam), and/or [Gene Ontology (GO)](https://resources.qiagenbioinformatics.com/manuals/clcmgm/300/index.php?manual=Download_GO_Database.html#sec:download_go) terms. You then map the original reads back to the annotated contigs using ‘Map Reads to Reference’ in the [Build Functional Profile](https://resources.qiagenbioinformatics.com/manuals/clcmgm/300/index.php?manual=Build_Functional_Profile.html#sec:functional_profile) tool. The resulting output can be visualized with stacked bar charts and sunburst plots, as described in [Visualization of the OTU abundance table](https://resources.qiagenbioinformatics.com/manuals/clcmgm/300/index.php?manual=Visualization_OTU_abundance_tables.html#sec:visualizationotu), **but as you guessed, I prefer R for this.** Unsurprisingly, **I might use** the open-source, Linux-based (Python) tool [PICRUSt2](https://github.com/picrust/picrust2) by [Douglas *et al.* 2020](https://www.nature.com/articles/s41587-020-0548-6) to do this too.
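At its core, a functional profile is a tally of mapped reads per annotation term. Here is a minimal sketch, assuming you already have read counts per CDS (for example, exported after mapping) and a hypothetical mapping from each CDS to its Pfam or GO terms; it is an illustration of the idea, not the CLC tool's implementation, and the term names are placeholders.

```python
from collections import defaultdict


def functional_profile(
    reads_per_cds: dict[str, int],
    cds_annotations: dict[str, list[str]],
) -> dict[str, int]:
    """Sum mapped-read counts per functional term (e.g. Pfam or GO IDs).

    A CDS annotated with several terms contributes its full count to each
    term; CDSs with no annotation are pooled under 'unannotated'.
    """
    profile: dict[str, int] = defaultdict(int)
    for cds, n_reads in reads_per_cds.items():
        terms = cds_annotations.get(cds) or ["unannotated"]
        for term in terms:
            profile[term] += n_reads
    return dict(profile)


def relative_abundance(profile: dict[str, int]) -> dict[str, float]:
    """Fraction of term assignments per term (the heights in a stacked bar)."""
    total = sum(profile.values())
    return {t: n / total for t, n in profile.items()} if total else {}


if __name__ == "__main__":
    # Hypothetical counts and annotations for one sample.
    reads_per_cds = {"cds1": 120, "cds2": 30, "cds3": 50}
    cds_annotations = {"cds1": ["domain_A"], "cds2": ["domain_A", "domain_B"]}
    profile = functional_profile(reads_per_cds, cds_annotations)
    print(profile)  # cds3 has no annotation, so its reads go to "unannotated"
    print(relative_abundance(profile))
```

Counting a multi-annotated CDS toward every one of its terms is only one convention; splitting its reads fractionally across terms is another, and the choice changes the resulting proportions.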