Microbial community analysis by single-amplicon high-throughput next generation sequencing: data analysis–from raw output to ecology

Abstract

Quantifying the functional and taxonomic diversity of microbial assemblages is essential to understanding almost all aspects of microbial ecology. In recent years, the advent of Next Generation Sequencing (NGS) technology has accelerated this process. It is now common practice to target and amplify phylogenetic and/or functional marker genes, and to use NGS approaches to characterise their diversity across multiple samples. However, all NGS approaches carry inherent methodological biases, and produce erroneous sequences and noise alongside the high-quality data that researchers require. Careful bioinformatic analysis of NGS data is therefore needed to quality-filter and process sequences, so that artifactual results do not lead to misleading inferences. Similar care must be given to any downstream statistical analysis, as an inappropriate choice of approach can also produce false conclusions. These analytical steps and considerations can appear daunting to the uninitiated and may be perceived as a hurdle to completing research. In this chapter, we aim to provide the methods and guidance needed to overcome this hurdle, and to give a novice bioinformatician the skills to produce a basic analysis of their NGS amplicon data. We focus on data produced using two common NGS technologies: the historically more widely used 454 pyrosequencer and the currently more widely used Illumina platform. We cover methods for quality filtering and denoising data, picking Operational Taxonomic Units (OTUs), assigning taxonomy to sequences, and the basic statistical analyses required for hypothesis testing. Implementation of these methods is demonstrated for two commonly used pipelines (QIIME and mothur) and for additional stand-alone packages (including R), providing the reader with maximum flexibility when analysing their data.

Publication
In Hydrocarbon and Lipid Microbiology Protocols.
Date