An Overview of the Application of Deep Learning in Short-Read Sequence Classification

Kristaps Bebris; Inese Polaka

doi:10.7250/itms-2020-0005

Abstract

Advances in sequencing technology have led to an ever-increasing amount of available short-read sequencing data. This has, in turn, increased the need for efficient and precise classification tools for analysing these data. Recent years have shown that heuristic-based approaches can achieve large gains in performance, and alongside these improvements there has been growing interest in applying deep learning techniques to this classification task. We study these deep learning approaches and evaluate their performance in a reproducible fashion to get a better perspective on the current state of deep-learning-based methods for the classification of short-read sequencing data.

Keywords:

Bioinformatics; Computational Biology; Machine Learning

This work is licensed under a Creative Commons Attribution 4.0 International License.

Information Technology and Management Science
