Nanopore sequencing and assembly of a human genome with ultra-long reads

Miten Jain, Sergey Koren, Karen H. Miga, Josh Quick, Arthur C. Rand, Thomas A. Sasani, John R. Tyson, Andrew D. Beggs, Alexander T. Dilthey, Ian T. Fiddes, Sunir Malla, Hannah Marriott, Tom Nieto, Justin O'Grady, Hugh E. Olsen, Brent S. Pedersen, Arang Rhie, Hollian Richardson, Aaron R. Quinlan, Terrance P. Snutch, Louise Tee, Benedict Paten, Adam M. Phillippy, Jared T. Simpson, Nicholas J. Loman*, Matthew Loose

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

662 Citations (Scopus)

Abstract

We report the sequencing and assembly of a reference genome for the human GM12878 Utah/Ceph cell line using the MinION (Oxford Nanopore Technologies) nanopore sequencer. 91.2 Gb of sequence data, representing ∼30× theoretical coverage, were produced. Reference-based alignment enabled detection of large structural variants and epigenetic modifications. De novo assembly of nanopore reads alone yielded a contiguous assembly (NG50 ∼3 Mb). We developed a protocol to generate ultra-long reads (N50 > 100 kb, read lengths up to 882 kb). Incorporating an additional 5× coverage of these ultra-long reads more than doubled the assembly contiguity (NG50 ∼6.4 Mb). The final assembled genome was 2,867 million bases in size, covering 85.8% of the reference. Assembly accuracy, after incorporating complementary short-read sequencing data, exceeded 99.8%. Ultra-long reads enabled assembly and phasing of the 4-Mb major histocompatibility complex (MHC) locus in its entirety, measurement of telomere repeat length, and closure of gaps in the reference human genome assembly GRCh38.
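The contiguity and coverage figures quoted in the abstract (N50, NG50, "∼30× theoretical coverage") follow standard definitions, which the short sketch below illustrates. It is not the authors' pipeline: the function names, the toy contig lengths, and the assumed haploid genome size of 3.1 Gb are illustrative assumptions that do not come from the record itself.

```python
def n50(lengths, total=None):
    """Return the length L such that sequences of length >= L
    together cover at least half of `total`.

    With total=None the sum of `lengths` is used (classic N50).
    Passing an external genome size instead gives NG50.
    """
    target = (sum(lengths) if total is None else total) / 2
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running >= target:
            return length
    return 0  # the sequences never reach half of `total`


def ng50(lengths, genome_size):
    """NG50: like N50, but measured against an assumed genome size."""
    return n50(lengths, total=genome_size)


def theoretical_coverage(total_bases, genome_size):
    """Average depth if the bases were spread evenly over the genome."""
    return total_bases / genome_size


# Toy contig lengths (hypothetical, arbitrary units).
contigs = [50, 40, 30, 20, 10]
print("N50 :", n50(contigs))
print("NG50:", ng50(contigs, genome_size=200))

# 91.2 Gb of sequence over an ASSUMED 3.1 Gb genome.
print("coverage:", round(theoretical_coverage(91.2e9, 3.1e9), 1))
```

NG50 is usually the more comparable figure between assemblies, because it is normalised to the expected genome size rather than to whatever total length a particular assembly happens to contain.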

Original language: English
Pages (from-to): 338-345
Number of pages: 8
Journal: Nature Biotechnology
Volume: 36
Issue number: 4
DOIs
Publication status: Published - 1 Apr 2018
Externally published: Yes

Bibliographical note

Funding Information:
We acknowledge the support of Oxford Nanopore Technologies staff in generating this data set, in particular R. Dokos, O. Hartwell, J. Pugh, and C. Brown. We thank M. Akeson for his support and insight. We thank R. Poplawski and S. Thompson for technical assistance with configuring and using cloud-based file systems with millions of files on CLIMB. We thank W. Timp and R. Workman for generating the R9.4 methylation training data for nanopolish. We thank T. Allers for assistance with PFGE. We thank A. Pizarro at Amazon Web Services for hosting the human genome data set as an Amazon Web Services Open Data set. This study utilized the computational resources of the NIH HPC Biowulf cluster (https://hpc.nih.gov). This study was partially supported by the UK Antimicrobial Resistance Cross Council Initiative (JOG: MR/N013956/1), Rosetrees Trust (JOG: A749), the Intramural Research Program of the National Human Genome Research Institute, National Institutes of Health (S.K., A.D., A.R., A.M.P.), the BBSRC (M.L.: BB/N017099/1 and BB/M020061/1), the Canadian Institutes of Health Research (J.R.T., T.P.S.: #10677), Brain Canada Multi-Investigator Research Initiative Grant with Genome British Columbia, the Michael Smith Foundation for Health Research, and the Koerner Foundation (J.R.T., T.P.S.), the Canada Research Chair in Biotechnology and Genomics-Neurobiology (T.P.S.), the Ontario Institute for Cancer Research and the Government of Canada (J.T.S.: OGI-129), the US National Cancer Institute (A.R.Q., B.S.P., T.A.S.: NIH U24CA209999), the Wellcome Trust (A.D.B.: 102732/Z/13/Z, M.L.: 204843/Z/16/Z), Cancer Research UK (A.D.B.: A23923), the MRC (A.D.B.: MR/M016587/1), the MRC Fellowship in Microbial Bioinformatics as part of CLIMB (N.J.L.) and the NIHR Surgical Reconstruction and Microbiology Research Centre (SRMRC).

Publisher Copyright:
© 2018 Nature Publishing Group. All rights reserved.
