We report the sequencing and assembly of a reference genome for the human GM12878 Utah/Ceph cell line using the MinION (Oxford Nanopore Technologies) nanopore sequencer. 91.2 Gb of sequence data, representing ∼30× theoretical coverage, were produced. Reference-based alignment enabled detection of large structural variants and epigenetic modifications. De novo assembly of nanopore reads alone yielded a contiguous assembly (NG50 ∼3 Mb). We developed a protocol to generate ultra-long reads (N50 > 100 kb, read lengths up to 882 kb). Incorporating an additional 5× coverage of these ultra-long reads more than doubled the assembly contiguity (NG50 ∼6.4 Mb). The final assembled genome was 2,867 million bases in size, covering 85.8% of the reference. Assembly accuracy, after incorporating complementary short-read sequencing data, exceeded 99.8%. Ultra-long reads enabled assembly and phasing of the 4-Mb major histocompatibility complex (MHC) locus in its entirety, measurement of telomere repeat length, and closure of gaps in the reference human genome assembly GRCh38.
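The abstract quotes two kinds of summary statistic: theoretical coverage (sequenced bases divided by genome size) and length-weighted contiguity measures (read N50, assembly NG50). The following is a minimal Python sketch of the standard definitions. The contig lengths are toy values, and the ~3.1 Gb haploid genome size used in the coverage check is an assumption that does not come from the abstract. The function names `nx` and `theoretical_coverage` are hypothetical helpers, not part of any pipeline used in the study.

```python
# Sketch of standard contiguity and coverage statistics. Contig lengths and
# the genome size below are illustrative assumptions, not data from the study.

def nx(lengths, fraction=0.5, total=None):
    """Return length L such that items of length >= L cover `fraction` of `total`.

    total=None        -> N50 (denominator is the summed lengths themselves).
    total=genome_size -> NG50 (denominator is the reference genome size).
    Returns None if the lengths never reach the target, which can happen
    for NG50 when the assembly is much smaller than the genome.
    """
    if total is None:
        total = sum(lengths)
    target = fraction * total
    running = 0
    # Walk from the longest item down, accumulating length until the target is met.
    for length in sorted(lengths, reverse=True):
        running += length
        if running >= target:
            return length
    return None


def theoretical_coverage(total_bases, genome_size):
    """Mean depth if all sequenced bases were spread evenly over the genome."""
    return total_bases / genome_size


if __name__ == "__main__":
    contigs = [5, 4, 3, 2, 1]  # hypothetical contig lengths (e.g. in Mb)
    print(nx(contigs))            # N50 -> 4
    print(nx(contigs, total=20))  # NG50 against an assumed 20-unit genome -> 3
    # 91.2 Gb of reads over an assumed ~3.1 Gb haploid genome
    print(round(theoretical_coverage(91.2e9, 3.1e9), 1))  # -> 29.4
```

Because NG50 fixes the denominator at the genome size rather than at the assembly's own total length, assemblies of different total size are compared on the same footing. The same `nx` function applied to read lengths gives the read N50 quoted for the ultra-long read protocol.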
Funding Information:
We acknowledge the support of Oxford Nanopore Technologies staff in generating this data set, in particular R. Dokos, O. Hartwell, J. Pugh, and C. Brown. We thank M. Akeson for his support and insight. We thank R. Poplawski and S. Thompson for technical assistance with configuring and using cloud-based file systems with millions of files on CLIMB. We thank W. Timp and R. Workman for generating the R9.4 methylation training data for nanopolish. We thank T. Allers for assistance with PFGE. We thank A. Pizarro at Amazon Web Services for hosting the human genome data set as an Amazon Web Services Open Data set. This study utilized the computational resources of the NIH HPC Biowulf cluster (https://hpc.nih.gov). This study was partially supported by the UK Antimicrobial Resistance Cross Council Initiative (JOG: MR/N013956/1), Rosetrees Trust (JOG: A749), the Intramural Research Program of the National Human Genome Research Institute, National Institutes of Health (S.K., A.D., A.R., A.M.P.), the BBSRC (M.L.: BB/N017099/1 and BB/M020061/1), the Canadian Institutes of Health Research (J.R.T., T.P.S.: #10677), Brain Canada Multi-Investigator Research Initiative Grant with Genome British Columbia, the Michael Smith Foundation for Health Research, and the Koerner Foundation (J.R.T., T.P.S.), the Canada Research Chair in Biotechnology and Genomics-Neurobiology (T.P.S.), the Ontario Institute for Cancer Research and the Government of Canada (J.T.S.: OGI-129), the US National Cancer Institute (A.R.Q., B.S.P., T.A.S.: NIH U24CA209999), the Wellcome Trust (A.D.B.: 102732/Z/13/Z, M.L.: 204843/Z/16/Z), Cancer Research UK (A.D.B.: A23923), the MRC (A.D.B.: MR/M016587/1), the MRC Fellowship in Microbial Bioinformatics as part of CLIMB (N.J.L.), and the NIHR Surgical Reconstruction and Microbiology Research Centre (SRMRC).
© 2018 Nature Publishing Group. All rights reserved.