First Complete Sequence of a Human Genome

National Institutes of Health

The Human Genome Project, completed in 2003, covered about 92% of the total human genome sequence. The technologies to decipher the gaps that remained didn’t exist at the time. But scientists knew that the last 8% likely contained information important for fundamental biological processes.

Since then, researchers have developed better laboratory tools, computational methods, and strategic approaches. The final, complete human genome sequence was described in a set of six papers in the April 1, 2022, issue of Science. Companion papers were also published in several other journals.

The work was done by the Telomere to Telomere (T2T) consortium. T2T is led by researchers at NIH’s National Human Genome Research Institute (NHGRI), the University of California, Santa Cruz, and the University of Washington, Seattle. NHGRI was the primary funder.

“Short-read” technologies were originally used to sequence the human genome. These provide several hundred bases of DNA sequence at a time, which are then stitched together by computers. Such methods still leave some gaps in genome sequences. 
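
As a rough illustration of what "stitching" means, the toy Python sketch below merges reads wherever the end of one matches the start of another. It is a simplified example, not the software used by the Human Genome Project or the T2T consortium. The function names `overlap` and `stitch`, the `min_len` threshold, and the sample reads are all hypothetical. Real assemblers must also cope with sequencing errors, reads from both DNA strands, and long repeats.

```python
def overlap(a: str, b: str, min_len: int = 3) -> int:
    """Length of the longest suffix of `a` that equals a prefix of `b`.

    Returns 0 when the best match is shorter than `min_len`.
    """
    max_len = min(len(a), len(b))
    for n in range(max_len, min_len - 1, -1):
        if a.endswith(b[:n]):
            return n
    return 0


def stitch(reads: list[str], min_len: int = 3) -> list[str]:
    """Greedily merge the pair of reads with the largest overlap.

    Stops when no remaining pair overlaps by at least `min_len` letters.
    Each string left in the result is one contiguous stretch of sequence.
    """
    contigs = list(reads)
    while True:
        best_n, best_i, best_j = 0, -1, -1
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    n = overlap(a, b, min_len)
                    if n > best_n:
                        best_n, best_i, best_j = n, i, j
        if best_n == 0:
            return contigs
        # Join the two reads, keeping the shared letters only once.
        merged = contigs[best_i] + contigs[best_j][best_n:]
        contigs = [c for k, c in enumerate(contigs) if k not in (best_i, best_j)]
        contigs.append(merged)


# Three overlapping reads collapse into a single sequence.
complete = stitch(["ATGGCGT", "GCGTACC", "TACCTGA"])

# A read that overlaps nothing stays separate, leaving a gap between pieces.
with_gap = stitch(["ATGGCGT", "GCGTACC", "TTTAAAG"])
```

In this sketch, `complete` holds one stitched sequence, while `with_gap` holds two separate pieces because nothing links the third read to the others. Highly repetitive DNA causes a related problem: many reads look alike, so a program working only from short reads cannot tell where each one belongs.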

Over the past decade, two new DNA sequencing technologies emerged that can read longer sequences without compromising accuracy. The PacBio HiFi DNA sequencing method can read about 20,000 letters with nearly perfect accuracy. The Oxford Nanopore DNA sequencing method can read even more—up to 1 million DNA letters at a time—with modest accuracy. Both were used to generate the complete human genome sequence.

In total, the new project added nearly 200 million letters of the genetic code. This last 8% of the genome includes numerous genes as well as repetitive DNA sequences, which may influence how cells function. Most of the newly added sequences were in the centromeres, the dense middle sections of chromosomes, and near the repetitive ends of each chromosome.
