Twenty-one years ago, the Human Genome Project decoded the human genome for the first time, but some important regions remained undeciphered. An international research consortium has now filled in the gaps: for the first time, the DNA of all human chromosomes has been sequenced end to end. As a result, the genome sections at the centers and ends of chromosomes, which consist of countless repeats, can now be read. This opens up new insights into gene regulation and cell division, into the extent of variation in the human genome, and into the causes of disease.
The first sequencing of the human genome in 2001 was pioneering because it provided, for the first time, a reference for the nearly six billion base pairs and about 25,000 protein-coding genes in our genome. However, this first reference genome was not complete: it comprised only about 92 percent of the entire DNA sequence. In this version, several million positions in the genome are marked only with the letter “N” rather than with one of the abbreviations for the four DNA bases. These undeciphered regions lie mainly in the centromeres, the central constrictions of the chromosomes, which are essential for cell division. But undeciphered regions also remained at the ends of the chromosomes, the telomeres.
New sequencing technology closes the gaps
One reason for these gaps was the limitations of the sequencing techniques used at the time: they split the genome into countless pieces of DNA, each only about a hundred bases long, which then had to be reassembled in the correct order. But this is impossible when hundreds or thousands of such pieces are almost identical, and this is exactly the case in the centromeres and telomeres of the chromosomes. The regions of the genome there consist of a myriad of repetitive DNA sequences. Reconstructing these elements from short bits of DNA is like trying to put together a jigsaw puzzle from thousands of pieces of similar color: “It’s as if, for example, you only had bits of sky,” explains Winston Timp of Johns Hopkins University, a member of the Telomere-to-Telomere (T2T) Consortium.
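The jigsaw problem described above can be illustrated with a toy example. The following Python sketch is not the consortium's method; the sequences, the repeat unit, and the read lengths (6 and 36 bases) are made-up assumptions chosen only to keep the example small. It compares two invented "genomes" that differ only in how many copies of a repeat they contain, and it ignores sequencing errors and how often each read occurs.

```python
def reads(genome: str, length: int) -> set[str]:
    """Return every distinct substring ("read") of the given length."""
    return {genome[i:i + length] for i in range(len(genome) - length + 1)}


# Hypothetical toy sequences: identical flanks, different repeat counts.
left_flank = "TTGACCATGA"
right_flank = "GGATCCTAAG"
repeat_unit = "ACGT"

genome_a = left_flank + repeat_unit * 5 + right_flank  # 5 repeat copies
genome_b = left_flank + repeat_unit * 8 + right_flank  # 8 repeat copies

# Short reads (6 bases) never span the whole repeat block, so both
# genomes yield exactly the same set of distinct pieces.
short_reads_identical = reads(genome_a, 6) == reads(genome_b, 6)

# Long reads (36 bases) can cover the shorter repeat block together
# with both flanks, so the two genomes become distinguishable.
long_reads_identical = reads(genome_a, 36) == reads(genome_b, 36)

print(short_reads_identical)  # True
print(long_reads_identical)   # False
```

With short pieces alone, nothing in the data reveals whether the repeat occurs five or eight times; only reads long enough to span the repeat and reach the unique sequence on both sides settle the question.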
In the meantime, however, sequencing technology has advanced. Two new methods now allow the genome to be split into significantly longer sections. Oxford Nanopore sequencing can read stretches of DNA up to a million bases long, albeit with only moderate accuracy. The second system, from Pacific Biosciences, produces reads that are about 20,000 bases long but reads them with an accuracy of 99 percent. The T2T scientists have now combined both methods to completely decode the missing parts of the human genome for the first time. The genetic material for this came from a human cell line in which, conveniently, all of the genetic material comes from only one parent. This means that the two copies of each chromosome are identical, which makes sequencing easier.
New genes, new variants, and a first look at the centromere
The result of the T2T project is the first fully decoded human genome. Nearly 200 million bases that were previously missing, almost as many as there are in an entire chromosome, have now been decoded. Among them are 99 previously unknown protein-coding genes and nearly 2,000 other candidate genes. The genome, named T2T-CHM13, also corrects thousands of structural errors in the previous reference genome. “We’re now seeing chapters in the Book of Life that we couldn’t read before,” says Evan Eichler of the University of Washington. “The blueprint of our entire genome will revolutionize our ideas about genetic variation, disease, and human evolution.” For example, many of the DNA sections that have now been completed close gaps in gene regions whose variants are potential causes of disease. “We can now identify them because we have a more complete and accurate reference genome,” says Karen Miga of the University of California, Santa Cruz.
Also important are the new insights into the structure of the centromeres, the junctions that hold the two halves of a chromosome together. They play an important role in meiosis, the cell division in which these sister chromatids are separated. “If this step of meiosis goes wrong, chromosomal abnormalities that cause miscarriages or genetic diseases can occur,” explains Nicolas Altemose of the University of California, Berkeley. Cancer can also result from such a faulty division. To identify the causes of these anomalies, it is crucial to know the sequence of the centromeres. This is exactly what the new T2T reference genome now makes possible. “Before, all we had was a very blurry picture of what was hiding there. But now it’s clear right down to the individual DNA base,” says Altemose.
The first complete, gap-free decoding of the human genome is an important milestone, comments Bob Waterston of the University of Washington, one of the collaborators on the first Human Genome Project. “We would have loved to do this 20 years ago, but the technology wasn’t ready at the time.” But the work of the T2T Consortium does not end there: its members are already working on decoding a genome with a normal set of chromosomes, one inherited from both parents. They also want to sequence the genomes of people from different populations with similar accuracy and completeness. This could provide new insights into the similarities, differences, and evolution of different human populations.