Human DNA Completely Mapped

A group of around 100 international scientists has managed to decipher the human DNA code in its entirety, including large parts that were still missing. Thousands of errors in the existing version have been fixed. This makes their new description of the human genome (called T2T-CHM13) by far the most comprehensive and accurate to date.

Twenty-one years ago, two rival groups of scientists announced (a little prematurely) the completion of the entire human genome: the sequence of “letters” of all the DNA in a human cell. DNA is made up of long chains of four chemical bases, abbreviated as A, T, C and G. The complete genome has over three billion bases. The two descriptions, both the commercial one from Celera Genomics, the company of the American molecular biologist Craig Venter, and that of the government-backed Human Genome Project, were groundbreaking and valuable, but far from complete. Large, complex parts were still missing and there were many duplicates and errors.

In the years that followed, researchers at the Human Genome Project filled in more and more of the gaps, even after declaring their description almost complete in a 2004 publication in Nature. But even their latest version (GRCh38) was still incomplete. Millions of bases were still unknown, about 8% of the total genome. In the 23 pairs of chromosomes that hold a human cell's DNA, important pieces were missing, wrong, or merely estimated.

200 million

As of this week, that is over. On Thursday the so-called Telomere-to-Telomere (T2T) consortium published six articles in the scientific journal Science, with further studies appearing in other scientific journals. Their combined effort adds some 200 million bases of genetic information.

In an accompanying commentary in Science, American geneticist Deanna Church describes the update as an important step toward models that can support personalized DNA analysis and medical treatment.

When a cell divides, its chromosomes are shaped like an X. The missing parts that have now been mapped are the short upper arms of five chromosomes, long stretches with many repeats, and the ends (telomeres) and junctions (centromeres) of chromosomes.

“It’s great that this has finally happened,” says Johan den Dunnen, professor emeritus of medical genome technology at LUMC. “The first version was sold as complete at the time, but now it’s really complete, except for the Y chromosome.”

Until recently, the newly mapped pieces were difficult to untangle. With previously used technology, a piece of DNA to be sequenced could be at most about a thousand bases long. That’s why genome researchers cut DNA into random pieces. After determining the DNA sequence of each piece, they reassemble the pieces. But some biologically important elements are a hundred times larger and contain so many repeating pieces that the puzzle cannot be put together. With the most modern techniques, long-read and ultra-long-read sequencing, chunks of 20,000 and even 100,000 bases could be read accurately in the T2T project.
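Why repeats defeat short pieces can be illustrated with a toy sketch in Python. Everything in it is made up for illustration: the sequences, the function name `all_reads`, and the tiny read lengths (12 and 30 bases instead of a thousand or 100,000). It assumes idealized, error-free reads and is not the consortium's method. Two different "genomes" that share a repeat produce exactly the same collection of short reads, so no assembler could tell them apart, while reads longer than the repeat do distinguish them.

```python
from collections import Counter


def all_reads(genome: str, read_length: int) -> Counter:
    """Idealized sequencing: every window of read_length bases, with multiplicity."""
    last_start = len(genome) - read_length
    return Counter(genome[i:i + read_length] for i in range(last_start + 1))


# Hypothetical toy data: one 18-base repeat and four short unique segments.
REPEAT = "TTAGGG" * 3
A, B, C, D = "ACGTAC", "CCATGC", "GGTCAA", "CATGGT"

genome_1 = A + REPEAT + B + REPEAT + C + REPEAT + D
genome_2 = A + REPEAT + C + REPEAT + B + REPEAT + D  # B and C swapped

# Reads shorter than the repeat never connect one unique segment to the next,
# so both genomes yield the identical set of puzzle pieces.
short_reads_ambiguous = all_reads(genome_1, 12) == all_reads(genome_2, 12)

# Reads longer than the repeat span it and reveal which segment follows which.
long_reads_ambiguous = all_reads(genome_1, 30) == all_reads(genome_2, 30)

print(short_reads_ambiguous, long_reads_ambiguous)  # True False
```

With 12-base reads the two genomes are indistinguishable; with 30-base reads, which bridge the 18-base repeat, the order of the segments becomes visible.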

Proliferating sperm cell

Another advantage was the source of the DNA used by the researchers. A normal cell has two copies of each chromosome, and they differ from each other because each copy comes from one of the parents. The T2T scientists used DNA from a cell line derived from a proliferating sperm cell, in which the two chromosomes of every pair are identical. This makes it easier to determine the DNA sequence of each chromosome.

However, the new update still lacks the Y chromosome, because the cell line contained only the father's X chromosome (males have one X chromosome and one Y chromosome, females have two X chromosomes). Another disadvantage: this version of the human genome comes from a single individual. But it forms a foundation to which further research can add newly discovered variations in genes.

The new version provides a more comprehensive basis for scientific research. “By filling in the missing parts, a series of new genes has been discovered, of which at least four are medically relevant,” says Den Dunnen. In clinical practice, the new database may offer a solution for patients whose condition is as yet unexplained. “Until now, we’ve typically looked for abnormalities in a patient’s DNA using the 2009 hg19 release. We then use short 100-base chunks of the patient’s DNA to find where the genetic changes are and see if those can lead to health issues,” says Den Dunnen. “But in this older version, parts are missing, so you miss them, or the program points to the wrong place.”
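The "wrong place" effect can be shown with a small Python sketch. It is a toy under stated assumptions, not how diagnostic software actually works: the sequences, the names `best_placement`, `old_reference` and `complete_reference`, and the 12-base read (instead of 100 bases) are all hypothetical, and the placement is a brute-force mismatch count, whereas real alignment programs use indexed, gap-aware algorithms. The read comes from a near-identical second copy of a gene that is absent from the older reference.

```python
def best_placement(reference: str, read: str) -> tuple:
    """Return (position, mismatches) for the leftmost lowest-mismatch placement."""
    def mismatches(pos: int) -> int:
        window = reference[pos:pos + len(read)]
        return sum(a != b for a, b in zip(window, read))

    positions = range(len(reference) - len(read) + 1)
    best = min(positions, key=mismatches)  # min() keeps the first of any ties
    return best, mismatches(best)


# Hypothetical toy sequences: two gene copies that differ in a single base.
flank_left, spacer, flank_right = "ACGTACGGTC", "TTGCCAAGGT", "CCGTAAGCTA"
copy_1 = "GATTACAGATTC"
copy_2 = "GATTACCGATTC"

complete_reference = flank_left + copy_1 + spacer + copy_2 + flank_right
old_reference = flank_left + copy_1 + spacer + flank_right  # copy_2 is missing

patient_read = copy_2  # a healthy read that truly comes from the second copy

old_hit = best_placement(old_reference, patient_read)
new_hit = best_placement(complete_reference, patient_read)
print(old_hit, new_hit)  # (10, 1) (32, 0)
```

Against the older reference, the read lands on the first copy with one mismatch, which would look like a genetic change that is not really there. Against the complete reference, it lands on the second copy with no mismatches.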

Puzzle

The next version, hg38 from 2013, already contained more, says Den Dunnen. “It explained a number of diseases whose genetic changes lay precisely in these missing places.” He expects the new genetic map to clarify certain diseases in the same way. “With this complete DNA sequence, all the 100-base pieces of the puzzle can be put in the right place more reliably.”

He hopes that the modern technique for mapping longer stretches of DNA will soon become so inexpensive that it can be used as a standard in diagnostics. “With longer sequences of patient DNA, we can find the right place on the genome much more reliably.”

The new T2T database will not immediately replace the old HGP database. Den Dunnen: “Due to all the additions, the previous work will first have to be renumbered.”
