New AI tool searches genetic haystacks to find disease-causing variants

Scientists have developed a way to sift through the millions of differences in a person’s genetic blueprint to detect those that threaten health, and have tested the new tool on a biomedical database of more than 450,000 people in the UK, according to a series of papers published Thursday in the journal Science.

The research marks a critical step towards harnessing the full power of the genome for medicine and demonstrates a new way artificial intelligence can be applied to human health problems, experts said.

A problem that has frustrated doctors for years stems from the fact that, although we are 99.6 percent similar at the level of DNA, each of us carries an average of 4 million variants, places in the genetic code where we differ from one another.

“It has been extremely difficult to determine which ones cause disease and which ones don’t,” said Kyle Farh, vice president of artificial intelligence at San Diego-based biotech company Illumina.

Farh and an international team of nearly 100 researchers have created an algorithm designed to help medicine resolve some of that uncertainty. “Our goal is to eliminate variants of unknown significance, which is the main roadblock to unlocking the value of genomic medicine,” Farh said.

Just as ChatGPT can learn to predict human speech by having engineers feed it a large amount of text, the new algorithm has been trained to make medical predictions based on reading genomes.

The scientists built the algorithm, dubbed PrimateAI-3D, using the genetic blueprints of 233 different primate species. That foundation reveals which variants can be tolerated by primates, including humans, and which prove deadly. Scientists look for places where the sequence is the same from one primate to another, a clear sign that any change there is disastrous.
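
The idea of looking for places where the sequence is the same from one primate to another can be illustrated with a toy example. This is only a minimal sketch of the conservation idea, not the PrimateAI-3D method itself: the sequences, species labels and the function name conserved_positions are made up for illustration, and the sequences are assumed to be already aligned to the same length.

```python
def conserved_positions(aligned_seqs):
    """Return the indices where every aligned sequence has the same letter."""
    if not aligned_seqs:
        return []
    length = len(aligned_seqs[0])
    if any(len(seq) != length for seq in aligned_seqs):
        raise ValueError("sequences must be aligned to the same length")
    # A position counts as conserved when only one distinct letter appears there.
    return [
        i for i in range(length)
        if len({seq[i] for seq in aligned_seqs}) == 1
    ]


# Hypothetical toy alignment (invented data, not real protein sequences).
toy_alignment = {
    "human": "MKTLLV",
    "chimpanzee": "MKTLLV",
    "macaque": "MKSLLV",
    "marmoset": "MKALIV",
}

conserved = conserved_positions(list(toy_alignment.values()))
print(conserved)
```

In this toy data, the third and fifth letters vary between species, so a change there looks tolerated, while a change at any of the unchanged positions would be a candidate for closer scrutiny.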

“It’s a brilliant idea. As soon as I read the paper, I sent it to my team and said, ‘We have to move forward,’” said Stephen Kingsmore, president and CEO of Rady Children’s Institute for Genomic Medicine, a San Diego-based facility that sequences the genomes of 1,000 families per year for 90 hospitals in the United States.

Kingsmore said that in about a quarter of cases, doctors sequence a patient’s genome only to find a variant with an unknown health impact.

“We’re doing them a great disservice,” he said. “Parents throw up their hands and say, ‘Does the child have a disease or not?’ And we can only say, ‘Maybe.’”

Until now, hospitals looking at genetic variants in their patients have often consulted a large archive called ClinVar. The new PrimateAI-3D algorithm scans about 70 million genetic variants, a selection that is more than 1,000 times larger than ClinVar, Farh said.

The 3D in the name refers to the three-dimensional structure of proteins, a key factor in distinguishing which mutations will wreak havoc. Many diseases are caused by mutations that damage a protein or cause the body to produce too much or too little of it.

“It’s unclear how much of a difference the algorithm will make in everyday medicine, but they show it surpasses anything we currently have,” said Bruce Gelb, director of the Mindich Child Health and Development Institute at the Icahn School of Medicine at Mount Sinai.

Gelb, who was not part of the study team, said he saw an older version of the algorithm described in Nature Genetics in 2018. The older version was based on just six nonhuman primate species, as opposed to the 233 primate species in the new version. “That’s a very big increase, and it gives it a lot more statistical power to find things,” Gelb said.

Matthew Lebo, who directs the Molecular Medicine Laboratory at Mass General Brigham, said PrimateAI-3D won’t eliminate the problem of finding variants of unknown significance, but it will help doctors prioritize which variants to study for a specific disease.

The new tool should also help pharmaceutical companies search for new drugs. “Clinical trials often fail because the gene that scientists are targeting is incorrect and not relevant to the disease,” Farh said. The use of artificial intelligence and genomics to select the right targets should significantly reduce the failure rate of late-stage clinical trials.

Illumina said it will make the new tool widely available in future releases of its software products.

“By testing the new algorithm on hundreds of thousands of patient genomes in the UK Biobank, we found that 97 percent of the general population carry a rare variant that has some kind of significant health effect,” Farh said. While the algorithm can’t account for the influence of diet and environmental factors, he explained, “we can basically predict people’s cholesterol and glucose levels, and thus their risks of cardiovascular disease or diabetes, from the genome by predicting the effects of these variants.”

Kingsmore said genome science has been pushing medicine toward artificial intelligence for years because of the sheer size of our genetic blueprint. A genome is a long code written in four letters: A, T, G, C. Each letter represents one of the four chemical bases of which our DNA is made up: adenine, thymine, guanine and cytosine. A complete genome is like a stairway containing about 3 billion steps, with a pair of letters at each.
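
A rough back-of-the-envelope calculation gives a sense of that scale. The sketch below assumes the most compact possible encoding, in which each of the four letters takes two bits and only one letter per step is stored, since A always pairs with T and G with C. The variable names are illustrative, and real sequencing files are far larger because they hold many overlapping reads plus quality information.

```python
import math

BASES = "ATGC"                 # the four letters of the genetic code
BASE_PAIRS = 3_000_000_000     # about 3 billion steps, per the article

# Four possible letters need log2(4) = 2 bits each.
bits_per_base = math.log2(len(BASES))

# Storing one letter per step is enough, because its partner is implied.
total_bytes = BASE_PAIRS * bits_per_base / 8
megabytes = total_bytes / 1_000_000

print(f"{bits_per_base:.0f} bits per letter, about {megabytes:.0f} MB per genome")
```

Under these assumptions, a single genome fits in roughly 750 megabytes, yet it still contains 3 billion positions at which a variant could occur.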

The National Institutes of Health estimates that genome sequencing is now generating up to 40 billion gigabytes of data each year, the equivalent of about 10 million complete genomes.

“The reason AI is so good,” he said, “is that the medical workforce is so ill-prepared to extract answers from such an ocean of data.”
