Machine learning helps diagnose cancer from the DNA of microorganisms in the blood

Scientists have compiled the most complete library to date of microbial nucleic acid traces in samples from cancer patients and, using machine learning methods, have identified microorganisms that are specific to different types of tumors. In the future, this may make it possible to create a new universal method for diagnosing cancer at an early stage from a blood test. The article was published in the journal Nature.

Cancer is usually regarded as a disease directly related to the human genome, but recent studies have found that the microbiome also influences tumors and can hinder their effective treatment. It is still not known how much the inhabitants of our body contribute to the development of various types of cancer. This relationship is difficult to study because, during sample collection and analysis, there is a high probability of contaminating the samples with extraneous nucleic acids.

Dozens of researchers around the world are searching for a simple and effective test that could detect various types of cancer at an early stage. A number of tests that detect a tumor from a blood sample are now in development; they use specific proteins or mutant DNA as markers. Modern techniques for the statistical processing of DNA sequencing results make it possible to exclude from the analysis genetic traces that entered the sample from outside, which opens the way to diagnosing cancer from the composition of the microbiota.

A group of scientists led by Gregory Poore of the University of California, San Diego, analyzed more than 18 thousand samples of 33 tumor types from The Cancer Genome Atlas for the presence of microbial DNA or RNA. They applied two independent filtering methods to the database and excluded material that could have resulted from technical error or sample contamination.

The researchers then applied stochastic gradient boosting to the data (after cleaning, the data covered 17 thousand samples of 32 cancer types). The model had to distinguish tumor samples from normal ones and classify them by cancer type. To check the result, the researchers split the samples into two halves, trained the algorithm on each half separately, and then applied the resulting model to the other half of the data. After that, they selected only the samples from patients with stage one or stage two cancer (existing diagnostic methods work poorly at early stages) and trained the algorithm on them.
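The split-half check described above can be sketched roughly as follows. This is a minimal illustration, not the authors' pipeline: the data is synthetic, scikit-learn's `GradientBoostingClassifier` with `subsample` below 1.0 stands in for stochastic gradient boosting, and the ROC AUC metric, the helper name `split_half_auc` and all parameter values are assumptions made for the example.

```python
# Split-half validation with stochastic gradient boosting (illustrative sketch).
# All data here is synthetic; it stands in for per-sample microbial abundances.
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split


def split_half_auc(X, y, seed=0):
    """Train on each half in turn and score the model on the other half."""
    X_a, X_b, y_a, y_b = train_test_split(
        X, y, test_size=0.5, stratify=y, random_state=seed
    )
    scores = []
    for X_train, y_train, X_test, y_test in (
        (X_a, y_a, X_b, y_b),
        (X_b, y_b, X_a, y_a),
    ):
        # subsample < 1.0 means each tree is fitted on a random fraction of
        # the training rows, which is what makes the boosting "stochastic".
        model = GradientBoostingClassifier(
            n_estimators=100, subsample=0.5, random_state=seed
        )
        model.fit(X_train, y_train)
        proba = model.predict_proba(X_test)[:, 1]
        scores.append(roc_auc_score(y_test, proba))
    return scores


# Synthetic placeholder: 400 "samples", 20 "taxa"; label 1 = tumor, 0 = normal.
X, y = make_classification(
    n_samples=400, n_features=20, n_informative=5, random_state=0
)
auc_first, auc_second = split_half_auc(X, y)
print(f"Trained on half A, tested on half B: AUC = {auc_first:.2f}")
print(f"Trained on half B, tested on half A: AUC = {auc_second:.2f}")
```

If the two scores are close to each other, the model's performance does not depend much on which half it happened to be trained on.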

The final stage was to test the algorithm in real-life conditions. To do this, the researchers took blood samples from 69 healthy individuals and 100 patients with stage three or stage four of one of three types of cancer (prostate cancer, lung cancer or melanoma) and sequenced the cell-free DNA in the plasma. They then applied the algorithm developed on The Cancer Genome Atlas data to the microbial DNA found in these samples.

In the end, even after more than 90 percent of the data had been discarded, the model successfully determined the tumor type, both across all stages and at the earliest stages alone. When tested in real conditions, the algorithm identified more than 90 percent of the patients with cancer and gave no false-positive results for healthy people. In 81 percent of cases the model correctly distinguished samples from patients with lung cancer from those of patients with prostate cancer.
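The two figures quoted for the real-world test correspond to sensitivity (the share of patients correctly flagged) and specificity (the share of healthy people correctly cleared). A minimal sketch of how they are computed is shown below; the labels are made-up toy values, not the study's data, and the function name is hypothetical.

```python
def sensitivity_specificity(y_true, y_pred):
    """Return (sensitivity, specificity) for binary labels, 1 = cancer, 0 = healthy."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # patients detected
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # patients missed
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # healthy, cleared
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # healthy, false alarm
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical toy example: 10 patients (one missed) and 5 healthy controls.
y_true = [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
sens, spec = sensitivity_specificity(y_true, y_pred)
print(f"sensitivity = {sens:.2f}, specificity = {spec:.2f}")
```

A specificity of 1.0 is what "no false-positive results for healthy people" means in these terms.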

It remains unclear how the organisms whose DNA was used in the analysis are associated with cancer development. It is also unclear where these organisms reside: in the tumors themselves, in immune cells or in connective tissue. Traces of them could enter the blood in any of these cases. It is not even known whether they were alive when the samples were taken.

In addition to traces of microorganisms, researchers are trying to identify other tumor markers. In 2018, scientists created a test that can diagnose eight types of cancer at an early stage with high accuracy. The test is based on detecting cell-free DNA and proteins characteristic of tumors.

Alice Bahareva
