How machine learning is changing crime-solving tactics

Modern forensic DNA analyses are crucial to crime scene investigations; however, the interpretation of DNA profiles can be complex. Two researchers from the Forensics and National Security Sciences Institute (FNSSI) have turned to computer technology to assist with complicated profile interpretation, specifically for samples containing DNA from multiple people.

"There is a massive amount of data that is not being considered, simply due to our limited capability as human beings," says Michael Marciano, FNSSI research assistant professor, explaining why they're counting on computers to make data-driven predictions.

Marciano and Jonathan Adelman, FNSSI research assistant professor, have developed a new method to predict the number of people contributing to mixed DNA samples, the results of which are published online in Forensic Science International: Genetics ahead of the journal's March issue.

Additionally, the duo's method, dubbed Probabilistic Assessment for Contributor Estimation (PACE), is patent pending. The intellectual property, owned by Syracuse University, is newly licensed to NicheVision, a forensic software company based in Akron, Ohio.

To "deconvolute," or separate, a mixed DNA sample into individuals' genetic information, current technology requires the analyst to identify how many people contributed to the sample. Marciano likens the challenge of predicting contributor numbers to looking at a jar of colored candies, where two or three colors may be easy to spot, but more colors may be hidden in the center of the jar.
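Why the count is hard to read off a profile can be illustrated with a simple counting rule. The article does not say which methods analysts currently use; the sketch below assumes a basic allele-counting heuristic, in which each person contributes at most two alleles per genetic marker (locus), so the busiest locus sets a lower bound on the number of contributors. The locus names and allele values are invented for illustration, and this is not the PACE method.

```python
import math

def min_contributors(profile):
    """Lower bound on contributors from allele counts (a heuristic, not PACE)."""
    # Assumption: each person carries at most two distinct alleles per locus.
    most_alleles = max(len(set(alleles)) for alleles in profile.values())
    return math.ceil(most_alleles / 2)

# Hypothetical mixed profile: locus name -> alleles observed in the sample.
mixture = {
    "locus_A": [12, 14, 15],
    "locus_B": [8, 9, 11, 13],
    "locus_C": [16, 17],
}

print(min_contributors(mixture))
```

Like the candies hidden in the center of the jar, the result here only says "at least two": a third contributor who happens to share alleles with the others could leave the same set of alleles behind.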

To predict the number of individuals included in a mixed sample, Marciano, a trained molecular biologist with a background in forensic DNA analysis, teamed up with Adelman, a computer scientist and statistician. Together, they applied an established computer science method called machine learning to the problem of untangling mixed DNA samples.

Machine learning, a branch of artificial intelligence, uses existing data to train computers how to solve problems on their own with new data. The method works best with complex problems and in cases with a lot of example data for the training phase, making machine learning a great match for the DNA analysis challenge, Adelman says.
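The train-then-predict idea can be sketched in a few lines of Python. This is a toy illustration only: it assumes a very simple nearest-centroid classifier, two made-up features per sample, and invented training data. The article does not describe PACE's actual algorithms or features, so none of the names or numbers below should be read as the researchers' method.

```python
from collections import defaultdict

def train(examples):
    """Learn one average feature vector (centroid) per label."""
    sums = {}
    counts = defaultdict(int)
    for features, label in examples:
        if label not in sums:
            sums[label] = [0.0] * len(features)
        for i, value in enumerate(features):
            sums[label][i] += value
        counts[label] += 1
    return {label: [s / counts[label] for s in total]
            for label, total in sums.items()}

def predict(centroids, features):
    """Return the label whose centroid is closest to the new sample."""
    def squared_distance(centroid):
        return sum((a - b) ** 2 for a, b in zip(centroid, features))
    return min(centroids, key=lambda label: squared_distance(centroids[label]))

# Invented training data:
# (max alleles at any locus, mean alleles per locus) -> number of contributors
training_data = [
    ((2, 1.6), 1), ((2, 1.8), 1),
    ((4, 2.9), 2), ((4, 3.2), 2),
    ((6, 4.1), 3), ((5, 4.4), 3),
]

model = train(training_data)      # training phase: existing, labeled data
print(predict(model, (4, 3.0)))   # prediction phase: a new, unlabeled sample
```

With only six examples the model is crude; the point of the sketch is the two-phase pattern, which tends to work better as the amount of example data grows.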

While machine learning has been used extensively in other fields, from stock market trading to spam filtering, Adelman and Marciano say they've never seen it applied to forensic science. Arriving at this novel application took "two people with different backgrounds and a white board," Marciano says.

After the researchers trained their algorithms on massive amounts of data from the New York City Office of the Chief Medical Examiner and the Onondaga County Center for Forensic Sciences, PACE's predictive power was put to the test: identifying the number of people in mixed samples with known numbers of contributors. It passed with flying colors.

As detailed in their upcoming journal article, PACE improved prediction accuracy for three- and four-person mixed samples by 6 percent and 20 percent, respectively, over current methods. What's more, PACE can accurately classify samples in a matter of seconds, compared with up to nine hours for current methods.
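A test of this kind, on samples whose true number of contributors is known, boils down to comparing predictions with the known answers and reporting accuracy separately for each contributor count. The sketch below shows that bookkeeping in Python. The sample labels and predictions are invented and do not reproduce the figures reported for PACE.

```python
from collections import Counter

def accuracy_by_class(true_counts, predicted_counts):
    """Fraction of samples classified correctly, per true contributor count."""
    totals = Counter(true_counts)
    correct = Counter(t for t, p in zip(true_counts, predicted_counts) if t == p)
    return {n: correct[n] / totals[n] for n in sorted(totals)}

# Invented example: known contributor counts and a model's predictions.
known     = [2, 2, 3, 3, 3, 4, 4, 4, 4, 2]
predicted = [2, 2, 3, 3, 4, 4, 4, 3, 4, 2]

print(accuracy_by_class(known, predicted))
```

Running the same function on the predictions of two different methods would give the kind of per-class comparison the article summarizes for three- and four-person mixtures.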

PACE represents a major leap forward in DNA analysis, Adelman says. "Incremental improvements happen in technology development all the time, but this could completely change how the problem of 'deconvoluting' mixed samples is solved," he says. "It looks like disruptive technology."

More information: Michael A. Marciano et al. PACE: Probabilistic Assessment for Contributor Estimation— A machine learning-based assessment of the number of contributors in DNA mixtures, Forensic Science International: Genetics (2017). DOI: 10.1016/j.fsigen.2016.11.006