by Anthony W. Accurso
New techniques using artificial intelligence to analyze voices fall short of meeting the standard for court admissibility, but that hasn’t stopped police from coercing plea deals out of defendants while claiming the “evidence” against them is sound.
For over 100 years, the ability to identify a person’s voice in a recording has been linked to criminal investigation. Known as “voiceprinting” or “vocal fingerprinting,” this kind of forensics has historically been disfavored in U.S. courts. Vocal analysis has inherent difficulties, and little underlying data validates its reliability.
One of the most notorious intersections between courts and voiceprinting involved Edward Lee King, a man suspected of looting and arson during the 1965 riots in Watts, a Los Angeles neighborhood. CBS aired an interview with a man, his face hidden from view, who admitted to torching a store he had looted during the riots. Police later arrested King on unrelated drug charges and found the business card of a CBS staffer in his wallet.
Police then secretly recorded King and asked Lawrence Kersta, an engineer at Bell Labs, to compare the recording of King’s voice to the voice from the CBS interview. Kersta claimed the voices matched, and his analysis popularized the use of sound spectrograms to identify speakers by their “voiceprints.”
After Kersta’s testimony led to King’s conviction, speech scientists and acoustical engineers took a public stand, eventually persuading a judge to reverse the conviction. The episode sparked a flurry of research that entirely discredited Kersta’s method of analysis. According to the Journal of Law and the Biosciences: “the eulogy for voiceprints was given by the National Academy of Sciences in 1979, following which the FBI ceased offering such experts … and the discipline slid into decline.”
This didn’t stop the U.S. Coast Guard from going after Nate Libby on the basis of voiceprint evidence. On December 3, 2020, the Coast Guard received a call for help over a radio channel reserved for emergency use. The caller claimed a 42-foot boat with three people aboard was sinking just off the Maine coast, in water cold enough to induce hypothermia within as little as 30 minutes. The transmission ended before the Coast Guard could pinpoint a location.
After unsuccessful attempts to locate the boat, its crew, or any wreckage, Maine Marine Patrol officer Nathan Stillwell began investigating the incident as a fraudulent distress call over maritime radio, a federal Class D felony. Some people said the voice on the call might be that of Libby, a dock worker at Atwood Lobster Company.
Stillwell interviewed Libby and a coworker, Duane Maki, about the incident, but he also surreptitiously recorded their voices for comparison against the recording of the distress call. He sent the recordings of the distress call and of Libby’s and Maki’s voices to Dr. Rita Singh, a computer scientist at Carnegie Mellon University and author of the textbook Profiling Humans From Their Voice. Dr. Singh had been consulting for the Coast Guard since at least 2014, when, she claims, she was first asked to review voice samples that “helped solve [a] crime.”
Dr. Singh used computational algorithms, meaning trained AI programs, to conclude that “the unknown voice in the four mayday recordings came from the same speaker as Person 1, who identified himself as Nate Libby.”
On the strength of this evidence, Libby was indicted and eventually persuaded to accept a plea deal. He was sentenced to time served, three years of supervised release, and $17,500 in restitution.
James L. Wayman, a voice recognition expert who works on a subcommittee of the U.S. National Institute of Standards and Technology, said “[i]t’s surprising that the old term has come back into vogue,” noting that the “FBI has frequently testified against the admissibility of voice evidence in cases, which is a really interesting wrinkle.”
He says there are too many uncontrollable variables when trying to identify a person recorded in two different contexts, such as “channel effects,” the influence background noise has on how a voice sounds.
Science historian Mara Mills, who co-authored an essay on voiceprinting and the courts, has stated that even modern techniques lack sufficient accuracy for serious convictions. “People do try to use it as evidence in courts, but it’s not the kind of thing that would send someone to jail for life,” Mills said. “Even with machine learning, that kind of certitude isn’t possible with voiceprinting.”
Some jurists have likened the kind of analysis Dr. Singh performed to “the sharpshooter fallacy,” where “someone fires a bullet in the side of a barn and then draws a circle around the bullet hole afterward to show they’ve hit their mark.” Because the only alternative offered was Maki, whose voice sounds distinctly different, Libby needed only to sound somewhat similar to the caller to be identified as the unknown speaker.
What is missing is a comparison against several voices similar to Libby’s, along with a methodological explanation of which characteristics of Libby’s voice resemble those of the unknown speaker and statistics on how common those characteristics are in the general population. Such explanations are required before any field of forensics can be considered scientific enough to be used as evidence in court.
The analysis Dr. Singh returned to the Coast Guard did not appear to include such details, or at least none were disclosed during the journalistic investigation into Libby’s case. Dr. Singh’s work is also controversial because of her claims that she can describe a person’s physical characteristics from their voice.
During the 2014 Coast Guard consultation, Dr. Singh claimed to have discerned several identifying characteristics of the unknown person in the recording. “I was able to tell them how old the person was, how tall he was, where he was from, probably where he was at the time of calling, approximately what kind of area, and a bunch of things about the guy.”
Dr. Singh has, in other contexts, claimed to be able to reverse this process, recreating what she said was the voice of the 17th-century Dutch painter Rembrandt from a portrait of him. Good science differs from pseudoscience in that it can be validated or debunked, but there is no way to verify whether the voice Dr. Singh generated from Rembrandt’s portrait resembles the actual voice of the painter.
Voices can change over time or be altered by accidents or lifestyle choices. People can train their voices to sound different, or they can train AIs to reproduce the voices of others. Absent tightly controlled conditions, such claims are unlikely to hold up, and this kind of analysis does not merit use in criminal investigations or courts.
Further, admission of expert analysis in courts cannot currently rely on AI models because, once an AI is “trained,” the decisions it makes are opaque to its programmers. “We know how to train them, right?” said Wayman. “But we don’t know what it is exactly that they’re doing…. These are some major forensic issues.”
In the meantime, Dr. Singh believes she can profile a person from a few sentences or fewer. “Sometimes,” she claims, “one word is enough.” Many respond to her various claims with a single word as well: false.
As a digital subscriber to Criminal Legal News, you can access full text and downloads for this and other premium content.