What Is Audio Forensics?

“…every human being possesses a distinctive voice of his own, which is as easily distinguished by the ear as are facial characteristics by the eye.”

– Quintilian
(University of Chicago, 2012)

Audio Forensics – Overcoming Disguised Speech

Quintilian was an ancient Roman practitioner of rhetoric, a form of philosophy. His remark is likely the oldest surviving statement on voice identification. While there’s truth in it, had Quintilian lived in the modern era, he would easily be dismissed as an optimist.

The greatest challenge in identifying a speaker’s distinctive vocal characteristics remains the ability of those with malicious intentions to deliberately alter their voices. Worse, as the methodology for detecting disguised speech improves, so do the technology and methods of those seeking to avoid detection.

When working with a recording and a pool of suspects, the identification process involves creating a series of known exemplar recordings of the suspects speaking and comparing them to the unknown exemplars, which are collected from the original recording in which the speaker’s identity is unknown. This affords the guilty party two opportunities to fool investigators. If they believe they may be recorded while committing a crime, they may disguise their voice. They may also disguise their voice later, when asked to produce known exemplars.

In the former scenario, a criminal could employ electronic or non-electronic disguise methods. Electronic disguises, such as digital voice scramblers, are incredibly difficult to overcome. They completely destroy fundamental frequency information as well as vowel and consonant articulation information. With this form of disguise, however, it is at least clear that the criminal is disguising their speech. Electronic disguise is most common in cases of kidnapping or blackmail.

Non-electronic disguise methods are extremely varied. A popular method in film is a criminal holding a cloth over their mouth when making a recorded call. Apparently most criminals find this to be one step too many, because the most common methods are whispering and assuming a false accent. These methods have a notable impact on intra-speaker variation; fundamental frequency values in particular are affected. European research has found that imitating an accent is ineffective at hiding the fact that the speaker’s voice is being disguised. (Adrian Leemann, 2015)
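Since fundamental frequency comes up repeatedly here, a rough illustration of how it can be measured may help. The sketch below estimates the fundamental frequency of a single voiced frame by autocorrelation, which is one common approach and not necessarily the one a given examiner or tool uses. The function name `estimate_f0`, its parameters, and the synthetic 200 Hz tone standing in for real speech are all assumptions made for illustration. Real recordings need voicing detection and more robust pitch tracking, and whispered speech has little or no periodic voicing to measure in the first place.

```python
import math

def estimate_f0(samples, sample_rate, f0_min=75.0, f0_max=400.0):
    """Estimate the fundamental frequency (Hz) of one voiced frame by autocorrelation.

    Hypothetical helper for illustration only; the search range of
    75-400 Hz is an assumed range for adult speech.
    """
    lag_min = int(sample_rate / f0_max)  # shortest period considered, in samples
    lag_max = int(sample_rate / f0_min)  # longest period considered, in samples
    best_lag, best_value = None, 0.0
    for lag in range(lag_min, min(lag_max, len(samples) - 1) + 1):
        # Correlate the frame with a copy of itself shifted by `lag` samples.
        value = sum(samples[n] * samples[n + lag]
                    for n in range(len(samples) - lag))
        if value > best_value:
            best_lag, best_value = lag, value
    if best_lag is None:
        return None  # no positive correlation peak found in the search range
    return sample_rate / best_lag

# Synthetic stand-in for a voiced frame: a 200 Hz tone, 50 ms long at 8 kHz.
SAMPLE_RATE = 8000
frame = [math.sin(2 * math.pi * 200 * n / SAMPLE_RATE) for n in range(400)]
f0 = estimate_f0(frame, SAMPLE_RATE)
print(f0)
```

A speaker who raises or lowers their pitch shifts this value, which is why fundamental frequency alone is a fragile basis for comparison.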

An interesting example from history is convicted serial killer Paul Michael Stephani, the Weepy-Voiced Killer. He killed four women in the Minneapolis area from 1980 to 1982. Several times he called police and, in a weeping voice, asked them to stop him from killing again. During his trial, voice identification was at issue, and early Aural Acoustic Speaker Identification practitioners supported an identification conclusion. That conclusion was also supported by the killer’s wife, who recognized his voice while listening to the 911 calls. In this case, the disguise succeeded only in altering his prosody and fundamental frequency values. (BobSeger1981, 2013)

When disguised speech is used in the original recording from which the Unknown Exemplars are gathered, sometimes identifying the presence of disguised speech is all that can be done. When a suspect is disguising their speech while providing Known Exemplar samples, the forensic examiner has significantly more opportunities to reduce its impact.

The first step is detection. Any known exemplar collection and analysis protocol should include detection practices for disguised speech. The best indicator is inconsistent phonetic characteristics. It is very difficult to disguise your speech consistently for prolonged periods of time, so evidence collection should include asking the suspect to read a long statement or passage, one that is also phonetically challenging to read aloud. Speech-language pathologists use diagnostic passages to diagnose speech disorders and to assess progress in treatment. These same passages are excellent tools for disguised speech detection. The suspect should also be engaged in casual conversation. Any breaks or inconsistencies might provide clues as to the suspect’s natural speaking voice.
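The idea of looking for breaks in a long reading can be sketched in code. The example below assumes the passage has already been split into segments and that one feature, the median fundamental frequency in Hz, has been measured for each segment. It then flags segments that sit far from the rest using a robust score based on the median absolute deviation. The function name, the threshold of 3.5, and the sample values are hypothetical; a real protocol would examine many phonetic characteristics and rely on the examiner's judgment, not on a single number.

```python
from statistics import median

def flag_inconsistent_segments(segment_f0_hz, threshold=3.5):
    """Return indices of segments whose F0 deviates strongly from the others.

    Uses a modified z-score built on the median absolute deviation (MAD),
    so a few outlying segments do not distort the baseline.
    """
    centre = median(segment_f0_hz)
    abs_dev = [abs(value - centre) for value in segment_f0_hz]
    mad = median(abs_dev)
    if mad == 0:
        # More than half the segments are identical: flag anything that differs.
        return [i for i, dev in enumerate(abs_dev) if dev > 0]
    # 0.6745 rescales the MAD so the score is comparable to a standard z-score.
    return [i for i, dev in enumerate(abs_dev) if 0.6745 * dev / mad > threshold]

# Invented per-segment values for a suspect who holds a raised pitch
# but slips in two segments of a long reading passage.
segment_f0_hz = [182, 185, 180, 184, 121, 183, 181, 118]
suspect_segments = flag_inconsistent_segments(segment_f0_hz)
print(suspect_segments)  # [4, 7]
```

The flagged segments are where an examiner would listen most closely, since they may be closer to the suspect's natural speaking voice.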

Common methods of disguising speech when providing known exemplars are intentionally struggling to pronounce words that are forensically significant to the crime being recorded, feigning poor fluency, and emphasizing syllables in unusual ways.

The key to successfully conducting a speaker comparison when you suspect disguised speech is present is to identify which elements of their speech the subject is altering. Often the perpetrator believes they are successfully disguising their entire speech but, not knowing how speech scientists divide and categorize speech, they are in fact affecting only one criterion. If they are altering only their fundamental frequency, a successful analysis can still be done by correctly weighting their vowel and consonant articulation. If their articulation is altered, then prosody, resonance, nasality, and abnormalities may still be of forensic consequence.
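The weighting idea can be made concrete with a small sketch. It assumes each criterion has already been given a similarity score between 0 and 1 for a known and unknown exemplar pair, and it simply leaves out any criterion the examiner believes the subject is altering, then renormalizes the remaining weights. The function name, the criterion labels, the equal weights, and the scores are all hypothetical; this is an illustration of the reasoning, not an established forensic scoring method.

```python
def combined_similarity(scores, weights, altered):
    """Weighted mean of per-criterion similarity, skipping altered criteria.

    scores  -- similarity per criterion, each in [0, 1] (assumed given)
    weights -- relative importance per criterion
    altered -- set of criteria the subject appears to be disguising
    """
    usable = {name: w for name, w in weights.items() if name not in altered}
    total = sum(usable.values())
    if total == 0:
        raise ValueError("no unaltered criteria left to compare")
    return sum(scores[name] * w for name, w in usable.items()) / total

# Invented scores: pitch disagrees badly, articulation and prosody agree well.
scores = {
    "fundamental_frequency": 0.2,
    "vowel_articulation": 0.9,
    "consonant_articulation": 0.8,
    "prosody": 0.7,
}
weights = {name: 1.0 for name in scores}

naive = combined_similarity(scores, weights, altered=set())
adjusted = combined_similarity(scores, weights, altered={"fundamental_frequency"})
print(round(naive, 2), round(adjusted, 2))  # 0.65 0.8
```

With the disguised criterion set aside, the comparison rests on the features the subject did not manage to change, which is the point made in the paragraph above.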