Audio enhancement starts with the ear. Our outer ear funnels sound pressure waves into the ear canal, where they strike the tympanic membrane, also known as the eardrum, and set it vibrating, much as a stick sets a drum vibrating. We can perceive only a small window of the acoustic frequency range: human hearing spans roughly 20 Hz to 20,000 Hz (20 kHz). Not all of those frequencies are perceived equally, however; we are most sensitive to sounds between about 2 kHz and 5 kHz. This range of peak sensitivity is the reason fire alarms are so high-pitched. To us they sound very loud, irritating, and sometimes painful, despite having almost the same sound pressure level as a car horn. This sensitivity has been evolutionarily helpful for our species, because our ancestors depended on detecting prey and predators that make high-frequency noises.
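This uneven sensitivity can be put into numbers with the standard A-weighting curve (IEC 61672), which approximates how sensitive the ear is to different frequencies at relatively low listening levels and is built into most sound level meters. The sketch below evaluates the published analog A-weighting formula; using A-weighting here is an illustrative assumption, not part of the forensic workflow this article describes, and `a_weighting_db` is a hypothetical helper name. Values are relative to 1 kHz (0 dB): the curve is highest, at roughly +1.3 dB, in the 2 to 3 kHz region and drops steeply toward low frequencies.

```python
import math


def a_weighting_db(freq_hz: float) -> float:
    """A-weighting gain in dB from the IEC 61672 analog formula, normalized to 0 dB at 1 kHz."""
    f2 = freq_hz ** 2
    numerator = (12194.0 ** 2) * f2 ** 2
    denominator = (
        (f2 + 20.6 ** 2)
        * math.sqrt((f2 + 107.7 ** 2) * (f2 + 737.9 ** 2))
        * (f2 + 12194.0 ** 2)
    )
    # The +2.00 dB offset makes the curve pass through 0 dB at 1 kHz.
    return 20.0 * math.log10(numerator / denominator) + 2.00


if __name__ == "__main__":
    # Relative sensitivity across the audible range (negative = less sensitive than at 1 kHz).
    for f in (20, 80, 500, 1000, 2500, 4000, 10000, 20000):
        print(f"{f:>6} Hz: {a_weighting_db(f):+6.1f} dB")
```

A-weighting is a simplification: the ear's real response changes with loudness, which is why other weighting curves exist for louder sounds.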

Unfortunately, in forensic audio enhancement our ears aren’t on our side. While we hear best between 2 kHz and 5 kHz, conversational speech falls roughly between 80 Hz and 3 kHz, so the two ranges overlap only between 2 kHz and 3 kHz. Biologically, we don’t hear each other speak as well as we hear some environmental noises. When the speech in a recording cannot be heard clearly, the recording may require forensic audio enhancement.

Before enhancement can begin, the cause of the problem has to be identified. Is the speaker too far from the recording device? Was the device hidden in a pocket or purse, so that the speech now sounds muffled? Is there loud noise, such as music or a car engine, that makes the speech difficult to hear? Is electronic interference causing a hiss? These are common problems, and each has a different solution.

Although the method depends on the cause of the problem, the general approach is almost always the same: identify the frequency band in which the speech exists, amplify that band, and reduce the intensity of the other bands.
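The identification step can be approximated by measuring how a recording's energy is divided among candidate frequency bands. The sketch below uses a hypothetical helper, `band_energies`, built on a plain discrete Fourier transform from the Python standard library; the two synthetic tones stand in for a voice and an interfering beep and are assumptions, not data from a real case. In practice an examiner would usually inspect a spectrogram in dedicated audio software, and a real implementation would use an FFT over many short frames.

```python
import cmath
import math


def band_energies(samples, sample_rate, bands):
    """Return the fraction of spectral energy inside each (low_hz, high_hz) band.

    A plain O(N^2) DFT keeps this dependency-free; real tools use an FFT
    and examine many short frames (a spectrogram) rather than one block.
    """
    n = len(samples)
    spectrum = []  # (bin frequency in Hz, energy) for each positive-frequency bin
    for k in range(1, n // 2):  # skip the DC bin at k = 0
        acc = sum(x * cmath.exp(-2j * math.pi * k * i / n) for i, x in enumerate(samples))
        spectrum.append((k * sample_rate / n, abs(acc) ** 2))
    total = sum(e for _, e in spectrum)
    return [sum(e for f, e in spectrum if lo <= f < hi) / total for lo, hi in bands]


if __name__ == "__main__":
    sr, n = 8000, 512
    # Hypothetical test signal: a 250 Hz "voice" tone plus a quieter 3 kHz "beep".
    signal = [
        math.sin(2 * math.pi * 250 * i / sr) + 0.5 * math.sin(2 * math.pi * 3000 * i / sr)
        for i in range(n)
    ]
    bands = [(80, 1000), (1000, 4000)]
    for (lo, hi), share in zip(bands, band_energies(signal, sr, bands)):
        print(f"{lo}-{hi} Hz: {share:.0%} of the energy")
```

For this synthetic signal the script reports 80% of the energy in the 80 to 1,000 Hz band and 20% in the 1,000 to 4,000 Hz band, matching the 4:1 energy ratio of tones with amplitudes 1 and 0.5. The tones were deliberately placed on exact DFT bin frequencies; real recordings leak some energy into neighbouring bins.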

For instance, in one case a suspect in a police investigation was speaking in a busy parking lot with an undercover officer who had a concealed recording device. The suspect said something of great legal significance while a truck was reversing, and his words could not be heard clearly over the truck’s loud reversing beeps. I determined that the suspect, who had a very deep voice, was speaking in a range of roughly 80 to 2,200 Hz, while the truck’s beeps occupied a higher range of roughly 1,000 to 6,000 Hz. Because the beeps began at about 1,000 Hz, the speech was recovered by amplifying the 80 to 1,000 Hz range and reducing all frequencies above it. Although this slightly distorted the speaker’s voice, it made his speech much more intelligible.
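The band-limiting step in this case can be sketched with two standard second-order filters, a high-pass at 80 Hz and a low-pass at 1,000 Hz, followed by a fixed gain. The coefficients follow Robert Bristow-Johnson's Audio EQ Cookbook with a Butterworth Q of 1/sqrt(2); the helper names, the 6 dB gain, and the test tones are hypothetical choices for illustration, not the settings or software used in the case above. Each filter rolls off at only about 12 dB per octave, so content just above 1,000 Hz is reduced gradually rather than removed; a steeper design would come closer to an idealized box filter.

```python
import math


def biquad_coeffs(kind, cutoff_hz, sample_rate, q=1 / math.sqrt(2)):
    """Second-order low- or high-pass coefficients (RBJ Audio EQ Cookbook), normalized so a[0] == 1."""
    w0 = 2 * math.pi * cutoff_hz / sample_rate
    cos_w0 = math.cos(w0)
    alpha = math.sin(w0) / (2 * q)
    if kind == "lowpass":
        b = [(1 - cos_w0) / 2, 1 - cos_w0, (1 - cos_w0) / 2]
    elif kind == "highpass":
        b = [(1 + cos_w0) / 2, -(1 + cos_w0), (1 + cos_w0) / 2]
    else:
        raise ValueError(f"unsupported filter kind: {kind!r}")
    a0 = 1 + alpha
    a = [a0, -2 * cos_w0, 1 - alpha]
    return [v / a0 for v in b], [v / a0 for v in a]


def biquad(samples, b, a):
    """Run one biquad section (Direct Form I) over a list of samples."""
    x1 = x2 = y1 = y2 = 0.0
    out = []
    for x in samples:
        y = b[0] * x + b[1] * x1 + b[2] * x2 - a[1] * y1 - a[2] * y2
        x2, x1 = x1, x
        y2, y1 = y1, y
        out.append(y)
    return out


def emphasize_band(samples, sample_rate, low_hz=80.0, high_hz=1000.0, gain_db=6.0):
    """Roll off content below low_hz and above high_hz, then boost what remains by gain_db."""
    b, a = biquad_coeffs("highpass", low_hz, sample_rate)
    filtered = biquad(samples, b, a)
    b, a = biquad_coeffs("lowpass", high_hz, sample_rate)
    filtered = biquad(filtered, b, a)
    gain = 10 ** (gain_db / 20)
    return [gain * v for v in filtered]


if __name__ == "__main__":
    sr = 16000

    def tone(freq):  # one second of a unit-amplitude sine
        return [math.sin(2 * math.pi * freq * i / sr) for i in range(sr)]

    def rms(v):
        return math.sqrt(sum(s * s for s in v) / len(v))

    for freq in (250, 3000):  # stand-ins for the voice and the reversing beep
        x = tone(freq)
        y = emphasize_band(x, sr)
        # Compare only the second half, after the filters' start-up transient has died out.
        print(f"{freq} Hz: output/input level = {rms(y[sr // 2:]) / rms(x[sr // 2:]):.2f}")
```

With these settings the 250 Hz tone comes out roughly twice as loud, while the 3 kHz tone drops to under a fifth of its original level.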

Figure: An example of this type of filter, sometimes called a box filter.

Challenges arise when the interfering noise occupies the same frequency range as the speaker. Music illustrates this problem best. The best way to learn audio enhancement is to pick a song with a singer and multiple instruments and try to remove each element one at a time. Many instruments concentrate their energy in fairly narrow frequency ranges, so it can be easy to remove instruments such as flutes, guitars, and pianos. However, when you try to separate instruments such as drums and trumpets from the singer, you will start to have trouble, because their fundamental frequency ranges are very similar to that of human speech.

The most destructive noise for speech is white noise, often called ‘static’. White noise can be caused by electronic interference, such as when a digital transmission device’s signal is blocked; the classic example is a radio that loses its connection. White noise is especially problematic because its energy is spread evenly across the entire audible frequency range, from 20 to 20,000 Hz, including the band where speech lives. It can’t be isolated and reduced the way narrowband sounds can, since any band that contains the speech also contains the noise. This masking effect is also why some people sleep better with a white noise machine near their bed: other noises are less likely to be heard over it and interrupt their sleep.
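The claim that white noise is spread evenly across frequency can be checked numerically. The sketch below generates Gaussian white noise with Python's `random` module and estimates its power spectrum with Bartlett's method (averaging the squared DFT magnitudes of non-overlapping frames). The function names, the 128-sample frame length, and the 44.1 kHz sample rate are assumptions chosen for illustration. Every 5 kHz band ends up with about the same average power per frequency bin, so there is no quiet band a filter could keep while discarding the noise.

```python
import cmath
import math
import random


def average_power_spectrum(samples, frame_len):
    """Bartlett's method: average |DFT|^2 per bin over non-overlapping frames.

    Returns a list indexed by bin k (centre frequency k * sample_rate / frame_len Hz).
    Bin 0 (DC, 0 Hz) is left at zero because it carries no audible content.
    """
    n_frames = len(samples) // frame_len
    power = [0.0] * (frame_len // 2)
    for f in range(n_frames):
        frame = samples[f * frame_len:(f + 1) * frame_len]
        for k in range(1, frame_len // 2):
            acc = sum(
                x * cmath.exp(-2j * math.pi * k * i / frame_len)
                for i, x in enumerate(frame)
            )
            power[k] += abs(acc) ** 2 / n_frames
    return power


def band_mean_power(power, sample_rate, frame_len, lo_hz, hi_hz):
    """Mean power per bin over bins whose centre frequency lies in [lo_hz, hi_hz)."""
    in_band = [
        p for k, p in enumerate(power)
        if k > 0 and lo_hz <= k * sample_rate / frame_len < hi_hz
    ]
    return sum(in_band) / len(in_band)


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so the run is repeatable
    sr, frame_len = 44100, 128  # assumed sample rate and frame length
    noise = [rng.gauss(0.0, 1.0) for _ in range(32 * frame_len)]
    spectrum = average_power_spectrum(noise, frame_len)
    for lo, hi in [(20, 5000), (5000, 10000), (10000, 15000), (15000, 20000)]:
        mean = band_mean_power(spectrum, sr, frame_len, lo, hi)
        print(f"{lo:>5}-{hi:<5} Hz: mean power per bin {mean:6.1f}")
```

Each band's mean should land near 128, the frame length times the noise variance of 1, with small random variation from band to band.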

Audio enhancement is a fundamental practice in any audio forensic investigation, and it is made challenging by the fact that every recording is different. Identifying the spectral characteristics of both the speaker and the interfering noise is not just a science but also a skill that should be practiced.