Editing Narration and Speech for Video

With video presentations and pre-recorded video streams taking the place of many “in-person” activities—from church services to product demonstrations—the quality of recorded narration and speech is more important than ever. Fortunately, those working with video programs already have many of the tools needed to edit speech, without having to use dedicated audio software. Furthermore, audio plug-ins can provide additional functionality.

The examples given here are based on Magix Vegas, a popular video editing program for Windows. However, the same techniques apply to most other video editing programs, as well as audio-only programs if the speech is being edited outside of the video production environment.

Controlling P-Pops

P-pops result from plosives, when a sudden rush of air from a sound like “p” or “b” hits the mic. Although it’s best to prevent pops at the source with pop filters (Fig. 1) and proper mic technique (e.g., not speaking too close to the mic), editing can fix pops as well.

Figure 1: Gator’s RI-Popfilter is an inexpensive, nylon-screen pop filter that controls pops at the source. Its C-clamp fits most standard mic stand shafts and booms.

Locate the pop in the audio; pops have a distinctive shape you’ll learn to recognize. Split the audio just before the pop begins, and add a fade-in. The fade-in’s length and curve control how much the pop is reduced (Fig. 2).

Figure 2: The upper image shows the original file. The middle image shows fading in on a p-pop to reduce the low-frequency “pop” that can occur from plosives and directional mics. The bottom image shows that the “p” sound is still present, but its level has been controlled by the fade-in.
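If you process audio with a script rather than in an editor, the same fade-in idea can be sketched in a few lines of Python. This is only an illustration, not how Vegas implements its fades: the function name, the power-law curve, and the sample values are all assumptions made for the example.

```python
def apply_fade_in(samples, start, length, curve=2.0):
    """Return a copy of samples with a fade-in beginning at index start.

    length is the fade duration in samples (must be greater than 0).
    curve = 1 gives a linear fade; curve > 1 keeps the gain low for
    longer, which reduces the pop more.
    """
    out = list(samples)
    end = min(start + length, len(out))
    for i in range(start, end):
        position = (i - start) / length   # 0.0 at the split point, rising toward 1.0
        out[i] *= position ** curve       # gain climbs from 0 toward full level
    return out


# Hypothetical "pop": a burst of full-scale samples starting at index 4.
clip = [0.0, 0.0, 0.0, 0.0, 1.0, 1.0, 1.0, 1.0, 0.5, 0.5]
faded = apply_fade_in(clip, start=4, length=4, curve=2.0)
print(faded)
```

A longer length or a higher curve value tames the pop more, at the risk of softening the consonant too much, which is the same tradeoff you make by ear when dragging the fade in the editor.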

Reducing Breath Noise

Although it’s possible to cut breath noises, removing all of them makes speech sound unnatural; we’re humans, and we breathe. I usually cut some inhales and reduce the volume of others. To reduce an inhale, split the audio clip at the start and end of the inhale, and lower only the inhale’s level. Depending on the program, this can be done with volume automation or by varying the clip’s level (Fig. 3).

Figure 3: Gain is being reduced for only the isolated clip of breath noise.
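In numeric terms, lowering an isolated section by a given number of decibels just means multiplying its samples by a constant. The sketch below shows the arithmetic; the function names, the sample values, and the -12 dB figure are assumptions chosen for illustration, not recommended settings.

```python
def db_to_gain(db):
    """Convert a level change in decibels to a linear multiplier."""
    return 10 ** (db / 20)


def reduce_segment(samples, start, end, db):
    """Return a copy of samples with indices start..end-1 changed by db decibels."""
    gain = db_to_gain(db)
    out = list(samples)
    for i in range(start, end):
        out[i] *= gain
    return out


# Hypothetical clip: speech, a breath at indices 3..5, then more speech.
clip = [0.8, -0.7, 0.6, 0.2, -0.2, 0.2, 0.9, -0.8]
quieter = reduce_segment(clip, start=3, end=6, db=-12.0)
print(quieter)
```

Only the samples between the two split points change; the speech on either side keeps its original level.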

De-Essing Sibilants

Those nasty “ess” sounds can be a problem, particularly if you use compression or limiting, or increase the treble to make the voice more intelligible. Although there are de-esser plug-ins (like IK Multimedia’s T-RackS De-Esser) and dedicated hardware units, you can also use a Multiband Dynamics plug-in (Fig. 4) and compress only the high frequencies to reduce ess sounds.

Figure 4: The Multiband Dynamics processor in Vegas reduces ess sounds automatically.

When you need a really fine degree of control, you can also de-ess manually. The technique is similar to fixing breath noises. Locate the sibilant (the waveform will look like a ball of dense sound; see Fig. 5), split before and after the ess, and lower the level.

Figure 5: It’s possible to de-ess manually by locating the ess sound, isolating it, and reducing its level.

Attenuating Wind Noise

When recording outside, especially with a shotgun or lavalier mic, wind noise often comes along as an unwelcome guest. As with p-pops, it’s better to stop this at the source with an appropriate wind shield (Fig. 6).

Figure 6: The Røde Minifur-Lav is an artificial fur wind shield designed specifically for lavalier mics.

However, if the wind noise is already “baked into” a track, you still have some options. Fortunately, most wind noise consists of low frequencies, many of which are below the range of the human voice. Adding a steep low-cut filter just below the voice range can help (use the steepest rolloff possible; see Fig. 7). This probably won’t eliminate the wind noise, but its volume will likely be lower.

Figure 7: Vegas’s Track EQ plug-in is using a low-cut shelf filter to reduce low frequencies. Note the relatively sharp filter rolloff of 24 dB/oct.
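To see why a steep low-cut filter helps, here is a rough Python sketch of one, built from two cascaded second-order high-pass sections for a slope of roughly 24 dB/oct. It is a generic textbook design, not the algorithm inside Vegas’s Track EQ, and the 80 Hz cutoff, the 48 kHz sample rate, and the two test tones are assumptions picked purely for the demonstration.

```python
import math


def highpass_biquad(samples, cutoff_hz, sample_rate, q):
    """Filter samples with one second-order high-pass (low-cut) section."""
    w0 = 2 * math.pi * cutoff_hz / sample_rate
    cos_w0 = math.cos(w0)
    alpha = math.sin(w0) / (2 * q)
    b0 = (1 + cos_w0) / 2
    b1 = -(1 + cos_w0)
    b2 = (1 + cos_w0) / 2
    a0 = 1 + alpha
    a1 = -2 * cos_w0
    a2 = 1 - alpha
    out = []
    x1 = x2 = y1 = y2 = 0.0   # previous inputs and outputs
    for x in samples:
        y = (b0 * x + b1 * x1 + b2 * x2 - a1 * y1 - a2 * y2) / a0
        out.append(y)
        x2, x1 = x1, x
        y2, y1 = y1, y
    return out


def low_cut_24db(samples, cutoff_hz, sample_rate):
    """Cascade two sections (Butterworth Q values) for about 24 dB/octave."""
    stage1 = highpass_biquad(samples, cutoff_hz, sample_rate, q=0.5412)
    return highpass_biquad(stage1, cutoff_hz, sample_rate, q=1.3066)


def sine(freq_hz, sample_rate, seconds):
    """Generate a full-scale sine wave as a list of samples."""
    count = int(sample_rate * seconds)
    return [math.sin(2 * math.pi * freq_hz * n / sample_rate) for n in range(count)]


def settled_peak(samples):
    """Peak level of the second half, after the filter has settled."""
    return max(abs(s) for s in samples[len(samples) // 2:])


RATE = 48000  # assumed sample rate
rumble = settled_peak(low_cut_24db(sine(30, RATE, 0.5), 80, RATE))   # wind-like tone
voice = settled_peak(low_cut_24db(sine(300, RATE, 0.5), 80, RATE))   # voice-range tone
print(f"30 Hz peak: {rumble:.3f}, 300 Hz peak: {voice:.3f}")
```

The 30 Hz tone, standing in for wind rumble, comes out far quieter than it went in, while the 300 Hz tone passes through almost untouched. Real wind noise also has energy inside the voice range, which is why the filter reduces it rather than eliminating it.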

Dealing with Mouth Clicks

These are sharp, trebly, short-duration transients that distract from the narration. The easiest solution is to use iZotope’s RX7 restoration plug-in, whose Mouth De-Clicker function (available in all RX7 versions except RX7 Elements) is almost 100% effective. If you don’t have RX7, locate the click, which will look like a needle in the waveform (Fig. 8).

Figure 8: A mouth click (circled in white) is an extremely short, high-frequency transient.

When you split a waveform in Vegas, it adds an automatic, short fade-in and fade-out to prevent clicks at the split point. We can use this to advantage with mouth clicks: split directly in the center of the click, and the fades will eliminate it most of the time (Fig. 9). If not, simply cut the section with the click. Mouth clicks are of such short duration that cutting one will generally not produce an audible discontinuity.

Figure 9: Splitting exactly on a mouth click in Vegas will almost always get rid of it.
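The split trick works because the two short fades meet at zero gain right where the click sits. A minimal Python sketch of that idea follows; the function name, the linear fade shape, and the fade length are assumptions, since the actual length and curve of Vegas’s automatic fades aren’t given here.

```python
def split_with_fades(samples, split_index, fade_len):
    """Imitate a split at split_index: fade out before it, fade in after it."""
    out = list(samples)
    for k in range(fade_len):
        gain = k / fade_len              # 0.0 next to the split, rising with distance
        before = split_index - 1 - k     # sample on the fade-out side
        after = split_index + k          # sample on the fade-in side
        if before >= 0:
            out[before] *= gain
        if after < len(out):
            out[after] *= gain
    return out


# Hypothetical clip: steady low-level audio with a one-sample click at index 4.
clip = [0.3, 0.3, 0.3, 0.3, 1.0, 0.3, 0.3, 0.3, 0.3]
cleaned = split_with_fades(clip, split_index=4, fade_len=2)
print(cleaned)
```

Because the fades only touch a few samples on either side of the split, the surrounding speech is left alone, which is why the edit is usually inaudible.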

Phrase-by-Phrase Gain Changes

To achieve a consistent voice level, many people use a compressor or limiter to narrow the dynamic range. However, this can sometimes produce pumping and other undesirable sonic artifacts. I prefer to use normalization or gain changes to make individual phrases more uniform (Fig. 10). Because you’re only changing level (it’s no different from turning up a volume control), there are no artifacts. Then, if you want to add compression or limiting, you can use far less of it and still attain better consistency than you could with dynamics processing alone.

Figure 10: The upper and lower waveforms are the same audio file. However, the lower version has been split in multiple places to isolate individual phrases, and the gain varied for the phrases to create a more uniform level.
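The phrase-by-phrase approach amounts to measuring each phrase’s peak and applying one fixed gain per phrase. Here is a small Python sketch under that assumption; the function names, phrase boundaries, sample values, and 0.9 target peak are hypothetical, and in practice you would set the split points by eye and ear as described above.

```python
def peak(samples):
    """Largest absolute sample value."""
    return max(abs(s) for s in samples)


def normalize_phrases(samples, boundaries, target_peak=0.9):
    """Scale each phrase so its peak equals target_peak.

    boundaries is a list of (start, end) index pairs, one per phrase.
    """
    out = list(samples)
    for start, end in boundaries:
        phrase_peak = peak(out[start:end])
        if phrase_peak == 0:
            continue                      # pure silence: nothing to scale
        gain = target_peak / phrase_peak  # one constant gain for the whole phrase
        for i in range(start, end):
            out[i] *= gain
    return out


# Hypothetical narration: a quiet phrase, a loud phrase, and a medium phrase.
clip = [0.2, -0.1, 0.15, 0.8, -0.9, 0.7, 0.4, -0.45, 0.3]
phrases = [(0, 3), (3, 6), (6, 9)]
leveled = normalize_phrases(clip, phrases, target_peak=0.9)
print([round(s, 3) for s in leveled])
```

Each phrase gets a single constant gain, so the level relationships inside the phrase are preserved; that is what separates this from a compressor, whose gain changes from moment to moment.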

Adjusting Equalization

The ear is most sensitive around 3-4 kHz, so boosting voice frequencies in that range will make the voice “pop” more. However, a little goes a long way; excessive boosting can make the voice harsh and screechy. Similarly, adding a gentle high-frequency shelf can increase the overall articulation, while trimming the low frequencies helps reduce “boominess” (although you don’t want to cut too much, or the voice will lose “warmth”). Fig. 11 shows a typical EQ curve for male voice.

Figure 11: This EQ curve is designed to make a male voice stand out more. The exact settings will differ for various speakers, because their voices have different timbres.

Specialized Vocal Processing Tools

We mentioned RX7’s mouth de-click function, but the entire family of iZotope RX7 audio restoration tools is exceptional. Recently I was given some audio to repair that was recorded by actress Ali MacGraw, and the voice had a considerable amount of background noise throughout. RX7’s Voice De-Noise function (Fig. 12) got rid of most of it, which made a major improvement in the overall quality.

Figure 12: Voice is being separated from the noise, so that the noise can be attenuated independently.

RX7 even has a function where you can split a phrase that runs together into two phrases, and change the ending pitch of the first phrase so that it sounds like the person actually ended the sentence there—sort of like adding a virtual period.

Of the various RX7 versions, RX7 Elements has basic repair options, but I’d recommend the Standard version, which has a comprehensive array of tools. The Advanced version offers features useful for complex applications, like fixing problematic movie dialog. For example, it can isolate dialog from noise, “de-rustle” the sound that often occurs with lavalier mics, reduce wind noise, minimize reverb, and even match ambiances—ideal when looping dialog that needs to match the sound of dialog recorded on location.

Magix Sound Forge Pro is another option. This software program’s main orientation is sophisticated audio editing, but it also includes restoration tools of its own, and incorporates iZotope RX Elements and iZotope Ozone Elements for signal processing. If you don’t have extreme restoration needs, Sound Forge Pro is an excellent choice because of the extensive editing and format conversion capabilities. Among other functions, editing allows removing “ummms,” coughs, and the like.

The Fix Is In

It’s worth taking the time to use these techniques to turn “okay” narration or presentations into polished, professional-level audio. The impact of quality audio on the listener is huge, and regardless of the subject matter, superior sound quality not only attracts attention, but gives a feeling of confidence in what’s being said.
