Adobe Audition Digital Audio Primer
Understanding the fundamentals of sound is the first step in learning about digital audio. In this primer, we’ll introduce the basics of sound so you can work more effectively with Adobe® Audition™ and the rest of your digital audio or video toolkit.
Sound is created by vibrations, such as those produced by a guitar string, vocal cords, or a speaker cone. These vibrations move the air molecules near them, forcing molecules together and, as a result, raising the air pressure slightly. The air molecules that are under pressure then push on the air molecules surrounding them, which push on the next set of air molecules, and so forth, causing a wave of high pressure to move through the air. As high pressure waves move through the air, they leave low pressure areas behind them. When these pressure lows and highs, or waves, reach us, they vibrate the receptors in our ears, and we hear the vibrations as sound.
When you see a visual waveform that represents audio, that waveform represents these pressure waves. The zero line in the waveform is the pressure of air at rest. When the line swings up, it represents higher pressure, and when it swings low, it represents lower pressure. This waveform is the equivalent of the pressure waves in the air.
Amplitude reflects the change in pressure from the peak of the waveform to the trough. A cycle is one complete sequence of pressure changes: the waveform starts at one amplitude, goes all the way through its amplitude changes, and returns to the same amplitude again. The time one cycle takes is called its period. Frequency describes the number of cycles per second, where one hertz (Hz) equals one cycle per second. That is, a waveform at 1,000 Hz goes through 1,000 cycles every second. Phase measures how far through a cycle a waveform is. There are 360 degrees in a single cycle; if you start measuring at the zero line, a cycle reaches 90 degrees at the peak, 180 degrees when it crosses the zero line, 270 degrees at the trough, and 360 degrees when it completes at zero. Wavelength is the distance, measured in units such as inches or centimeters, between two points with the same degree of phase.
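To make these measurements concrete, here is a minimal Python sketch that relates frequency to the time taken by one cycle, to wavelength, and to phase. The function names are our own, chosen for illustration, and the speed of sound (343 meters per second, roughly the value for dry air at room temperature) is an assumption; the real value varies with temperature and the medium.

```python
# Assumption: sound travels at about 343 m/s (dry air near 20 degrees C).
SPEED_OF_SOUND_M_PER_S = 343.0


def period_seconds(frequency_hz: float) -> float:
    """Time taken by one full cycle, in seconds."""
    return 1.0 / frequency_hz


def wavelength_meters(frequency_hz: float,
                      speed_m_per_s: float = SPEED_OF_SOUND_M_PER_S) -> float:
    """Distance between two points with the same phase, in meters."""
    return speed_m_per_s / frequency_hz


def phase_degrees(frequency_hz: float, elapsed_s: float) -> float:
    """Phase of a wave that starts at the zero line (0 degrees) at time 0."""
    cycles_completed = frequency_hz * elapsed_s
    # Keep only the fraction of the current cycle, then scale to 360 degrees.
    return (cycles_completed % 1.0) * 360.0


# A 1,000 Hz tone: one cycle lasts a thousandth of a second.
print("period (s):", period_seconds(1000.0))
print("wavelength (m):", wavelength_meters(1000.0))
# Halfway through a one-second cycle the wave crosses the zero line again.
print("phase (degrees):", phase_degrees(1.0, 0.5))
```

Under these assumptions a 1,000 Hz tone has a wavelength of roughly a third of a meter, while a 100 Hz tone is ten times longer.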
When two or more sound waves meet, their amplitudes add to and subtract from each other. If the peaks and troughs of the two waveforms line up, they are said to be in phase. In this case, each peak adds to the peak in the other waveform, and each trough adds to the trough in the other waveform, resulting in a waveform that has higher amplitude than either individual waveform.
Sometimes the peaks of one waveform match up with the troughs of the other waveform. Such waveforms are said to be 180 degrees out of phase. If the two waveforms also have the same frequency and amplitude, the peaks and the troughs cancel each other out, resulting in no waveform at all.
In all other cases, the waves are out of phase by some other amount. This results in a waveform that is more complex than either of the original waveforms; continuing to add waves makes a more and more complicated waveform. Keep in mind, however, that a single instrument can create extremely complex waves on its own because of the unique structure of the instrument, which is why a violin and a trumpet sound different even when playing the same note. When you see music, voice, noise, or other complicated sound represented by a waveform, you are seeing the result of adding all of the waveforms from each sound together.
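The three cases above (in phase, 180 degrees out of phase, and somewhere in between) can be checked numerically. The following is a small sketch, not a model of any real instrument: it builds sine waves as lists of numbers, adds them point by point, and reports the largest value reached. The helper names, the 100 Hz tone, and the 8,000 samples-per-second rate are arbitrary choices for illustration.

```python
import math


def sine_wave(frequency_hz, phase_deg, n_samples, sample_rate):
    """Return n_samples points of a sine wave with the given starting phase."""
    phase_rad = math.radians(phase_deg)
    return [
        math.sin(2 * math.pi * frequency_hz * n / sample_rate + phase_rad)
        for n in range(n_samples)
    ]


def mix(wave_a, wave_b):
    """Add two waveforms point by point, as pressure waves add in the air."""
    return [a + b for a, b in zip(wave_a, wave_b)]


def peak(wave):
    """Largest distance from the zero line reached by the waveform."""
    return max(abs(value) for value in wave)


RATE = 8000      # points per second (illustrative)
LENGTH = 80      # exactly one cycle of a 100 Hz wave at this rate

reference = sine_wave(100.0, 0.0, LENGTH, RATE)
in_phase = mix(reference, sine_wave(100.0, 0.0, LENGTH, RATE))
opposite = mix(reference, sine_wave(100.0, 180.0, LENGTH, RATE))
quarter = mix(reference, sine_wave(100.0, 90.0, LENGTH, RATE))

print("in phase peak:       ", round(peak(in_phase), 3))
print("180 degrees out peak:", round(peak(opposite), 3))
print("90 degrees out peak: ", round(peak(quarter), 3))
```

The in-phase pair reaches about twice the height of a single wave, the opposite pair cancels to essentially zero, and the pair that is 90 degrees apart lands in between.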
A microphone works by converting the pressure waves of sound into changes in voltage on a wire. These changes in voltage match the pressure waves of the original sound: high pressure is represented by positive voltage, and low pressure is represented by negative voltage. Voltages travel down the microphone wire and can be recorded onto tape as changes in magnetic strength or onto vinyl records as changes in amplitude in the groove. A speaker works like a microphone in reverse, taking the voltage signals from a microphone or recording and vibrating to re-create the pressure wave.
Unlike analog storage media such as magnetic tape and vinyl records, computers store audio information digitally as a series of zeroes and ones. In digital storage, the original waveform is broken up into individual samples. This is known as digitizing or sampling the audio, and is sometimes called analog-to-digital conversion. The sampling rate defines how often a sample is taken. For example, CD-quality sound has 44,100 samples for each second of a waveform.
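As a rough illustration of sampling, the sketch below measures a mathematically ideal sine wave at evenly spaced instants and keeps the measurements in a list. It stands in for what an analog-to-digital converter does with a real voltage; the function name and the 440 Hz test tone are our own choices, not part of any product.

```python
import math


def sample_sine(frequency_hz, sample_rate, duration_s):
    """Measure a sine wave sample_rate times per second for duration_s seconds."""
    n_samples = round(sample_rate * duration_s)
    return [
        math.sin(2 * math.pi * frequency_hz * n / sample_rate)
        for n in range(n_samples)
    ]


# One second of a 440 Hz tone at the CD sampling rate.
samples = sample_sine(440.0, 44100, 1.0)
print("samples in one second:", len(samples))
print("first three samples:", [round(s, 4) for s in samples[:3]])
```

One second of audio at the CD rate becomes a list of 44,100 numbers, each one a snapshot of the waveform at a single instant.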
The higher the sampling rate, the closer the shape of the digital waveform will be to that of the original analog waveform. Low sampling rates limit the range of frequencies that can be recorded, which can result in a recording that poorly represents the original sound.
The sampling rate limits the frequency range of the audio file; to reproduce a given frequency, the sampling rate must be at least twice that frequency. For example, if the audio contains audible frequencies as high as 8,000 Hz, you need a sample rate of 16,000 samples per second to represent this audio accurately in digital form. This calculation comes from the Nyquist theorem, and the highest frequency that can be reproduced by a given sample rate is known as the Nyquist frequency. CDs have a sample rate of 44,100 samples per second, which allows sampling of frequencies up to 22,050 Hz, higher than the roughly 20,000 Hz upper limit of human hearing.
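The rule of thumb in this paragraph is simple arithmetic, sketched here in Python. The function names are hypothetical, and the check follows the "at least twice" wording used above.

```python
def nyquist_frequency(sample_rate_hz: float) -> float:
    """Highest frequency a given sample rate can reproduce."""
    return sample_rate_hz / 2.0


def minimum_sample_rate(highest_frequency_hz: float) -> float:
    """Lowest sample rate that can reproduce the given frequency."""
    return 2.0 * highest_frequency_hz


def can_represent(frequency_hz: float, sample_rate_hz: float) -> bool:
    """True if the frequency is within reach of the sample rate."""
    return frequency_hz <= nyquist_frequency(sample_rate_hz)


print("Nyquist frequency at 44,100 samples/s:", nyquist_frequency(44100))
print("sample rate needed for 8,000 Hz:", minimum_sample_rate(8000))
print("20,000 Hz at 44,100 samples/s?", can_represent(20000, 44100))
print("20,000 Hz at 16,000 samples/s?", can_represent(20000, 16000))
```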
Just as the sample rate determines the range of frequencies that can be captured, the bit depth determines the amplitude resolution. A bit is a single binary digit, which can have a value of either zero or one. A single bit can represent two states, such as on and off. Two bits together can represent four different states: zero/zero, one/zero, zero/one, or one/one. Each additional bit doubles the number of states that can be represented, so a third bit can represent eight states, a fourth 16, and so on.
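The doubling described above is easy to verify. This short sketch counts the states for a given number of bits and lists every pattern for small bit counts; the function names are our own.

```python
from itertools import product


def state_count(bits: int) -> int:
    """Number of distinct states that the given number of bits can represent."""
    return 2 ** bits


def bit_patterns(bits: int) -> list:
    """Every combination of zeroes and ones for the given number of bits."""
    return ["".join(pattern) for pattern in product("01", repeat=bits)]


for bits in (1, 2, 3, 4, 16, 24):
    print(bits, "bits:", state_count(bits), "states")

print("two-bit patterns:", bit_patterns(2))
```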
Amplitude resolution is just as important as frequency range. Higher bit depth means greater dynamic range, a lower noise floor, and higher fidelity. When a waveform is sampled, each sample is assigned the amplitude value closest to the original analog wave. With a resolution of two bits, each sample can have one of only four possible amplitude positions. With three-bit resolution, each sample has eight possible amplitude values. CD-quality sound is 16-bit, which means that each sample has 65,536 possible amplitude values. DVD-quality sound is 24-bit, which means that each sample has 16,777,216 possible amplitude values.
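Assigning each sample "the amplitude value closest to the original" can be sketched as rounding to the nearest of a fixed set of levels. The code below is a simplification under stated assumptions: amplitudes run from -1.0 to 1.0 and the levels are evenly spaced across that range, whereas real converters use integer codes with a slightly different layout. The dynamic range figure is the usual theoretical estimate of about 6 decibels per bit, not a measurement of any particular sound card.

```python
import math


def quantize(sample: float, bits: int) -> float:
    """Round a sample in [-1.0, 1.0] to the nearest of 2**bits even levels."""
    levels = 2 ** bits
    step = 2.0 / (levels - 1)            # distance between neighboring levels
    index = round((sample + 1.0) / step)
    index = max(0, min(levels - 1, index))  # clamp out-of-range input
    return -1.0 + index * step


def dynamic_range_db(bits: int) -> float:
    """Theoretical ratio of the largest to the smallest step, in decibels."""
    return 20.0 * math.log10(2 ** bits)


# Sweep the whole amplitude range and count the distinct output values.
sweep = [i / 100 for i in range(-100, 101)]
for bits in (2, 3):
    distinct = sorted({quantize(s, bits) for s in sweep})
    print(bits, "bits:", len(distinct), "possible amplitude values")

print("16-bit dynamic range (dB):", round(dynamic_range_db(16), 2))
print("24-bit dynamic range (dB):", round(dynamic_range_db(24), 2))
```

With only two bits, every input collapses onto one of four values, which is why low bit depths sound coarse and noisy.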
Where Adobe Audition fits into the process
When you record audio on your computer, Adobe Audition tells the sound card to start the recording process and specifies what sampling rate and bit depth it should use. The hardware that the sound card uses determines the sample rates and bit depths that it is capable of recording. Most cards are capable of recording and playback at CD-quality settings, and often at other settings as well, such as a 48 kHz sample rate, which is common in film and video post-production. Your sound card probably has both Line In and Microphone In ports through which it can accept analog signals. The sound card samples the audio at the specified sample rate and assigns each sample an amplitude value. Adobe Audition stores each sample in sequence until you stop recording. Once you’ve recorded the audio, you can use Adobe Audition to edit the audio or save it to disk as a file.
When you play a file in Adobe Audition, the process happens in reverse. Adobe Audition tells the sound card that it is going to play a file, and sends the samples to the sound card. The sound card reconstructs the original waveform and sends it out as an analog signal from the Line Out port to your speakers.
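The record-and-play round trip can be imitated with Python's standard `wave` module. This is only a sketch of the idea, not how Adobe Audition or a sound card driver is implemented: a generated tone stands in for the recorded signal, an in-memory buffer stands in for the file on disk, and the helper names are hypothetical.

```python
import io
import math
import struct
import wave

SAMPLE_RATE = 44100    # samples per second (CD quality)
BYTES_PER_SAMPLE = 2   # 16 bits per sample
CHANNELS = 1           # mono


def make_tone(frequency_hz, duration_s):
    """Stand-in for a recording: 16-bit samples of a tone at half scale."""
    n_samples = round(SAMPLE_RATE * duration_s)
    return [
        round(0.5 * 32767 * math.sin(2 * math.pi * frequency_hz * n / SAMPLE_RATE))
        for n in range(n_samples)
    ]


def write_wav(target, samples):
    """Store the format settings, then each sample in sequence."""
    with wave.open(target, "wb") as wav_file:
        wav_file.setnchannels(CHANNELS)
        wav_file.setsampwidth(BYTES_PER_SAMPLE)
        wav_file.setframerate(SAMPLE_RATE)
        wav_file.writeframes(struct.pack(f"<{len(samples)}h", *samples))


def read_wav(source):
    """Read back the sample rate, the bit depth, and the samples."""
    with wave.open(source, "rb") as wav_file:
        rate = wav_file.getframerate()
        bits = wav_file.getsampwidth() * 8
        frames = wav_file.readframes(wav_file.getnframes())
    samples = list(struct.unpack(f"<{len(frames) // 2}h", frames))
    return rate, bits, samples


buffer = io.BytesIO()             # in-memory stand-in for a file on disk
recorded = make_tone(440.0, 0.1)
write_wav(buffer, recorded)

buffer.seek(0)                    # rewind before "playing" the file
rate, bits, played = read_wav(buffer)
print("sample rate:", rate, "bit depth:", bits)
print("samples survived the round trip:", played == recorded)
```

To write a real file instead, you could pass a file name such as `"tone.wav"` to `write_wav` in place of the buffer.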
An audio file on your hard drive, such as a WAV file, consists of a small header telling the audio program what the sample rate and bit depth of the audio are, followed by a long series of numbers, one for each sample. These files can be very large. At 44,100 samples per second and 16 bits per sample, for example, a file includes 705,600 bits per second. This equals about 86 kilobytes per second and more than 5 megabytes per minute. Stereo sound has two channels, so CD-quality sound requires a little more than 10 megabytes per minute.
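The figures in this paragraph can be reproduced with a few lines of arithmetic. In this sketch we assume that a kilobyte is 1,024 bytes and a megabyte is 1,024 kilobytes, which matches the numbers quoted above, and we ignore the small header. The function names are our own.

```python
def bits_per_second(sample_rate: int, bit_depth: int, channels: int = 1) -> int:
    """Raw data rate of uncompressed audio."""
    return sample_rate * bit_depth * channels


def kilobytes_per_second(sample_rate: int, bit_depth: int, channels: int = 1) -> float:
    """Data rate in kilobytes (1,024 bytes) per second."""
    return bits_per_second(sample_rate, bit_depth, channels) / 8 / 1024


def megabytes_per_minute(sample_rate: int, bit_depth: int, channels: int = 1) -> float:
    """Storage needed for one minute, in megabytes (1,024 kilobytes)."""
    return kilobytes_per_second(sample_rate, bit_depth, channels) * 60 / 1024


print("mono bits per second:", bits_per_second(44100, 16))
print("mono KB per second:", round(kilobytes_per_second(44100, 16), 2))
print("mono MB per minute:", round(megabytes_per_minute(44100, 16), 2))
print("stereo MB per minute:", round(megabytes_per_minute(44100, 16, channels=2), 2))
```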
In contrast to a digital audio file, a MIDI file might be as small as 10 kilobytes per minute, so you can store up to one hundred minutes of MIDI per megabyte. MIDI and digital audio are fundamentally different: digital audio is a digital representation of a sound wave, whereas MIDI is a language of instructions for musical instruments. A digital audio file seeks to exactly represent an audio event just like a tape recorder, whether it’s a musical performance, a person talking, or any other sound. MIDI, on the other hand, is more like sheet music. It acts as instructions for the re-creation of a musical selection. MIDI files record information such as the note to be played, the instrument to play the note on, the pan and volume of that particular note, and so on. When a MIDI file is played back, the sound card takes this information and uses its synthesizer to re-create the note on the right instrument. Because every synthesizer sounds different, the MIDI file will sound different depending on what sound card plays it back. Also, a MIDI file cannot record sounds that cannot be resynthesized from short instructions, such as the human voice. MIDI support in Adobe Audition is limited to playback of MIDI files.
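The size difference is easier to appreciate with a small comparison. The sketch below builds a single MIDI "note on" instruction, which is three bytes long (a status byte carrying the channel, a note number, and a velocity), and compares it with the storage needed for one second of CD-quality mono audio. It is a deliberately narrow illustration: a real MIDI file also contains timing data, note-off events, instrument selection, and other messages, and the helper names here are hypothetical.

```python
def note_on(channel: int, note: int, velocity: int) -> bytes:
    """Build a 3-byte MIDI note-on message (channel 0-15, note and velocity 0-127)."""
    if not (0 <= channel <= 15 and 0 <= note <= 127 and 0 <= velocity <= 127):
        raise ValueError("channel must be 0-15; note and velocity must be 0-127")
    return bytes([0x90 | channel, note, velocity])


def audio_bytes(duration_s: float, sample_rate: int = 44100,
                bit_depth: int = 16, channels: int = 1) -> int:
    """Bytes of uncompressed audio needed for the given duration."""
    total_samples = round(duration_s * sample_rate) * channels
    return total_samples * bit_depth // 8


# Note number 69 is the A above middle C.
instruction = note_on(channel=0, note=69, velocity=100)
print("MIDI instruction:", instruction.hex(), "-", len(instruction), "bytes")
print("one second of mono audio:", audio_bytes(1.0), "bytes")
```

The instruction says only which note to start and how hard to play it; how that note actually sounds is left to the synthesizer that receives it.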
To summarize, the process of sampling or digitizing audio starts with a pressure wave in the air. A microphone converts this pressure wave into voltage variations. An analog-to-digital converter, such as those found in a sound card, samples the signal at the sample rate and bit depth you choose. Once the sound has been transformed into digital information, Adobe Audition can record, edit, process, mix, and save your digital audio files. The possibilities for manipulation of digital audio within Adobe Audition are limited only by your imagination.