What is audio?

Sound Is a Wave
Sound is vibration. A speaker pushes air forward and pulls it back. That creates a pressure wave. Your ear catches that wave and your brain turns it into something you hear.
(Diagram: a wave rising above and dipping below a flat line marked "silence". One full rise and fall is labeled "one cycle".)
That is a waveform. The height of the wave is its amplitude, which you hear as loudness. How many cycles happen each second is its frequency, which you hear as pitch. A deep bass note is a slow wave. A high whistle is a fast one.
Sound in the real world is smooth and continuous. It flows. A computer cannot store that. Computers only understand numbers. So we need to turn the wave into numbers.
Sampling. Catching the Wave
To record sound digitally you measure the wave over and over again. Each measurement is called a sample. You write down how high the wave is at that exact moment.
(Diagram: dots placed along the curve of a wave. Each ● is one sample: a snapshot of the wave at that moment.)
Do this thousands of times per second and you get a copy of the original wave. Not perfect. But close enough that your ears cannot tell.
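Here is a minimal Python sketch of sampling. It assumes a pure tone (a sine wave) stands in for real sound. `sample_sine` is a hypothetical helper that measures the wave at time n / sample_rate for each sample index n. The amplitude sets the height, so it controls loudness. The frequency sets how many cycles happen per second, so it controls pitch.

```python
import math

def sample_sine(freq_hz: float, amplitude: float, sample_rate: int, count: int) -> list[float]:
    """Measure a sine wave at `count` evenly spaced moments.

    Sample n is taken at time t = n / sample_rate seconds.
    """
    return [
        amplitude * math.sin(2 * math.pi * freq_hz * n / sample_rate)
        for n in range(count)
    ]

# A 440 Hz tone (the note A) at CD sample rate: one second is 44,100 numbers.
one_second = sample_sine(440.0, 0.5, 44_100, 44_100)
print(len(one_second))          # 44100
print(max(one_second) <= 0.5)   # True: no sample exceeds the amplitude
```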
Sample Rate. How Often You Measure
The sample rate is how many snapshots you take per second.
Sample Rate    Samples per Second    Used For
──────────────────────────────────────────────
8,000 Hz       8,000                 Phone calls
44,100 Hz      44,100                Music and CDs
48,000 Hz      48,000                Video and pro audio
96,000 Hz      96,000                High-res audio
CD quality is 44,100 Hz. That is 44,100 snapshots every second.
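Why that number? Humans can hear up to about 20,000 Hz. The Nyquist-Shannon sampling theorem says you need to sample at least twice as fast as the highest frequency you want to capture. 20,000 times 2 is 40,000. The extra room above 40,000 gives the filter that strips out too-high sounds space to work. The exact figure of 44,100 came from the video equipment that early digital recordings were stored on.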
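The following small demonstration shows why the 2x rule matters. It uses a hypothetical helper, `sample_cosine`. A tone above half the sample rate does not simply vanish. Its samples come out identical to the samples of a lower tone, and this effect is called aliasing. At 44,100 Hz, a 30,100 Hz tone produces the same numbers as a 14,000 Hz tone, because 44,100 - 30,100 = 14,000. The sketch uses cosines so that the match is exact rather than sign-flipped.

```python
import math

SAMPLE_RATE = 44_100

def sample_cosine(freq_hz: float, count: int) -> list[float]:
    # Measure cos(2*pi*f*t) at t = n / SAMPLE_RATE for n = 0 .. count-1.
    return [math.cos(2 * math.pi * freq_hz * n / SAMPLE_RATE) for n in range(count)]

too_high = sample_cosine(30_100, 1000)  # above the 22,050 Hz limit
alias = sample_cosine(14_000, 1000)     # 44,100 - 30,100 = 14,000

# Every sample matches (up to floating-point rounding), so once recorded
# the two tones are indistinguishable.
print(all(math.isclose(a, b, abs_tol=1e-9) for a, b in zip(too_high, alias)))  # True
```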
Bit Depth. How Precisely You Measure
Sample rate is how often. Bit depth is how precise each sample is.
Think of it like a ruler. An 8-bit sample gives you a ruler with 256 marks. The wave has to snap to the nearest one. Not very accurate.
A 16-bit sample gives you 65,536 marks. Way more detail. Each sample lands much closer to where the wave actually was.
8-bit: 256 levels. Sounds grainy and rough.
16-bit: 65,536 levels. Sounds clean and smooth.
24-bit: 16,777,216 levels. Studio quality.
Low bit depth means the wave gets rounded a lot. You hear that rounding as noise and crunch. 16-bit is the standard for music and CDs. 24-bit is used in studios.
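This is a minimal sketch of the ruler idea. It assumes samples start as floats between -1.0 and 1.0 and are stored as signed integers. That is a common convention, but the format does not require it. The hypothetical `quantize` helper snaps a value to the nearest of 2^bits levels. `dequantize` maps the stored integer back, so you can measure how much rounding each bit depth introduces.

```python
def quantize(x: float, bits: int) -> int:
    """Map x in [-1.0, 1.0] to one of 2**bits signed integer levels."""
    scale = 2 ** (bits - 1)                    # 128 for 8-bit, 32,768 for 16-bit
    level = round(x * scale)                   # snap to the nearest mark on the ruler
    return max(-scale, min(scale - 1, level))  # clamp: the top level is scale - 1

def dequantize(level: int, bits: int) -> float:
    """Turn a stored integer back into a float in [-1.0, 1.0)."""
    return level / 2 ** (bits - 1)

value = 0.123456789
errors = {}
for bits in (8, 16, 24):
    stored = quantize(value, bits)
    errors[bits] = abs(dequantize(stored, bits) - value)
    print(f"{bits}-bit: {2 ** bits:,} levels, stored {stored}, error {errors[bits]:.2e}")
```

Each extra bit halves the spacing between marks, so the rounding error shrinks fast as bit depth goes up.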
Channels. How Many Waves
Mono is one channel. One stream of samples. One speaker.
Stereo is two channels. Left and right. That is how sound can move between your ears.
Surround sound uses more: six channels for 5.1 and eight for 7.1. One for each speaker in the room, with the ".1" being the subwoofer.
More channels means more data. Double the channels. Double the size.
How Big Is Audio Actually
Take CD quality. 44,100 samples per second. 2 bytes per sample. Stereo.
44,100 x 2 bytes x 2 channels = 176,400 bytes per second
176,400 x 60 = 10,584,000 bytes per minute
≈ 10.6 MB per minute
About 10 megabytes per minute. A four minute song is roughly 40 MB raw.
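Here is the same arithmetic as a small Python function. `pcm_bytes` is a hypothetical name. It assumes the bit depth is a whole number of bytes (8, 16, 24 or 32), and it ignores the file header.

```python
def pcm_bytes(seconds: float, sample_rate: int, bits: int, channels: int) -> int:
    """Size of raw PCM audio: samples/sec x bytes/sample x channels x duration."""
    bytes_per_sample = bits // 8
    return int(sample_rate * bytes_per_sample * channels * seconds)

print(pcm_bytes(1, 44_100, 16, 2))       # 176400 bytes per second
print(pcm_bytes(60, 44_100, 16, 2))      # 10584000 bytes per minute
print(pcm_bytes(4 * 60, 44_100, 16, 2))  # 42336000 bytes for a four minute song
```

Switching `channels` from 2 to 1 halves every result, which is the "double the channels, double the size" rule from above.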
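A full 60 minute album is around 635 MB. The audio CD was built around this math: one disc holds 74 to 80 minutes of exactly this raw format. The 700 MB printed on a blank disc is its capacity for computer data, which spends extra space on error correction.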
How the Data Is Laid Out
The simplest layout is sample by sample. In order. Left then right. Repeating.
[Left] [Right] [Left] [Right] [Left] [Right] ...
Each pair is one moment in time.
That is raw PCM audio. No compression. No tricks. Just a flat stream of numbers that describe the wave.
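Here is a sketch of that layout in Python, using hypothetical helpers `interleave` and `deinterleave`. Each left/right pair is often called a frame.

```python
def interleave(left: list[int], right: list[int]) -> list[int]:
    """Merge two channels into the L, R, L, R, ... order used by raw PCM."""
    if len(left) != len(right):
        raise ValueError("both channels must have the same number of samples")
    out: list[int] = []
    for l_sample, r_sample in zip(left, right):
        out.extend((l_sample, r_sample))  # one frame = one moment in time
    return out

def deinterleave(frames: list[int]) -> tuple[list[int], list[int]]:
    """Split an interleaved stream back into left and right channels."""
    return frames[0::2], frames[1::2]

stream = interleave([10, 20, 30], [-1, -2, -3])
print(stream)                # [10, -1, 20, -2, 30, -3]
print(deinterleave(stream))  # ([10, 20, 30], [-1, -2, -3])
```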
WAV files are basically this. A small header with the sample rate and bit depth and channel count. Then the raw numbers. That is why WAV files are huge.
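To make the "small header" concrete, this minimal sketch builds the standard 44-byte header for a plain PCM WAV file by hand with Python's `struct` module. It then checks the result with the standard-library `wave` reader. `make_wav` is a hypothetical helper. Real WAV files can carry extra chunks (such as metadata), and this sketch ignores them.

```python
import io
import struct
import wave

def make_wav(pcm: bytes, sample_rate: int, bits: int, channels: int) -> bytes:
    """Prepend a minimal RIFF/WAVE header to raw little-endian PCM bytes."""
    block_align = channels * bits // 8     # bytes in one frame (all channels)
    byte_rate = sample_rate * block_align  # bytes per second
    header = struct.pack(
        "<4sI4s4sIHHIIHH4sI",
        b"RIFF", 36 + len(pcm), b"WAVE",   # size field = file size minus 8 bytes
        b"fmt ", 16, 1, channels,          # 16-byte format block, 1 = plain PCM
        sample_rate, byte_rate, block_align, bits,
        b"data", len(pcm),                 # then the raw numbers follow
    )
    return header + pcm

# One second of 16-bit stereo silence: 176,400 bytes of zeros.
wav = make_wav(bytes(176_400), 44_100, 16, 2)
print(len(wav))  # 176444: a 44-byte header plus the samples

with wave.open(io.BytesIO(wav), "rb") as reader:
    print(reader.getframerate(), reader.getsampwidth(), reader.getnchannels())  # 44100 2 2
```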
So Why Are Music Files Not 40 MB
Because compression.
Lossless Compression
FLAC is the most common lossless audio format. It looks at the raw samples and finds patterns. Audio waves are predictable. Each sample is usually close to the one before it. So FLAC predicts each sample from the previous ones and stores only the small error in that guess, which fits in fewer bits.
A 40 MB song becomes about 25 MB as a FLAC file.
Decompress it and you get back every single original number. Nothing lost. Nothing changed.
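Here is a toy version of the prediction idea. It assumes the simplest possible predictor: guess that each sample equals the previous one, and store only the difference. FLAC's real encoder chooses among higher-order predictors and packs the leftovers with Rice coding, so this sketch shows the principle, not FLAC itself. The helpers `encode_deltas` and `decode_deltas` are hypothetical.

```python
import math

def encode_deltas(samples: list[int]) -> list[int]:
    """Keep the first sample, then store each sample minus the one before it."""
    return samples[:1] + [b - a for a, b in zip(samples, samples[1:])]

def decode_deltas(deltas: list[int]) -> list[int]:
    """Rebuild the original samples by adding the differences back up."""
    out: list[int] = []
    for d in deltas:
        out.append(d if not out else out[-1] + d)
    return out

# A slow 16-bit sine wave: neighbouring samples are close together.
samples = [round(20_000 * math.sin(2 * math.pi * n / 200)) for n in range(400)]
deltas = encode_deltas(samples)

print(max(abs(s) for s in samples))      # the raw values are large
print(max(abs(d) for d in deltas[1:]))   # the differences are far smaller
print(decode_deltas(deltas) == samples)  # True: nothing lost
```

Smaller numbers need fewer bits to write down. That is where the savings come from, and decoding still returns every original sample exactly.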
Lossy Compression
MP3, AAC, and Ogg Vorbis (usually just called OGG) throw away sounds you probably will not notice.
They use models of human hearing. They know what your ear is bad at. A quiet sound next to a loud sound. You will not hear the quiet one. Very high frequencies. Most people cannot hear those anyway. Small details in busy parts. Your brain fills those in.
Remove all of that and a 40 MB song drops to about 4 MB. Most people cannot hear the difference on normal headphones.
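Lossy file sizes are usually described by bitrate, in kilobits per second. The sketch below checks the numbers above. It assumes a 128 kbps encode, which is a common MP3 setting; the text does not name a bitrate. It compares that against CD audio's 1,411.2 kbps (44,100 samples × 16 bits × 2 channels). `size_bytes` is a hypothetical helper.

```python
def size_bytes(bitrate_kbps: float, seconds: float) -> float:
    """File size from bitrate: kilobits per second -> bytes over the duration."""
    return bitrate_kbps * 1000 / 8 * seconds

CD_KBPS = 44_100 * 16 * 2 / 1000  # 1411.2 kbps of raw PCM
MP3_KBPS = 128                    # assumed typical lossy bitrate

song = 4 * 60  # a four minute song, in seconds
print(f"raw:   {size_bytes(CD_KBPS, song) / 1e6:.1f} MB")   # raw:   42.3 MB
print(f"mp3:   {size_bytes(MP3_KBPS, song) / 1e6:.1f} MB")  # mp3:   3.8 MB
print(f"ratio: {CD_KBPS / MP3_KBPS:.1f} to 1")              # ratio: 11.0 to 1
```

At 128 kbps, one minute comes to 960,000 bytes. That lines up with the ~1 MB per minute figures in the tradeoff table.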
But that data is gone forever. Encode to MP3 and the numbers change. Do it again and it gets worse. Every pass loses more.
The Tradeoff
Format   Compression   Quality   Size Per Minute
────────────────────────────────────────────────────
WAV      None          Perfect   ~10 MB
FLAC     Lossless      Perfect   ~6 MB
AAC      Lossy         Great     ~1 MB
MP3      Lossy         Good      ~1 MB
OGG      Lossy         Good      ~1 MB
More compression means smaller files but more is lost. Less compression means bigger files but perfect quality. Every format sits somewhere on that line.
Metadata. The Extra Stuff
Audio files carry more than just sound. Artist name. Album. Track number. Genre. Year. Sometimes lyrics. Sometimes album art.
These are stored as tags inside the file. Usually a few kilobytes. Unless there is album art embedded. Then you might have a megabyte of image data sitting inside your audio file.
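One concrete example of tags is ID3v1, the oldest and simplest MP3 tag format. The sketch below reads one. An ID3v1 tag is a fixed 128-byte block at the very end of the file. It starts with the bytes `TAG` and is followed by fixed-width text fields. Most modern files use the larger, more flexible ID3v2 (which is where embedded album art lives) or, for FLAC, Vorbis comments, so treat this as an illustration of the idea. `read_id3v1` and `field` are hypothetical helpers.

```python
import struct

def _text(raw: bytes) -> str:
    # Fields are padded with NUL bytes (sometimes spaces); Latin-1 is the usual encoding.
    return raw.split(b"\x00", 1)[0].decode("latin-1").strip()

def read_id3v1(data: bytes):
    """Return the ID3v1 fields from the last 128 bytes of `data`, or None if absent."""
    tag = data[-128:]
    if len(tag) != 128 or tag[:3] != b"TAG":
        return None
    # Skip "TAG", then 30 + 30 + 30 + 4 + 30 bytes of text and 1 genre byte.
    title, artist, album, year, comment, genre = struct.unpack("<3x30s30s30s4s30sB", tag)
    return {
        "title": _text(title), "artist": _text(artist), "album": _text(album),
        "year": _text(year), "comment": _text(comment), "genre": genre,
    }

def field(s: str, width: int) -> bytes:
    return s.encode("latin-1").ljust(width, b"\x00")

audio = bytes(1000)  # stand-in for real MP3 audio data
tag = (b"TAG" + field("Song", 30) + field("Artist", 30) + field("Album", 30)
       + b"1999" + field("", 30) + bytes([17]))  # genre 17 is "Rock" in the ID3v1 list
print(read_id3v1(audio + tag))
```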
The Full Picture
Sound is a wave in the air. To store it you measure that wave thousands of times per second. Each measurement is a number. You write those numbers down in order. That is digital audio.
Everything else is just how often you measure. How precise each measurement is. How many channels you record. And how cleverly you shrink the result.
Next time you press play, remember that your speakers are just reading a list of numbers and pushing air back and forth to match. Your ears and brain do the rest.






