Introduction

The defining characteristic of a musical note is its pitch. Pitch is how we arrange notes on a scale, the most widely used of which is the chromatic scale found on most Western instruments, such as the piano. The chromatic scale is logarithmic and divides each octave evenly into twelve notes. The reference note is typically the A above middle C, or A4 in scientific pitch notation, with the 4 signifying the fourth octave. On an 88-key piano, A4 is the A above the fourth C from the left and has a target pitch of 440 Hz. The pitch of each successive note above it grows by a factor of $2^{1/12}$, so that the A of the next octave (A5) has twice the pitch (880 Hz) of A4.

The following table shows the derived pitch of each note on an 88-key piano using this standard:
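The values follow directly from the $2^{1/12}$ rule, so they can be recomputed with a few lines of Python. This is a minimal sketch, not the code originally used for this post: the function names are made up for illustration, and it assumes the usual key numbering in which the leftmost key (A0) is key 1 and A4 is key 49.

```python
A4_KEY = 49     # assumption: keys numbered 1..88 from the left, so A4 is key 49
A4_HZ = 440.0   # reference pitch
NAMES = ["A", "A#", "B", "C", "C#", "D", "D#", "E", "F", "F#", "G", "G#"]

def key_frequency(key: int) -> float:
    """Pitch in Hz: each key is a factor of 2**(1/12) above its left neighbour."""
    return A4_HZ * 2.0 ** ((key - A4_KEY) / 12)

def key_name(key: int) -> str:
    """Scientific pitch notation; the octave number increases at every C."""
    return f"{NAMES[(key - 1) % 12]}{(key + 8) // 12}"

if __name__ == "__main__":
    for key in range(1, 89):
        print(f"{key:2d}  {key_name(key):<3}  {key_frequency(key):9.3f} Hz")
```

Under these assumptions the lowest key (A0) comes out at 27.5 Hz, four octaves below A4, and key 61 (A5) at 880 Hz.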

For string instruments, like the piano, the pitch can be adjusted by changing the tension of the strings. Over time, the strings tend to lose tension, which lowers the pitch and makes the notes sound "flat," so the instrument needs to be tuned.

So, how do we know when each string is tuned to the correct pitch? Some people have absolute (perfect) pitch, but that's a rare gift, and while they can identify notes with 98% accuracy, it is not clear how well they can judge how far out of tune a note is. What complicates matters with instruments like the piano is that, except for the bass notes, most notes consist of two or three strings, each of which must vibrate at the same frequency when struck by its hammer. Even minor differences can result in harsh tones.

Sample recording

Fortunately, we can take the guesswork out of it with a little digital signal processing. Let us start with a recording of A4 from the Electronic Music Studios at the University of Iowa.

The stereo (two-channel) recording was made at a sampling rate of 44,100 Hz and lasts 22 seconds.

You can hear the note being struck hard followed by a very long fade, with it being barely audible for more than half the recording. Let's chop off the first part and most of the fade to get a clear tone, two seconds long:
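Trimming comes down to converting times in seconds into sample indices. Here is a rough sketch with NumPy. The `trim` helper and the one-second start offset are assumptions for illustration (the exact cut point is not stated here), and an all-zero array stands in for the real recording.

```python
import numpy as np

def trim(samples, rate, start_s, duration_s):
    """Return `duration_s` seconds of `samples`, starting `start_s` seconds in."""
    start = int(round(start_s * rate))
    stop = start + int(round(duration_s * rate))
    return samples[start:stop]

rate = 44_100
# Stand-in for the 22-second stereo recording, shaped (frames, channels)
recording = np.zeros((22 * rate, 2))

# Assumed cut: skip the first second, keep the next two
clip = trim(recording, rate, start_s=1.0, duration_s=2.0)
print(clip.shape)
```

Slicing along the first axis keeps both channels, so the clip is still stereo.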

And now let's analyze it.

Spectral analysis

If we perform a spectral analysis on both channels serially, we can see that the dominant frequency in the signal is right around 440 Hz (indicated by the dashed line):
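For readers who want to try this themselves, the dominant frequency can be read off the magnitude of a discrete Fourier transform. The sketch below is one way to do it with NumPy and is not necessarily how the plot above was produced. The `dominant_frequency` helper is hypothetical, and the test tone is synthetic, with made-up harmonic amplitudes.

```python
import numpy as np

def dominant_frequency(signal, rate):
    """Frequency (Hz) of the largest peak in the magnitude spectrum."""
    windowed = signal * np.hanning(len(signal))      # Hann window reduces leakage
    magnitude = np.abs(np.fft.rfft(windowed))
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / rate)
    return freqs[np.argmax(magnitude)]

rate = 44_100
t = np.arange(2 * rate) / rate                       # two seconds of samples
# Synthetic stand-in for the piano note: a fundamental plus two harmonics
tone = (np.sin(2 * np.pi * 440 * t)
        + 0.8 * np.sin(2 * np.pi * 880 * t)
        + 0.3 * np.sin(2 * np.pi * 1320 * t))

peak = dominant_frequency(tone, rate)
print(peak)
```

With two seconds of audio the frequency bins are 0.5 Hz apart, which is enough to locate the note but too coarse to say exactly how sharp or flat it is.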

But besides the fundamental frequency, there is an almost equally strong spike at 880 Hz and a few weaker spikes at higher integer multiples of the fundamental. These are known as harmonics and are caused by the various modes of a vibrating string fixed at both ends. The relative strength of each of these harmonics is what gives the instrument its timbre. Without the harmonics, we'd be left with an obnoxious, synthetic-sounding sine wave. The harmonics do make it more difficult, however, to compute how out of tune the note is. Enter Finite Impulse Response (FIR) filters.

FIR filters

FIR filters can dampen all frequencies outside a narrow band around the fundamental frequency we are tuning for, allowing us to inspect a nice, clean waveform.

Take the following FIR filter:

It doesn't look like much, but if we convolve that window with the original signal, the expected frequency response is:
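To make this concrete, here is a sketch of one common way to build such a filter: a windowed-sinc band-pass, formed as the difference of two low-pass filters and tapered with a Hamming window. The band edges (410 to 470 Hz), the tap count (4001), and the `bandpass_fir` helper are all assumptions chosen for illustration; they are not the coefficients of the filter shown above.

```python
import numpy as np

def bandpass_fir(low_hz, high_hz, rate, num_taps):
    """Windowed-sinc band-pass FIR: low-pass(high) minus low-pass(low), Hamming-tapered."""
    n = np.arange(num_taps) - (num_taps - 1) / 2     # sample index centred on zero

    def lowpass(cutoff_hz):
        fc = 2.0 * cutoff_hz / rate                  # cutoff as a fraction of Nyquist
        return fc * np.sinc(fc * n)                  # ideal low-pass impulse response

    return (lowpass(high_hz) - lowpass(low_hz)) * np.hamming(num_taps)

rate = 44_100
taps = bandpass_fir(410.0, 470.0, rate, num_taps=4001)   # assumed design values

# Zero-padding the FFT to `rate` points makes bin k correspond to k Hz
gain = np.abs(np.fft.rfft(taps, n=rate))
for hz in (220, 440, 880, 1320):
    print(f"{hz:5d} Hz  gain {gain[hz]:.4f}")
```

The long tap count is the price of a narrow band at this sampling rate: the transition width of a windowed-sinc filter shrinks only in proportion to the number of taps.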

Basically, all frequencies other than a few notes around the target frequency are seriously dampened. Let's confirm by convolving the recording with that filter.
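The same check can be run on a synthetic tone. This sketch reuses the assumed windowed-sinc design (the hypothetical `bandpass_fir` helper, 410 to 470 Hz, 4001 taps) rather than the filter above, and it filters a made-up signal with a strong second harmonic instead of the piano recording.

```python
import numpy as np

def bandpass_fir(low_hz, high_hz, rate, num_taps):
    """Windowed-sinc band-pass FIR (hypothetical helper, assumed design)."""
    n = np.arange(num_taps) - (num_taps - 1) / 2
    lowpass = lambda hz: (2.0 * hz / rate) * np.sinc(2.0 * hz / rate * n)
    return (lowpass(high_hz) - lowpass(low_hz)) * np.hamming(num_taps)

def amplitude(signal, freq_hz):
    """Sine amplitude at an integer frequency; needs exactly one second of samples."""
    return 2.0 * np.abs(np.fft.rfft(signal))[freq_hz] / len(signal)

rate, num_taps = 44_100, 4001
# Slightly more than one second, so the fully overlapped output is exactly one second
t = np.arange(rate + num_taps - 1) / rate
tone = np.sin(2 * np.pi * 440 * t) + 0.8 * np.sin(2 * np.pi * 880 * t)

taps = bandpass_fir(410.0, 470.0, rate, num_taps)
# mode="valid" keeps only samples where filter and signal overlap completely,
# which discards the start-up and tail transients
filtered = np.convolve(tone, taps, mode="valid")

print("before:", amplitude(tone[:rate], 440), amplitude(tone[:rate], 880))
print("after: ", amplitude(filtered, 440), amplitude(filtered, 880))
```

In this setup the 440 Hz component passes through essentially unchanged while the 880 Hz component drops to a small fraction of its original amplitude.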

Indeed, all the harmonics are now seriously squished. Let's look at the filtered waveform:

A beautiful, clean sine wave whose frequency can easily be measured by computing the distance between peaks. Compare this waveform with the original:

Let's estimate the frequency from the filtered signal.

Almost perfect: just slightly over-tightened (sharp), but still only 0.1% from the target.
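The peak-distance measurement itself can be sketched as follows. This is an illustrative version, not code taken from intuno: the `estimate_frequency` helper is hypothetical, it assumes an already filtered, noise-free sine, and the input is a synthetic tone deliberately set 0.1% sharp rather than the filtered recording.

```python
import numpy as np

def estimate_frequency(signal, rate):
    """Estimate the frequency of a clean sine from the mean spacing of its peaks."""
    y = np.asarray(signal, dtype=float)
    # Interior samples that are higher than the left neighbour and not lower than the right
    i = np.flatnonzero((y[1:-1] > y[:-2]) & (y[1:-1] >= y[2:])) + 1
    # Parabolic interpolation locates each peak between samples
    a, b, c = y[i - 1], y[i], y[i + 1]
    peaks = i + 0.5 * (a - c) / (a - 2.0 * b + c)
    period = (peaks[-1] - peaks[0]) / (len(peaks) - 1)   # mean period in samples
    return rate / period

rate, target = 44_100, 440.0
t = np.arange(2 * rate) / rate
sharp = np.sin(2 * np.pi * 440.44 * t)                   # synthetic tone, 0.1% sharp

freq = estimate_frequency(sharp, rate)
percent = 100.0 * (freq - target) / target
cents = 1200.0 * np.log2(freq / target)                  # 100 cents per semitone
print(f"{freq:.2f} Hz, {percent:+.2f}%, {cents:+.2f} cents")
```

Averaging over many periods is what makes the estimate much finer than the sample spacing: each individual peak is only located to a fraction of a sample, but the error is spread across hundreds of cycles.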

The above method of pitch estimation is the mechanism the intuno project uses.