The defining characteristic of musical notes is their pitch. Pitch is how we arrange notes on a scale, the most widely used of which is the chromatic scale found in most Western instruments, such as the piano. The chromatic scale is logarithmic and divides pitches evenly into octaves of twelve notes. The reference note is typically A above middle C, or A4 in scientific pitch notation, with the 4 signifying the fourth octave. On an 88-key piano, A4 is the A above the fourth C from the left and has a target pitch of 440 Hz. The pitch of each successive note grows by a factor of $2^{1/12}$, so that the A of the next octave (A5) has twice the pitch (880 Hz) of A above middle C.
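To make the relationship concrete, a note $n$ semitones above A4 has a frequency of

$$f(n) = 440 \cdot 2^{n/12}\ \mathrm{Hz},$$

so, for example, C5 (three semitones up) sits at $440 \cdot 2^{3/12} \approx 523.3$ Hz, while A3 (twelve semitones down) sits at $440 \cdot 2^{-12/12} = 220$ Hz.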
The following table shows the derived pitch of each note on an 88-key piano using this standard:
import pandas as pd

# The lowest key on an 88-key piano is A0 at 27.5 Hz; every key above it is
# higher by a factor of 2**(1/12). Build a table of pitches indexed by octave.
names = 'A A# B C C# D D# E F F# G G#'.split()
pitches = pd.DataFrame()
for n in range(88):
    # Octave numbers increment at each C, three keys above each A
    pitches.loc[(12 + n - 3) // 12, names[n % 12]] = 27.5 * 2**(n / 12)
pd.set_option('display.precision', 1)
pitches.index.name = 'octave'
# Reorder the columns so each octave runs from C to B
pitches[names[3:] + names[:3]].fillna('')
octave | C | C# | D | D# | E | F | F# | G | G# | A | A# | B
---|---|---|---|---|---|---|---|---|---|---|---|---
0 | | | | | | | | | | 27.5 | 29.1 | 30.9
1 | 32.7 | 34.6 | 36.7 | 38.9 | 41.2 | 43.7 | 46.2 | 49.0 | 51.9 | 55.0 | 58.3 | 61.7
2 | 65.4 | 69.3 | 73.4 | 77.8 | 82.4 | 87.3 | 92.5 | 98.0 | 103.8 | 110.0 | 116.5 | 123.5
3 | 130.8 | 138.6 | 146.8 | 155.6 | 164.8 | 174.6 | 185.0 | 196.0 | 207.7 | 220.0 | 233.1 | 246.9
4 | 261.6 | 277.2 | 293.7 | 311.1 | 329.6 | 349.2 | 370.0 | 392.0 | 415.3 | 440.0 | 466.2 | 493.9
5 | 523.3 | 554.4 | 587.3 | 622.3 | 659.3 | 698.5 | 740.0 | 784.0 | 830.6 | 880.0 | 932.3 | 987.8
6 | 1046.5 | 1108.7 | 1174.7 | 1244.5 | 1318.5 | 1396.9 | 1480.0 | 1568.0 | 1661.2 | 1760.0 | 1864.7 | 1975.5
7 | 2093.0 | 2217.5 | 2349.3 | 2489.0 | 2637.0 | 2793.8 | 2960.0 | 3136.0 | 3322.4 | 3520.0 | 3729.3 | 3951.1
8 | 4186.0 | | | | | | | | | | |
For string instruments like the piano, pitch is adjusted by changing the tension of the strings. Over time, the strings tend to lose tension, which lowers the pitch and makes the notes sound "flat," necessitating tuning.
So, how do we know when each string is tuned to the correct pitch? Some people have absolute (perfect) pitch, but that's a rare gift, and while they can identify notes with 98% accuracy, it is not clear how well they can judge how far out of tune a note is. What complicates matters with instruments like the piano is that, except for the bass notes, most notes are produced by two or three strings, each of which must vibrate at the same frequency when struck by its hammer. Even minor differences between them result in harsh tones.
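As a quick, hypothetical illustration of why even a small mismatch between a note's strings is audible, the sketch below synthesizes two sine waves at 440 Hz and 441 Hz; the 1 Hz offset is an arbitrary value chosen for illustration, not a measurement. Their sum swells and fades once per second, the "beating" that tuners listen for.

import numpy as np

# Two hypothetical strings of the same note, one at 440 Hz and one slightly
# sharp at 441 Hz (an arbitrary offset chosen for illustration)
fs = 44100                    # samples per second
t = np.arange(3 * fs) / fs    # three seconds
string_a = np.sin(2 * np.pi * 440 * t)
string_b = np.sin(2 * np.pi * 441 * t)
both = string_a + string_b
# The combined amplitude swells and fades |441 - 440| = 1 time per second,
# which is the beating that makes a slightly detuned unison sound harsh.

Now, back to the real piano.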
import requests
import tempfile

# Download a recording of A above middle C on the piano from the
# University of Iowa (3.7 MB):
rec_file = tempfile.NamedTemporaryFile()
with requests.get('http://theremin.music.uiowa.edu/sound%20files/'
                  'MIS/Piano_Other/piano/Piano.pp.A4.aiff', stream=True) as r:
    if r.ok:
        for chunk in r.iter_content(chunk_size=None):
            rec_file.write(chunk)
rec_file.flush()  # make sure everything is on disk before reading it back
The stereo (two-channel) recording was made at a sampling rate of 44,100 Hz and runs for about 22 seconds.
import aifc
import pyaudio
import numpy as np

# Read in the recording and convert to a numpy array (one column per channel)
rec_a440 = aifc.open(rec_file.name)
data = rec_a440.readframes(rec_a440.getnframes())
data_np = np.frombuffer(data, dtype='<i2').reshape(-1, rec_a440.getnchannels())
length = data_np.shape[0] / rec_a440.getframerate()
print(f'Sampling rate: {rec_a440.getframerate()}, length: {length:.02f} sec')

def play_numpy_array(arr, frame_rate, format=pyaudio.paInt16, bufsize=1024):
    # Stream the array to the default output device, one buffer at a time
    stream = pyaudio.PyAudio().open(format=format, channels=arr.shape[-1],
                                    rate=frame_rate, frames_per_buffer=bufsize,
                                    input=False, output=True)
    for i in range(0, len(arr), bufsize):
        stream.write(arr[i:i+bufsize].astype(np.int16).tobytes())
    stream.stop_stream()
    stream.close()

play_numpy_array(data_np, rec_a440.getframerate())
Sampling rate: 44100, length: 22.47 sec
You can hear the note being struck hard, followed by a very long fade; the sound is barely audible for more than half of the recording. Let's chop off the attack and most of the fade to get a clear, two-second-long tone:
snippet = data_np[44100:132300]
play_numpy_array(snippet, 44100)
And now let's analyze it.
%matplotlib inline
%config InlineBackend.figure_format = 'retina'
%config InlineBackend.rc = {'figure.figsize': (10, 3), 'figure.dpi': 140}
import matplotlib.pyplot as plt
plt.style.use('dark_background')
plt.plot(np.arange(len(snippet)) / 44100, snippet, lw=.8)
plt.xlabel('Time (seconds)')
_ = plt.legend(['left', 'right'])
If we perform a spectral analysis on both channels laid end to end, we can see that the dominant frequency in the signal is right around 440 Hz (indicated by the dotted vertical line):
from scipy import signal
plt.loglog(*signal.welch(snippet.flatten('F'), scaling='spectrum',
fs=rec_a440.getframerate(), nperseg=2048))
plt.xlabel('Frequency [Hz]')
plt.ylabel('Linear spectrum [V**2]')
plt.title('Power spectrum')
_ = plt.axvline(x=440, linestyle=':')
But besides the fundamental frequency, there is an almost equally strong spike at 880 Hz, followed by a few weaker spikes at higher integer multiples of the fundamental frequency. These are known as harmonics and are caused by the various vibrational modes of a string fixed at both ends. The relative strength of each of these harmonics is what gives the instrument its timbre. Without the harmonics, we'd be left with an obnoxious, synthetic-sounding sine wave. The harmonics do make it harder, however, to compute how out of tune the note is.
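To get a feel for what the harmonics contribute, here is a small sketch that is not part of the original analysis: it synthesizes one second of a pure 440 Hz sine and one second of the same note with a few overtones added. The harmonic amplitudes are arbitrary values chosen for illustration, not measurements of the piano recording.

import numpy as np

fs = 44100
t = np.arange(fs) / fs               # one second of samples
pure = np.sin(2 * np.pi * 440 * t)   # fundamental only: a plain, "synthesized" tone

# Add overtones at integer multiples of 440 Hz; the amplitudes are arbitrary
# illustrative values, not measured from the recording
amps = [1.0, 0.5, 0.25, 0.125]
rich = sum(a * np.sin(2 * np.pi * 440 * (k + 1) * t) for k, a in enumerate(amps))
rich = rich / np.abs(rich).max()     # normalize to [-1, 1]

Both signals repeat 440 times per second and therefore have the same pitch; only the harmonic content, and hence the timbre, differs.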
Enter Finite Impulse Response (FIR) filters. A FIR band-pass filter can strongly attenuate all frequencies outside a narrow band around the fundamental frequency we are tuning for, leaving us a nice, clean waveform to inspect.
Take the following FIR filter:
from scipy.signal import firwin, freqz

def note_filter(freq, sample_freq=44100, note_width=3, size=250):
    # Band-pass FIR filter whose passband extends `note_width` semitones
    # below and above the target frequency
    scale = 2**(note_width / 12)
    passband = (freq / scale, freq * scale)
    return firwin(size, passband, fs=sample_freq, pass_zero=False, scale=False)

fir = note_filter(440, 44100)
plt.plot(fir)
plt.show()
It doesn't look like much, but its frequency response shows what we can expect when we convolve it with the original signal:
def show_freq_response(data, fs, npoints=None):
    # Compute and plot the filter's frequency response in dB
    w, h = freqz(data, 1, fs=fs)
    npoints = len(w) if npoints is None else npoints
    plt.plot(w[:npoints], 20 * np.log10(abs(h[:npoints])))
    plt.xlabel('Frequency [Hz]')
    plt.ylabel('Amplitude [dB]')
    plt.show()

show_freq_response(fir, 44100)
Basically, all frequencies other than the few notes around the target frequency are heavily attenuated. Let's confirm by convolving our snippet with the filter.
filtered = np.convolve(snippet.flatten('F'), fir)
plt.loglog(*signal.welch(filtered, scaling='spectrum',
fs=rec_a440.getframerate(), nperseg=2048))
plt.xlabel('Frequency [Hz]')
plt.ylabel('Linear spectrum [V**2]')
plt.title('Power spectrum')
_ = plt.axvline(x=440, linestyle=':')
Indeed, all the harmonics are now seriously squished. Let's look at the filtered waveform:
_ = plt.plot(filtered[1000:2000])
A beautiful, clean sine wave whose frequency can easily be estimated by measuring the distance between its zero crossings (or peaks). Compare this waveform with the original:
_ = plt.plot(snippet[1000:2000, 1])
Let's estimate the frequency from the filtered signal.
diffs = np.diff(np.where(np.diff(np.signbit(filtered))))  # samples between zero crossings
uniq, counts = np.unique(diffs, return_counts=True)
# Two zero crossings per cycle: frequency = sampling rate / (2 * mean spacing)
44100 / np.mean(diffs) / 2
440.51816375401233
Almost perfect: the string is just slightly sharp, but still within about 0.1% of the target.
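For reference, tuners often express such deviations in cents, hundredths of a semitone. The conversion below is not part of the original analysis, just the standard formula applied to the measured value:

import numpy as np

measured, target = 440.518, 440.0
# A semitone is a frequency ratio of 2**(1/12) and a cent is 1/100 of a
# semitone, so the deviation in cents is 1200 * log2(measured / target)
cents = 1200 * np.log2(measured / target)
print(f'{cents:+.1f} cents')   # roughly +2 cents, i.e. slightly sharp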
The above method of pitch estimation is the mechanism the intuno project uses.