In the first post, we used the discrete Fourier transform (DFT) to compute the magnitudes and phases in a signal. In the second post, we computed pitch shifted magnitudes and phases. In this last post, we will construct the new, pitch shifted signal.
The reconstructed signal pieces in bin 1
Using the inverse DFT on the magnitudes and phases computed in the previous post produces a new signal, frame by frame.
Here are the pieces of the signal produced by bin 1 in frames 4, 5, 6, and 7, for samples 48 through 55.
We chose these samples because they are far enough into the signal not to be affected much by the first frame, for which, as before, we have less information. They are also the samples where all four chosen frames overlap. With 75 percent overlap, each frame starts a quarter of its length after the previous frame, and so each sample away from the ends of the signal is covered by exactly four frames.
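The frame arithmetic can be checked with a few lines of Python. This is only a sketch: it assumes that frames are numbered from 1 and samples from 0, which is the numbering that makes frames 4 through 7 line up with samples 48 through 55. The names `frame_span` and `frames_covering` are made up for this illustration.

```python
FRAME_LENGTH = 32          # samples per frame (same as the Hann window length)
HOP = FRAME_LENGTH // 4    # 75 percent overlap: each frame starts 8 samples later

def frame_span(frame_number):
    """Return (first, last) sample index of a frame; frames are numbered from 1 (assumed)."""
    first = (frame_number - 1) * HOP
    return first, first + FRAME_LENGTH - 1

def frames_covering(sample, frame_count):
    """List the 1-based frame numbers whose span contains the given sample."""
    return [f for f in range(1, frame_count + 1)
            if frame_span(f)[0] <= sample <= frame_span(f)[1]]

# Every sample from 48 through 55 should be covered by the same four frames.
for sample in range(48, 56):
    print(sample, frames_covering(sample, 12))
```

Under these assumptions, samples near the start of the signal are covered by fewer frames. Sample 0, for example, is covered by frame 1 only.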
Since this is bin 1, all of these represent the frequency 62.5 Hz. However, note that in each consecutive frame, the signal is "sped up", moving further to the left.
This is how the pitch shift succeeds. The inverse DFT uses its own DFT frequencies. For bin 1, the frequency is always 62.5 Hz. However, constantly recomputing the phase to represent a new frequency, over overlapping frames, produces a combined signal at the new, pitch shifted frequency.
Of course, the graph above shows only one bin, whereas the signal would be the combination of all bins.
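To see what a single bin contributes, here is a minimal sketch of the inverse DFT with only bin 1 (and its mirror bin) filled in. The sample rate of 2000 Hz is an assumption, inferred from 62.5 Hz times 32 bins. The magnitude and the four phases are hypothetical values chosen for illustration, not the ones computed in the previous post.

```python
import cmath
import math

N = 32               # frame length
SAMPLE_RATE = 2000   # assumed: 62.5 Hz * 32; not stated in this post
BIN1_HZ = SAMPLE_RATE / N

def inverse_dft(spectrum):
    """Plain inverse DFT: x[n] = (1/N) * sum_k X[k] * exp(2j*pi*k*n/N)."""
    n_bins = len(spectrum)
    return [sum(spectrum[k] * cmath.exp(2j * math.pi * k * n / n_bins)
                for k in range(n_bins)).real / n_bins
            for n in range(n_bins)]

def bin1_piece(magnitude, phase):
    """Signal piece produced by bin 1 alone, for one frame."""
    spectrum = [0j] * N
    spectrum[1] = cmath.rect(magnitude, phase)
    spectrum[N - 1] = spectrum[1].conjugate()  # mirror bin keeps the output real
    return inverse_dft(spectrum)

# Hypothetical phases for four consecutive frames. Magnitude 16 gives an
# amplitude of 1 under this (unnormalized forward DFT) convention.
phases = [0.0, 0.8, 1.6, 2.4]
pieces = [bin1_piece(16.0, p) for p in phases]
```

Each piece is `cos(2*pi*n/32 + phase)`: always one cycle per frame, that is, 62.5 Hz. Only the phase changes, and a larger phase moves the wave further to the left.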
Constructing the signal
Constructing the new, pitch shifted signal is simple. We add the signals produced by the inverse DFT for all frames, placing each frame at its own offset in the output to account for the movement of frames.
To smooth out the signal, we apply the Hann window of length 32 over the output of each frame, before combining frames. The Hann window has a nice property. At 75 percent overlap, the weight applied by the four Hann windows over four frames to each output sample is approximately the same (see the discussion of optimal overlap in Amplitude flatness). In this example, it is about 1.93. We divide the combined, windowed output by the weight.
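The windowing and the weight can be sketched as follows. This assumes the symmetric form of the Hann window (zero at both ends), which is the form that gives a weight of about 1.93 here. It also divides each sample by its own summed weight, whereas dividing by a constant 1.93 would give nearly the same result where four frames overlap. The frames of ones are stand-ins for the real inverse DFT output, and `overlap_add` is a name made up for this illustration.

```python
import math

FRAME_LENGTH = 32
HOP = FRAME_LENGTH // 4   # 75 percent overlap

def hann(length):
    """Symmetric Hann window (assumed variant; both endpoints are zero)."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * n / (length - 1))
            for n in range(length)]

def overlap_add(frames, window, hop):
    """Window each frame, add it at its offset, and track the summed window weight."""
    total = hop * (len(frames) - 1) + len(window)
    output = [0.0] * total
    weight = [0.0] * total
    for index, frame in enumerate(frames):
        start = index * hop
        for n, w in enumerate(window):
            output[start + n] += w * frame[n]
            weight[start + n] += w
    return output, weight

window = hann(FRAME_LENGTH)
# Twelve frames of ones stand in for the inverse DFT output of each frame.
frames = [[1.0] * FRAME_LENGTH for _ in range(12)]
output, weight = overlap_add(frames, window, HOP)

# With 12 frames, output samples 24 through 95 are covered by four frames.
full = weight[24:96]
print(min(full), max(full))
normalized = [o / w for o, w in zip(output[24:96], full)]
```

Where four frames overlap, the summed weight stays within a narrow band around 1.93 to 1.95, and dividing by it restores the stand-in frames to 1.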
The following picture shows the original signal (dotted line) and the pitch shifted signal (solid line).
Of course, there are not enough frames at the beginning and at the end, where fewer than four frames overlap. This picture was created with only 12 frames.
Where there are four overlapping frames, the pitch shifted signal is as expected: a single wave with a higher frequency than the input wave.
Next steps
There are no next steps. This is it.
But... if you want to stretch or shrink a signal without changes in pitch, follow the same algorithm. Instead of pitch shifting, you could slow down or speed up the output (inverse DFT) frames.
authors: mic