In the previous post, we used the discrete Fourier transform (DFT) to compute the magnitudes and phases in a signal. In this post, we will pitch shift these magnitudes and phases. In the next post, we will construct a new, pitch shifted signal.
We used a simple input signal with a frequency of 70 Hz. In real life, we are unlikely to know exactly which frequencies are present in our signal, so we are putting together a methodology that works for all signals.
Examine a DFT bin over the first frame
Most of the magnitude computed with the DFT in the previous post appeared in bin 1. Let's examine that bin.
The computed magnitude is 0.954. It is already divided by the coherent gain of the Hann window.
The phase is -0.369.
With the sampling rate of 2000 Hz and DFT length of 32, bin 1 is at the frequency 62.5 Hz.
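As a small illustration of where 62.5 Hz comes from, the bin frequency is the bin index times the sampling rate divided by the DFT length. The function name below is only an illustrative choice, not something from the previous post.

```python
def bin_frequency(k, sample_rate, dft_length):
    """Frequency in Hz at the center of DFT bin k."""
    return k * sample_rate / dft_length

# Bin 1 with a 2000 Hz sampling rate and a DFT length of 32.
print(bin_frequency(1, 2000, 32))  # 62.5
```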
We know that the input frequency is 70 Hz, but in a real-life situation we will not know that. The only things we will know are the three pieces of information above: the frequency 62.5 Hz with magnitude 0.954 and phase -0.369.
Examine the same DFT bin over the second frame
For the first frame, we took the DFT of samples 1 through 32. For the second frame, we will take the DFT of a new set of 32 samples. In part 1 we discussed overlapping these DFT frames by 75%, so we will move over by 25%, or 8 samples. We will take the DFT of samples 9 to 40.
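The frame positions can be sketched as follows. This assumes the 1-based, inclusive sample numbering used in this post; the function name and its defaults are illustrative.

```python
def frame_ranges(num_frames, frame_length=32, hop=8):
    """First and last sample (1-based, inclusive) of each DFT frame."""
    return [(i * hop + 1, i * hop + frame_length) for i in range(num_frames)]

# A hop of 8 samples with a frame length of 32 gives 75% overlap.
print(frame_ranges(2))  # [(1, 32), (9, 40)]
```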
For the same bin frequency of 62.5 Hz (bin 1), we get magnitude of 0.965 and phase of -2.118.
Here is why the phase is important. The phase between the first and second DFT frames changed by -1.749, from -0.369 to -2.118. However, if this were truly the frequency 62.5 Hz, then over 8 samples at the sampling rate of 2000 Hz the phase should have changed by -2π * 62.5 * 8 / 2000 = -1.571.
We are therefore not working with the frequency 62.5 Hz. We are working with the frequency 62.5 - (-1.749 - (-1.571)) * 2000 / (2π * 8) = 69.6 Hz.
Because of precision, this is not exactly our input frequency of 70 Hz, but it is close.
In short, when we have more than one frame, we can use the change in the phase to compute the actual frequency that is carried with the DFT. We do not have to rely only on the DFT frequencies.
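The computation above can be sketched in a few lines. This is a minimal sketch that follows the sign convention used in this post (the phase decreases as time advances); the function and parameter names are illustrative.

```python
import math

def actual_frequency(bin_freq, phase_prev, phase_curr, hop, sample_rate):
    """Estimate the frequency carried by a bin from two consecutive frames."""
    # Phase change we would see if the signal sat exactly at the bin frequency.
    expected = -2 * math.pi * bin_freq * hop / sample_rate
    # Phase change actually measured between the two frames.
    measured = phase_curr - phase_prev
    deviation = measured - expected
    # Convert the phase deviation back into a frequency offset in Hz.
    return bin_freq - deviation * sample_rate / (2 * math.pi * hop)

f = actual_frequency(62.5, -0.369, -2.118, hop=8, sample_rate=2000)
print(round(f, 1))  # 69.6
```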
Pitch shifting the first frame
To construct the pitch shifted signal with the inverse DFT, we need a magnitude and phase for each DFT bin. Let's compute those for the first frame, bin 1.
The first frame is not like other frames. Nothing comes before the first frame. We cannot compute the actual frequency. In bin 1, all we can do is pitch shift 62.5 Hz to 66.2 Hz by multiplying by 1.0594 (one semitone).
We leave the magnitude at 0.954 as computed above.
The phase is tricky. We could set the phase at all bins at 0, as if we were working with a signal that just started. However, we have some information about the phase. We could just use the phase as computed by the DFT: -0.369 in bin 1. This is better, but it ignores that we have multiple bins with new, pitch shifted frequencies.
Let's work as if there is a previous frame. In bin 1, that nonexistent previous frame would have the same frequency (the pitch shifted 66.2 Hz) and a phase of 0. The current frame – the first frame – after 8 samples for 66.2 Hz would have the phase -1.664. This computation recognizes how the different bin frequencies may interact over time.
For the first frame, in bin 1, the pitch shifted signal will have:
- Frequency: 66.2 Hz
- Magnitude: 0.954
- Phase: -1.664.
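The three values above can be reproduced with a short sketch. It assumes the imaginary previous frame with phase 0 described in this section and takes one semitone to be 2^(1/12), which is about 1.0594; the names are illustrative.

```python
import math

SEMITONE = 2 ** (1 / 12)  # about 1.0594

def first_frame_bin(bin_freq, magnitude, hop, sample_rate, ratio=SEMITONE):
    """Shifted frequency, magnitude and phase for one bin of the first frame."""
    shifted = bin_freq * ratio
    # Start from phase 0 and advance by one hop at the shifted frequency.
    phase = 0.0 - 2 * math.pi * shifted * hop / sample_rate
    return shifted, magnitude, phase

shifted, magnitude, phase = first_frame_bin(62.5, 0.954, hop=8, sample_rate=2000)
print(round(shifted, 1), magnitude, round(phase, 3))  # 66.2 0.954 -1.664
```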
Pitch shifting the second frame
The new frequency is the pitch shifted actual frequency: 69.6 * 1.0594 = 73.7 Hz.
The magnitude is as computed by the DFT: 0.965.
Let's compute the phase. If the frequency is 73.7 Hz, we should expect the phase to change over 8 samples by -2π * 73.7 * 8 / 2000 = -1.853. We will add that to the phase of the same bin in the previous frame (-1.664) to calculate the new phase: -3.517.
For the second frame, in bin 1, the pitch shifted signal will have:
- Frequency: 73.7 Hz
- Magnitude: 0.965
- Phase: -3.517.
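A minimal sketch of the second-frame computation, using the rounded numbers from this post as inputs. The function name is illustrative, and the sign convention is the one used throughout this post.

```python
import math

def next_frame_phase(prev_phase, shifted_freq, hop, sample_rate):
    """Advance the previous output phase by one hop at the shifted frequency."""
    return prev_phase - 2 * math.pi * shifted_freq * hop / sample_rate

shifted_freq = 69.6 * 1.0594  # actual frequency shifted by one semitone
new_phase = next_frame_phase(-1.664, shifted_freq, hop=8, sample_rate=2000)
print(round(shifted_freq, 1), round(new_phase, 3))  # 73.7 -3.517
```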
Pitch shifting the entire signal
We continue these computations for every DFT bin and for every next frame, moving each frame over by 8 samples. We have the magnitudes and phases of the pitch shifted signal at each frame. This is enough to compute the new signal at each frame. The pitch shifted frequencies, although listed above, are not important. The DFT only knows its own bin frequencies.
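Put together, the per-bin, per-frame procedure might look like the sketch below. It assumes the analysis magnitudes and phases are already available as nested lists, treats the first frame as described earlier, and wraps the accumulated phase into the range 0 to 2π. All names are illustrative, and the bin 0 values in the example are placeholders rather than values from the previous post.

```python
import math

TWO_PI = 2 * math.pi

def shift_frames(magnitudes, phases, sample_rate, dft_length, hop, ratio):
    """Return, per frame, a list of (magnitude, phase) pairs for each bin.

    magnitudes[f][k] and phases[f][k] hold the DFT result for frame f, bin k.
    """
    num_bins = len(phases[0])
    prev_in = None               # analysis phases of the previous frame
    prev_out = [0.0] * num_bins  # output phases of the previous frame
    result = []
    for mags, phs in zip(magnitudes, phases):
        frame = []
        for k in range(num_bins):
            bin_freq = k * sample_rate / dft_length
            if prev_in is None:
                # First frame: nothing to compare with, use the bin frequency.
                actual = bin_freq
            else:
                expected = -TWO_PI * bin_freq * hop / sample_rate
                deviation = (phs[k] - prev_in[k]) - expected
                # Keep the deviation between -pi and pi.
                deviation = (deviation + math.pi) % TWO_PI - math.pi
                actual = bin_freq - deviation * sample_rate / (TWO_PI * hop)
            advance = -TWO_PI * actual * ratio * hop / sample_rate
            phase = (prev_out[k] + advance) % TWO_PI
            frame.append((mags[k], phase))
        prev_in = phs
        prev_out = [p for _, p in frame]
        result.append(frame)
    return result

# Two frames, two bins. Bin 0 values are placeholders; bin 1 uses this post's numbers.
out = shift_frames(
    magnitudes=[[0.0, 0.954], [0.0, 0.965]],
    phases=[[0.0, -0.369], [0.0, -2.118]],
    sample_rate=2000, dft_length=32, hop=8, ratio=1.0594,
)
```

The phases returned for bin 1 correspond to -1.664 and -3.517 from the sections above, shifted by a multiple of 2π.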
Some notes
Since we are accumulating phase over frames, we should ensure that the accumulated phase does not overflow. We reduce the accumulated phase over every frame to something between 0 and 2 π by adding or subtracting multiples of 2 π as needed.
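One simple way to do this reduction in Python is the modulo operator, which for a positive divisor returns a non-negative result. This is a sketch of one option; the function name is illustrative.

```python
import math

def wrap_0_to_2pi(phase):
    """Reduce a phase to the range from 0 to 2*pi by removing multiples of 2*pi."""
    return phase % (2 * math.pi)

# The accumulated phase from the second frame, bin 1.
print(round(wrap_0_to_2pi(-3.517), 3))  # 2.766
```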
The actual frequency is computed from the DFT bin frequency, based on the difference between the phase change we expect at the bin frequency and the phase change we actually measure. In that computation, if we want the actual frequency to stay close to the DFT bin frequency, we can (artificially) reduce that difference in phase by adding or subtracting multiples of 2π as well. We place the difference in the interval between -π and π.
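A common way to place a phase difference between -π and π is to offset it by π, take it modulo 2π, and remove the offset again. The sketch below assumes that approach; the function name is illustrative.

```python
import math

def wrap_minus_pi_to_pi(delta):
    """Reduce a phase difference to the interval between -pi and pi."""
    return (delta + math.pi) % (2 * math.pi) - math.pi

# A small deviation is left alone; a large one loses a multiple of 2*pi.
small = wrap_minus_pi_to_pi(-0.178)
large = wrap_minus_pi_to_pi(6.1)
```

Here `small` stays at about -0.178, while `large` becomes 6.1 - 2π, which is about -0.183.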
What if the pitch shift is significant? With a significant pitch shift, we could use an extra step: placing the pitch shifted magnitudes and phases in a different bin – one that is closer to the pitch shifted actual frequency.
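One possible way to pick the target bin, assuming we simply choose the bin whose frequency is nearest to the pitch shifted actual frequency, is sketched below. The function name is illustrative, and this is only one of several reasonable choices.

```python
def nearest_bin(freq_hz, sample_rate, dft_length):
    """Index of the DFT bin whose frequency is closest to freq_hz."""
    return round(freq_hz * dft_length / sample_rate)

# One semitone up stays in bin 1; one octave up (69.6 * 2) lands in bin 2.
print(nearest_bin(73.7, 2000, 32))   # 1
print(nearest_bin(139.2, 2000, 32))  # 2
```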
What if we want to preserve formants? Without overcomplicating things, a simple thing to do is to pitch shift only selected frequency intervals and not the whole DFT result. For example, if pitch shifting a vocal, simplistically, we may want to only transpose frequencies up to 1.2 kHz.
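As a rough sketch of that idea, the pitch ratio can be made to depend on the frequency: frequencies up to a cutoff are transposed and everything above is left alone. The 1200 Hz default mirrors the example above, the function name is illustrative, and a real signal would need a sampling rate high enough to contain such frequencies.

```python
def ratio_for(freq_hz, ratio, cutoff_hz=1200.0):
    """Pitch ratio to apply at freq_hz: shift below the cutoff, keep above it."""
    return ratio if freq_hz <= cutoff_hz else 1.0

print(ratio_for(440.0, 1.0594))   # 1.0594
print(ratio_for(3000.0, 1.0594))  # 1.0
```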
Next steps
You may have noticed already where the pitch shift happens. The pitch shift is not about the new, pitch shifted frequencies. The inverse DFT does not know anything about those. The inverse DFT uses its bin frequencies, with recomputed magnitudes and phases.
The pitch shift happens because of the proper computation of the phase. In each frame, we compute a phase that represents a pitch shifted frequency. Over multiple overlapping frames, we piece together a signal from small chunks. Each of these chunks moves with a phase that is appropriate for the pitch shifted frequency. We will see this in the next post.
authors: mic