admin: First posted on 2018 01 06
I thought designing a MIDI-to-wave synthesizer would be difficult. I wanted to be able to convert a MIDI file to a wave file without having to play the file and record the playback. As it turns out, a synthesizer can be very simple. I put one together – perhaps not a great one, but a good enough one.
Downloadable Sounds files (DLS)
The Windows "sound mapper" or "wave mapper" appears as a sound device in audio applications and plays MIDI files. It is essentially a virtual synthesizer that relies on a file called "gm.dls". This file contains information that tells the synthesizer how to play various notes on various instruments. On my computer, the file resides in "c:\Windows\SysWOW64\drivers", "c:\Windows\System32\drivers", and a couple of other folders. Roland holds the copyright for the gm.dls file. The file is Roland's 1996 interpretation of what sound data should be used in a MIDI synthesizer and how these data should be used.
The DLS file, in short, is a collection of wave samples, as in a list of Wave file format sound pieces that can be played similarly to any other wave file (with some complications as below) to produce different notes on different instruments. You would think that a synthesizer "synthesizes" the sound, as in "creates it", but, in this case, it does not. Since the DLS file contains actual wave samples, a synthesizer will play one of these samples when that synthesizer is told to play a note.
What is amazing about the gm.dls file is that it is only 3.28 MB. Think about the fact that the file describes how 235 instruments should play, with over 80 notes for almost every instrument. I have seen complaints that the samples in this file should be updated and they probably should (the tuba, for example, does not sound like a tuba). But in 1996 Roland managed to fit all the information that was needed to play over 80 notes for 235 instruments, sampled at 16 bits and 22050 samples per second, in only 3.28 MB. That is impressive.
Things are almost as simple as playing the wave samples as they are, but the synthesizer must make some adjustments to the sample data. For example:
- Samples are really short. To play a longer note, the synthesizer must loop a piece of the sample and play it over and over.
- The same sample can be used to play many notes on the same instrument. This means the sample must be pitch shifted as described below.
- The DLS file can contain "articulation" data specifying how the sample should be played. This could mean things like how quickly the sample should decay, at what level the gain of the sample should be sustained, whether there should be some low frequency oscillation in volume or pitch, and so on.
The fact that samples are short and that each sample can represent several notes helps with keeping the DLS file small.
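As a rough illustration of the looping idea, here is a minimal Python sketch. It plays a sample from the start and, once it reaches the end of the loop, jumps back to the loop start until enough output has been produced. The function name and the parameters loop_start and loop_length are my own illustrative choices, and the sketch ignores pitch shifting and articulation.

```python
def render_looped(sample, loop_start, loop_length, n_out):
    """Return n_out values: the sample played from the start, with the
    loop section repeated for as long as output is needed."""
    loop_end = loop_start + loop_length  # first index past the loop
    out = []
    pos = 0
    for _ in range(n_out):
        out.append(sample[pos])
        pos += 1
        if pos >= loop_end:
            pos = loop_start  # jump back and play the loop again
    return out


# A six-value "sample" whose loop covers indices 2, 3 and 4.
print(render_looped([0, 1, 2, 3, 4, 5], loop_start=2, loop_length=3, n_out=10))
```

Anything after the loop end (the value 5 in the toy sample) is never reached while the note is sustained.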
Handling the DLS file
Documentation for the DLS file format is available from the MIDI Manufacturers Association and it looks daunting. As I am going through it and translating it into code for my synthesizer, I am discovering that I do not need to handle all of it. I can ignore some pieces of the DLS file and I can ignore some of the articulation information attached to instruments.
A DLS file, like gm.dls, is a Resource Interchange File Format (RIFF) file, as is a wave file (a standard MIDI file uses a similar chunk layout but is not itself RIFF). The data in a RIFF file is organized in "chunks". The RIFF format allows software to skip over chunks that it does not recognize, so the synthesizer does not have to handle everything.
- One chunk in the DLS file describes the instruments in the file (e.g., "fingered bass").
- Each instrument contains a list of "regions". For example, one of the fingered bass regions could state that notes 0 to 72 on the fingered bass should be played by using one of the wave samples in the file and that the specific wave sample, if unadjusted, will play the note 38 (these are made up example numbers). MIDI notes go from 0 to 127, where 60 is middle C and each unit up or down is a semitone up or down.
- Wave samples are organized in a wave pool table, which is simply a list of data pieces with the common Wave format. The region specifies where the relevant sample is in the DLS file. The region and the wave sample itself specify additional info, such as whether the sample needs additional tuning and what piece of the sample should be looped for a sustained note.
- The articulation data is a part of the region or the instrument. As before, it specifies things like how quickly the sample should decay, whether there should be some low frequency oscillation in pitch or gain, and so on.
Some chunks of the DLS file can safely be ignored. An example is the DLS ID chunk, which provides a globally unique identifier for pieces of the DLS file.
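To show why skipping chunks is cheap, here is a minimal Python sketch of a RIFF walker. It relies only on the general RIFF layout: a 4-byte chunk ID, a 4-byte little-endian size, and a body padded to an even length, with "RIFF" and "LIST" chunks holding a 4-byte type followed by nested chunks. The function names iter_chunks and walk are hypothetical, and the sketch only lists chunks; it does not decode any DLS-specific content.

```python
import struct


def iter_chunks(data, start=0, end=None):
    """Yield (chunk_id, body_offset, body_size) for each chunk in data[start:end]."""
    end = len(data) if end is None else end
    pos = start
    while pos + 8 <= end:
        chunk_id, size = struct.unpack_from("<4sI", data, pos)
        yield chunk_id, pos + 8, size
        pos += 8 + size + (size & 1)  # bodies are padded to an even length


def walk(data, start=0, end=None, depth=0, found=None):
    """Return a flat list of (depth, chunk_id, list_type_or_None)."""
    found = [] if found is None else found
    for chunk_id, body, size in iter_chunks(data, start, end):
        if chunk_id in (b"RIFF", b"LIST"):
            # Container chunk: a 4-byte type, then nested chunks.
            found.append((depth, chunk_id, data[body:body + 4]))
            walk(data, body + 4, body + size, depth + 1, found)
        else:
            # Leaf chunk: a real parser would handle the IDs it knows
            # and simply move on past the rest.
            found.append((depth, chunk_id, None))
    return found
```

Calling walk on the bytes of a DLS file would return the nesting of its chunks. A parser built on top of this only needs to act on the chunk IDs it understands, because the size field is enough to step over everything else.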
Finding the right wave sample is step one. The second step is pitch shifting the wave sample as needed. The pitch shifting can be easy. The synthesizer can resample the sound data by going through them faster or slower. Simplistically, suppose that a wave sample is designed to play the note A and we must use it to play the note C three semitones up. Whatever the frequency of A, the frequency of C is 2^(3/12) = 1.1892 times higher and we must play the sample 1.1892 times faster. This means that for every output data sample, we will advance the position in the input data by 1.1892 samples. Since we are using non-integer increments, the position will usually fall between two input data samples and we will interpolate between them.
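The resampling described above can be sketched in a few lines of Python. This is a minimal version that assumes linear interpolation between neighboring input samples and stops when it runs out of input; it is not necessarily how my synthesizer or any other one does it, and the function name pitch_shift is made up for the example.

```python
def pitch_shift(sample, semitones, n_out):
    """Resample `sample` so that it plays `semitones` higher (or lower if
    negative), producing at most n_out output values."""
    step = 2.0 ** (semitones / 12.0)  # e.g. 3 semitones up -> about 1.1892
    out = []
    pos = 0.0
    for _ in range(n_out):
        i = int(pos)
        if i + 1 >= len(sample):
            break  # ran out of input; a looping sample would wrap around here
        frac = pos - i
        # Linear interpolation between the two neighboring input samples.
        out.append(sample[i] * (1.0 - frac) + sample[i + 1] * frac)
        pos += step
    return out


# One octave down (step 0.5): every other output value is interpolated.
print(pitch_shift([0, 10, 20], semitones=-12, n_out=10))
```

Linear interpolation is the simplest choice. Better interpolation would reduce artifacts, at the cost of more work per output sample.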
Understanding some of the articulation data is step three.
- Some wave samples are designed to be played through without looping pieces of the data (e.g., drums). Other wave samples specify that a piece of the wave sample should be looped to get a sustained note (e.g., the fingered bass).
- Most wave samples specify that they need additional fine tuning and gain adjustment. For example, a wave sample may be meant to play the note A at 440 Hz, but perhaps it actually produces the note A at 439 Hz. It should then be sped up by a factor of 440/439 = 1.0023 (resampled similarly to the resampling for changing the note).
- Most instruments or regions contain an amplitude envelope, which states that the notes on this instrument or in this region should have a specific attack, decay, sustain, and release (and sometimes more). That is, the amplitude of the sample should be increased initially, then decayed, then held in place, and then brought to zero, similarly to the natural decay of instruments.
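The amplitude envelope can be illustrated with a minimal Python sketch. It assumes straight-line segments, times in plain seconds, and a linear gain between 0 and 1. These are simplifications for the example: the sketch does not decode the units in which the DLS file stores articulation data, and the names adsr_gain and apply_envelope are hypothetical.

```python
def adsr_gain(t, note_len, attack, decay, sustain, release):
    """Gain (0..1) at t seconds after note-on; note-off happens at note_len."""
    def held(t):
        if t < attack:
            return t / attack  # ramp up from 0 to 1
        if t < attack + decay:
            # ramp down from 1 to the sustain level
            return 1.0 - (1.0 - sustain) * (t - attack) / decay
        return sustain  # hold while the note is down

    if t < note_len:
        return held(t)
    if release <= 0.0 or t >= note_len + release:
        return 0.0
    # After note-off, fade from the level reached at note-off down to zero.
    return held(note_len) * (1.0 - (t - note_len) / release)


def apply_envelope(samples, rate, note_len, attack, decay, sustain, release):
    """Scale each sample value by the envelope gain at its time."""
    return [s * adsr_gain(n / rate, note_len, attack, decay, sustain, release)
            for n, s in enumerate(samples)]
```

For a note held for one second, adsr_gain(0.05, 1.0, 0.1, 0.2, 0.5, 0.4) is halfway up the attack, and the gain reaches zero 0.4 seconds after the note is released.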
There is a lot of additional articulation information in the file. I ignored it for now. I suppose some additional articulation information may be important depending on the instrument. For example, a Hammond organ might require the low frequency oscillation in pitch. The MIDI files I played with though, which did not have a Hammond organ, seemed to be good enough after handling the tuning and amplitude envelopes.
This was easy. It amounted to two days of work and was sufficient to successfully convert my MIDI files into wave files: understand selected chunks of the DLS file, resample the wave sample data appropriately, and apply some of the articulation information.
Of course, I only care about converting MIDI files to wave files and not, for example, about producing real time wave playback. So, I was careless about processing speeds. I could have written better code. Still, my synthesizer takes about a second to produce a minute of sampled wave data, so I suppose it could be a real time synthesizer.
I learned some things. For example, the DLS file documentation describes how the MIDI note-on velocity should be interpreted: typically in terms of gain, but occasionally as a scale on the initial attack of the note. That has not been an easy piece of information to find elsewhere.
SoundFont files are similar to DLS files. Free SoundFont files seem easier to find than free DLS files. The SoundFont format seems more complex, but then, since people create SoundFont files today when hard drive space is abundant, they seem to ignore most of the SoundFont format specifications. It might be easier to handle SoundFont files than DLS files.
Similarly, because of the lack of space constraints, current SoundFont files seem to have realistic sound data, which would make the MIDI file sound better.
I suppose I should have started with a SoundFont file. But that is an experiment for another day.