lecture: Free Software Virtual Singer

Dehumanize the Human Voice


Virtual Singers such as Hatsune Miku are very popular in Japan.
Before that there was also an MBROLA-based Virtual Singer called Melissa and
a multilingual formant synthesizer called Virtual Singer by the French company Myriad.
All those Virtual Singers are based on proprietary software; until recently, there was no free replacement.
In 2011 I discovered that the Music Technology Group had released their Spectral Modeling Synthesis library under the GNU GPL,
and I decided to use it in my own UTAU-compatible resampler.

YAMAHA uses Spectral Modeling Synthesis in its VOCALOID software, on which Miku is based.
MBROLA uses nearly the same voice sampling technique as VOCALOID: diphone concatenation using a harmonic/stochastic model.
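In diphone concatenation, each recorded unit runs from the middle of one phoneme to the middle of the next, so the synthesizer first turns the phoneme string into a list of diphone names and then fetches each one from the voice database. The following is a minimal sketch of that first step, assuming "_" for silence (as in MBROLA phoneme sets) and a hypothetical "a-b" naming scheme; real voice databases use their own labels.

```python
def to_diphones(phonemes, silence="_"):
    """Split a phoneme sequence into diphone unit names.

    Each diphone spans from the middle of one phoneme to the middle of the
    next, so the sequence is padded with silence on both ends. The "a-b"
    naming is a hypothetical convention for illustration only.
    """
    padded = [silence] + list(phonemes) + [silence]
    # pair every phoneme with its successor
    return [f"{a}-{b}" for a, b in zip(padded, padded[1:])]


print(to_diphones(["h", "a", "l", "o"]))
# ['_-h', 'h-a', 'a-l', 'l-o', 'o-_']
```

Because every unit contains a complete phoneme transition, the joins fall in the stable middle of each phoneme, where spectral mismatches are least audible.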
Both programs rely on prerecorded voice samples from a trained speaker or singer; these samples have been labeled using signal visualisation tools.
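A harmonic/stochastic model represents each short frame of a voice sample as a sum of sinusoids at multiples of the fundamental frequency (the harmonic part) plus a noise residual (the stochastic part). The toy frame synthesizer below illustrates the idea. It is a sketch, not the SMS library's API: the function name and parameters are hypothetical, and the noise is left spectrally flat, whereas a real stochastic component is shaped by a spectral envelope.

```python
import math
import random


def synth_frame(f0, amplitudes, noise_gain, sample_rate=44100, n=1024, seed=0):
    """Return n samples of a harmonic part plus a stochastic part.

    harmonic:   sinusoids at integer multiples of f0, one amplitude per partial
    stochastic: uniform white noise scaled by noise_gain (a real residual is
                shaped by a spectral envelope; this sketch leaves it flat)
    """
    rng = random.Random(seed)  # seeded so the noise is reproducible
    nyquist = sample_rate / 2
    frame = []
    for i in range(n):
        t = i / sample_rate
        harmonic = sum(
            a * math.sin(2 * math.pi * (k + 1) * f0 * t)
            for k, a in enumerate(amplitudes)
            if (k + 1) * f0 < nyquist  # drop partials that would alias
        )
        stochastic = noise_gain * rng.uniform(-1.0, 1.0)
        frame.append(harmonic + stochastic)
    return frame


# A 220 Hz voiced frame with three decaying partials and a little breath noise.
voiced = synth_frame(220.0, [1.0, 0.5, 0.25], noise_gain=0.05)
```

Separating the two parts is what makes pitch changes easy: the partial frequencies can be rescaled to a new f0 while the noise component is kept as it is.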
Formant synthesizers such as eSpeak do not require recordings of the human voice, and they support many more languages, at the cost of a less natural sound.
Like MBROLA, eSpeak is designed only for speech, but there is a singing synthesis frontend called eCantorix.
eCantorix can also import score files from VOCALOID and UTAU, but normally you just input a MIDI file created using Rosegarden or any other sequencer.
eSpeak can also be used as a frontend for MBROLA, which produces a more natural sound, and it can transcribe text into various phonetic notations.
MBROLA also allows specifying well over 100 pitch points per phoneme, which is useful for singing and expressive speech synthesis.
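MBROLA reads its input from .pho files, where each line holds a phoneme name, its duration in milliseconds, and optional pairs of (position in percent of the duration, pitch in Hz). Many pitch points per phoneme let a singing frontend draw vibrato or portamento directly into that line. The minimal sketch below generates such a line; the function name and the vibrato rate and depth defaults are illustrative assumptions.

```python
import math


def pho_line(phoneme, duration_ms, base_hz, points=100,
             vibrato_hz=5.5, vibrato_cents=50):
    """Return one line of an MBROLA .pho file for a sung phoneme.

    Line format: phoneme name, duration in milliseconds, then pairs of
    (position in percent of the duration, pitch in Hz).
    """
    fields = [phoneme, str(duration_ms)]
    for i in range(points + 1):
        pos = 100 * i / points                   # 0 .. 100 percent
        t = duration_ms / 1000 * i / points      # seconds into the phoneme
        # sinusoidal vibrato of +/- vibrato_cents around base_hz
        cents = vibrato_cents * math.sin(2 * math.pi * vibrato_hz * t)
        hz = base_hz * 2 ** (cents / 1200)       # cents -> frequency ratio
        fields += [f"{pos:g}", f"{hz:.1f}"]
    return " ".join(fields)


if __name__ == "__main__":
    # silence, an 800 ms sung "a" at 220 Hz with vibrato, silence
    print("_ 50")
    print(pho_line("a", 800, 220.0))
    print("_ 50")
```

With the default of 100 intervals, the sung "a" carries 101 pitch points, enough to follow a 5.5 Hz vibrato smoothly across 800 ms.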
After reading various papers published by the authors of MBROLA, I concluded that it would be possible for me to write a free software replacement for MBROLA.

In 2014, when I wrote my Master's thesis at the Blindenzentrum of the Mittelhessen University of Applied Sciences, I evaluated different algorithms for spectral voice processing as well as voice recording methods.
Before I started working at the Blindenzentrum, I was contacted by a visually impaired
VOCALOID user who wanted to use my resampler. At that time the resampler had only a graphical frontend; it could not yet be used from the command line.
I had found that some GPL-licensed resamplers for UTAU use Masanori Morise's WORLD speech synthesizer, so I based my work on the same toolkit.
Recently a new version of WORLD was released that uses more advanced algorithms based on aperiodicity instead of pulse phase.
Because I am still interested in music technology and accessibility, I continue to work on my MBROLA/VOCALOID replacement, which is now called MBROLOID.