
Clunky, robotic TTS parallels early drum machines and sequencers, which applied various algorithms to sound more "human"; TTS systems can now add breath sounds and stammering to the same end. There are interesting possibilities in the "effects" that can be added, such as prosody, which can range from tone inflection right up to rap!

Robotic voices are always annoying, but robotic music will always have artistic value. (You don't necessarily want to remove the machine from music.) What is more interesting is the feedback loop between dictated text and its synthesized version.

WaveNet: A Generative Model for Raw Audio | DeepMind
