Baidu’s new system can learn to imitate every accent
At the start of this year, Chinese search giant Baidu introduced a new text-to-speech system called DeepVoice. It uses deep learning, a popular artificial intelligence technique, to convert text to speech. The first version was able to produce short sentences that, at least on a cursory listen, were nearly indistinguishable from a real person’s speech. That system could learn only one voice at a time, and required hours of data to master each one.
DeepVoice 2, which debuted in May, could imitate a voice with just half an hour of data, and a single system could learn hundreds of different accents. Today, Baidu is introducing the third and final version of DeepVoice; the company says this version can learn 2,500 voices with just half an hour of data each. Baidu says that “having a system that is able to effectively generate a wide variety of voices opens the door to many use cases that would otherwise not be feasible. For example, each character in an audio book or a video game would have his or her own unique voice for a more enhanced user experience.”