Dublin startup slashes cost of voice generation technology
VOYSIS, a Dublin-based startup, said that it has shrunk the processing power required to run cutting-edge Wavenet voice generation technology so that it can work on a mobile phone or other consumer device even without a connection to the Internet.
The company, which will begin selling the system to developers and manufacturers from Thursday, said that the advances it has made will make it easier and less expensive to create chatbots and digital assistants with realistic-sounding synthesised human voices.
The market for text-to-speech applications is forecast to grow to more than US$3 billion by 2022, up from US$1.3 billion in 2016, according to data compiled by Ireland-based Research and Markets. Sales of digital assistants, many of which will incorporate computer-generated voices, are expected to hit US$4 billion by the same year, according to Colorado-based market intelligence firm Tractica.
DeepMind, the London-based artificial intelligence (AI) company owned by Alphabet Inc, first developed Wavenets, an AI-based method for creating more human-like computer speech, in 2016. The method has since found its way into the digital assistant and cloud-computing offerings of DeepMind's sister company Google, which uses the technique to create 30 distinct voices in 14 different languages. Before Wavenets, voice synthesisers worked primarily by combining individual syllables, either voiced by a human actor or generated digitally, to create words.
Wavenets, by contrast, are trained on sound waves instead of syllables. They learn to accurately predict how the shape of the wave will change, making a new prediction as frequently as 24,000 times per second. The result is much more realistic-sounding speech, which can include nuances such as accents, lip smacks and verbal tics.
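Real Wavenets are deep neural networks trained on raw audio; as a rough illustration of the sample-by-sample idea described above, the toy sketch below uses a fixed linear predictor (a hypothetical stand-in, not DeepMind's or Voysis' actual model) to show how each new sample is generated from the ones before it and fed back in as input:

```python
# Toy sketch of sample-level autoregressive generation, the idea behind
# WaveNet-style synthesis: each audio sample is predicted from the
# samples that precede it, and the prediction is fed back as input for
# the next step. A real WaveNet uses a deep network of dilated
# convolutions; a fixed linear predictor stands in for it here.

def predict_next(history, weights):
    """Predict the next sample as a weighted sum of the most recent ones."""
    context = history[-len(weights):]
    return sum(w * s for w, s in zip(weights, context))

def generate(seed, weights, n_samples):
    """Autoregressively extend the waveform one sample at a time."""
    samples = list(seed)
    for _ in range(n_samples):
        samples.append(predict_next(samples, weights))
    return samples

# With these weights the recurrence x[n] = x[n-1] - x[n-2] produces a
# simple oscillation; at 24,000 predictions per second, one second of
# speech would take 24,000 such steps.
wave = generate(seed=[0.0, 1.0], weights=[-1.0, 1.0], n_samples=6)
print(wave)  # [0.0, 1.0, 1.0, 0.0, -1.0, -1.0, 0.0, 1.0]
```

The feedback loop is what makes the technique expensive: every sample depends on the previous output, so the work cannot be trivially parallelised, which is why shrinking it onto a phone is notable.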
But despite the improvements that Google has made to reduce the amount of computing power that this technique uses, it still requires a consistent connection to Google's data centres, where the company uses specialised computer processors designed specifically for AI applications.
Baidu, the Chinese Internet giant, has also experimented with using cloud-based Wavenets, but has not yet put them into its products because of the amount of processing power that the technique requires.
Peter Cahill, a former academic researcher in computer-generated speech who co-founded Voysis in 2012, said that his company had managed to shrink its system to the point where, once the AI is trained, the software uses as little as 25 megabytes of memory - about the same size as four Apple Music or Spotify songs.
Mr Cahill said that the company intends to publish an academic research paper on its technology, including benchmark tests of its performance against other voice synthesisers, including cloud-based Wavenet systems, within the next six weeks.
Simon King, a professor who specialises in speech technology at the University of Edinburgh but is not affiliated with Voysis, said that the kinds of efficiency improvements that Voysis is announcing could spur more companies to adopt Wavenets for computer speech.
"It's likely to become the de facto approach used in commercial applications very soon," he said in a statement. "It provides more natural-sounding speech than all previous technologies."
In addition to Mr Cahill, Voysis' team includes Ian Hodson, a veteran software engineer who headed Google's text-to-speech efforts from 2010 to 2016.
Google had acquired Phonetic Arts, a speech synthesis company that Mr Hodson helped found. He had earlier sold another voice synthesis company, Rhetorical Systems, to Nuance Communications in 2004.
Voysis currently employs 35 people in Dublin, Edinburgh and Boston, and has received US$8 million in venture funding from Boston-based Polaris Venture Partners.
The company sells a suite of voice and natural-language processing services, and said that it has several large US consumer companies as existing customers, but declined to name any, citing confidentiality agreements. BLOOMBERG