Common Voice

While the name sounds like something from the era of Modernism and labor unions, this Mozilla project is really something I'd imagined when thinking about languages and computers. Basically, it is an ever-growing set of CC0-licensed texts and recordings of human voices in many of the earth's languages. It also aims to counter the existing bias of speech recognition toward middle-aged male English speakers with a more balanced approach.

I don't expect to benefit from the dataset and the DeepSpeech algorithm per se, since I'm a caveman, but I sure appreciate having speech recognition software that is simultaneously free, good and multilingual! And as a young linguist, I hope this data can later be refined, republished as language corpora, and integrated with Tatoeba, jisho.org and the like. Speaking of Japanese: it's still missing! I hope I'll be able to promote the project during my studies in Japan.

I hope to cover this project more in the future, but I strongly encourage you to participate, especially if your mother tongue is not English, if you have any peculiarities in your speech, or if you speak a second language at an average level or better. Don't be shy about your accent or voice: it should be recorded and heard, so that you will have one more way to control and interact with machines! Many may dismiss voice recognition as something not meant for them, but it should still be recognized as a valuable accessibility tool.

I think today's voice technologies are lagging behind their possibilities: just as human speech should be intelligible to machines, so in due time the voice of the machine should become more beautiful and easier to produce. (After all, Vocaloids may give many a sense of direction.)

And some comedy: in the Russian Common Voice, most of the sentences I've checked were taken from UN sessions on nuclear weapons and now-obscure Soviet children's books. That's both absurd and fun. While I'm somewhat displeased that examples of my language are entangled in politics, it still gives me a funny feeling of kinship with the speakers. It feels just like some young adult fiction about world peace, and judging by the sound of them, many of the contributors share that feeling too! I hope your languages also have something unusual or recurring in their corpora.

Feedback is welcome.