I remember the days of my childhood, when hollow computer and robot voices belonged to science fiction and spacecraft interiors. Nowadays, talking to computers has become part of everyday life and of our living rooms. But do computers understand Yorùbá? And why don’t they understand African languages at all?
Kọ́lá Túbọ̀sún is a linguist, poet and journalist living in Lagos, Nigeria. He regularly publishes on Yorùbá culture, Nigerian literature and language. He runs several websites, including a database of Yorùbá names and his personal blog. He is currently developing a revolutionary “Text-to-Speech” (TTS) application for his mother tongue. We talked about the opportunities this new application offers Yorùbá-speaking people at home in Nigeria and Benin and in the diaspora abroad. Get a behind-the-scenes look at the linguist’s work!
Kọ́lá, what does “Text-to-Speech” mean exactly?
The technical name is “speech synthesis”. It’s a way that computers have been trained to convert text input into simulated speech. You may have seen it with Siri, Amazon Echo, Microsoft Cortana or Google Home.
You are working on the first “Text-to-Speech” application for the Yorùbá language. It is incredible that this has not been developed yet, isn’t it? Aren’t there around 30 million of you? Is it a failure of language politics that African languages have not yet been prepared for use by machines?
It’s possibly not the first that has been researched or developed, but it is the first I have seen made available to use online and for free. A while ago I wrote a quasi-angry essay pointing out that Siri exists in Danish, Swedish and Norwegian, three languages that together have something like 15 million speakers. Yorùbá has 30 million speakers, and no one had deemed it fit to create something of this nature.
It is not just about Yorùbá, of course. African languages have generally not been considered relevant or profitable enough to carry along with the progress of technology. It began with something as benign as Unicode not having the right characters to write Yorùbá properly (along with countless other African languages that use special characters), or Microsoft Word not recognizing African names as real. You must have encountered those red wriggly lines under a name the computer doesn’t recognize, even when you’re using the computer in a country where that language is spoken. In essence, the damage is not new.
But my focus, and that of my team, has been how to get the African technological, linguistic and literary establishment to prioritize efforts to increase the viability of African languages. I’m not alone. I am familiar with the work of organizations like ALT-I in Ìbàdàn, which share similar goals. But such efforts have been few and far between because of a lack of resources or of a critical mass of interested engineers and linguists. My hope has always been to lead that march where possible, through advocacy and direct action, but also to challenge the ecosystem to pay more attention to these often neglected dimensions of our underdevelopment. You can almost bet today that any new artificial intelligence or speech technology product launched by a big corporation doesn’t include an African language. That is not encouraging. Nor is the idea that we have to wait for them before anything can change.
But why is this even the case? Why is a continent of about 1.2 billion people not worth catering to? How many people on that continent speak English as a native language? Google says just 6.5 million. In every African country, the majority actually speak a different language, and English is the language of the school-educated. This means that artificial intelligence and technological applications created to cater to European languages exclude countless people by default. This can’t be sustainable unless, of course, we mean that this future is being designed exclusively to exclude.
Which new opportunities would a Yorùbá TTS app offer? Where could it be used?
The more I think and talk about the prospects, the more I realize there are more ways than I can list. But I usually use my grandfather, who is about 90 years old now, as an example. He can read and write in Yorùbá but can’t read or write a word of English. He has a mobile phone, which he can’t use for anything other than receiving and making calls, usually by guessing his way around its functions. If he could dictate text messages in Yorùbá and have his texts read out to him in Yorùbá, he would be won over to the value of technology. There are millions more like him around the continent who are being left behind because we assume that English should be good enough. It isn’t. It’s the same problem with ATMs. You can’t find an ATM in Nigeria today that can be used in a Nigerian language. This means that millions of people who would otherwise put their money in banks don’t, because they can’t trust the machine to give them their money when they need it. They are excluded. That is a big shame for a country of over 400 languages.
Other uses I can think of off the top of my head include applications for disabled people, such as apps that read text messages aloud to blind people, or a type of Siri that can understand spoken instructions in Yorùbá, Edo, Ibani or Igbo. This is where the world is going anyway. Why leave ourselves behind?
You also offer a free Yorùbá keyboard for download on your blog and run the website www.yorubaname.com, where we can look up the meanings behind traditional names. You have included Lukumí names in this list and recently travelled to Brazil. What do you have in mind when you think about the Yorùbá diaspora and the TTS?
This first stage of the TTS project is going to be most useful for people who already know how to tone-mark Yorùbá words, or who are willing to learn by trial and error. The next update will cater to those who know Yorùbá words but can’t write them properly with tone marks. I'm talking about automatic diacritic insertion, which we have already started working on. We need to make the application usable by every kind of user, for instance by offering suggestions that disambiguate the text whenever it is entered: “Did you mean ajá or àjà?” “Did you mean owó or òwò or ọ̀wọ̀, or ọwọ̀?” Right now, your input has to be absolutely correct, or the output will be disastrous.
As for the Yorùbá diaspora, we see them as crucial to the survival of Yorùbá (language and religion) in this century. They have shown us what happens when passion meets interest and need. So we hope our work can carry them along with whatever we create. We now have the YorubaName site translated into Yorùbá. We will launch the Portuguese, Spanish and French versions in 2018. We hope to make every one of our projects translatable into these diaspora languages so as to reach even more people.
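One simple way to produce suggestions like these, offered here only as an illustrative sketch and not as the team's method, is to index a tone-marked word list by its unmarked spelling: strip every tone mark and underdot from the input and return all dictionary words that share that bare form. The six-word LEXICON below is a hypothetical stand-in built from the examples above, and strip_marks, build_index and suggest are illustrative names; a real diacritic restorer would need a large lexicon and, ideally, sentence context to rank the candidates.

```python
import unicodedata
from collections import defaultdict


def strip_marks(word: str) -> str:
    """Remove tone marks and underdots: decompose to NFD, drop combining characters."""
    decomposed = unicodedata.normalize("NFD", word)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def build_index(lexicon: list[str]) -> dict[str, list[str]]:
    """Map each bare (unmarked) spelling to every tone-marked word that reduces to it."""
    index: dict[str, list[str]] = defaultdict(list)
    for word in lexicon:
        index[strip_marks(word)].append(unicodedata.normalize("NFC", word))
    return index


# Hypothetical mini-lexicon, taken from the examples in the interview.
LEXICON = ["ajá", "àjà", "owó", "òwò", "ọ̀wọ̀", "ọwọ̀"]
INDEX = build_index(LEXICON)


def suggest(typed: str) -> list[str]:
    """Return tone-marked candidates for text typed with or without diacritics."""
    return INDEX.get(strip_marks(typed.lower()), [])


print(suggest("aja"))  # ['ajá', 'àjà']
print(suggest("owo"))
```

Because the lookup key ignores all marks, a partially marked input such as ọwọ returns the same four candidates as owo, which is what a "Did you mean…?" prompt needs.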
You have launched the website http://www.ttsyoruba.com, where anyone can enter text, listen to it in Yorùbá, or share it on social media. As we know, Yorùbá is a tonal language. Is it difficult for a computer to interpret a tone language?
What was difficult was writing out the rules for the application, which ran to about 100 pages and over 3000 lines of linguistic rules and examples. Then the developer took it from there. As you know, once you give the computer the rules, things get easier. Our next challenge is automatic speech recognition, which should be equally interesting. It’s one thing to train the computer to convert text into speech. It’s another to have it transcribe spoken words, in this case Yorùbá with all its tone patterns, into text. When we solve that, we can have our own Siri (or Àdúkẹ́, which I think is a finer name).
It does not sound like fluid speech. Why? Will this be improved?
No, it’s not fluid speech. This is because the application was built by concatenating syllables. A future update should operate at the word or phrase level, which is how humans normally speak. This is just the beginning.
Can you tell us about the process of developing the TTS? When you were talking about your microphone, it was like someone talking about his Lamborghini! Do you need a lot of technical equipment or software?
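To make the idea of syllable concatenation concrete, here is a minimal sketch in Python, assuming each recorded syllable is stored as a WAV clip and that all clips share the same channel count, sample width and sample rate. The function concatenate_wavs and the silent_clip helper (which stands in for real recordings) are hypothetical names. The sketch simply places clips end to end with no crossfading or pitch smoothing, which illustrates why output built purely from syllable joins can sound choppy rather than fluid.

```python
import io
import wave


def concatenate_wavs(segments: list[bytes]) -> bytes:
    """Join WAV clips (as bytes) end to end into one WAV; no crossfade or smoothing."""
    if not segments:
        raise ValueError("need at least one segment")
    out_buf = io.BytesIO()
    with wave.open(out_buf, "wb") as out:
        first_format = None
        for data in segments:
            with wave.open(io.BytesIO(data), "rb") as clip:
                fmt = (clip.getnchannels(), clip.getsampwidth(), clip.getframerate())
                if first_format is None:
                    # The first clip fixes the output format.
                    first_format = fmt
                    out.setnchannels(fmt[0])
                    out.setsampwidth(fmt[1])
                    out.setframerate(fmt[2])
                elif fmt != first_format:
                    raise ValueError("all clips must share channels, sample width and rate")
                out.writeframes(clip.readframes(clip.getnframes()))
    return out_buf.getvalue()


def silent_clip(n_frames: int, rate: int = 16000) -> bytes:
    """Make a mono 16-bit silent WAV clip, standing in for a recorded syllable."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as w:
        w.setnchannels(1)
        w.setsampwidth(2)
        w.setframerate(rate)
        w.writeframes(b"\x00\x00" * n_frames)
    return buf.getvalue()


joined = concatenate_wavs([silent_clip(1600), silent_clip(2400)])
with wave.open(io.BytesIO(joined), "rb") as result:
    print(result.getnframes())  # 4000
```

Word- or phrase-level synthesis, as planned for a future update, avoids many of these audible seams by recording or modelling longer stretches of speech at once.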
Lol. When we make the code open source, you will see. I’m already working on a paper about the process. But the method we deployed here was quite rudimentary, very manual. It involved putting all the possible syllables in Yorùbá, with all their possible phonetic and phonemic combinations, into a pool, and then writing hundreds of phonological rules for combining them. It’s that simple. There are more modern and more complex ways to achieve a more natural-sounding result, and that’s the focus of the next phase of the project. For instance, other natural language processing approaches leave the computer to figure out the rules by itself and use what it has learnt to create new outcomes. It’s not easy to do, especially for African languages, or we’d have seen it done already. But who says things have to be easy in order to be embarked on?
I used an AT2020+ microphone, which we bought in 2015 after we raised money for the YorubaName.com dictionary. It has served us well so far, but it cost a fortune.
What do you record? I guess you can’t record single letters or sounds, but rather something like phrases? How does it work?
You record syllable segments. Yorùbá is a syllable-based tone language. So you record all the syllables until you exhaust the list and end up with hundreds of sound segments. Then you eliminate the sounds that aren't used in Yorùbá speech. And then you write the rules, which is the hardest part, since they sometimes clash with each other, especially the nasals. I think the fact that Yorùbá is a tone language makes it hard to approach the work any other way than at the syllable level. But I am open to learning more.
Who are you working with on the programming and technical solutions?
The Python code we relied on was written by Adédayọ̀ Olúòkun, whom I met on LinkedIn. She's currently a PhD student in Natural Language Processing at Université de Lorraine in France. The site was designed by Hafiz Adewuyi. The head developer at YorubaName, Dadépọ̀ Adérẹ̀mí, helped find a solution to our Unicode normalisation problem, which almost derailed the work at the end. Apparently, diacritics aren't all created equal. The efforts of these guys contributed to the success of the application.
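Before recorded syllables can be looked up and joined, tone-marked text has to be split into syllables. The sketch below is a simplified, hypothetical syllabifier, not the project's rule set: it assumes only the syllable shapes (C)V, (C)V followed by a nasalising n, and syllabic n or m, and it ignores many cases a full rule set would have to handle, such as elision and loanwords. It decomposes the text to NFD so that tone marks and underdots can be matched as separate combining characters; syllabify and the pattern names are illustrative.

```python
import re
import unicodedata

TONE = "[\u0300\u0301\u0304]"   # combining grave (low), acute (high), macron (mid)
DOT = "\u0323"                   # combining dot below, as in ẹ, ọ, ṣ
ONSET = f"(?:gb|s{DOT}?|[bdfghjklmnprtwy])"
VOWEL = f"[aeiou]{DOT}?{TONE}?"  # NFD places the dot below before the tone mark
CODA = f"n(?![aeiou]|{TONE})"    # nasalising n: not followed by a vowel or tone mark
SYLLABLE = re.compile(f"{ONSET}?{VOWEL}(?:{CODA})?|[nm]{TONE}?")


def syllabify(text: str) -> list[str]:
    """Split tone-marked Yorùbá text into syllables (simplified, hypothetical rules)."""
    decomposed = unicodedata.normalize("NFD", text.lower())
    return [unicodedata.normalize("NFC", s) for s in SYLLABLE.findall(decomposed)]


print(syllabify("Ìbàdàn"))  # ['ì', 'bà', 'dàn']
print(syllabify("nǹkan"))   # ['n', 'ǹ', 'kan']
```

Each returned syllable, such as dàn, could then serve as the key for looking up the matching recorded clip. The nasal cases (a coda n versus a syllabic n) are exactly where simple rules like these start to interact, which echoes the clashes described above.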
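The interview doesn't say exactly what the normalisation problem was, but a common version of it looks like this: the same Yorùbá letter can be stored as different sequences of Unicode code points, so strings that look identical fail to compare equal (or to match a recorded syllable's key). The sketch below uses only Python's standard unicodedata module to show three encodings of ọ́ (o with dot below and acute accent) that collapse to one after NFC normalisation; same_text is an illustrative helper name. Normalising all input and all lookup keys to a single form is the standard fix.

```python
import unicodedata

# Three different code-point sequences for the same letter ọ́ (o + dot below + acute).
variants = [
    "\u1ecd\u0301",   # precomposed ọ, then combining acute
    "\u00f3\u0323",   # precomposed ó, then combining dot below
    "o\u0323\u0301",  # plain o, combining dot below, combining acute
]

print(len(set(variants)))  # 3: as raw strings they all differ
print(len({unicodedata.normalize("NFC", v) for v in variants}))  # 1: identical after NFC


def same_text(a: str, b: str) -> bool:
    """Compare two strings as a reader would, after NFC normalisation."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


print(same_text("ọwọ́", "o\u0323wo\u0323\u0301"))  # True
```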
How is the project financed?
We raised about $1600 through crowdfunding (Indiegogo) earlier in 2017 for the TTS project. We also have a lot of volunteers on the team who give a lot of their free time, either unpaid or for small stipends. In late 2017, a kind couple who have used our work in the past reached out to us with a commitment. On January 1, 2018, they sent us $8000 to support our work on a multimedia dictionary of Yorùbá. They have chosen to remain anonymous. It is gestures like this that keep us afloat. We are still looking for more funds, of course, and we’ve got commitments from a few more people. So anyone interested in supporting us can donate through the PayPal button on TTSYoruba.com or just reach out directly.
How can people contact you if they are interested in your new application?
Sending an email to email@example.com will always get you to us.
Thank you for this interview!
Read an older interview about Yorùbá culture in general with Kọ́lá Túbọ̀sún
Get the free Yorùbá Melody Audio Course spoken by Kọ́lá Túbọ̀sún (English, Spanish, Portuguese)