Manding Language Tech Resources and Initiatives

 

This is a running list of tech resources and initiatives related to Manding language varieties like Bambara, Jula, Maninka and Mandinka.

It is NOT a list of resources for learning Manding varieties (or at least, they aren’t specifically meant for learning). For that, please see the General Resources page.

I made it as a simple way to keep track of all the projects, resources and initiatives that I keep coming across.

Corpora

Apps/Programs

  • Bambara Reference Corpus (Fr. Corpus Bambara de Référence) — A web-based text analysis or search engine app that lets you search across a huge collection of texts written in Latin-based Bambara. See this page of mine for some tips in English about how to use it.

  • Maninka Reference Corpus (Fr. Corpus Maninka de Référence) — A web-based text analysis and search engine app that lets you search across collections of Maninka texts written in both Latin-based and N’ko-based orthographies.

  • N’ko Corpus — Another searchable corpus app. This one is N’ko only and distinct from the Maninka Reference Corpus listed above.

Machine Translation

Apps/Programs

  • Google Translate — Machine translation service that can handle Latin-based Bambara. See this page for my write-up, which includes links and a testing and review video. In June 2024, they added N’ko (which is Manding [typically closest to Maninka] written in the N’ko script) and Jula (which they call “Dyula”). I have yet to test these.

  • Bambara MT — A web-based app that is designed to “Translate text to Bambara and convert it to speech with options to enhance audio quality.” It relies on several machine learning models and datasets, which are listed on Aboubacar Ouattara’s HuggingFace profile (and some of which also appear elsewhere on this page).

Text-to-Speech (TTS)

Data Sets

  • bambara-tts — A dataset of scraped Bambara language texts such as the dialogs from José Morales’s J’apprends le bambara and Charles Bailleul’s collection of Bambara proverbs.

Automatic Speech Recognition (ASR)

Apps/Programs

  • Bambara ASR — A web-based demo app to which you can upload Bambara language audio files and get a rough transcription.

  • Kouma Bi Boro [sic, Kuma b’i boro] — Android-based app for recognizing Jula as spoken in Côte d’Ivoire.

  • Open Moise — A software initiative based out of Abidjan that is focused on creating smartphones that can recognize Jula and other local languages of Côte d’Ivoire. For more information, see this video profile that I made of one of the developers behind the project.

Data Sets

  • Jeli-ASR — A dataset of 30 hours of recorded, transcribed and translated Bambara language storytelling, plus an ASR model. See this write-up for my assessment of the dataset/corpus.

  • bambara-asr — A dataset. Appears to use the data from Jeli-ASR (and perhaps other sources).

  • nicolingua — Includes Maninka language data from Guinea.

Articles

  • Deploying a Speech Recognition Model for Under-Resourced Languages: A Case Study on Dioula Wake Words 1, 2, 3, and 4 [LINK]

Misc

Nothing here yet