The last few months have been full of activity at Lanfrica, and we are happy to announce that Lanfrica has been officially launched.
What is Lanfrica?
Lanfrica aims to mitigate the difficulty encountered when seeking African language resources by creating a centralized, language-first catalog.
For instance, if you're looking for resources such as linguistic datasets or research papers in a particular African language, Lanfrica will point you to sources on the web with resources in the desired language. If those resources do not exist, we adopt a participatory approach by allowing you to contribute papers or datasets.
At Lanfrica, we employ a language-focused approach. With 2,199 African languages accounted for, our language section boasts of all the African languages—yes, all of them, including the extinct ones! We have created algorithms that can tell, with much effectiveness, the African language(s) involved in a resource, enabling us to curate even works that do not explicitly specify the African languages they worked on (and there are many).
Lanfrica offers enormous potential for better discoverability and representation of African languages on the web. Lanfrica can provide useful statistics on the progress of African languages. As a simple illustration, the language filter section offers an immediate overview of the number of existing natural language processing (NLP) resources for each African language.
From this search result, you can easily see that among South African languages, Afrikaans has 28 NLP resources, while Swati has just eight. Or, to take another example, the Gbe cluster languages of Benin have far fewer NLP resources than some of the South African languages.
Such insight can lead to better allocation of funds and efforts towards bringing the more under-researched languages forward in NLP, thereby fostering the equal progress of African languages.
Lanfrica v1 is just the beginning. We have major updates coming up in the future:
We plan to enable our users to sign up and add to or edit the resources on Lanfrica.
Our current resources currently consist of NLP datasets. Next, we plan to work on publications in computational linguistics and linguistic publications. See the infographic above for all the types of resources planned for inclusion.
We are exploring various techniques to simplify the process through which relevant resources are identified and connected to Lanfrica.
For more updates as we move forward, become part of the Lanfrica community by joining our Slack or following us on Twitter.
This article originally appeared on the Lanfrica blog and is republished with permission.