Helley, M.; Khumalo, L.; Steyn, J. & van Zaanen, M.
Helley, M.; Khumalo, L.; Steyn, J. & van Zaanen, M. 2022. Training of Digital Language Resources Skills in South Africa. In Darja Fišer and Andreas Witt (Eds.). CLARIN: The Infrastructure for Language Resources. Deutschland: De Gruyter.
Publication year: 2022

South Africa recognizes eleven official languages, although more languages are spoken in the country. Most of these languages are considered under-resourced: there is only a limited set of computational resources available. This includes linguistic data collections as well as computational linguistic tools. This scarcity of resources limits the computational linguistic and more applied (e.g., digital humanities) work on these languages. However, in South Africa there is currently also a lack of people who know how to use these resources.

The South African Centre for Digital Language Resources (SADiLaR) is a government-funded research infrastructure that aims to tackle both problems. First, it runs a digitization programme, which develops new digital language resources. This programme digitizes analogue linguistic data collections, but also develops new computational linguistic tools. Second, a digital humanities programme aims to build research capacity in the field of digital humanities. This is done through training events, among other initiatives, which have recently been clustered in the SADiLaR-run “Escalator project”. Escalator aims to develop a community of practice in the field of digital humanities. By taking a comprehensive approach to training events with follow-ups, combined with the development of a Champions Initiative programme consisting of the training of experts, Escalator aims to make it easier for researchers to transition into more computational types of research in the humanities and social sciences.

This chapter will provide a historical overview of the field of natural language processing and digital humanities in South Africa. In particular, it will focus on the development of computational linguistic resources and their application. Additionally, an overview of activities in this area performed by SADiLaR will be provided, illustrating information sharing with language communities as well as researchers.

Keywords: linguistic resources, South Africa, digital humanities, training, digital championship programme
