Kōrero Māori - indigenous language revitalisation powered by machine learning
- 👤 Speaker: Keoni Mahelona & Peter-Lucas Jones
- 📅 Date & Time: Tuesday 30 October 2018, 12:00 - 13:00
- 📍 Venue: Department of Engineering - Lecture Room 12
Abstract
Te Reo Irirangi o Te Hiku o Te Ika (Te Hiku Media) is a non-profit organisation whose mission is to preserve and promote te reo Māori, the indigenous language of New Zealand. Over the past 30 years we’ve recorded thousands of hours of the stories of our people, most of whom were native speakers. These stories are rich in culture and traditional knowledge around science, the environment, and traditional Māori medicine. Today, we operate in digital industries, creating technology to help document, conserve, and share the language and knowledge in novel ways. Central to the development of technology and the collection of data is the formalisation of our cultural practices into our Kaitiakitanga License (1). The license sets out how people may access the data we gather. It acknowledges the value of open-source technologies while recognising the impact of colonisation on indigenous peoples’ ability to access those technologies.

This talk will provide insight into the Kōrero Māori (2) project and its progress to date in creating speech-to-text, text-to-speech, and pronunciation tools. We demonstrate how innovation in language revitalisation succeeds when an indigenous organisation leads the corpus collection and technology development. We collected more than 300 hours of labelled corpus in ten days. Using Mozilla’s DeepSpeech (3) project, this enabled us to build an automatic speech recognition (ASR) tool for te reo Māori that achieves a word error rate of 14%. The ASR tool is being used to speed up the transcription of our native speaker archives (4).
(1) https://github.com/tehikumedia/corpora#license-kaitiakitanga
(2) https://koreromaori.com
(3) https://github.com/mozilla/DeepSpeech
(4) https://koreromaori.io
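For readers unfamiliar with the metric quoted in the abstract, word error rate (WER) is conventionally the word-level edit distance between a reference transcript and the recogniser's output, divided by the number of reference words. The sketch below is a minimal illustration of that standard definition, not the evaluation code used by the Kōrero Māori project; the function name `word_error_rate` and the example sentences are assumptions chosen for illustration, and real evaluations usually normalise case and punctuation first.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length.

    Hypothetical helper for illustration; whitespace tokenisation only.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far (before the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr

    return prev[-1] / len(ref)


# Illustrative example: one of four reference words is dropped.
print(word_error_rate("kia ora koutou katoa", "kia ora koutou"))  # 0.25
```

Under this definition, a WER of 14% corresponds to roughly 14 word-level substitutions, deletions, or insertions per 100 reference words.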
Series
This talk is part of the CUED Speech Group Seminars series.