Conference Paper/Proceeding/Abstract
Spoken Document Retrieval for an Unwritten Language: A Case Study on Gormati
Sanjay Booshanam, Kelly Chen, Ondrej Klejch, Thomas Reitmaier, Dani Kalarikalayil Raju, Electra Wallington, Nina Markl, Jen Pearson, Matt Jones, Simon Robinson, Peter Bell
Findings of the Association for Computational Linguistics: EMNLP 2025, Pages: 22497 - 22509
Swansea University Authors: Thomas Reitmaier, Jen Pearson, Matt Jones, Simon Robinson
| Published in: | Findings of the Association for Computational Linguistics: EMNLP 2025 |
|---|---|
| ISBN: | 979-8-89176-335-7 |
| Published: | Suzhou, China: Association for Computational Linguistics, 2025 |
| Online Access: | https://aclanthology.org/2025.findings-emnlp.1224/ |
| URI: | https://cronfa.swan.ac.uk/Record/cronfa70213 |
| Abstract: | Speakers of unwritten languages have the potential to benefit from speech-based automatic information retrieval systems. This paper proposes a speech embedding technique that facilitates such a system that can be used in a zero-shot manner on the target language. After conducting development experiments on several written Indic languages, we evaluate our method on a corpus of Gormati, an unwritten language, that was previously collected in partnership with an agrarian Banjara community in Maharashtra State, India, specifically for the purposes of information retrieval. Our system achieves a Top 5 retrieval rate of 87.9% on this data, giving hope that it may be usable by unwritten language speakers worldwide. |
| College: | Faculty of Science and Engineering |
| Start Page: | 22497 |
| End Page: | 22509 |
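
The Top 5 retrieval rate quoted in the abstract can be illustrated with a minimal sketch. This record does not describe the paper's embedding method or scoring function, so everything below is an assumption made for illustration: the function names, the toy two-dimensional vectors, the use of cosine similarity, and the convention of one relevant document per query are all hypothetical and are not taken from the paper.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def top_k_retrieval_rate(query_vecs, doc_vecs, relevant_doc, k=5):
    """Fraction of queries whose relevant document is among the k best matches.

    Assumes exactly one relevant document index per query (an illustrative
    convention, not necessarily the paper's evaluation protocol).
    """
    hits = 0
    for query, target in zip(query_vecs, relevant_doc):
        # Rank document indices from most to least similar to the query.
        ranked = sorted(
            range(len(doc_vecs)),
            key=lambda i: cosine(query, doc_vecs[i]),
            reverse=True,
        )
        if target in ranked[:k]:
            hits += 1
    return hits / len(query_vecs)


# Toy stand-ins for speech embeddings (hypothetical values).
docs = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
queries = [[0.9, 0.1], [0.1, 0.9]]
relevant = [0, 2]  # index of the relevant document for each query

rate_at_1 = top_k_retrieval_rate(queries, docs, relevant, k=1)
rate_at_2 = top_k_retrieval_rate(queries, docs, relevant, k=2)
```

In this toy setup the second query's relevant document ranks second, so it counts as a miss at k=1 and a hit at k=2. A Top 5 rate is the same computation with k=5 over a real set of spoken queries and documents.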

