| Abstract: |
Numbers are an essential part of communication, yet machine translation models often
struggle to translate them accurately, especially in low-resource languages. For Yorùbá,
a language spoken by over 40 million people, no published work has addressed
the computational translation of numerals into English or Arabic digits. This
paper presents a Transformer-based model for the conversion of Yorùbá cardinal
numerals to their English equivalents. The research fine-tuned the Flan-T5
'small' model, chosen for its computational efficiency and adaptability across text-to-text
tasks. A dataset of 50,000 Yorùbá numerals was systematically generated using a
rule-based algorithm and partitioned into 80% training and 20% testing sets. The
model was computationally efficient, running inference at 30 samples per second.
Performance was evaluated using four metrics:
accuracy, Character Error Rate (CER), Word Error Rate (WER), and Bilingual
Evaluation Understudy (BLEU) score. The fine-tuned model achieved exceptional
results: 99.96% accuracy, 0.000068 CER, 0.00010 WER, and 0.9999 BLEU. The
near-perfect accuracy confirms the model's ability to correctly translate the vast
majority of Yorùbá numerals. The extremely low CER reflects precise character-level
generation, while the minimal WER indicates outstanding performance in
predicting complete numeral words, essential for accurate translation. The BLEU
score approaching 1.0 demonstrates that model outputs are nearly identical to
reference translations, further validating translation fidelity. This work constitutes
the first computational model for Yorùbá-to-English numeral translation, achieving
state-of-the-art performance. The model is readily applicable to downstream NLP
tasks, particularly text normalisation in text-to-speech systems, thereby
contributing to language technology development for Yorùbá and serving as a
template for similar low-resource languages.
|
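To make the reported metrics concrete, the following is a minimal sketch of how exact-match accuracy, CER, and WER are commonly computed from reference and predicted strings. It is not the authors' evaluation code: the function names (`edit_distance`, `cer`, `wer`, `accuracy`), the whitespace tokenisation for WER, and the corpus-level aggregation (total edits divided by total reference length) are all assumptions made for illustration.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (strings or token lists)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def cer(refs, hyps):
    """Character Error Rate: character edits / total reference characters."""
    errors = sum(edit_distance(r, h) for r, h in zip(refs, hyps))
    return errors / sum(len(r) for r in refs)


def wer(refs, hyps):
    """Word Error Rate: word edits / total reference words.

    Assumes whitespace tokenisation (an illustrative choice).
    """
    errors = sum(edit_distance(r.split(), h.split()) for r, h in zip(refs, hyps))
    return errors / sum(len(r.split()) for r in refs)


def accuracy(refs, hyps):
    """Fraction of predictions that match the reference exactly."""
    return sum(r == h for r, h in zip(refs, hyps)) / len(refs)
```

Under these definitions, a single wrong word in a two-word reference gives a WER of 0.5, while the CER depends on how many characters differ, which is why CER is typically lower than WER on the same outputs.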