
English to Yoruba Translation using RNN


A beginner's approach to Sequence to Sequence modeling


Table of Contents

Overview
Dataset
Model
Result

Overview

This project is a beginner's approach to Sequence to Sequence (Seq2Seq) modeling with Recurrent Neural Networks (RNNs), specifically LSTMs. My aim here is to understand the foundations of Seq2Seq modeling and to progressively build my understanding of the NLP workflow.

Some of my past projects on natural language processing include:

In this project, I implemented a Seq2Seq model using an RNN (LSTM). The approach is deliberately simple: a basic word-level tokenizer, with no subword techniques such as Byte Pair Encoding (BPE). The goal is to get familiar with the typical Seq2Seq modeling workflow rather than to chase state-of-the-art results.
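For illustration, here is a minimal sketch of what such a word-level pipeline can look like. The helper names (`tokenize`, `build_vocab`, `numericalize`) and the special tokens are illustrative assumptions, not the notebook's exact code:

```python
# Minimal word-level tokenization sketch (hypothetical helper names).
from collections import Counter

def tokenize(sentence: str) -> list[str]:
    # Lowercase and split on whitespace -- no subword methods like BPE.
    return sentence.lower().strip().split()

def build_vocab(sentences, min_freq: int = 2) -> dict[str, int]:
    # Reserve ids for the special tokens a Seq2Seq model needs.
    vocab = {"<pad>": 0, "<sos>": 1, "<eos>": 2, "<unk>": 3}
    counts = Counter(tok for s in sentences for tok in tokenize(s))
    for tok, freq in counts.items():
        if freq >= min_freq:
            vocab[tok] = len(vocab)
    return vocab

def numericalize(sentence: str, vocab: dict[str, int]) -> list[int]:
    # Wrap each sentence in <sos>/<eos> and map unknown words to <unk>.
    unk = vocab["<unk>"]
    ids = [vocab.get(tok, unk) for tok in tokenize(sentence)]
    return [vocab["<sos>"]] + ids + [vocab["<eos>"]]
```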

Dataset

The dataset used for this project was obtained from the Zindi AI4D Yoruba Machine Translation Challenge. It consists of 10,000 Yoruba to English parallel sentence pairs.
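Loading the data is straightforward with pandas. A hedged sketch follows; the file name (`Train.csv`) and column names (`Yoruba`, `English`) are assumptions and may differ from the actual Zindi download:

```python
# Hypothetical loading sketch; file and column names are assumptions,
# not verified against the actual Zindi download.
import pandas as pd

df = pd.read_csv("Train.csv")                   # assumed file name
pairs = list(zip(df["Yoruba"], df["English"]))  # assumed column names
print(f"{len(pairs)} parallel sentence pairs")  # expect ~10,000
```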

Model

In the notebook, I followed a tutorial by Bentrevett based on the paper Sequence to Sequence Learning with Neural Networks (Sutskever et al., 2014). While the original paper uses a 4-layer architecture, the tutorial uses a simpler 2-layer setup for both the encoder and the decoder, focusing on building a fundamental understanding of how Seq2Seq models work. Training was done for 10 epochs.
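As a rough picture of that setup, here is a condensed PyTorch sketch of a 2-layer LSTM encoder and decoder in the spirit of the tutorial. The class layout follows Bentrevett's structure, but the hyperparameters (embedding size, hidden size, dropout) are illustrative, not the notebook's exact values:

```python
import torch
import torch.nn as nn

class Encoder(nn.Module):
    def __init__(self, input_dim, emb_dim, hid_dim, n_layers=2, dropout=0.5):
        super().__init__()
        self.embedding = nn.Embedding(input_dim, emb_dim)
        self.rnn = nn.LSTM(emb_dim, hid_dim, n_layers, dropout=dropout)
        self.dropout = nn.Dropout(dropout)

    def forward(self, src):
        # src: [src_len, batch] of token ids
        embedded = self.dropout(self.embedding(src))
        _, (hidden, cell) = self.rnn(embedded)
        # hidden/cell: [n_layers, batch, hid_dim] -- the context handed to the decoder
        return hidden, cell

class Decoder(nn.Module):
    def __init__(self, output_dim, emb_dim, hid_dim, n_layers=2, dropout=0.5):
        super().__init__()
        self.embedding = nn.Embedding(output_dim, emb_dim)
        self.rnn = nn.LSTM(emb_dim, hid_dim, n_layers, dropout=dropout)
        self.fc_out = nn.Linear(hid_dim, output_dim)
        self.dropout = nn.Dropout(dropout)

    def forward(self, trg_token, hidden, cell):
        # trg_token: [batch] -- the decoder processes one target token at a time
        embedded = self.dropout(self.embedding(trg_token.unsqueeze(0)))
        output, (hidden, cell) = self.rnn(embedded, (hidden, cell))
        prediction = self.fc_out(output.squeeze(0))  # [batch, output_dim]
        return prediction, hidden, cell
```

The encoder compresses the entire source sentence into its final hidden and cell states, which become the decoder's initial states; this fixed-size context is the main bottleneck of attention-free Seq2Seq models.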

Result

Given the limited dataset size and the simple model architecture, I was not expecting much in terms of translation quality. However, this project helped me learn how to work with Seq2Seq models end to end: preparing the data, building the encoder and decoder, and training and evaluating the model. In the future, I aim to explore techniques such as pre-trained word embeddings and different tokenization methods.
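For completeness, here is a hedged sketch of how those pieces fit together at training time, using teacher forcing as in the tutorial. It builds on the Encoder/Decoder sketch above; the 0.5 teacher-forcing ratio is an assumption:

```python
import random
import torch
import torch.nn as nn

class Seq2Seq(nn.Module):
    # Builds on the Encoder/Decoder sketch above.
    def __init__(self, encoder, decoder, device):
        super().__init__()
        self.encoder = encoder
        self.decoder = decoder
        self.device = device

    def forward(self, src, trg, teacher_forcing_ratio=0.5):
        # src: [src_len, batch], trg: [trg_len, batch]
        trg_len, batch_size = trg.shape
        output_dim = self.decoder.fc_out.out_features
        outputs = torch.zeros(trg_len, batch_size, output_dim, device=self.device)
        hidden, cell = self.encoder(src)  # encode source into context vectors
        inp = trg[0]                      # first decoder input is the <sos> token
        for t in range(1, trg_len):
            prediction, hidden, cell = self.decoder(inp, hidden, cell)
            outputs[t] = prediction
            # Feed the gold token (teacher forcing) or the model's own guess.
            teacher_force = random.random() < teacher_forcing_ratio
            inp = trg[t] if teacher_force else prediction.argmax(1)
        return outputs
```

At evaluation time the teacher-forcing ratio is set to 0, so the model must feed back its own predictions at every step.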

See you in the next one 🙂
