Keywords-to-text Generator

Welcome to the GitHub repository for the keywords-to-text generator web app. The app, still under development, generates syntactically correct text from keywords. In the current version, users can enter 2 to 5 keywords to generate conversational-style text. The app is not yet well suited to generating longer sentences; this will be addressed in a future version.

Files and Code

requirements.txt: List of dependencies for the development of the app. Use pip install -r requirements.txt to install.

scrape_web.ipynb: Contains code that creates the dataset for the app. We download human-written posts from Reddit using the Reddit API and split them into sentences. In total, the dataset contains 1.84 million sentences.
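
As a rough illustration of the sentence-splitting step, here is a minimal sketch using only the standard library. The function name `split_into_sentences` and the punctuation-based rule are assumptions made for this example; the notebook may use a different method or library.

```python
import re
from typing import List


def split_into_sentences(post: str) -> List[str]:
    """Split a post into sentences (hypothetical helper, not from the notebook)."""
    # Naive rule: break after '.', '!' or '?' when followed by whitespace.
    parts = re.split(r"(?<=[.!?])\s+", post.strip())
    # Drop empty fragments, e.g. when the post itself is empty.
    return [p for p in parts if p]


post = "I love hiking. Do you? The trail was steep!"
sentences = split_into_sentences(post)
print(sentences)
```

A rule this simple will mis-split abbreviations such as "e.g."; it is only meant to show the shape of the step.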

preprocess_dataset.ipynb: Code that cleans up the text data. It removes non-ASCII characters from the text and prepares the dataset for LLM fine-tuning.
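
One common way to remove non-ASCII characters in Python is sketched below. The helper name `strip_non_ascii` is an assumption for this example, and the notebook may implement the cleanup differently.

```python
def strip_non_ascii(text: str) -> str:
    """Drop every character outside the ASCII range (hypothetical helper)."""
    # Encoding with errors="ignore" silently discards characters that
    # cannot be represented in ASCII; decoding turns the bytes back into str.
    return text.encode("ascii", errors="ignore").decode("ascii")


cleaned = strip_non_ascii("naïve café")
print(cleaned)
```

Note that this drops accented letters entirely rather than transliterating them, so "café" becomes "caf".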

model_dev.ipynb: Extracts keywords from each sentence. The keywords form the input features of the dataset, and the original sentences form the ground truth. Both are then tokenized using the t5-small tokenizer.
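
The pairing of keywords (features) with sentences (ground truth) could look roughly like the sketch below. The extraction method is not described here, so this example assumes a simple stopword filter; `STOPWORDS`, `extract_keywords`, and `make_example` are hypothetical names. Tokenization with the t5-small tokenizer would follow this step and is left out to keep the example dependency-free.

```python
import re

# Tiny illustrative stopword list (an assumption, not the project's list).
STOPWORDS = {"a", "an", "the", "is", "are", "was", "were", "i", "you",
             "to", "of", "in", "on", "and", "it", "my", "for"}


def extract_keywords(sentence, max_keywords=5):
    """Return up to max_keywords non-stopword tokens, in order of appearance."""
    words = re.findall(r"[a-z']+", sentence.lower())
    keywords = []
    for word in words:
        if word not in STOPWORDS and word not in keywords:
            keywords.append(word)
    return keywords[:max_keywords]


def make_example(sentence):
    """Build one training pair: keywords as input, the sentence as target."""
    return {"input": " ".join(extract_keywords(sentence)), "target": sentence}


example = make_example("The dog is sleeping on my couch.")
print(example)
```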

train_text_generator.ipynb: Contains code to fine-tune the T5-small model (60.5 million parameters) on 126,000 short sentences (3 to 7 words each). The trained model is pushed to the Hugging Face Hub. We evaluate the model on the test set using BERTScore and obtain a median F1 score of 0.89.
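
The selection of short sentences for fine-tuning can be sketched as a word-count filter. The helper `is_short` and the whitespace-based word count are assumptions for this example; the notebook may count words differently.

```python
def is_short(sentence, min_words=3, max_words=7):
    """True if the sentence has between min_words and max_words words."""
    # Words are counted by splitting on whitespace (an assumption).
    return min_words <= len(sentence.split()) <= max_words


candidates = [
    "Hi there.",
    "I love hiking in summer.",
    "This sentence has way too many words to be kept here.",
    "See you tomorrow.",
]
short_sentences = [s for s in candidates if is_short(s)]
print(short_sentences)
```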

app/app.py: Code for the Flask App.

app/requirements.txt: List of dependencies to run the application code.

app/Dockerfile: Dockerfile for containerized deployment.

app/heroku.yml: Heroku configuration that builds the Docker image.

app/templates/index.html: HTML code for website.
