
Image-Captioning-Keras

A photo caption generator that produces a text description of a given image.

How it works

1. First, features are extracted from the image dataset using a pretrained model (VGG16 in this case) and stored in a file called 'features.pkl'. This is done in 'features.py'.
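As a rough sketch, this step might look like the following (the dataset path and helper name are assumptions, not necessarily what 'features.py' uses):

```python
import os
import pickle

from keras.applications.vgg16 import VGG16, preprocess_input
from keras.models import Model
from keras.preprocessing.image import img_to_array, load_img

def extract_features(directory):
    # Drop VGG16's final classification layer so the 4096-d fc2 output is kept
    vgg = VGG16()
    model = Model(inputs=vgg.inputs, outputs=vgg.layers[-2].output)
    features = {}
    for name in os.listdir(directory):
        image = load_img(os.path.join(directory, name), target_size=(224, 224))
        array = preprocess_input(img_to_array(image).reshape((1, 224, 224, 3)))
        features[name.split('.')[0]] = model.predict(array, verbose=0)[0]
    return features

features = extract_features('Flickr8k_Dataset')  # dataset path is an assumption
pickle.dump(features, open('features.pkl', 'wb'))
```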

2. The text is then cleaned so that it is easier for the model to learn. Cleaning involves removing all punctuation, converting uppercase letters to lowercase, and removing one-letter words; the description of each image is then stored in 'descriptions.txt'. This is done in 'text.py'.
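The cleaning rules described above amount to something like this (a sketch; the function name is illustrative):

```python
import string

def clean_description(desc):
    # Lowercase, strip punctuation, and drop one-letter words
    table = str.maketrans('', '', string.punctuation)
    words = desc.lower().translate(table).split()
    return ' '.join(word for word in words if len(word) > 1)

print(clean_description('A dog runs, quickly!'))  # -> 'dog runs quickly'
```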

3. For the model to start generating we need a kick-off word, and for the sentence to end we need a closing word, so "startseq" is added at the beginning of each caption and "endseq" at the end. Every word is assigned a number, so every sentence is converted into a vector of integers. This is done using Keras's built-in Tokenizer, and the created tokenizer is stored in 'tokenizer.pkl'. This is done in 'tok.py'.
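A short sketch of this step (the caption list is a stand-in for the real descriptions):

```python
import pickle

from keras.preprocessing.text import Tokenizer

# Stand-in for the cleaned descriptions, already wrapped in startseq/endseq
captions = ['startseq dog is running through the grass endseq']

tokenizer = Tokenizer()
tokenizer.fit_on_texts(captions)
vocab_size = len(tokenizer.word_index) + 1  # +1 because word ids start at 1

print(tokenizer.texts_to_sequences(captions))  # every word becomes an integer id
pickle.dump(tokenizer, open('tokenizer.pkl', 'wb'))
```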

4. Sequences are then created, because the LSTM needs each sentence divided into prefix arrays. For example, the sentence "startseq dog is running through the grass endseq" is divided into:

   x1      x2                                            y
   photo   startseq                                      dog
   photo   startseq dog                                  is
   photo   startseq dog is                               running
   photo   startseq dog is running                       through
   photo   startseq dog is running through               the
   photo   startseq dog is running through the           grass
   photo   startseq dog is running through the grass     endseq

Each prefix is then converted into an integer sequence using the previously created tokenizer and finally fed into the neural network, as sketched below.
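A sketch of how the (photo, prefix, next-word) triples from the table above could be built; the max_length of 34 matches the input size mentioned below, while the function and argument names are assumptions:

```python
import numpy as np
from keras.preprocessing.sequence import pad_sequences
from keras.utils import to_categorical

def create_sequences(tokenizer, max_length, caption, photo_feature, vocab_size):
    X1, X2, y = [], [], []
    seq = tokenizer.texts_to_sequences([caption])[0]
    for i in range(1, len(seq)):
        prefix, next_word = seq[:i], seq[i]
        # Pad every prefix to a fixed length so the LSTM sees equal-sized inputs
        prefix = pad_sequences([prefix], maxlen=max_length)[0]
        X1.append(photo_feature)  # x1: the 4096-d image feature vector
        X2.append(prefix)         # x2: the caption prefix as word ids
        y.append(to_categorical([next_word], num_classes=vocab_size)[0])  # y: next word
    return np.array(X1), np.array(X2), np.array(y)
```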
The output of VGG16 is a 4096-dimensional vector, which is processed by a Dense layer of size 256 to give a 256-dimensional output. The language model expects an input sequence of length 34, which is fed into an Embedding layer that outputs 256-dimensional vectors; these are decoded by an LSTM. The two branches are merged and passed through a Dense layer of size 256, and a final Dense layer with a softmax activation makes the prediction of the next word.
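Put together, the description above corresponds to a Keras merge model along these lines (a sketch, assuming vocab_size comes from the saved tokenizer):

```python
from keras.layers import LSTM, Dense, Dropout, Embedding, Input, add
from keras.models import Model

def define_model(vocab_size, max_length=34):
    # Image branch: 4096-d VGG16 features compressed to 256-d
    inputs1 = Input(shape=(4096,))
    fe = Dense(256, activation='relu')(Dropout(0.5)(inputs1))
    # Text branch: a length-34 sequence of word ids, embedded and decoded by an LSTM
    inputs2 = Input(shape=(max_length,))
    se = Embedding(vocab_size, 256, mask_zero=True)(inputs2)
    se = LSTM(256)(Dropout(0.5)(se))
    # Merge the branches, then predict the next word with a softmax over the vocabulary
    decoder = Dense(256, activation='relu')(add([fe, se]))
    outputs = Dense(vocab_size, activation='softmax')(decoder)
    model = Model(inputs=[inputs1, inputs2], outputs=outputs)
    model.compile(loss='categorical_crossentropy', optimizer='adam')
    return model
```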

Requirements

Python 3
Keras (2.2.4), GPU version with CUDA and cuDNN installed
TensorFlow (1.9.0)
NumPy
Graphics card: GeForce GTX 1050 Ti (4 GB)
RAM: 16 GB

Network Structure

[model.jpg: diagram of the network architecture]

Using the Caption generator

  1. Clone the repository

  2. Change to the directory where generate_caption.py is located.

  3. Download the pretrained model and place it in the current working directory.

  4. To generate a caption, enter the following command:

     python generate_caption.py /path/to/image/
    

Result

Image: 390671130_09fdccd52f

Generated text: startseq dog is running through the grass endseq

Image: bike

Generated text: startseq man in red helmet is riding bike endseq

Image: fight

Generated text: startseq two men are playing soccer on the grass endseq

Image: play

Generated text: startseq two girls are playing instruments endseq

The trained model can be found at model.

References

CS 231n: http://cs231n.stanford.edu/reports/2016/pdfs/362_Report.pdf
Andrej Karpathy talk: https://cs.stanford.edu/people/karpathy/sfmltalk.pdf
Machine Learning Mastery: https://machinelearningmastery.com/
