Diverse and Accurate Image Description Using a Variational Auto-Encoder with an Additive Gaussian Encoding Space
TensorFlow implementation of "Diverse and Accurate Image Description Using a Variational Auto-Encoder with an Additive Gaussian Encoding Space" (NIPS 2017). This implementation includes a VGG16-LSTM baseline with beam search, a Normal-prior CVAE, a GMM-prior CVAE, and AG-CVAE.
Training:
You will need to download the ImageNet weights for VGG16 first: https://yadi.sk/d/V6Rfzfei3TdKCH
Specify your MSCOCO directory in utils/parameters.py and launch:
python main.py --gpu 'your gpu'
This will train the Normal-prior CVAE model without fine-tuning. The best result achieved using cluster vectors without fine-tuning is CIDEr ~0.8; better results should be possible with some fine-tuning. If you want to train a model with fine-tuning, specify the --fine_tune parameter.
Note: the train/validation split can be changed simply by setting the gen_val_captions parameter. The default is 4000, which leaves ~120,000 images in the training set.
Note 2: you will need to run the preprocess.py script first to obtain an HDF5 file of images; this is done to speed up image loading while fine-tuning the model.
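For reference, the preprocessing step conceptually does something like the following (a minimal sketch; the output file name, dataset key, and 224x224 image size are illustrative assumptions, not the script's actual format):

```python
import glob
import h5py
import numpy as np
from PIL import Image

# A minimal sketch: resize every image to the VGG16 input size and store
# the stack in one HDF5 dataset for fast sequential reads during training.
# The file name, dataset key, and image size are illustrative assumptions.
paths = sorted(glob.glob('/home/username/mscoco/coco/train2014/*.jpg'))
with h5py.File('images_train.hdf5', 'w') as f:
    dset = f.create_dataset('images', (len(paths), 224, 224, 3), dtype='uint8')
    for i, p in enumerate(paths):
        img = Image.open(p).convert('RGB').resize((224, 224))
        dset[i] = np.asarray(img, dtype=np.uint8)
```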
Parameters can be set directly in the utils/parameters.py file (or specified through command-line parameters). For example, if you want to train the AG-CVAE model, which uses cluster vectors as input to the encoder and decoder, you can call:
python main.py --gpu 0 --embed_dim 256 --dec_hid 512 --epochs 50 --temperature 0.6 --gen_name ag --dec_drop 0.7 --dec_lstm_drop 0.7 --lr 0.001 --checkpoint ag_cv_test1 --coco_dir "/home/username/mscoco/coco/" --optimizer Adam --sample_gen greedy --c_v --prior AG
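For intuition about what the AG prior does with those cluster vectors: in the paper, the prior over the latent code z is a Gaussian whose mean is an additive, cluster-vector-weighted combination of per-cluster means, p(z|c) = N(sum_k c_k * mu_k, sigma^2 I). Below is a minimal NumPy sketch of that idea (not this repository's actual implementation; the normalization and dimensions are illustrative):

```python
import numpy as np

def ag_prior(cluster_vec, cluster_means, sigma=0.1):
    """Additive Gaussian prior: the mean is the cluster-vector-weighted
    sum of per-cluster means; the covariance is isotropic.
    Shapes: cluster_vec (K,), cluster_means (K, D)."""
    c = cluster_vec / max(cluster_vec.sum(), 1e-8)  # one possible normalization
    mu = c @ cluster_means                          # (D,) additive mean
    return mu, (sigma ** 2) * np.eye(cluster_means.shape[1])

# Example: K = 80 COCO object clusters, D-dimensional latent space.
K, D = 80, 150
c_v = np.zeros(K)
c_v[[3, 17]] = 1.0  # two object categories detected in the image
mu, cov = ag_prior(c_v, np.random.randn(K, D))
```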
Inference:
There are two options:
- Using main.py
After some training, just launch:
python main.py --gpu 'your gpu' --mode inference
If you used fine-tuning, you will just need to add --fine_tune to the parameters:
python main.py --gpu 'your gpu' --mode inference --fine_tune
It will produce a JSON file ready to use with the MSCOCO evaluation tool (a scoring sketch is given at the end of this section).
- Using the separate gen_caption.py script. It does not support fine-tuned models for now (this will be modified soon). It can be used to generate captions for any image.
For a list of required parameters:
python gen_caption.py -h
For example:
python -i gen_caption.py --img_path ./images/COCO_val2014_000000233527.jpg --checkpoint ./checkpoints/gaussian_nocv.ckpt --params_path ./pickles/params_Normal_False_gaussian_nocv_False
Where:
- --params_path: the saved Parameters class; it can be saved by calling main.py --save_params
- --checkpoint: the saved model checkpoint
- --img_path: path to the image
- -i: launches Python in interactive mode so that captions can be generated by calling generator.generate_caption(img_path); this can also be used in an IPython notebook
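For example, once the interactive session is up, generating a caption looks roughly like this (a usage sketch based on the interface described above):

```python
# Inside the session started with `python -i gen_caption.py ...`, the script
# leaves a `generator` object in scope (per the interface described above);
# only generate_caption(img_path) is documented, the rest is assumed context.
caption = generator.generate_caption('./images/COCO_val2014_000000233527.jpg')
print(caption)
```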
A checkpoint and parameters file for the CVAE trained without cluster vectors can be downloaded at: https://yadi.sk/d/TCyXUmKk3SPVtc
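Once a results JSON has been produced (first option above), it can be scored with the standard coco-caption evaluation tool; a minimal sketch, assuming the usual coco-caption layout (the annotation and results paths below are placeholders):

```python
from pycocotools.coco import COCO
from pycocoevalcap.eval import COCOEvalCap

# Score the generated captions JSON with the coco-caption tool
# (https://github.com/tylin/coco-caption); paths are placeholders.
coco = COCO('annotations/captions_val2014.json')
coco_res = coco.loadRes('captions_val2014_results.json')
coco_eval = COCOEvalCap(coco, coco_res)
coco_eval.params['image_id'] = coco_res.getImgIds()  # only score generated ids
coco_eval.evaluate()
print(coco_eval.eval)  # {'CIDEr': ..., 'Bleu_4': ..., 'METEOR': ..., ...}
```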
Features:
- LSTM baseline (implemented)
- CVAE baseline (implemented)
- cluster vectors (implemented; vectors for the test set were generated using the TensorFlow Object Detection API and Faster R-CNN; see the sketch after this list)
- beam search (implemented)
- AG-CVAE (partially implemented)
- GMM-CVAE (implemented)
- caption generation for new photos (partially implemented; the cluster-vector generation process still needs to be automated)
- fine-tuning for better results (implemented)
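Since cluster vectors appear in several items above, here is a minimal sketch of how detector output can be turned into one; the binary encoding, score threshold, and function name are illustrative assumptions, not this repository's exact pipeline:

```python
import numpy as np

NUM_CLASSES = 80  # COCO object categories

def detections_to_cluster_vector(class_ids, scores, threshold=0.5):
    """Turn object-detector output (e.g. Faster R-CNN) into a binary
    cluster vector: entry k is 1 if category k was detected above the
    score threshold. class_ids are 0-based COCO category indices."""
    c_v = np.zeros(NUM_CLASSES, dtype=np.float32)
    for cid, score in zip(class_ids, scores):
        if score >= threshold:
            c_v[cid] = 1.0
    return c_v

# Example: confident detections for categories 0 and 17, one rejected hit.
print(detections_to_cluster_vector([0, 17, 42], [0.95, 0.80, 0.30]))
```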
Dependencies:
- zhusuan, a probabilistic programming framework: https://github.com/thu-ml/zhusuan/. I used version 0.3.0; it does not seem to work with version 0.3.1.
- tensorflow >= 1.4.1
Notebooks:
- prepare_cluster_vectors_train_val.ipynb - takes the MSCOCO dataset JSON files and generates cluster vectors
- prepare_test_vectors.ipynb - takes the test-set cluster-vector file prepared using the tf.models API and generates cluster vectors
- gen_caption_example.ipynb - generates a caption for a given photo (without cluster-vector inputs)