I am fine-tuning the im2txt model with my own small data set (2000 images) using a pretrained checkpoint (5M steps on COCO data).
The loss is currently around 0.2 after ~8800 steps, and training is still running.
I have noticed that the captions for my data, generated with the inference function, are getting worse: the same word is repeated over and over again.
INFO:tensorflow:Successfully loaded checkpoint: model.ckpt-5008890
Captions for image disc_111097.jpg:
0) stopped stopped stopped stopped stopped at a red light . (p=0.022266)
1) stopped stopped stopped stopped at a red light . (p=0.019730)
2) stopped stopped stopped stopped at a red light (p=0.010466)
The further I continue training, the worse it gets. The output is nearly the same for every image, including images from the COCO data set.
The only explanation I can think of is that it has something to do with the word counts. Since my data set is small, I merged the existing word counts file for the COCO data with a new word counts file for my data in the following way (see the sketch after this list):
- words that appear only in the COCO word counts are added to the new word counts
- words that appear only in my data (min_word_count=1) are added to the new word counts
- words that appear in both word counts files are added to the new word counts with their counts summed
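For illustration, here is a minimal sketch of that merge, assuming the im2txt word-counts format of one `word count` pair per line; the file names are placeholders:

```python
from collections import Counter

def load_word_counts(path):
    # Each line in an im2txt word-counts file is "<word> <count>".
    counts = Counter()
    with open(path) as f:
        for line in f:
            word, count = line.split()
            counts[word] += int(count)
    return counts

# Counter addition keeps words unique to either file and sums the
# counts of words that appear in both.
merged = load_word_counts("word_counts_coco.txt") + load_word_counts("word_counts_mydata.txt")

with open("word_counts_merged.txt", "w") as f:
    for word, count in merged.most_common():
        f.write("%s %d\n" % (word, count))
```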
What is the reason that beam search predicts just one word, repeated over and over again?
Additional info: The same problem occurs when I use my small data set for initial training, except that the extreme case of one repeated word already appears with the first checkpoint file, after 317 steps.
No matter how long I continue the initial training, the weights don't seem to get updated anymore and the output stays exactly the same; all checkpoint files from initial training look exactly the same.
System information
Have I written custom code (as opposed to using a stock example script provided in TensorFlow): no, I am just using the provided code with a pre-trained model
OS Platform and Distribution (e.g., Linux Ubuntu 16.04): Linux Ubuntu 16.04.2
TensorFlow installed from (source or binary): installed via Anaconda (Python 2)
TensorFlow version (use command below): 1.13.1
Bazel version (if compiling from source): 0.22.0
CUDA/cuDNN version: 6.0.21
GPU model and memory: NVIDIA Corporation Device 119e, 256M
Exact command to reproduce: bazel-bin/im2txt/run_inference --checkpoint_path=${CHECKPOINT_PATH} --vocab_file=${VOCAB_FILE} --input_files=${IMAGE_FILE}
Here is how they get the most likely words:
most_likely_words = np.argsort(word_probabilities)[:-self.beam_size][::-1]
But this is wrong: np.argsort sorts in ascending order, so the slice [:-self.beam_size] discards the beam_size most likely words and returns the remaining ones instead. I believe the correct version should be:
most_likely_words = np.argsort(word_probabilities)[-self.beam_size:][::-1]
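A quick check with NumPy and a toy probability vector (the numbers below are made up for illustration) shows the difference between the two slices:

```python
import numpy as np

beam_size = 3
# A made-up probability vector over a 5-word vocabulary.
word_probabilities = np.array([0.05, 0.40, 0.10, 0.30, 0.15])

# np.argsort returns indices in ascending order of probability, so
# [:-beam_size] drops the top-3 indices; with this tiny vocabulary
# only the two least likely words remain.
buggy = np.argsort(word_probabilities)[:-beam_size][::-1]
print(buggy)   # [2 0] -> probabilities 0.10, 0.05

# [-beam_size:] keeps the top-3 indices, and [::-1] orders them
# from most to least likely.
fixed = np.argsort(word_probabilities)[-beam_size:][::-1]
print(fixed)   # [1 3 4] -> probabilities 0.40, 0.30, 0.15
```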