Implementation_of_Simple_and_Efficient_P
Research Article
Volume-1 | Issue-1 | Jan-Jun-2024
Journal of Artificial Intelligence and Imaging
Double Blind Peer Reviewed Journal
DOI: https://doi.org/10.48001/JoAII
ARTICLE HISTORY:
Received: 8th Jan, 2024
Revised: 25th Jan, 2024
Accepted: 8th Feb, 2024
Published: 20th Feb, 2024

KEYWORDS: Caption, CNN (Convolutional Neural Networks), Deep learning, Image captioning, LSTM (Long Short-Term Memory)

ABSTRACT: Image captioning, or picture captioning, has become one of the most widely used technologies in applications that generate captions for photographs. Captioning relies on deep neural networks that identify the objects in an image together with their attributes and relationships. The purpose of this research is to detect the different objects in a photograph, determine their relationships, and write a caption. The proposed system is implemented in Python on the Flickr8k dataset. The input images are pre-processed, and image features are then extracted using a CNN. An LSTM is used to translate the features and objects extracted by the CNN into a natural English sentence. Different types of images are tested with the proposed system, and the results are presented with the generated image captions. The presented results show the accuracy of the system. The presented method has potential for applications where image captioning is essential.
1.1 CNN
1.2 LSTM
Figure 3: LSTM.
3.1 Dataset
3.3 Preprocessing
4. RESULTS
Figure 5: Image 1.
Figure 6: Image 2.
Figure 7: Image 3.
Figure 8: Image 4.
Image | Caption 1 | Caption 2
Image1 | White crane is standing in the water | White crane is flying over the water
Image2 | Men in red shirt and black pants is walking down the snowy hill | Men in red shirt and black pants is walking down the snowy hill
Image3 | Man is snowboarding on the side of mountain | Man is snowboarding on the side of mountain
Image4 | Man is standing on the rock | Man in red shirt is standing on the rock
Image5 | Man with red helmet is ridding bike on road | Man in red shirt is riding bike on the side of road
Image7 | Ship is standing in the water | Man in red kayak is walking on the beach
Image9 | Man in red shirt is sitting on the bench | Man in red shirt is walking on the street
Image10 | 5 people are standing on grass | Man in the red shirt walking on the street
Image11 | Brown dog is running through the grass | Brown dog is running through the grass
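
As a rough illustration of how the two captions listed for each image could be compared, the sketch below computes the Jaccard overlap between their word sets. This measure is an assumption chosen for illustration only; the paper does not state which accuracy measure was used, and the names `tokenize`, `jaccard`, and `pairs` are hypothetical. Only three rows from the results are included to keep the example short.

```python
def tokenize(caption):
    """Lowercase a caption and split it on whitespace."""
    return caption.lower().split()


def jaccard(caption_a, caption_b):
    """Jaccard similarity between the word sets of two captions (0.0 to 1.0)."""
    words_a, words_b = set(tokenize(caption_a)), set(tokenize(caption_b))
    if not words_a and not words_b:
        return 1.0  # two empty captions are treated as identical
    return len(words_a & words_b) / len(words_a | words_b)


# Caption pairs copied from the results (illustrative subset).
pairs = {
    "Image1": ("White crane is standing in the water",
               "White crane is flying over the water"),
    "Image3": ("Man is snowboarding on the side of mountain",
               "Man is snowboarding on the side of mountain"),
    "Image7": ("Ship is standing in the water",
               "Man in red kayak is walking on the beach"),
}

scores = {name: jaccard(a, b) for name, (a, b) in pairs.items()}

for name, score in scores.items():
    print(f"{name}: {score:.2f}")
```

Identical pairs such as Image3 score 1.0, while pairs that share only function words, such as Image7, score low. A set-based overlap ignores word order, so an n-gram measure such as BLEU would be a stricter alternative.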
5. CONCLUSION
The paper presents the implementation of an image caption generator using CNN-LSTM. The number of applications in this field is increasing, for example in the Computer Vision (CV) and NLP domains. The accuracy of the model is limited: it sometimes generates wrong or incomplete captions. This is due to the small dataset. By using a large dataset of 100000 images, we can train more accurate models. The model depends on the dataset we use, so it cannot predict words that are outside its vocabulary. We could try other algorithms and methodologies to increase the accuracy of the generated captions. We may also include a text-to-speech converter so that the computer speaks every generated caption, for applications used by blind people. The outcomes are displayed alongside the automatically generated image captions, showcasing the system's accuracy. The demonstrated method holds promise for applications where image captioning is a crucial requirement.
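
The vocabulary limitation noted in the conclusion can be made concrete with a minimal sketch. It assumes a word-level vocabulary built from the training captions, with unseen words mapped to a placeholder token; the token name `<unk>` and the helpers `build_vocab` and `encode` are hypothetical and are not taken from the paper's implementation.

```python
from collections import Counter

UNK = "<unk>"  # hypothetical placeholder for out-of-vocabulary words


def build_vocab(captions, min_count=1):
    """Map each word seen at least min_count times to an integer id."""
    counts = Counter(word for c in captions for word in c.lower().split())
    vocab = {UNK: 0}
    for word in sorted(w for w, n in counts.items() if n >= min_count):
        vocab[word] = len(vocab)
    return vocab


def encode(caption, vocab):
    """Convert a caption to ids, falling back to the UNK id for unseen words."""
    return [vocab.get(word, vocab[UNK]) for word in caption.lower().split()]


# Two training captions taken from the results.
train_captions = [
    "Brown dog is running through the grass",
    "Man is standing on the rock",
]
vocab = build_vocab(train_captions)

# "cat" never appears in the training captions, so it cannot be represented.
new_caption = "Brown cat is standing on the grass"
ids = encode(new_caption, vocab)
oov = [w for w in new_caption.lower().split() if w not in vocab]

print(ids)
print(oov)
```

Because a decoder can only emit words that have an id in `vocab`, a word such as "cat" that is absent from the training captions can never appear in a generated caption, which is why a larger dataset widens what the model can describe.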