Automatic Image Captioning System Using CNN-LSTM: A Deep Learning Approach
Author(s):
Sahil Nandkumar Pawar | Rohit Kishor Pawar | Bhushan Prabhakar Zade | Aditya Nagesh Sagar
Keywords:
Image Captioning, Convolutional Neural Network, Long Short-Term Memory, VGG16, Flickr8k, BLEU Score, Deep Learning, Natural Language Processing, Encoder-Decoder Architecture, Transfer Learning.
Abstract:
Automatic image captioning is a challenging task at the intersection of computer vision and natural language processing: it requires generating semantically meaningful textual descriptions from visual input. This paper presents a deep learning-based image captioning system that combines a convolutional neural network (CNN) for visual feature extraction with a Long Short-Term Memory (LSTM) network for sequential language generation. A pre-trained VGG16 model serves as the visual encoder and extracts high-level feature representations, which are then fed into an LSTM decoder equipped with a word-embedding layer to generate context-aware, grammatically coherent captions. The system is trained and evaluated on the Flickr8k dataset, which comprises 8,000 images with five human-annotated captions each, and is additionally tested on custom real-world images to assess its generalization capability. Experimental evaluation with BLEU-1 through BLEU-4 shows competitive captioning performance, with a BLEU-1 score of 0.587 and a BLEU-4 score of 0.142. Beam search decoding further improves caption quality over greedy search. The results confirm the effectiveness of the CNN-LSTM pipeline for automated image description, with applications in accessibility tools, content indexing, and human-computer interaction.
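The abstract describes an encoder-decoder in which VGG16 image features condition an LSTM that predicts a caption one word at a time. A common way to train such a decoder is teacher forcing: each caption is split into (prefix, next word) pairs, and every pair is combined with the same image feature vector. The sketch below covers only the text side of that step. The `startseq`/`endseq` boundary tokens, left-padding with id 0, and the helper names `build_vocab` and `make_training_pairs` are illustrative assumptions, not details confirmed by the paper.

```python
from typing import Dict, List, Tuple

# Assumed caption boundary tokens (a common convention, not confirmed by the paper).
START, END = "startseq", "endseq"


def build_vocab(captions: List[str]) -> Dict[str, int]:
    """Hypothetical helper: map each word to an integer id; id 0 is reserved for padding."""
    words = sorted({w for c in captions for w in f"{START} {c.lower()} {END}".split()})
    return {w: i + 1 for i, w in enumerate(words)}


def make_training_pairs(
    caption: str, vocab: Dict[str, int], max_len: int
) -> List[Tuple[List[int], int]]:
    """Hypothetical helper: split one caption into (left-padded prefix, next-word id) pairs."""
    ids = [vocab[w] for w in f"{START} {caption.lower()} {END}".split()]
    pairs = []
    for i in range(1, len(ids)):
        prefix = ids[:i][-max_len:]  # keep at most max_len most recent tokens
        padded = [0] * (max_len - len(prefix)) + prefix  # left-pad with the reserved id 0
        pairs.append((padded, ids[i]))  # the decoder learns to predict ids[i] from the prefix
    return pairs


captions = ["a dog runs on grass"]
vocab = build_vocab(captions)
pairs = make_training_pairs(captions[0], vocab, max_len=8)
```

Under these assumptions, a five-word caption yields six training pairs (one per word plus one for `endseq`); in a full pipeline each pair would be fed to the model alongside the image's VGG16 feature vector.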
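BLEU-1 through BLEU-4 are cumulative n-gram precision scores: BLEU-N takes the geometric mean of the clipped (modified) 1- to N-gram precisions against all reference captions and multiplies it by a brevity penalty for captions shorter than the references. Because Flickr8k supplies five references per image, each n-gram count is clipped to its maximum count in any single reference. The sketch below is a plain-Python, corpus-level version under those standard definitions, without smoothing. The abstract does not say which BLEU implementation or smoothing the authors used, so this hypothetical `corpus_bleu` function is not guaranteed to reproduce the reported scores.

```python
import math
from collections import Counter
from typing import List, Sequence


def ngrams(tokens: Sequence[str], n: int) -> Counter:
    """Count all contiguous n-grams in a token sequence."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def corpus_bleu(
    references: List[List[Sequence[str]]], hypotheses: List[Sequence[str]], max_n: int
) -> float:
    """Cumulative corpus-level BLEU-max_n with uniform weights and no smoothing."""
    matches = [0] * max_n
    totals = [0] * max_n
    hyp_len = ref_len = 0
    for refs, hyp in zip(references, hypotheses):
        hyp_len += len(hyp)
        # Closest reference length; ties are broken toward the shorter reference.
        ref_len += min((abs(len(r) - len(hyp)), len(r)) for r in refs)[1]
        for n in range(1, max_n + 1):
            hyp_counts = ngrams(hyp, n)
            max_ref: Counter = Counter()
            for r in refs:
                max_ref |= ngrams(r, n)  # element-wise max count over references
            matches[n - 1] += sum(min(c, max_ref[g]) for g, c in hyp_counts.items())
            totals[n - 1] += sum(hyp_counts.values())
    if min(matches) == 0:
        return 0.0  # without smoothing, any zero precision makes BLEU zero
    log_precision = sum(math.log(m / t) for m, t in zip(matches, totals)) / max_n
    brevity_penalty = 1.0 if hyp_len > ref_len else math.exp(1 - ref_len / hyp_len)
    return brevity_penalty * math.exp(log_precision)
```

For example, the hypothesis "a dog runs" against the single reference "a dog runs on the grass" has a perfect unigram precision, but its BLEU-1 is only exp(-1), about 0.368, because the brevity penalty charges for the missing half of the caption.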
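The abstract reports that beam search decoding improves on greedy search. Greedy decoding commits to the single most probable word at each step, whereas beam search keeps the `beam_width` highest-scoring partial captions, ranked by cumulative log-probability, and picks the best completed one. The following is a minimal sketch, assuming the decoder is exposed as a function from the words generated so far to a next-word probability distribution. The hypothetical `toy_next_probs` bigram table stands in for the trained LSTM (which would also condition on the image features), the `<s>`/`</s>` tokens are assumed, and no length normalization is applied.

```python
import math
from typing import Callable, Dict, List, Tuple

END = "</s>"  # assumed end-of-caption token

# Hypothetical toy "decoder": next-word probabilities given only the previous word.
TOY_BIGRAMS: Dict[str, Dict[str, float]] = {
    "<s>": {"a": 0.5, "the": 0.4, END: 0.1},
    "a": {"dog": 0.4, "cat": 0.3, END: 0.3},
    "the": {"dog": 0.9, END: 0.1},
    "dog": {END: 1.0},
    "cat": {END: 1.0},
}


def toy_next_probs(tokens: List[str]) -> Dict[str, float]:
    """Stand-in for the LSTM softmax: condition on the last generated word only."""
    return TOY_BIGRAMS[tokens[-1] if tokens else "<s>"]


def beam_search(
    next_probs: Callable[[List[str]], Dict[str, float]], beam_width: int, max_len: int
) -> Tuple[List[str], float]:
    """Return the best caption (without END) and its cumulative log-probability."""
    beams: List[Tuple[List[str], float]] = [([], 0.0)]
    finished: List[Tuple[List[str], float]] = []
    for _ in range(max_len):
        candidates = []
        for tokens, score in beams:
            for tok, p in next_probs(tokens).items():
                if p > 0:
                    candidates.append((tokens + [tok], score + math.log(p)))
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = []
        for tokens, score in candidates[:beam_width]:
            if tokens[-1] == END:
                finished.append((tokens[:-1], score))  # completed caption leaves the beam
            else:
                beams.append((tokens, score))
        if not beams:
            break
    finished.extend(beams)  # fall back to unfinished hypotheses if max_len is reached
    return max(finished, key=lambda c: c[1])


greedy, greedy_score = beam_search(toy_next_probs, beam_width=1, max_len=5)
beam, beam_score = beam_search(toy_next_probs, beam_width=2, max_len=5)
```

With `beam_width=1` the search reduces to greedy decoding and returns "a dog" (probability 0.5 × 0.4 = 0.2), while `beam_width=2` recovers "the dog" (0.4 × 0.9 = 0.36), a caption greedy search misses because its first word was not the locally most probable choice.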
Other Details
Paper id:
IJSARTV12I5105277
Published in:
Volume: 12 Issue: 5 May 2026
Publication Date:
2026-05-05