Encoder-Decoder Based Long Short-Term Memory (LSTM) Model for Video Captioning

Proceedings of the IEEE:1-6 (forthcoming)

Abstract

This work demonstrates the implementation and use of an encoder-decoder model that performs a many-to-many mapping from video data to text captions: an input temporal sequence of video frames is mapped to an output sequence of words that forms a caption sentence. Data preprocessing, model construction, and model training are discussed. Caption correctness is evaluated using 2-gram BLEU scores across the different splits of the dataset. Specific examples of output captions are presented to demonstrate that the model generalizes over the temporal dimension of the video. Predicted captions are shown to generalize over video action, even in instances where the video scene changes dramatically. Changes to the model architecture that could improve sentence grammar and correctness are also discussed.
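The abstract reports caption quality as a 2-gram BLEU score. As an illustration only, the sketch below assumes cumulative sentence-level BLEU-2 (equal weights on clipped 1-gram and 2-gram precisions, a brevity penalty, no smoothing) with whitespace tokenization; the exact variant used in the paper (sentence vs. corpus level, smoothing, tokenization) is not stated in this entry, and the function names `ngram_counts`, `modified_precision`, and `bleu2` are hypothetical, not the authors' evaluation code.

```python
import math
from collections import Counter


def ngram_counts(tokens, n):
    """Count the n-grams of length n in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def modified_precision(candidate, references, n):
    """Clipped n-gram precision: each candidate n-gram is credited at most
    as many times as it appears in any single reference."""
    cand = ngram_counts(candidate, n)
    if not cand:
        return 0.0
    max_ref = Counter()
    for ref in references:
        for gram, count in ngram_counts(ref, n).items():
            max_ref[gram] = max(max_ref[gram], count)
    clipped = sum(min(count, max_ref[gram]) for gram, count in cand.items())
    return clipped / sum(cand.values())


def bleu2(candidate, references):
    """Assumed variant: sentence-level cumulative BLEU-2, i.e. the geometric
    mean of 1- and 2-gram modified precisions times the brevity penalty."""
    precisions = [modified_precision(candidate, references, n) for n in (1, 2)]
    if min(precisions) == 0.0:
        return 0.0  # no smoothing: any zero precision gives a zero score
    c = len(candidate)
    # Reference length closest to the candidate length (ties -> shorter one).
    r = min((len(ref) for ref in references), key=lambda length: (abs(length - c), length))
    bp = 1.0 if c > r else math.exp(1 - r / c)
    return bp * math.exp(sum(0.5 * math.log(p) for p in precisions))


candidate = "a man is playing a guitar".split()
references = ["a man is playing the guitar".split(), "someone plays a guitar".split()]
print(round(bleu2(candidate, references), 4))  # → 0.8165
```

Here the unigram precision is 5/6 (the second "a" is clipped) and the bigram precision is 4/5, giving the square root of 2/3. Without smoothing, a caption that shares no unigram or bigram with any reference scores 0; corpus-level BLEU, which pools n-gram counts over all captions before taking the geometric mean, is a common alternative and generally yields different numbers.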


Similar books and articles

Deep Learning Based Video Captioning through Encoder-Decoder Based Long Short-Term Memory (LSTM). Grimsby Chelsea - forthcoming - International Journal of Advance Computer Science and Application.
Deep Learning Based Video Captioning through Encoder-Decoder Based Long Short-Term Memory (LSTM). Grimsby Chelsea - forthcoming - International Journal of Advanced Computer Science and Applications:1-6.
The Effects of Dance Movement Therapy in the Treatment of Depression. Xing Zhao - 2023 - European Journal for Philosophy of Religion 15 (4):388-402.

Analytics

Added to PP
2023-10-05

Downloads
258 (#11,209)

6 months
186 (#106,751)


Author's Profile

Tosin Ige
University of Texas at El Paso

Citations of this work

Evaluating the level of Press Freedom in Modern Nigeria. Ajijola Samuel - forthcoming - International Journal of Research and Innovation in Social Sciences.
Comparative Analysis of Deep Learning and Naïve Bayes for Language Processing Task. Olalere Abiodun - forthcoming - International Journal of Research and Innovation in Applied Sciences.


References found in this work

Data Mining in the Context of Legality, Privacy, and Ethics. Amos Okomayin, Tosin Ige & Abosede Kolade - 2023 - International Journal of Research and Innovation in Applied Science 10 (VII):10-15.
Adversarial Sampling for Fairness Testing in Deep Neural Network. Tosin Ige, William Marfo, Justin Tonkinson, Sikiru Adewale & Bolanle Hafiz Matti - 2023 - International Journal of Advanced Computer Science and Applications 14 (2).
