Proceedings of the 2022 3rd International Conference on Artificial Intelligence and Education (IC-ICAIE 2022)

Video Description Method based on Semantic Information Filtering and Sentence Length Modulation

Authors
Xiangqing Wang¹, Xiaodong Cai¹,*, Meixin Zhou¹, Qingnan Huang¹
¹School of Information and Communication, Guilin University of Electronic Technology, Guilin, China
*Corresponding author. Email: caixiaodong@guet.edu.cn
Corresponding Author
Xiaodong Cai
Available Online 27 December 2022.
DOI
10.2991/978-94-6463-040-4_68
Keywords
Video description; Encoder-decoder; Fusion mechanism; Sentence length modulation; Deep learning
Abstract

In current video description methods, spatially redundant information in the video features is usually not eliminated effectively. In addition, the commonly used loss function is built from the log-probabilities of the correct target words, so long sentences tend to incur large losses. Optimizing this log-likelihood loss therefore pushes the model toward short sentences, and when the generated sentence is too short its semantics are incomplete and its accuracy is low. This paper proposes a video description method based on semantic information filtering and sentence length modulation to address these problems. First, the model introduces a gated fusion mechanism that screens the semantic features of the video to remove redundant information, which reduces its interference with the generated description and improves accuracy. Second, a new sentence length modulation loss function is proposed, which modulates the cross-entropy loss with the length of the label sentence. This alleviates the model's tendency to generate short sentences and brings the semantics of the generated description closer to the label, further improving accuracy. Experimental results on the widely used MSVD dataset show that the proposed method significantly improves the accuracy of the generated video descriptions, and that all metrics are significantly better than those of existing models.
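
The abstract does not give formulas for either component, so the following is only a minimal sketch of the two ideas under stated assumptions, not the authors' implementation. It assumes an element-wise sigmoid gate for the feature filtering and a cross-entropy loss divided by the label length raised to a power beta for the length modulation. The names gated_filter, length_modulated_loss, gate_weights, gate_bias, and beta are hypothetical.

```python
import math


def sigmoid(z):
    """Logistic function, mapping any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def gated_filter(features, gate_weights, gate_bias):
    """Scale each feature by a gate value in (0, 1).

    Assumption: an element-wise gate g_i = sigmoid(w_i * x_i + b_i).
    A gate near 0 suppresses a feature treated as redundant.
    """
    return [
        sigmoid(w * x + b) * x
        for x, w, b in zip(features, gate_weights, gate_bias)
    ]


def cross_entropy(target_probs):
    """Plain sentence loss: negative sum of log-probabilities of the
    correct word at each position. It grows with sentence length."""
    return -sum(math.log(p) for p in target_probs)


def length_modulated_loss(target_probs, beta=0.5):
    """Cross-entropy divided by (label length ** beta).

    Assumption: this is one possible way to modulate the loss with the
    label sentence length; beta is a hypothetical hyperparameter.
    beta = 0 recovers the plain loss, beta = 1 gives the per-word mean.
    """
    length = len(target_probs)
    return cross_entropy(target_probs) / (length ** beta)


if __name__ == "__main__":
    # Same per-word confidence, different label lengths.
    short_label = [0.5] * 3
    long_label = [0.5] * 12
    print("plain:", cross_entropy(short_label), cross_entropy(long_label))
    print("modulated:", length_modulated_loss(short_label),
          length_modulated_loss(long_label))
```

With equal per-word probabilities, the plain loss of the 12-word label is four times that of the 3-word label, while the modulated loss with beta = 0.5 is only twice as large. Under this assumed form, long label sentences are penalized less heavily, which is the effect the abstract attributes to the proposed loss.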

Copyright
© 2023 The Author(s)
Open Access
Open Access This chapter is licensed under the terms of the Creative Commons Attribution-NonCommercial 4.0 International License (http://creativecommons.org/licenses/by-nc/4.0/), which permits any noncommercial use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

Volume Title
Proceedings of the 2022 3rd International Conference on Artificial Intelligence and Education (IC-ICAIE 2022)
Series
Atlantis Highlights in Computer Sciences
Publication Date
27 December 2022
ISBN
978-94-6463-040-4
ISSN
2589-4900

Cite this article

TY  - CONF
AU  - Xiangqing Wang
AU  - Xiaodong Cai
AU  - Meixin Zhou
AU  - Qingnan Huang
PY  - 2022
DA  - 2022/12/27
TI  - Video Description Method based on Semantic Information Filtering and Sentence Length Modulation
BT  - Proceedings of the 2022 3rd International Conference on Artificial Intelligence and Education (IC-ICAIE 2022)
PB  - Atlantis Press
SP  - 446
EP  - 452
SN  - 2589-4900
UR  - https://doi.org/10.2991/978-94-6463-040-4_68
DO  - 10.2991/978-94-6463-040-4_68
ID  - Wang2022
ER  -