Proceedings of the 2017 International Conference on Information Technology and Intelligent Manufacturing (ITIM 2017)

Video Description Using Learning Multiple Features

Authors
Xin Xu, Chunping Liu, Haibin Liu, Yi Ji, Zhaohui Wang
Corresponding Author
Xin Xu
Available Online August 2017.
DOI
10.2991/itim-17.2017.34
Keywords
Video description, SIFT flow, VGG-16, mean pooling, LSTM
Abstract

Generating descriptions for open-domain videos is a major challenge in computer vision because of their complex dynamics. In this paper, we propose a video description model based on multiple features. During encoding, we exploit two complementary features: a spatial feature extracted from the raw frame by a VGG-16 model, and a temporal feature extracted from the SIFT flow image by a fine-tuned VGG-16 model. During decoding, we additionally supply a mean-pooled feature that represents the video as a whole. A two-layer LSTM model then generates the sentence describing the video. We evaluate several variants of our model on the MSVD dataset using the METEOR metric. The experimental results show that our model is beneficial for generating sentences about the video.
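
The feature handling summarized in the abstract can be illustrated with a small sketch. It is not the authors' implementation: the helper names `fuse_per_frame` and `mean_pool`, the use of concatenation to combine the spatial and temporal features, and the toy dimensions are all assumptions made for illustration. Real VGG-16 features would be far larger, and the abstract does not state how the two feature streams are combined.

```python
from typing import List

Vector = List[float]


def fuse_per_frame(spatial: List[Vector], temporal: List[Vector]) -> List[Vector]:
    """Concatenate spatial and temporal vectors frame by frame.

    Concatenation is an assumed fusion rule, chosen only for illustration.
    """
    if len(spatial) != len(temporal):
        raise ValueError("spatial and temporal streams must cover the same frames")
    return [s + t for s, t in zip(spatial, temporal)]


def mean_pool(frame_features: List[Vector]) -> Vector:
    """Average per-frame vectors into a single video-level vector."""
    if not frame_features:
        raise ValueError("need at least one frame feature")
    dim = len(frame_features[0])
    if any(len(f) != dim for f in frame_features):
        raise ValueError("all frame features must have the same length")
    n = len(frame_features)
    return [sum(f[i] for f in frame_features) / n for i in range(dim)]


# Toy two-frame video: 3-dim "spatial" and 2-dim "temporal" features
# stand in for the VGG-16 outputs described in the abstract.
spatial = [[1.0, 2.0, 3.0], [3.0, 4.0, 5.0]]
temporal = [[0.0, 1.0], [1.0, 0.0]]

fused = fuse_per_frame(spatial, temporal)  # one 5-dim vector per frame
video_vector = mean_pool(fused)            # holistic video-level feature
print(video_vector)
```

In this sketch the per-frame vectors would feed the encoder step by step, while `video_vector` plays the role of the holistic feature that the abstract says is added during decoding.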

Copyright
© 2017, the Authors. Published by Atlantis Press.
Open Access
This is an open access article distributed under the CC BY-NC license (http://creativecommons.org/licenses/by-nc/4.0/).

Volume Title
Proceedings of the 2017 International Conference on Information Technology and Intelligent Manufacturing (ITIM 2017)
Series
Advances in Intelligent Systems Research
Publication Date
August 2017
ISSN
1951-6851

Cite this article

TY  - CONF
AU  - Xin Xu
AU  - Chunping Liu
AU  - Haibin Liu
AU  - Yi Ji
AU  - Zhaohui Wang
PY  - 2017/08
DA  - 2017/08
TI  - Video Description Using Learning Multiple Features
BT  - Proceedings of the 2017 International Conference on Information Technology and Intelligent Manufacturing (ITIM 2017)
PB  - Atlantis Press
SP  - 137
EP  - 140
SN  - 1951-6851
UR  - https://doi.org/10.2991/itim-17.2017.34
DO  - 10.2991/itim-17.2017.34
ID  - Xu2017/08
ER  -