Proceedings of the First International Conference on Advances in Computer Vision and Artificial Intelligence Technologies (ACVAIT 2022)

Recent Advances in Audio-Visual Speech Recognition: Deep Learning Perspective

Authors
Diksha R. Pawar¹,*, Pravin Yannawar¹
¹Department of Computer Science and Information Technology, Dr. Babasaheb Ambedkar Marathwada University, Aurangabad, Maharashtra, India
*Corresponding author. Email: dikshasalunke97@gmail.com
Corresponding Author
Diksha R. Pawar
Available Online 10 August 2023.
DOI
10.2991/978-94-6463-196-8_31
Keywords
ASR; audio feature extraction; AVSR; audio-video fusion; HMM; accuracy estimation methods; GNN
Abstract

Speech is a powerful means of communication among human beings, and language is how we communicate with the world. This has motivated researchers to study automatic speech recognition (ASR) and to build computer systems that can process and understand human speech. A central problem for speech recognition is that acoustically noisy environments can severely corrupt the audio signal, and the corrupted audio degrades overall recognition performance. Audio-Visual Speech Recognition (AVSR) aims to address this problem by also using visual information, which is unaffected by acoustic noise. This review paper explains AVSR architectures, covering front-end operations, the audio-visual datasets used and related studies, audio feature extraction, fusion and modeling techniques, and accuracy estimation methods.

Copyright
© 2023 The Author(s)
Open Access
Open Access This chapter is licensed under the terms of the Creative Commons Attribution-NonCommercial 4.0 International License (http://creativecommons.org/licenses/by-nc/4.0/), which permits any noncommercial use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

Volume Title
Proceedings of the First International Conference on Advances in Computer Vision and Artificial Intelligence Technologies (ACVAIT 2022)
Series
Advances in Intelligent Systems Research
Publication Date
10 August 2023
ISBN
978-94-6463-196-8
ISSN
1951-6851

Cite this article

TY  - CONF
AU  - Diksha R. Pawar
AU  - Pravin Yannawar
PY  - 2023
DA  - 2023/08/10
TI  - Recent Advances in Audio-Visual Speech Recognition: Deep Learning Perspective
BT  - Proceedings of the First International Conference on Advances in Computer Vision and Artificial Intelligence Technologies (ACVAIT 2022)
PB  - Atlantis Press
SP  - 409
EP  - 421
SN  - 1951-6851
UR  - https://doi.org/10.2991/978-94-6463-196-8_31
DO  - 10.2991/978-94-6463-196-8_31
ID  - Pawar2023
ER  -