Proceedings of the International Conference on Computational Innovations and Emerging Trends (ICCIET- 2024)

Real-time Object Detection and Voice Labeling for Enhanced Accessibility and Visual Interaction

Authors
Matta Swathi1, Ramala Supraja1, *, Malavathu Lakshmi Prasanna1, Shaik Sameer1, Guntaka Rama Krishna Reddy1
1Lakireddy Bali Reddy College of Engineering, Mylavaram, India
*Corresponding author. Email: supraja123.ramala@gmail.com
Corresponding Author
Ramala Supraja
Available Online 30 July 2024.
DOI
10.2991/978-94-6463-471-6_70
Keywords
YOLO Version 7; Voice Labeling; Neural Network; Visual Interaction; Small Object Handling; Pre-trained Model; Versatility; Fast Inference
Abstract

This work introduces a new approach to real-time object recognition using YOLO Version 7, an advanced system capable of detecting objects in real time in images, videos, and live webcam feeds. Unlike traditional methods, the system announces each detection aloud, stating the object’s name together with the algorithm’s accuracy and confidence levels. Beyond enhancing accessibility, the system can also be used to develop educational and engaging resources. Using the MS COCO dataset and a pre-trained model, YOLO Version 7 delivers accurate and fast object recognition, even for small objects. By exploiting this speed and precision, the initiative aims to make information less intimidating and more engaging, particularly for individuals with visual impairments. The dataset supports comprehensive evaluation, with 118,287 training images, 5,000 validation images, and 20,288 test images spanning 80 object classes. The proposed method offers the following advantages: fast inference, high accuracy, richer visual interaction, flexibility across settings, and improved handling of small objects. The system accepts input from several sources, including cameras and image or video files, and recognizes objects using the YOLO Version 7 algorithm.
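
The voice-labeling step described in the abstract could be sketched roughly as follows. This is a minimal illustration and not the authors' implementation: the detection format (class name plus confidence), the function names, and the 0.5 confidence cut-off are all assumptions made for the example, and `speak` is a placeholder callback where a text-to-speech engine would be plugged in.

```python
from typing import Callable, Iterable, List, Tuple

# Hypothetical detection record: (class name, confidence in [0, 1]).
# A real detector would also return bounding boxes, omitted here.
Detection = Tuple[str, float]


def format_announcement(label: str, confidence: float) -> str:
    """Build the sentence to be spoken for one detected object."""
    return f"{label}, {round(confidence * 100)} percent confidence"


def announce(
    detections: Iterable[Detection],
    speak: Callable[[str], None] = print,
    min_confidence: float = 0.5,  # assumed cut-off, not taken from the paper
) -> List[str]:
    """Speak each sufficiently confident detection, most confident first."""
    kept = [d for d in detections if d[1] >= min_confidence]
    kept.sort(key=lambda d: d[1], reverse=True)
    phrases = [format_announcement(label, conf) for label, conf in kept]
    for phrase in phrases:
        speak(phrase)  # replace `print` with a TTS callback in practice
    return phrases


if __name__ == "__main__":
    # Made-up detections standing in for one frame of detector output.
    frame = [("person", 0.91), ("cup", 0.34), ("dog", 0.78)]
    announce(frame)
```

Passing the speech function in as a callback keeps the formatting logic independent of any particular detector or speech library, so the same code can be exercised with `print` during development.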

Copyright
© 2024 The Author(s)
Open Access
Open Access This chapter is licensed under the terms of the Creative Commons Attribution-NonCommercial 4.0 International License (http://creativecommons.org/licenses/by-nc/4.0/), which permits any noncommercial use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

Volume Title
Proceedings of the International Conference on Computational Innovations and Emerging Trends (ICCIET- 2024)
Series
Advances in Computer Science Research
Publication Date
30 July 2024
ISBN
978-94-6463-471-6
ISSN
2352-538X

Cite this article

TY  - CONF
AU  - Matta Swathi
AU  - Ramala Supraja
AU  - Malavathu Lakshmi Prasanna
AU  - Shaik Sameer
AU  - Guntaka Rama Krishna Reddy
PY  - 2024
DA  - 2024/07/30
TI  - Real-time Object Detection and Voice Labeling for Enhanced Accessibility and Visual Interaction
BT  - Proceedings of the International Conference on Computational Innovations and Emerging Trends (ICCIET- 2024)
PB  - Atlantis Press
SP  - 721
EP  - 733
SN  - 2352-538X
UR  - https://doi.org/10.2991/978-94-6463-471-6_70
DO  - 10.2991/978-94-6463-471-6_70
ID  - Swathi2024
ER  -