Please use this identifier to cite or link to this item: http://localhost:8080/xmlui/handle/123456789/2053
Title: Combining audio and visual speech recognition using LSTM and deep convolutional neural network
Authors: Shashidhar R.
Patilkulkarni S.
Puneeth S.B.
Issue Date: 2022
Publisher: Springer Science and Business Media B.V.
Citation: International Journal of Information Technology (Singapore)
Abstract: Human speech is bimodal: audio speech corresponds to the speaker's acoustic waveform, while visual speech refers to the accompanying lip motions. Audiovisual speech recognition (AVSR) is an emerging field of research, particularly valuable when the audio is corrupted by noise. For the proposed AVSR system, a custom English-language dataset was designed. The Mel-frequency cepstral coefficients (MFCC) technique was used for audio processing, and the long short-term memory (LSTM) method was used for visual speech recognition. Finally, the audio and visual streams were integrated into a single platform using a deep neural network. The results show an accuracy of 90% for audio speech recognition, 71% for visual speech recognition, and 91% for audiovisual speech recognition, outperforming existing approaches. Ultimately, the model was able to make suitable decisions when predicting the spoken word for the dataset used. © 2022, The Author(s), under exclusive licence to Bharati Vidyapeeth's Institute of Computer Applications and Management.
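The abstract outlines a three-part architecture: an MFCC front end for audio, a CNN-plus-LSTM branch for visual (lip-region) frames, and a deep neural network that fuses both modalities for classification. The following is a minimal sketch of such a pipeline, assuming hypothetical input shapes, layer sizes, and a Keras/librosa implementation; the record does not specify the authors' exact architecture, dataset layout, or hyperparameters.

    # Sketch only: input shapes, layer widths, and class count are illustrative
    # assumptions, not the published configuration.
    import numpy as np
    import librosa
    import tensorflow as tf
    from tensorflow.keras import layers, Model

    def extract_mfcc(wav_path, n_mfcc=13, max_frames=100):
        """Audio front end: MFCC features, as named in the abstract."""
        signal, sr = librosa.load(wav_path, sr=16000)
        mfcc = librosa.feature.mfcc(y=signal, sr=sr, n_mfcc=n_mfcc).T  # (frames, n_mfcc)
        # Pad or truncate to a fixed length so batches stack cleanly.
        if mfcc.shape[0] < max_frames:
            mfcc = np.pad(mfcc, ((0, max_frames - mfcc.shape[0]), (0, 0)))
        return mfcc[:max_frames]

    def build_avsr_model(num_classes, audio_shape=(100, 13), visual_shape=(30, 64, 64, 1)):
        # Audio branch: recurrent model over the MFCC frame sequence.
        audio_in = layers.Input(shape=audio_shape)
        a = layers.LSTM(128)(audio_in)

        # Visual branch: per-frame convolutional features over lip crops,
        # followed by an LSTM, matching the "LSTM and deep convolutional
        # neural network" combination named in the title.
        visual_in = layers.Input(shape=visual_shape)
        v = layers.TimeDistributed(layers.Conv2D(32, 3, activation="relu"))(visual_in)
        v = layers.TimeDistributed(layers.MaxPooling2D())(v)
        v = layers.TimeDistributed(layers.Flatten())(v)
        v = layers.LSTM(128)(v)

        # Fusion: concatenate both modalities and classify with dense layers,
        # i.e. integrate audio and visual into a single platform.
        fused = layers.concatenate([a, v])
        fused = layers.Dense(256, activation="relu")(fused)
        out = layers.Dense(num_classes, activation="softmax")(fused)

        model = Model(inputs=[audio_in, visual_in], outputs=out)
        model.compile(optimizer="adam",
                      loss="sparse_categorical_crossentropy",
                      metrics=["accuracy"])
        return model

    # Usage: build the two-input model and inspect its structure.
    model = build_avsr_model(num_classes=10)
    model.summary()

In this sketch the fusion is a simple feature-level concatenation followed by dense layers; the paper may use a different integration scheme, which the record does not detail.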
URI: http://localhost:8080/xmlui/handle/123456789/2053
Appears in Collections: Mathematics Department

Files in This Item:
There are no files associated with this item.


Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.