Please use this identifier to cite or link to this item: http://localhost:8080/xmlui/handle/123456789/2053
Full metadata record
dc.contributor.author: Shashidhar R.
dc.contributor.author: Patilkulkarni S.
dc.contributor.author: Puneeth S.B.
dc.date.accessioned: 2022-05-26T06:16:47Z
dc.date.available: 2022-05-26T06:16:47Z
dc.date.issued: 2022
dc.identifier.citation: International Journal of Information Technology (Singapore)
dc.identifier.uri: http://localhost:8080/xmlui/handle/123456789/2053
dc.description.abstract: Human speech is bimodal: audio speech corresponds to the speaker's acoustic waveform, while lip motions are referred to as visual speech. Audiovisual Speech Recognition (AVSR) is an emerging field of research, particularly valuable when the audio is corrupted by noise. In the proposed AVSR system, a custom dataset was designed for the English language. The Mel Frequency Cepstral Coefficients (MFCC) technique was used for audio processing, and the Long Short-Term Memory (LSTM) method for visual speech recognition. Finally, the audio and visual streams were integrated into a single platform using a deep neural network. The results show an accuracy of 90% for audio speech recognition, 71% for visual speech recognition, and 91% for audiovisual speech recognition, which is better than existing approaches. Ultimately, the model was able to make suitable decisions when predicting the spoken word for the dataset used. © 2022, The Author(s), under exclusive licence to Bharati Vidyapeeth's Institute of Computer Applications and Management.
dc.language.iso: en
dc.publisher: Springer Science and Business Media B.V.
dc.title: Combining audio and visual speech recognition using LSTM and deep convolutional neural network
dc.type: Article
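
The abstract describes the pipeline only at a high level: MFCC features for the audio branch, an LSTM over lip motions for the visual branch, and a deep neural network for fusion. The following is a minimal sketch of how such a two-branch fusion model could be wired up; the use of librosa and Keras, the layer sizes, the input shapes, and the helper names are all illustrative assumptions, not the authors' actual implementation.

```python
# Illustrative sketch of an audiovisual fusion model as outlined in the
# abstract. All shapes and layer sizes are assumptions for demonstration.
import librosa
import tensorflow as tf


def audio_mfcc(path, n_mfcc=13):
    """Extract an MFCC sequence (time_frames x n_mfcc) from a wav file."""
    y, sr = librosa.load(path, sr=16000)
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc)
    return mfcc.T  # shape: (time_frames, n_mfcc)


def build_avsr_model(audio_steps=100, n_mfcc=13,
                     video_steps=30, lip_feat=64, n_words=10):
    # Audio branch: LSTM over the MFCC frame sequence.
    a_in = tf.keras.Input(shape=(audio_steps, n_mfcc))
    a = tf.keras.layers.LSTM(128)(a_in)

    # Visual branch: LSTM over per-frame lip-region feature vectors
    # (e.g. CNN embeddings of mouth crops, computed upstream).
    v_in = tf.keras.Input(shape=(video_steps, lip_feat))
    v = tf.keras.layers.LSTM(128)(v_in)

    # Fusion: concatenate both modalities and classify the spoken word.
    fused = tf.keras.layers.Concatenate()([a, v])
    h = tf.keras.layers.Dense(64, activation="relu")(fused)
    out = tf.keras.layers.Dense(n_words, activation="softmax")(h)

    model = tf.keras.Model(inputs=[a_in, v_in], outputs=out)
    model.compile(optimizer="adam",
                  loss="categorical_crossentropy",
                  metrics=["accuracy"])
    return model
```

Under these assumptions, the model would be trained with model.fit on paired (MFCC sequence, lip-feature sequence) inputs and one-hot word labels, matching the word-level prediction task described in the abstract.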
Appears in Collections: Mathematics Department

Files in This Item:
There are no files associated with this item.


Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.