Amplified Publishing Category
Using AI to improve accessibility for communication
by Vinay P. Namboodiri
New forms of communication and publishing are being developed for the mainstream community. However, these are presently not accessible to sections of the community who may, for instance, face speech impairments or similar conditions. In the works described below, we explore whether AI techniques can do better.
Learning Individual Speaking Styles for Accurate Lip to Speech Synthesis
Previously, we worked on generating speech from lip movements with our proposed system, Lip2Wav. This research was published at CVPR 2020 and provided the context for the further work in this space.
The figure above shows an overview of our proposed system Lip2Wav, work done prior to starting the Amplified Publishing fellowship. The system takes a silent video, extracts the sequence of faces from it, and generates speech based only on the lip movements.
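To make this pipeline concrete, here is a minimal PyTorch sketch of a Lip2Wav-style model: a 3D CNN encodes the face-crop sequence and a recurrent decoder predicts mel-spectrogram frames, which a vocoder would then turn into audio. All layer sizes and names here are illustrative assumptions, not the published architecture.

```python
# Minimal sketch of a Lip2Wav-style lip-to-speech model (assumed layout,
# not the published architecture).
import torch
import torch.nn as nn

class LipToSpeech(nn.Module):
    def __init__(self, mel_dim=80, hidden=256):
        super().__init__()
        # 3D convolutions capture lip motion across time (T) and space (H, W).
        self.encoder = nn.Sequential(
            nn.Conv3d(3, 32, kernel_size=(3, 5, 5), stride=(1, 2, 2), padding=(1, 2, 2)),
            nn.ReLU(),
            nn.Conv3d(32, 64, kernel_size=(3, 5, 5), stride=(1, 2, 2), padding=(1, 2, 2)),
            nn.ReLU(),
            nn.AdaptiveAvgPool3d((None, 1, 1)),  # keep the time axis, pool space away
        )
        # Recurrent decoder maps the visual sequence to mel-spectrogram frames.
        self.decoder = nn.GRU(64, hidden, batch_first=True, bidirectional=True)
        self.mel_head = nn.Linear(2 * hidden, mel_dim)

    def forward(self, faces):
        # faces: (batch, 3, T, H, W) sequence of face crops from a silent video
        feats = self.encoder(faces)            # (batch, 64, T, 1, 1)
        feats = feats.squeeze(-1).squeeze(-1)  # (batch, 64, T)
        feats = feats.transpose(1, 2)          # (batch, T, 64)
        out, _ = self.decoder(feats)           # (batch, T, 2*hidden)
        return self.mel_head(out)              # (batch, T, mel_dim) mel frames

model = LipToSpeech()
video = torch.randn(1, 3, 25, 48, 96)  # 25 face crops of 48x96 pixels
mels = model(video)
print(mels.shape)  # torch.Size([1, 25, 80]); a vocoder would turn this into audio
```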
Read the paper, Learning Individual Speaking Styles for Accurate Lip to Speech Synthesis
Personalized One-Shot Lipreading for an ALS Patient
Building on this work, we explored whether such a system could enable speech-impaired users to generate speech. In this work, done during the fellowship in collaboration with CVIT, IIIT Hyderabad, we pursued research on generating speech for an ALS patient without requiring much data, and on obtaining an AI system that adapts to speech-impaired users who can still move their lips. The challenge in this aspect of the project is that lip movements vary widely across speech-impaired users, depending on their particular impairment, and it is difficult to obtain a significant amount of data from each user. This was our first work in this space towards adaptable AI-based speech generation from lip movements; a sketch of the adaptation idea follows the citation below.
Personalized One-Shot Lipreading for an ALS Patient
Bipasha Sen, Aditya Agarwal, Rudrabha Mukhopadhyay, Vinay Namboodiri, C V Jawahar
BMVC 2021
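As a hedged illustration of the low-data setting above, the sketch below freezes a pretrained lipreading backbone and fine-tunes only a small speaker-specific head on a single labelled clip. The actual BMVC method is different; every module and dimension here is an assumption.

```python
# Hedged sketch of one-shot speaker adaptation: fine-tune a tiny adapter
# head on one clip while the pretrained backbone stays frozen.
import torch
import torch.nn as nn

class AdaptedLipreader(nn.Module):
    def __init__(self, backbone, feat_dim=512, num_words=100):
        super().__init__()
        self.backbone = backbone            # pretrained, kept frozen below
        self.adapter = nn.Sequential(       # small speaker-specific head
            nn.Linear(feat_dim, 128), nn.ReLU(), nn.Linear(128, num_words)
        )

    def forward(self, clip):
        with torch.no_grad():               # backbone weights stay fixed
            feats = self.backbone(clip)
        return self.adapter(feats)

# Stand-in backbone: in practice this is a lipreading network pretrained on
# many speakers. A single linear layer keeps the sketch self-contained.
backbone = nn.Sequential(nn.Flatten(), nn.Linear(3 * 25 * 48 * 96, 512))
model = AdaptedLipreader(backbone)

# One labelled clip from the new user (e.g. an ALS patient mouthing a word).
clip, label = torch.randn(1, 3, 25, 48, 96), torch.tensor([7])
opt = torch.optim.Adam(model.adapter.parameters(), lr=1e-3)
for step in range(20):                      # a few gradient steps suffice
    loss = nn.functional.cross_entropy(model(clip), label)
    opt.zero_grad()
    loss.backward()
    opt.step()
print(f"adapted loss: {loss.item():.3f}")
```

Freezing the backbone keeps the number of trainable parameters tiny, which is what makes learning from a single clip plausible at all.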
Sign Language Generation
We explored two further works in this space aimed at improving accessibility.
The first of these was our work on generating sign language automatically from speech. We created a dataset in which the audio corresponds to the sign language videos, and based on it we developed a novel multi-modal transformer that generates sign language directly from speech, in contrast to existing works that generate sign language from gloss or text (a minimal sketch of this idea follows the citation below). This work has also resulted in a patent application for the system.
Towards Automatic Speech to Sign Language Generation
Parul Kapoor, Rudrabha Mukhopadhyay, Sindhu B Hegde, Vinay Namboodiri, C V Jawahar
Interspeech 2021
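The following is a minimal sketch of the direct speech-to-sign idea, assuming a standard transformer encoder-decoder that maps mel-spectrogram frames to a sequence of pose keypoints with no intermediate text or gloss. Dimensions and names are assumptions, not the published configuration.

```python
# Minimal sketch of a speech-to-sign transformer: mel frames in,
# pose keypoints out, decoded autoregressively.
import torch
import torch.nn as nn

class SpeechToSign(nn.Module):
    def __init__(self, mel_dim=80, pose_dim=2 * 50, d_model=256):
        super().__init__()
        self.audio_proj = nn.Linear(mel_dim, d_model)   # embed speech frames
        self.pose_proj = nn.Linear(pose_dim, d_model)   # embed past poses
        self.transformer = nn.Transformer(
            d_model=d_model, nhead=4,
            num_encoder_layers=2, num_decoder_layers=2,
            batch_first=True,
        )
        self.pose_head = nn.Linear(d_model, pose_dim)   # predict next pose

    def forward(self, mels, prev_poses):
        # mels: (batch, T_audio, mel_dim); prev_poses: (batch, T_pose, pose_dim)
        mask = self.transformer.generate_square_subsequent_mask(prev_poses.size(1))
        out = self.transformer(
            self.audio_proj(mels),
            self.pose_proj(prev_poses),
            tgt_mask=mask,                              # autoregressive decoding
        )
        return self.pose_head(out)

model = SpeechToSign()
mels = torch.randn(1, 200, 80)      # ~2 s of speech features
poses = torch.randn(1, 50, 100)     # 50 frames of 50 (x, y) keypoints
print(model(mels, poses).shape)     # torch.Size([1, 50, 100])
```

In a full system, the predicted keypoint sequence would then drive a rendered signing avatar or a generated video.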
Translating Sign Language Videos to Talking Faces
In the second work, we developed the counterpart: generating talking faces from sign language videos. The system translates a sign language video into a talking face video that speaks the same content (a sketch of the pipeline follows the citation below). These works aim to improve accessibility, letting speech-impaired users take part in communication and in modern mainstream publishing platforms such as gaming or VR/AR worlds.
Translating Sign Language Videos to Talking Faces
Seshadri Majumder, Rudrabha Mukhopadhyay, C.V. Jawahar
12th Indian Conference on Computer Vision, Graphics and Image Processing (ICVGIP), 2021
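As a hedged sketch of how such a system can be organized, the pipeline below chains three stages: recognize the signed content, synthesize speech from it, and drive a talking-face generator with that audio. Every function here is a placeholder stand-in; the actual models and interfaces used in the paper are not shown in this post.

```python
# Placeholder pipeline for sign-video-to-talking-face translation.
import numpy as np

def recognize_sign(video: np.ndarray) -> str:
    """Stage 1 (placeholder): map a sign language video to text."""
    return "hello, how are you"                  # a real model would predict this

def synthesize_speech(text: str, sr: int = 16000) -> np.ndarray:
    """Stage 2 (placeholder): text-to-speech returning a waveform."""
    return np.zeros(sr * 2, dtype=np.float32)    # 2 s of silence as a stand-in

def generate_talking_face(audio: np.ndarray, face: np.ndarray) -> np.ndarray:
    """Stage 3 (placeholder): lip-sync a face to the audio at 25 fps."""
    frames = int(len(audio) / 16000 * 25)
    return np.repeat(face[None], frames, axis=0)  # static frames as a stand-in

sign_video = np.zeros((100, 224, 224, 3), dtype=np.uint8)   # dummy input video
identity_face = np.zeros((224, 224, 3), dtype=np.uint8)     # target speaker face

text = recognize_sign(sign_video)
audio = synthesize_speech(text)
talking_face = generate_talking_face(audio, identity_face)
print(text, audio.shape, talking_face.shape)
```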