Google is working on sign language detection technology to let deaf and hard-of-hearing users communicate in video conferencing.
This real-time sign language detection model can identify when someone is signing and when they have finished, Google said.
The model can connect to various video conferencing applications and can set the user as the “speaker” when they sign.
Researchers at Google first ran the video through a pose-estimation model called PoseNet. Rather than processing every pixel, this model reduces each HD frame to a small set of landmark coordinates for the user’s body, such as the eyes, nose, shoulders, and hands.
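The idea of turning pose landmarks into a compact motion signal can be sketched as follows. This is a minimal illustration, not PoseNet’s actual API: the `motion_features` helper and the landmark layout are assumptions, and a real system would add normalization and more elaborate features.

```python
import numpy as np

def motion_features(keypoints):
    """Reduce a sequence of pose landmarks to per-frame motion features.

    `keypoints` is a (frames, landmarks, 2) array of (x, y) positions,
    e.g. the eyes, nose, shoulders, and hands that a pose estimator such
    as PoseNet returns, normalized to the frame size (hypothetical layout).
    Returns a (frames - 1, landmarks) array of per-landmark movement,
    a compact signal a signing detector can classify instead of raw pixels.
    """
    kp = np.asarray(keypoints, dtype=float)
    deltas = np.diff(kp, axis=0)            # frame-to-frame displacement
    return np.linalg.norm(deltas, axis=-1)  # movement magnitude per landmark

# Example: 3 frames, 4 landmarks; only the last landmark (a hand) moves.
frames = np.zeros((3, 4, 2))
frames[1, 3] = [0.10, 0.00]
frames[2, 3] = [0.20, 0.05]
motion = motion_features(frames)
print(motion.shape)  # (2, 4)
```

The point of the reduction is efficiency: a classifier sees a few dozen numbers per frame instead of millions of pixels, which is what makes real-time detection feasible.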
The system then sends this visual data to a model trained on videos of people using German Sign Language.
The model correctly detected signing about 8 times out of 10. With some additional optimising, its accuracy improved to about 91%.
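Since per-frame predictions are noisy, a detector also needs a rule for when signing starts and stops so the signer is not dropped as “speaker” during brief pauses. One common way to do this, sketched here with hypothetical thresholds (the source does not describe Google’s exact smoothing), is hysteresis thresholding:

```python
def signing_segments(scores, on=0.7, off=0.3):
    """Turn noisy per-frame signing probabilities into on/off decisions.

    Hysteresis thresholding (thresholds are assumed, not from the source):
    signing starts when the score rises above `on` and only stops once it
    falls below `off`, so brief mid-sentence dips do not end the segment.
    """
    active = False
    states = []
    for s in scores:
        if not active and s > on:
            active = True
        elif active and s < off:
            active = False
        states.append(active)
    return states

decisions = signing_segments([0.1, 0.8, 0.5, 0.4, 0.9, 0.2])
print(decisions)  # [False, True, True, True, True, False]
```

Note how the scores 0.5 and 0.4 stay “active” even though they are below the start threshold; a single cutoff would have flickered off and on.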
Because video conferencing applications typically pick the “speaker” based on audio volume rather than detecting actual speech, the system plays an ultrasonic audio tone whenever it detects signing. The tone is inaudible to people but registers as volume, fooling the application into treating the signer as if they were speaking.
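Generating such a tone is straightforward. A minimal sketch, assuming a 20 kHz tone (near the edge of human hearing; the exact frequency is an assumption, not stated in the source) sampled at 48 kHz:

```python
import numpy as np

SAMPLE_RATE = 48_000  # Hz; high enough to represent an ultrasonic tone
TONE_FREQ = 20_000    # Hz; near the edge of human hearing (assumed value)

def ultrasonic_tone(duration_s, amplitude=0.2):
    """Generate a sine tone the conferencing app 'hears' but people don't.

    Playing this whenever signing is detected raises the audio volume the
    app measures, so the signer is promoted to active speaker.
    """
    t = np.arange(int(duration_s * SAMPLE_RATE)) / SAMPLE_RATE
    return amplitude * np.sin(2 * np.pi * TONE_FREQ * t)

tone = ultrasonic_tone(0.1)
# Verify the signal's energy sits at TONE_FREQ via an FFT peak.
spectrum = np.abs(np.fft.rfft(tone))
peak_hz = np.fft.rfftfreq(tone.size, 1 / SAMPLE_RATE)[spectrum.argmax()]
print(round(peak_hz))  # 20000
```

The trick works precisely because the app measures energy, not intelligibility: any sustained tone above the hearing range raises the measured volume without disturbing the call.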
Researchers conducted an experiment in which participants were asked to communicate via sign language during a video conference. The results were positive: the system detected sign language and treated it as audible speech.
Most video conferencing applications have no mechanism to detect sign language. The new model could help signers communicate easily and effectively.