Real-time Lip Reading Through Deep Learning
Project Description

Lip reading is the process of transcribing speech using only the movements of a speaker’s lips. It is useful in hearing aids, audio-less communication scenarios, forensic lip reading, and automated subtitle generation, among other areas. This project aims to read the lips of speakers in videos, with or without audio, and to provide a transcription of the spoken content. Existing architectures will be analysed and implemented using publicly available datasets (e.g., BBC, GRID, LRS-TED) and optimised towards near real-time analysis of live video streams.

Supervisor
MAK Brian Kan Wing
Quota
1
Course type
UROP1100
UROP2100
UROP3100
UROP4100
Applicant's Roles

- Study the current literature on lip reading
- Explore and implement different lip-reading architectures (e.g., LipNet)
- Evaluate the models on various lip-reading datasets, from the small GRID dataset to the large LRS-TED dataset
- Optimise the pipeline for live video streams to provide real-time analysis
- Further improve performance with data augmentation
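
For the evaluation step, one metric commonly reported for lip-reading systems is the word error rate (WER): the word-level edit distance between the reference and predicted transcripts, divided by the number of reference words. The sketch below is only an illustration, assuming whitespace-tokenised transcripts; the function name `word_error_rate` and the sample sentences are hypothetical and not part of the project.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # (excluding the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,                 # deletion of a reference word
                curr[j - 1] + 1,             # insertion of a hypothesis word
                prev[j - 1] + substitution,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


# Hypothetical transcripts: one substitution and one deletion over six words.
wer = word_error_rate("bin blue at f two now", "bin green at f two")
```

Because insertions count as errors, the value can exceed 1.0 when the prediction is much longer than the reference.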

Applicant's Learning Objectives

- To understand the various neural architectures of lip-reading models
- To optimise the lip-reading pipelines for real-time analysis
- To investigate various data augmentation methods to improve the system’s robustness
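
For the real-time objective, one possible approach (an assumption here, not a method prescribed by the project) is to feed the model fixed-length, overlapping windows of frames as they arrive from the stream. The sketch below shows only the buffering logic; `sliding_windows`, `window`, and `stride` are hypothetical names, and the integers stand in for video frames.

```python
from collections import deque
from typing import Deque, Iterable, Iterator, List, TypeVar

Frame = TypeVar("Frame")


def sliding_windows(
    frames: Iterable[Frame], window: int, stride: int
) -> Iterator[List[Frame]]:
    """Yield the most recent `window` frames every `stride` new frames."""
    if window <= 0 or stride <= 0:
        raise ValueError("window and stride must be positive")

    # A bounded deque discards the oldest frame automatically once it is full.
    buffer: Deque[Frame] = deque(maxlen=window)
    frames_since_emit = 0
    for frame in frames:
        buffer.append(frame)
        frames_since_emit += 1
        # Emit once the buffer is full and at least `stride` new frames arrived.
        if len(buffer) == window and frames_since_emit >= stride:
            yield list(buffer)
            frames_since_emit = 0


# Integers stand in for frames from a live stream.
chunks = list(sliding_windows(range(10), window=4, stride=2))
```

A smaller stride updates the transcription more often but requires more model calls per second, so the two parameters trade latency against compute.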

Complexity of the project
Challenging