Lip Reading Greek Words

Project page for the research program on lip reading of Greek words in the wild.

Project description

Recent advances in the field of Computer Vision and Machine Learning (CVML), mainly enabled by deep convolutional neural networks, have brought a new era of CVML applications. Traditional CVML approaches were based on hand-crafted features, which limited the representational power of the features and, consequently, the performance of the final systems. As a result, Deep Learning is now the de facto standard for tasks such as image recognition, image segmentation and object detection.

Tasks that require processing spatiotemporal information (e.g. video sequences) demand special effort. This is due to the lack of an established formalism for processing information that evolves across both time and space. A typical application of spatiotemporal information processing is action recognition in videos. While human action mainly refers to the action that a person or a group of persons performs (walking, running, playing basketball), there are many other types of human activity, such as gestures, expressions and even the movements produced during speech.

This work focuses on the problem of Lip Reading, with emphasis on Greek words, using Deep Learning. Lip reading refers to the technique of understanding a spoken word by using only visual information from the lips. The objective of the project is the research and development of Deep Learning models that target user-independent Lip Reading, with emphasis on Greek words.
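User-independent evaluation means that no speaker seen during training appears at test time. A minimal sketch of such a split could look like the following, assuming each sample is tagged with a speaker ID. The `Sample` record, the toy sizes and the `split_by_speaker` helper are hypothetical illustrations, not the project's actual data format or evaluation protocol.

```python
from typing import NamedTuple


class Sample(NamedTuple):
    """Hypothetical record: one mouth-region clip of one spoken word."""
    speaker_id: int
    word_id: int


def split_by_speaker(samples, test_speakers):
    """Split samples so that held-out speakers never appear in training."""
    held_out = set(test_speakers)
    train = [s for s in samples if s.speaker_id not in held_out]
    test = [s for s in samples if s.speaker_id in held_out]
    return train, test


# Toy data (assumed sizes): 4 speakers, each uttering 3 words once.
samples = [Sample(spk, word) for spk in range(4) for word in range(3)]

# Hold out speaker 3 entirely; the remaining speakers form the training set.
train, test = split_by_speaker(samples, test_speakers=[3])
```

Splitting by speaker rather than by clip keeps the test set free of faces and speaking styles the model has already seen, which is what makes the reported accuracy a user-independent figure.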

Related publications

[P1] D. Kastaniotis, D. Tsourounis, A. Koureleas, B. Peev, C. Theoharatos and S. Fotopoulos, “Lip Reading in Greek words at unconstrained driving scenario,” 2019 10th International Conference on Information, Intelligence, Systems and Applications (IISA), Patras, Greece, 2019, pp. 1-6. doi: 10.1109/IISA.2019.8900757
URL: http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=8900757&isnumber=8900660

Datasets

[D1] 50GW-Dataset: Fifty Greek words dataset, captured with a mobile phone in a driving scenario. In total, 10 persons were recorded. The dataset is provided as cropped mouth patches. For information about the data, send an e-mail to Dimitris Kastaniotis: dkastaniotis_avoid_spam at upatras dot gr.

Code

[C1] An implementation of [P1] will soon be made available here.

Team Members

Spiros Fotopoulos, Professor
Dimitris Kastaniotis, Postdoctoral Researcher
Dimitris Tsourounis, PhD Candidate