Meaningful Statistical Machine Translation
Project Description
Machine translation is currently one of the most heavily researched areas of computer science in North America, Europe, and Asia. Our research group, which is sponsored by the US government's highly prestigious DARPA (very rare for Asian universities!), has been one of the international leaders in this area ever since the founding of HKUST in the early 1990s. We led the development of machine learning systems that learn syntactic structure for automatic translation systems.

This research project seeks to develop the next generation of machine learning systems that automatically learn Chinese and English well enough to translate between them.

Today, web-based translation services (e.g., Babelfish, Google Translate), of the type pioneered by our research group at HKUST, are providing useful services to millions of users each day.

However, the translation output of these systems is still frequently quite poor (often in hilarious ways!).

In this project, you will work with us on our current research thrust: enabling machine learning systems to learn SEMANTIC structure for automatic translation systems. To break the current performance barriers, machines will need to learn not just the syntax of English and Chinese, but the contextual MEANING of the components within the sentences.
Supervisor
WU Dekai
Quota
5
Course type
UROP1100
UROP2100
UROP3100
UROP4100
Applicant's Roles
The applicant will (a) learn how to build current state-of-the-art statistical machine translation systems, and (b) help design and experimentally test new systems incorporating semantic models.
Applicant's Learning Objectives
(1) Understand the strengths and weaknesses of various AI and machine learning approaches to translation of human language strings
(2) Understand how semantic and context models can overcome the weaknesses of traditional machine translation models, as well as current state-of-the-art statistical and syntax-based statistical machine translation models
(3) Develop and practice research skills in formulating hypotheses, critically analyzing experimental data, and creatively designing new models
(4) Develop excellent software engineering skills
Complexity of the project
Challenging