Machine Learning and Sentiment Classification in Chinese Financial Text
Project Description

In recent years, computer science researchers have introduced deep-learning-based algorithms, such as Word2Vec, ELMo, and Google's BERT, that take into account word context (e.g., the other words in the same text) and word order when summarizing text. They show that these new algorithms can significantly outperform NLP methods that assume a bag-of-words structure in tasks such as sentiment classification of general text, machine translation, and question answering. In this project, we will use BERT to train and fine-tune a language model on Chinese financial texts for sentiment classification.
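To illustrate why a bag-of-words structure can be limiting, the minimal sketch below compares two sentences that contain exactly the same words in a different order. The sentences, the whitespace tokenization, and the helper name `bag_of_words` are illustrative assumptions, not part of the project; Chinese text would additionally need a word segmenter.

```python
from collections import Counter


def bag_of_words(text: str) -> Counter:
    """Count tokens and discard their order (hypothetical helper, whitespace tokenization)."""
    return Counter(text.lower().split())


# Two made-up sentences with opposite implications for a firm.
positive = "profits rose while costs fell"
negative = "costs rose while profits fell"

# Both sentences reduce to the same word counts, so a model that only
# sees these counts cannot tell them apart.
same_bag = bag_of_words(positive) == bag_of_words(negative)
print(same_bag)
```

A context-aware model such as BERT receives the tokens in sequence, so in principle it can assign different representations to the two sentences.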

Supervisor
HUANG Allen Hao
Co-Supervisor
YOU Haifeng
Quota
5
Course type
UROP1000
UROP1100
UROP2100
UROP3100
UROP4100
Applicant's Roles

- Use a web crawler to obtain financial texts from social media and other sources
- Train and fine-tune a BERT model on Chinese financial news for sentiment classification
- Use sentiment to predict stock returns
- Applicants should be familiar with Python
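
The last analytical step, relating sentiment to stock returns, could look something like the minimal sketch below. It assumes each observation pairs a sentiment score with the following period's return and fits a simple least-squares line; the numbers are synthetic, the function `ols_fit` is a hypothetical helper, and the project may well use a different specification.

```python
from statistics import mean


def ols_fit(x, y):
    """Return (intercept, slope) of the least-squares line y = a + b * x."""
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("x has no variation, so the slope is undefined")
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


# Synthetic values for illustration only, not real market data.
sentiment = [-0.8, -0.3, 0.0, 0.4, 0.9]               # assumed scores in [-1, 1]
next_day_return = [-0.012, -0.004, 0.001, 0.005, 0.011]

intercept, slope = ols_fit(sentiment, next_day_return)
print(f"intercept={intercept:.4f}, slope={slope:.4f}")
```

A positive slope would indicate that more positive text is followed by higher returns in the sample; a real analysis would also need out-of-sample tests and controls.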

Applicant's Learning Objectives

Students will gain experience in Python programming and in natural language processing in the financial domain.

Complexity of the project
Challenging