In recent years, computer science researchers have introduced deep-learning-based algorithms, such as Word2Vec, ELMo, and Google's BERT, that take word context into account, e.g., the other words in the same text and their order, when summarizing texts. These new algorithms have been shown to significantly outperform NLP methods that assume a bag-of-words structure on tasks such as sentiment classification of general texts, language translation, and question answering. In this project, we will use BERT to train and fine-tune a language model for Chinese financial texts for sentiment classification.
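To make the contrast concrete, here is a minimal sketch (with hypothetical toy sentences) of why a bag-of-words representation cannot distinguish texts that differ only in word order, which is exactly the context information that models like BERT retain:

```python
from collections import Counter

def bag_of_words(tokens):
    # A bag-of-words representation keeps only token counts,
    # discarding word order and context entirely.
    return Counter(tokens)

# Two toy sentences with opposite sentiment but identical word counts.
positive = ["profits", "rose", "while", "costs", "fell"]
negative = ["costs", "rose", "while", "profits", "fell"]

# Under bag-of-words the two sentences are indistinguishable.
print(bag_of_words(positive) == bag_of_words(negative))  # True
```

A context-aware model sees the two sentences as different token sequences, so it can, in principle, assign them opposite sentiment labels.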
- Use a web crawler to obtain financial texts from social media and other sources
- Train and fine-tune a BERT model on Chinese financial news for sentiment classification
- Use the resulting sentiment measures to predict stock returns
- Applicants should be familiar with Python
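The last step above can be sketched in a few lines of Python. The sentiment scores and next-day returns below are hypothetical toy numbers; in the project they would come from the fine-tuned BERT classifier and from market data, respectively. A simple one-variable OLS regression of next-day returns on daily sentiment illustrates the idea:

```python
def ols_slope(x, y):
    # Slope of the OLS regression of y on x: cov(x, y) / var(x).
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var = sum((a - mx) ** 2 for a in x)
    return cov / var

# Toy daily average sentiment scores and next-day returns (hypothetical).
sentiment = [0.2, -0.5, 0.8, 0.1, -0.3]
next_day_ret = [0.004, -0.012, 0.015, 0.001, -0.007]

beta = ols_slope(sentiment, next_day_ret)
print(f"estimated slope: {beta:.4f}")
```

In practice one would use a proper regression library (e.g., statsmodels) with controls for known return predictors, but the core question, whether the slope on sentiment is reliably different from zero, is the same.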
Students will gain experience in Python programming and natural language processing in the financial domain.