The project will focus on bioinformatic and statistical analysis of state-of-the-art single-cell RNA-sequencing (scRNA-seq) data from human or non-human cells. It aims to characterize how gene expression differs between individual cells, and how external conditions (such as stress, physical perturbation, or drug treatment) and physical properties (such as cell size) contribute to heterogeneity within a population. The project is entirely computational: the genomic sequencing data have already been collected, and the student will NOT perform any wet-lab work for this project.
**MATH/CSE/ECE students are welcome.
**Applicant will write bash scripts in Linux to process the sequencing data into summarized formats.
**Applicant will write R or MATLAB code to analyze the data from several angles, quantify variation between cells, and create visualizations that summarize the data and support their conclusions.
**Applicant may need to learn and apply new statistical models, analysis methods, or algorithms to analyze and assess genomic datasets appropriately.
**Applicant will need to read and review the current literature to understand how such data are interpreted, and will write a project report at the end summarizing their findings.
**Demonstrate the ability to use scripting in bash, R, and/or MATLAB to preprocess and analyze large biological datasets, generating statistical insights and visualizations.
**Articulate the sources of complexity in single-cell biological data, distinguishing technical variation from biological variation and from variation due to treatment condition.
**Communicate analytical hypotheses, methods, results, and conclusions clearly and logically, both verbally and in writing, to describe the work the student has conducted.
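As a rough illustration of the kind of question raised above (separating technical from biological variation between cells), here is a minimal sketch in Python rather than R or MATLAB. It assumes a small gene-by-cell table of raw counts and uses Poisson sampling noise as a crude stand-in for technical noise: genes whose squared coefficient of variation (CV^2) exceeds the Poisson baseline of 1/mean are candidates for genuine cell-to-cell variability. The gene names, counts, and function names are hypothetical, and a real analysis would also normalize for sequencing depth and use a more realistic noise model.

```python
from statistics import mean, pvariance

# Hypothetical toy data: each gene maps to its raw counts across six cells.
# The values are invented for illustration only; they are not project data.
COUNTS = {
    "geneA": [5, 6, 4, 5, 6, 5],
    "geneB": [0, 12, 1, 0, 15, 2],
    "geneC": [2, 3, 2, 1, 2, 3],
}


def excess_cv2(counts):
    """Return, per gene, CV^2 across cells minus the Poisson baseline 1/mean.

    Under pure Poisson sampling noise the variance equals the mean, so
    CV^2 = 1/mean. A positive excess suggests extra-Poisson variability;
    this is only a crude proxy, since real technical noise is usually larger.
    """
    result = {}
    for gene, values in counts.items():
        mu = mean(values)
        if mu == 0:
            continue  # CV^2 is undefined for genes never detected
        cv2 = pvariance(values) / mu ** 2
        result[gene] = cv2 - 1 / mu
    return result


def rank_variable_genes(counts):
    """Sort genes from most to least excess variability."""
    scores = excess_cv2(counts)
    return sorted(scores, key=scores.get, reverse=True)


if __name__ == "__main__":
    for gene, score in excess_cv2(COUNTS).items():
        print(f"{gene}: excess CV^2 = {score:.3f}")
    print("ranking:", rank_variable_genes(COUNTS))
```

On this toy table, geneB (counts swinging between 0 and 15) is ranked first with a positive excess, while geneA and geneC fall below the Poisson baseline.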