Software Architecture for Low Latency, High Performance, Deep Learning Accelerator
Project Description

Artificial intelligence applications on autonomous drones and self-driving vehicles are becoming more common. New hardware, such as 400G Ethernet, NVMe-oF storage and deep learning accelerators (GPU, TPU, ASIC, MPE, NVDLA, FPGA), requires new programming workflows and frameworks to orchestrate low-latency data flow across multiple heterogeneous computing architectures, communication fabrics and accelerators.
Programmers can now perform direct memory access from storage to the GPU, without a CPU I/O bottleneck or OS kernel context-switching overhead; however, this requires new programming APIs and an understanding of the underlying hardware architecture. Anyone with a passion for software performance and low-level computer architecture is welcome to explore the next evolution in high performance data engineering and AI.

Supervisor
WANG Yu-Hsing
Quota
3
Course type
UROP1100
UROP2100
UROP3100
UROP4100
Applicant's Roles

Deep learning programming

Applicant's Learning Objectives

To learn the architecture of high performance data engineering and AI

Complexity of the project
Challenging