Artificial intelligence applications in autonomous drones and self-driving vehicles are becoming more common. Hardware innovations such as 400G Ethernet, NVMe-oF storage, and deep-learning accelerators (GPU, TPU, ASIC, MPE, NVDLA, FPGA) require new programming workflows and frameworks to orchestrate low-latency data flow across heterogeneous computing architectures, communication fabrics, and accelerators.
Programmers can now perform direct memory access from storage straight into GPU memory, avoiding the CPU I/O bottleneck and the overhead of OS kernel context switches; however, this requires new programming APIs and an understanding of the underlying hardware architecture. Anyone with a passion for software performance and low-level computer architecture is welcome to explore the next evolution in high-performance data engineering and AI.
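As a concrete illustration, the storage-to-GPU direct path described above is exposed by NVIDIA's GPUDirect Storage through the cuFile API. The sketch below shows the typical call sequence; it is illustrative only, since it needs an NVIDIA GPU, a CUDA toolkit, and libcufile to compile and run, and the file name `data.bin` and buffer size are placeholder assumptions.

```c
/* Sketch: read a file directly into GPU memory with GPUDirect Storage.
 * Requires CUDA + libcufile; "data.bin" and the 1 MiB size are examples. */
#include <cuda_runtime.h>
#include <cufile.h>
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

int main(void) {
    const size_t size = 1 << 20;              /* 1 MiB transfer */

    cuFileDriverOpen();                       /* initialize the cuFile driver */

    int fd = open("data.bin", O_RDONLY | O_DIRECT);
    if (fd < 0) { perror("open"); return 1; }

    /* Register the file descriptor with cuFile */
    CUfileDescr_t descr = {0};
    descr.handle.fd = fd;
    descr.type = CU_FILE_HANDLE_TYPE_OPAQUE_FD;
    CUfileHandle_t handle;
    cuFileHandleRegister(&handle, &descr);

    /* Allocate and register a destination buffer in GPU memory */
    void *devPtr = NULL;
    cudaMalloc(&devPtr, size);
    cuFileBufRegister(devPtr, size, 0);

    /* DMA from NVMe to GPU memory: no CPU bounce buffer in the data path */
    ssize_t n = cuFileRead(handle, devPtr, size, 0 /* file offset */,
                           0 /* device buffer offset */);
    printf("read %zd bytes into GPU memory\n", n);

    cuFileBufDeregister(devPtr);
    cudaFree(devPtr);
    cuFileHandleDeregister(handle);
    close(fd);
    cuFileDriverClose();
    return 0;
}
```

Compared with a conventional read, this skips the page-cache copy into host RAM and the subsequent cudaMemcpy to the device, which is exactly the bottleneck the paragraph above refers to.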
Deep learning programming
To learn the architecture of high-performance data engineering and AI