Self-Improving Sorter
Project Description

It is known how to sort n numbers in O(nlog n) time, which is optimal in the worst case. However, if the input data exhibits a distribution of low entropy, it becomes questionable whether O(nlog n) time is the best possible. A self-improving sorter has two phases, a training phase and an operation phase. In the training phase, the algorithm builds some auxiliary data structures from the training sample. In the operation phase, the algorithm sorts each input with the help of the auxiliary data structures in O(n + H) time, where H is the entropy of the input data. Note that the entropy of the data is an information-theoretic lower bound for sorting.

Supervisor
CHENG Siu Wing
Quota
1
Course type
UROP1100
Applicant's Roles

This project is about implementing one or two self-improving sorters and check their performance in experiments.

Applicant's Learning Objectives

The student will be exposed to some newly developed algorithms. The student will also have a chance to implement some non-trivial data structure, including the optimal binary search tree and possibly using some 2D line arrangement library package.

Complexity of the project
Moderate