It is known how to sort n numbers in O(nlog n) time, which is optimal in the worst case. However, if the input data exhibits a distribution of low entropy, it becomes questionable whether O(nlog n) time is the best possible. A self-improving sorter has two phases, a training phase and an operation phase. In the training phase, the algorithm builds some auxiliary data structures from the training sample. In the operation phase, the algorithm sorts each input with the help of the auxiliary data structures in O(n + H) time, where H is the entropy of the input data. Note that the entropy of the data is an information-theoretic lower bound for sorting.
This project is about implementing one or two self-improving sorters and check their performance in experiments.
The student will be exposed to some newly developed algorithms. The student will also have a chance to implement some non-trivial data structure, including the optimal binary search tree and possibly using some 2D line arrangement library package.