Discussion about this post

User's avatar
Theophilus Pedapolu's avatar

The obvious solution to writing a matrix in row-major order for this problem is to sort each row individually and then use a heap of size m containing the current smallest elements in each row to refill the matrix in row-major order. This is a O(mn * log(mn)) algorithm. However, we can make use of the fact that each row is k-sorted to make this algorithm slightly faster in the sorting phase.

Consider an individual row R. By the property of k-sorted rows, it follows that the ith smallest element in R must lie in the range of indices [max(0,i-k),min(len(R),i+k] < [0,min(len(R),i+k)]. Thus, the smallest element lies in [0,k], the 2nd smallest in [0,k+1], etc. This motivates us to use a heap of size k to sort the array. Namely, for each row R, we do the following:

1. Store the first k+1 elements of R in a heap. Keep a counter idx, for the next index not in the heap

2. For i = 0 to len(R)-1: pop the smallest element from the heap and place it at index I. If idx < len(R), add R[idx] to the heap and increment idx

We can see that this algorithm makes use of the k-sorted row property to ensure that when we are at index i, the heap contains all the elements at most k positions from i. This sorts each row in O(n* log k) time as opposed to O(n * log n) time. From here, we just proceed as normal, using a heap of size m to track the smallest elements in each row and refill the matrix:

1. Store a tuple of the first element of each row and its row index in a heap. Create a list L of size m that stores the next index that is not in the heap for each row (entries are initialized to 1)

2. Going left to right in row-major order: pop the smallest element from the heap and place it at the current position. Also use the popped element's row index to add the next element in its row to the heap (if possible) and increment the next index of the row in L.

Overall, this algorithm takes O(mn * log(km)) time and O(k+m) space.

Note: I would be interested in seeing what input sizes are needed for this algorithm to be faster than the naive algorithm (stated at the beginning) in practice. We are essentially doing a modified heapsort for each row, meaning it might still suffer from heapsort's poor cache locality. For certain inputs, quicksort on each row might still be faster. My guess is that k has to be small enough for the cache line to contain sufficient memory and yield good performance.

No posts

Ready for more?