JavaScript Mergesort: Top-Down vs Bottom-Up

In my continuing series on JavaScript algorithms, I thought it might be fun to examine two methods for implementing “mergesort”.

Mergesort is essentially a “divide and conquer” technique, where the algorithm breaks an array into smaller pieces. Each of the small pieces is sorted and then recursively merged back together.

Mergesort is an attractive option for sorting large arrays because it is fast… and we’ll look at its efficiency in a moment. However, it also has a disadvantage: the algorithm requires more memory than Selection Sort and Insertion Sort, so for systems in which memory usage must be kept low mergesort may not be a good option.

Memory Use

Mergesort works by copying all values of an array into a temporary array. The original array is then recursively broken into smaller chunks to sort… but the temporary array is used throughout the process as a placeholder.

Thus, sorting an array of size N requires the system have available memory of 2N (the original array, plus a placeholder of the same size). If you have a considerable amount of data to be sorted, memory usage quickly balloons.

By contrast, Selection Sort and Insertion Sort use very small amounts of memory – each of these algorithms requires only a temporary placeholder for the array index currently being moved.

Measuring Sort Efficiency

In a previous post on algorithms, I described the concept of Big-O notation as follows:

When we study the structure of algorithms, we represent the growth rate using Big-O Notation. In short, Big-O Notation is a mathematical forumla that demonstrates how a function performs as its inputs (or memory requirements) change.

In practice, Big-O Notation represents the mathematical limit (i.e. upper-bound, or ceiling) for performance given some input. It is the worst-case scenario for the algorithm.

Mergesort guarantees an array will be sorted in time proportional to N * log(N) – which is far better than the worst-case scenarios for Selection and Insertion sorts.

Array Size Selection:
O( (n^2 + n)/2 )
Insertion:
O( (n^2 + n)/2 )
Mergesort:
O( n * log(n) )
10 55 55 23
100 5050 5050 460
500 125,250 125,250 3107

*These results are not actual times… just the maximum number of iterations in the algorithm

In reality, the efficiency of mergesort is closer to O( 6N * log(N) ) (due to the number of array accesses for copy/move/compare actions) – so on smaller datasets, Selection and Insertion sort would actually run faster! But as the datasets grow in size, Mergesort quickly becomes a faster solution.

In theory, you could write a mergesort algorithm that utilized one of the other sorting methods only on the smallest chunks… thereby using the most efficient parts of each algorithm. I won’t get into that in this post, but it would be a cool exercise.

Implementations of Mergesort

There are two methods for implementing a mergesort algorithm: a top-down approach or a bottom-up approach.

Top-Down Method

The top-down approach to mergesort is perhaps the most obvious. Given an array of size N, the algorithm recursively breaks the array in half and then merges the results together.

``````
function topDownMergeSort(inputArray) {
var sort = function (array, low, high) {
if (high <= low) { return; }

var mid = Math.floor(low + (high - low) / 2);

//recursively sort the array by breaking it in half
sort(array, low, mid);
sort(array, mid + 1, high);

//merge() is defined elsewhere
ArraySort.merge(array, temp, low, mid, high);
};

var temp = []; //allocate space just once

sort(inputArray, 0, inputArray.length - 1);

return inputArray;
}
``````

The concept is pretty straight-forward - and is easily represented as a tree.

Using the tree as a visual aid, the root represents the entire array. It is split recursively into 2 halves; when the halves can no longer be split, the pieces are individually sorted. Then the sub-parts are merged back together, moving upwards through the tree until the algorithm is back at the root (at which point the array is completely sorted).

Bottom-Up Method

The bottom-up approach is perhaps a little less clear at first.

Rather than breaking the overall array into distinct pieces, bottum-up mergesort loops over the array using intervals of varying sizes. Each interval is sorted and merged together; subsequent loops over the array have larger intervals, effectively merging our previously sorted (smaller) intervals together.

``````
function bottomUpMergeSort(inputArray) {
var length = inputArray.length,
size = 1,
temp = []; //allocate space just once

for (size; size < length; size = size*2) {
var low = 0;

for(low; low < length-size; low += size*2) {
var mid = low + size - 1,
high = Math.min(low + (size*2 - 1), length -1);

ArraySort.merge(inputArray, temp, low, mid, high);
}
}

return inputArray;
}
``````

I personally feel that visualizing bottom-up mergesort is difficult without the help of an example like this:

Results

I graphed the expected/actual results for these two algorithms using ExtJS. The graphs display the number of items in the array (X axis) and the number of iterations needed to sort the array (Y axis).

Remember: both top-down and bottom-up mergesorts have the same estimated (worst-case) runtime. However, my analysis seems to indicate that bottom-up mergesort runs faster in practice! I am admittedly a bit unsure as to why that is the case... so perhaps some discussion on the topic is in order.

UPDATE: I asked the StackOverflow community to look at my analysis, and they seem to (generally) agree with me. It was correctly pointed out that I'm measuring "iterations", and not "speed" or "array accesses" - so my results are in fact valid. While my analysis isn't exactly "complete" in that sense, some people agreed that the bottom-up method should be somewhat faster as it doesn't have the recursive method calls (which lead to computing overhead).