Sunday, February 7, 2016

Heap Sort

This sorting algorithm has 3 main parts.

  1. Build Heap
  2. Swap 1st element and last element, Decrease Size by 1
  3. Heapify
you might be wondering now what the hell is heap?!!! Heap is a binary tree representation in which parent element will always have either highest or lowest element among its children - left child and right child. This binary tree will be represented in the array. If parent is holding highest element it is Max heap, else if parent is holding smallest element then it is Min heap. 

Below is the image of the Heap representation in tree form and also in array. 



As you can see in the tree, its a Max heap that is all the parents are having highest value when compared to (root, Left and Right). So to convert it into array - First element will be the root, followed by its immediate left child and then its immediate right child, followed by left child's children and then right child's children.
So if you have a parent index, say i, then its left child is at 2*i + 1, similarly its right child is present at 2*i+2. Similarly given an index j, the parent is present at location i/2.

First step, given an array we need to transform it into heap structure. Below is the code for doing it.

int BuildMaxHeap(int* pIntArray, int nSize)
{
int nReturnValue = 0;

if (pIntArray == 0)
{
return ERR_NULLINPUTARRAY;
}

if (nSize <= 0)
{
return ERR_INVALIDARRAYSIZE;
}

for (int i = nSize / 2; i >= 0; i--)
{
MaxHeapify(pIntArray, nSize, i);
}


return nReturnValue;
}

We start from the last parents that is parents just before leaves and check if it is satisfying Max Heap property. If not then we max heapify it. So once it is heapified we will move to next parent and repeat the procedure until root. 

int MaxHeapify(int* pIntArray, int nSize, int nIndex)
{
int nReturnValue = 0;
int nLeftIndex = LeftChildHeap(nIndex);
int nRightIndex = RightChildHeap(nIndex);
int nLargestIndex = nIndex;
int nTemp = 0;

if (nLeftIndex < nSize && pIntArray[nLeftIndex] > pIntArray[nLargestIndex])
{
nLargestIndex = nLeftIndex;
}

if (nRightIndex < nSize && pIntArray[nRightIndex] > pIntArray[nLargestIndex])
{
nLargestIndex = nRightIndex;
}

if (nLargestIndex != nIndex)
{
nTemp = pIntArray[nLargestIndex];
pIntArray[nLargestIndex] = pIntArray[nIndex];
pIntArray[nIndex] = nTemp;

MaxHeapify(pIntArray, nSize, nLargestIndex);
}

return nReturnValue;
}

So now what is this max heapify? - This will check among Parent, Left Child and Right Child which is largest element. Largest element will be swapped with Parent. Suppose left child had largest, then Parent and left elements are swapped. Once swapped we do max heapify to the exchanged element. That is, in our case if left element was exchanged with Parent, then now max heapify the new left childs children by treating it as root.

So once you are done with this you are now having the heapified array. So by max heapify theory root (0th element in array) should have the highest element among all the elements in the array/heap. To get the sorted array, Take the 0th element and swap with last element of the array. Then decrease the size of the array by one. Now heapify the array from the root. When you do this next highest element will come to the root of the heap. Repeat the process until size becomes 1. 

int HeapSortIntegers(int* pIntArray, int nSize)
{
int nReturnValue = 0;
int nTemp = 0;

if (pIntArray == 0)
{
return ERR_NULLINPUTARRAY;
}

if (nSize <= 0)
{
return ERR_INVALIDARRAYSIZE;
}

BuildMaxHeap(pIntArray, nSize);

for (int i = nSize - 1, j = 1; i >= 1; i++, j++)
{
nTemp = pIntArray[0];
pIntArray[0] = pIntArray[i];
pIntArray[i] = nTemp;

MaxHeapify(pIntArray, nSize - j, 0);

}

return nReturnValue;
}

This looks so completed right? Does it have any benefit? Well, yes its runtime is O(nlogn). Its average case = worst case runtime and it sorts inplace. 

Wow then this should be better than quicksort as quicksort takes O(n^2) in its worst case scenario? 
Well yes and no. Yes because heapsort is always O(nlogn). No because given an array, for which quick sort runs in O(nlogn), if we execute for heap as well as quick sorts, then quick will be running better as heap sort has higher constants in its runtime 

Implementation is based on Chapter 6 - HeapSort of Introduction to Algorithms 3rd Edition [Thomas H Cormen, Charles E Leiserson, Ronald L Rivest, Clifford Stein]