Thursday, August 20, 2015

Sorting Algorithms - Merge Sort

Now merge sort is based on the divide and conquer concept. Idea is to divide the problem into smaller sub problems and solve those sub problems. Once you are able to solve those sub problems, join them together to get the solution for the bigger problem. Now in case of merge sort we divide the problem into exactly two halves. Take that halves and again divide them into two halves again. Like this we will go until exactly one element is left in each of the sub problems. Now we start merging them by comparing each elements. The one having smaller element will be placed first and bigger one will be placed after that. Below diagram will help you to understand it.

First image is of division work. It shows how the problem is divided into smaller sub problem.


Once we are successful in dividing the problem into smaller sub problem now its time to conquer it that is to merge them. Below image shows how it will be merged.

For division purpose in coding we are using recursive function. Recursive functions are those which will call itself again and again with modified parameters until a predefined goal is met. Below is the code snippet for this

int MergeSortAlgo(int* pIntArray, int nStart, int nSize)
{
int nReturnValue = 0;
int nMid = -1;

nMid = nSize / 2;
if (nMid > 0)
{
//This is divide part of the algorithm. That is we are dividing the array into smaller sub arrays.
MergeSortAlgo(pIntArray, nStart, nMid);
MergeSortAlgo(pIntArray, nStart + nMid, nSize - nMid);
//This is conquering part of the algorithm. Here once after division of the array we are trying to merge them into one.
MergeArray(pIntArray, nStart, nSize);
}

return nReturnValue;
}

As you can see MergeSortAlgo is called repeatedly by itself with different parameters. It is dividing the problem into two which is done in step nSize/2.

Once all division is done it is going further to Merge those divided sub problems. Below is the code which is doing exactly that

int MergeArray(int *pIntArray, int nStart, int nSize)
{
int nReturnValue = 0;
int nMid = nSize / 2;
int nFirstStart = nStart;
int nSecondStart = nStart + nMid;
int nFirstEnd = nStart + nMid - 1;
int nSecondEnd = nStart + nSize - 1;

int i = nFirstStart, j = nSecondStart, k = 0;

int* pTempArray = 0;
pTempArray = (int*)malloc(nSize*sizeof(int));
if (pTempArray == 0)
{
//Error Dynamic memory not allocated.
return ERR_DYNAMICMEMNOTALLOC;
}
else
{
while (k < nSize && (i <= nFirstEnd && j <= nSecondEnd))
{
if (pIntArray[i] < pIntArray[j])
{
pTempArray[k++] = pIntArray[i++];
}
else
{
pTempArray[k++] = pIntArray[j++];
}
}

while (i <= nFirstEnd)
pTempArray[k++] = pIntArray[i++];

while (j <= nSecondEnd)
pTempArray[k++] = pIntArray[j++];

i = 0, j = 0, k = 0;

for (k = 0; k < nSize; k++)
pIntArray[nStart++] = pTempArray[k];

free(pTempArray);

}

return nReturnValue;
}

Nothing special here. First I have allocated a temporory space whose size is sum of arrays of sub problems. Then I comparing one element of Sub problem1 and one element of Sub problem2. Whichever is smaller will be placed in the temparory array. This is repeated for every elements.

Okay the order of growth of merge sort is Theta(nlogn) log base is 2. How do we get this well wait for my post on Masters Theorem.

Code is present in location https://github.com/harsha-kadekar/LibraryOfAlgorithms.git
I am learning from "Introduction to Algorithms" book 3rd edition. Authors - Thomas H Cormen, Charles E. Leiserson, Ronald L. Rivest, Clifford  Stein.

Sorting Algorithms - Insertion Sort

Okay I know I am again writing one more blog for insertion sort. The first one I had written when I was trying to develop a assembly language library containing all the sorting algorithm functions. Now I am creating a library in C language which will have many sorting algorithm functions and much more other functions. So the first one I tried was Insertion Sort. My primary book reference is "Introduction to Algorithms - 3rd Edition: Authors - Thomas H Cormen, Charles E. Leiserson, Ronald L. Rivest, Clifford Stein"

The way I wrote this function is first I read the concept of insertion sort from the above mentioned book. Then I tried to create a C function based on the read concept. Once done I read the actual pseudo code given in the book and if it is much better I rewrote it in my library. So my version of insertion sort was different than in the book and 99% book version will be better and it happened in my case as well so I changed my logic to match with the book.

Imagine that there is a room, and inside which there are exactly 100 students and each student has class ID. This class ID is any number from 1 to 100. Now you are asked to arrange those students in a line in front of class building in the ascending order of their class IDs. So how do you do it? You will ask students to come out one at a time. First student will stand at the front of line. Now second student comes out. You will check his class ID and compare with the already stood students class ID. If his class ID is smaller than you will ask the earlier student to go back one position and new student will stand infront now. If the new students class ID is greater than the already standing student then you will ask the new student to stand at the next position after the already standing student. This process will repeat until all the students have stood in the line. This is nothing but Insertion sort. You will take a number and try to find the position of this number in the already sorted list of number. Once you found the position you insert that number to that position in the sorted list of number.

My initial code was like this:

//Take each number and place it in correct position
for (int i = 1; i <= nSize; i++)
{
int nKey = arNumbers[i];
int nPos = i;

//Find the position of the new number to be placed
for (int j = 0; j < i; j++)
{
if (arNumbers[j] > nKey)
{
nPos = j;
break;
}
}

if (nPos != i)
{
//Move all the numbers to the right to make the position for new number
for (int k = i - 1; k > nPos; k--)
{
arNumbers[k + 1] = arNumbers[k];
}

arNumbers[nPos] = nKey;

}
}

The above program works fine. But just that I have one extra for loop. I have one FOR loop for finding position to insert key element and one FOR loop to insert the key in the found position. But rather these two loops can be merged into to single loop and we can do both tasks together. Now below code is based on the book and in that loop is merged.

for (int i = 1; i < nSize; i++)
{
int nKey = arNumbers[i];
int j = i - 1;
while (j >= 0)
{
if (arNumbers[j] > nKey)
{
arNumbers[j + 1] = arNumbers[j];
}
else
{
break;
}
j--;
}

arNumbers[j + 1] = nKey;

}

The worst case of the algorithm arises when the array is sorted in the opposite direction. The order of growth for Insertion sort algorithm is  THETA(n^2). I will not go in detail of the proof for this. You can see that there is a loop inside a loop. First loop has to iterate n times. Inside loop also has to iterate n times. Thats why its n*n = n^2..