As I was developing the basic web server, I found the need for a key value or name value dictionary to store the name value pairs. Earlier as part of my MS course curriculum I had written a chained and open addressing hash tables in python. I followed the similar pattern. Along with that "The C programming Langauage" by Brian W. Kernighan and Dennis M. Ritchie book in section 6.6 also explains about creating the name value list.
I have created a chained hash table. One of the main feature of hash table or dictionary is searching or retrieving of the stored value is fast. It should be O(1). If I had used Open addressing table than I could have achieved this. But with chained hash table we are heavily dependent on the hash function. So there may be chances that search will take more than O(1).
Okay let me first explain the dictionary function. Suppose you have key value pairs like 'employee name' and 'employee id' or 'programming languages' and 'C, C++, javascript, C#', etc. Now I need a data structure to store such key value pairs. I should be able to keep on adding and updating such key value pairs without worrying about where they get stored and memory limit. For an external user it is an infinite memory bucket, where he/she keeps on dumping key:value pairs. User can add new key:value pair, access the already stored key:value pair, update value of existing key:value pair, search for a key.
Now let me explain the implementation details. Initially when I am creating the dictionary a default sized array will be created where all name value pairs will be stored. As the array gets filled up, we try to increase the array size and rebuild the dictionary. This, we will try to do by monitoring the load factor. Example if the load factor = (Number of current nodes/Size of Dictionary Array) is greater than 0.8 then we increase the Array size and rebuild the dictionary array. You may wonder why dictionary has to be rebuilt. See our hash function which gives the index of the array, where the name value needs to be stored will use the size of the array to generate the index. So when array size changed then for the same old key, hash value will give you some other index. So literary you lost all the old nodes of dictionary. To avoid this error, once new array is created, all the old values has to be re added/hashed to this new array.
Given a key, we will pass that key to a hash function. This hash function will give you an integer which will be within 0 and array size of dictionary. This will be index in the array where we will be saving the name value pair. There are chances of collision that is two keys can lead to generation of same index. In this scenario there are many ways to handle it. we are using chained hashing. Here each entry in the array is a singly linked list. Each node in the singly linked list, represent the name value pair. So if in an index already another name value pair is present then we will be adding to this list new name value pair.
So when given a key, First step is to get the index using the hash function. Using this index we can get the singly list of all the keys which are having same index. Now traverse the singly list to find the exact key value pair.
This is my dictionary node. A dictionary node has a key (strName) and a value (strValue). As this node will be part of the singly linked list, we have self referencing pointer 'next'
typedef struct structDictionaryNode
{
char* strName;
char* strValue;
struct structDictionaryNode* next;
}DictionaryNode;
This is my actual dictionary. LoadFactor will define when I need to rebuild by dictionary by increasing the size of buffer array. nSizeOfDictionary is the size of the buffer array that is total slots available for allocation. nCurrentSize will tell you how many key value pairs are currently present in the dictionary. So current load factor can be calculated by nCurrentSize/nSizeOfDictionary. dictionaryArray will be my buffer array, basically it is array of singly linked lists.
typedef struct
{
float LoadFactor;
int nSizeOfDictionary;
int nCurrentSize;
DictionaryNode** dictionaryArray;
}Dictionary;
This function is used to create the dictionary. As you can see we are using load factor as 0.8. Also size of the initial array is passed.
Dictionary* CreateDictionary(int nSizeOfDictory)
{
Dictionary *dict = 0;
int i = 0;
if (nSizeOfDictory == 0)
nSizeOfDictory = 10;
dict = (Dictionary*)malloc(sizeof(Dictionary) * 1);
if (dict == 0)
{
//Error handle it
return dict;
}
(*dict).LoadFactor = 0.8;
(*dict).nSizeOfDictionary = nSizeOfDictory;
(*dict).nCurrentSize = 0;
(*dict).dictionaryArray = (DictionaryNode**)malloc(sizeof(DictionaryNode*)*(*dict).nSizeOfDictionary);
for (int i = 0; i < (*dict).nSizeOfDictionary; i++)
(*dict).dictionaryArray[i] = 0;
return dict;
}
This is the hash function used to find the actual index where the node needs to be placed in the dictionary array.
unsigned hash(int nSizeOfTable, char *s)
{
unsigned hashval;
for (hashval = 0; *s != '\0'; s++)
hashval = *s + 31 * hashval;
return hashval % nSizeOfTable;
}
Every time you want to add a name value pair, we use below function. Also you can see, as we add a key we will verify if this addition has resulted in exceeding the load factor. If so then we are rebuilding the dictionary. If loadfactor is exceeded, then we are creating new dictionary with double the previous size.
int AddNameValueToDictionary(Dictionary *dict, char *strKey, char *strValue)
{
int nReturnValue = 0;
DictionaryNode *pHead = 0, *pCurrent = 0;
int i = 0;
float fTemp = 0.0;
int bFoundKey = 0;
if (dict == 0 || strKey == 0 || strValue == 0)
{
nReturnValue = ERR_INVALID_PARAMETERS_DICT;
return nReturnValue;
}
if (IsKeyExistsInDictionary(dict, strKey) == 1)
{
pHead = (*dict).dictionaryArray[hash((*dict).nSizeOfDictionary, strKey)];
if (pHead == 0)
{
//Actually Error
(*dict).dictionaryArray[hash((*dict).nSizeOfDictionary, strKey)] = CreateADictionaryNode(strKey, strValue);
(*dict).nCurrentSize++;
}
else
{
while (pHead != 0)
{
if (strcmp((*pHead).strName, strKey) == 0)
{
//Update Value
free((*pHead).strValue);
(*pHead).strValue = (char*)malloc(sizeof(char)*strnlen_s(strValue, MAXVALUESIZE) + 1);
memset((*pHead).strValue, '\0', strnlen_s(strValue, MAXVALUESIZE) + 1);
i = 0;
while (strValue[i] != '\0')
{
(*pHead).strValue[i] = strValue[i];
i++;
}
bFoundKey = 1;
break;
}
else
{
pHead = (*pHead).next;
}
}
if (bFoundKey == 0)
{
//Actually Error
pHead = (*dict).dictionaryArray[hash((*dict).nSizeOfDictionary, strKey)];
pCurrent = CreateADictionaryNode(strKey, strValue);
(*pCurrent).next = pHead;
(*dict).dictionaryArray[hash((*dict).nSizeOfDictionary, strKey)] = pCurrent;
(*dict).nCurrentSize++;
}
}
}
else
{
pHead = (*dict).dictionaryArray[hash((*dict).nSizeOfDictionary, strKey)];
if (pHead == 0)
{
(*dict).dictionaryArray[hash((*dict).nSizeOfDictionary, strKey)] = CreateADictionaryNode(strKey, strValue);
(*dict).nCurrentSize++;
}
else
{
pCurrent = CreateADictionaryNode(strKey, strValue);
(*pCurrent).next = pHead;
(*dict).dictionaryArray[hash((*dict).nSizeOfDictionary, strKey)] = pCurrent;
(*dict).nCurrentSize++;
}
}
fTemp = (*dict).nCurrentSize;
fTemp = fTemp / (*dict).nSizeOfDictionary;
if (fTemp > (*dict).LoadFactor)
{
nReturnValue = RebuildDictionary(dict, (*dict).nSizeOfDictionary * 2);
}
return nReturnValue;
}
It is common in dictionary operations to know if a key is already present in the dictionary. Following function will return 1 if key is found in dictionary, else 0.
int IsKeyExistsInDictionary(Dictionary *dict, char *strKey)
{
int bFound = 0;
int i = 0;
DictionaryNode *pHead = 0, *pCurrent = 0;
for (i = 0; i < (*dict).nSizeOfDictionary; i++)
{
pHead = (*dict).dictionaryArray[i];
if (pHead != 0)
{
pCurrent = pHead;
while (pCurrent != 0)
{
if (strcmp(strKey, (*pCurrent).strName) == 0)
{
bFound = 1;
break;
}
pCurrent = (*pCurrent).next;
}
}
}
return bFound;
}
This function rebuilds the dictionary with the new size.
int RebuildDictionary(Dictionary *dict, int nNewSize)
{
int nReturnValue = 0;
int i = 0;
int nOldSize = 0;
DictionaryNode *pHead = 0, *pCurrent = 0;
DictionaryNode** pNewArray = (DictionaryNode**)malloc(sizeof(DictionaryNode*)*nNewSize);
DictionaryNode** pOldArray = (*dict).dictionaryArray;
nOldSize = (*dict).nSizeOfDictionary;
(*dict).nSizeOfDictionary = nNewSize;
(*dict).dictionaryArray = pNewArray;
(*dict).nCurrentSize = 0;
for (i = 0; i < nOldSize; i++)
{
pHead = pOldArray[i];
while (pHead != 0)
{
pCurrent = pHead;
pHead = (*pCurrent).next;
(*pCurrent).next = 0;
nReturnValue = AddNodeToDictionary(dict, pCurrent);
if (nReturnValue != 0)
{
//Error handle it
}
}
}
free(pOldArray);
pOldArray = 0;
return nReturnValue;
}
This function will delete a particular node or name value pair from the dictionary.
int DeleteKeyFromDictionary(Dictionary *dict, char *strKey)
{
int nReturnValue = 0;
int i = 0;
DictionaryNode *pHead = 0, *pCurrent = 0, *pPrev = 0;
pHead = (*dict).dictionaryArray[hash((*dict).nSizeOfDictionary, strKey)];
if (pHead != 0)
{
pCurrent = pHead;
while (pCurrent != 0)
{
if (strcmp((*pCurrent).strName, strKey) == 0)
{
if (pPrev == 0)
{
(*dict).dictionaryArray[hash((*dict).nSizeOfDictionary, strKey)] = (*pCurrent).next;
}
else
{
(*pPrev).next = (*pCurrent).next;
}
free((*pCurrent).strName);
free((*pCurrent).strValue);
free(pCurrent);
dict->nCurrentSize--;
break;
}
pPrev = pCurrent;
pCurrent = (*pCurrent).next;
}
}
else
{
//Should I have to intimate user??
}
return nReturnValue;
}
These functions will delete entire dictionary.
int DeleteDictionaryNodeArray(DictionaryNode **dArray, int nSizeOfArray)
{
int nReturnValue = 0;
int i = 0, j = 0;
DictionaryNode *pCurrent = 0, *pHead = 0;
for (i = 0; i < nSizeOfArray; i++)
{
if (dArray[i] != 0)
{
pHead = dArray[i];
while (pHead != 0)
{
if ((*pHead).strName != 0)
{
free((*pHead).strName);
(*pHead).strName = 0;
}
if ((*pHead).strValue != 0)
{
free((*pHead).strValue);
(*pHead).strValue = 0;
}
pCurrent = (*pCurrent).next;
free(pHead);
pHead = pCurrent;
}
dArray[i] = 0;
}
}
free(dArray);
dArray = 0;
return nReturnValue;
}
int DeleteDictionary(Dictionary *dict)
{
int nReturnValue = 0;
int i = 0, j = 0;
DictionaryNode *pCurrent = 0, *pHead = 0;
if (dict != 0)
{
if ((*dict).dictionaryArray != 0)
{
nReturnValue = DeleteDictionaryNodeArray((*dict).dictionaryArray, (*dict).nSizeOfDictionary);
(*dict).dictionaryArray = 0;
}
(*dict).LoadFactor = 0;
(*dict).nCurrentSize = 0;
(*dict).nSizeOfDictionary = 0;
free(dict);
dict = 0;
}
return nReturnValue;
}
Given a key, get the corresponding value stored in dictionary. This is a very important function of dictionary.
DictionaryNode* GetNodeFromDictionary(Dictionary *dict, char *strKey)
{
DictionaryNode *pNode = 0;
pNode = dict->dictionaryArray[hash(dict->nSizeOfDictionary, strKey)];
while (pNode != 0)
{
if (strcmp(pNode->strName, strKey) == 0)
{
break;
}
pNode = pNode->next;
}
return pNode;
}
char* GetValueFromDictionary(Dictionary *dict, char *strKey)
{
char *strValue = 0;
DictionaryNode *pNode = GetNodeFromDictionary(dict, strKey);
if (pNode != 0)
{
strValue = pNode->strValue;
}
return strValue;
}
Complete code can be found at git - https://github.com/harsha-kadekar/BasicHttpServerClient
Code is present in file SupportingDataStructures.c and SupportingDataStructures.h files.
I have created a chained hash table. One of the main feature of hash table or dictionary is searching or retrieving of the stored value is fast. It should be O(1). If I had used Open addressing table than I could have achieved this. But with chained hash table we are heavily dependent on the hash function. So there may be chances that search will take more than O(1).
Okay let me first explain the dictionary function. Suppose you have key value pairs like 'employee name' and 'employee id' or 'programming languages' and 'C, C++, javascript, C#', etc. Now I need a data structure to store such key value pairs. I should be able to keep on adding and updating such key value pairs without worrying about where they get stored and memory limit. For an external user it is an infinite memory bucket, where he/she keeps on dumping key:value pairs. User can add new key:value pair, access the already stored key:value pair, update value of existing key:value pair, search for a key.
Now let me explain the implementation details. Initially when I am creating the dictionary a default sized array will be created where all name value pairs will be stored. As the array gets filled up, we try to increase the array size and rebuild the dictionary. This, we will try to do by monitoring the load factor. Example if the load factor = (Number of current nodes/Size of Dictionary Array) is greater than 0.8 then we increase the Array size and rebuild the dictionary array. You may wonder why dictionary has to be rebuilt. See our hash function which gives the index of the array, where the name value needs to be stored will use the size of the array to generate the index. So when array size changed then for the same old key, hash value will give you some other index. So literary you lost all the old nodes of dictionary. To avoid this error, once new array is created, all the old values has to be re added/hashed to this new array.
Given a key, we will pass that key to a hash function. This hash function will give you an integer which will be within 0 and array size of dictionary. This will be index in the array where we will be saving the name value pair. There are chances of collision that is two keys can lead to generation of same index. In this scenario there are many ways to handle it. we are using chained hashing. Here each entry in the array is a singly linked list. Each node in the singly linked list, represent the name value pair. So if in an index already another name value pair is present then we will be adding to this list new name value pair.
So when given a key, First step is to get the index using the hash function. Using this index we can get the singly list of all the keys which are having same index. Now traverse the singly list to find the exact key value pair.
This is my dictionary node. A dictionary node has a key (strName) and a value (strValue). As this node will be part of the singly linked list, we have self referencing pointer 'next'
typedef struct structDictionaryNode
{
char* strName;
char* strValue;
struct structDictionaryNode* next;
}DictionaryNode;
This is my actual dictionary. LoadFactor will define when I need to rebuild by dictionary by increasing the size of buffer array. nSizeOfDictionary is the size of the buffer array that is total slots available for allocation. nCurrentSize will tell you how many key value pairs are currently present in the dictionary. So current load factor can be calculated by nCurrentSize/nSizeOfDictionary. dictionaryArray will be my buffer array, basically it is array of singly linked lists.
typedef struct
{
float LoadFactor;
int nSizeOfDictionary;
int nCurrentSize;
DictionaryNode** dictionaryArray;
}Dictionary;
This function is used to create the dictionary. As you can see we are using load factor as 0.8. Also size of the initial array is passed.
Dictionary* CreateDictionary(int nSizeOfDictory)
{
Dictionary *dict = 0;
int i = 0;
if (nSizeOfDictory == 0)
nSizeOfDictory = 10;
dict = (Dictionary*)malloc(sizeof(Dictionary) * 1);
if (dict == 0)
{
//Error handle it
return dict;
}
(*dict).LoadFactor = 0.8;
(*dict).nSizeOfDictionary = nSizeOfDictory;
(*dict).nCurrentSize = 0;
(*dict).dictionaryArray = (DictionaryNode**)malloc(sizeof(DictionaryNode*)*(*dict).nSizeOfDictionary);
for (int i = 0; i < (*dict).nSizeOfDictionary; i++)
(*dict).dictionaryArray[i] = 0;
return dict;
}
This is the hash function used to find the actual index where the node needs to be placed in the dictionary array.
unsigned hash(int nSizeOfTable, char *s)
{
unsigned hashval;
for (hashval = 0; *s != '\0'; s++)
hashval = *s + 31 * hashval;
return hashval % nSizeOfTable;
}
Every time you want to add a name value pair, we use below function. Also you can see, as we add a key we will verify if this addition has resulted in exceeding the load factor. If so then we are rebuilding the dictionary. If loadfactor is exceeded, then we are creating new dictionary with double the previous size.
int AddNameValueToDictionary(Dictionary *dict, char *strKey, char *strValue)
{
int nReturnValue = 0;
DictionaryNode *pHead = 0, *pCurrent = 0;
int i = 0;
float fTemp = 0.0;
int bFoundKey = 0;
if (dict == 0 || strKey == 0 || strValue == 0)
{
nReturnValue = ERR_INVALID_PARAMETERS_DICT;
return nReturnValue;
}
if (IsKeyExistsInDictionary(dict, strKey) == 1)
{
pHead = (*dict).dictionaryArray[hash((*dict).nSizeOfDictionary, strKey)];
if (pHead == 0)
{
//Actually Error
(*dict).dictionaryArray[hash((*dict).nSizeOfDictionary, strKey)] = CreateADictionaryNode(strKey, strValue);
(*dict).nCurrentSize++;
}
else
{
while (pHead != 0)
{
if (strcmp((*pHead).strName, strKey) == 0)
{
//Update Value
free((*pHead).strValue);
(*pHead).strValue = (char*)malloc(sizeof(char)*strnlen_s(strValue, MAXVALUESIZE) + 1);
memset((*pHead).strValue, '\0', strnlen_s(strValue, MAXVALUESIZE) + 1);
i = 0;
while (strValue[i] != '\0')
{
(*pHead).strValue[i] = strValue[i];
i++;
}
bFoundKey = 1;
break;
}
else
{
pHead = (*pHead).next;
}
}
if (bFoundKey == 0)
{
//Actually Error
pHead = (*dict).dictionaryArray[hash((*dict).nSizeOfDictionary, strKey)];
pCurrent = CreateADictionaryNode(strKey, strValue);
(*pCurrent).next = pHead;
(*dict).dictionaryArray[hash((*dict).nSizeOfDictionary, strKey)] = pCurrent;
(*dict).nCurrentSize++;
}
}
}
else
{
pHead = (*dict).dictionaryArray[hash((*dict).nSizeOfDictionary, strKey)];
if (pHead == 0)
{
(*dict).dictionaryArray[hash((*dict).nSizeOfDictionary, strKey)] = CreateADictionaryNode(strKey, strValue);
(*dict).nCurrentSize++;
}
else
{
pCurrent = CreateADictionaryNode(strKey, strValue);
(*pCurrent).next = pHead;
(*dict).dictionaryArray[hash((*dict).nSizeOfDictionary, strKey)] = pCurrent;
(*dict).nCurrentSize++;
}
}
fTemp = (*dict).nCurrentSize;
fTemp = fTemp / (*dict).nSizeOfDictionary;
if (fTemp > (*dict).LoadFactor)
{
nReturnValue = RebuildDictionary(dict, (*dict).nSizeOfDictionary * 2);
}
return nReturnValue;
}
It is common in dictionary operations to know if a key is already present in the dictionary. Following function will return 1 if key is found in dictionary, else 0.
int IsKeyExistsInDictionary(Dictionary *dict, char *strKey)
{
int bFound = 0;
int i = 0;
DictionaryNode *pHead = 0, *pCurrent = 0;
for (i = 0; i < (*dict).nSizeOfDictionary; i++)
{
pHead = (*dict).dictionaryArray[i];
if (pHead != 0)
{
pCurrent = pHead;
while (pCurrent != 0)
{
if (strcmp(strKey, (*pCurrent).strName) == 0)
{
bFound = 1;
break;
}
pCurrent = (*pCurrent).next;
}
}
}
return bFound;
}
This function rebuilds the dictionary with the new size.
int RebuildDictionary(Dictionary *dict, int nNewSize)
{
int nReturnValue = 0;
int i = 0;
int nOldSize = 0;
DictionaryNode *pHead = 0, *pCurrent = 0;
DictionaryNode** pNewArray = (DictionaryNode**)malloc(sizeof(DictionaryNode*)*nNewSize);
DictionaryNode** pOldArray = (*dict).dictionaryArray;
nOldSize = (*dict).nSizeOfDictionary;
(*dict).nSizeOfDictionary = nNewSize;
(*dict).dictionaryArray = pNewArray;
(*dict).nCurrentSize = 0;
for (i = 0; i < nOldSize; i++)
{
pHead = pOldArray[i];
while (pHead != 0)
{
pCurrent = pHead;
pHead = (*pCurrent).next;
(*pCurrent).next = 0;
nReturnValue = AddNodeToDictionary(dict, pCurrent);
if (nReturnValue != 0)
{
//Error handle it
}
}
}
free(pOldArray);
pOldArray = 0;
return nReturnValue;
}
This function will delete a particular node or name value pair from the dictionary.
int DeleteKeyFromDictionary(Dictionary *dict, char *strKey)
{
int nReturnValue = 0;
int i = 0;
DictionaryNode *pHead = 0, *pCurrent = 0, *pPrev = 0;
pHead = (*dict).dictionaryArray[hash((*dict).nSizeOfDictionary, strKey)];
if (pHead != 0)
{
pCurrent = pHead;
while (pCurrent != 0)
{
if (strcmp((*pCurrent).strName, strKey) == 0)
{
if (pPrev == 0)
{
(*dict).dictionaryArray[hash((*dict).nSizeOfDictionary, strKey)] = (*pCurrent).next;
}
else
{
(*pPrev).next = (*pCurrent).next;
}
free((*pCurrent).strName);
free((*pCurrent).strValue);
free(pCurrent);
dict->nCurrentSize--;
break;
}
pPrev = pCurrent;
pCurrent = (*pCurrent).next;
}
}
else
{
//Should I have to intimate user??
}
return nReturnValue;
}
These functions will delete entire dictionary.
int DeleteDictionaryNodeArray(DictionaryNode **dArray, int nSizeOfArray)
{
int nReturnValue = 0;
int i = 0, j = 0;
DictionaryNode *pCurrent = 0, *pHead = 0;
for (i = 0; i < nSizeOfArray; i++)
{
if (dArray[i] != 0)
{
pHead = dArray[i];
while (pHead != 0)
{
if ((*pHead).strName != 0)
{
free((*pHead).strName);
(*pHead).strName = 0;
}
if ((*pHead).strValue != 0)
{
free((*pHead).strValue);
(*pHead).strValue = 0;
}
pCurrent = (*pCurrent).next;
free(pHead);
pHead = pCurrent;
}
dArray[i] = 0;
}
}
free(dArray);
dArray = 0;
return nReturnValue;
}
int DeleteDictionary(Dictionary *dict)
{
int nReturnValue = 0;
int i = 0, j = 0;
DictionaryNode *pCurrent = 0, *pHead = 0;
if (dict != 0)
{
if ((*dict).dictionaryArray != 0)
{
nReturnValue = DeleteDictionaryNodeArray((*dict).dictionaryArray, (*dict).nSizeOfDictionary);
(*dict).dictionaryArray = 0;
}
(*dict).LoadFactor = 0;
(*dict).nCurrentSize = 0;
(*dict).nSizeOfDictionary = 0;
free(dict);
dict = 0;
}
return nReturnValue;
}
Given a key, get the corresponding value stored in dictionary. This is a very important function of dictionary.
DictionaryNode* GetNodeFromDictionary(Dictionary *dict, char *strKey)
{
DictionaryNode *pNode = 0;
pNode = dict->dictionaryArray[hash(dict->nSizeOfDictionary, strKey)];
while (pNode != 0)
{
if (strcmp(pNode->strName, strKey) == 0)
{
break;
}
pNode = pNode->next;
}
return pNode;
}
char* GetValueFromDictionary(Dictionary *dict, char *strKey)
{
char *strValue = 0;
DictionaryNode *pNode = GetNodeFromDictionary(dict, strKey);
if (pNode != 0)
{
strValue = pNode->strValue;
}
return strValue;
}
Complete code can be found at git - https://github.com/harsha-kadekar/BasicHttpServerClient
Code is present in file SupportingDataStructures.c and SupportingDataStructures.h files.