## Posted By

davince on 06/23/11

# Hashing function for C

Published in: C

The General Hash Functions Library provides the following mix of additive and rotative general-purpose string hashing algorithms. The listing below takes std::string arguments, so it is C++ rather than plain C.

• RS Hash Function: A simple hash function from Robert Sedgewick's book Algorithms in C. I've added some simple optimizations to the algorithm to speed up its hashing process.
• JS Hash Function: A bitwise hash function written by Justin Sobel.
• PJW Hash Function: This hash algorithm is based on work by Peter J. Weinberger of AT&T Bell Labs. The book Compilers: Principles, Techniques, and Tools by Aho, Sethi and Ullman recommends hash functions that employ the methodology found in this algorithm.
• ELF Hash Function: Similar to the PJW hash function, but tweaked for 32-bit processors. It is widely used on UNIX systems.
• BKDR Hash Function: This hash function comes from Brian Kernighan and Dennis Ritchie's book "The C Programming Language". It is a simple hash function whose possible seeds all follow the pattern 31, 131, 1313, 13131, and so on. It appears to be very similar to the DJB hash function.
• SDBM Hash Function: This is the algorithm used in the open-source SDBM project. The hash function seems to have a good overall distribution for many different data sets, and it seems to work well when there is high variance in the MSBs of the elements in a data set.
• DJB Hash Function: An algorithm produced by Professor Daniel J. Bernstein and first shown on the Usenet newsgroup comp.lang.c. It costs only a shift and two additions per character and is commonly regarded as one of the most efficient hash functions published.
• DEK Hash Function: An algorithm proposed by Donald E. Knuth in The Art of Computer Programming, Volume 3 (Sorting and Searching), section 6.4.
• AP Hash Function: An algorithm produced by me, Arash Partow. I took ideas from all of the above hash functions to make a hybrid rotative and additive hash function algorithm based around four primes: 3, 5, 7 and 11. There isn't any real mathematical analysis explaining why one should use this hash function instead of the others described above, other than the fact that I tried to make the design resemble a simple linear-feedback shift register (LFSR) as closely as possible. An empirical test of its distribution used a hash table with 100003 buckets to hash the Project Gutenberg Etext of Webster's Unabridged Dictionary: the longest chain encountered had length 7, the average chain length was 2, and 4579 buckets were empty.
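The kind of empirical test described for the AP hash can be sketched roughly as follows. This is only an illustration under stated assumptions: ChainStats and measure_chains are hypothetical names that do not belong to the library, keys are assumed to map to buckets with hash % bucket_count, every element of keys counts as a separate entry, and the average is taken over non-empty buckets. The source does not say how its figures were computed. DJBHash is copied from the library so the sketch compiles on its own.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Copy of the DJB function from the library so the sketch compiles on its own.
unsigned int DJBHash(const std::string& str)
{
    unsigned int hash = 5381;
    for (std::size_t i = 0; i < str.length(); i++)
    {
        hash = ((hash << 5) + hash) + str[i];
    }
    return (hash & 0x7FFFFFFF);
}

// Hypothetical summary type; not part of the library.
struct ChainStats
{
    std::size_t longest_chain;  // most keys found in any one bucket
    std::size_t empty_buckets;  // buckets that received no key
    double      average_chain;  // keys per non-empty bucket
};

// Hypothetical helper. Assumption: a key goes to bucket hash % bucket_count.
ChainStats measure_chains(const std::vector<std::string>& keys,
                          unsigned int (*hash_fn)(const std::string&),
                          std::size_t bucket_count)
{
    assert(bucket_count > 0);

    // Count how many keys land in each bucket.
    std::vector<std::size_t> counts(bucket_count, 0);
    for (std::size_t i = 0; i < keys.size(); i++)
    {
        counts[hash_fn(keys[i]) % bucket_count]++;
    }

    ChainStats stats = {0, 0, 0.0};
    std::size_t used = 0;
    for (std::size_t b = 0; b < bucket_count; b++)
    {
        if (counts[b] == 0) { stats.empty_buckets++; }
        else                { used++; }
        stats.longest_chain = std::max(stats.longest_chain, counts[b]);
    }
    if (used > 0)
    {
        stats.average_chain = static_cast<double>(keys.size()) / used;
    }
    return stats;
}
```

Passing a word list, one of the hash functions, and a prime bucket count such as 100003 to measure_chains yields the three figures quoted above for that particular function and data set.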

GeneralHashFunctions.h

    #ifndef INCLUDE_GENERALHASHFUNCTION_CPP_H
    #define INCLUDE_GENERALHASHFUNCTION_CPP_H

    #include <string>

    // Pointer type matching the signature shared by every hash function below.
    typedef unsigned int (*HashFunction)(const std::string&);

    unsigned int RSHash  (const std::string& str);
    unsigned int JSHash  (const std::string& str);
    unsigned int PJWHash (const std::string& str);
    unsigned int ELFHash (const std::string& str);
    unsigned int BKDRHash(const std::string& str);
    unsigned int SDBMHash(const std::string& str);
    unsigned int DJBHash (const std::string& str);
    unsigned int DEKHash (const std::string& str);
    unsigned int APHash  (const std::string& str);

    #endif
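Because every function shares one signature, the HashFunction typedef lets calling code pick an algorithm at run time. The following is a minimal sketch of that idea, not something the library ships: NamedHash, kHashes, find_hash and hash_with are hypothetical names introduced for illustration, and the two hash functions are copied from the library so the block compiles on its own.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

typedef unsigned int (*HashFunction)(const std::string&);

// Copies of two library functions so the sketch is self-contained.
unsigned int BKDRHash(const std::string& str)
{
    unsigned int seed = 131;
    unsigned int hash = 0;
    for (std::size_t i = 0; i < str.length(); i++)
    {
        hash = (hash * seed) + str[i];
    }
    return (hash & 0x7FFFFFFF);
}

unsigned int DJBHash(const std::string& str)
{
    unsigned int hash = 5381;
    for (std::size_t i = 0; i < str.length(); i++)
    {
        hash = ((hash << 5) + hash) + str[i];
    }
    return (hash & 0x7FFFFFFF);
}

// Hypothetical registry entry pairing a display name with a function pointer.
struct NamedHash
{
    const char*  name;
    HashFunction fn;
};

static const NamedHash kHashes[] = {
    { "BKDR", BKDRHash },
    { "DJB",  DJBHash  }
};

// Hypothetical lookup: returns a null pointer when the name is not registered.
HashFunction find_hash(const std::string& name)
{
    for (std::size_t i = 0; i < sizeof(kHashes) / sizeof(kHashes[0]); i++)
    {
        if (name == kHashes[i].name) { return kHashes[i].fn; }
    }
    return 0;
}

// Hypothetical convenience wrapper; asserts that the name is registered.
unsigned int hash_with(const std::string& name, const std::string& text)
{
    HashFunction fn = find_hash(name);
    assert(fn != 0 && "unknown hash function name");
    return fn(text);
}
```

A caller could then write hash_with("DJB", key) and switch algorithms by changing only the name string.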

generalhashfunction.cc

    #include "GeneralHashFunctions.h"

    // RS hash: multiplicative hash whose multiplier a is itself scaled by b
    // after every character.
    unsigned int RSHash(const std::string& str)
    {
        unsigned int b    = 378551;
        unsigned int a    = 63689;
        unsigned int hash = 0;

        for(std::size_t i = 0; i < str.length(); i++)
        {
            hash = hash * a + str[i];
            a    = a * b;
        }

        // Keep only the low 31 bits of the result.
        return (hash & 0x7FFFFFFF);
    }
    /* End Of RS Hash Function */

    // JS hash: XORs the hash with a sum of two shifted copies of itself
    // and the next character.
    unsigned int JSHash(const std::string& str)
    {
        unsigned int hash = 1315423911;

        for(std::size_t i = 0; i < str.length(); i++)
        {
            hash ^= ((hash << 5) + str[i] + (hash >> 2));
        }

        return (hash & 0x7FFFFFFF);
    }
    /* End Of JS Hash Function */

    // PJW hash: shift in each character, then fold any bits that reach the
    // top OneEighth bits (4 bits for a 32-bit unsigned int) back into the hash.
    unsigned int PJWHash(const std::string& str)
    {
        unsigned int BitsInUnsignedInt = (unsigned int)(sizeof(unsigned int) * 8);
        unsigned int ThreeQuarters     = (unsigned int)((BitsInUnsignedInt * 3) / 4);
        unsigned int OneEighth         = (unsigned int)(BitsInUnsignedInt / 8);
        unsigned int HighBits          = (unsigned int)(0xFFFFFFFF) << (BitsInUnsignedInt - OneEighth);
        unsigned int hash              = 0;
        unsigned int test              = 0;

        for(std::size_t i = 0; i < str.length(); i++)
        {
            hash = (hash << OneEighth) + str[i];

            if((test = hash & HighBits) != 0)
            {
                hash = (( hash ^ (test >> ThreeQuarters)) & (~HighBits));
            }
        }

        return (hash & 0x7FFFFFFF);
    }
    /* End Of P. J. Weinberger Hash Function */

    // ELF hash: like PJW, with the shift widths fixed for a 32-bit hash value.
    unsigned int ELFHash(const std::string& str)
    {
        unsigned int hash = 0;
        unsigned int x    = 0;

        for(std::size_t i = 0; i < str.length(); i++)
        {
            hash = (hash << 4) + str[i];
            if((x = hash & 0xF0000000L) != 0)
            {
                hash ^= (x >> 24);
                hash &= ~x;
            }
        }

        return (hash & 0x7FFFFFFF);
    }
    /* End Of ELF Hash Function */

    // BKDR hash: multiply by a fixed seed and add the next character.
    unsigned int BKDRHash(const std::string& str)
    {
        unsigned int seed = 131; // 31 131 1313 13131 131313 etc..
        unsigned int hash = 0;

        for(std::size_t i = 0; i < str.length(); i++)
        {
            hash = (hash * seed) + str[i];
        }

        return (hash & 0x7FFFFFFF);
    }
    /* End Of BKDR Hash Function */

    // SDBM hash: (hash << 6) + (hash << 16) - hash equals hash * 65599.
    unsigned int SDBMHash(const std::string& str)
    {
        unsigned int hash = 0;

        for(std::size_t i = 0; i < str.length(); i++)
        {
            hash = str[i] + (hash << 6) + (hash << 16) - hash;
        }

        return (hash & 0x7FFFFFFF);
    }
    /* End Of SDBM Hash Function */

    // DJB hash: (hash << 5) + hash equals hash * 33.
    unsigned int DJBHash(const std::string& str)
    {
        unsigned int hash = 5381;

        for(std::size_t i = 0; i < str.length(); i++)
        {
            hash = ((hash << 5) + hash) + str[i];
        }

        return (hash & 0x7FFFFFFF);
    }
    /* End Of DJB Hash Function */

    // DEK hash: starts from the string length, then rotates the hash left by
    // 5 bits (for a 32-bit unsigned int) and XORs in the next character.
    unsigned int DEKHash(const std::string& str)
    {
        unsigned int hash = static_cast<unsigned int>(str.length());

        for(std::size_t i = 0; i < str.length(); i++)
        {
            hash = ((hash << 5) ^ (hash >> 27)) ^ str[i];
        }

        return (hash & 0x7FFFFFFF);
    }
    /* End Of DEK Hash Function */

    // AP hash: alternates between two mixing steps on even and odd
    // character positions.
    unsigned int APHash(const std::string& str)
    {
        unsigned int hash = 0;

        for(std::size_t i = 0; i < str.length(); i++)
        {
            hash ^= ((i & 1) == 0) ? (  (hash <<  7) ^ str[i] ^ (hash >> 3)) :
                                     (~((hash << 11) ^ str[i] ^ (hash >> 5)));
        }

        return (hash & 0x7FFFFFFF);
    }
    /* End Of AP Hash Function */
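
One caveat when using these functions: each one mixes str[i], a plain char, straight into an unsigned accumulator. Whether char is signed is implementation-defined in C++, and so is the width of unsigned int, so strings containing bytes of 0x80 or above may hash differently from one platform to another. The sketch below shows one possible way to pin the result down, using DJB as the example. DJBHashPortable is a hypothetical name and the variant is an assumption of this note, not part of the library.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical portable variant of DJBHash; not part of the library.
std::uint32_t DJBHashPortable(const std::string& str)
{
    std::uint32_t hash = 5381;  // fixed 32-bit width on every platform

    for (std::size_t i = 0; i < str.length(); i++)
    {
        // Cast through unsigned char so bytes >= 0x80 are never sign-extended.
        hash = ((hash << 5) + hash) + static_cast<unsigned char>(str[i]);
    }

    // Same 31-bit mask as the library functions.
    return (hash & 0x7FFFFFFF);
}
```

For pure ASCII input this returns the same values as DJBHash on a platform with a 32-bit unsigned int; the two can differ only for bytes outside the ASCII range on platforms where char is signed.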