doc.html 5.5 KB

123456789101112131415161718192021222324252627282930313233343536373839404142434445464748495051525354555657585960616263646566676869707172737475767778798081828384858687888990919293949596979899100101102103104105106107108109110111112113114115116117118119120121122123124125126127128129130131132133134135136
  1. <h2>Generic Perfect Hash Generator</h2>
  2. <p><em>Ilan Schnell, 2008</em>
  3. </p>
  4. <p>perfect_hash.py provides a perfect hash generator which is not language
  5. specific. That is, the generator can output a perfect hash function for
  6. a given set of keys in any programming language, this is achieved by
  7. filling a given code template.
  8. </p>
  9. <h3>Acknowledgments:</h3>
  10. <p>This code is derived from A. M. Kuchling's
  11. <a href="http://www.amk.ca/python/code/perfect-hash">Perfect Minimal Hash Generator</a>.
  12. </p>
  13. <h3>Introduction:</h3>
  14. <p>A perfect hash function of a certain set S of keys is a hash function
  15. which maps all keys in S to different numbers.
  16. That means that for the set S,
  17. the hash function is collision-free, or perfect.
  18. Further, a perfect hash function is called minimal when it maps n keys
  19. to n <em>consecutive</em> integers, usually in the range from 0 to n-1.
  20. </p>
  21. <p>After coming across A. M. Kuchling's Perfect Minimal Hash Generator,
  22. I decided to write a general tool for generating perfect hashes.
  23. It is general in the sense that it can produce perfect hash functions
  24. for almost any programming language.
  25. A given code template is filled with parameters,
  26. such that the output is code which implements the hash function.
  27. </p>
  28. <p>The algorithm the program uses is described in the paper
  29. <a href="http://citeseer.ist.psu.edu/122364.html">&quot;Optimal algorithms for minimal perfect hashing&quot;</a>,
  30. Z. J. Czech, G. Havas and B.S. Majewski.
  31. </p>
  32. <p>I tried to illustrate the algorithm and explain how it works on
  33. <a href="http://ilan.schnell-web.net/prog/perfect-hash/algo.html">this page</a>.
  34. </p>
  35. <h3>Usage:</h3>
  36. <p>Given a set of keys which are ordinary character string,
  37. the program returns a minimal perfect hash function.
  38. This hash function is returned in the form of Python code by default.
  39. Suppose we have a file with keys:
  40. </p>
  41. <pre><code># 'animals.txt'
  42. Elephant
  43. Horse
  44. Camel
  45. Python
  46. Dog
  47. Cat
  48. </code></pre><p>The exact way this file is parsed can be specified using command line
  49. options, for example it is possible to only read one column from a file
  50. which contains different items in each row.
  51. The program is invoked like this:
  52. </p>
  53. <pre><code># =======================================================================
  54. # ================= Python code for perfect hash function ===============
  55. # =======================================================================
  56. G = [0, 0, 4, 1, 0, 3, 8, 1, 6]
  57. S1 = [5, 0, 0, 6, 1, 0, 4, 7]
  58. S2 = [7, 3, 6, 7, 8, 5, 7, 6]
  59. def hash_f(key, T):
  60. return sum(T[i % 8] * ord(c) for i, c in enumerate(str(key))) % 9
  61. def perfect_hash(key):
  62. return (G[hash_f(key, S1)] + G[hash_f(key, S2)]) % 9
  63. # ============================ Sanity check =============================
  64. K = [&quot;Elephant&quot;, &quot;Horse&quot;, &quot;Camel&quot;, &quot;Python&quot;, &quot;Dog&quot;, &quot;Cat&quot;]
  65. H = [0, 1, 2, 3, 4, 5]
  66. assert len(K) == len(H) == 6
  67. for k, h in zip(K, H):
  68. assert perfect_hash(k) == h
  69. </code></pre><p>The way the program works is by filling a code template with the calculated
  70. parameters. The program can take such a template in form of a file and
  71. fill in the calculated parameters, this allows the generation of perfect
  72. hash function in any programming language. The hash function is kept quite
  73. simple and does not require machine or language specific byte level operations
  74. which might be hard to implement in the target language.
  75. The following parameters are available in the template, and will expand to:
  76. </p>
  77. <table>
  78. <tr><th>string</th><th>expands to</th></tr>
  79. <tr><td><code>$NS</code></td><td>the length of S1 and S2</td></tr>
  80. <tr><td><code>$S1</code></td><td>array of integers S1</td></tr>
  81. <tr><td><code>$S2</code></td><td>array of integers S2</td></tr>
  82. <tr><td><code>$NG</code></td><td>length of array G</td></tr>
  83. <tr><td><code>$G</code></td><td>array of integers G</td></tr>
  84. <tr><td><code>$NK</code></td><td>the number of keys, i.e. length of array K and H</td></tr>
  85. <tr><td><code>$K</code></td><td>array with the quoted keys</td></tr>
  86. <tr><td><code>$H</code></td><td>array of integer hash values</td></tr>
  87. <tr><td><code>$$</code></td><td><code>$</code> (a literal dollar sign)</td></tr>
  88. </table>
  89. <p>A literal <code>$</code> is escaped as <code>$$</code>. Since the syntax for arrays is not the
  90. same in all programming languages, some specifics can be adjusted using
  91. command line options.
  92. The section of the built-in template which creates the actual hash function
  93. is:
  94. </p>
  95. <pre><code>G = [$G]
  96. S1 = [$S1]
  97. S2 = [$S2]
  98. def hash_f(key, T):
  99. return sum(T[i % $NS] * ord(c) for i, c in enumerate(str(key))) % $NG
  100. def perfect_hash(key):
  101. return (G[hash_f(key, S1)] + G[hash_f(key, S2)]) % $NG
  102. </code></pre><p>Using code templates, makes this program very flexible. The package comes
  103. with several complete examples for C and C++. There are many choices one
  104. faces when implementing a static hash table: do the parameter lists go into
  105. a separate header file, should the API for the table only contain the hash
  106. values, but not the objects being mapped, and so on.
  107. All these various choices are possible because of the template is simply
  108. filled with the parameters, no matter what else is inside the template.
  109. </p>
  110. <p>Another possible use the program is as a python module. The functions and
  111. classes in <code>perfect_hash.py</code> are documented and have clean interfaces.
  112. The folder <code>example-Python</code> has examples which shows how the module
  113. can be used directly in this way.
  114. </p>
  115. <h3>Requirement:</h3>
  116. <p>Python 2.5
  117. </p>