doc.in 4.6 KB

123456789101112131415161718192021222324252627282930313233343536373839404142434445464748495051525354555657585960616263646566676869707172737475767778798081828384858687888990919293949596979899100101102103104105106107108109110111112113114115116117118119120121122123124125126127128129130
  1. Generic Perfect Hash Generator
  2. ------------------------------
  3. *Ilan Schnell, 2008*
  4. perfect_hash.py provides a perfect hash generator which is not language
  5. specific. That is, the generator can output a perfect hash function for
  6. a given set of keys in any programming language, this is achieved by
  7. filling a given code template.
  8. ### Acknowledgments:
  9. This code is derived from A. M. Kuchling's
  10. [Perfect Minimal Hash Generator](http://www.amk.ca/python/code/perfect-hash).
  11. ### Introduction:
  12. A perfect hash function of a certain set S of keys is a hash function
  13. which maps all keys in S to different numbers.
  14. That means that for the set S,
  15. the hash function is collision-free, or perfect.
  16. Further, a perfect hash function is called minimal when it maps n keys
  17. to n *consecutive* integers, usually in the range from 0 to n-1.
  18. After coming across A. M. Kuchling's Perfect Minimal Hash Generator,
  19. I decided to write a general tool for generating perfect hashes.
  20. It is general in the sense that it can produce perfect hash functions
  21. for almost any programming language.
  22. A given code template is filled with parameters,
  23. such that the output is code which implements the hash function.
  24. The algorithm the program uses is described in the paper
  25. ["Optimal algorithms for minimal perfect hashing"]
  26. (http://citeseer.ist.psu.edu/122364.html),
  27. Z. J. Czech, G. Havas and B.S. Majewski.
  28. I tried to illustrate the algorithm and explain how it works on
  29. [this page](http://ilan.schnell-web.net/prog/perfect-hash/algo.html).
  30. ### Usage:
  31. Given a set of keys which are ordinary character string,
  32. the program returns a minimal perfect hash function.
  33. This hash function is returned in the form of Python code by default.
  34. Suppose we have a file with keys:
  35. # 'animals.txt'
  36. Elephant
  37. Horse
  38. Camel
  39. Python
  40. Dog
  41. Cat
  42. The exact way this file is parsed can be specified using command line
  43. options, for example it is possible to only read one column from a file
  44. which contains different items in each row.
  45. The program is invoked like this:
  46. # =======================================================================
  47. # ================= Python code for perfect hash function ===============
  48. # =======================================================================
  49. G = [0, 0, 4, 1, 0, 3, 8, 1, 6]
  50. S1 = [5, 0, 0, 6, 1, 0, 4, 7]
  51. S2 = [7, 3, 6, 7, 8, 5, 7, 6]
  52. def hash_f(key, T):
  53. return sum(T[i % 8] * ord(c) for i, c in enumerate(str(key))) % 9
  54. def perfect_hash(key):
  55. return (G[hash_f(key, S1)] + G[hash_f(key, S2)]) % 9
  56. # ============================ Sanity check =============================
  57. K = ["Elephant", "Horse", "Camel", "Python", "Dog", "Cat"]
  58. H = [0, 1, 2, 3, 4, 5]
  59. assert len(K) == len(H) == 6
  60. for k, h in zip(K, H):
  61. assert perfect_hash(k) == h
  62. The way the program works is by filling a code template with the calculated
  63. parameters. The program can take such a template in form of a file and
  64. fill in the calculated parameters, this allows the generation of perfect
  65. hash function in any programming language. The hash function is kept quite
  66. simple and does not require machine or language specific byte level operations
  67. which might be hard to implement in the target language.
  68. The following parameters are available in the template, and will expand to:
  69. ###### table
  70. A literal `$` is escaped as `$$`. Since the syntax for arrays is not the
  71. same in all programming languages, some specifics can be adjusted using
  72. command line options.
  73. The section of the built-in template which creates the actual hash function
  74. is:
  75. G = [$G]
  76. S1 = [$S1]
  77. S2 = [$S2]
  78. def hash_f(key, T):
  79. return sum(T[i % $NS] * ord(c) for i, c in enumerate(str(key))) % $NG
  80. def perfect_hash(key):
  81. return (G[hash_f(key, S1)] + G[hash_f(key, S2)]) % $NG
  82. Using code templates, makes this program very flexible. The package comes
  83. with several complete examples for C and C++. There are many choices one
  84. faces when implementing a static hash table: do the parameter lists go into
  85. a separate header file, should the API for the table only contain the hash
  86. values, but not the objects being mapped, and so on.
  87. All these various choices are possible because of the template is simply
  88. filled with the parameters, no matter what else is inside the template.
  89. Another possible use the program is as a python module. The functions and
  90. classes in `perfect_hash.py` are documented and have clean interfaces.
  91. The folder `example-Python` has examples which shows how the module
  92. can be used directly in this way.
  93. ### Requirement:
  94. Python 2.5