banner
yono

yono

哈喽~欢迎光临
follow
github

Absolute Mapping - Minimal Perfect Hash Table

Feature Introduction#

The hash table series data structure focuses on establishing key-value relationships. I really enjoy abstracting all logic into key-value relationships, so the hash table data structure is also one of my favorites.

  • Characteristics of hash tables: A hash table is a method of accessing records by mapping keys to a position in the table using a hash function, supporting fast insertion and search. Hash tables do not guarantee the continuity of keys or values.
  • Characteristics of perfect hash tables: A perfect hash table is a type of hash table that can map keys to values without any collisions. This means that each key has a unique hash value, and the hash function can directly locate the value corresponding to the key, providing more efficient lookup performance.
  • Characteristics of minimal perfect hash tables: A minimal perfect hash table is a specially designed perfect hash table that can store the mapping of keys to values using the least amount of space while ensuring no collisions. This type of hash table is particularly suitable for cases where the set of keys is static and known.

Let's evaluate which hash table structure to choose based on the dynamics of the table and the continuity of key values, purely from a personal perspective.

Since hash tables waste a lot of memory space, they are still a better choice in scenarios with high dynamic table deletions and modifications. Some may wonder if there is any logical relationship here. There is, but I am too lazy to explain.

In scenarios where the continuity of key values is relatively weak, the significant memory waste of hash tables becomes somewhat unacceptable.

Low DynamicsHigh Dynamics
Weak Key-Value ContinuityPerfect Hash TableHash Table
Strong Key-Value ContinuityHash Table (Preferred)/Perfect Hash TableHash Table

Isn't the topic "Minimal Perfect Hash Table"?

The application scenarios for minimal perfect hash tables are very narrow, even singular, and can only be used when the table relationships are completely static. However, the advantages of minimal perfect hash tables are also very prominent. Firstly, the memory area for keys and values is compact and does not waste space; secondly, the lookup time complexity for each key-value pair is O(1).

 (a) Perfect hash function. (b) Minimal perfect hash function.
(a) Perfect hash function. (b) Minimal perfect hash function.

Whose Library to Plagiarize#

Firstly, there is gperf, which is GNU's perfect hash function generator that can generate PHF (Perfect Hash Function) and MPHF (Minimal Perfect Hash Function). The key point is that this is a GNU library, so it has the best long-term prospects. However, currently, this library is very inefficient and does not support creating large mapping tables.

Secondly, there is CMPH, which is a perfect hash function generator created by some folks in Brazil, supporting multiple algorithms. It claims to be a top solution for creating large hash mapping tables (just a claim), but according to user tests, its efficiency is not as high as claimed in its papers (but at least it can generate). From my own testing, this library is indeed very difficult to use. On one hand, it has a framework where functional modules cannot be used independently; adding one means adding four algorithms. On the other hand, the provided source code has some bugs that require minor modifications to test properly. Most importantly, there are very few comments in the source code, making it very difficult to modify; understanding the entire library is necessary to make changes. For someone like me who enjoys modifying libraries, this is too difficult, and perhaps the authors are also wary of malicious users.

Thirdly, there is BobMPH, which requires a VPN to access. I named it this way because the author's name is clearly Bob Jenkins, though I haven't specifically looked into who he is. The source code is not released in a project or compressed format but is available on separate independent URLs, likely archived directly on the server by this guy. Downloading it on Windows took quite some effort. I haven't looked into it in detail yet, but if there's research demand, feel free to ask me for the source code.

Additional Notes#

Initially, I needed to optimize a key-value mapping relationship in a business logic. Currently, I have created a macro table to translate switch mappings, but this leads to huge ROM consumption. At the same time, I really dislike the uncertainty of lookup times, and based on these needs, I found the minimal perfect hash table.

I spent several days studying the implementation of "Secondly CMPH" and was finally getting the hang of it when I was interrupted by a bug in the business logic. After fixing the bug, I suddenly realized that the relationships in the hash table system did not meet my needs at all. I wanted the keys to be int values and more numerous, while the values should be trigger functions and fewer. The hash table system is inherently incapable of achieving this. Of course, I could modify the library to implement it, but that would no longer be a hash table. Why not find a more suitable structure?

Bob Jenkins' views mentioned above are quite similar to mine.

image

Also, this guy's homepage is very interesting Home page, Bob Jenkins (burtleburtle.net),

image

He seems to be a seasoned programmer.

image

This article is synchronized and updated to xLog by Mix Space. The original link is https://www.yono233.cn/posts/shoot/24_7_20_%E7%BB%9D%E5%AF%B9%E6%98%A0%E5%B0%84%E2%80%94%E2%80%94%E6%9C%80%E5%B0%8F%E5%AE%8C%E7%BE%8E%E5%93%88%E5%B8%8C%E8%A1%A8

Loading...
Ownership of this post data is guaranteed by blockchain and smart contracts to the creator alone.