Methods for the reduction of address space are not new to software engineering. *Hashing* is a software technique for mapping a vector input into a lower-dimensional vector output. Suppose the goal is to do a fast search through a dictionary of words. The dictionary consists of multiple vocabularies, each having a unique criterion for its selection. For instance, 26 vocabularies could be set up in the dictionary, each corresponding to a letter of the alphabet.

For each input word to the search algorithm, its first letter maps to one of the 26 vocabulary lists, and it is chosen for the word search. By having multiple vocabularies, if words in the language had equal probability of starting with any of the 26 letters, then the number of words needing to be searched, on average, is reduced by 1/26 and search speed has been increased. Indexing by alphabet letters in dictionaries is a form of hash coding.

Natural languages do not have equal probabilities of starting with any of the 26 Roman letters, so the above simple-minded scheme would not be optimal. (It is almost pointless, for instance, to have a separate vocabulary for the letter x.) What is required of a hashing function is that its input range (of 26 letters) spans the output range (of vocabularies) so that all of them are addressed, and that a given input word always maps into a given output vocabulary. Think about this by envisioning a multidimensional *hyperspace* as the output of the hashing function, where each vector component has a dimension.

The input is a different, larger hyperspace. The hashing function must be, as mathematicians put it, *one-to-one* and *onto* between the spaces. Each point in input space must correspond to a unique point in output space (one-to-one). More generally, a one-to-one correspondence requires that the relationship between input and output be a function and that its inverse also be a function. In two dimensions (x and y) it must be monotonic: If x2 > x1, then f (x2) > f (x1).

The *onto* condition is that every point in output space corresponds to some point in the much larger but sparsely populated input space so that the entirety of the memory address-space is covered. The simple alphabet hashing scheme maps all words in the language into all the vocabularies. Monotonic DACs and ADCs also do something like this. There is a one-to-one correspondence between digital and analog quantities, and both span their full ranges. A converter with missing codes does not map input values into every output value. Converters have a simple, linear addressing function. Ideally, a converter is an identity function that maps between similar values in the digital and analog spaces.

CMAC memory does not hash like a converter. James S. Albus explains hash coding generally (in *Brains, Behavior & Robotics* , BYTE Books, 1981, p. 160):

*Hash coding is a commonly used memory-addressing technique for compressing a large but sparsely populated address space into a smaller, more densely populated one. Many addresses in the larger space are mapped onto each of the addresses in the smaller one. Hashing works as long as the probability of a collision (i.e., more than one filled location in the large memory mapping into the same address in the small) is low.*

Hash coding works when the input space is large (as many dimensions assure this) and the number of values in input space that are actually used is small in comparison. Albus continues (with slight editing):

*CMAC can tolerate a fairly high incidence of collisions because of its distributed memory; i.e., its output is the sum of many locations. Thus a collision which in a conventional memory would make the output completely incorrect, in CMAC introduces only a small amount of noise into the output. In CMAC, hashing noise occurs randomly over the input space each time new data is stored. Thus each new data storage operation somewhat degrades previously stored data. The effect is that the contents of a CMAC memory are most accurately defined in the regions where they are most recently stored. Old data gets hashed over and tends to gradually fade or be “forgotten.”*

In Part 3 of this series we will discuss a method devised by Albus of mapping from the input vector.

Hashing maps an n-space into a lower-dimensional space. Ideally, if it is possible, the mapping is one-to-one, but generally it is not. Albus's description of CMAC mapping – that it maps multiple points in the input space into one point in the output space – is correct and should be used for the remainder of the discussion.

Hi Dennis–interesting application of a hash function. I first encountered hash functions in crytopgrapy. You hear about bad guys stealing databases of user names and passwords. Almost always you'll read that the passwords were hashed. There are many ways to do this, but it is interesting to me that you can use the method for memory compression.

-Practical Application

I wonder which electronic computer or circuit currently use distributed-Memory Address. Unless there is some diagram or mathematic expression, it is not easy to understand addressing mapping for number, N. When figuring out the practical application, it is more clarified for address mapping.

@eafpres1 hash functions in cryptography are implemented in information security primarily to evaluate the integrity of data, authentication and other security mechanism, it has numerous applications-digital signatures, MAC and other forms of authentications. Hash functions is actually a “pure security topic”, it great to know that we can use that method in memory compression.

“There are many ways to do this, but it is interesting to me that you can use the method for memory compression.”@eafpres1: Even I had always heard of hashing within the context of cryptography but it's interesting to see it being used for memory compression. I wonder how efficient is this method of compression compared to other compression tools out there. It'd be interesting to see if it offers more efficiency in terms of execution and storage space.

I would like to refer hybrid signature schemes which are often used, and wherein a cryptographic hash function is computed, and only the resulting hash is digitally signed.

To quote from this article “Hash coding works when the input space is large….and the number of values in input space actually used in small in comparison”. In protecting systems from hackers who use 'hashing' techniques, the common practice is to do the reverse i.e. increasing the number of values used in the input space or, in simple language, using longer and more complex passwords with mixed characters.

In other systems that use hashing in memory compression and the acceleration of search functions, the main challenge when designing these systems is usually to either reduce the probability of collisions or ,if that is not possible, to reduce or eliminate the effects of such collisions so that the results are as accurate as possible. CMAC seems to have handled this by using an adaptive and evolving memory space that continues to fine tune itself on its own over time.

I am impressed with the method here, which can be used in memory compression. Even though there are plenty of compression tools available out there in the market, this method seems to be more promising when compared to others. Well, am unsure about one thing here is its storage space. Otherwise this would make it to my list of fav method indeed!