monotone-devel
[Top][All Lists]

## [Monotone-devel] Re: Support for binary files, scalability and Windows p

 From: Peter Simons Subject: [Monotone-devel] Re: Support for binary files, scalability and Windows port Date: 18 Jan 2004 18:27:54 +0100

```Asger Kunuk Alstrup writes:

> Does it need to be injective? I.e. should different data
> with reasonable probability result in different ids?

> Does it need to be surjective? I.e. based on an id, you
> should with reasonable probability be able to find the
> data it corresponds to?

figured, I might as well answer. :-)

A hash function cannot possibly be injective, because it
maps from an infinite set (all possible permutations of
input data) to a finite set (all possible hash values). An
injective mapping can only occur between sets of equal
magnitude.

Also, you seem to confuse a surjective function with an
_invertible_ one. Surjectivity means that the image set of a
mapping is equal to the set it's mapping to. In other words:
All elements of the image domain are actually "used" by the
mapping. This property is very desirable for hash functions,
but I don't think any of the current hash algorithms has
been proven to be surjective.

Just for kicks, here are the formal definitions: Any
function f, which maps from D to M, is injective if for any
x and x' in D the property

f(x) = f(x')  ==>  x = x'.

holds. f is surjective if for any y in M there exists x in D
so that f(x) = y.

Finally, a function that is injective _and_ surjective is
called "bijective", and all bijective functions are
invertible. Being invertible would be a really, really BAD
property for a hash function.

making free-time for Graydon so that he can fix the bug
that's keeping me from re-building my depots *grin* -- the
desired property for cryptographic hash functions is that
they are non-linear. A good design principle is that if one
bit in the input changes, 50% of the bits in the output
change.

This property makes it very unlikely that similar inputs
collide. What "similar inputs" are, depends, of course, on
your application. As far as I know, there are hash
algorithms specifically designed for video and audio
streams, but I might be wrong since this is not definitely
not my area of expertise.

Hope to help,

Peter

```