Hashing notes

From Helpful

This article/section is a stub — probably a pile of half-sorted notes and assertions some of which may well be wrong, and not verified as a whole. Feel free to add or refine.

Contents

Hash function

A hash function takes data (usually byte data), and gives a mangled, reducde-size output. The result is often referred to as the hash of something, a hash value, or such.


Uses for hashes rely on different properties. Those properties include:

  • a hash is (often much) smaller than the data it was calculated for
  • a hash function always generates the same hash value for the same input
  • the original is not recognizable in the hash value (not always true)
  • a hash function will try to employ all the bits of the hash value equally
  • the hash value is usually quite different for inputs that are nearly equal (due largely to the Avalanche effect)


Note that while the same original value will always hash to the same hash value, equal hash values don't mean the original value was the same - but it does make it more likely, which is regularly exploited in programming tasks.

Actually, hash functions meant to be used in programming may well be much simpler and, unlike cryptographic hashes, only care for the first two points mentioned above since they aim also to be fast themselves.

In various tasks you may only wish to separate things that you should compare with a more involved method, in which case 'XOR the first four characters of strings' or even just 'use the first character' can be enough for a significant speed improvement, both because you need to do fewer comparisons and because such a hash is faster than most thins. At the same time, though, it can also be overly sensitive to certain cases of data, so a balance between effectiveness and simplicity is often sought for a hash function when a task is of fairly central importance.


Uses

Verification

One use of hashes is verifying that a large amount of data was transferred correctly, by hashing the original on the source side, hashing the transferred version on the destination side, and checking these two values.

You can do this for still-decent-sized parts when you want to support partial retransmission, which can be bandwidth-efficient on a mildly noisy line.

Such a hash value is itself relatively tiny (e.g. 4 bytes for CRC32, 16 bytes for MD5, 20 bytes for SHA1) and generally a very good indicator (this is always a game of probabilities, but you can make those probabilities quite extreme) of both success on success, and of failure on failure.

This all stands and falls with how probable it is that even a small error will be noticed, also under systematic data errors/alterations. Simple hashes may be sensitive to certain errors. For example, a hash function consisting of XORring all bytes will be sensitive to, say, a 7-bit line that always removes the highest bit, and is likely to say things are okay when the hash value itself is also transmitted over the that line. Such things don't happen much, but serves to explain that certain hash functions may be weak for verification, and/or for other tasks.


Luckily, most existing hash functions are designed with such things in mind.


Password checking


probably-unique ID

Hash map

Perfect hash function

hash types, significators, values you see in files, etc.

Hash(/Digester/Checksummer) methods

This article/section is a stub — probably a pile of half-sorted notes and assertions some of which may well be wrong, and not verified as a whole. Feel free to add or refine.

(this is likely to contain errors, and given to be incomplete)

  • DES
    • Basic DES: (13 characters): 2 salt, 11 hash value. (56-bit DES)
    • Extended DES: (20 characters): 9 character salt, 11 hash value
  • MD family:
    • MD5 [1] (Salt size is variable(verify), commonly 8 or 12 characters(verify))
      • MD5
      • MD5 (Unix)
      • MD5 (APR), e.g. in '$apr1$<salt>$<hash>'
    • MD2 [2]
    • MD4 [3]
      • eDonkey hashing is a variant of MD4; see [4]
    • MD3
    • MD


  • SHA [5]
    • SHA-1 (160-bit)
    • SHA-384
    • SHA-256
    • SHA-244
    • SHA-512
    • SHA-0


  • CRC ('Cyclic Redundancy Check')
    • Primarily used as a checksum function/value
    • Note that specific implementations can vary in with:
      • (most obviously) size, often 32 or 16, sometimes 8 or 64
      • choice of polynomial (note that if reported as a hex value rather than a polynomial's powers, it may be bit-reversed, so in a way there are two such values for each poly)
      • initial value (often all bits on)
      • when/how bits are shifted (verify)
      • possible other details, such as a bitmask in each step, or general details of bit ordering (verify)
      • ...as such, checking a CRC from something you didn't design yourself, you often have to figure out its specific implementation.


  • Haval: [6]
    • Haval-128
    • Haval-160
    • Haval-192
    • Haval-224
    • Haval-256
  • RipeMD [7] (Cryptographic)
    • RIPEMD-160
    • RIPEMD-128
    • RIPEMD-256
    • RIPEMD-320


  • Tiger [8]
    • Tiger-128
    • Tiger-160
    • Tiger-192
  • Whirlpool [9] (Cryptographic)
  • Panama, [10]
    • Panama
    • RadioGatĂșn
  • AES (Rijndael) [11]
  • MySQL password hashing
    • (various different versions exist)


Block ciphers / cryptosystems

  • Blowfish
    • ($2$ or $2a$): 16 character salt
    • ($2a$07$): see IPSEC note below
  • Cast
    • Cast 128 (a.k.a. Cast5) [16]
    • Cast 256 (a.k.a. Cast6) [17]
  • RSA (Rivest, Shamir, Adleman) [20]

(a.k.a. rDSA(verify))



Identifiers may include IPSEC's ESP Transform Identifiers (see RFC 2407), and others. said transform identifiers include:

  • 1: DES_IV64
  • 2: DES
  • 3: 3DES
  • 4: RC5
  • 5: IDEA
  • 6: CAST
  • 7: BLOWFISH
  • 8: 3IDEA
  • 9: DES_IV32
  • 10: RC4
  • 11: NULL

See also

More notes on

CRC

This article/section is a stub — probably a pile of half-sorted notes and assertions some of which may well be wrong, and not verified as a whole. Feel free to add or refine.

Notes:


See also: