6 Integrity
6.2 Other ways of providing assurance of integrity
Some other method of providing assurance of the integrity of a message is therefore needed – some kind of concise identity of the original message that can be checked against the received message to reveal any possible discrepancies between the two. This is the purpose of a message digest. It consists of a small, fixed-length block of data, also known as a hash value, which is a function of the original message. (The term ‘hash value’ can be used in other contexts.) The hash value is dependent on all the original data (in other words, it will change even if only one bit of the data changes) and is calculated by applying a mathematical function, known as a hash function, which converts a variable-length string to a fixed-length string. A simple example is a function that XORs together each byte of an input string to produce a single output byte.
A common use of a hash value is the storage of passwords on a computer system. If the passwords are stored in the clear, anyone gaining unlawful access to the computer files could discover and use them. This can be avoided if a hash of the password is stored instead. When a user enters a password at log-in the hash value is recalculated and compared with the stored value. The security of this method relies on the hash function being computationally irreversible. In other words, it is easy to compute a hash value for a given input string, but extremely difficult to deduce the input string from the hash value. Hash functions with this characteristic are known as one-way hash functions.
For a hash value to give an effective assurance about the integrity of data, it should also be computationally infeasible to generate another message that hashes to the same value. Hash functions that provide this characteristic are said to be collision-free. The example of the XOR function given earlier is not collision-free, since it would be simple to generate messages that would produce an identical hash.
The following very simple method gives an insight into how a one-way hash could be derived. (This example is not a practical method of producing hash values but does serve to demonstrate their function.)
- Concatenate the message by removing all the spaces.
- Arrange the message in blocks of five characters.
- Pad the final block if it contains less than five characters. (For example, if the final block has only two characters it could be padded by adding AAA.)
- Assign each block a numerical code from one of 26^{5} possible values according to the arrangement of letters. (See the example in the box below.)
- Derive a value that is the modulo-26^{5} sum of all the codes.
At the receiving end the hash value is recalculated using the same algorithm and is compared with the appended hash value received with the message. Any alterations in the original message should be revealed by a different hash value.
Box 3: A method of block coding
This is a worked example of a method of block coding the text VALUE
- Code each letter according to its position in the alphabet (A=0, B=1, etc.), giving the number sequence 21, 0, 11, 20, 4.
- Multiply each coded number by a power of 26 depending on its position in the sequence, giving: 21 × 26^{4}, 0 × 26^{3}, 11 × 26^{2}, 20 × 26^{1}, 4 × 26^{0}
- Add together the resulting numbers: 6 596 496 + 0 + 7436 + 520 + 4 = 9 604 456
In practice, of course, message digest algorithms in common use are very much more complex than the method described above. Two are briefly described in Table 5.
Table 5: Examples of common message digest algorithms
Algorithm | Description |
MD5 | Takes any arbitrary length input string and produces a fixed 128-bit value. This is done by a method of blocking and padding and then performing four rounds of processing based on a combination of logical functions. Considered to be reasonably secure although potential weaknesses have been reported. |
SHA (secure hash algorithm) | Similar to MD5 but produces a 160-bit hash value so is more resistant to brute force attacks^{1}. |
^{1}A brute force attack on a hash value can be either an attempt to find another message that hashes to the same value or an attempt to find two messages that hash to the same value.
A message authentication code is similar to a one-way hash function and has the same properties, but the algorithm uses the additional ingredient of a secret key, and therefore possession of the key to perform the check is necessary.