Hamming Distance


蔡健雅-达尔文

Hamming distance is a metric to measure difference between two strings of equal length or two data with same length(eg. two binary data with same bits).

Definiton

In information theory, the Hamming distance between two strings of equal length is the number of positions at which the corresponding symbols are different. In another way, it measures the minimum number of substitutions required to change one string into the other, or the minimum number of errors that could have transformed one string into the other.

Example

String

the Hamming distance between below two strings is 8.
“Hello,world”
“world,hello”
11101011101 | 8
there is a python distance lib which can calculate hamming distance.

1
2
3
>>> import distance as ds
>>> ds.hamming("hello,world", "world,hello")
8

Data

hamming distance between
1011101 and
1001001
is 2.

Here data can be binary or octonary or decimal or hexadecimal, but the two data should be with same length in same system.

Application

Coding theory

In coding theory, hamming distance is used as error detecting and error correcting codes(ECC).
a code with minimum Hamming distance d between its codewords can detect at most d-1 errors and can correct (d-1)/2 errors.The latter number is also called the error-correcting capability of the code.

For error detecting, hammind distance is usde to evaluate the performace of algorithm.
Let’s say, there are fixed-length data d, then we count a checksum C using differet algorithm. then we have data(d, C). we can get the minimum hamming distance d in set(d, C). so the algorithm can detect d-1 errors and correct (d-1)/2 errors.
CRC is generally better than modular sum.

For error correcting, there is a kind of ECC named as Hamming Code.

Programming

In softwar programming, there is a simple method to protect data from unexpected bit inverse(eg. soft error).
By define enum type like this:

1
2
3
4
5
6
typedef
{
Red = 0x0F,
Blue = 0x33,
Green = 0x3C
}Color;

the minimum hamming distance between Red,Blue and Green is 4.

Generate data set with min hamming distance

min_hamming_distance_setview raw
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
#coding:utf-8
#author: www.wuchenxu.com
#versin: 1.0
#date: 2016-05-18
#https://github.com/doukremt/distance
import distance as ds
def convertDec2BinStr(num):
return '{0:08b}'.format(num)
data= [0]
min_hamming_distance = 4
for i in range(0, 255):
counter = 0
for j in data:
if ds.hamming(convertDec2BinStr(j), convertDec2BinStr(i)) >= min_hamming_distance:
counter +=1
if counter == len(data):
data.append(i)
for i in data:
print convertDec2BinStr(i)+'(0x{0:02x})'.format(i)

data=[0] defines the first number of data set; min_hamming_distance defines the min hanmming distance of the data set.

result
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
00000000(0x00)
00001111(0x0f)
00110011(0x33)
00111100(0x3c)
01010101(0x55)
01011010(0x5a)
01100110(0x66)
01101001(0x69)
10010110(0x96)
10011001(0x99)
10100101(0xa5)
10101010(0xaa)
11000011(0xc3)
11001100(0xcc)
11110000(0xf0)

Reference:

  1. Hamming distance in wiki
  2. Hamming Code
  3. python distance lib
  4. python format