Categories
Coding

Classic Bit Hackery In Java

In the next release of Commons::Net, there is going to be a utility class for calculating various metrics around Classless Interdomain Routing (CIDR) rules and classless IP addresses – the sort of thing you can find in many online Perl-based address calculators. Anyways, the interesting thing about the address rules is that they are all basically forms of bit manipulation. A 32-bit IPv4 address can be decomposed into a network portion (n bits) and a host portion (32-n bits).

/* Create a binary netmask from the number of bits specification /x */
int cidrPart = rangeCheck(Integer.parseInt(matcher.group(5)), 0, NBITS-1);
for (int j = 0; j < cidrPart; ++j) {
netmask |= (1 << 31-j);
}

In the example code above, a 32-bit integer is initialized with a number of 1’s corresponding to the value of the cidrPart variable. for instance, if cidrPart = 8, the netmask integer would look like 11111111000000000000000000000000, with the most significant eight bits set. This allows us to take a 32-bit IP address and easily figure out various attributes. For instance, the line

network = (address & netmask);

strips out the host portion and just leaves us with the network portion of the address.

Similarly, if we know that a broadcast address is in the form nnnnnn11…..1, i.e. n host bits and 32-n 1s, then we can create the broadcast address using the netmask with the line

broadcast = network | ~(netmask);

i.e. invert the netmask and OR with the supplied address.

We can also iterate over valid address ranges in CIDR ranges easily. For instance, we can calculate the lowest and highest addresses easily:

private int low() { return network() + 1; }
private int high() { return broadcast() - 1; }

And thus iterate over each address in the range:

public String[] getAllAddresses() {
String[] addresses = new String[getAddressCount()];
for (int add = low(), j=0; add <= high(); ++add, ++j) {
addresses[j] = format(toArray(add));
}
return addresses;
}
}

But the clever part is the bit counting technique which is used to count the number of 1 bits in the netmask integer (the population). The obvious way is to count up or down from 0 to tex:2^{31}, shifting the target integer at each stage, and counting the set bits using AND. However, in the book Hacker’s Delight, there are tons of clever recipes for situations like this. The one that I ended up using is the following (almost identical to the C version, apart from the use of >>> for unsigned shifts):

/*
* Count the number of 1-bits in a 32-bit integer using a divide-and-conquer strategy
* see Hacker's Delight section 5.1
*/
int pop(int x) {
x = x - ((x >>> 1) & 0x55555555);
x = (x & 0x33333333) + ((x >>> 2) & 0x33333333);
x = (x + (x >>> 4)) & 0x0F0F0F0F;
x = x + (x >>> 8);
x = x + (x >>> 16);
return x & 0x0000003F;
}

This extremely elegant piece of code is a divide-and-conquer procedure for counting 1 bits in a 32-bit integer. At each step, it counts the number of 1s in each adjacent tex:2^n-bit group, where n is increased from 0 to tex:\log_2{(n-1)}, where n is the number of bits in the integer (32 in this case). At each stage, the number of bits in each n-bit group are summed, until finally, we are left with the count of the number of 1 bits, which can be represented in 6 bits (32 = 0xb100000).

It’s easier to visualize what’s going on if you can see how the magic hex constants relate to the structure of a 32-bit integer. For instance, using bc:

/cygdrive/c/sandbox>bc
bc 1.06
Copyright 1991-1994, 1997, 1998, 2000 Free Software Foundation, Inc.
This is free software with ABSOLUTELY NO WARRANTY.
For details type `warranty'.

ibase=16
obase=2
55555555
1010101010101010101010101010101
33333333
110011001100110011001100110011
0F0F0F0F
1111000011110000111100001111

If you want to see the java code involved, it is here: SubnetUtils.java. There is also a very nice explanation of the bit counting routine (in assembler) here.