clean up bitwise manipulation article and add additional infos

jakobkogler · jakobkogler · commit ffdf1e5e8b9d · 2023-04-16T22:31:39.000+02:00
diff --git a/src/algebra/bit-manipulation.md b/src/algebra/bit-manipulation.md
@@ -1,69 +1,60 @@
-# Binary number
-
-A **binary number** is a number expressed in the base-2 numeral system or binary numeral system, a method of mathematical expression which uses only two symbols: typically "0" (zero) and "1" (one).
-
-We are talking about a numerical system of great importance in technological development. Among some of the most outstanding applications are the representation of images and texts, in the field of electronics. With the combination of zeros and ones, any number can be represented on a decimal basis, making communication with electronic devices easier for humans. In addition to translating the different programming languages, to a common language where the computer understands.
-
-**In this article I will not talk about how to convert to binary or decimal. There is a lot of information all over the internet on the subject.**
-
+---
+tags:
+  - Original
+---
 # Bit manipulation
 
-When finding solutions to problems, a very important factor is the simplicity and the execution time. Now, in this bitwise operations are very important and you can take advantage of them.
-
-# Bit operators
-
-## Bit shift operators
-### Multiplication and division by powers of 2
-
-There are two operators for shifting bits.
-
-$\gg$ One bit to the right. Which would be the same as a division by a power of two.
-
-$\ll$ One bit to the left. And this would be a multiplication, equally by a power of two.
-
-Let's see an example.
-
-Let's say we have a number $N$, which. $N = 4$.
-
-The binary representation of $4$ is:
+## Binary number
 
-$4 = 100_2$
+A **binary number** is a number expressed in the base-2 numeral system (the binary numeral system), a method of mathematical expression that uses only two symbols: typically "0" (zero) and "1" (one).
 
-If we shift one bit to the left, then we would be adding a bit to the end, in this case a 0.
+We say that a certain bit is **set** if it is one, and **cleared** if it is zero.
 
-$4 \ll 1 = 100_2 \ll 1 = 1000_2$
+The binary number $(a_k a_{k-1} \dots a_1 a_0)_2$ represents the number:
 
-$1000_2 = 8$ and clearly, $4 * 2 = 8$
+$$(a_k a_{k-1} \dots a_1 a_0)_2 = a_k \cdot 2^k + a_{k-1} \cdot 2^{k-1} + \dots + a_1 \cdot 2^1 + a_0 \cdot 2^0.$$
 
-So, for a division by two, it would be exactly the same, but using the operator $<<$.
+For instance, the binary number $1101_2$ represents the number $13$:
 
-$4 \gg 1 = 100_2 \gg 1 = 10_2$
+$$\begin{align}
+1101_2 &= 1 \cdot 2^3 + 1 \cdot 2^2 + 0 \cdot 2^1 + 1 \cdot 2^0 \\
+       &= 1\cdot 8 + 1 \cdot 4 + 0 \cdot 2 + 1 \cdot 1 = 13
+\end{align}$$
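+
+The formula above can be evaluated one digit at a time. Here is a minimal sketch, assuming the digits come as a string with the most significant digit first (`binary_to_value` is a hypothetical helper, not a library function):
+
+```cpp
+#include <string>
+
+// Hypothetical helper: evaluates (a_k ... a_1 a_0)_2 = a_k * 2^k + ... + a_0 * 2^0.
+// Horner's scheme: every further digit doubles the value accumulated so far.
+// Assumes the string contains only '0' and '1' and at most 64 digits.
+unsigned long long binary_to_value(const std::string& digits) {
+    unsigned long long value = 0;
+    for (char c : digits) {
+        value = value * 2 + (c == '1' ? 1 : 0);
+    }
+    return value;
+}
+```
+
+For example, `binary_to_value("1101")` evaluates to $13$, matching the calculation above.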
 
-In this case, we would remove one bit from the end, this is what is called the **least significant bit**, the rightmost bit of the representation.
+Computers represent integers as binary numbers.
+Non-negative integers (both signed and unsigned) are represented directly by their binary digits, while negative signed integers are usually represented using the [Two's complement](https://en.wikipedia.org/wiki/Two%27s_complement).
 
-Of course, the **most significant bit** is the leftmost bit.
+```cpp
+unsigned int unsigned_number = 13;
+assert(unsigned_number == 0b1101);
 
-And then, how could it carry out operations with other powers, greater than two?
+int positive_signed_number = 13;
+assert(positive_signed_number == 0b1101);
 
-Easy, you just have to remember the exponent of two, to create said power. In other words:
+// assuming a 32-bit int: the bit pattern of -13 is that of 2^32 - 13
+int negative_signed_number = -13;
+assert(negative_signed_number == 0b1111'1111'1111'1111'1111'1111'1111'0011);
+```
 
-$4 = 2 ^ 2$, so to do multiplication or division operations with **4**, we must move **2 bits**.
+CPUs are very fast at manipulating these bits with dedicated instructions.
+For some problems we can use these binary representations to our advantage and speed up the execution time.
+And for some problems (typically in combinatorics or dynamic programming) where we want to track which objects we have already picked from a given set of objects, we can use a large enough integer in which each bit represents one object, and set or clear that bit depending on whether we pick or drop the object.
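+
+Here is a minimal sketch of this idea, assuming at most 31 objects so that every subset fits into a 32-bit `unsigned int`. It enumerates all subsets of some hypothetical object weights and counts the subsets whose total weight equals a target (`count_subsets_with_sum` is an illustrative name):
+
+```cpp
+#include <vector>
+
+// Hypothetical example: count the subsets of `weights` whose total equals `target`.
+// Every mask in [0, 2^n) encodes one subset: bit i is set <=> object i is picked.
+// Assumes n <= 31, so that 1u << n fits into a 32-bit unsigned int.
+int count_subsets_with_sum(const std::vector<int>& weights, int target) {
+    int n = static_cast<int>(weights.size());
+    int count = 0;
+    for (unsigned int mask = 0; mask < (1u << n); mask++) {
+        int sum = 0;
+        for (int i = 0; i < n; i++) {
+            if ((mask >> i) & 1) {  // object i is picked in this subset
+                sum += weights[i];
+            }
+        }
+        if (sum == target) count++;
+    }
+    return count;
+}
+```
+
+With the weights `{1, 2, 3}` and target `3`, the function counts the subsets $\{3\}$ and $\{1, 2\}$.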
 
-$8 = 2 ^ 3$, so for $8$ it would be **3 bits**. *And so on*.	
+## Bit operators
 
+All the operators introduced here are instant on a CPU for fixed-length integers (roughly the same speed as an addition).
 
-## Bitwise operators.
+### Bitwise operators
 
-```
-& : The bitwise AND operator compares each bit of its first operand with the corresponding bit of its second operand. 
+-   $\&$ : The bitwise AND operator compares each bit of its first operand with the corresponding bit of its second operand. 
     If both bits are 1, the corresponding result bit is set to 1. Otherwise, the corresponding result bit is set to 0.
  	
-| : The bitwise inclusive OR operator compares each bit of its first operand with the corresponding bit of its second operand.
+-   $|$ : The bitwise inclusive OR operator compares each bit of its first operand with the corresponding bit of its second operand.
     If one of the two bits is 1, the corresponding result bit is set to 1. Otherwise, the corresponding result bit is set to 0.
 
-^ : The bitwise exclusive OR (XOR) operator compares each bit of its first operand with the corresponding bit of its second operand.
+-   $\wedge$ : The bitwise exclusive OR (XOR) operator compares each bit of its first operand with the corresponding bit of its second operand.
     If one bit is 0 and the other bit is 1, the corresponding result bit is set to 1. Otherwise, the corresponding result bit is set to 0.
-```
+
+-   $\sim$ : The bitwise complement (NOT) operator flips each bit of a number: if a bit is set, the operator clears it, and if it is cleared, the operator sets it.
 
 Examples:
 
@@ -88,70 +79,79 @@ n-1       = 01010111
 n ^ (n-1) = 00001111
 ```
 
-# Useful tricks.
+```
+n         = 01011000
+--------------------
+~n        = 10100111
+```
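+
+The example value $n = 0101~1000_2$ can also be checked in C++. This sketch uses 8-bit unsigned values to match the diagrams. C++ promotes the operands to `int` before applying the operator, so each result is converted back to 8 bits when it is stored. This matters most for `~n`, which would be negative as an `int`:
+
+```cpp
+#include <cstdint>
+
+constexpr std::uint8_t n = 0b0101'1000;  // 88
+
+// The operators work on int; storing the result in uint8_t keeps the low 8 bits.
+constexpr std::uint8_t n_and = n & (n - 1);
+constexpr std::uint8_t n_or  = n | (n - 1);
+constexpr std::uint8_t n_xor = n ^ (n - 1);
+constexpr std::uint8_t n_not = static_cast<std::uint8_t>(~n);
+
+static_assert(n_and == 0b0101'0000, "n & (n-1)");
+static_assert(n_or  == 0b0101'1111, "n | (n-1)");
+static_assert(n_xor == 0b0000'1111, "n ^ (n-1)");
+static_assert(n_not == 0b1010'0111, "~n");
+```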
 
-## A bit is set (1) or cleared (0)
+### Shift operators
 
-The value of the $x$-th bit can be by shifting the number $x$ positions to the right, the $x$-th bit is the units place, therefore we can extract it by performing a bitwise & with 1
+There are two operators for shifting bits.
+
+-   $\gg$ Shifts a number to the right by removing the last few binary digits of the number.
+    Each shift by one represents an integer division by 2, so a right shift by $k$ represents an integer division by $2^k$.
+
+    E.g. $5 \gg 2 = 101_2 \gg 2 = 1_2 = 1$, which is the same as $\left\lfloor \frac{5}{2^2} \right\rfloor = \left\lfloor \frac{5}{4} \right\rfloor = 1$.
+    For a computer, though, shifting bits is a lot faster than performing a division.
+
+-   $\ll$ Shifts a number to the left by appending zero digits.
+    In similar fashion to a right shift by $k$, a left shift by $k$ represents a multiplication by $2^k$.
+
+    E.g. $5 \ll 3 = 101_2 \ll 3 = 101000_2 = 40$ which is the same as $5 \cdot 2^3 = 5 \cdot 8 = 40$.
+
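+    Notice, however, that for a fixed-length integer this means dropping the leftmost digits, and once every set bit has been shifted out you end up with the number $0$.
+    In C++, shifting by at least the bit width of the type is undefined behavior, so you cannot rely on such a single shift producing $0$.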
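+
+A short sketch can check both identities. It assumes a 32-bit `unsigned int` and uses unsigned values only. For negative signed values, the usual arithmetic right shift (guaranteed since C++20) rounds towards negative infinity, while `/` truncates towards zero, so the two can differ. `shift_identities_hold` is a hypothetical name:
+
+```cpp
+// Hypothetical helper: checks x >> k == x / 2^k and x << k == x * 2^k for unsigned x.
+// Both sides of the second identity wrap around modulo 2^32 in the same way.
+// k must be smaller than 32, otherwise the shift is undefined behavior.
+bool shift_identities_hold(unsigned int x, unsigned int k) {
+    unsigned int power = 1u << k;  // 2^k
+    return (x >> k) == x / power && (x << k) == x * power;
+}
+```
+
+For example, `5u >> 2` is `1` and `5u << 3` is `40`, and for `0x8000'0001u << 1` the top bit is dropped, leaving `2`.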
 
-``` cpp
-bool check(int number, int x) {
-    return ((number >> x) & 1);
-}
-```
 
-## Parity of a number
+## Useful tricks
 
-Function to know if a number is even or odd.
+### Set/flip/clear a bit
 
-True if number is odd, false for number is even.
+Using bitwise shifts and some basic bitwise operations we can easily set, flip or clear a bit.
+$1 \ll x$ is a number with only the $x$-th bit set, while $\sim(1 \ll x)$ is a number with all bits set except the $x$-th bit.
+
+- $n ~|~ (1 \ll x)$ sets the $x$-th bit in the number $n$
+- $n ~\wedge~ (1 \ll x)$ flips the $x$-th bit in the number $n$
+- $n ~\&~ \sim(1 \ll x)$ clears the $x$-th bit in the number $n$
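+
+As C++ functions, the three expressions could look like the following sketch (the function names are hypothetical). It uses `1u` instead of `1`, so the mask is an `unsigned int` like `n`. With a 32-bit `unsigned int`, `x` must be smaller than 32:
+
+```cpp
+// Hypothetical helpers; 1u << x is a mask with only bit x set.
+unsigned int set_bit(unsigned int n, int x)   { return n | (1u << x); }   // bit x becomes 1
+unsigned int flip_bit(unsigned int n, int x)  { return n ^ (1u << x); }   // bit x is toggled
+unsigned int clear_bit(unsigned int n, int x) { return n & ~(1u << x); }  // bit x becomes 0
+```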
+
+### Check if a bit is set
+
+The value of the $x$-th bit can be checked by shifting the number $x$ positions to the right, so that the $x$-th bit ends up in the units place. We can then extract it with a bitwise AND with 1.
 
 ``` cpp
-bool check(int number) {
-    return (number & 1);
+bool is_set(unsigned int number, int x) {
+    return (number >> x) & 1;
 }
 ```
 
-## Check if an integer is a power of 2
+### Check if an integer is a power of 2
+
+A power of two is a number that has only a single bit set (e.g. $32 = 0010~0000_2$), while the predecessor of that number has that bit cleared and all the bits after it set ($31 = 0001~1111_2$).
+So the bitwise AND of a power of two with its predecessor is always 0, as they have no set bits in common.
+You can easily check that this only happens for powers of two and for the number $0$, which has no bits set to begin with.
 
 ``` cpp
-bool powerTwo = n && !(n & (n - 1));
+bool isPowerOfTwo(unsigned int n) {
+    return n && !(n & (n - 1));
+}
 ```
 
-## Clear the most-right set bit
+### Clear the rightmost set bit
 
-The expression **n & (n-1)** can be used to turn off the rightmost set bit of a number **n**. This works like the expression **n-1** flips all bits after the rightmost set bit of n, including the rightmost set bit. Therefore, **n & (n-1)** returns the last flipped bit of **n**.
+The expression $n ~\&~ (n-1)$ can be used to turn off the rightmost set bit of a number $n$.
+This works because the expression $n-1$ flips all bits after the rightmost set bit of $n$, including the rightmost set bit.
+All those bits differ from the original number, so the bitwise AND clears them, leaving the original number $n$ with its rightmost set bit cleared.
 
-For example, consider the number 52, which is 00110100 in binary, and has a total set of 3 bits.
+For example, consider the number $52 = 0011~0100_2$:
 
 ```
-1st iteration of the loop: n = 52
-
-00110100    &               (n)
-00110011                    (n-1)
-~~~~~~~~
-00110000
-```
-
-``` 
-2nd iteration of the loop: n = 48
- 
-00110000    &               (n)
-00101111                    (n-1)
-~~~~~~~~
-00100000
-``` 
- 
-```
-3rd iteration of the loop: n = 32
- 
-00100000    &               (n)
-00011111                    (n-1)
-~~~~~~~~
-00000000                    (n = 0)
+n         = 00110100
+n-1       = 00110011
+--------------------
+n & (n-1) = 00110000
 ```
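+
+Applied repeatedly, the expression removes the set bits one at a time, e.g. $52 \to 48 \to 32 \to 0$. Here is a minimal sketch of a single step (`clear_rightmost_set_bit` is a hypothetical name):
+
+```cpp
+// Hypothetical helper: clears the rightmost set bit of n, e.g. 0011'0100 -> 0011'0000.
+// For n == 0 it returns 0: n - 1 wraps around to all ones, and 0 & anything == 0.
+unsigned int clear_rightmost_set_bit(unsigned int n) {
+    return n & (n - 1);
+}
+```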
 
-## Brian Kernighan's algorithm.
+### Brian Kernighan's algorithm
 
 We can count the number of bits set with the above expression.
 
@@ -161,21 +161,42 @@ The idea is to consider only the set bits of an integer by turning off its right
-int countSetBits(int n)
+int countSetBits(unsigned int n)
 {
     int count = 0;
- 
     while (n)
     {
         n = n & (n - 1);
         count++;
     }
- 
     return count;
 }
 ```
 
-Additionally, there are also predefined functions in C++, to count the number of bits in a representation.
+### Additional tricks
+
+- $n ~\&~ (n + 1)$ clears all trailing ones: $0011~0111_2 \rightarrow 0011~0000_2$.
+- $n ~|~ (n + 1)$ sets the last cleared bit: $0011~0101_2 \rightarrow 0011~0111_2$.
+- $n ~\&~ -n$ extracts the last set bit: $0011~0100_2 \rightarrow 0000~0100_2$.
+
+Many more can be found in the book [Hacker's Delight](https://en.wikipedia.org/wiki/Hacker%27s_Delight).
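+
+The three tricks above can be sketched as C++ functions (the names are hypothetical). Some compilers warn about writing `-n` for an unsigned `n`, so this sketch uses `~n + 1` instead. That is the two's complement negation, and for unsigned values it gives the same result:
+
+```cpp
+// Hypothetical helpers for the tricks listed above.
+unsigned int clear_trailing_ones(unsigned int n)       { return n & (n + 1); }
+unsigned int set_rightmost_cleared_bit(unsigned int n) { return n | (n + 1); }
+// ~n + 1 equals -n modulo 2^w, so this is n & -n without negating an unsigned value.
+unsigned int lowest_set_bit(unsigned int n)            { return n & (~n + 1); }
+```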
+
+### Language and compiler support
+
+C++ supports some of those operations since C++20 via the [bit](https://en.cppreference.com/w/cpp/header/bit) standard library header:
+
+- `has_single_bit`: checks if the number is a power of two
+- `bit_ceil` / `bit_floor`: round up/down to the next power of two
+- `rotl` / `rotr`: rotate the bits in the number
+- `countl_zero` / `countr_zero` / `countl_one` / `countr_one`: count the leading/trailing zeros/ones
+- `popcount`: count the number of set bits
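+
+Here is a short sketch of these functions applied to the number $300 = 1~0010~1100_2$. It needs a compiler in C++20 mode (e.g. `-std=c++20`), and the functions only accept unsigned integer types:
+
+```cpp
+#include <bit>
+#include <cstdint>
+
+constexpr std::uint32_t x = 0b0001'0010'1100;  // 300
+
+static_assert(std::popcount(x) == 4, "four set bits");
+static_assert(std::countr_zero(x) == 2, "two trailing zeros");
+static_assert(std::countl_zero(x) == 23, "23 leading zeros in 32 bits");
+static_assert(std::countr_one(x) == 0, "the lowest bit is cleared");
+static_assert(!std::has_single_bit(x), "300 is not a power of two");
+static_assert(std::bit_floor(x) == 256u, "largest power of two <= 300");
+static_assert(std::bit_ceil(x) == 512u, "smallest power of two >= 300");
+static_assert(std::rotr(std::uint32_t{1}, 1) == 0x8000'0000u, "the lowest bit wraps around to the top");
+```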
+
+Additionally, some compilers provide predefined functions that help with working with bits.
+E.g. GCC lists them at [Built-in Functions Provided by GCC](https://gcc.gnu.org/onlinedocs/gcc/Other-Builtins.html), and they also work in older versions of C++:
+
+- `__builtin_popcount(unsigned int)` returns the number of set bits (`__builtin_popcount(0b0001'0010'1100) == 4`)
+- `__builtin_ffs(int)` returns one plus the index of the first (rightmost) set bit, or 0 if the number is 0 (`__builtin_ffs(0b0001'0010'1100) == 3`)
+- `__builtin_clz(unsigned int)` returns the count of leading zeros; the result is undefined for 0 (`__builtin_clz(0b0001'0010'1100) == 23`)
+- `__builtin_ctz(unsigned int)` returns the count of trailing zeros; the result is undefined for 0 (`__builtin_ctz(0b0001'0010'1100) == 2`)
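+
+Because `__builtin_clz` and `__builtin_ctz` are undefined for 0, code that uses them usually handles that case separately. Here is a minimal sketch, assuming GCC or Clang (`highest_set_bit_index` and `lowest_set_bit_index` are hypothetical helper names):
+
+```cpp
+#include <climits>
+
+// Index of the most significant set bit, or -1 if x == 0.
+int highest_set_bit_index(unsigned int x) {
+    const int bits = static_cast<int>(sizeof(unsigned int) * CHAR_BIT);  // 32 on common platforms
+    return x == 0 ? -1 : bits - 1 - __builtin_clz(x);
+}
+
+// Index of the least significant set bit, or -1 if x == 0.
+int lowest_set_bit_index(unsigned int x) {
+    return x == 0 ? -1 : __builtin_ctz(x);
+}
+```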
 
-1. ```__builtin_popcount(number)``` number of bits set.
-2. ```__builtin_ctz(number)``` number of bits cleared.
+_Note that some of the operations (both the C++20 functions and the compiler built-ins) might be quite slow in GCC if you don't enable a specific compiler target with `#pragma GCC target("popcnt")`._
 
 ## Practice Problems