
Commit 019dba3 (1 parent: 9a91d58)

Clean up Sieve of Eratosthenes

1 file changed: src/algebra/sieve-of-eratosthenes.md (+37, −38 lines)
@@ -6,9 +6,9 @@ Sieve of Eratosthenes is an algorithm for finding all the prime numbers in a seg

The idea is simple: at the beginning we write down a row of numbers and eliminate all numbers divisible by 2, except the number 2 itself, then those divisible by 3, except the number 3 itself, next by 5, 7, 11, and all the remaining prime numbers till $n$.

-##Implementation
+## Implementation

-````cpp
+```cpp
int n;
vector<char> prime (n+1, true);
prime[0] = prime[1] = false;
@@ -17,88 +17,87 @@ for (int i=2; i<=n; ++i)
	if (i * 1ll * i <= n)
		for (int j=i*i; j<=n; j+=i)
			prime[j] = false;
-````
+```

-This code firstly marks all numbers except zero and number one as prime numbers, then it begins the process of sifting composite numbers. For this purpose we go through all numbers $2$ to $n$ in a cycle, and if the current number $i$ is a prime number, then we mark all numbers that are multiple to it as composite numbers.
-In doing so, we are starting from $i^2$ as all lesser numbers that are multiple to $i$ necessary have prime factor which is less than $i$, and that means that all of them were sifted earlier. (Since $i^2$ can easily overflow the type $int$ the additional verification is done using type $long \ long$ before the second nested cycle).
+This code first marks all numbers except zero and one as prime, then begins the process of sifting composite numbers. For this it goes through all numbers from $2$ to $n$ in a cycle. If the current number $i$ is prime, it marks all multiples of $i$ as composite, starting from $i^2$: all smaller multiples of $i$ necessarily have a prime factor less than $i$, so all of them were sifted earlier. (Since $i^2$ can easily overflow the type `int`, the additional check is done using the type `long long` before the second nested cycle.)

-Using such implementation the algorithm consumes $O(n)$ of the memory (obviously) and performs $O(n \log \log n)$ (this is being proved in the next section).
+Using this implementation the algorithm consumes $O(n)$ memory (obviously) and performs $O(n \log \log n)$ operations (this is proven in the next section).

-##Asymptotic analysis
+## Asymptotic analysis

-Lets prove that algorithms running time is $O(n \log \log n)$. It will take $\frac n p$ actions for every prime $p \le n$ the inner cycle performs. Hence, we need to evaluate the next value:
+Let's prove that the algorithm's running time is $O(n \log \log n)$. The inner cycle performs $\frac n p$ actions for every prime $p \le n$. Hence, we need to evaluate the following value:

$$\sum_{\substack{p \le n, \\ p \text{ is prime}}} \frac n p = n \cdot \sum_{\substack{p \le n, \\ p \text{ is prime}}} \frac 1 p.$$

-Let’s remember two known facts. First fact: the quantity of prime numbers, that are less or equal to $n$ approximately equals to $\frac n {\ln n}$. Second fact: the $k$-th prime number approximately equals $k \ln k$ (that is following from the first fact). Then, we can write down a sum in a such way:
+Let's recall two known facts. First: the number of primes less than or equal to $n$ is approximately $\frac n {\ln n}$. Second: the $k$-th prime number approximately equals $k \ln k$ (this follows from the first fact). Then we can write the sum in the following way:

$$\sum_{\substack{p \le n, \\ p \text{ is prime}}} \frac 1 p \approx \frac 1 2 + \sum_{k = 2}^{\frac n {\ln n}} \frac 1 {k \ln k}.$$

-Here, we extracted the first prime number out of a sum, since $k = 1$ in approximation $k \ln k$ leads to $0$ yielding division by zero operation.
+Here we separated the first prime number from the rest of the sum, since the approximation $k \ln k$ is $0$ for $k = 1$ and would cause a division by zero.

-Now, lets evaluate this sum using the integral of a same function over $k$ from $2$ to $\frac n {\ln n}$ (we can make such approximation because, in fact, the sum is related to the integral as its approximation using rectangle method):
+Now let's evaluate this sum using the integral of the same function over $k$ from $2$ to $\frac n {\ln n}$ (we can make such an approximation because, in fact, the sum is the rectangle-method approximation of this integral):

$$\sum_{k = 2}^{\frac n {\ln n}} \frac 1 {k \ln k} \approx \int_2^{\frac n {\ln n}} \frac 1 {k \ln k} \, dk.$$

-The antiderivative for the integrand is $ \ln \ln k$. Using a substitution and removing terms of lower order, well get the result:
+The antiderivative of the integrand is $\ln \ln k$. Substituting and dropping lower-order terms, we get:

$$\int_2^{\frac n {\ln n}} \frac 1 {k \ln k} \, dk = \ln \ln \frac n {\ln n} - \ln \ln 2 = \ln(\ln n - \ln \ln n) - \ln \ln 2 \approx \ln \ln n.$$

-Now, returning to the original sum, well get its approximate evaluation:
+Now, returning to the original sum, we get its approximate evaluation:

$$\sum_{\substack{p \le n, \\ p \text{ is prime}}} \frac n p \approx n \ln \ln n + o(n),$$

Q.E.D.

-More strict proof (that gives more precise evaluation which is accurate within constant multipliers) you can find in the book authored by Hardy & Wright "An Introduction to the Theory of Numbers (p. 349).
+A stricter proof (giving a more precise evaluation, accurate up to constant factors) can be found in the book by Hardy & Wright, "An Introduction to the Theory of Numbers" (p. 349).

-##Different optimizations of the Sieve of Eratosthenes
+## Different optimizations of the Sieve of Eratosthenes

-The biggest weakness of the algorithm is that it walks along the memory, constantly getting out of the cache memory’s limits. Because of that the constant which is concealed in $O(n \log \log n)$ is comparably big.
+The biggest weakness of the algorithm is that it "walks" along the memory, constantly falling out of the cache. Because of that the constant hidden in $O(n \log \log n)$ is comparatively big.

-Besides, the consumed memory is the bottleneck for big enough $n$.
+Besides, the consumed memory is the bottleneck for big $n$.

The methods presented below allow us to reduce the quantity of the performed operations, as well as to shorten the consumed memory noticeably.

-##Sieving by the prime numbers till root
+### Sieving by the prime numbers till root

Obviously, to find all the prime numbers until $n$, it will be enough just to perform the sieving only by the prime numbers, which do not exceed the root of $n$.

Thus, the outer cycle of the algorithm will change:

-````cpp
+```cpp
for (int i=2; i*i<=n; ++i)
-````
+```

-Such optimization doesnt affect the running time (indeed, by repeating the proof presented above well get the evaluation $n \ln \ln \sqrt n + o(n)$, which is asymptotically the same according to the properties of logarithm), though the number of operations will reduce noticeably.
+Such an optimization doesn't affect the running time (indeed, by repeating the proof presented above we'll get the evaluation $n \ln \ln \sqrt n + o(n)$, which is asymptotically the same by the properties of the logarithm), though the number of operations will decrease noticeably.

-##Sieving by the odd numbers only
+### Sieving by the odd numbers only

-Since all the even numbers, except $2$, are composite, we can stop checking even numbers at all. Instead, we need to operate with odd numbers only.
+Since all even numbers (except $2$) are composite, we don't need to check even numbers at all; we can operate with odd numbers only.

-Firstly, it will allow us to shorten the needed memory in half. Secondly, it will reduce the number of operations performing by algorithm approximately in half.
+First, it will allow us to halve the needed memory. Second, it will reduce the number of operations performed by the algorithm approximately in half.

-##Reducing consumed memory
+### Reducing consumed memory

We should notice that the algorithm of Eratosthenes operates with $n$ bits of memory. Hence, we can essentially reduce the consumed memory by storing not $n$ bytes (variables of Boolean type), but $n$ bits, i.e. $\frac n 8$ bytes of memory.

-However, such approach, which is called **“a bit-level compression**, will complicate the operations with these bits. Any bit’s reading or recording will look like a few arithmetic operations ultimately slowing down the algorithm.
+However, such an approach, which is called **bit-level compression**, will complicate the operations with these bits. Reading or writing any bit will require several arithmetic operations, ultimately slowing down the algorithm.

Thus, this approach is justified only if $n$ is so big that we cannot allocate $n$ bytes of memory anymore. In this case we trade an $8$-fold memory saving for a significant slowdown of the algorithm.

-After all, its worth mentioning the data structures that automatically do a bit-level compression, such as vector<bool\> and bitset<\>, have been already implemented in C++ language.
+Finally, it's worth mentioning that data structures that automatically do bit-level compression, such as `vector<bool>` and `bitset<>`, are already implemented in the C++ language.

-However, if the speed of action is very important, then it’s better to implement a bit-level compression manually using bit operations. Still, the compilers cannot generate sufficiently fast code for today.
+However, if speed is very important, it's better to implement the bit-level compression manually using bit operations: as of today, compilers cannot generate sufficiently fast code for these containers.

-##Block sieving
+### Block sieving

-It follows from the optimization sieving by the prime numbers till root that there is no need to keep the whole array $prime[1n]$ all the time. For performing of sieving its enough to keep just prime numbers until root of $n$, i.e. $prime[1 \sqrt n]$ and to build the remaining part of array in blocks. In doing so, we need to keep one block only at the present moment in time.
+It follows from the optimization "sieving by the prime numbers till root" that there is no need to keep the whole array $prime[1 ... n]$ in memory all the time. To perform the sieving it's enough to keep just the prime numbers up to the root of $n$, i.e. $prime[1 ... \sqrt n]$, and to build the remaining parts of the array in blocks, keeping only one block in memory at any given moment.

-Let $s$ be a constant which determines the size of the block, then we have $\lceil {\frac n s} \rceil$ blocks altogether, and block $k$ ($k = 0 \lfloor {\frac n s} \rfloor$) contains numbers in a segment $[ks; ks + s - 1]$. We can work on blocks by turns, i.e. for every block $k$ we will go through all the prime numbers (from $1$ to $\sqrt n$) and perform sieving by them inside of a current block only. It is worth working on the first block accurately because of different reasons: firstly, all the prime numbers from $[1; \sqrt n]$ shouldnt remove themselves; secondly, the numbers $0$ and $1$ should be marked as non-prime numbers. While working on the last block it should not be forgotten that the last needed number $n$ is not necessary located in the end of the block.
+Let $s$ be a constant which determines the size of the block; then we have $\lceil {\frac n s} \rceil$ blocks altogether, and block $k$ ($k = 0 ... \lfloor {\frac n s} \rfloor$) contains the numbers in the segment $[ks; ks + s - 1]$. We process the blocks in turn, i.e. for every block $k$ we go through all the prime numbers (from $1$ to $\sqrt n$) and sieve with them inside the current block only. The first block needs special care for two reasons: first, the prime numbers from $[1; \sqrt n]$ shouldn't remove themselves; second, the numbers $0$ and $1$ should be marked as non-prime. While processing the last block, keep in mind that the last needed number $n$ is not necessarily located at the end of the block.

-Here we have the implementation of block sieving. The program reads off the number $n$ and finds out the quantity of the prime numbers from $1$ to $n$.
+Here is the implementation of block sieving. The program reads the number $n$ and finds the number of primes from $1$ to $n$.

-````cpp
+```cpp
const int SQRT_MAXN = 100000; // square root of maximum value of N
const int S = 10000;
bool nprime[SQRT_MAXN], bl[S];
@@ -136,12 +135,12 @@ int main() {
	cout << result;

}
-````
+```

-The block sieving's running time is the same as common Sieve of Eratosthenes’ (well, if the blocks’ size won’t be very small), but the needed memory will shorten to $O(\sqrt n + s)$ and "the random walking on the memory will be reduced. On the other hand, there will be a division for each pair of a block and prime number from $[1; \sqrt n]$, and that will be far worse for smaller block sizes. Hence, it is necessary to keep balance when selecting constant $s$.
+The running time of block sieving is the same as for the regular sieve of Eratosthenes (unless the size of the blocks is very small), but the needed memory shortens to $O(\sqrt n + s)$ and the "random walking" over the memory is reduced. On the other hand, there will be a division for each pair of a block and a prime number from $[1; \sqrt n]$, which is far worse for smaller block sizes. Hence, it is necessary to keep a balance when selecting the constant $s$.

-According to the performed experiments, we have the best speed of work when $s$ has a value appr. from $10^4$ to $10^5$.
+According to the performed experiments, the best speed is achieved when $s$ has a value approximately from $10^4$ to $10^5$.

-##Advancement to the linear time complexity
+## Advancement to the linear time complexity

-We can convert the Eratosthenes algorithm into another algorithm that will have linear time complexity. Look at the article [Sieve of Eratosthenes Having Linear Time Complexity](./algebra/prime-sieve-linear.html). (However, this algorithm has its own weaknesses).
+We can convert the algorithm of Eratosthenes into another algorithm that has linear time complexity. See the article [Sieve of Eratosthenes Having Linear Time Complexity](./algebra/prime-sieve-linear.html). However, this algorithm has its own weaknesses.
