Skip to content

Improve Sieve of Eratosthenes article: expanded explanations, added code and comments for odd-only and optimized sieves, and corrected mathematical typos. #1483

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 2 commits into
base: main
Choose a base branch
from
Open
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
202 changes: 137 additions & 65 deletions src/algebra/sieve-of-eratosthenes.md
Original file line number Diff line number Diff line change
Expand Up @@ -6,18 +6,18 @@ e_maxx_link: eratosthenes_sieve

# Sieve of Eratosthenes

Sieve of Eratosthenes is an algorithm for finding all the prime numbers in a segment $[1;n]$ using $O(n \log \log n)$ operations.
Sieve of Eratosthenes is an algorithm for finding all the prime numbers in a segment $[1, n]$ using $O(n \log \log n)$ operations.

The algorithm is very simple:
at the beginning we write down all numbers between 2 and $n$.
We mark all proper multiples of 2 (since 2 is the smallest prime number) as composite.
A proper multiple of a number $x$, is a number greater than $x$ and divisible by $x$.
A proper multiple of a number $n$, is a number greater than $n$ and divisible by $n$.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
A proper multiple of a number $n$, is a number greater than $n$ and divisible by $n$.
A proper multiple of a number $m$, is a number greater than $n$ and divisible by $m$.

$n$ is already used.

Then we find the next number that hasn't been marked as composite, in this case it is 3.
Which means 3 is prime, and we mark all proper multiples of 3 as composite.
The next unmarked number is 5, which is the next prime number, and we mark all proper multiples of it.
And we continue this procedure until we have processed all numbers in the row.

In the following image you can see a visualization of the algorithm for computing all prime numbers in the range $[1; 16]$. It can be seen, that quite often we mark numbers as composite multiple times.
In the following image you can see a visualization of the algorithm for computing all prime numbers in the range $[1, 16]$. It can be seen, that quite often we mark numbers as composite multiple times.

<div style="text-align: center;">
<img src="sieve_eratosthenes.png" alt="Sieve of Eratosthenes">
Expand Down Expand Up @@ -98,15 +98,17 @@ The methods presented below allow us to reduce the quantity of the performed ope
### Sieving till root

Obviously, to find all the prime numbers until $n$, it will be enough just to perform the sifting only by the prime numbers, which do not exceed the root of $n$.
To understand it more clearly please checkout RobJohn's [this answer](https://math.stackexchange.com/a/58808) to the question "Why in Sieve of Eratosthenes of $N$
number you need to check and cross out numbers up to $\sqrt{N}$?" on [MathStackExchange](https://math.stackexchange.com).
Comment on lines +101 to +102
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
To understand it more clearly please checkout RobJohn's [this answer](https://math.stackexchange.com/a/58808) to the question "Why in Sieve of Eratosthenes of $N$
number you need to check and cross out numbers up to $\sqrt{N}$?" on [MathStackExchange](https://math.stackexchange.com).
This is because the smallest prime factor $p$ of a composite number $m \leq n$ may not exceed $\sqrt m \leq \sqrt n$. Indeed, $p \cdot \frac{m}{p} = m$ and $p \leq \frac{m}{p}$, thus if $p$ was greater than $\sqrt m$, the product of $p$ and $\frac{m}{p}$ would be greater than $(\sqrt m)^2 = m$.

I think it's better to provide an inline explanation, rather than link to stackexchange, esp. given that the explanation is fairly short.


```cpp
int n;
vector<bool> is_prime(n+1, true);
is_prime[0] = is_prime[1] = false;
for (int i = 2; i * i <= n; i++) {
if (is_prime[i]) {
for (int j = i * i; j <= n; j += i)
is_prime[j] = false;
if(!is_prime[i]) continue;
for (int j = i * i; j <= limit; j += i){
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
for (int j = i * i; j <= limit; j += i){
for (int j = i * i; j <= n; j += i){

limit is used here, but not defined anywhere. Generally, I don't think it should be different from $n$ in this particular part.

is_prime[j] = false;
}
}
```
Expand All @@ -119,6 +121,38 @@ Since all even numbers (except $2$) are composite, we can stop checking even num

First, it will allow us to halve the needed memory. Second, it will reduce the number of operations performed by algorithm approximately in half.

```cpp
// returns a vector of all prime numbers in the interval [2, n]
vector<int> primes_upto(int n){
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Was this code tested on any particular problem? If no, could you please do it?

if(n < 2) return {};

// is_prime[i] is true if 2i + 1 is a prime.
vector<bool> is_prime((n+1)/2, true);
is_prime[0] = false; // 1 is not a prime.

// iterate over each odd number in [3, sqrt(n)]
for(int i = 3; i * i <= n; i += 2){
if(!is_prime[i/2]) continue;
// j += 2*i because we want j to represent odd multiples of i.
for(int j = i * i; j <= n; j += 2 * i){
is_prime[j/2] = false;
}
}

vector<int> primes;
primes.push_back(2); // starting with only even prime.
// iterating vector<bool> is_prime
for(int i = 1; 2 * i + 1 <= n; ++i){
if(is_prime[i]){ // if the block is true.
// then append the odd number it represents i.e. 2i + 1.
primes.push_back(2 * i + 1);
}
}

return primes;
}
```

### Memory consumption and speed of operations

We should notice, that these two implementations of the Sieve of Eratosthenes use $n$ bits of memory by using the data structure `vector<bool>`.
Expand All @@ -135,70 +169,96 @@ You are limited by how fast you can load the data into the cache, and therefore
A benchmark ([link](https://gist.github.com/jakobkogler/e6359ea9ced24fe304f1a8af3c9bee0e)) shows, that using a `vector<bool>` is between 1.4x and 1.7x faster than using a `vector<char>`.

The same considerations also apply to `bitset`.
It's also an efficient way of storing bits, similar to `vector<bool>`, so it takes only $\frac{N}{8}$ bytes of memory, but is a bit slower in accessing the elements.
It's also an efficient way of storing bits, similar to `vector<bool>`, so it takes only $\frac{N}{8}$ bytes of memory, but is a bit slower in accessing the elements means `vector<bool>` is a space-optimized specialization that typically uses 1 bit per element, unlike standard containers like `vector<char>` or `vector<bool>` (if it were a normal template), which use 1 byte per element.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
It's also an efficient way of storing bits, similar to `vector<bool>`, so it takes only $\frac{N}{8}$ bytes of memory, but is a bit slower in accessing the elements means `vector<bool>` is a space-optimized specialization that typically uses 1 bit per element, unlike standard containers like `vector<char>` or `vector<bool>` (if it were a normal template), which use 1 byte per element.
It's also an efficient way of storing bits, similar to `vector<bool>`, so it takes only $\frac{N}{8}$ bytes of memory, but is a bit slower in accessing the elements.

This paragraph is about bitset, not vector<bool>. The explanation in this particular paragraph seems out of place, moreover I believe the same information about vector<bool> is already conveyed above, on lines 159-160.


In the benchmark above `bitset` performs a bit worse than `vector<bool>`.
Another drawback from `bitset` is that you need to know the size at compile time.

To maintain the consistency and readability in this article we will keep using `vector<bool>`.

### Segmented Sieve

It follows from the optimization "sieving till root" that there is no need to keep the whole array `is_prime[1...n]` at all times.
For sieving it is enough to just keep the prime numbers until the root of $n$, i.e. `prime[1... sqrt(n)]`, split the complete range into blocks, and sieve each block separately.

Let $s$ be a constant which determines the size of the block, then we have $\lceil {\frac n s} \rceil$ blocks altogether, and the block $k$ ($k = 0 ... \lfloor {\frac n s} \rfloor$) contains the numbers in a segment $[ks; ks + s - 1]$.
We can work on blocks by turns, i.e. for every block $k$ we will go through all the prime numbers (from $1$ to $\sqrt n$) and perform sieving using them.
Let $s$ be a constant which determines the size of the block, then we have $\lceil \frac{n}{s} \rceil$ blocks altogether, and each block $k$ contains $s$ numbers in the interval $[ks, (k+1)s)$ for $k = 0, 1, \dots , \lfloor \frac{n}{s} \rfloor$.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The last block ends in $n$, rather than $(k+1)s$, right? I'm not sure if it needs to be mentioned separately though.


We can work on blocks by turns, i.e. for every block $k$ we will go through all the prime numbers (from $1$ to $\sqrt n$) and remove mark their multiple as composite.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
We can work on blocks by turns, i.e. for every block $k$ we will go through all the prime numbers (from $1$ to $\sqrt n$) and remove mark their multiple as composite.
We can work on blocks by turns, i.e. for every block $k$ we will go through all the prime numbers (from $1$ to $\sqrt n$) and mark their multiple as composite.


It is worth noting, that we have to modify the strategy a little bit when handling the first numbers: first, all the prime numbers from $[1; \sqrt n]$ shouldn't remove themselves; and second, the numbers $0$ and $1$ should be marked as non-prime numbers.
While working on the last block it should not be forgotten that the last needed number $n$ is not necessarily located at the end of the block.

As discussed previously, the typical implementation of the Sieve of Eratosthenes is limited by the speed how fast you can load data into the CPU caches.
By splitting the range of potential prime numbers $[1; n]$ into smaller blocks, we never have to keep multiple blocks in memory at the same time, and all operations are much more cache-friendlier.
As we are now no longer limited by the cache speeds, we can replace the `vector<bool>` with a `vector<char>`, and gain some additional performance as the processors can handle read and writes with bytes directly and don't need to rely on bit operations for extracting individual bits.
The benchmark ([link](https://gist.github.com/jakobkogler/e6359ea9ced24fe304f1a8af3c9bee0e)) shows, that using a `vector<char>` is about 3x faster in this situation than using a `vector<bool>`.
A word of caution: those numbers might differ depending on architecture, compiler, and optimization levels.
Comment on lines -152 to -156
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do you think it's better to put it below the code, than where it was before?

While working on the last block it should not be forgotten that the last needed number $n$ is not necessarily located at the end of the block.

Here we have an implementation that counts the number of primes smaller than or equal to $n$ using block sieving.

```cpp
int count_primes(int n) {
const int S = 10000;
int count_primes(int n, int s = 1000){
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did you test the new version on any particular problem? If no, could you please do it?

if(n < 2) return 0;

int limit = sqrt(n);

vector<int> primes;
int nsqrt = sqrt(n);
vector<char> is_prime(nsqrt + 2, true);
for (int i = 2; i <= nsqrt; i++) {
if (is_prime[i]) {
primes.push_back(i);
for (int j = i * i; j <= nsqrt; j += i)
is_prime[j] = false;
vector<bool> is_prime(limit + 1, true);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As we are now no longer limited by the cache speeds, we can replace the vector<bool> with a vector<char>

The usage of vector<char> rather than vector<bool> was intentional.


// genetating primes in range [2, sqrt(n)]
for(int i = 2; i <= limit; i++) {
if(!is_prime[i]) continue;
primes.push_back(i);
for(int j = i * i; j <= limit; j += i){
is_prime[j] = false;
}
}

int result = 0;
vector<char> block(S);
for (int k = 0; k * S <= n; k++) {
fill(block.begin(), block.end(), true);
int start = k * S;
for (int p : primes) {
int start_idx = (start + p - 1) / p;
int j = max(start_idx, p) * p - start;
for (; j < S; j += p)
block[j] = false;
int total_primes = 0;
for (int k = 0; k * s <= n; k++){
// current block's range [left, right)
int left = k * s, right = (k + 1) * s;

// block[i] denotes whether the number (left + i) is prime.
vector<bool> block(s, true);
if (k == 0) block[0] = block[1] = false;


for(int prime : primes) {
// start is first multiple of 'prime' >= left.
// see the note at the end of this code to understand the initialization of 'start'.
int start = max(prime * prime, (left + prime - 1) / prime * prime);
for(int j = start; j < right; j+=prime){
block[j - left] = false;
}
}
if (k == 0)
block[0] = block[1] = false;
for (int i = 0; i < S && start + i <= n; i++) {
if (block[i])
result++;

for(int i = 0; i < s && left + i <= n; i++){
if(block[i]) total_primes++;
}
}
return result;

return total_primes;
}
```

#### Note on 'start'
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure it warrants a dedicated section. Also when your additions ends, it is followed by some general stuff that is not from this particular section, which might be confusing as to where we stop talking about start and start discussing the block sieve in general.

As we know that we starting marking the multiple of a prime $p$ as composite starting from $p^2$. And the multiple of $p$ that we mark starting from $p^2$ are $p^2, p^2 + p, p^2 + 2p, \dots$ are of the form $p(p + i)$ for $i \geq 0$.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
As we know that we starting marking the multiple of a prime $p$ as composite starting from $p^2$. And the multiple of $p$ that we mark starting from $p^2$ are $p^2, p^2 + p, p^2 + 2p, \dots$ are of the form $p(p + i)$ for $i \geq 0$.
We start marking the multiple of a prime $p$ as composite starting from $p^2$. The multiples of $p$ that we mark starting from $p^2$ are $p^2, p^2 + p, p^2 + 2p, \dots$, in other words the numbers of the form $p(p + i)$ for $i \geq 0$.


Now to start marking multiple of $p$ in range $[left, right)$ we first need to figure out the first multiple of $p \geq$ *left*.
If *left* lies between two consecutive multiple of $p$ as $[p(p+i), p(p+i+1)]$ then we should starting marking numbers starting from $p(p+i+1)$ which is given by $\left\lceil \frac{left}{p} \right\rceil \cdot p $ that is `((left + p - 1) / p) * p`.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
If *left* lies between two consecutive multiple of $p$ as $[p(p+i), p(p+i+1)]$ then we should starting marking numbers starting from $p(p+i+1)$ which is given by $\left\lceil \frac{left}{p} \right\rceil \cdot p $ that is `((left + p - 1) / p) * p`.
If *left* lies between two consecutive multiple of $p$ as $[p(p+i), p(p+i+1)]$ then we should start marking numbers starting from $p(p+i+1)$ which is given by $\left\lceil \frac{left}{p} \right\rceil \cdot p $ that is `((left + p - 1) / p) * p`.


Now suppose *left* lies in the interval $[0, p]$ then $\left\lceil \frac{left}{p} \right\rceil \cdot p$ gives us $p$, right? That's the problem.
We'll mistakenly mark $p$ as the composite so, to avoid this we take maximum of $p*p$ and $\left\lceil \frac{left}{p} \right\rceil \cdot p$.


The running time of block sieving is the same as for regular sieve of Eratosthenes (unless the size of the blocks is very small), but the needed memory will shorten to $O(\sqrt{n} + S)$ and we have better caching results.
On the other hand, there will be a division for each pair of a block and prime number from $[1; \sqrt{n}]$, and that will be far worse for smaller block sizes.
Hence, it is necessary to keep balance when selecting the constant $S$.

On the other hand, there will be a division for each pair of a block and prime number from $[1, \sqrt{n}]$, and that will be far worse for smaller block sizes.
Hence, it is necessary to keep balance when selecting the constant $s$.
We achieved the best results for block sizes between $10^4$ and $10^5$.

As discussed previously, the typical implementation of the Sieve of Eratosthenes is limited by the speed how fast you can load data into the CPU caches.
By splitting the range of potential prime numbers $[1, n]$ into smaller blocks, we never have to keep multiple blocks in memory at the same time, and all operations are much more cache-friendlier.
As we are now no longer limited by the cache speeds, we can replace the `vector<bool>` with a `vector<char>`, and gain some additional performance as the processors can handle read and writes with bytes directly and don't need to rely on bit operations for extracting individual bits.
The benchmark ([link](https://gist.github.com/jakobkogler/e6359ea9ced24fe304f1a8af3c9bee0e)) shows, that using a `vector<char>` is about 3x faster in this situation than using a `vector<bool>`.
A word of caution: those numbers might differ depending on architecture, compiler, and optimization levels.

## Find primes in range

Sometimes we need to find all prime numbers in a range $[L,R]$ of small size (e.g. $R - L + 1 \approx 1e7$), where $R$ can be very large (e.g. $1e12$).
Expand All @@ -207,47 +267,59 @@ To solve such a problem, we can use the idea of the Segmented sieve.
We pre-generate all prime numbers up to $\sqrt R$, and use those primes to mark all composite numbers in the segment $[L, R]$.

```cpp
vector<char> segmentedSieve(long long L, long long R) {
vector<bool> segmentedSieve(long long L, long long R) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As we are now no longer limited by the cache speeds, we can replace the vector<bool> with a vector<char>

The usage of vector<char> rather than vector<bool> was intentional.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did you test the new version on any particular problem? If no, could you please do it?

// generate all primes up to sqrt(R)
long long lim = sqrt(R);
vector<char> mark(lim + 1, false);
long long limit = sqrt(R);
vector<long long> primes;
for (long long i = 2; i <= lim; ++i) {
if (!mark[i]) {
primes.emplace_back(i);
for (long long j = i * i; j <= lim; j += i)
mark[j] = true;
vector<bool> is_prime(limit + 1, true);

for(long long i = 2; i <= limit; ++i){
if(!is_prime[i]) continue;
primes.push_back(i);
for(long long j = i * i; j <= limit; j += i){
is_prime[j] = false;
}
}

vector<char> isPrime(R - L + 1, true);
for (long long i : primes)
for (long long j = max(i * i, (L + i - 1) / i * i); j <= R; j += i)
isPrime[j - L] = false;
if (L == 1)
isPrime[0] = false;
return isPrime;
// primes_in_range[i] is true is L + i is prime where i belongs to [0, R-L].
vector<bool> primes_in_range(R - L + 1, true);
if(L == 1) primes_in_range[0] = false;

for(long long prime : primes){
long long start = max(prime * prime, (L + prime - 1) / prime * prime);
for(long long j = start; j <= R; j+=prime){
primes_in_range[j - L] = false;
}
}

return primes_in_range;
}
```
Time complexity of this approach is $O((R - L + 1) \log \log (R) + \sqrt R \log \log \sqrt R)$.

It's also possible that we don't pre-generate all prime numbers:

```cpp
vector<char> segmentedSieveNoPreGen(long long L, long long R) {
vector<char> isPrime(R - L + 1, true);
long long lim = sqrt(R);
for (long long i = 2; i <= lim; ++i)
for (long long j = max(i * i, (L + i - 1) / i * i); j <= R; j += i)
vector<bool> segmentedSieveNoPreGen(long long L, long long R) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did you test the new version on any particular problem? If no, could you please do it?

vector<bool> isPrime(R - L + 1, true);
if (L == 1) isPrime[0] = false;
long long limit = sqrt(R);

for(long long i = 2; i <= limit; ++i){
long long start = max(i * i, (L + i - 1) / i * i);
for(long long j = start; j <= R; j += i){
isPrime[j - L] = false;
if (L == 1)
isPrime[0] = false;
}
}
return isPrime;
}
```

We are not filtering out non-primes before sieving, so this approach is slower but avoids precomputation.
Obviously, the complexity is worse, which is $O((R - L + 1) \log (R) + \sqrt R)$. However, it still runs very fast in practice.

## Practical Tip:
For most competitive programming tasks involving prime numbers up to $10^7$ or so, the odd-number-only sieve with vector<bool> is usually the fastest. For larger ranges (like $[10^9, 10^9 + 10^6]$), the segmented sieve is indispensable.

## Linear time modification

We can modify the algorithm in a such a way, that it only has linear time complexity.
Expand Down