Skip to content

Improve Sieve of Eratosthenes article: expanded explanations, added code and comments for odd-only and optimized sieves, and corrected mathematical typos. #1483

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 2 commits into
base: main
Choose a base branch
from

Conversation

MythOfSisyphus
Copy link

  • Expanded key explanations (e.g. why sieving starts from p*p and how ceil(L/p)*p avoids marking the prime itself)

  • Added well-commented code for:

    • Odd-only sieve

    • Segmented sieve

    • Prime count via blocks

    • Sieve over arbitrary range [L, R]

  • Corrected mathematical typos in writing intervals.

…ode and comments for odd-only and optimized sieves, and corrected mathematical typos.
@mhayter
Copy link
Contributor

mhayter commented Aug 1, 2025

I'm confused on some of these changes:

Why is x changed to n as a variable? We already defined n.

Why are we using limit now and not defining it?

@MythOfSisyphus
Copy link
Author

MythOfSisyphus commented Aug 2, 2025

"Why is $x$ changed to $n$ as a variable? We already defined $n$." Oh, yeah! I didn't realize it. I took $n$ because it is more natural choice to denote an integer. But yeah, as we've written in first line "... finding prime numbers in range $[1, n]$". We could either stick with the original $x$ or use a different variable name like $m$ for clarity.

"Why are we using limit now and not defining it?" I'm really sorry for that. Yeah, in section Sieving till root I wrote j <= limit, in second for loop, without defining limit. Actually, it should be j <= n. I didn't notice it because I used limit to denote square root of $n$ and $R$ in Segmented Sieve.

Thank you very much for closely reviewing my all changes.
I’ll be happy to fix both issues and push an updated commit to this branch — unless you'd prefer to handle the minor edits directly before merging.

@mhayter
Copy link
Contributor

mhayter commented Aug 2, 2025

I'm not convinced we will be merging. Please fix these issues and proofread your request. I haven't thoroughly reviewed by your request yet because of two obvious issues.

@MythOfSisyphus
Copy link
Author

I don't know what you're finding hard to understand. If you read original article here after that read my refactored version here you'll clearly see how much explanation and readability in the code I improved.

Basically, I edited the whole article. If you have read cp-algorithm closely you'll realize that "code is very poor and explanations are not great." That's what I tried to improve in this very file.

If after reading both articles you still find yourself confused, then close my pull request (OfCourse without merging). I'll be happy to keep my changes locally in my machine only.

As my title and description are very what I did with the file.

@mhayter mhayter closed this Aug 8, 2025
github-actions bot added a commit that referenced this pull request Aug 8, 2025
@adamant-pwn adamant-pwn reopened this Aug 9, 2025
Copy link
Member

@adamant-pwn adamant-pwn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi, thanks for the pull request! I left some comments where it seemed appropriate, and I think we would need to address them before considering merging.

I'm also conflicted about significantly rewriting code blocks, as we currently don't have tests for them. We should either add some tests (see here), or at least make sure that they were tested in some actual competitive programming problems.


The algorithm is very simple:
at the beginning we write down all numbers between 2 and $n$.
We mark all proper multiples of 2 (since 2 is the smallest prime number) as composite.
A proper multiple of a number $x$, is a number greater than $x$ and divisible by $x$.
A proper multiple of a number $n$, is a number greater than $n$ and divisible by $n$.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
A proper multiple of a number $n$, is a number greater than $n$ and divisible by $n$.
A proper multiple of a number $m$, is a number greater than $n$ and divisible by $m$.

$n$ is already used.

Comment on lines +101 to +102
To understand it more clearly please checkout RobJohn's [this answer](https://math.stackexchange.com/a/58808) to the question "Why in Sieve of Eratosthenes of $N$
number you need to check and cross out numbers up to $\sqrt{N}$?" on [MathStackExchange](https://math.stackexchange.com).
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
To understand it more clearly please checkout RobJohn's [this answer](https://math.stackexchange.com/a/58808) to the question "Why in Sieve of Eratosthenes of $N$
number you need to check and cross out numbers up to $\sqrt{N}$?" on [MathStackExchange](https://math.stackexchange.com).
This is because the smallest prime factor $p$ of a composite number $m \leq n$ may not exceed $\sqrt m \leq \sqrt n$. Indeed, $p \cdot \frac{m}{p} = m$ and $p \leq \frac{m}{p}$, thus if $p$ was greater than $\sqrt m$, the product of $p$ and $\frac{m}{p}$ would be greater than $(\sqrt m)^2 = m$.

I think it's better to provide an inline explanation, rather than link to stackexchange, esp. given that the explanation is fairly short.

for (int j = i * i; j <= n; j += i)
is_prime[j] = false;
if(!is_prime[i]) continue;
for (int j = i * i; j <= limit; j += i){
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
for (int j = i * i; j <= limit; j += i){
for (int j = i * i; j <= n; j += i){

limit is used here, but not defined anywhere. Generally, I don't think it should be different from $n$ in this particular part.

@@ -119,6 +121,38 @@ Since all even numbers (except $2$) are composite, we can stop checking even num

First, it will allow us to halve the needed memory. Second, it will reduce the number of operations performed by algorithm approximately in half.

```cpp
// returns a vector of all prime numbers in the interval [2, n]
vector<int> primes_upto(int n){
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Was this code tested on any particular problem? If no, could you please do it?

@@ -135,70 +169,96 @@ You are limited by how fast you can load the data into the cache, and therefore
A benchmark ([link](https://gist.github.com/jakobkogler/e6359ea9ced24fe304f1a8af3c9bee0e)) shows, that using a `vector<bool>` is between 1.4x and 1.7x faster than using a `vector<char>`.

The same considerations also apply to `bitset`.
It's also an efficient way of storing bits, similar to `vector<bool>`, so it takes only $\frac{N}{8}$ bytes of memory, but is a bit slower in accessing the elements.
It's also an efficient way of storing bits, similar to `vector<bool>`, so it takes only $\frac{N}{8}$ bytes of memory, but is a bit slower in accessing the elements means `vector<bool>` is a space-optimized specialization that typically uses 1 bit per element, unlike standard containers like `vector<char>` or `vector<bool>` (if it were a normal template), which use 1 byte per element.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
It's also an efficient way of storing bits, similar to `vector<bool>`, so it takes only $\frac{N}{8}$ bytes of memory, but is a bit slower in accessing the elements means `vector<bool>` is a space-optimized specialization that typically uses 1 bit per element, unlike standard containers like `vector<char>` or `vector<bool>` (if it were a normal template), which use 1 byte per element.
It's also an efficient way of storing bits, similar to `vector<bool>`, so it takes only $\frac{N}{8}$ bytes of memory, but is a bit slower in accessing the elements.

This paragraph is about bitset, not vector<bool>. The explanation in this particular paragraph seems out of place, moreover I believe the same information about vector<bool> is already conveyed above, on lines 159-160.

}
```

#### Note on 'start'
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure it warrants a dedicated section. Also when your additions ends, it is followed by some general stuff that is not from this particular section, which might be confusing as to where we stop talking about start and start discussing the block sieve in general.

primes.push_back(i);
for (int j = i * i; j <= nsqrt; j += i)
is_prime[j] = false;
vector<bool> is_prime(limit + 1, true);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As we are now no longer limited by the cache speeds, we can replace the vector<bool> with a vector<char>

The usage of vector<char> rather than vector<bool> was intentional.

@@ -207,47 +267,59 @@ To solve such a problem, we can use the idea of the Segmented sieve.
We pre-generate all prime numbers up to $\sqrt R$, and use those primes to mark all composite numbers in the segment $[L, R]$.

```cpp
vector<char> segmentedSieve(long long L, long long R) {
vector<bool> segmentedSieve(long long L, long long R) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As we are now no longer limited by the cache speeds, we can replace the vector<bool> with a vector<char>

The usage of vector<char> rather than vector<bool> was intentional.

@@ -207,47 +267,59 @@ To solve such a problem, we can use the idea of the Segmented sieve.
We pre-generate all prime numbers up to $\sqrt R$, and use those primes to mark all composite numbers in the segment $[L, R]$.

```cpp
vector<char> segmentedSieve(long long L, long long R) {
vector<bool> segmentedSieve(long long L, long long R) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did you test the new version on any particular problem? If no, could you please do it?

long long lim = sqrt(R);
for (long long i = 2; i <= lim; ++i)
for (long long j = max(i * i, (L + i - 1) / i * i); j <= R; j += i)
vector<bool> segmentedSieveNoPreGen(long long L, long long R) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did you test the new version on any particular problem? If no, could you please do it?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants