-
-
Notifications
You must be signed in to change notification settings - Fork 1.8k
Improve Sieve of Eratosthenes article: expanded explanations, added code and comments for odd-only and optimized sieves, and corrected mathematical typos. #1483
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
…ode and comments for odd-only and optimized sieves, and corrected mathematical typos.
I'm confused on some of these changes: Why is x changed to n as a variable? We already defined n. Why are we using limit now and not defining it? |
"Why is "Why are we using limit now and not defining it?" I'm really sorry for that. Yeah, in section Sieving till root I wrote Thank you very much for closely reviewing my all changes. |
I'm not convinced we will be merging. Please fix these issues and proofread your request. I haven't thoroughly reviewed by your request yet because of two obvious issues. |
I don't know what you're finding hard to understand. If you read original article here after that read my refactored version here you'll clearly see how much explanation and readability in the code I improved. Basically, I edited the whole article. If you have read cp-algorithm closely you'll realize that "code is very poor and explanations are not great." That's what I tried to improve in this very file. If after reading both articles you still find yourself confused, then close my pull request (OfCourse without merging). I'll be happy to keep my changes locally in my machine only. As my title and description are very what I did with the file. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi, thanks for the pull request! I left some comments where it seemed appropriate, and I think we would need to address them before considering merging.
I'm also conflicted about significantly rewriting code blocks, as we currently don't have tests for them. We should either add some tests (see here), or at least make sure that they were tested in some actual competitive programming problems.
|
||
The algorithm is very simple: | ||
at the beginning we write down all numbers between 2 and $n$. | ||
We mark all proper multiples of 2 (since 2 is the smallest prime number) as composite. | ||
A proper multiple of a number $x$, is a number greater than $x$ and divisible by $x$. | ||
A proper multiple of a number $n$, is a number greater than $n$ and divisible by $n$. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A proper multiple of a number $n$, is a number greater than $n$ and divisible by $n$. | |
A proper multiple of a number $m$, is a number greater than $n$ and divisible by $m$. |
To understand it more clearly please checkout RobJohn's [this answer](https://math.stackexchange.com/a/58808) to the question "Why in Sieve of Eratosthenes of $N$ | ||
number you need to check and cross out numbers up to $\sqrt{N}$?" on [MathStackExchange](https://math.stackexchange.com). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To understand it more clearly please checkout RobJohn's [this answer](https://math.stackexchange.com/a/58808) to the question "Why in Sieve of Eratosthenes of $N$ | |
number you need to check and cross out numbers up to $\sqrt{N}$?" on [MathStackExchange](https://math.stackexchange.com). | |
This is because the smallest prime factor $p$ of a composite number $m \leq n$ may not exceed $\sqrt m \leq \sqrt n$. Indeed, $p \cdot \frac{m}{p} = m$ and $p \leq \frac{m}{p}$, thus if $p$ was greater than $\sqrt m$, the product of $p$ and $\frac{m}{p}$ would be greater than $(\sqrt m)^2 = m$. |
I think it's better to provide an inline explanation, rather than link to stackexchange, esp. given that the explanation is fairly short.
for (int j = i * i; j <= n; j += i) | ||
is_prime[j] = false; | ||
if(!is_prime[i]) continue; | ||
for (int j = i * i; j <= limit; j += i){ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
for (int j = i * i; j <= limit; j += i){ | |
for (int j = i * i; j <= n; j += i){ |
limit
is used here, but not defined anywhere. Generally, I don't think it should be different from
@@ -119,6 +121,38 @@ Since all even numbers (except $2$) are composite, we can stop checking even num | |||
|
|||
First, it will allow us to halve the needed memory. Second, it will reduce the number of operations performed by algorithm approximately in half. | |||
|
|||
```cpp | |||
// returns a vector of all prime numbers in the interval [2, n] | |||
vector<int> primes_upto(int n){ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Was this code tested on any particular problem? If no, could you please do it?
@@ -135,70 +169,96 @@ You are limited by how fast you can load the data into the cache, and therefore | |||
A benchmark ([link](https://gist.github.com/jakobkogler/e6359ea9ced24fe304f1a8af3c9bee0e)) shows, that using a `vector<bool>` is between 1.4x and 1.7x faster than using a `vector<char>`. | |||
|
|||
The same considerations also apply to `bitset`. | |||
It's also an efficient way of storing bits, similar to `vector<bool>`, so it takes only $\frac{N}{8}$ bytes of memory, but is a bit slower in accessing the elements. | |||
It's also an efficient way of storing bits, similar to `vector<bool>`, so it takes only $\frac{N}{8}$ bytes of memory, but is a bit slower in accessing the elements means `vector<bool>` is a space-optimized specialization that typically uses 1 bit per element, unlike standard containers like `vector<char>` or `vector<bool>` (if it were a normal template), which use 1 byte per element. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's also an efficient way of storing bits, similar to `vector<bool>`, so it takes only $\frac{N}{8}$ bytes of memory, but is a bit slower in accessing the elements means `vector<bool>` is a space-optimized specialization that typically uses 1 bit per element, unlike standard containers like `vector<char>` or `vector<bool>` (if it were a normal template), which use 1 byte per element. | |
It's also an efficient way of storing bits, similar to `vector<bool>`, so it takes only $\frac{N}{8}$ bytes of memory, but is a bit slower in accessing the elements. |
This paragraph is about bitset
, not vector<bool>
. The explanation in this particular paragraph seems out of place, moreover I believe the same information about vector<bool>
is already conveyed above, on lines 159-160.
} | ||
``` | ||
|
||
#### Note on 'start' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure it warrants a dedicated section. Also when your additions ends, it is followed by some general stuff that is not from this particular section, which might be confusing as to where we stop talking about start
and start discussing the block sieve in general.
primes.push_back(i); | ||
for (int j = i * i; j <= nsqrt; j += i) | ||
is_prime[j] = false; | ||
vector<bool> is_prime(limit + 1, true); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As we are now no longer limited by the cache speeds, we can replace the
vector<bool>
with avector<char>
The usage of vector<char>
rather than vector<bool>
was intentional.
@@ -207,47 +267,59 @@ To solve such a problem, we can use the idea of the Segmented sieve. | |||
We pre-generate all prime numbers up to $\sqrt R$, and use those primes to mark all composite numbers in the segment $[L, R]$. | |||
|
|||
```cpp | |||
vector<char> segmentedSieve(long long L, long long R) { | |||
vector<bool> segmentedSieve(long long L, long long R) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As we are now no longer limited by the cache speeds, we can replace the
vector<bool>
with avector<char>
The usage of vector<char>
rather than vector<bool>
was intentional.
@@ -207,47 +267,59 @@ To solve such a problem, we can use the idea of the Segmented sieve. | |||
We pre-generate all prime numbers up to $\sqrt R$, and use those primes to mark all composite numbers in the segment $[L, R]$. | |||
|
|||
```cpp | |||
vector<char> segmentedSieve(long long L, long long R) { | |||
vector<bool> segmentedSieve(long long L, long long R) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Did you test the new version on any particular problem? If no, could you please do it?
long long lim = sqrt(R); | ||
for (long long i = 2; i <= lim; ++i) | ||
for (long long j = max(i * i, (L + i - 1) / i * i); j <= R; j += i) | ||
vector<bool> segmentedSieveNoPreGen(long long L, long long R) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Did you test the new version on any particular problem? If no, could you please do it?
Expanded key explanations (e.g. why sieving starts from p*p and how ceil(L/p)*p avoids marking the prime itself)
Added well-commented code for:
Odd-only sieve
Segmented sieve
Prime count via blocks
Sieve over arbitrary range [L, R]
Corrected mathematical typos in writing intervals.