Make it less likely that Google is indexing old versions of the docs (with rel="canonical" rather than robots.txt) #8958

---

Instead [of robots.txt], it suggests adding an HTML meta tag to the concerned pages: `<meta name="robots" content="noindex" />`

---

> Instead, it suggests adding an HTML meta tag to the concerned pages: `<meta name="robots" content="noindex" />`

Good catch. Now, how do we do this in practice? Run a script on the git repo of the webpage to add this?

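For instance, a minimal sketch of such a script (hypothetical, not the project's actual tooling; it assumes the old versions sit in top-level `0.x/` directories of a scikit-learn.github.io checkout):

```python
from pathlib import Path

NOINDEX_TAG = '<meta name="robots" content="noindex" />'

# Assumed layout: each old version of the docs lives in a
# top-level directory (0.15/, 0.16/, ...) of the website repo.
site_root = Path(".")
for version_dir in sorted(site_root.glob("0.*")):
    if not version_dir.is_dir():
        continue
    for page in version_dir.rglob("*.html"):
        html = page.read_text(encoding="utf-8")
        if NOINDEX_TAG in html:
            continue  # already tagged; keeps the script idempotent
        # Insert the tag right after the opening <head> element
        # (assumes a bare <head> tag, as plain Sphinx output has).
        html = html.replace("<head>", "<head>\n  " + NOINDEX_TAG, 1)
        page.write_text(html, encoding="utf-8")
```
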
---

I'm surprised that the `rel="canonical"` link does not already do this... as long as the same path exists in stable, that is.

---

We have a big warning on old versions, but that would still be a nice-to-have. We probably need a script to insert this tag into all the pages of the old versions of the doc hosted at https://github.com/scikit-learn/scikit-learn.github.io/. Ideally, this script would be integrated into our CircleCI-based documentation builder.

---

I thought, "is this actually happening?" I don't remember ever having such an issue, but it turns out it depends on the search engine. Google does not do it (at least for me, on the first page of results), but DuckDuckGo, for example, returns a 0.18 example as the second match. At the same time you get greeted by a big warning (as @ogrisel was saying), so maybe that's good enough?

---

Can't we just add a robots.txt in the root folder? That's what ReadTheDocs does to hide versions.

---

Example of a ReadTheDocs configuration:

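Something like the following (a sketch of the ReadTheDocs-style approach; the version paths here are made up for illustration):

```
# robots.txt sketch: each superseded version gets an explicit
# Disallow, so crawlers only follow the current docs.
User-agent: *
Disallow: /0.18/
Disallow: /0.19/
```
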
---

So it looks like this is happening again, and more often for 1.5 for some reason. I was looking at the website stats, and plenty of 1.5 pages are in the top results. This has also been reported in #30672. For me this happens with Google or https://search.brave.com but not DuckDuckGo or Qwant, so it may well depend on search personalization... With Google, the version pointed to sometimes depends on what you search for.

---

We should have some form of `<link rel="canonical">`. From a (small-sample) look at the source of view-source:https://scikit-learn.org/1.5/auto_examples/classification/plot_lda_qda.html and view-source:https://scikit-learn.org/stable/auto_examples/classification/plot_lda_qda.html#sphx-glr-auto-examples-classification-plot-lda-qda-py, it seems that neither contains a canonical link. I think the correct thing to do would be to include a `<link rel="canonical">` pointing at the stable version in every page.

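Concretely, each versioned page would carry something like this in its `<head>` (a sketch; the href is simply the same path under /stable/):

```html
<!-- In the <head> of /1.5/auto_examples/classification/plot_lda_qda.html,
     telling search engines that the stable copy is the one to index: -->
<link rel="canonical"
      href="https://scikit-learn.org/stable/auto_examples/classification/plot_lda_qda.html" />
```
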
---

Following the meeting discussion, one reasonable hypothesis about "why is it happening now?" is that there was some change around the canonical link. Compare https://scikit-learn.org/1.4/install.html to https://scikit-learn.org/1.5/install.html: the canonical link was part of our custom layout until 1.4.

Now the question is: how do we do the same thing with pydata-sphinx-theme? For completeness, I guess the …

---

From https://pydata-sphinx-theme.readthedocs.io/en/stable/api/pydata_sphinx_theme/index.html#pydata_sphinx_theme._fix_canonical_url, I guess that setting `html_baseurl` would be enough.

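If that reading is right, the change could be as small as one line in the Sphinx configuration (a sketch relying on Sphinx's `html_baseurl` mechanism, not the actual patch):

```python
# doc/conf.py (sketch): when html_baseurl is set, Sphinx emits a
# <link rel="canonical"> on every generated page by joining this
# base URL with each page's relative path.
html_baseurl = "https://scikit-learn.org/stable/"
```
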
---

The funny thing is that 10+ years ago, in the same project, other people were in a very similar situation: #2192 (comment) 🤣

Oh well, live and learn (and then forget and go back to square one) 😉

---

> a very similar situation #2192 (comment) 🤣

From that comment: "The robots.txt was a mistake (my mistake). We need to change it to using `link rel="canonical"`."

Hugely funny!

---

How do we remove https://scikit-learn.org/robots.txt, or restore it to its old contents?

---

Directly in the scikit-learn/scikit-learn.github.io repo; see scikit-learn/scikit-learn.github.io#22

---

Based on user reports, it seems to have been fixed; e.g., the first result of a Google search doesn't point to older documentation any more. Website analytics also seem to agree, for example when looking at the API doc of …

---

Original issue description:

Google is indexing old versions of the docs, leading to problems such as #4736.

I suggest adding a robots.txt to fix the problem, with the following content:

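Something along these lines (a sketch; the exact lines are an assumption, reconstructed from the single "disallow line" questioned below):

```
# Hypothetical reconstruction of the proposed file: block crawlers
# from everything, then re-allow the stable documentation tree.
User-agent: *
Disallow: /
Allow: /stable/
```
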
I am not sure that the disallow line is correct, though :$. But I think that it is worth trying.
What do people think?