MNT Add robots.txt to avoid indexing of old version doc #30685
Reading some docs about robots.txt, I see it is meant for controlling crawling of the website, not indexing. To control indexing, the advice is to use an HTML tag, which is a lot more complex. So let's first try this and see if it's good enough for us before considering more advanced solutions.
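For reference, the HTML tag usually meant here is the robots meta tag, `<meta name="robots" content="noindex">`. The thread does not name the tag, so treat that as an assumption. A minimal sketch of how one could check whether a built page carries it, using only the standard library (the `RobotsMetaFinder` and `is_noindex` names and the sample page are made up for illustration):

```python
from html.parser import HTMLParser


class RobotsMetaFinder(HTMLParser):
    """Collect the content of every <meta name="robots"> tag in a page."""

    def __init__(self):
        super().__init__()
        self.directives = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "meta" and (attrs.get("name") or "").lower() == "robots":
            self.directives.append((attrs.get("content") or "").lower())


def is_noindex(html):
    """Return True if the page asks search engines not to index it."""
    finder = RobotsMetaFinder()
    finder.feed(html)
    return any("noindex" in directive for directive in finder.directives)


# Hypothetical page from an old documentation version.
old_page = '<html><head><meta name="robots" content="noindex"></head></html>'
print(is_noindex(old_page))
```

The extra complexity comes from having to put this tag in every page of every old version, instead of adding one file at the site root.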
Hmmm thinking about it a bit more, I think Maybe I can do a PR to the https://github.com/scikit-learn/scikit-learn.github.io repo adding

About the approach, I agree with you that it seems the simplest thing to try. Since ReadTheDocs recommends using robots.txt, I guess this should work; we will see.

About robots.txt not being the "right" way: this is indeed what #8958 (comment) was pointing at, but the alternative sounds more complex. We are also using rel=canonical, which should help (#8958 (comment)), but I think we are using it only in some places (not 100% sure). Also, rel=canonical does not work for documentation pages that have been renamed or have disappeared.
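To make the rel=canonical limitation concrete, here is a rough sketch of how a canonical link could be derived for an old-version page. It assumes the canonical copy lives under `https://scikit-learn.org/stable/` at the same relative path. The `canonical_link` helper and the `STABLE_ROOT` constant are hypothetical, not something the docs build is known to use:

```python
import re

# Assumed canonical root; not taken from the actual docs configuration.
STABLE_ROOT = "https://scikit-learn.org/stable/"


def canonical_link(versioned_path):
    """Build a rel=canonical tag pointing an old-version page at /stable."""
    # Drop a leading version segment such as "1.5/", "0.24/" or "dev/".
    page = re.sub(r"^/?(\d+\.\d+|dev)/", "", versioned_path)
    return f'<link rel="canonical" href="{STABLE_ROOT}{page}">'


print(canonical_link("/1.5/modules/svm.html"))
```

This only works when the same relative path still exists under `/stable`. For a page that was renamed or removed, the generated link would point at a missing page.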
So backporting in 1.6.X (see #30686) will put it in
Yep, I opened scikit-learn/scikit-learn.github.io#22 on the website repo.
…)" This reverts commit 61077dc.
Let's revert this one
I opened #30687 to revert the robots.txt addition to the scikit-learn/scikit-learn repo.
Thanks heaps!!
Fixes #8958.
After rereading the issue, adding a robots.txt seems like the simplest thing to do. I think this is worth trying for a few weeks to see whether it helps. ReadTheDocs mentions robots.txt, for example.

For now I made the choice of excluding everything from indexing except:
- /stable
- /dev/developers, to allow indexing of the developer doc

This can definitely be tweaked if you have better suggestions.
I kind of tested the robots.txt manually with https://robotstxt.com/tester and it seems to do what we want.
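The same kind of check can also be scripted with Python's standard `urllib.robotparser`. The rules below are a guess at what a file matching the description above could look like, not the exact file added in this PR. The `allowed` helper and the sample paths are made up for illustration:

```python
from urllib.robotparser import RobotFileParser

# Assumed rules: allow /stable and /dev/developers, block everything else.
# Allow lines come first because Python's parser applies the first matching
# rule, whereas some crawlers pick the most specific match instead.
ROBOTS_TXT = """\
User-agent: *
Allow: /stable
Allow: /dev/developers
Disallow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def allowed(path):
    """Return True if a generic crawler may fetch the given site path."""
    return parser.can_fetch("*", "https://scikit-learn.org" + path)


for path in [
    "/stable/index.html",
    "/dev/developers/contributing.html",
    "/1.5/modules/svm.html",
    "/dev/modules/svm.html",
]:
    print(path, allowed(path))
```

This only checks the crawl rules as Python's parser interprets them. Real search engines may resolve overlapping rules differently, so the online tester remains a useful second opinion.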