MAINT create robots.txt for setting up pydata-sphinx-theme preview #28376


Merged: 1 commit, Feb 7, 2024

Conversation

Charlie-XIAO (Contributor)

Related to: #28353. In particular, see #28353 (comment).


github-actions bot commented Feb 7, 2024

✔️ Linting Passed

All linting checks passed. Your pull request is in excellent shape! ☀️

Generated for commit: fae750d.

betatim (Member) left a comment:

Looks good to me. Let's try it out. I think the chances of this breaking something related to the main webpage are small, but I'll keep an eye on it after merging to check.

betatim merged commit e8addd7 into scikit-learn:main on Feb 7, 2024
@@ -0,0 +1,2 @@
User-agent: *
Disallow: /_pst_preview/
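The two-line rule above can be sanity-checked with Python's standard-library robots.txt parser; a minimal sketch (the scikit-learn.org URLs are used only for illustration, and the file is parsed from memory rather than fetched from the site root as a real crawler would):

```python
from urllib import robotparser

# Parse the exact robots.txt content added in this PR.
rp = robotparser.RobotFileParser()
rp.parse([
    "User-agent: *",
    "Disallow: /_pst_preview/",
])

# The preview tree is blocked for every crawler ...
print(rp.can_fetch("*", "https://scikit-learn.org/_pst_preview/index.html"))  # False
# ... while the rest of the site remains crawlable.
print(rp.can_fetch("*", "https://scikit-learn.org/stable/index.html"))  # True
```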

Suggested change:
- Disallow: /_pst_preview/
+ # Do not let search engines index the PyData theme preview site
+ # during the live testing phase.
+ # https://github.com/scikit-learn/scikit-learn/pull/28353
+ Disallow: /_pst_preview/

ogrisel (Member) commented Feb 7, 2024

My review arrived too late. It would be good to insert a comment to explain the why of such config files.

Charlie-XIAO (Contributor, Author) commented Feb 7, 2024

Thanks @ogrisel for the comment; I did not know that robots.txt can have comments. I will open a follow-up for that.

Charlie-XIAO deleted the robots-pst branch on February 7, 2024 at 09:14
ogrisel (Member) commented Feb 7, 2024

I did not know either, but it seemed like the intuitive thing to do, so I checked:

https://developers.google.com/search/docs/crawling-indexing/robots/create-robots-txt#create_rules

Charlie-XIAO (Contributor, Author)

I opened #28378 for this.

betatim (Member) commented Feb 7, 2024

Reading https://github.com/scikit-learn/scikit-learn/blob/main/build_tools/circle/push_doc.sh more closely to understand how the docs repo works, I think the result of this PR will be a robots.txt in https://github.com/scikit-learn/scikit-learn.github.io/tree/main/dev (the /dev/ subdirectory). This means we need someone with more experience to chime in.

Are the files above /dev/ created by a script or by hand in https://github.com/scikit-learn/scikit-learn.github.io/tree/main ? The other files there (like https://github.com/scikit-learn/scikit-learn.github.io/blob/main/index.html) have existed unchanged for so long that we can't see the history of how they got created :-/

Maybe @thomasjpfan knows more?

lesteve (Member) commented Feb 7, 2024

As an alternative, adding a robots.txt directly into https://github.com/scikit-learn/scikit-learn.github.io may be simpler.

That would mean doing a PR on the scikit-learn.github.io repo (better for remembering why we did this) or pushing directly into the repo if we want a quick and dirty thing.

There are precedents of doing things directly in the scikit-learn.github.io repo, see https://github.com/scikit-learn/scikit-learn.github.io/pulls?q=is%3Apr+sort%3Aupdated-desc+is%3Amerged

Charlie-XIAO (Contributor, Author) commented Feb 7, 2024

I am +1 for directly adding it into scikit-learn.github.io. Modifying the workflow is feasible, but since this robots.txt does not rely on any variable (such as $dir), there seems to be no reason to use a workflow.

There are precedents of doing things directly in the scikit-learn.github.io repo

Yep, and (though I'm not one) I think maintainers need to make some modifications directly in that repo for each major/minor release: see the 9th step of https://scikit-learn.org/dev/developers/maintainer.html#making-a-release.

lesteve (Member) commented Feb 7, 2024

I opened scikit-learn/scikit-learn.github.io#21 to add a robots.txt directly in the .github.io repo.
