Skip to content

MAINT [PST] set up live preview #28353

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 4 commits into from
Feb 8, 2024

Conversation

Charlie-XIAO
Copy link
Contributor

@Charlie-XIAO Charlie-XIAO commented Feb 2, 2024

Please note that this PR targets the new_web_theme branch!

See #28084 (comment), this intends to push doc to https://scikit-learn.org/new_web_theme when the new_web_theme branch gets updated, so that we can get a live preview of the website.

@adrinjalali

Copy link

github-actions bot commented Feb 2, 2024

✔️ Linting Passed

All linting checks passed. Your pull request is in excellent shape! ☀️

Generated for commit: 82ec9b6. Link to the linter CI: here

Copy link
Member

@adrinjalali adrinjalali left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@scikit-learn/core-devs would we be okay with this? I suggested it.

@lesteve
Copy link
Member

lesteve commented Feb 2, 2024

+1, this makes it easier to follow the work on pydata-sphinx-theme

@betatim
Copy link
Member

betatim commented Feb 2, 2024

I think it is a good idea to have this. It makes it much easier to look at changes, current state of work, etc.

Two thoughts:

  • can we add a robots.txt so that the page doesn't get indexed by search engines?
    • what other options do we have/should we make use of to indicate that this shouldn't be indexed by search engines, archive.org, or users who for accidentally end up here?
  • what do you think of using something like scikit-learn.org/_/pydata_theme_preview to host it?
    • adding a /_/ or leading _ to the directory name seems to kinda maybe for some people indicated "this is prviate". But maybe this is something I learnt from the few web devs I"ve worked with?
    • pydata_theme_preview is a bit more descriptive than new_web_theme.

@lesteve
Copy link
Member

lesteve commented Feb 2, 2024

I thought about the robots.txt thing, it seems like we don't have any right now, or maybe I missed it in https://github.com/scikit-learn/scikit-learn.github.io?

Even without robots.txt it does not seem like I am getting search results from old scikit-learn versions (compared to say pytest where I get plenty of results from 7.1 doc instead of the latest 8.0 release).

If we think robots.txt is a must here is an example I found (probably needs to be adapted):
https://github.com/dirty-cat/dirty-cat.github.io/blob/main/robots.txt

@betatim
Copy link
Member

betatim commented Feb 2, 2024

I think the reason we only see one version in the Google search results is that every page, no matter the version, contains meta data about the URL of the canonical version of a page. For example https://github.com/scikit-learn/scikit-learn.github.io/blob/013521018e8b62e469c99a848322124a777324ca/1.0/modules/cross_validation.html#L29 - I think the way this works is that this tells Google that all the cross-validation pages for different versions are somehow related to/less authoritative than the /stable/ one. And as a result this is the only one that shows up in results

@Charlie-XIAO
Copy link
Contributor Author

Charlie-XIAO commented Feb 2, 2024

As suggested, I changed to scikit-learn.org/_pst_preview, where _ intends to indicate that this is private and pst is the abbreviation for pydata-sphinx-theme (this abbreviation is widely used and I think it's nice to keep URL short).

Should the robots.txt thing be done in the github.io repo, or created it using the script here?

@betatim
Copy link
Member

betatim commented Feb 2, 2024

Should the robots.txt thing be done in the github.io repo, or created it using the script here?

I'd start by investigating https://www.sphinx-doc.org/en/master/usage/configuration.html#confval-html_extra_path which mentions robots.txt. That way we can have everything in this repository.

@Charlie-XIAO
Copy link
Contributor Author

OK so I pushed a commit which I'm not sure about its correctness. I know that html_extra_path would copy individual files to the output directory, so theoretically it would be at scikit-learn.org/_pst_preview/robots.txt after deployed. Then the robots.txt will tell all robots to stay away from everything under scikit-learn.org/_pst_preview/ I suppose?

@betatim
Copy link
Member

betatim commented Feb 5, 2024

I had to do some quick reading about robots.txt. As I understand it, it has to be at the top level/root of a website. So scikit-learn.com/robots.txt. As contents it should have:

User-agent: *
Disallow: /_pst_preview/

This means that we need to make a PR to main to add the robots.txt (basically f9e41a3)

@Charlie-XIAO
Copy link
Contributor Author

Charlie-XIAO commented Feb 7, 2024

Ah I didn't know robots.txt must be at the root. Will make a PR to main soon.

Update: please see #28376.

@Charlie-XIAO Charlie-XIAO changed the title MAINT switch to pydata-sphinx-theme [5] set up live preview MAINT [PST] set up live preview Feb 7, 2024
@betatim betatim merged commit f635307 into scikit-learn:new_web_theme Feb 8, 2024
@Charlie-XIAO Charlie-XIAO deleted the setup-live-preview branch February 8, 2024 13:08
@betatim
Copy link
Member

betatim commented Feb 8, 2024

Next realisation: for this to work the contents of this PR should have been merged into main. Otherwise the updates to the CI steps won't take effect.

What is the best way of doing this? Should we remove the commits from new_web_theme and make a new PR that targets main? Can we somehow cherry pick the relevant commits? Basically wondering what the best way of fixing this is so that we can still easily merge new_web_theme into main in the future, when all the work is done.

@lesteve
Copy link
Member

lesteve commented Feb 8, 2024

Next realisation: for this to work the contents of this PR should have been merged into main. Otherwise the updates to the CI steps won't take effect.

Hmmm yeah good point ...

What is the best way of doing this? Should we remove the commits from new_web_theme and make a new PR that targets main?

Doing a PR that targets main with the same changes seems like the first step.

Afterwards as long as you merge main into the new_web_theme and solve conflicts, you are fine when you will want to do the "final big PR" scikit-learn:main <- scikit-learn:new_web_theme

Note: reusing the same convention as the top part of a PR:
image

@Charlie-XIAO
Copy link
Contributor Author

Sorry I want to ask a question here: do you mean that CircleCI will only look at the workflow configuration on main? Then will the changes to .circleci/config.yml in #28379 also have no effect?

@lesteve
Copy link
Member

lesteve commented Feb 8, 2024

Actually I am super confused now, and I have to say I am not sure anymore doing a PR to main is needed ...

It looks like the doc is built on pushes to new_web_theme on CircleCI:
https://app.circleci.com/pipelines/github/scikit-learn/scikit-learn?branch=new_web_theme

Currently there is an error that would need to get fixed.

@betatim can you explain why you think we need a PR to main, maybe that will clarify things for me. This does not help that Github actions and CircleCI are slightly different, for example there is no on: push branches: [main, new_web_them] concept for CircleCI it seems ...

@Charlie-XIAO everything will be allright in the end, but for now things are a bit hazy in my brain ...

@Charlie-XIAO
Copy link
Contributor Author

Charlie-XIAO commented Feb 8, 2024

That's totally fine @lesteve thanks for your effort. And:

It looks like the doc is built on pushes to new_web_theme on CircleCI:
https://app.circleci.com/pipelines/github/scikit-learn/scikit-learn?branch=new_web_theme Currently there is an error that would need to get fixed.

This is what I'm thinking about as well. I think the reason we don't see the preview being successfully deployed is primarily because of the dependencies issue (#28379) that directly causes the doc build process to fail. (This also makes us unable to easily check whether the configuration is getting updated by this PR because it did not even get to that step.)

@lesteve
Copy link
Member

lesteve commented Feb 8, 2024

OK I merged #28379 let's see what happens in particular whether the deploy step to preview the website needs to be in main or not.

See ongoing build log

@betatim
Copy link
Member

betatim commented Feb 8, 2024

@betatim can you explain why you think we need a PR to main, maybe that will clarify things for me. This does not help that Github actions and CircleCI are slightly different, for example there is no on: push branches: [main, new_web_them] concept for CircleCI it seems ...

I thought this because I was waiting to see the live pre-view arrive in https://github.com/scikit-learn/scikit-learn.github.io/ after merging this PR. But it didn't. So then I thought "aha, this is because CircleCI is clever and only looks at main".

So really it is a guess, not knowledge.

@lesteve
Copy link
Member

lesteve commented Feb 8, 2024

See ongoing build log

It seems like the doc build is running the examples, so at least merging #28379 helped.

The moment of truth is going to be a the deploy step 🤞 ...

@Charlie-XIAO

This comment was marked as outdated.

@Charlie-XIAO
Copy link
Contributor Author

https://scikit-learn.org/_pst_preview/

It worked 🚀

@lesteve
Copy link
Member

lesteve commented Feb 8, 2024

Yep with @betatim we were seeing the same thing 🎉. The main page seems a bit broken but this is likely fixable or maybe you haven't worked on it yet

@Charlie-XIAO
Copy link
Contributor Author

Charlie-XIAO commented Feb 8, 2024

The main page is broken because it is meant to be fixed in #28331, and since the dependency issue is now fixed all the ongoing PRs in the list #28084 (comment) can be continued. Thanks again!

By the way I think I still need to open a PR to remove the robots.txt thing on main.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants