
Add rel=canonical to most old html files. #39


Merged
merged 1 commit into from
Aug 8, 2019

Conversation

@Carreau Carreau commented Jul 29, 2019

rel=canonical is meant to tell search engines which version of a page
they should promote.

This is not a redirect.

If a file exists and is not in a versioned (x.y.z) folder, consider it
a potential "canonical" target.

If a file exists in a versioned copy of the matplotlib docs, make its
rel=canonical point to the corresponding candidate.

Example:

path/to/foo.html -> candidate as being canonical
3.0.1/path/to/foo.html -> set rel=canonical to matplotlib.org/path/to/foo.html
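
As a rough illustration of this rule (not taken from the actual script), here is a minimal Python sketch. The names `canonical_url`, `VERSION_RE` and `BASE_URL` are hypothetical, and it assumes that a versioned folder is a top-level directory named x.y.z and that the site is served over https.

```python
import re
from pathlib import PurePosixPath

# Assumption: a "versioned" folder is a top-level directory named x.y.z.
VERSION_RE = re.compile(r"^\d+\.\d+\.\d+$")
BASE_URL = "https://matplotlib.org"  # assumed site root


def canonical_url(path, existing_files):
    """Return the canonical URL for a versioned file, or None.

    path           -- site-relative path such as "3.0.1/path/to/foo.html"
    existing_files -- set of site-relative paths present on the site
    """
    parts = PurePosixPath(path).parts
    if not parts or not VERSION_RE.match(parts[0]):
        return None  # unversioned files are candidates, not sources
    candidate = "/".join(parts[1:])
    if candidate in existing_files:
        return f"{BASE_URL}/{candidate}"
    return None  # no unversioned counterpart exists
```

Files without an unversioned counterpart simply get no rel=canonical in this sketch.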

The examples and gallery folders are special-cased.
The examples folder comes from an old matplotlib version (2.0.2) but is
still quite popular on Google, so it gets its own handling:

  1. Apply the same rule as above, but from within examples toward
    gallery:

    /gallery/path/to/foo.html -> candidate as being canonical
    /examples/path/to/foo.html -> set rel=canonical to matplotlib.org/gallery/path/to/foo.html

    This took care of 145 files.

  2. Many examples were "moved" during the examples -> gallery transition. If
    a filename is unique in gallery, use this as a heuristic for
    detecting a move:

    /gallery/path/to/verryuniquename.html -> candidate as being canonical, as the filename is unique
    /examples/old/location/verryuniquename.html -> set rel=canonical to matplotlib.org/gallery/path/to/verryuniquename.html

    This took care of 185 files in examples,
    leaving 271 "orphans".

  3. Resolve chains of rel=canonical. Fix links that point to examples
    pages that themselves point to gallery.

    /gallery/path/to/verryuniquename.html
    /examples/old/location/verryuniquename.html -> matplotlib.org/gallery/path/to/verryuniquename.html
    /2.0.2/examples/old/location/verryuniquename.html -> would point to /examples/....;
    make it point to /gallery/.... instead
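
The three steps above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the script that was actually run: `map_examples_to_gallery` and `resolve_chain` are hypothetical names, and paths are assumed to be site-relative strings starting with `examples/` or `gallery/`.

```python
from collections import defaultdict
from pathlib import PurePosixPath


def map_examples_to_gallery(example_files, gallery_files):
    """Map each examples/ path to a gallery/ path; orphans are left out."""
    gallery = set(gallery_files)
    # Index gallery files by bare filename for the uniqueness heuristic.
    by_name = defaultdict(list)
    for g in gallery:
        by_name[PurePosixPath(g).name].append(g)

    mapping = {}
    for ex in example_files:
        # Step 1: same relative path under gallery/.
        direct = "gallery/" + ex[len("examples/"):]
        if direct in gallery:
            mapping[ex] = direct
            continue
        # Step 2: filename occurs exactly once in gallery/, assume a move.
        matches = by_name.get(PurePosixPath(ex).name, [])
        if len(matches) == 1:
            mapping[ex] = matches[0]
    return mapping


def resolve_chain(target, mapping):
    """Step 3: follow target through mapping until it has no further hop."""
    seen = set()
    while target in mapping and target not in seen:
        seen.add(target)  # guards against accidental cycles
        target = mapping[target]
    return target
```

A versioned file whose unversioned counterpart is an examples/ page would then use `resolve_chain` on that counterpart, so that it points straight at gallery/.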

No files with existing rel=canonical have been touched. This will
likely only affect pre-3.0 documentation files.
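
To illustrate what "not touching files with an existing rel=canonical" could look like, here is a minimal sketch. It assumes the link line goes right after the opening `<head>` tag and that a regex check is enough to detect an existing canonical link; `add_canonical` is a hypothetical helper, and where the real script inserts the line may differ.

```python
import re

# Assumption: an existing canonical link can be detected with a simple
# case-insensitive search; a real HTML parser would be more robust.
HAS_CANONICAL = re.compile(r'rel\s*=\s*["\']?canonical', re.IGNORECASE)
# Matches <head> or <head ...>, but not <header>.
HEAD_OPEN = re.compile(r"<head(\s[^>]*)?>", re.IGNORECASE)


def add_canonical(html, url):
    """Return html with one <link rel="canonical"> line added after <head>.

    The text is returned unchanged if it already declares a canonical
    link or has no <head> tag.
    """
    if HAS_CANONICAL.search(html):
        return html
    match = HEAD_OPEN.search(html)
    if match is None:
        return html
    link = f'\n<link rel="canonical" href="{url}" />'
    return html[:match.end()] + link + html[match.end():]
```

Calling it twice on the same text is a no-op the second time, which keeps the change to a single added line per file.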


Obviously scripted, but kind of ugly:

https://gist.github.com/Carreau/d2b3e36b65d4155827539ef462860444

I paid attention to avoid making any change other than adding a single line to each target file, so further changes can be tried later.

@Carreau
Author

Carreau commented Jul 29, 2019

We likely need to look at the users/ docs as well.

@ImportanceOfBeingErnest
Member

For users/ it might be more complicated, because some of its files are part of the current docs, e.g.

https://matplotlib.org/users/navigation_toolbar.html
https://matplotlib.org/users/event_handling.html

while others are files whose current version now lives in the tutorials or some other section, e.g.

https://matplotlib.org/users/annotations.html

should now really be

https://matplotlib.org/tutorials/text/annotations.html

@ImportanceOfBeingErnest
Member

Is this ready to go? Shall we just merge this and see what happens?

Do you have some example search terms to be able to later verify if it had the desired effect?

@Carreau
Author

Carreau commented Aug 8, 2019

> Is this ready to go?

I believe it is.

> Shall we just merge this and see what happens?

That would be great.

> Do you have some example search terms to be able to later verify if it had the desired effect?

We can look at the traffic on Google Analytics (though I don't have access).

My use case was searching Google for "matplotlib gallery" in incognito mode, with these results:

  1. https://matplotlib.org/gallery.html
  2. https://matplotlib.org/3.1.0/gallery/index.html
  3. https://matplotlib.org/2.1.1/gallery/index.html

Likewise for "matplotlib examples":

  1. https://matplotlib.org/examples/
  2. https://matplotlib.org/3.1.1/gallery/index.html

@Carreau
Author

Carreau commented Aug 8, 2019

Note that more fixes should come on top of these, but they are special cases and so are not included here.
For example, every whatever/index.html should set rel=canonical to whatever/, and gallery.html should set rel=canonical to gallery/; but those would be drowned out in this PR.
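
For reference, those two special cases could be sketched like this. `folder_canonical` is a hypothetical helper operating on site-relative paths; it is not part of this PR.

```python
def folder_canonical(path):
    """Map index-style pages to their folder path, or return None."""
    # Special case named in the comment: gallery.html -> gallery/
    if path == "gallery.html":
        return "gallery/"
    # whatever/index.html -> whatever/
    if path.endswith("/index.html"):
        return path[: -len("index.html")]
    return None
```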

@ImportanceOfBeingErnest ImportanceOfBeingErnest merged commit b7f4bea into matplotlib:master Aug 8, 2019
@Carreau
Author

Carreau commented Aug 8, 2019

Thanks, much appreciated.
