
Duplicate uploads of nightly wheels to scipy-wheels-nightly Anaconda cloud package index fails #22757



Closed
matthewfeickert opened this issue Apr 1, 2022 · 27 comments · Fixed by #22759


@matthewfeickert
Contributor

Duplicate uploads also fail.

I think that is OK though?

We have at least one build on main per day most days and I am not sure it is worth the logic to not get rejected on the days we do not (or on days when someone has pushed the button to run it early).

If no one minds, I don't mind throwing in some additional logic to check just before the upload stage. This is pretty easy with pip index versions and some sed:

$ python -m pip index \
    --index-url https://pypi.anaconda.org/scipy-wheels-nightly/simple \
    --pre \
    versions matplotlib | \
  grep matplotlib | \
  sed 's/.*(\(.*\))/\1/'
WARNING: pip index is currently an experimental command. It may be removed/changed in a future release without prior warning.
3.6.0.dev1948+gd8ede1a710

So all you'd need to do would be something like:

$ LAST_NIGHTLY_VERSION="$(python -m pip index \
    --index-url https://pypi.anaconda.org/scipy-wheels-nightly/simple \
    --pre \
    versions matplotlib | \
  grep matplotlib | \
  sed 's/.*(\(.*\))/\1/')"
WARNING: pip index is currently an experimental command. It may be removed/changed in a future release without prior warning.
$ echo "${LAST_NIGHTLY_VERSION}"
3.6.0.dev1948+gd8ede1a710

and then just check if that shows up as the version of the wheels that just got downloaded from the GitHub Actions workflow artifact:

$ [ "$(find dist -type f -iname "matplotlib-${LAST_NIGHTLY_VERSION}*.whl" | wc --lines)" -gt "0" ]
$ echo $?
0

(There's maybe a more elegant way to do that, but that's not the worst.)
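One possibly tidier variant of that check, as a minimal sketch: a small bash function that relies on shell globbing instead of `find | wc`. The function name `has_wheel_for_version` and the `dist` default are hypothetical, and unlike the `-iname` version above the match is case-sensitive.

```bash
#!/usr/bin/env bash
# Hypothetical helper: succeed (exit 0) if at least one matplotlib wheel for
# the given version exists in the given directory, fail (exit 1) otherwise.
has_wheel_for_version() {
    local version="$1"
    local dist_dir="${2:-dist}"
    local wheel
    # The quoted parts are matched literally (so "+" in the version is safe);
    # only the unquoted "*" acts as a glob. Without nullglob an unmatched
    # pattern stays literal, hence the -e test.
    for wheel in "${dist_dir}"/matplotlib-"${version}"*.whl; do
        [ -e "${wheel}" ] && return 0
    done
    return 1
}

# Usage sketch:
# if has_wheel_for_version "${LAST_NIGHTLY_VERSION}" dist; then
#     echo "Nightly ${LAST_NIGHTLY_VERSION} is already on the index; skipping upload."
# fi
```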

I'll make an Issue from this and I'm happy to PR it, unless you would prefer to avoid touching it as much as possible.

Originally posted by @matthewfeickert in #22733 (comment)

@tacaswell
Member

If you are motivated I do not think we will turn down the improvement! I just would not necessarily ask anyone to do it if that makes sense.

@matthewfeickert
Contributor Author

> If you are motivated I do not think we will turn down the improvement! I just would not necessarily ask anyone to do it if that makes sense.

Yup! Makes sense to me. 🙂

@QuLogic
Member

QuLogic commented Apr 1, 2022

The output says -i/--interactive or --force or --skip; would --skip not work?

@matthewfeickert
Contributor Author

matthewfeickert commented Apr 1, 2022

> The output says -i/--interactive or --force or --skip; would --skip not work?

Oh snap I should have read the error message more.

Distribution already exists. Please use the -i/--interactive or --force or --skip options or `anaconda remove scipy-wheels-nightly/matplotlib/3.6.0.dev1948+gd8ede1a710/matplotlib-3.6.0.dev1948+gd8ede1a710-cp310-cp310-macosx_10_12_universal2.whl

Yeah

$ docker run --rm -ti python:3.10 /bin/bash
root@0ad6f9d32f6b:/# python -m venv venv && . venv/bin/activate
(venv) root@0ad6f9d32f6b:/# python -m pip --quiet install --upgrade pip setuptools wheel
(venv) root@0ad6f9d32f6b:/# python -m pip --quiet install git+https://github.com/Anaconda-Server/anaconda-client
(venv) root@0ad6f9d32f6b:/# anaconda upload --help
usage: anaconda upload [-h] [-c CHANNELS] [-l LABELS] [--no-progress] [-u USER] [--all] [-p PACKAGE] [-v VERSION] [-s SUMMARY] [-t PACKAGE_TYPE]
                       [-d DESCRIPTION] [--thumbnail THUMBNAIL] [--private] [--no-register | --register] [--build-id BUILD_ID]
                       [-i | -f | --force | --skip-existing]
                       files [files ...]

Upload packages to your Anaconda repository

positional arguments:
  files                 Distributions to upload

options:
  -h, --help            show this help message and exit
  -c CHANNELS, --channel CHANNELS
                        [DEPRECATED] Add this file to a specific channel. Warning: if the file channels do not include "main", the file will not show
                        up in your user channel
  -l LABELS, --label LABELS
                        Add this file to a specific label. Warning: if the file labels do not include "main", the file will not show up in your user
                        label
  --no-progress         Don't show upload progress
  -u USER, --user USER  User account or Organization, defaults to the current user
  --all                 Use conda convert to generate packages for all platforms and upload them
  --no-register         Don't create a new package namespace if it does not exist
  --register            Create a new package namespace if it does not exist
  --build-id BUILD_ID   Anaconda repository Build ID (internal only)
  -i, --interactive     Run an interactive prompt if any packages are missing
  -f, --fail            Fail if a package or release does not exist (default)
  --force               Force a package upload regardless of errors
  --skip-existing       Skip errors on package batch upload if it already exists

metadata options:
  -p PACKAGE, --package PACKAGE
                        Defaults to the package name in the uploaded file
  -v VERSION, --version VERSION
                        Defaults to the package version in the uploaded file
  -s SUMMARY, --summary SUMMARY
                        Set the summary of the package
  -t PACKAGE_TYPE, --package-type PACKAGE_TYPE
                        Set the package type [env, ipynb]. Defaults to autodetect
  -d DESCRIPTION, --description DESCRIPTION
                        description of the file(s)
  --thumbnail THUMBNAIL
                        Notebook's thumbnail image
  --private             Create the package with private access

    anaconda upload CONDA_PACKAGE_1.bz2
    anaconda upload notebook.ipynb
    anaconda upload environment.yml

##### See Also

  * [Uploading a Conda Package](https://docs.anaconda.com/anaconda-repository/user-guide/tasks/pkgs/use-pkg-managers/#uploading-a-conda-package)
  * [Uploading a PyPI Package](https://docs.anaconda.com/anaconda-repository/user-guide/tasks/pkgs/use-pkg-managers/#uploading-pypi-packages)

So yeah, just adding --skip-existing should do it. Thanks for catching this @QuLogic!
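A minimal sketch of what the upload step could look like with `--skip-existing` added. It assumes the wheels are in `dist/`, the token is in `ANACONDA_ORG_UPLOAD_TOKEN`, and the target account is passed with `--user scipy-wheels-nightly`; the function name and the `ANACONDA_CMD` override (set it to `echo` for a dry run) are hypothetical and not part of the actual workflow.

```bash
#!/usr/bin/env bash
# Hypothetical upload step: re-uploading an already existing distribution is
# skipped instead of failing the job, thanks to --skip-existing.
upload_nightlies() {
    local dist_dir="${1:-dist}"
    # ANACONDA_CMD is a hypothetical override; it defaults to the real CLI.
    "${ANACONDA_CMD:-anaconda}" --token "${ANACONDA_ORG_UPLOAD_TOKEN}" upload \
        --user scipy-wheels-nightly \
        --skip-existing \
        "${dist_dir}"/*.whl
}

# Dry run that only prints the arguments that would be passed:
# ANACONDA_CMD=echo upload_nightlies dist
```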

@matthewfeickert
Contributor Author

matthewfeickert commented Apr 2, 2022

So, a related question to all of this is how and when to do cleanup of the wheels that are uploaded to https://anaconda.org/scipy-wheels-nightly/matplotlib. At the moment, there are two nights of wheels uploaded at https://anaconda.org/scipy-wheels-nightly/matplotlib/files

$ python -m pip index --index-url https://pypi.anaconda.org/scipy-wheels-nightly/simple --pre versions matplotlib
WARNING: pip index is currently an experimental command. It may be removed/changed in a future release without prior warning.
matplotlib (3.6.0.dev1954+g6c3412baf6)
Available versions: 3.6.0.dev1954+g6c3412baf6, 3.6.0.dev1948+gd8ede1a710

But eventually these need to get cleaned up so that there aren't multiple GBs of nightly wheels. This clearly isn't a problem yet, though, as looking at the other projects in https://anaconda.org/scipy-wheels-nightly/repo and querying them with

$ anaconda show scipy-wheels-nightly/<project name>

makes it clear that they have a much longer upload history than matplotlib does:

$ anaconda show scipy-wheels-nightly
Using Anaconda API: https://api.anaconda.org
Username: scipy-wheels-nightly
Member since: Mon Feb 10 12:40:19 2020
  +company: None
  +description: None
  +location: None
  +name: None
  +url: None
  +user_type: org
Packages:
     Name                      |  Version | Access       | Package Types   | Platforms       | Builds    
     ------------------------- |   ------ | ------------ | --------------- | --------------- | ----------
     scipy-wheels-nightly/dipy | 1.6.0.dev0 | public       | pypi            | []              |           
     scipy-wheels-nightly/h5py |    3.6.0 | public       | pypi            | []              |           
     scipy-wheels-nightly/matplotlib | 3.6.0.dev1948+gd8ede1a710 | public       | pypi            | []              |           
     scipy-wheels-nightly/numpy | 1.23.0.dev0+749.ga41e64367 | public       | pypi            | []              |           
     scipy-wheels-nightly/pandas | 1.5.0.dev0+112.g21b7dafcb2 | public       | pypi            | []              |           
     scipy-wheels-nightly/scikit-image | 0.19.0.dev0 | public       | pypi            | []              |           
     scipy-wheels-nightly/scikit-learn | 1.1.dev0 | public       | pypi            | []              |           
     scipy-wheels-nightly/scipy | 1.9.0.dev0+1049.c9cdbf2 | public       | pypi            | []              |           
     scipy-wheels-nightly/statsmodels | 0.14.0.dev0 | public       | pypi            | []              |           
$ anaconda show scipy-wheels-nightly/matplotlib
Using Anaconda API: https://api.anaconda.org
Name:    matplotlib
Summary: 
Access:  public
Package Types:  pypi
Versions:
   + 3.6.0.dev1948+gd8ede1a710
   + 3.6.0.dev1954+g6c3412baf6

To install this package with pypi run:
     pip install -i https://pypi.anaconda.org/scipy-wheels-nightly/simple matplotlib
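If the version list above is needed in a script, one way to pull it out, as a minimal sketch, is to keep only the `   + <version>` lines. This assumes the `anaconda show <user>/<package>` output format shown here; the function name `list_versions` is hypothetical. Note that the `+company: None` style lines in the org-level output have no space after the `+`, so they are not matched.

```bash
#!/usr/bin/env bash
# Hypothetical helper: read `anaconda show <user>/<package>` output on stdin and
# print one version per line, taken from lines of the form "   + <version>".
list_versions() {
    # BRE: "+" is literal here, \{1,\} means "one or more"; keep the first token.
    sed -n 's/^[[:space:]]*+[[:space:]]\{1,\}\([^[:space:]]\{1,\}\).*$/\1/p'
}

# Usage sketch (both streams are captured, as the &> redirection later in this
# thread also does):
# anaconda show scipy-wheels-nightly/matplotlib 2>&1 | list_versions
```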

Though if the matplotlib dev team wanted to treat the nightlies truly as only the most recent nightly (which to me makes sense) it might be worth looking into running something like

$ anaconda remove scipy-wheels-nightly/matplotlib

before upload to delete all previous nightlies (...or, if you need to drill down all the way to the file name, then perhaps it makes sense to run the anaconda remove command after the new nightlies are uploaded, so that in the worst-case scenario where there is a problem you have at least deployed without deleting all of the old nightlies first).

$ anaconda remove --help
usage: anaconda remove [-h] [-f] specs [specs ...]

Remove an object from your Anaconda repository.

example::

    anaconda remove sean/meta/1.2.0/meta.tar.gz

positional arguments:
  specs        Package written as <user>[/<package>[/<version>[/<filename>]]]

options:
  -h, --help   show this help message and exit
  -f, --force  Do not prompt removal

So that could look like:

Before upload:

# Get last nightly upload version
anaconda show scipy-wheels-nightly/matplotlib &> old_versions.txt
OLD_NIGHTLY_VERSION="$(grep dev old_versions.txt | tail --lines 1 | awk '{print $NF}')"

After upload:

# Remove previous nightly upload version now that new upload is done
anaconda --token ${{ secrets.ANACONDA_ORG_UPLOAD_TOKEN }} remove \
  --force \
  "scipy-wheels-nightly/matplotlib/${OLD_NIGHTLY_VERSION}"

I would want one of the project maintainers who has access to ANACONDA_ORG_UPLOAD_TOKEN to try this manually first to make sure that there aren't any snags with the CLI API.

Example:

anaconda --token "${ANACONDA_ORG_UPLOAD_TOKEN}" remove \
  --force \
  scipy-wheels-nightly/matplotlib/3.6.0.dev1948+gd8ede1a710  # First nightly wheel to be uploaded
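Extending that manual example to several versions at once could look like the minimal sketch below, which runs the same `anaconda --token ... remove --force <user>/<package>/<version>` command once per version read from stdin. The function name and the `ANACONDA_CMD` override (e.g. `echo` for a dry run) are hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical helper: remove every version read from stdin (one per line) for
# the given <user>/<package> spec, without prompting (--force).
remove_versions() {
    local spec_prefix="$1"   # e.g. scipy-wheels-nightly/matplotlib
    local version
    while IFS= read -r version; do
        [ -n "${version}" ] || continue   # ignore blank lines
        # stdin is redirected so the CLI cannot consume the remaining version list.
        "${ANACONDA_CMD:-anaconda}" --token "${ANACONDA_ORG_UPLOAD_TOKEN}" remove \
            --force \
            "${spec_prefix}/${version}" < /dev/null
    done
}

# Dry run:
# printf '%s\n' 3.6.0.dev1948+gd8ede1a710 | ANACONDA_CMD=echo remove_versions scipy-wheels-nightly/matplotlib
```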

@tacaswell
Member

On one hand, 👍 for thinking about resource exhaustion; that is the sort of thing I tend not to think about until it is actually a problem. I think we should leave the cleanup policy to the nightly repo / @ogrisel, as we should do whatever everyone else is doing. It also makes sense to me to enforce that policy centrally (rather than per uploader).

I also think there is value in having a decent range of built nightlies, as it will let you run bisects without needing to be able to build. Not sure how annoying that would be to actually do, but it seems plausible.

@matthewfeickert
Contributor Author

> I think we should leave the cleanup policy to the nightly repo / @ogrisel, as we should do whatever everyone else is doing. It also makes sense to me to enforce that policy centrally (rather than per uploader).

SGTM 👍

@ogrisel I'm going to consider this closed and the cleanup question to be in the domain of the scipy-wheels-nightly org now, but if there are actions that you want matplotlib to take please just respond here. Thanks!

> I also think there is value in having a decent range of built nightlies, as it will let you run bisects without needing to be able to build. Not sure how annoying that would be to actually do, but it seems plausible.

Ah that's a good point that I hadn't even considered.

@matthewfeickert
Contributor Author

matthewfeickert commented May 31, 2022

@ogrisel the nightly wheel uploads are failing as the scipy-wheels-nightly matplotlib storage is full:

Error:  ('Storage requirements exceeded (53687091200 bytes). Payment is required to add a file. Please go to https://anaconda.org/binstar.settings/billing to update your plan', 402)
Error: Process completed with exit code 1.

Can you comment on how you would like removal of old wheels to work?

@ogrisel

ogrisel commented May 31, 2022

Sorry I had not seen the notifications of the previous discussion.

Indeed, I think it would be great to have a shared script to automatically clean up old nightly files and only keep the 5 most recent dev wheels for a given project and platform spec, for instance. Assuming one dev build per day, that's approximately one week of history, which might help avoid deleting wheels that are still being used by automated systems with a bit of lag between successive steps.
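The selection part of such a script could be as small as the sketch below: given a list of versions, print every version except the N most recent, i.e. the ones to delete. This is a simplification under stated assumptions: it works per version rather than per platform tag, the function name is hypothetical, and it relies on GNU-style `sort -V` ordering matplotlib-style dev versions correctly (e.g. `3.6.0.dev999+g...` before `3.6.0.dev1000+g...`), which is not the same as full PEP 440 ordering.

```bash
#!/usr/bin/env bash
# Hypothetical helper: read versions on stdin (one per line) and print all but
# the `keep` most recent ones, oldest first. These are the deletion candidates.
prune_candidates() {
    local keep="${1:-5}"
    sort -V | awk -v keep="${keep}" '
        NF { versions[++n] = $0 }    # collect non-blank lines in sorted order
        END { for (i = 1; i <= n - keep; i++) print versions[i] }
    '
}

# Usage sketch: with keep=5 and only 3 versions present, nothing is printed.
```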

Note that scikit-learn does not cause too much space usage because we use the fixed .dev0 suffix. But we might want to use a more precise number in the future.

@ogrisel

ogrisel commented May 31, 2022

In the meantime we can do a bit of manual clean-up to avoid further errors in the short term.

@matthewfeickert
Contributor Author

> Indeed, I think it would be great to have a shared script to automatically clean up old nightly files and only keep the 5 most recent dev wheels for a given project and platform spec, for instance.

That sounds quite useful. 👍 I'm not part of the dev team on any of the scipy-wheels-nightly projects, but if I can be of any help in getting this working let me know.

@tacaswell
Member

In the interest of unblocking things, I just went and manually deleted ~20 pages (at 50 files/page) of files, so hopefully uploads are unbroken for everyone and we have gotten some headroom to sort this out.

@tacaswell
Member

Pandas has 8000 files (at ~10MB/file), numpy has 600 files (at 10-17MB/file), dipy re-uses the .dev0 name, statsmodels has 1300 files (at just under 10MB each), h5py has ~100 (and a really odd upload pattern... I think we are just slow on that project). scipy has 350 files (at 37-50MB/file) but has not uploaded in ~5 months, and scikit-image has less than 1 page of files but has not uploaded in a year.

@tacaswell
Member

Not sure I did enough though:

[screenshot]

I'm hesitant to start cleaning out other projects files.

@tacaswell
Member

After poking around a bit more, I found the information I had been trying to summarize above the hard way, directly in the UI 🤦🏻

[screenshot]

@matthewfeickert
Contributor Author

@ogrisel The nightly matplotlib wheel uploads have run out of space again. While I assume that @tacaswell will manually delete files again to fix this, it would be nice to see if we can get something like what you described in #22757 (comment) before the next time this hits.

Has there been any discussion on this with the other projects? If not, I could work with the matplotlib dev team to try to make a script that works for matplotlib and runs as part of the nightly upload and then it could be generalized/made part of the scipy-wheels-nightly org infrastructure.

@ogrisel

ogrisel commented Jun 24, 2022

That would be great, thanks!

@tacaswell
Member

I just accidentally took out all of the Matplotlib wheels... Going to go re-run the nightly job so we get one version up.

@tacaswell
Member

I think the cap is on the org, @ogrisel who would you feel comfortable giving me permission to purge the pandas files?

[screenshot]

@matthewfeickert
Contributor Author

matthewfeickert commented Jun 24, 2022

> That would be great, thanks!

@tacaswell while I hope to have time to start looking at this before SciPy 2022, this script will require authentication to test properly, so it might benefit from being worked on together at the matplotlib sprints.

edit: Done in PR #23349

@matthewfeickert
Contributor Author

matthewfeickert commented Jun 28, 2022

> That would be great, thanks!

@ogrisel what are your thoughts on a scipy-wheels-nightly org hosted GitHub action? cf. #23349 (comment)

@matthewfeickert
Contributor Author

@ogrisel gentle ping on this so that if it is of interest we can make some movement on it during the SciPy 2022 sprints this weekend.

@matthewfeickert
Contributor Author

matthewfeickert commented Aug 10, 2022

@ogrisel ping on this again to get your thoughts.

Also @mattip, @rgommers, and @charris (Charles is manually doing the SciPy removal at the moment), given your interest in anaconda/anaconda-client#540 (comment) and the creation of tools/wheels/upload_wheels.sh in andyfaff/scipy#28, would you have any interest in a scipy-wheels-nightly org hosted GitHub action as well? cf. #23349 for additional context.

@ogrisel

ogrisel commented Aug 10, 2022

Thanks @matthewfeickert for your efforts and sorry for the slow replies on my end. Would you mind starting a collaboration effort on https://discuss.scientific-python.org/ ?

Maybe we could have an official SPEC for publishing nightly wheels and a policy and shared tools to avoid shared resource exhaustion.

@tacaswell where did you find the information displayed in #22757 (comment) ?

It would be nice to get some pandas dev onboard given the fact that this is the project that currently uses the most shared resources.

As a scikit-learn developer I do not feel the same pressure to fix the problem given our comparatively small storage space usage according to the UI.

@mattip
Contributor

mattip commented Aug 10, 2022

@ogrisel the information is available to the admins at https://anaconda.org/scipy-wheels-nightly/settings/storage

@mattip
Contributor

mattip commented Aug 10, 2022

> would you have any interest in a scipy-wheels-nightly org hosted GitHub action

I think it is fine to duplicate a 40-line code snippet across the 9 projects uploading to the site. Some of the uploads are done from Travis (for aarch64 and ppc64), so a GitHub action will not work there.

@matthewfeickert
Contributor Author

> Would you mind starting a collaboration effort on https://discuss.scientific-python.org/ ?
>
> Maybe we could have an official SPEC for publishing nightly wheels and a policy and shared tools to avoid shared resource exhaustion.

Done in https://discuss.scientific-python.org/t/interest-in-github-action-for-scipy-wheels-nightly-uploads-and-removals/397. 👍 I'll move all further discussion there so that we don't keep this Issue more active than need be. :)
