Intermittent HTTP 403 on fetch_california_housing and other Figshare hosted data on Azure CI #30761

Open
lesteve opened this issue Feb 3, 2025 · 2 comments

Comments

@lesteve
Member

lesteve commented Feb 3, 2025

Already noticed in #30636 (comment).

This seems to happen from time to time, either in doctests (build log) or in other places (build log).

Error in doctests
=================================== FAILURES ===================================
________________________ [doctest] getting_started.rst _________________________
167 the best set of parameters. Read more in the :ref:`User Guide
168 <grid_search>`::
169 
170   >>> from sklearn.datasets import fetch_california_housing
171   >>> from sklearn.ensemble import RandomForestRegressor
172   >>> from sklearn.model_selection import RandomizedSearchCV
173   >>> from sklearn.model_selection import train_test_split
174   >>> from scipy.stats import randint
175   ...
176   >>> X, y = fetch_california_housing(return_X_y=True)
UNEXPECTED EXCEPTION: <HTTPError 403: 'Forbidden'>
Traceback (most recent call last):
  File "/usr/share/miniconda/envs/testvenv/lib/python3.13/doctest.py", line 1395, in __run
    exec(compile(example.source, filename, "single",
    ~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
                 compileflags, True), test.globs)
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "<doctest getting_started.rst[33]>", line 1, in <module>
  File "/usr/share/miniconda/envs/testvenv/lib/python3.13/site-packages/sklearn/utils/_param_validation.py", line 218, in wrapper
    return func(*args, **kwargs)
  File "/usr/share/miniconda/envs/testvenv/lib/python3.13/site-packages/sklearn/datasets/_california_housing.py", line 177, in fetch_california_housing
    archive_path = _fetch_remote(
        ARCHIVE,
    ...<2 lines>...
        delay=delay,
    )
  File "/usr/share/miniconda/envs/testvenv/lib/python3.13/site-packages/sklearn/datasets/_base.py", line 1513, in _fetch_remote
    urlretrieve(remote.url, temp_file_path)
    ~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/share/miniconda/envs/testvenv/lib/python3.13/urllib/request.py", line 214, in urlretrieve
    with contextlib.closing(urlopen(url, data)) as fp:
                            ~~~~~~~^^^^^^^^^^^
  File "/usr/share/miniconda/envs/testvenv/lib/python3.13/urllib/request.py", line 189, in urlopen
    return opener.open(url, data, timeout)
           ~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^
  File "/usr/share/miniconda/envs/testvenv/lib/python3.13/urllib/request.py", line 495, in open
    response = meth(req, response)
  File "/usr/share/miniconda/envs/testvenv/lib/python3.13/urllib/request.py", line 604, in http_response
    response = self.parent.error(
        'http', request, response, code, msg, hdrs)
  File "/usr/share/miniconda/envs/testvenv/lib/python3.13/urllib/request.py", line 533, in error
    return self._call_chain(*args)
           ~~~~~~~~~~~~~~~~^^^^^^^
  File "/usr/share/miniconda/envs/testvenv/lib/python3.13/urllib/request.py", line 466, in _call_chain
    result = func(*args)
  File "/usr/share/miniconda/envs/testvenv/lib/python3.13/urllib/request.py", line 613, in http_error_default
    raise HTTPError(req.full_url, code, msg, hdrs, fp)
urllib.error.HTTPError: HTTP Error 403: Forbidden
/home/vsts/work/1/s/doc/getting_started.rst:176: UnexpectedException
____________________________ [doctest] compose.rst _____________________________
285 the regressor that will be used for prediction, and the transformer that will
286 be applied to the target variable::
287 
288   >>> import numpy as np
289   >>> from sklearn.datasets import fetch_california_housing
290   >>> from sklearn.compose import TransformedTargetRegressor
291   >>> from sklearn.preprocessing import QuantileTransformer
292   >>> from sklearn.linear_model import LinearRegression
293   >>> from sklearn.model_selection import train_test_split
294   >>> X, y = fetch_california_housing(return_X_y=True)
UNEXPECTED EXCEPTION: <HTTPError 403: 'Forbidden'>
Traceback (most recent call last):
  File "/usr/share/miniconda/envs/testvenv/lib/python3.13/doctest.py", line 1395, in __run
    exec(compile(example.source, filename, "single",
    ~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
                 compileflags, True), test.globs)
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "<doctest compose.rst[59]>", line 1, in <module>
  File "/usr/share/miniconda/envs/testvenv/lib/python3.13/site-packages/sklearn/utils/_param_validation.py", line 218, in wrapper
    return func(*args, **kwargs)
  File "/usr/share/miniconda/envs/testvenv/lib/python3.13/site-packages/sklearn/datasets/_california_housing.py", line 177, in fetch_california_housing
    archive_path = _fetch_remote(
        ARCHIVE,
    ...<2 lines>...
        delay=delay,
    )
  File "/usr/share/miniconda/envs/testvenv/lib/python3.13/site-packages/sklearn/datasets/_base.py", line 1513, in _fetch_remote
    urlretrieve(remote.url, temp_file_path)
    ~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/share/miniconda/envs/testvenv/lib/python3.13/urllib/request.py", line 214, in urlretrieve
    with contextlib.closing(urlopen(url, data)) as fp:
                            ~~~~~~~^^^^^^^^^^^
  File "/usr/share/miniconda/envs/testvenv/lib/python3.13/urllib/request.py", line 189, in urlopen
    return opener.open(url, data, timeout)
           ~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^
  File "/usr/share/miniconda/envs/testvenv/lib/python3.13/urllib/request.py", line 495, in open
    response = meth(req, response)
  File "/usr/share/miniconda/envs/testvenv/lib/python3.13/urllib/request.py", line 604, in http_response
    response = self.parent.error(
        'http', request, response, code, msg, hdrs)
  File "/usr/share/miniconda/envs/testvenv/lib/python3.13/urllib/request.py", line 533, in error
    return self._call_chain(*args)
           ~~~~~~~~~~~~~~~~^^^^^^^
  File "/usr/share/miniconda/envs/testvenv/lib/python3.13/urllib/request.py", line 466, in _call_chain
    result = func(*args)
  File "/usr/share/miniconda/envs/testvenv/lib/python3.13/urllib/request.py", line 613, in http_error_default
    raise HTTPError(req.full_url, code, msg, hdrs, fp)
urllib.error.HTTPError: HTTP Error 403: Forbidden
/home/vsts/work/1/s/doc/modules/compose.rst:294: UnexpectedException
=========================== short test summary info ============================
FAILED ../1/s/doc/getting_started.rst::getting_started.rst
FAILED ../1/s/doc/modules/compose.rst::compose.rst
======= 2 failed, 39 passed, 2 skipped, 39 warnings in 86.94s (0:01:26) ========
Internal Pytest error (error in conftest.py when downloading all the datasets)
============================= test session starts ==============================
platform linux -- Python 3.13.1, pytest-8.3.4, pluggy-1.5.0
rootdir: /home/vsts/work/tmp_folder
configfile: setup.cfg
plugins: scipy_doctest-1.6, cov-6.0.0, xdist-3.6.1
created: 2/2 workers
2 workers [38211 items]

INTERNALERROR> def worker_internal_error(
INTERNALERROR>         self, node: WorkerController, formatted_error: str
INTERNALERROR>     ) -> None:
INTERNALERROR>         """
INTERNALERROR>   File "/usr/share/miniconda/envs/testvenv/lib/python3.13/site-packages/_pytest/main.py", line 283, in wrap_session
INTERNALERROR>     session.exitstatus = doit(config, session) or 0
INTERNALERROR>                          ~~~~^^^^^^^^^^^^^^^^^
INTERNALERROR>   File "/usr/share/miniconda/envs/testvenv/lib/python3.13/site-packages/_pytest/main.py", line 337, in _main
INTERNALERROR>     config.hook.pytest_runtestloop(session=session)
INTERNALERROR>     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^
INTERNALERROR>   File "/usr/share/miniconda/envs/testvenv/lib/python3.13/site-packages/pluggy/_hooks.py", line 513, in __call__
INTERNALERROR>     return self._hookexec(self.name, self._hookimpls.copy(), kwargs, firstresult)
INTERNALERROR>            ~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
INTERNALERROR>   File "/usr/share/miniconda/envs/testvenv/lib/python3.13/site-packages/pluggy/_manager.py", line 120, in _hookexec
INTERNALERROR>     return self._inner_hookexec(hook_name, methods, kwargs, firstresult)
INTERNALERROR>            ~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
INTERNALERROR>   File "/usr/share/miniconda/envs/testvenv/lib/python3.13/site-packages/pluggy/_callers.py", line 182, in _multicall
INTERNALERROR>     return outcome.get_result()
INTERNALERROR>            ~~~~~~~~~~~~~~~~~~^^
INTERNALERROR>   File "/usr/share/miniconda/envs/testvenv/lib/python3.13/site-packages/pluggy/_result.py", line 100, in get_result
INTERNALERROR>     raise exc.with_traceback(exc.__traceback__)
INTERNALERROR>   File "/usr/share/miniconda/envs/testvenv/lib/python3.13/site-packages/pluggy/_callers.py", line 167, in _multicall
INTERNALERROR>     teardown.throw(outcome._exception)
INTERNALERROR>     ~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^
INTERNALERROR>   File "/usr/share/miniconda/envs/testvenv/lib/python3.13/site-packages/_pytest/logging.py", line 803, in pytest_runtestloop
INTERNALERROR>     return (yield)  # Run all the tests.
INTERNALERROR>             ^^^^^
INTERNALERROR>   File "/usr/share/miniconda/envs/testvenv/lib/python3.13/site-packages/pluggy/_callers.py", line 167, in _multicall
INTERNALERROR>     teardown.throw(outcome._exception)
INTERNALERROR>     ~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^
INTERNALERROR>   File "/usr/share/miniconda/envs/testvenv/lib/python3.13/site-packages/_pytest/terminal.py", line 673, in pytest_runtestloop
INTERNALERROR>     result = yield
INTERNALERROR>              ^^^^^
INTERNALERROR>   File "/usr/share/miniconda/envs/testvenv/lib/python3.13/site-packages/pluggy/_callers.py", line 103, in _multicall
INTERNALERROR>     res = hook_impl.function(*args)
INTERNALERROR>   File "/usr/share/miniconda/envs/testvenv/lib/python3.13/site-packages/xdist/dsession.py", line 138, in pytest_runtestloop
INTERNALERROR>     self.loop_once()
INTERNALERROR>     ~~~~~~~~~~~~~~^^
INTERNALERROR>   File "/usr/share/miniconda/envs/testvenv/lib/python3.13/site-packages/xdist/dsession.py", line 163, in loop_once
INTERNALERROR>     call(**kwargs)
INTERNALERROR>     ~~~~^^^^^^^^^^
INTERNALERROR>   File "/usr/share/miniconda/envs/testvenv/lib/python3.13/site-packages/xdist/dsession.py", line 218, in worker_workerfinished
INTERNALERROR>     self._active_nodes.remove(node)
INTERNALERROR>     ~~~~~~~~~~~~~~~~~~~~~~~~~^^^^^^
INTERNALERROR> KeyError: <WorkerController gw0>

I don't think this happens often enough to bother us right now, but if it starts happening more often we should contact Figshare support and tell them.

I did the same the last time this was happening (on Colab and Kaggle notebooks), see #28297 (comment), and in the end they fixed it.

My guess is that this is somehow triggering an anti-abuse mechanism ...
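
If the 403 really is transient, retrying the download after a pause might be enough to get past it on the caller side. Here is a minimal sketch of that idea. The helper name `retry_on_http_403` and its parameters are hypothetical; this is not what scikit-learn or the CI setup does, only an illustration of retrying on this specific error with exponential backoff.

```python
import time
from urllib.error import HTTPError


def retry_on_http_403(fetch, n_retries=3, delay=1.0, sleep=time.sleep):
    """Call fetch(); on HTTP 403, wait and retry up to n_retries times.

    Hypothetical helper for illustration only. `fetch` is any zero-argument
    callable that may raise urllib.error.HTTPError.
    """
    for attempt in range(n_retries + 1):
        try:
            return fetch()
        except HTTPError as exc:
            # Only retry the intermittent 403. Re-raise any other HTTP error,
            # and give up once the retry budget is exhausted.
            if exc.code != 403 or attempt == n_retries:
                raise
            # Exponential backoff: delay, 2 * delay, 4 * delay, ...
            sleep(delay * 2 ** attempt)
```

It could be used as `retry_on_http_403(lambda: fetch_california_housing(return_X_y=True))`. If the server is rate limiting on purpose, retrying could make things worse, so this is at best a stopgap.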

@lesteve lesteve changed the title HTTP 403 on fetch_california_housing on Azure CI Intermittent HTTP 403 on fetch_california_housing on Azure CI Feb 3, 2025
@ogrisel
Member

ogrisel commented Feb 3, 2025

Also for fetch_kddcup99 as detected today in #30753.

@lesteve lesteve changed the title Intermittent HTTP 403 on fetch_california_housing on Azure CI Intermittent HTTP 403 on fetch_california_housing and other Figshare hosted data on Azure CI Feb 12, 2025
@lesteve
Member Author

lesteve commented Feb 17, 2025

Seen in build log
