Infrastructure: Upgrading pythontest service #122795

Closed
JacobCoffee opened this issue Aug 7, 2024 · 7 comments

@JacobCoffee
Member

JacobCoffee commented Aug 7, 2024

I'm commencing OS upgrades to the latest Ubuntu LTS for the pythontest service, which is used for some network tests in the CPython test suite (see lines referencing pythontest.net).

No changes should be necessary on anyone's end, but if you run your tests with internet access enabled, some may fail during this upgrade.

### Tasks
- [x] Host provisioned
- [x] Data migrated
- [x] Test suite tested against current `pythontest` service
- [x] Test suite tested against new `pythontest` service
- [x] Test suite tested against bogus endpoint to ensure failures do indeed... fail
- [x] Old production host shut down
- [x] New production host swapped out
- [x] Test suite tested against replaced `pythontest` service
- [x] DNS records swapped in Route53

JacobCoffee self-assigned this Aug 7, 2024

@JacobCoffee
Member Author

TTL is 7200 seconds (2 hours) for the DNS entries, so they will take a while to roll over.
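
As a rough illustration of what that TTL implies, here is a minimal sketch that computes the latest time a resolver could still be serving the old record, assuming it cached the record just before the change and honors the TTL. The function name `worst_case_rollover` and the change time are hypothetical; only the 7200-second TTL comes from this thread.

```python
from datetime import datetime, timedelta


def worst_case_rollover(changed_at: datetime, ttl_seconds: int) -> datetime:
    """Return the latest time a TTL-respecting cache may still hold the old record."""
    # A resolver that cached the old record right before the change
    # keeps it for at most one full TTL.
    return changed_at + timedelta(seconds=ttl_seconds)


# Hypothetical change time, for illustration only.
changed_at = datetime(2024, 8, 7, 12, 0)
print(worst_case_rollover(changed_at, 7200))
```

Resolvers that ignore or clamp TTLs can behave differently, so this is an upper bound only under the stated assumption.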

@JacobCoffee
Member Author

DNS has propagated and tests are now passing

cpython [⇣1] via C v15.0.0-clang via  pyenv took 5s
➜ ./python.exe -m test -v test_urllib2net -u all -j16
== CPython 3.14.0a0 (heads/main:5b8a6c5186, Aug 6 2024, 16:03:59) [Clang 15.0.0 (clan
== macOS-14.5-arm64-arm-64bit-Mach-O little-endian
== Python build: debug
== cwd: /Users/coffee/git/internal/python/cpython/build/test_python_worker_20375æ
== CPU count: 16
== encodings: locale=UTF-8 FS=utf-8
== resources: all

Using random seed: 681115882
0:00:00 load avg: 2.32 Run 1 test in parallel using 1 worker process
0:00:05 load avg: 2.21 [1/1] test_urllib2net passed
test_close (test.test_urllib2net.CloseSocketTest.test_close) ... ok
test_custom_headers (test.test_urllib2net.OtherNetworkTests.test_custom_headers) ... ok
test_file (test.test_urllib2net.OtherNetworkTests.test_file) ... ok
test_ftp (test.test_urllib2net.OtherNetworkTests.test_ftp) ... ok
test_redirect_url_withfrag (test.test_urllib2net.OtherNetworkTests.test_redirect_url_withfrag) ... ok
test_sites_no_connection_close (test.test_urllib2net.OtherNetworkTests.test_sites_no_connection_close) ... skipped 'XXX: http://www.imdb.com is gone'
test_urlwithfrag (test.test_urllib2net.OtherNetworkTests.test_urlwithfrag) ... ok
test_ftp_basic (test.test_urllib2net.TimeoutTest.test_ftp_basic) ... ok
test_ftp_default_timeout (test.test_urllib2net.TimeoutTest.test_ftp_default_timeout) ... ok
test_ftp_no_timeout (test.test_urllib2net.TimeoutTest.test_ftp_no_timeout) ... ok
test_ftp_timeout (test.test_urllib2net.TimeoutTest.test_ftp_timeout) ... ok
test_http_basic (test.test_urllib2net.TimeoutTest.test_http_basic) ... ok
test_http_default_timeout (test.test_urllib2net.TimeoutTest.test_http_default_timeout) ... ok
test_http_no_timeout (test.test_urllib2net.TimeoutTest.test_http_no_timeout) ... ok
test_http_timeout (test.test_urllib2net.TimeoutTest.test_http_timeout) ... ok

----------------------------------------------------------------------
Ran 15 tests in 5.375s

OK (skipped=1)

== Tests result: SUCCESS ==

1 test OK.

Total duration: 5.6 sec
Total tests: run=15 skipped=1
Total test files: run=1/1
Result: SUCCESS

@hugovk
Member

hugovk commented Aug 7, 2024

Confirmed PRs are now passing, thank you for keeping the infra up-to-date!

@ned-deily
Member

We are seeing some FTP tests failing on the macOS and iOS buildbots / CI, with the pythontest.net vsftpd apparently returning `500 OOPS: vsf_sysutil_bind` errors. I think these are new errors but can't swear to it. @freakboy3742 noticed the failures with the iOS buildbots. It sounds like it might have to do with not enough passive ports or something similar; it could be that connections are taking a while to time out from some buildbot or CI instances. This is one of those things that has just worked for years, so perhaps the old OS configuration had tweaked something?

Here are two examples from today:

https://github.com/python/cpython/actions/runs/10380012215/job/28739153641
https://buildbot.python.org/#/builders/1380/builds/1068/steps/10/logs/stdio

@freakboy3742
Contributor

FWIW - I've seen the failure on Linux as well (see 1, 2).

There's possibly an interwoven issue with `socket_helper.transient_internet`; these errors all appear to be 500 service errors from an FTP service, and the helper is choking because that particular error doesn't set `reason` on the error, so what would normally be marked as a resource skip is raised as an error.
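
To illustrate the shape of the problem (this is not the actual `transient_internet` code), here is a minimal sketch of a hypothetical classifier that tolerates exceptions without a `reason` attribute. The function name `looks_like_ftp_5xx` and the idea of matching on the leading reply code are assumptions made for this example.

```python
import ftplib
import urllib.error


def looks_like_ftp_5xx(err: BaseException) -> bool:
    """Hypothetical check: does this error carry an FTP 5xx reply?"""
    # ftplib.error_perm has no `reason` attribute, so read it defensively
    # and fall back to the exception's own string form.
    reason = getattr(err, "reason", None)
    text = str(err if reason is None else reason)
    # Assumption: a wrapper may prefix the FTP reply with "ftp error: ".
    text = text.removeprefix("ftp error: ")
    # FTP replies start with a three-digit code, e.g. "500 OOPS: ...".
    return text[:3].isdigit() and text.startswith("5")


bare = ftplib.error_perm("500 OOPS: vsf_sysutil_bind")
wrapped = urllib.error.URLError("ftp error: 500 OOPS: vsf_sysutil_bind")
print(looks_like_ftp_5xx(bare), looks_like_ftp_5xx(wrapped))
```

A helper along these lines could let a test harness report such replies as a resource skip instead of an error, though whether every FTP 5xx reply should count as transient is a separate judgment call.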

@ewdurbin
Member

Looks like we are only opening 10 passive ports, which is the same as past hosts. Seems reasonable to just extend that further to ~100 ports.

vsftpd.conf: https://github.com/python/psf-salt/blob/f348b8a70632f5bc238bba6fd826d5fb4f8cf6f6/salt/pythontest/config/vsftpd.conf#L35-L42
firewall config: https://github.com/python/psf-salt/blob/f348b8a70632f5bc238bba6fd826d5fb4f8cf6f6/pillar/base/firewall/ftp.sls#L7
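
For a sense of how the pool size follows from the configuration, here is a minimal sketch that counts the passive ports implied by vsftpd's `pasv_min_port` and `pasv_max_port` settings. The port numbers below are hypothetical and are not taken from the linked psf-salt files; the helper name `passive_port_count` is also made up for this example.

```python
def passive_port_count(conf_text: str) -> int:
    """Count the ports in the inclusive pasv_min_port..pasv_max_port range."""
    settings = {}
    for line in conf_text.splitlines():
        line = line.strip()
        # vsftpd.conf uses key=value lines; '#' starts a comment line.
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        settings[key] = value
    return int(settings["pasv_max_port"]) - int(settings["pasv_min_port"]) + 1


# Hypothetical ranges, for illustration only.
ten_ports = "pasv_enable=YES\npasv_min_port=20000\npasv_max_port=20009\n"
hundred_ports = "pasv_enable=YES\npasv_min_port=20000\npasv_max_port=20099\n"
print(passive_port_count(ten_ports), passive_port_count(hundred_ports))
```

Whatever range is chosen, the firewall rule has to open the same ports, which is why both files are linked above.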

@JacobCoffee
Member Author

This is deployed, so if you continue to see errors, let me know.
