Skip to content

Reconnect also on ldap.UNAVAILABLE #267

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 2 commits into
base: main
Choose a base branch
from

Conversation

spaceone
Copy link
Contributor

@spaceone spaceone commented Mar 16, 2019

LDAP connections may also raise UNAVAILABLE not only SERVER_DOWN. It would be nice if the reconnection class also reconnects in this case.

Below is an example script which can provoke the scenario:

import time
import threading
import subprocess

def foo():
    lo = ...  # initialize here!
    lo._retry_max = 10E4
    lo._retry_delay = .001
    x = 0
    lo.search_ext_s("dc=base", ldap.SCOPE_SUBTREE, "uid=Administrator", attr=["uid"])
    s = time.time()
    print("go")
    while (time.time() - s) < run_time:
        x += 1
        lo.search(filter="uid=Administrator", attr=["uid"])
    print("Searches per sec: {}".format(x / run_time))


thread_count = 100
run_time = 10.0
my_thread = [None] * thread_count
for i in range(0, thread_count):
    my_thread[i] = threading.Thread(target=foo)
for t in my_thread:
    t.start()
print("warmup")
time.sleep(3)
print("restart slapd")
subprocess.check_call(["systemctl", "restart", "slapd.service"])
print("restarted")
for t in my_thread:
    t.join()

print("done")

@spaceone
Copy link
Contributor Author

The same applies to ldap.CONNECT_ERROR. I can adjust the PR if you agree to add also this.

@codecov
Copy link

codecov bot commented Mar 16, 2019

Codecov Report

Merging #267 into master will not change coverage.
The diff coverage is 100%.

@@          Coverage Diff           @@
##           master    #267   +/-   ##
======================================
  Coverage    71.1%   71.1%           
======================================
  Files          49      49           
  Lines        4818    4818           
  Branches      812     812           
======================================
  Hits         3426    3426           
- Misses       1056    1057    +1     
+ Partials      336     335    -1
Impacted Files Coverage Δ
Lib/ldap/ldapobject.py 76.38% <100%> (ø) ⬆️
Lib/ldap/controls/deref.py 57.14% <0%> (ø) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update adf4761...48c878f. Read the comment docs.

@spaceone
Copy link
Contributor Author

Also ldap.TIMEOUT and ldap.TIMELIMIT_EXCEEDED.

@spaceone
Copy link
Contributor Author

spaceone commented Apr 12, 2019

I adjusted the Pull Request so that one can specify which exceptions should be catched for reconnecting. Therefore it would be extensible and if you want backwards compatibility.

If you plan to merge this tell me, I would squash the first and second commit.

@spaceone
Copy link
Contributor Author

@tiran @encukou Any comment on this?

pmhahn pushed a commit to univention/univention-corporate-server that referenced this pull request Aug 12, 2019
pmhahn pushed a commit to univention/univention-corporate-server that referenced this pull request Aug 12, 2019
pmhahn pushed a commit to univention/univention-corporate-server that referenced this pull request Aug 12, 2019
@spaceone
Copy link
Contributor Author

There is another issue, which is triggered, when re-using a connection after a ldap.SERVER_DOWN connection. Then the reconnection does not apply as well:

import ldap
import subprocess

lo = ldap.ldapobject.ReconnectLDAPObject(uri)
dn = lo.whoami_s()[3:]

# stop open ldap
subprocess.call(['service', 'slapd', 'stop'])

# do a search, wait for the timeout, handle SERVER_DOWN exception
try:
    print(lo.search_s(dn, ldap.SCOPE_BASE, '(objectClass=*)'))
except ldap.SERVER_DOWN:
    pass  # some handling here

# start open ldap again
subprocess.call(['service', 'slapd', 'start'])

# try to use the connection again
print(lo.search_s(dn, ldap.SCOPE_BASE, '(objectClass=*)'))
# this now raises ldap.INSUFFICIENT_ACCESS, as it is now a unbound connection and does everything anonymously.

@encukou
Copy link
Member

encukou commented Jan 29, 2020

Is it possible to test this?
The existing tests fail, and need to be adjusted.

Copy link
Member

@tiran tiran left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you please rebase your PR and provide tests and documentation updates (.. versionadded:: 3.3)?

@spaceone spaceone force-pushed the reconnect-on-unavailable branch 5 times, most recently from e214b39 to 00e7186 Compare November 16, 2020 09:49
@spaceone
Copy link
Contributor Author

@tiran Ok, I implemented the 2 new test cases. I would revert making reconnection_exceptions configurable, if you agree. I consider it bad style now. We should just add all exceptions which may happen. I wanted to add the parameter so that if we encounter more exceptions in the future, e.g. in different ldap server implementation that it is easier to fix the behavior instead of waiting for a new python-ldap release.

And then I would revert the changes once so that we see that TravisCI actually really fails with the new test cases.

@spaceone spaceone requested a review from tiran December 12, 2020 00:04
@spaceone spaceone force-pushed the reconnect-on-unavailable branch from 00e7186 to 8a5db63 Compare December 12, 2020 00:21
@spaceone
Copy link
Contributor Author

spaceone commented Oct 1, 2021

@tiran ping.

@mistotebe
Copy link
Contributor

Yes, CONNECT_ERROR should definitely be handled, I can see why it could be appropriate for UNAVAILABLE as well.

I have some reservations to handling TIMELIMIT_EXCEEDED in this way (and to some extent TIMEOUT too). The original server is still likely to be contactable, so libldap will use the first URL in the list and you're likely to get the same error back, making no progress?

Not sure what to do when the first server in the list is contactable but broken (TLS, ...), libldap might not take that as a clue to try the next one so we'd need to shuffle the URI list manually?

Regarding the test cases, can't see how they're going to fail, especially the threaded one?

@spaceone
Copy link
Contributor Author

The exceptions come from real-life customer environments where they occurred.
IIRC the test case uses threading to trigger the exception UNAVAILABLE with openldap reliable and faster.

@mistotebe
Copy link
Contributor

The exceptions come from real-life customer environments where they occurred.

Please describe a situation where python-ldap has enough information to retry TIMELIMIT_EXCEEDED exceptions in a useful way (and why it wouldn't end up reconnecting to the same host for that matter).

IIRC the test case uses threading to trigger the exception UNAVAILABLE with openldap reliable and faster.

The point is that test_107_reconnect_restore doesn't seem to check for anything so AFAIK the test will still pass even if the thread fails in any way whatsoever?

@spaceone
Copy link
Contributor Author

spaceone commented Mar 15, 2022

The exceptions come from real-life customer environments where they occurred.

Please describe a situation where python-ldap has enough information to retry TIMELIMIT_EXCEEDED exceptions in a useful way (and why it wouldn't end up reconnecting to the same host for that matter).

I can't explain why this happened on those Systems. I can only provide the information that it happened multiple times against OpenLDAP 2.4.

IIRC the test case uses threading to trigger the exception UNAVAILABLE with openldap reliable and faster.

The point is that test_107_reconnect_restore doesn't seem to check for anything so AFAIK the test will still pass even if the thread fails in any way whatsoever?

But the test will fail if you drop the exception handling.
I am unsure how I can add a test condition there. Maybe I have to mock the reconnect() method and make sure inside the exception is raised?

@mistotebe
Copy link
Contributor

Please describe a situation where python-ldap has enough information to retry TIMELIMIT_EXCEEDED exceptions in a useful way (and why it wouldn't end up reconnecting to the same host for that matter).

I can't explain why this happened on those Systems. I can only provide the information that it happened multiple times against OpenLDAP 2.4.

That wasn't my question, if you get a TIMELIMIT_EXCEEDED, how do you know this is because a server is down and retrying will pick up a different one? With an overloaded/misconfigured server, the retry will fail the same and the application still needs to be ready to do whatever is needed to reconsider/retry, including shuffling the URI list. I don't object to making the exception list configurable, however spaghetti this code looks already (using something like python-tenacity would have been much better). Then it's up to the application to decide what kinds of errors are safe to retry from automatically and which ones would need to be handled by the caller.

The point is that test_107_reconnect_restore doesn't seem to check for anything so AFAIK the test will still pass even if the thread fails in any way whatsoever?

But the test will fail if you drop the exception handling. I am unsure how I can add a test condition there. Maybe I have to mock the reconnect() method and make sure inside the exception is raised?

Oh you get a lot of tracebacks printed out, I can see that, but my point is the test passes anyway.

@spaceone spaceone force-pushed the reconnect-on-unavailable branch from 8a5db63 to 12994f2 Compare March 18, 2022 11:19
@spaceone
Copy link
Contributor Author

@mistotebe ah OK, now I get you.
So by default we cannot add TIMELIMIT_EXCEEDED. I just dropped it for now.

Adding a dependency on https://github.com/jd/tenacity is probably not a good idea now. I like that pyhton-ldap has few dependencies.

I reworked the tests so that the failing condition is explicit - I thought a failing thread would propagate the error, sorry for that.

@ghost
Copy link

ghost commented Jul 1, 2022

I'll try to fix some of the failing tests

@spaceone spaceone force-pushed the reconnect-on-unavailable branch 2 times, most recently from 3acbfdd to 412a35d Compare July 1, 2022 09:30
@artificial-intelligence

can the tests be run again? the logs aren't available anymore, I'd still be interested in a fix (I'm the "ghost user" above).

@artificial-intelligence

any update on this? this blocks https://bugs.launchpad.net/keystone/+bug/1953627

@droideck
Copy link
Contributor

@spaceone hi! Could you please rebase the commit in this PR? (and possibly check out the above questions), and we'll rerun the tests.

@spaceone spaceone force-pushed the reconnect-on-unavailable branch 4 times, most recently from 48edd06 to b222122 Compare July 1, 2023 17:12
@spaceone
Copy link
Contributor Author

spaceone commented Jul 3, 2023

@spaceone hi! Could you please rebase the commit in this PR? (and possibly check out the above questions), and we'll rerun the tests.

@droideck done. Tests pass.

@spaceone spaceone force-pushed the reconnect-on-unavailable branch 2 times, most recently from 96c08b9 to 6d22640 Compare August 3, 2023 12:50
@spaceone
Copy link
Contributor Author

@mistotebe @tiran ping. anything else?

@mistotebe
Copy link
Contributor

It is good from my side, @tiran you are marked as "requested changes", could you review again and adjust if appropriate?

mistotebe
mistotebe previously approved these changes Oct 23, 2023
Copy link
Contributor

@mistotebe mistotebe left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM now

@spaceone spaceone force-pushed the reconnect-on-unavailable branch from 6d22640 to 45c3975 Compare October 23, 2023 15:29
@spaceone

This comment was marked as outdated.

spaceone added 2 commits June 3, 2025 11:01
test_106_reconnect_restore() handles a SERVER_DOWN exception manually
and tries to re-use the connection afterwards again.
This established the connection again but did not bind(), so it now
raises ldap.INSUFFICIENT_ACCESS.

test_107_reconnect_restore() restarts the LDAP server during searches,
which causes a UNAVAILABLE exception.
@spaceone
Copy link
Contributor Author

spaceone commented Jun 3, 2025

OK, the tests have been refactored and run:

  1. Fixed Code:
$ PYTHONPATH=Lib/ python3 -m unittest -v Tests.t_ldapobject.Test01_ReconnectLDAPObject.test107_reconnect_restore
test107_reconnect_restore (Tests.t_ldapobject.Test01_ReconnectLDAPObject.test107_reconnect_restore)
The idea of this test is to restart the LDAP-Server while there are ongoing searches. ... ok

----------------------------------------------------------------------
Ran 1 test in 10.312s

OK

  1. Failing test shows that it's currently broken:
$ python3 -m unittest -v Tests.t_ldapobject.Test01_ReconnectLDAPObject.test107_reconnect_restore                                                                                        
test107_reconnect_restore (Tests.t_ldapobject.Test01_ReconnectLDAPObject.test107_reconnect_restore)                    
The idea of this test is to restart the LDAP-Server while there are ongoing searches. ... Exception occurred {'result': 52, 'desc': 'Server is unavailable', 'ctrls': []} Traceback (most recent call last):                                   
  File "/home/git/python-ldap/Tests/t_ldapobject.py", line 701, in reconnect_search_thread                       
    _reconnect_search_thread()                                                                                         
  File "/home/git/python-ldap/Tests/t_ldapobject.py", line 697, in _reconnect_search_thread
    lo.search_ext_s(self.server.suffix, ldap.SCOPE_SUBTREE, filterstr="cn=user1", attrlist=["cn"])
  File "/usr/lib/python3/dist-packages/ldap/ldapobject.py", line 1030, in search_ext_s
    return self._apply_method_s(SimpleLDAPObject.search_ext_s,*args,**kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3/dist-packages/ldap/ldapobject.py", line 968, in _apply_method_s
    return func(self,*args,**kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3/dist-packages/ldap/ldapobject.py", line 625, in search_ext_s
    return self.result(msgid,all=1,timeout=timeout)[1]
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3/dist-packages/ldap/ldapobject.py", line 535, in result
    resp_type, resp_data, resp_msgid = self.result2(msgid,all,timeout)
                                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3/dist-packages/ldap/ldapobject.py", line 539, in result2
    resp_type, resp_data, resp_msgid, resp_ctrls = self.result3(msgid,all,timeout)
                                                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3/dist-packages/ldap/ldapobject.py", line 543, in result3
    resp_type, resp_data, resp_msgid, decoded_resp_ctrls, retoid, retval = self.result4(
                                                                           ^^^^^^^^^^^^^
  File "/usr/lib/python3/dist-packages/ldap/ldapobject.py", line 553, in result4
    ldap_result = self._ldap_call(self._l.result4,msgid,all,timeout,add_ctrls,add_intermediates,add_extop)
                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3/dist-packages/ldap/ldapobject.py", line 128, in _ldap_call
    result = func(*args,**kwargs)
             ^^^^^^^^^^^^^^^^^^^^
ldap.UNAVAILABLE: {'result': 52, 'desc': 'Server is unavailable', 'ctrls': []}
…
                                                                                                                       
FAIL                                                                                                                   
                                                           
======================================================================                           
FAIL: test107_reconnect_restore (Tests.t_ldapobject.Test01_ReconnectLDAPObject.test107_reconnect_restore)
The idea of this test is to restart the LDAP-Server while there are ongoing searches. 
----------------------------------------------------------------------       
Traceback (most recent call last):                                                                                     
  File "/home/git/python-ldap/Tests/t_ldapobject.py", line 717, in test107_reconnect_restore
    self.assertEqual(excs, [])      
AssertionError: Lists differ: [("{'result': 52, 'desc': 'Server is unava[6916 chars]\n')] != []
                                                                                                                       
First list contains 3 additional elements.            
First extra element 0:                             
("{'result': 52, 'desc': 'Server is unavailable', 'ctrls': []}", 'Traceback (most recent call last):\n  File "/home/git/python-ldap/Tests/t_ldapobject.py", line 701, in reconnect_search_thread\n    _reconnect_search_thread()\n  File 
"/home/git/python-ldap/Tests/t_ldapobject.py", line 697, in _reconnect_search_thread\n    lo.search_ext_s(self.server.suffix, ldap.SCOPE_SUBTREE, filterstr="cn=user1", attrlist=["cn"])\n  File "/usr/lib/python3/dist-packages/ldap/lda
pobject.py", line 1030, in search_ext_s\n    return self._apply_method_s(SimpleLDAPObject.search_ext_s,*args,**kwargs)\n           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n  File "/usr/lib/python3/dist-packages/l
dap/ldapobject.py", line 968, in _apply_method_s\n    return func(self,*args,**kwargs)\n           ^^^^^^^^^^^^^^^^^^^^^^^^^\n  File "/usr/lib/python3/dist-packages/ldap/ldapobject.py", line 625, in search_ext_s\n    return self.result(msg
id,all=1,timeout=timeout)[1]\n           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n  File "/usr/lib/python3/dist-packages/ldap/ldapobject.py", line 535, in result\n    resp_type, resp_data, resp_msgid = self.result2(msgid,all,timeout)\n   
                                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n  File "/usr/lib/python3/dist-packages/ldap/ldapobject.py", line 539, in result2\n    resp_type, resp_data, resp_msgid, resp_ctrls = self.result3(msgid,all,timeout)\n    
                                               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n  File "/usr/lib/python3/dist-packages/ldap/ldapobject.py", line 543, in result3\n    resp_type, resp_data, resp_msgid, decoded_resp_ctrls, retoid, retval = se
lf.result4(\n                                                                           ^^^^^^^^^^^^^\n  File "/usr/lib/python3/dist-packages/ldap/ldapobject.py", line 553, in result4\n    ldap_result = self._ldap_call(self._l.result4,msgi
d,all,timeout,add_ctrls,add_intermediates,add_extop)\n                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n  File "/usr/lib/python3/dist-packages/ldap/ldapobject.py", line 128, in _lda
p_call\n    result = func(*args,**kwargs)\n             ^^^^^^^^^^^^^^^^^^^^\nldap.UNAVAILABLE: {\'result\': 52, \'desc\': \'Server is unavailable\', \'ctrls\': []}\n')
                                                                                                                       
Diff is 7994 characters long. Set self.maxDiff to None to see it.                                         

----------------------------------------------------------------------
Ran 1 test in 10.318s

FAILED (failures=1)

@spaceone spaceone requested a review from tiran June 3, 2025 19:14
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants