Add bytes_strictness to allow configuring behavior on bytes/text mismatch #171

Merged: 3 commits, Mar 2, 2018
56 changes: 35 additions & 21 deletions Doc/bytes_mode.rst
Expand Up @@ -43,37 +43,51 @@ Encoding/decoding to other formats – text, images, etc. – is left to the cal
The bytes mode
--------------

The behavior of python-ldap 3.0 in Python 2 is influenced by a ``bytes_mode``
argument to :func:`ldap.initialize`.
The argument can take these values:
In Python 3, text values are represented as ``str``, the Unicode text type.

``bytes_mode=True``: backwards-compatible
In Python 2, the behavior of python-ldap 3.0 is influenced by a ``bytes_mode``
argument to :func:`ldap.initialize`:

Text values returned from python-ldap are always bytes (``str``).
Text values supplied to python-ldap may be either bytes or Unicode.
The encoding for bytes is always assumed to be UTF-8.
``bytes_mode=True`` (backwards compatible):
Text values are represented as bytes (``str``) encoded using UTF-8.

Not available in Python 3.
``bytes_mode=False`` (future compatible):
Text values are represented as ``unicode``.

``bytes_mode=False``: strictly future-compatible
If not given explicitly, python-ldap will default to ``bytes_mode=True``,
but if a ``unicode`` value is supplied to it, it will warn and use that value.

Text values must be represented as ``unicode``.
An error is raised if python-ldap receives a text value as bytes (``str``).
Backwards-compatible behavior is not scheduled for removal until Python 2
itself reaches end of life.

Unspecified: relaxed mode with warnings

Causes a warning on Python 2.
Errors, warnings, and automatic encoding
----------------------------------------

Text values returned from python-ldap are always ``unicode``.
Text values supplied to python-ldap should be ``unicode``;
warnings are emitted when they are not.
While the type of values *returned* from python-ldap is always given by
``bytes_mode``, for Python 2 the behavior for “wrong-type” values *passed in*
can be controlled by the ``bytes_strictness`` argument to
:func:`ldap.initialize`:

The warnings are of type :class:`~ldap.LDAPBytesWarning`, which
is a subclass of :class:`BytesWarning` designed to be easily
:ref:`filtered out <filter-bytes-warning>` if needed.
``bytes_strictness='error'`` (default if ``bytes_mode`` is specified):
A ``TypeError`` is raised.
Member

We should spell out the behavior in form of an example and list affected attributes

If bytes_mode is True and a text string is encountered, then a TypeError is raised. If bytes_mode is False...

Bytes mode applies to:

  • dn argument
  • attr argument
  • newrdn argument for rename
  • who and cred for bind
  • user, oldpw and newpw for passwd
  • base, filterstr and attrlist members for search*
  • mod_type (2nd argument) of modlist member for add*, modify*

Member Author

I linked to text mode from those arguments. Does that look good?


Backwards-compatible behavior is not scheduled for removal until Python 2
itself reaches end of life.
``bytes_strictness='warn'`` (default when ``bytes_mode`` is not given explicitly):
A warning is emitted, and the value is encoded/decoded
using the UTF-8 encoding.

The warnings are of type :class:`~ldap.LDAPBytesWarning`, which
is a subclass of :class:`BytesWarning` designed to be easily
:ref:`filtered out <filter-bytes-warning>` if needed.

``bytes_strictness='silent'``:
The value is automatically encoded/decoded using the UTF-8 encoding.

On Python 3, ``bytes_strictness`` is ignored and a ``TypeError`` is always
raised.

When setting ``bytes_strictness``, an explicit value for ``bytes_mode`` needs
to be given as well.
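
The documentation above says that ``LDAPBytesWarning`` subclasses ``BytesWarning`` so that it can be filtered separately. A minimal sketch of what that filtering could look like with the standard ``warnings`` module follows. The ``LDAPBytesWarning`` class here is a local stand-in so the sketch runs without python-ldap installed, and ``send_text`` is a hypothetical helper that only imitates ``bytes_mode=False`` with ``bytes_strictness='warn'``; neither is the library's code.

```python
import warnings


class LDAPBytesWarning(BytesWarning):
    """Local stand-in for ldap.LDAPBytesWarning (assumption of this sketch)."""


def send_text(value):
    """Hypothetical helper imitating bytes_mode=False, bytes_strictness='warn'."""
    if isinstance(value, bytes):
        # Wrong type for text mode: warn, then pass the bytes through unchanged.
        warnings.warn("Received non-text value", LDAPBytesWarning, stacklevel=2)
        return value
    return value.encode("utf-8")


# Option 1: silence only this warning class.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    warnings.simplefilter("ignore", LDAPBytesWarning)
    send_text(b"cn=admin")
silenced_count = len(caught)

# Option 2: escalate the warning to an exception, for example in a test suite.
with warnings.catch_warnings():
    warnings.simplefilter("error", LDAPBytesWarning)
    try:
        send_text(b"cn=admin")
        escalated = False
    except LDAPBytesWarning:
        escalated = True
```

Filtering on the subclass leaves any other ``BytesWarning`` in the application untouched, which appears to be the reason for the dedicated class.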


Porting recommendations
27 changes: 24 additions & 3 deletions Doc/reference/ldap.rst
Expand Up @@ -29,7 +29,7 @@ Functions

This module defines the following functions:

.. py:function:: initialize(uri [, trace_level=0 [, trace_file=sys.stdout [, trace_stack_limit=None, [bytes_mode=None]]]]) -> LDAPObject object
.. py:function:: initialize(uri [, trace_level=0 [, trace_file=sys.stdout [, trace_stack_limit=None, [bytes_mode=None, [bytes_strictness=None]]]]]) -> LDAPObject object

Initializes a new connection object for accessing the given LDAP server,
and return an LDAP object (see :ref:`ldap-objects`) used to perform operations
Expand All @@ -53,7 +53,8 @@ This module defines the following functions:
*trace_file* specifies a file-like object as target of the debug log and
*trace_stack_limit* specifies the stack limit of tracebacks in debug log.

The *bytes_mode* argument specifies text/bytes behavior under Python 2.
The *bytes_mode* and *bytes_strictness* arguments specify text/bytes
behavior under Python 2.
See :ref:`text-bytes` for complete documentation.

Possible values for *trace_level* are
Expand Down Expand Up @@ -696,6 +697,9 @@ and wait for and return with the server's result, or with

*serverctrls* and *clientctrls* like described in section :ref:`ldap-controls`.

The *dn* argument, and mod_type (second item) of *modlist* are text strings;
see :ref:`bytes_mode`.


.. py:method:: LDAPObject.bind(who, cred, method) -> int

Expand Down Expand Up @@ -737,6 +741,8 @@ and wait for and return with the server's result, or with

*serverctrls* and *clientctrls* like described in section :ref:`ldap-controls`.

The *dn* and *attr* arguments are text strings; see :ref:`bytes_mode`.

.. note::

A design fault in the LDAP API prevents *value*
Expand All @@ -757,6 +763,8 @@ and wait for and return with the server's result, or with

*serverctrls* and *clientctrls* like described in section :ref:`ldap-controls`.

The *dn* argument is a text string; see :ref:`bytes_mode`.


.. py:method:: LDAPObject.extop(extreq[,serverctrls=None[,clientctrls=None]]]) -> int

Expand Down Expand Up @@ -810,6 +818,9 @@ and wait for and return with the server's result, or with
You might want to look into sub-module :py:mod:`ldap.modlist` for
generating *modlist*.

The *dn* argument, and mod_type (second item) of *modlist* are text strings;
see :ref:`bytes_mode`.


.. py:method:: LDAPObject.modrdn(dn, newrdn [, delold=1]) -> int

Expand All @@ -826,6 +837,8 @@ and wait for and return with the server's result, or with
This operation is emulated by :py:meth:`rename()` and :py:meth:`rename_s()` methods
since the modrdn2* routines in the C library are deprecated.

The *dn* and *newrdn* arguments are text strings; see :ref:`bytes_mode`.


.. py:method:: LDAPObject.passwd(user, oldpw, newpw [, serverctrls=None [, clientctrls=None]]) -> int

Expand All @@ -844,6 +857,8 @@ and wait for and return with the server's result, or with

The asynchronous version returns the initiated message id.

The *user*, *oldpw* and *newpw* arguments are text strings; see :ref:`bytes_mode`.

.. seealso::

:rfc:`3062` - LDAP Password Modify Extended Operation
Expand All @@ -865,6 +880,8 @@ and wait for and return with the server's result, or with

*serverctrls* and *clientctrls* like described in section :ref:`ldap-controls`.

The *dn* and *newrdn* arguments are text strings; see :ref:`bytes_mode`.


.. py:method:: LDAPObject.result([msgid=RES_ANY [, all=1 [, timeout=None]]]) -> 2-tuple

Expand Down Expand Up @@ -1015,12 +1032,13 @@ and wait for and return with the server's result, or with

*serverctrls* and *clientctrls* like described in section :ref:`ldap-controls`.

The *who* and *cred* arguments are text strings; see :ref:`bytes_mode`.

.. versionchanged:: 3.0

:meth:`~LDAPObject.simple_bind` and :meth:`~LDAPObject.simple_bind_s`
now accept ``None`` for *who* and *cred*, too.


.. py:method:: LDAPObject.search(base, scope [,filterstr='(objectClass=*)' [, attrlist=None [, attrsonly=0]]]) ->int

.. py:method:: LDAPObject.search_s(base, scope [,filterstr='(objectClass=*)' [, attrlist=None [, attrsonly=0]]]) ->list|None
Expand Down Expand Up @@ -1073,6 +1091,9 @@ and wait for and return with the server's result, or with
or :py:meth:`search_ext_s()` (client-side search limit). If non-zero
not more than *sizelimit* results are returned by the server.

The *base* and *filterstr* arguments, and *attrlist* contents,
are text strings; see :ref:`bytes_mode`.

.. versionchanged:: 3.0

``filterstr=None`` is equivalent to ``filterstr='(objectClass=*)'``.
82 changes: 52 additions & 30 deletions Lib/ldap/ldapobject.py
Expand Up @@ -93,7 +93,8 @@ class SimpleLDAPObject:

def __init__(
self,uri,
trace_level=0,trace_file=None,trace_stack_limit=5,bytes_mode=None
trace_level=0,trace_file=None,trace_stack_limit=5,bytes_mode=None,
bytes_strictness=None,
):
self._trace_level = trace_level
self._trace_file = trace_file or sys.stdout
Expand All @@ -107,20 +108,26 @@ def __init__(
# Bytes mode
# ----------

# By default, raise a TypeError when receiving invalid args
self.bytes_mode_hardfail = True
if bytes_mode is None and PY2:
_raise_byteswarning(
"Under Python 2, python-ldap uses bytes by default. "
"This will be removed in Python 3 (no bytes for DN/RDN/field names). "
"Please call initialize(..., bytes_mode=False) explicitly.")
bytes_mode = True
# Disable hard failure when running in backwards compatibility mode.
self.bytes_mode_hardfail = False
elif bytes_mode and not PY2:
raise ValueError("bytes_mode is *not* supported under Python 3.")
# On by default on Py2, off on Py3.
if PY2:
if bytes_mode is None:
bytes_mode = True
if bytes_strictness is None:
_raise_byteswarning(
"Under Python 2, python-ldap uses bytes by default. "
"This will be removed in Python 3 (no bytes for "
"DN/RDN/field names). "
"Please call initialize(..., bytes_mode=False) explicitly.")
bytes_strictness = 'warn'
else:
if bytes_strictness is None:
bytes_strictness = 'error'
else:
if bytes_mode:
raise ValueError("bytes_mode is *not* supported under Python 3.")
bytes_mode = False
bytes_strictness = 'error'
Member

Maybe raise an exception when bytes_strictness argument is neither 'error' nor None?

self.bytes_mode = bytes_mode
self.bytes_strictness = bytes_strictness
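
To make the default resolution in the new ``__init__`` easier to review, here is a sketch of the same decision table as a pure function. It assumes the nesting implied by the flattened diff (the warning is emitted only when both arguments are left unset on Python 2). The function name ``resolve_bytes_settings``, the explicit ``py2`` parameter, and the returned ``warn_about_default`` flag are illustrative and not part of python-ldap.

```python
def resolve_bytes_settings(py2, bytes_mode=None, bytes_strictness=None):
    """Return (bytes_mode, bytes_strictness, warn_about_default).

    Illustrative model of the defaults chosen in SimpleLDAPObject.__init__;
    `warn_about_default` stands in for the _raise_byteswarning() call.
    """
    warn_about_default = False
    if py2:
        if bytes_mode is None:
            # Backwards-compatible default on Python 2.
            bytes_mode = True
            if bytes_strictness is None:
                warn_about_default = True
                bytes_strictness = 'warn'
        else:
            # An explicit bytes_mode implies strict checking by default.
            if bytes_strictness is None:
                bytes_strictness = 'error'
    else:
        if bytes_mode:
            raise ValueError("bytes_mode is *not* supported under Python 3.")
        # Python 3: text only, and bytes_strictness is ignored.
        bytes_mode = False
        bytes_strictness = 'error'
    return bytes_mode, bytes_strictness, warn_about_default


# Python 2, nothing specified: bytes mode with warnings.
print(resolve_bytes_settings(py2=True))
# Python 2, explicit bytes_mode=False: strict text mode.
print(resolve_bytes_settings(py2=True, bytes_mode=False))
# Python 3: any bytes_strictness value is overridden.
print(resolve_bytes_settings(py2=False, bytes_strictness='silent'))
```

Note that, as in the diff, an unrecognized ``bytes_strictness`` string is accepted without validation, which is the point raised in the review comment above.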

def _bytesify_input(self, arg_name, value):
"""Adapt a value following bytes_mode in Python 2.
Expand All @@ -130,38 +137,46 @@ def _bytesify_input(self, arg_name, value):
With bytes_mode ON, takes bytes or None and returns bytes or None.
With bytes_mode OFF, takes unicode or None and returns bytes or None.

This function should be applied on all text inputs (distinguished names
and attribute names in modlists) to convert them to the bytes expected
by the C bindings.
For the wrong argument type (unicode or bytes, respectively),
behavior depends on the bytes_strictness setting.
In all cases, bytes or None are returned (or an exception is raised).
"""
if not PY2:
return value

if value is None:
return value

elif self.bytes_mode:
if isinstance(value, bytes):
return value
elif self.bytes_strictness == 'silent':
pass
elif self.bytes_strictness == 'warn':
_raise_byteswarning(
"Received non-bytes value for '{}' in bytes mode; "
"please choose an explicit "
"option for bytes_mode on your LDAP connection".format(arg_name))
else:
if self.bytes_mode_hardfail:
raise TypeError(
"All provided fields *must* be bytes when bytes mode is on; "
"got type '{}' for '{}'.".format(type(value).__name__, arg_name)
)
else:
_raise_byteswarning(
"Received non-bytes value for '{}' with default (disabled) bytes mode; "
"please choose an explicit "
"option for bytes_mode on your LDAP connection".format(arg_name))
return value.encode('utf-8')
return value.encode('utf-8')
else:
if not isinstance(value, text_type):
if isinstance(value, unicode):
return value.encode('utf-8')
elif self.bytes_strictness == 'silent':
pass
elif self.bytes_strictness == 'warn':
_raise_byteswarning(
"Received non-text value for '{}' with bytes_mode off and "
"bytes_strictness='warn'".format(arg_name))
else:
raise TypeError(
"All provided fields *must* be text when bytes mode is off; "
"got type '{}' for '{}'.".format(type(value).__name__, arg_name)
)
assert not isinstance(value, bytes)
return value.encode('utf-8')
return value
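
Because the diff above interleaves removed and added lines, the resulting control flow is hard to read. The following is a standalone sketch of what the new ``_bytesify_input`` appears to do, written so it runs on Python 3. It assumes Python 3 ``str`` as a stand-in for Python 2 ``unicode`` and plain ``BytesWarning`` in place of the library's ``_raise_byteswarning`` helper; the function names and the shortened messages are illustrative.

```python
import warnings


def _handle_wrong_type(arg_name, value, expected, strictness):
    """Apply the bytes_strictness policy to a value of the wrong type."""
    if strictness == 'silent':
        return
    if strictness == 'warn':
        warnings.warn(
            "Received non-{} value for '{}'".format(expected, arg_name),
            BytesWarning, stacklevel=3)
        return
    # Any other setting (including 'error') is treated as a hard failure.
    raise TypeError(
        "All provided fields *must* be {}; got type '{}' for '{}'.".format(
            expected, type(value).__name__, arg_name))


def bytesify_input(arg_name, value, bytes_mode, bytes_strictness):
    """Return bytes or None, mirroring the branch structure of the diff."""
    if value is None:
        return value
    if bytes_mode:
        if isinstance(value, bytes):
            return value
        # Text passed in bytes mode: apply policy, then encode.
        _handle_wrong_type(arg_name, value, 'bytes', bytes_strictness)
        return value.encode('utf-8')
    else:
        if isinstance(value, str):  # `unicode` on Python 2
            return value.encode('utf-8')
        # Bytes passed in text mode: apply policy, then pass through.
        _handle_wrong_type(arg_name, value, 'text', bytes_strictness)
        return value


print(bytesify_input('dn', 'cn=admin', bytes_mode=False, bytes_strictness='error'))
print(bytesify_input('dn', b'cn=admin', bytes_mode=False, bytes_strictness='silent'))
```

In every non-raising path the result is bytes or ``None``, which matches the updated docstring.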

def _bytesify_modlist(self, arg_name, modlist, with_opcode):
"""Adapt a modlist according to bytes_mode.
Expand Down Expand Up @@ -1064,7 +1079,7 @@ class ReconnectLDAPObject(SimpleLDAPObject):
def __init__(
self,uri,
trace_level=0,trace_file=None,trace_stack_limit=5,bytes_mode=None,
retry_max=1,retry_delay=60.0
bytes_strictness=None, retry_max=1, retry_delay=60.0
):
"""
Parameters like SimpleLDAPObject.__init__() with these
Expand All @@ -1078,7 +1093,9 @@ def __init__(
self._uri = uri
self._options = []
self._last_bind = None
SimpleLDAPObject.__init__(self,uri,trace_level,trace_file,trace_stack_limit,bytes_mode)
SimpleLDAPObject.__init__(self, uri, trace_level, trace_file,
trace_stack_limit, bytes_mode,
bytes_strictness=bytes_strictness)
self._reconnect_lock = ldap.LDAPLock(desc='reconnect lock within %s' % (repr(self)))
self._retry_max = retry_max
self._retry_delay = retry_delay
Expand All @@ -1097,6 +1114,11 @@ def __getstate__(self):

def __setstate__(self,d):
"""set up the object from pickled data"""
hardfail = d.get('bytes_mode_hardfail')
if hardfail:
d.setdefault('bytes_strictness', 'error')
else:
d.setdefault('bytes_strictness', 'warn')
self.__dict__.update(d)
self._last_bind = getattr(SimpleLDAPObject, self._last_bind[0]), self._last_bind[1], self._last_bind[2]
self._ldap_object_lock = self._ldap_lock()
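
The ``__setstate__`` change keeps objects pickled by an earlier version loadable by mapping the old ``bytes_mode_hardfail`` flag onto the new setting. A small sketch of that mapping as a standalone function, assuming the pickled state is a plain dict (``migrate_pickled_state`` is a hypothetical name, not part of python-ldap):

```python
def migrate_pickled_state(state):
    """Return a copy of `state` with 'bytes_strictness' filled in if missing."""
    state = dict(state)
    hardfail = state.get('bytes_mode_hardfail')
    # setdefault leaves an already-present bytes_strictness untouched.
    state.setdefault('bytes_strictness', 'error' if hardfail else 'warn')
    return state


old_strict = migrate_pickled_state({'bytes_mode_hardfail': True})
old_relaxed = migrate_pickled_state({'bytes_mode_hardfail': False})
new_style = migrate_pickled_state({'bytes_strictness': 'silent'})
print(old_strict['bytes_strictness'], old_relaxed['bytes_strictness'],
      new_style['bytes_strictness'])
```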