Skip to content

Automatic argument conversion based on Python 3 function annotations #2890

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
pekkaklarck opened this issue Jul 4, 2018 · 35 comments
Closed

Comments

@pekkaklarck
Copy link
Member

pekkaklarck commented Jul 4, 2018

If a library is implemented using Java, we automatically convert arguments to correct types based on the types in the keyword signature. Similar support would be very handy also with Python based libraries, but Python 2 doesn't have a built-in way to specify type information.

Python 3 supports function annotations that can be used for various purposes, for example, to provide type hints. We could use annotations to get type information and convert arguments based on that. In practice that would mean that keyword like

def example(argument: int):
    # ...

would get the argument as an integer when used like Example 42 in the test data. This would avoid the need to add argument = int(argument) to the keyword or using Example ${42} in the data.

A problem with Python 3 only solution is that those still using Python 2 don't benefit from it, but this kind of very useful feature would also give a good reason to migrate. Python 2 end-of-life is less than 1.5 years from this day, so spending extra effort to support it doesn't feel beneficial either. A bigger problem is that function annotations are not at all compatible with Python 2, so libraries using them will be incompatible with Python 2 as well. This means Robot's own standard libraries cannot start using this functionality.

Once the support for argument conversion has been added to Python based static libraries, we need to think how to add it to the dynamic library API and to the remote library API. The former has already requested by #2068.

I'd really like to see this already in RF 3.1 and add it to it in high priority. May still be descoped if implementation turns out to be complicated.


UPDATE: We are adding type conversion also based on argument default values (#2932) and explicit information passed via the @keyword decorator (#2947). Also the support for dynamic API (#2068) is under development.

@pekkaklarck
Copy link
Member Author

One thing to consider is that do we want to handle list and dicts somehow specially. If we implement #2699, we need a utility to parse strings to lists/dicts, and using the same utility if someone uses list/dict as the argument type sounds like a good idea. In practice that would mean that keyword like

def example(arg: dict):
    # ...

would get a dictionary as an argument, not a string looking like a dictionary, when used like

Example    {'a': 1, 'b': 2}

@pekkaklarck
Copy link
Member Author

Another thing to consider is how to handle non-string values when a type is specified. Should we validate that the argument is of correct type? If not, should we convert the argument or make it an error?

@basic-settings
Copy link

I think for this case it would make sense to:

  • try to convert to the type
  • if conversion fails, there could be two options:
  1. Raise a warning (Warning: could not convert x to y..) and send the data as string (as before)
  2. Raise an error and fail the test

I would go with option 1 for the sake of backwards compatibility with already available tests.

@pekkaklarck
Copy link
Member Author

pekkaklarck commented Jul 6, 2018

I think that if we try to convert an argument and it fails, we should make it an error similarly if there are other problems with arguments. I think this is how we handle conversion with Java libraries where this functionality already exists. It's a good point that is backwards incompatible, but I doubt such conflict is common. I actually doubt there are many libraries that use annotations in the first place because they couldn't be used at all until RF 3.0.3 (#2627).

The bigger questions is, what arguments should we try to convert. For example, if we have a keyword

def kw(arg: int):
    pass

and it is passed a string, we naturally convert it. But what should we do if it is given a float like 1.5? Should we pass it as-is, convert it to int, or make it an error? I think it would be best to convert also in this case. The user has clearly declared that the argument should be an int, and we should guarantee that.

@basic-settings
Copy link

Convert in any case, as the signature expects a specific format.

@jeroenroosen
Copy link
Contributor

jeroenroosen commented Jul 18, 2018

Regarding the list / dict typing: I've dealt with this before using the List and Dict from the typing module and while you have to use an internal to get the types inside of lists / dicts, it proved to be a workable solution for the project I was working on at the time. I've included a runnable snippet below, perhaps this might offer some inspiration.

p.s. Using that internal actually breaks somewhere between Python 3.3 and 3.5.2, as it moved from __parameters__ to __args__, afaik this module still does not expose these properly - but perhaps I'm just not seeing the proper solution here.

import inspect
from typing import List, Dict


def f(list_arg: List[int], dict_arg: Dict[str, int], int_arg: int, default_arg):
    pass

signature = inspect.signature(f)

for name in list(signature.parameters.keys()):
    type_ = signature.parameters[name].annotation
    if type_ is inspect._empty:
        print(name, "defaults to", str)
    elif issubclass(type_, list):
        print(name, "is a list of", getattr(type_, "__args__", str))
    elif issubclass(type_, dict):
        print(name, "is a dict of", getattr(type_, "__args__", (str, str)))
    else:
        print(name, "is a", type_)

Output:

list_arg is a list of (<class 'int'>,)
dict_arg is a dict of (<class 'str'>, <class 'int'>)
int_arg is a <class 'int'>
default_arg defaults to <class 'str'>

@pekkaklarck
Copy link
Member Author

Thanks @jeroenroosen, that's very interesting! If we want to do validation for list/dict items we need to do something like this, but at least in the beginning just making sure keywords accepting lists/dicts get correct types might be enough. I've just got named-only argument support finalized and plan to look at this topic next.

@pekkaklarck
Copy link
Member Author

Just realized we could do type conversion also based on argument default values. For example, if we have a keyword with a signature def example(good=False), we can see that good is a Boolean and could do type conversion based on that. This would be very handy and work also with Python 2 code, but could also cause backwards compatibility problems. It would also mean we needed to be more careful when designing the rules for conversion.

@basic-settings
Copy link

basic-settings commented Jul 19, 2018

This is but problematic with code that defaults to None, which is allowed in Python.

Take this for example:

def spam(eggs: int=None):
    print(eggs, type(eggs))
    
spam()
spam(3)

will output:

None <class 'NoneType'>
3 <class 'int'>

In cases like these I would expect types to not change if the method is called without arguments, but change if called with something like a float.

@jeroenroosen
Copy link
Contributor

jeroenroosen commented Jul 19, 2018

I'm not too familiar with RF internals to make a proper judgement of this, but wouldn't this be quite inconsistent? arg=None comes to mind.

Edit: @basic-settings beat me to it :)

@pekkaklarck
Copy link
Member Author

How to handle None is an important question in general. I think that regardless what type is specified, we need to pass actual Python None object as-is and also convert string None to a None object.

How to handle numbers is another important question. I guess we could argue that if a keyword is defined like spam(eggs: int) and called with a float, converting the float to an integer makes sense. On the other hand, if a keyword is defined just like spam(eggs=0), converting 0.5 to 0 doesn't sound good and would also be badly backwards incompatible. I'm starting to think we shouldn't convert between numerical types.

Perhaps it would be safest to do conversion only if the passed argument is a string. The main motivation of this enhancement is to automatically allow using Spam 42 and not Spam ${42} in the test data. Doing conversion if you explicitly use Spam ${42.5} would probably cause more problems than have benefits. Not doing conversion if the argument is not a string would also avoid problems in cases like def maps(eggs: dict) when used like Maps ${my custom mapping}.

@pekkaklarck
Copy link
Member Author

Another design decision related to handling numbers. Let's assume we have keywords

def int_type(arg: int):
    # ...

def int_default(arg=0):
    arg = float(arg)
    # ...

and they are used like

Int type       1.5
Int default    1.5

How should this situation be handled? In the former case it feels pretty logical to consider this an error, but with the latter case that would be a backwards incompatible change. Possible solutions I see:

  1. Don't do any automatic conversion based on default values. I don't like that.

  2. Different conversion logic based on is type information got from explicit annotations or from default values. Possible but adds extra code and makes logic somewhat inconsistent.

  3. If type is int, first try to convert the string to an int and then try float if it fails. In other words, both the above examples would work. It can be considered a problem that if there is an explicit annotation arg: int we still pass a float. Personally I don't think that's a big problem because a) Python in general doesn't enforce types, b) I don't see any real problems caused by this, and c) it's easy to add assert isisntance(arg, int) into the keyword if an int is absolutely mandatory.

What do others think about this?

@basic-settings
Copy link

+1 for passing None as the actual Python object, it would make life much easier in all cases

I would argue that if a method does not expose type hinting then conversion shouldn't be performed at all and arguments are sent as defined in the test scripts. I've seen devs validating arg value with if not arg: which works both for 0 and for None, so cases like method(arg=0) are not uncommon (although not particularly nice as opposed to method(arg=None)).

It could also mean that the method doesn't really care if the type is int, str or x as it will do internal casting, or use the value as-is.

@jeroenroosen
Copy link
Contributor

I wrote this earlier but got a little distracted before posting, so I ended up repeating you a little:

I agree that it would make sense to only convert str values, especially if you consider that the more complex types (like dicts, but also Selenium's WebElements come to mind) will usually already be of the proper type for their respective contexts.

Regarding the Python 2 solution, I think it's a bad idea to make assumptions about the type of an argument based on the default value, as you cannot say with certainty whether this is the only or one of many accepted data types, and the function might do conversion differently than RF might (whether this is proper keyword programming is a different discussion.)

To get back to the origin of this issue: in Java you can convert because you know the function's signature. In Python 2 you simply don't have these typing hints (unless you were to e.g. decorate the keyword functions and annotate them that way) so whatever you do is a guess at best: that def kw(arg=0) may actually be intended as a float, so even if its input is a str I don't think it's wise to naively convert it to an int.

The same is of course true for Python 3 functions without type hints (or typing.Any.)

As for adoption of this feature in Python, only converting type hinted arguments might also be a good way to gradually introduce this feature into libraries without causing too many compatibility issues and not having to deal with Python 2 at all.

@pekkaklarck
Copy link
Member Author

pekkaklarck commented Jul 19, 2018

We seem to agree that only strings should be converted and that 'None' should be converted to None, but there's a clear disagreement should we do type conversion also based on default values or not. I agree conversion based on defaults is more error prone, but I also think it would be very useful and ought to be doable so that it is safe. Let's anyway make it so that this issue deals about conversion based on type annotations (as the title of the issue says) and then have another issue about conversion based on defaults. Easier to talk about that once this basic issue is done first anyway.

@pekkaklarck
Copy link
Member Author

pekkaklarck commented Jul 19, 2018

Next question, which types should be convert automatically and how? I'm currently thinking the following. Notice that all these expect the original value to be a string. Also, NONE case-insensitively converted to None object in all cases.

  1. int: Both integers and floats. Convert to float only if converting to integer fails. Other values cause an error.
  2. float: Both integers and floats. Always converted to a float. Other values cause an error.
  3. bool: TRUE and FALSE case-insensitively. Other values passed as-is.
  4. list/tuple/dict/set: Anything that ast.literal_eval converts into a list/tuple/dict/set. Other values cause an error.

In addition to these we probably could consider at least bytes and datetime/data/timedelta but I'm not sure are used enough to be worth the effort. Could obviously be added later.

@basic-settings
Copy link

  1. +0.5
  2. +1
  3. +1
  4. +1

I would personally chose the more explicit (strict) approach to convert to the specified type and if that fails then error out. More precise, for point 1. if int conversion fails then error out.

My reasoning for the prevention of type conversion on point 1. to float is that we work with embedded systems and writing (or trying to write) non-ints to sysfs devices (e.g,. GPIOS, network devices, etc.) would have problematic consequences (e.g. unwanted failures/exceptions/more time spent to analyze and debug).

For example:

# echo 1.2 > value | cat value
-sh: write error: Invalid argument
0
# echo 1 > value | cat value
1

Invalid argument is not the most explicit error description and even more discerning is that the example below works fine due to expansion:

# echo "0" > value | cat value
0

Hence the dynamic library (in our case) will do strict type checking and prevent calling the method. I decided not to perform type conversion and just assert to return the responsibility to create proper test data to the tester and not rely on (too many) internal type conversions.

We could also create a slight overhead with a --allow-lazy-type-conversion flag, which would then bypass strict checking and try to convert to float for point 1. in case of failure to convert to int.

@jeroenroosen
Copy link
Contributor

jeroenroosen commented Jul 20, 2018

+1 for simple types such as int, float, boolean. Also +1 for strict conversion only.

For boolean, I would also be in favour of converting 0 and 1 to False and True respectively.

The problem I see with bytes is that you would have to make assumptions about the encoding (although I suppose the default UTF-8 would be fine), but it's still an assumption, and I don't really like that.

For the same reason I think DateTime could be iffy, but I suppose any input that's valid for DateTime's constructor would do... (edit:) the problem I see here is that most users would use formats from e.g. the RF DateTimeLibrary, so it would also have to be compatible with that? This might get complicated very fast.

By the way, converting to enums is quite easy to do (and relevant for some of our in-house keywords) so I'd love to see that included:

if isinstance(arg_type_from_signature, EnumMeta):
    return arg_type_from_signature[str_value]

@basic-settings
Copy link

basic-settings commented Jul 20, 2018

Conversion of 0 and 1 to True and False is an interesting proposition, especially since in Python bool is a subclass of int (i.e. isinstance(True, int) is True).

What's also interesting from a language perspective, you can change to bool anything, with 0 being False and everything else True.

Here are some more interesting use cases for bool conversion:

>>> a = bool(0)
>>> a
False
>>> a = bool(30)
>>> a
True
>>> a = bool(-2)
>>> a
True
>>> a = bool(0.0)
>>> a
False
>>> a = bool('asd')
>>> a
True
>>> a = bool('0')
>>> a
True

@jeroenroosen
Copy link
Contributor

I brought up 0 and 1 because these are well known synonyms for True and False. I don't think it's a good idea to convert just anything: bool("False") is True after all, so the behaviour would be inconsistent if we were to evaluate that as False instead.

I was more thinking among the lines of ConfigParser.getboolean, as those conversions feel pretty natural to me.

@pekkaklarck
Copy link
Member Author

pekkaklarck commented Jul 20, 2018

Thanks for the feedback! Some new comments:

  1. I don't feel strongly should we try to convert to a float if the type is specified to be an int and int conversion fails. I guess it would be more correct to be strict, but I don't see much harm accepting strings like 1.2 as well, especially when we pass actual floats (e.g. ${1.2}) forward as-is anyway. If we do add type conversion also based on default values, there float conversion in this case is more important to avoid backwards incompatibility issues. We already agreed not to talk about that in this issue, though.

  2. I definitely don't want command line options to control this behavior.

  3. Converting strings 0 and 1 to False and True, respectively, can be considered. That's something that could actually be added to our existing utils.is_truthy utility that is widely used by Robot itself as well as many external libraries like SeleniumLibrary and SSHLibrary. If we'd use is_truthy for conversion here, then also NO would be considered False which ought to be fine. As I already commented, other strings should be passed as-is without any errors.

  4. If we handle bytes, then we probably should do direct Unicode code point to byte conversion. You could then represent the null byte with \x00, byte 228 as \xe4 or ä, and so on. Bytes above 255 would naturally cause an error. In practice this would mean encoding the string using Latin-1.

  5. If we'd handle datetime.datetime/date/timedelta, we should use our own DateTime library for conversion. That would be consistent with how many libs currently handled dates and times and would also make using dates/times pretty convenient.

  6. Support for enums sounds like a good idea. Perhaps @jeroenroosen can provide a pull request adding that once the functionality is ready otherwise. It ought to be easy to add conversion for new times in general.

  7. We could also consider supporting Decimal.

@pekkaklarck
Copy link
Member Author

My prototype code worked so well that I decided to push it into a remote branch for others to see and test. Easier to discuss about the details when we have something concrete that actually works.

I won't have time to continue working with this more during the weekend, other than possibly answering questions via mobile. Next week I ought to have some time, but I also need to continue our woodshed project at the summer cottage. =)

@jeroenroosen
Copy link
Contributor

I added enum support as well.

@pekkaklarck
Copy link
Member Author

Thanks @jeroenroosen! I added few comments to your PR but in general the enum support looks nice and is definitely worth adding. I'm still on holiday next week and unless it gets rainy I won't be coding much. I'll return back to this more actively the week after.

pekkaklarck added a commit that referenced this issue Aug 8, 2018
@pekkaklarck
Copy link
Member Author

Been implementing type conversion based on default values (#2932) and decided to do some changes to the underlying conversion logic:

  1. If type is explicitly specified to be int using annotations, don't try converting to float if int conversion fails. This will work differently if the type is got implicitly from default values.

  2. Add support for bytearray. Main reason is to support bytes with Python 2 where the distinction between "normal" bytes and strings is not clear.

  3. Support type(None) (and possibly also plain None) to indicate that string NONE should be converted to the None object.

I was also thinking should we make bool conversion stricter and only support known false and true strings making other strings failures. This would be consistent with the stricter int conversion as well as with how ConfigParser.getboolean behaves, but it would limit the possibility to consider also some other strings false. For example, some of the Robot's own BuiltIn keyowrds consider no values false when used with the values argument and in non-English environments using something like ei (Finnish) or (Chinese) could be nice. In the end I considered the benefits of passing non-recognized strings as-is greater than stricter validation.

pekkaklarck added a commit that referenced this issue Aug 16, 2018
- Support implicit conversion based on argument default values (#2932)
- Fix using varargs when converting arguments
- Add support for bytearray, frozenset and NoneType
- Don't convert to float if int conversion fails when type information
  is got explicitly from annotations
pekkaklarck added a commit that referenced this issue Aug 20, 2018
- Convert varargs and kwargs with annotations #2890
- Convert kwonly args based on default values #2932
pekkaklarck added a commit that referenced this issue Aug 26, 2018
- In ArgumentSpec defaults and kw-only-defaults are stored in same
  mapping. Earlier they were stored separately and defaults were a
  list. This is backwards incompatible compared to earlier releases,
  but this is also internal code and the change happens in a major
  release.

- In ArgumentSpec 'annotations' was changed to 'types' to communicate
  what we store there. 'kwonlydefaults' and 'reqkwargs' were removed
  these changes only affect code added after the latest release.

- Non-type function annotations are ignored. This may still change.

- General clean-up made easier by the above changes.

To some extend related to type conversion #2890 and #2932.
pekkaklarck added a commit that referenced this issue Aug 26, 2018
Converters are now classes. This design also makes it easy to
support custom converters if we want to.
pekkaklarck added a commit that referenced this issue Aug 30, 2018
This includes values that aren't even types. Naturally only types we
recognize can be used for type conversion.

Affects #2890 and #2947.
pekkaklarck added a commit that referenced this issue Aug 30, 2018
- Non-types don't cause error (#2890 and #2947)
- Invalid type info passed via @Keyword causes explicit errors (#2947)
pekkaklarck added a commit that referenced this issue Aug 31, 2018
Types set via @Keyword decorator (#2947) override types set via
annotations (#2890). This includs empty types ({} or []) but not
None which is the default set by @Keyword.
pekkaklarck added a commit that referenced this issue Aug 31, 2018
Return value type/annotation isn't used for anything yet, but may be
used in the future by Libdoc. Testing that specifying the value doesn't
cause errors. Related to #2890 and #2947.
pekkaklarck added a commit that referenced this issue Aug 31, 2018
- Old "enum" module is supported along with newer "enum34" and
  the standard module in Python 3.4+. Old code didn't work with
  the old module.
- Tests requiring enum are tagged with "require-enum".
- Enum requirement is documented in atest/README.rst and "enum34"
  was conditionally (< 3.0) added to atest/requirements.txt.
- At the same time made lxml installation in atest/requirement.txt
  conditional (CPython and PyPy).

Related to type conversion issues #2890 and #2947.
@pekkaklarck
Copy link
Member Author

Thought this issue is ready, except for writing documentation and possibly slightly enhanced error reporting, but then executed tests on Python 3.7 and noticed that the typing module has changed so that the current implementation doesn't recognize the types it exposes.: https://bugs.python.org/issue34568

Unless the old behavior of the typing module is restored or there's some other clean way to recognize these types, it might be easiest just to explicitly document we don't support it.

pekkaklarck added a commit that referenced this issue Sep 13, 2018
- collections.abc.Sequence is now mapped directly to list, not to list
  or tuple.
- collections.abc.Iterable is not supported anymore.
- Explicit tests for Integral, Real and ByteString.
- Test cleanup.
@pekkaklarck
Copy link
Member Author

pekkaklarck commented Sep 26, 2018

PEP-563 added from __future__ import annotations in Python 3.7 that changes how annotations are evaluated. We need to take that into account but that can be done after alpha 2. Reopening this to make sure we don't forget about it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

4 participants