Skip to content

[Bug/TYP]: misc type errors #26858

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
twoertwein opened this issue Sep 21, 2023 · 12 comments
Open

[Bug/TYP]: misc type errors #26858

twoertwein opened this issue Sep 21, 2023 · 12 comments

Comments

@twoertwein
Copy link

twoertwein commented Sep 21, 2023

Bug summary

Pandas uses mypy/pyright to check its internal code. We currently pin matplotlib <3.8 to avoid many type errors pandas-dev/pandas#55210 Many of these errors are definitely on pandas's side :) but a few might be bugs in matplotlib.

Code for reproduction

# Simplified examples from pandas that trigger type errors

def format_date_labels(ax: Axes, rot) -> None:
    for label in ax.get_xticklabels():
        label.set_ha("right") # error: "Text" has no attribute "set_ha"  [attr-defined]


def test(cellText: np.ndarray):
    # error: Argument "cellText" to "table" has incompatible type "ndarray[Any, Any]"; expected "Sequence[Sequence[str]] | None"  [arg-type]
    # maybe use a Protocol such as Iterable?
    matplotlib.table.table(ax, cellText=cellText **kwargs)


def decorate_axes(ax: Axes, freq, kwargs) -> None:
    # The following might just be bad pandas code
    # error: "Axes" has no attribute "freq"  [attr-defined]
    ax.freq = freq
    xaxis = ax.get_xaxis()
    xaxis.freq = freq
    if not hasattr(ax, "legendlabels"):
        ax.legendlabels = [kwargs.get("label", None)]
    else:
        ax.legendlabels.append(kwargs.get("label", None))

# error: Argument 1 to "FixedLocator" has incompatible type "ndarray[Any, Any]"; expected "Sequence[float]"  [arg-type]
FixedLocator(ax.get_xticks()) 

# same also for set_ylabel, maybe use a protocol for str-coercible types instead of requiring str
def test(ax: Axes, label: Hashable): # Hashable might be too wide, but str seems to strict
    ax.set_xlabel(label)

def test(ax: Axes):
    axes = np.array([ax])
    # error: Argument 1 to "set_ticks_props" has incompatible type "ndarray[Any, dtype[Any]]"; expected "Axes | Sequence[Axes]"  [arg-type]
    set_ticks_props(axes, **kwargs)

Actual outcome

the above type errors

Expected outcome

Fewer errors, but still some errors - some of the pandas code is definitely to be blamed!

Additional information

No response

Operating system

No response

Matplotlib Version

3.8

Matplotlib Backend

No response

Python version

No response

Jupyter version

No response

Installation

None

@tacaswell
Copy link
Member

# same also for set_ylabel, maybe use a protocol for str-coercible types instead of requiring str
def test(ax: Axis, label: Hashable): # Hashable might be too wide, but str seems to strict
    ax.set_xlabel(label)

def test(ax: Axis):
    axes = np.array([ax])
    # error: Argument 1 to "set_ticks_props" has incompatible type "ndarray[Any, dtype[Any]]"; expected "Axes | Sequence[Axes]"  [arg-type]
    set_ticks_props(axes, **kwargs)

These should be Axes not Axis ?

@twoertwein
Copy link
Author

twoertwein commented Sep 21, 2023

Thank you - that might be my copy&past mistakes. The main point is that it doesn't work with an np.ndarray that contains the appropriate type.

(Sorry, not familiar with matplotlib and how pandas uses matplotlib)

edit: updated the examples

@oscargus
Copy link
Member

oscargus commented Sep 21, 2023

def test(cellText: np.ndarray):
    # error: Argument "cellText" to "table" has incompatible type "ndarray[Any, Any]"; expected "Sequence[Sequence[str]] | None"  [arg-type]
    # maybe use a Protocol such as Iterable?
    matplotlib.table.table(ax, cellText=cellText **kwargs)

This is documented as str all the way down to the Text object, where set_text will apply str on whatever comes in. So this can be changed, both in the documentation and in the typing stubs.

I'd suspect the same thing holds for set_*label.

The natural thing is to provide str here, but, as you point out, anything coercible to a str can be used. Which then is everything, so Any is correct, but still maybe not what we would like to encourage...

@ksunden
Copy link
Member

ksunden commented Sep 21, 2023

  • set_ha is a dynamically generated alias of set_horizontalalignment, so type-checked code should probably use the statically defined canonical version. (Without effectively undoing the reasons for having the dynamic alias helpers, we can't type hint aliases appropriately)

  • np.ndarray not being compatible with Sequence[Sequence[...]], maybe we can add an overload or something, but the key to when the latter is used is that ragged data structures are allowed, so ArrayLike and other nd-array type hints are insufficient themselves. Not 100% clear how to square that circle here.

  • ax.freq is not an attribute that is present on Axes or Axis objects by default, I don't see anywhere in the matplotlib codebase where that is added, I think that one is correct to flag as a type checker. Pandas is adding a dynamic attribute, here, I think. (same with legendlabels, actually)

  • FixedLocator, is partially that we do not specify the dtype on return of get_xticks, which we probably can do, but also np.NDArray[np.float64], still doesn't fully qualify as a Sequence, I guess, so similar to the Sequence[Sequence[...]] ones, may need to overload/union type (which is not ideal for maintainability) or figure out if there is a way to enforce that NDArray is a sequence (I think zero-D arrays are what throws that for a bit of a loop)... I thought NDArray met the definition of Sequence when I was writing the type hints initially, though, so I'm slightly confused by this.

  • Labels, I think the actual "proper" type hint might be:

class SupportsStr(Protocol):
   def __str__(self) -> str: ...

(the stdlib typing module provides similar protocols for int, float, etc, but not actually str, probably because object implements __str__)

Which effectively reduces to Any (aside from things that actively remove __str__, since object defines __str__)

I will note that in the case of set_xlabel, the docstring actually does declare the officially supported type as str, though (as does Axis.set_label_text), though it is eventually handed to Text.set_text which is actually type hinted as Any/docstring defined as object. So I was following the docstring here, but there is a reasonable argument that those could be expanded.

@ksunden
Copy link
Member

ksunden commented Sep 21, 2023

ndarray seems to not qualify under Sequence because it does not implement index or count, which are methods defined for Sequence, but when used as a parent class are implemented for you.

There does not seem to be another good ABC to use between Collection (which doesn't define __getitem__) and Sequence (which requires index/count)

I know numpy is trying to reduce the number of methods rather than increase it, but perhaps there is a compelling case to be made for these... though not fully sure index at least can be made sensibly such that type checkers will agree that it is implemented according to Sequence (though at least at runtime, no actual annotations are reported for Sequence methods, so who knows?)

@ksunden
Copy link
Member

ksunden commented Sep 21, 2023

@twoertwein Can you help me to run the type checks against pandas locally?

I had tried quite a bit earlier in the process running against pandas-stubs and it did not flag anything from mpl, so I moved on (and therefore didn't give you all as much of a warning as I did to some other projects where I did find things flagged, so sorry to have missed it, I just got a little hopeful that things just ended up working out even better than I thought)

Even now, if I do mypy pandas-stubs from that repo (with mpl git version), I don't see anything, and if I do mypy pandas from the main repo (with stubs installed) I see a lot of unrelated errors, mostly for Need type annotation, though I do additionally see some things related to Axes, though they all seem like legitimate flags for a static type checker (mostly that dynamic attributes seem to be added to them, like the examples regarding freq above.

I also tried mypy -m pandas to see if that changed anything, and didn't get any flags?

Edit: after writing this, I noticed the tests folder in the pandas-stubs repo, but even running those gave nothing related to mpl (some failures due to lack of optional dependenciees and some np.bool_ vs bool builtin errors)

Double edit: I think I got there, turns out git pull helps reduce the unrelated things. (I had pulled on stubs but not the main repo...) I guess I'm still a bit surprised that mypy doesn't flag anything when run on the stubs.

@twoertwein
Copy link
Author

twoertwein commented Sep 22, 2023

@twoertwein Can you help me to run the type checks against pandas locally?

The pandas mypy/pyright checks do not use pandas-stubs. It is best to follow the instructions here to set the development environment up. After that you can simply call mypy.

Thank you for spending much time on this summary. Sorry, I didn't intend to consume your time by debugging the issues over at pandas - I'm happy to cc you for targeted questions though :)

edit: also feel free to ping me if you like me to test the impact of important/big matplotlib type changes on internal pandas code.

@twoertwein
Copy link
Author

Thank you very much for the quick and active response! I will work on pandas PRs to get the internal pandas code (more) compliant with the matplotlib annotations (might take a bit).

Sorry, I'm not sure how actionable this issue is on the matplotlib side (obviously many todos on the pandas side). Feel free to close it.

@twoertwein
Copy link
Author

About ndarray and Sequence: Sequence is a concrete class, not a protocol. Unless ndarray inherits from it, it will never match it.

@ksunden
Copy link
Member

ksunden commented Sep 22, 2023

I mean, Sequence is very much an Abstract class, but you are correct that it does require inheritance to be considered (well, runtime you can register it... and in fact list/tuple/etc do not actually inherit from it, though they count for mypy purposes... I had been thinking that mypy had special handling that effectively treated it as a protocol, even though I knew it wasn't actually one at runtime, but I was in fact mistaken)

@BvB93
Copy link

BvB93 commented Sep 25, 2023

I mean, Sequence is very much an Abstract class, but you are correct that it does require inheritance to be considered...

I recall some discussions about changing this way back in the past, though nothing has come for it. If you do really want to express ndarray as a sequence you could consider using a slightly different definition: that of the sequence protocol as defined by the Python C API: https://docs.python.org/3/c-api/sequence.html#c.PySequence_Check. Not sure to what extent this would be practically useable in matplotlib, but this should take care of the whole "ndarray is not an abc.Sequence member"-problem.

from typing import Protocol, TypeVar

_T_co = TypeVar("_T_co", covariant=True)

class _PySequence(Protocol[_T_co]):
    def __len__(self) -> int: ...
    def __getitem__(self, __key: int) -> _T_co: ...

@twoertwein
Copy link
Author

There is also a Sequence-like protocol that disallows strings hauntsaninja/useful_types#14 It could probably be adjusted to also fit ndarray.

(Pandas has the same issue: Index and Series are incompatible with Sequence.)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

6 participants