Skip to content

Fix/26837 format array #44527

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed

Conversation

TomAugspurger
Copy link
Contributor

@TomAugspurger TomAugspurger commented Nov 19, 2021

The basic strategy is to move format_array from pandas.io.formats.format to ExtensionArray._format_array. By default, we convert to ndarray and then use the existing formatting mechanism on the ndarray.

Tom Augspurger added 2 commits November 19, 2021 11:25
Moves responsibility for converting EAs to a List[str] from
pandas.io.formats to a method on the EA.
leading_space,
quoting,
)
elif is_datetime64_dtype(values.dtype):
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I should check to see if these datetlike are actually hit anymore.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yah. i expect that if you use isinstance(values, ExtensionArray) instead of is_extension_array_dtype above you can get the dt64 and td64 cases at the same time

@@ -681,6 +685,43 @@ def _format_native_types(
self.asi8, tz=self.tz, format=fmt, na_rep=na_rep
)

def _format_array(
self,
formatter: Callable | None,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

im not too familiar with how this gets reached. is e.g. formatter going to always be self._formatter?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you're right.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

then can we do without the formatter arg?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry I was incorrect. formatter might be a user-supplied callable in cases like df.to_html(formatter={"col": formatter}).

@jreback jreback added Output-Formatting __repr__ of pandas objects, to_string Refactor Internal refactoring of code labels Nov 20, 2021
@jorisvandenbossche jorisvandenbossche added the ExtensionArray Extending pandas with custom dtypes or arrays. label Nov 22, 2021

from pandas.io.formats.format import format_array

values = extract_array(self, extract_numpy=True)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why would we need to extract_array? i guess for PandasArray?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's my guess too.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you check if its really necessary and if so, add a comment as to why

@jbrockmendel
Copy link
Member

Can we use this to render unnecessary any of the other formatting-related methods out there? in particular im thinking of

  • DTA/TDA/PA._format_native_types
  • DTI/TDI/PI._format_with_header (matches Index._format_with_header but with different na_rep and extra kwarg)
  • IntervalIndex._format_native_types (matches Index._format_native_types but with different default value for na_rep)
  • IntervalIndex._format_data (# TODO: integrate with categorical and make generic)

@TomAugspurger
Copy link
Contributor Author

I'm not sure, but it's not obvious to me. _format_array is designed to be generic. I would assume those are more specific and rely on some structure that can't be assumed in _format_array.

decimal: str = ".",
leading_space: bool | None = True,
quoting: int | None = None,
) -> list[str]:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same comment this seems like adding a lot of boilerplate that could be handled in the base class no?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is slightly different than the Categorical case. Categorical wants to change the values passed to the fmt_klass. This is actually changing the fmt_klass itself.

We could add some method to the interface to get the formatting class for an array. I don't really think that we want to publicly expose the ArrayFormatter interface publicly though.

If it's purely about the lines of code here, we could have a "private" _fmt_klass on our DatetimeArrays and check for that attribute, and use it in the base class.

# in ExtensionArray._format_array
if hasattr(self, "_format_class"):
    fmt_klass = self._format_class
else:
    fmt_klass = GenericArrayFormatter

I dunno. This is all kind of messy.

self,
formatter: Callable | None,
*,
float_format: FloatFormatType,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should we take this opportunity to make a FormatOptionsType that has all of these defaults?

Copy link
Member

@jorisvandenbossche jorisvandenbossche left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you explain how this related to the existing _formatter mechanism?
(here in the PR for reviewing purpose, but it will also need to explained in the implementer's notes in arrays/base.py I think)

@github-actions
Copy link
Contributor

github-actions bot commented Jan 1, 2022

This pull request is stale because it has been open for thirty days with no activity. Please update or respond to this comment if you're still interested in working on this.

@github-actions github-actions bot added the Stale label Jan 1, 2022
@jreback
Copy link
Contributor

jreback commented Jan 16, 2022

@TomAugspurger can you merge main

@mroeschke
Copy link
Member

Thanks for the PR, but it appears to have gone stale. Feel free to reopen when you have time to revisit.

@mroeschke mroeschke closed this Mar 6, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
ExtensionArray Extending pandas with custom dtypes or arrays. Output-Formatting __repr__ of pandas objects, to_string Refactor Internal refactoring of code Stale
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Refactor format_array in pandas\io\formats\format.py
5 participants