BUG: Impossible creation of array with dtype=string #61263
…cation issues (pandas-dev#60954) with changes
Co-authored-by: Matthew Roeschke <10647082+mroeschke@users.noreply.github.com>
Also, please add a test for this.
pre-commit.ci autofix
for more information, see https://pre-commit.ci
We use pytest for testing, you'll need to add a test using that format. See here:
https://pandas.pydata.org/pandas-docs/dev/development/contributing_codebase.html#using-pytest
The general pytest introduction may also be useful:
Thank you for the details, will work on it.
@rhshadrach I’ve been testing the following case in
However, the test fails with So currently, the list of lists gets converted into a 1D NumPy array of strings. Thanks!
I believe converting to a 1-dimensional ndarray of strings is the expected behavior of
Thanks for the clarification! You're right — the behavior of
My initial assumption was that it should preserve the list structure instead of converting to strings, but after re-evaluating and running the test, I see that the 1D array of strings is indeed the intended behavior. I have updated the test, and it now passes with the output below.
Please let me know if I need to change anything.
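As an illustration of the behavior discussed above, here is a minimal pure-Python sketch. The helper `stringify_elements` is hypothetical and is not the pandas implementation; it only assumes that each top-level element is converted with `str()`, so a list of lists becomes a flat, 1D sequence of strings rather than keeping its nested structure.

```python
# Illustrative stand-in only; this is NOT pandas code.
def stringify_elements(values):
    """Return a flat list in which every top-level element is passed through str()."""
    return [str(item) for item in values]


nested = [list("test"), list("word")]
result = stringify_elements(nested)

# Each inner list is now a single string such as "['t', 'e', 's', 't']".
print(result)
```

The point of the sketch is that the result has one entry per inner list, which is why the outcome is one-dimensional.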
@Manju080 - the last change I'm seeing is from 3 weeks ago. Perhaps you need to push some commits?
That's right, I just wanna make sure before committing the changes.
…o bugfix-61155
Looking good!
Co-authored-by: Richard Shadrach <45562402+rhshadrach@users.noreply.github.com>
Apologies for causing confusion; I will work on a fix.
@rhshadrach Thank you very much, the required changes are done.
pre-commit.ci autofix
lgtm
Thanks @Manju080
* DOC: Update warning in Index.values docstring to clarify index modification issues (pandas-dev#60954)
* DOC: Update warning in Index.values docstring to clarify index modification issues (pandas-dev#60954) with changes
* Update pandas/core/indexes/base.py
  Co-authored-by: Matthew Roeschke <10647082+mroeschke@users.noreply.github.com>
* DOC : Fixing the whitespace which was causing error
* Fixed docstring validation and formatting issues
* BUG: Fix array creation for string dtype with inconsistent list lengths (pandas-dev#61155)
* BUG: Fix array creation for string dtype with inconsistent list lengths (pandas-dev#61155)
* BUG fix GH#61155 v2
* BUG fix GH#61155 with test case for list of lists handling
* Fix formatting in test_string_array.py (pre-commit autofix)
* [pre-commit.ci] auto fixes from pre-commit.com hooks
  for more information, see https://pre-commit.ci
* Add test for list of lists handling in ensure_string_array (GH#61155)
* fixing checks
* [pre-commit.ci] auto fixes from pre-commit.com hooks
  for more information, see https://pre-commit.ci
* Update pandas/tests/libs/test_lib.py
  Co-authored-by: Richard Shadrach <45562402+rhshadrach@users.noreply.github.com>
* Remove pandas/tests/arrays/test_string_array.py as requested
* wrong fiel base.py
* Remove check for nested lists in scalars in string_.py first try
* Revert unintended changes to base.py
---------
Co-authored-by: Matthew Roeschke <10647082+mroeschke@users.noreply.github.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Richard Shadrach <45562402+rhshadrach@users.noreply.github.com>
closes #61155
Hello @rhshadrach,
I’ve created a fix that raises a ValueError when trying to create a StringArray from a list of lists with inconsistent lengths or non-character elements. This aligns the behavior for consistent and inconsistent input formats, and I have tested it.
I would like to hear your opinion on raising an error whenever a list of lists is passed with dtype=StringDtype, to avoid ambiguous behavior. If preferred, we could instead join the inner lists into strings automatically; I'm happy to adjust based on guidance.
Example case:
pd.array([["t", "e", "s", "t"], ["w", "o", "r", "d"]], dtype="string")
output : <StringArray> ['test', 'word'] Length: 2, dtype: string
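The two options mentioned above (raising an error versus joining the inner lists) can be sketched in plain Python. Both helpers, `reject_nested_lists` and `join_inner_lists`, are hypothetical names introduced only for illustration; neither is part of pandas, and the actual change may be implemented differently.

```python
# Hypothetical helpers for illustration; neither is part of pandas.
def reject_nested_lists(values):
    """Option 1: raise if any top-level element is itself a list."""
    for position, item in enumerate(values):
        if isinstance(item, list):
            raise ValueError(
                f"element {position} is a list; expected a scalar string"
            )
    return list(values)


def join_inner_lists(values):
    """Option 2: join each inner list of strings into a single string."""
    joined = []
    for item in values:
        if isinstance(item, list):
            if not all(isinstance(part, str) for part in item):
                raise ValueError("inner lists must contain only strings")
            joined.append("".join(item))
        else:
            joined.append(item)
    return joined


nested = [["t", "e", "s", "t"], ["w", "o", "r", "d"]]
joined_result = join_inner_lists(nested)  # ['test', 'word']

try:
    reject_nested_lists(nested)
    rejected = False
except ValueError:
    rejected = True  # option 1 refuses the same input
```

Option 1 keeps the behavior unambiguous at the cost of rejecting input some users may expect to work; option 2 is more permissive but silently changes the shape of the data.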
Thanks