Skip to content

Wide arg read csv excel (issue #574) #588

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged

Conversation

alixdamman
Copy link
Collaborator

Documentation for argument wide not yet written. Will write documentation when implementation is OK.

@alixdamman
Copy link
Collaborator Author

@gdementen add argument wide to Session.save and load (assuming all arrays are stored in wide/narrow format)?

@@ -306,6 +317,8 @@ Miscellaneous improvements
Closes :issue:`549`.



Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

isn't this a bit too much blank lines?

@@ -205,6 +205,9 @@ def df_aslarray(df, sort_rows=False, sort_columns=False, raw=False, parse_header
parse_header : bool, optional
Whether or not to parse columns labels. Pandas treats column labels as strings.
If True, column labels are converted into int, float or boolean when possible. Defaults to True.
wide: bool, optional
...
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TODO

if not wide:
if nb_axes is not None:
nb_axes = None
warnings.warn("`nb_axes` argument cannot be used when `wide` argument is False")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

shouldn't this be an exception instead of a warning?

warnings.warn("`nb_axes` argument cannot be used when `wide` argument is False")
if index_col is not None:
index_col = None
warnings.warn("`index_col` argument cannot be used when `wide` argument is False")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

idem

@@ -288,6 +327,9 @@ def read_csv(filepath_or_buffer, nb_axes=None, index_col=None, sep=',', headerse
sort_columns : bool, optional
Whether or not to sort the columns alphabetically (sorting is more efficient than not sorting).
Defaults to False.
wide : bool, optional
...
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TODO

@@ -540,6 +607,9 @@ def from_lists(data, nb_axes=None, index_col=None, fill_value=np.nan, sort_rows=
sort_columns : bool, optional
Whether or not to sort the columns alphabetically (sorting is more efficient than not sorting).
Defaults to False.
wide: bool, optional
...
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TODO

@@ -1,2 +1,2 @@
time,2007,2010,2013
a,a0,a1,a2
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

using a0 etc... is probably a good idea, but then we need another test for "int" labels (to check they are parsed to int correctly and do not stay strings). This could use from_string or from_list instead of a file though.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Isn't already the case for from_lists and from_string?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the from_list test does not count as the year labels already are of the correct type (it uses 1991; not '1991'), and there is no test in from_string doctest with int-like labels in that last axis, and the tests do not check the type of the labels.

a\b,0,1,2
0,0,1,2
1,3,4,5
a\b,b0,b1,b2
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

wouldn't using a 2x3 (or 3x2) array instead of this 3x3 make all tests smaller?

@@ -0,0 +1,22 @@
a,b,c,value
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

would be nice if this test also included missing values for c

self.assertEqual(la.axes.names, ['arr', 'age', 'sex', 'nat', 'time'])
assert_array_equal(la[X.arr[1], 0, 'F', X.nat[1], :],
[3722, 3395, 3347])
arr = read_excel(inputpath('test.xlsx'), '5d')
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

unsure having a 5d test makes sense. I think it was just a relic of what we used as test data initially

@gdementen
Copy link
Contributor

@gdementen add argument wide to Session.save and load (assuming all arrays are stored in wide/narrow format)?

Would make sense, but that's lower priority IMO => open an issue for this?

@gdementen
Copy link
Contributor

btw: using ndtest to create the test files is a good idea and was long overdue. This partly solves #26.

a1,3,4,5
a2,6,7,8
a\b,b0,b1
1,0,1
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My comment about int-like labels was meant for the "horizontal" dimension, which is not handled by pandas but manually parsed by us. Having a test for int-like in a column is interesting too (to make sure Pandas does not change its handling of them) but less critical. In any case, we should have a test explicitly telling in a comment (or test name?) that this is what we test, so that we do not break it accidentally.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I add a new test array with 'int' labels for all axes, is that OK?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sure!

- updated CSV test files + test.xlsx

- updated unittests test_read_csv, test_read_excel_pandas, test_read_excel_xlwings and test_to_csv
…l, from_lists and from_strings functions + and updated df_aslarray so as to be able to load arrays stored in narrow format
@alixdamman alixdamman force-pushed the wide_arg_read_csv_excel_574 branch from 6a9e5dc to 7964815 Compare February 22, 2018 15:08
@alixdamman alixdamman merged commit 2f5b0d3 into larray-project:master Feb 22, 2018
@alixdamman alixdamman deleted the wide_arg_read_csv_excel_574 branch February 22, 2018 15:13
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants