-
Notifications
You must be signed in to change notification settings - Fork 6
Wide arg read csv excel (issue #574) #588
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Wide arg read csv excel (issue #574) #588
Conversation
@gdementen add argument |
@@ -306,6 +317,8 @@ Miscellaneous improvements | |||
Closes :issue:`549`. | |||
|
|||
|
|||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
isn't this a bit too much blank lines?
larray/inout/array.py
Outdated
@@ -205,6 +205,9 @@ def df_aslarray(df, sort_rows=False, sort_columns=False, raw=False, parse_header | |||
parse_header : bool, optional | |||
Whether or not to parse columns labels. Pandas treats column labels as strings. | |||
If True, column labels are converted into int, float or boolean when possible. Defaults to True. | |||
wide: bool, optional | |||
... |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
TODO
larray/inout/array.py
Outdated
if not wide: | ||
if nb_axes is not None: | ||
nb_axes = None | ||
warnings.warn("`nb_axes` argument cannot be used when `wide` argument is False") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
shouldn't this be an exception instead of a warning?
larray/inout/array.py
Outdated
warnings.warn("`nb_axes` argument cannot be used when `wide` argument is False") | ||
if index_col is not None: | ||
index_col = None | ||
warnings.warn("`index_col` argument cannot be used when `wide` argument is False") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
idem
larray/inout/array.py
Outdated
@@ -288,6 +327,9 @@ def read_csv(filepath_or_buffer, nb_axes=None, index_col=None, sep=',', headerse | |||
sort_columns : bool, optional | |||
Whether or not to sort the columns alphabetically (sorting is more efficient than not sorting). | |||
Defaults to False. | |||
wide : bool, optional | |||
... |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
TODO
larray/inout/array.py
Outdated
@@ -540,6 +607,9 @@ def from_lists(data, nb_axes=None, index_col=None, fill_value=np.nan, sort_rows= | |||
sort_columns : bool, optional | |||
Whether or not to sort the columns alphabetically (sorting is more efficient than not sorting). | |||
Defaults to False. | |||
wide: bool, optional | |||
... |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
TODO
@@ -1,2 +1,2 @@ | |||
time,2007,2010,2013 | |||
a,a0,a1,a2 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
using a0 etc... is probably a good idea, but then we need another test for "int" labels (to check they are parsed to int correctly and do not stay strings). This could use from_string or from_list instead of a file though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Isn't already the case for from_lists
and from_string
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the from_list test does not count as the year labels already are of the correct type (it uses 1991; not '1991'), and there is no test in from_string doctest with int-like labels in that last axis, and the tests do not check the type of the labels.
larray/tests/data/test2d.csv
Outdated
a\b,0,1,2 | ||
0,0,1,2 | ||
1,3,4,5 | ||
a\b,b0,b1,b2 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
wouldn't using a 2x3 (or 3x2) array instead of this 3x3 make all tests smaller?
@@ -0,0 +1,22 @@ | |||
a,b,c,value |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
would be nice if this test also included missing values for c
larray/tests/test_array.py
Outdated
self.assertEqual(la.axes.names, ['arr', 'age', 'sex', 'nat', 'time']) | ||
assert_array_equal(la[X.arr[1], 0, 'F', X.nat[1], :], | ||
[3722, 3395, 3347]) | ||
arr = read_excel(inputpath('test.xlsx'), '5d') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
unsure having a 5d test makes sense. I think it was just a relic of what we used as test data initially
Would make sense, but that's lower priority IMO => open an issue for this? |
btw: using ndtest to create the test files is a good idea and was long overdue. This partly solves #26. |
larray/tests/data/test2d.csv
Outdated
a1,3,4,5 | ||
a2,6,7,8 | ||
a\b,b0,b1 | ||
1,0,1 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My comment about int-like labels was meant for the "horizontal" dimension, which is not handled by pandas but manually parsed by us. Having a test for int-like in a column is interesting too (to make sure Pandas does not change its handling of them) but less critical. In any case, we should have a test explicitly telling in a comment (or test name?) that this is what we test, so that we do not break it accidentally.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If I add a new test array with 'int' labels for all axes, is that OK?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sure!
- updated CSV test files + test.xlsx - updated unittests test_read_csv, test_read_excel_pandas, test_read_excel_xlwings and test_to_csv
…l, from_lists and from_strings functions + and updated df_aslarray so as to be able to load arrays stored in narrow format
6a9e5dc
to
7964815
Compare
Documentation for argument
wide
not yet written. Will write documentation when implementation is OK.