
csv2rec handles dates differently from datetimes when dayfirst is specified #6184

Closed
@ultra-andy


Suppose test.csv contains datetime entries (i.e., the file contains only "11/01/14 12:00:01 AM") and you run the following script:

import matplotlib.mlab as mlab
import datetime

dataset = mlab.csv2rec('test.csv', delimiter=',', names=['datetime'], dayfirst=True)
# Workaround: an explicit day-first converter (%I rather than %H, so that %p is honoured)
#dataset = mlab.csv2rec('test.csv', delimiter=',', converterd={0: lambda x: datetime.datetime.strptime(x, '%d/%m/%y %I:%M:%S %p')}, names=['datetime'], dayfirst=True)

for elem in dataset:
    print(elem)

...then the printed result misinterprets the datetime string by swapping the day and the month.

The problem is that the candidate converters in mlab.csv2rec handle formats inconsistently. If the column holds plain dates (e.g. "11/01/14"), the print statement shows them correctly. If it holds datetime strings (date and time, e.g. "11/01/14 12:00:01 AM"), the values are interpreted as though dayfirst and yearfirst had NOT been specified, and so are NOT shown correctly.

This happens because the mydate converter, which does honour dayfirst and yearfirst, raises an error if d.hour, d.minute or d.second is greater than zero. The alternative mydateparser converter does not raise, but it ignores dayfirst and yearfirst and therefore assumes month first. As a result, datetime strings that are not in month-first order and that have a non-zero hour, minute or second are misinterpreted.

In other words, mydateparser is NOT consistent with mydate. mydateparser should follow the same approach as mydate and also pass through the dayfirst and yearfirst arguments, so that datetime values in the .csv column are interpreted according to dayfirst and/or yearfirst when these are set to True in the call.
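To make the fall-through concrete, here is a small stdlib-only sketch. `toy_parse` is a hypothetical stand-in for `dateutil.parser.parse` that only understands the two string shapes used in this report, and `convert` is a simplified, hypothetical model of csv2rec moving on to the next converter when one raises; the real mlab code differs in detail (for example, its converters are wrapped with `with_default_value`).

```python
from datetime import date, datetime


def toy_parse(s, dayfirst=False):
    """Hypothetical stand-in for dateutil.parser.parse.

    Only understands 'NN/NN/YY' optionally followed by 'HH:MM:SS AM/PM';
    dayfirst chooses between day/month/year and month/day/year order.
    """
    date_fmt = "%d/%m/%y" if dayfirst else "%m/%d/%y"
    for fmt in (date_fmt + " %I:%M:%S %p", date_fmt):
        try:
            return datetime.strptime(s, fmt)
        except ValueError:
            pass
    raise ValueError(f"cannot parse {s!r}")


def mydate(s, dayfirst):
    # Models mlab's mydate: honours dayfirst, but rejects any non-zero time.
    d = toy_parse(s, dayfirst=dayfirst)
    if d.hour > 0 or d.minute > 0 or d.second > 0:
        raise ValueError("not a date")
    return d.date()


def mydateparser_old(s, dayfirst):
    # Models the current fallback: dayfirst is silently ignored.
    return toy_parse(s)


def mydateparser_fixed(s, dayfirst):
    # Models the proposed fix: dayfirst is passed through.
    return toy_parse(s, dayfirst=dayfirst)


def convert(s, converters, dayfirst=True):
    # Simplified fall-through: use the first converter that does not raise.
    for conv in converters:
        try:
            return conv(s, dayfirst)
        except ValueError:
            continue
    return s


s = "11/01/14 12:00:01 AM"
print(convert(s, [mydate, mydateparser_old]))    # 2014-11-01 00:00:01 (month first)
print(convert(s, [mydate, mydateparser_fixed]))  # 2014-01-11 00:00:01 (day first)
print(convert("11/01/14", [mydate, mydateparser_old]))  # 2014-01-11
```

The plain date never reaches the fallback because mydate accepts it, which is why only datetime strings are affected.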

A fix that works in my own C:\Python33\Lib\site-packages\matplotlib\mlab.py file is to replace this line:

mydateparser = with_default_value(dateparser, datetime.date(1,1,1))

with this block:

def mydateparser(x):
    # try and return a datetime object
    d = dateparser(x, dayfirst=dayfirst, yearfirst=yearfirst)
    return d
mydateparser = with_default_value(mydateparser, datetime.datetime(1,1,1,0,0,0))

This problem is also present in Python 3.4, and probably in other releases as well. It appears to be a design bug, and a dangerous one: it fails silently, in a way that is inconsistent with the rest of the function's behaviour, and can leave the user with invalid data even though the correct format was specified in the function call.

The proposed fix should not break any code that uses the workaround of passing an explicit converterd to csv2rec (which also works); that workaround is the commented-out line in the script above.
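When writing such a converterd function by hand, the format string needs `%I` (12-hour clock) rather than `%H`: Python's `strptime` only applies the `%p` AM/PM marker to `%I`, so with `%H` the marker is matched but ignored. A minimal sketch; `parse_dayfirst` and `DAYFIRST_FMT` are hypothetical names chosen for illustration.

```python
from datetime import datetime

# Day-first format for strings such as "11/01/14 12:00:01 AM".
# %I (12-hour clock) is needed for %p to take effect; with %H the AM/PM
# marker is matched but ignored, so "12:00:01 AM" would come out as 12:00:01.
DAYFIRST_FMT = "%d/%m/%y %I:%M:%S %p"


def parse_dayfirst(s):
    """Parse a day-first, 12-hour timestamp into a datetime."""
    return datetime.strptime(s, DAYFIRST_FMT)


good = parse_dayfirst("11/01/14 12:00:01 AM")
bad = datetime.strptime("11/01/14 12:00:01 AM", "%d/%m/%y %H:%M:%S %p")
print(good)  # 2014-01-11 00:00:01
print(bad)   # 2014-01-11 12:00:01
```

In the script above, this function could be passed as `converterd={0: parse_dayfirst}` in place of the lambda.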

I am assuming that the date test in mydate should continue to check that hour, minute and second are all equal to 0, since I presume this was put in for a reason (which I don't understand). A simpler solution, though, could be to modify that part of the code to return a date in the correct format when there is no non-zero hour/minute/second component, and a datetime in the correct format when there is.
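The simpler alternative described above could look roughly like this stdlib-only sketch. `parse_dmy` is a hypothetical stand-in for `dateutil.parser.parse(..., dayfirst=True)` that only understands the two string shapes in this report, and only dayfirst is modelled (not yearfirst).

```python
from datetime import date, datetime


def parse_dmy(s):
    """Hypothetical day-first parser for 'DD/MM/YY' with an optional
    'HH:MM:SS AM/PM' suffix (stands in for dateutil with dayfirst=True)."""
    for fmt in ("%d/%m/%y %I:%M:%S %p", "%d/%m/%y"):
        try:
            return datetime.strptime(s, fmt)
        except ValueError:
            pass
    raise ValueError(f"cannot parse {s!r}")


def date_or_datetime(s):
    # Return a date when the time part is all zeros, otherwise a datetime;
    # either way the day-first interpretation is kept.
    d = parse_dmy(s)
    if d.hour == 0 and d.minute == 0 and d.second == 0:
        return d.date()
    return d


print(date_or_datetime("11/01/14"))              # 2014-01-11
print(date_or_datetime("11/01/14 12:00:01 AM"))  # 2014-01-11 00:00:01
```

One trade-off of this approach: a single column could end up holding a mix of date and datetime objects (for example, "12:00:00 AM" collapses to a plain date), whereas a separate datetime converter keeps the column type uniform.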

I'm new to this bug reporting system, so apologies if I've done anything unconventional or improper - please just point me in the right direction! I've had a crack at creating a pull request (best way to learn!), but happy for people to make suggestions as to how to do the pull request better.
