Improve TIME/DATETIME2 precision and handle DATETIMEOFFSET #665
base: master
Conversation
The dbanydatecrack function, provided since FreeTDS 0.95.73, can be used to unpack any of the various date-and-time data types supported by FreeTDS, without loss of precision and without the need to convert into an intermediate format.
TIME and DATETIME2 values can have precision varying between one second (TIME(0), DATETIME2(0)) up to 100 nanoseconds (TIME(7), DATETIME2(7)). (In contrast, the older DATETIME type has precision of 1/300 second, or about 3.333 milliseconds.) When a TIME or DATETIME2 value is retrieved from the database (assuming we are using TDS 7.3 or later), it is converted into a datetime.time or datetime.datetime object, respectively. When doing so, the precision of the original value should be retained if possible. To accomplish this, use the new dbanydatecrack function, instead of first converting the value into a DATETIME and then using dbdatecrack. (Unfortunately, the Python datetime.time and datetime.datetime types provide one-microsecond precision at best. So TIME(7) and DATETIME2(7) values are still rounded to the nearest microsecond; but this is better than first rounding to the nearest 1/300 second and then rounding to the nearest millisecond.) As a result of this change: >>> cursor.execute("SELECT CAST('12:34:56.789123' AS TIME)") >>> cursor.fetchall() [(datetime.time(12, 34, 56, 789123),)] >>> cursor.execute("SELECT CAST('2000-01-01 12:34:56.789123' AS DATETIME2)") >>> cursor.fetchall() [(datetime.datetime(2000, 1, 1, 12, 34, 56, 789123),)] whereas previously: >>> cursor.execute("SELECT CAST('12:34:56.789123' AS TIME)") >>> cursor.fetchall() [(datetime.time(12, 34, 56, 790000),)] >>> cursor.execute("SELECT CAST('2000-01-01 12:34:56.789123' AS DATETIME2)") >>> cursor.fetchall() [(datetime.datetime(2000, 1, 1, 12, 34, 56, 790000),)]
This class can be used to represent a time zone with an arbitrary fixed offset from UTC. This class is essentially a simpler version of the datetime.timezone class (which is part of the standard library since Python 3.2); this is needed because pymssql still supports Python 2.7. In the future, this class may be removed and replaced with datetime.timezone.
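A fixed-offset tzinfo is small enough to sketch here; this is a minimal illustration of the idea, not necessarily identical to the MSSQLTimezone class added by these commits:

```python
import datetime

class FixedOffsetTimezone(datetime.tzinfo):
    """Fixed-offset time zone: a minimal sketch of the MSSQLTimezone idea."""

    def __init__(self, offset):
        # offset is a datetime.timedelta, e.g. datetime.timedelta(hours=-5)
        self._offset = offset

    def utcoffset(self, dt):
        return self._offset

    def dst(self, dt):
        # A fixed offset has no daylight-saving component.
        return datetime.timedelta(0)

    def tzname(self, dt):
        total = int(self._offset.total_seconds()) // 60
        sign = '-' if total < 0 else '+'
        hours, minutes = divmod(abs(total), 60)
        return '%s%02d:%02d' % (sign, hours, minutes)
```

On Python 3.2 and later, datetime.timezone(offset) provides the same behavior out of the box.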
The built-in DATETIMEOFFSET type is similar to DATETIME2, but explicitly represents the offset between UTC and the local time zone, so that timestamps can be manipulated sensibly even if they originate from different time zones (or the same time zone at different times of the year). The natural equivalent in Python is an "aware" datetime object, i.e. a timestamp with an associated "tzinfo" object indicating the time zone.

Thus, when a DATETIMEOFFSET value is retrieved from the database (which is only possible if we are using TDS 7.3 or later), convert it into an aware datetime object. Since pymssql still supports Python 2.7, which lacks the built-in datetime.timezone class, the MSSQLTimezone class is used instead. As with DATETIME2, parsing a DATETIMEOFFSET value without loss of precision requires using the dbanydatecrack function rather than dbdatecrack.

As a result of this change:

>>> cursor.execute(
...     "SELECT CAST('2000-01-01 12:34:56.789123 -05:00' AS DATETIMEOFFSET)")
>>> cursor.fetchall()
[(datetime.datetime(2000, 1, 1, 12, 34, 56, 789123, tzinfo=MSSQLTimezone(datetime.timedelta(days=-1, seconds=68400))),)]

whereas previously:

>>> cursor.execute(
...     "SELECT CAST('2000-01-01 12:34:56.789123 -05:00' AS DATETIMEOFFSET)")
>>> cursor.fetchall()
[(b'\x1e\x89\xd3_\x93\x00\x00\x00\xac\x8e\x00\x00\xd4\xfe\x07\xe0',)]
When retrieving a huge number of values from the database, the time and memory overhead of creating a new MSSQLTimezone object for every value becomes significant. In most cases, there will only be one or two distinct time zones in use (and in the worst case, never more than 1,681). As a result, it is useful to memoize these objects.

Using a test database consisting of one million random DATETIMEOFFSET values (with the offset randomly chosen as either -05:00 or -04:00), this more than doubles parsing speed:

CPython 2.7, without memoization: ~4.2 seconds per 1,000,000 rows
CPython 2.7, with memoization:    ~1.5 seconds per 1,000,000 rows
CPython 3.7, without memoization: ~4.4 seconds per 1,000,000 rows
CPython 3.7, with memoization:    ~1.7 seconds per 1,000,000 rows
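The memoization itself can be as simple as a dictionary keyed by the offset in minutes; a sketch of the idea (the cache name and placement are illustrative, not the actual code in this branch):

```python
import datetime

# Hypothetical module-level cache: offset in minutes -> shared tzinfo object.
_tzinfo_cache = {}

def _get_cached_timezone(offset_minutes):
    """Return a shared fixed-offset tzinfo for the given offset (sketch)."""
    tz = _tzinfo_cache.get(offset_minutes)
    if tz is None:
        # FixedOffsetTimezone is the sketch class shown earlier; the real
        # code would use MSSQLTimezone here.
        tz = FixedOffsetTimezone(datetime.timedelta(minutes=offset_minutes))
        _tzinfo_cache[offset_minutes] = tz
    return tz
```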
This can be used to vary the formatting of query parameters based on the active TDS protocol version. Each of the following internal functions now accepts an optional tds_version_tuple argument:

- _substitute_params (called by MSSQLConnection.format_sql_command)
- _quote_data (called by _substitute_params)
- _quote_or_flatten (called by _quote_data)
- _quote_simple_value (called by _quote_or_flatten and _quote_data)

(_quote_data contains an apparent bug, in that the charset is ignored for simple values. This is not addressed here.)
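Assuming the version is carried as a (major, minor) tuple such as (7, 3), the branching reduces to a plain tuple comparison; a sketch with a hypothetical helper name, not the actual code in these functions:

```python
def _uses_new_datetime_syntax(tds_version_tuple):
    """Return True if DATETIME2/DATETIMEOFFSET literals may be used (sketch).

    tds_version_tuple is assumed to be a (major, minor) tuple such as (7, 3),
    or None when the protocol version is unknown.
    """
    return tds_version_tuple is not None and tds_version_tuple >= (7, 3)
```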
If the TDS version is 7.3 or later, convert datetime parameters to DATETIME2 or DATETIMEOFFSET, as appropriate. Don't do this for TDS 7.2 and earlier, both for symmetry and for backward compatibility.

The historical behavior of pymssql is that when a datetime object is used as a query parameter, it is serialized as something like:

{ts '2000-01-01 12:34:56.789'}

which is more-or-less equivalent to:

CAST('2000-01-01 12:34:56.789' AS DATETIME)

In particular, the "ts" format:

- forbids specifying more than 3 digits after the decimal point;
- forbids specifying a time zone offset;
- implicitly converts the value to DATETIME; i.e., the value is rounded to the nearest 1/300 second.

Moreover, note that the historical behavior is to truncate values to the next lower millisecond, rather than rounding to the nearest. So a value such as datetime.datetime(2000, 1, 1, 12, 34, 56, 8999) is converted to "{ts '2000-01-01 12:34:56.008'}", which is interpreted by the server as 2000-01-01 12:34:56.0066667.

Here, we maintain the historical behavior as long as TDS 7.2 or earlier is used. If TDS 7.3 or newer is used, values are instead serialized as something like:

CAST('2000-01-01 12:34:56.789123' AS DATETIME2)

or:

CAST('2000-01-01 12:34:56.789123-05:00' AS DATETIMEOFFSET)

(Note that SQL Server permits implicit conversions between all of these types; for example, inserting either of these expressions into a DATETIME column gives the same result as the "ts" expression above.)

As a result of this change, when using TDS 7.3:

>>> cursor.execute("SELECT %s",
...                datetime.datetime(2000, 1, 1, 12, 34, 56, 8999))
>>> cursor.fetchall()
[(datetime.datetime(2000, 1, 1, 12, 34, 56, 8999),)]
>>> cursor.execute("SELECT %s",
...                datetime.datetime(2000, 1, 1, 12, 34, 56, 8999,
...                    datetime.timezone(datetime.timedelta(hours=1))))
>>> cursor.fetchall()
[(datetime.datetime(2000, 1, 1, 12, 34, 56, 8999, tzinfo=MSSQLTimezone(datetime.timedelta(seconds=3600))),)]

Previously, or when using TDS 7.2:

>>> cursor.execute("SELECT %s",
...                datetime.datetime(2000, 1, 1, 12, 34, 56, 8999))
>>> cursor.fetchall()
[(datetime.datetime(2000, 1, 1, 12, 34, 56, 7000),)]
>>> cursor.execute("SELECT %s",
...                datetime.datetime(2000, 1, 1, 12, 34, 56, 8999,
...                    datetime.timezone(datetime.timedelta(hours=1))))
>>> cursor.fetchall()
[(datetime.datetime(2000, 1, 1, 12, 34, 56, 7000),)]

This has the desirable properties that:

- clients that are talking to old versions of SQL Server continue working as before;
- clients that explicitly select TDS 7.2 or earlier continue working as before;
- clients that are talking to new versions of SQL Server, using a new protocol version, obtain consistent and intuitive behavior.

It has some drawbacks:

- time zone information is (still) silently lost when using TDS 7.2 or earlier;
- clients that automatically select the TDS protocol version will get different rounding behavior depending on the server version;
- clients that explicitly select TDS 7.3 will get different (better) rounding behavior than they had with pymssql 2.1.4.
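To make the shape of the new serialization concrete, here is a rough sketch; the function name and exact string handling are assumptions for illustration, not the code in the _quote_* helpers:

```python
import datetime

def _format_datetime_literal(value):
    """Render a datetime as a DATETIME2/DATETIMEOFFSET literal (sketch only).

    This mirrors the TDS 7.3+ behavior described above, but the name and
    formatting details here are illustrative assumptions.
    """
    if value.tzinfo is not None:
        # Aware datetime: keep the UTC offset, e.g. '...12:34:56.789123+01:00'.
        text = value.strftime('%Y-%m-%d %H:%M:%S.%f%z')
        # strftime('%z') yields '+0100'; insert the colon expected by SQL Server.
        text = text[:-2] + ':' + text[-2:]
        return "CAST('%s' AS DATETIMEOFFSET)" % text
    else:
        return "CAST('%s' AS DATETIME2)" % value.strftime('%Y-%m-%d %H:%M:%S.%f')
```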
TIME values are deserialized as datetime.time objects. For symmetry, datetime.time objects should also be allowed as query parameters. Since there are no historical (pre-TDS-7.3) data types corresponding to TIME, this always uses the TIME type, and preserves microsecond precision.
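A datetime.time parameter can be rendered the same way; again only a sketch, with a hypothetical function name:

```python
import datetime

def _format_time_literal(value):
    """Render a datetime.time as a TIME literal (sketch; name is hypothetical)."""
    return "CAST('%s' AS TIME)" % value.strftime('%H:%M:%S.%f')

# Example: prints "CAST('12:34:56.789123' AS TIME)"
print(_format_time_literal(datetime.time(12, 34, 56, 789123)))
```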
In a test case, if two values that are expected to be equal are not, include the representations of both values as part of the AssertionError message, to aid in debugging.
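The pattern amounts to something like the following (illustrative, not the actual test-suite helper):

```python
def assert_equal_with_repr(a, b):
    # Include both reprs in the failure message to aid debugging.
    assert a == b, '%r != %r' % (a, b)
```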
The internal function get_sql_server_version is used by some test cases to check which actual version of SQL Server is in use. Previously (due to a typo), for versions newer than SQL Server 2012, this function would return 2000. Instead, return 2014 in that case. This function still ought to be updated to recognize the version codes for newer server versions.
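For reference, the mapping in question follows SQL Server's engine major version numbers (8 = 2000, 9 = 2005, 10 = 2008, 11 = 2012, 12 = 2014); a sketch of the idea, not the actual helper:

```python
def guess_sql_server_year(major_version):
    """Map an engine major version to a product year (sketch, not the real helper)."""
    known = {8: 2000, 9: 2005, 10: 2008, 11: 2012, 12: 2014}
    # Anything newer than the newest known code is reported as 2014 for now;
    # as noted above, codes for newer server versions still need to be added.
    return known.get(major_version, 2014)
```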
It is useful to be able to perform tests using different TDS protocol versions. Make this something that can be configured explicitly in the tests.cfg file, or alternatively using the environment variable PYMSSQL_TEST_TDS_VERSION.
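For example, the environment-variable route could be exercised from Python like this (the value '7.3' is only an example, and the corresponding tests.cfg key name is not shown here):

```python
import os

# Select the TDS protocol version for the test run before the suite connects.
os.environ.setdefault('PYMSSQL_TEST_TDS_VERSION', '7.3')
```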
Test that it is possible to insert a datetime.time value as a query parameter, and that the value survives a round trip.
Test that it is possible to insert an aware datetime.datetime value as a query parameter, and that the value (including the timezone offset, and including microsecond precision) survives a round trip. Furthermore, test that Python's interpretation of timezone offsets matches the interpretation on the server: using Python's "astimezone" to translate the value into a different timezone should yield a DATETIMEOFFSET value that is considered "equal" by the server.
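Such a test might look roughly like this (a sketch only: the temporary table, helper name, and use of Python 3's datetime.timezone are illustrative choices, not the actual test code):

```python
import datetime

def check_datetimeoffset_round_trip(cursor):
    """Sketch of the round-trip check described above (names are illustrative)."""
    tz = datetime.timezone(datetime.timedelta(hours=-5))
    original = datetime.datetime(2000, 1, 1, 12, 34, 56, 789123, tzinfo=tz)

    cursor.execute("CREATE TABLE #dto_test (val DATETIMEOFFSET)")
    cursor.execute("INSERT INTO #dto_test VALUES (%s)", (original,))

    # The value, including offset and microseconds, should survive the round trip.
    cursor.execute("SELECT val FROM #dto_test")
    (returned,) = cursor.fetchone()
    assert returned == original, '%r != %r' % (returned, original)

    # The same instant expressed in another zone should compare equal on the server.
    shifted = original.astimezone(datetime.timezone.utc)
    cursor.execute("SELECT COUNT(*) FROM #dto_test WHERE val = %s", (shifted,))
    assert cursor.fetchone()[0] == 1
```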
Test that when a datetime.datetime or datetime.time value is used as a query parameter (with TDS 7.3 or later), the value survives a round trip with microsecond precision.
After thinking more about issue #662, I realize that it's more complicated than I thought. The issue there is indeed an arguable bug in dbconvert and/or dbdatecrack. But my implementation has the same fundamental problem: a fractional part that rounds up to a whole second cannot be represented without carrying over into the next second (and potentially the next day).
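To make the failure mode concrete (a hedged illustration, assuming the sub-second part arrives as a count of nanoseconds):

```python
import datetime

# 23:59:59.9999999 -- fractional part expressed as nanoseconds.
nanoseconds = 999999900
microseconds = int(round(nanoseconds / 1000.0))  # rounds up to 1000000

# datetime.time only accepts 0..999999 microseconds, so the naive rounding
# overflows and would need to carry into the seconds (and possibly the date).
try:
    datetime.time(23, 59, 59, microseconds)
except ValueError as exc:
    print(exc)
```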
Now, for DATETIME2 and DATETIMEOFFSET, this could in principle be solved by allowing the value to overflow to the next day (i.e. '2000-01-01 23:59:59.9999999' should be rounded to '2000-01-02 00:00:00.000000'). Intuitively this feels like the correct behavior. But that doesn't work for TIME! Rounding '23:59:59.9999999' to '00:00:00.000000' would be very wrong in most circumstances. I can see a few possible solutions:
The third option appears to be what SQL Server itself does:
What do other people think? Which of these options is least bad?
@bemoody can this somehow be circumvented without the patch at the moment?
First of all, I'm not regularly using pymssql anymore and haven't kept up with recent releases, so I don't know if anything has changed since 2.1.4. (If there's real interest in merging these changes, I'd certainly be willing to dust the code off and try to fix the rounding issue - but this issue is no longer a high priority for me personally.)

My practical recommendation, if you need to work with date/time values using pymssql 2.1.4, would be to convert all the values into strings and make the conversions explicit in your application. For example, if you want to query DATETIME2 or DATETIMEOFFSET values, then set

If you have a
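The string-conversion approach could look something like this; a hedged illustration only, with invented table and column names (CONVERT style 121 is the ODBC canonical format with seven fractional digits):

```python
import datetime

def fetch_datetime2_as_strings(cursor):
    """Query DATETIME2 values as strings and parse them client-side (sketch)."""
    cursor.execute("SELECT CONVERT(VARCHAR(27), created_at, 121) FROM my_table")
    # Style 121 yields 'YYYY-MM-DD hh:mm:ss.fffffff'; trim to 6 fractional
    # digits because datetime supports only microsecond precision.
    return [datetime.datetime.strptime(text[:26], '%Y-%m-%d %H:%M:%S.%f')
            for (text,) in cursor.fetchall()]
```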
These changes make a number of improvements in the handling of date/time data types.
DATETIME2
DATETIME2 values can have precision of up to 0.1 microsecond. However, pymssql 2.1.4 converts these values into DATETIME (both upon insertion and upon retrieval), so precision is lost both ways.
To correctly deserialize DATETIME2 values when they are retrieved from the database, we use `dbanydatecrack` (available since FreeTDS 0.95.73), as opposed to the combination of `dbconvert` and `dbdatecrack`.

To correctly serialize datetime values when they are inserted into the database, we convert the value to an expression like `CAST('2000-01-01 12:34:56.789123' AS DATETIME2)`, as opposed to `{ts '2000-01-01 12:34:56.789'}`. For backward compatibility, however, this is only done if the TDS version is 7.3 or later.

TIME
TIME is like DATETIME2 but includes only the time of day.
To deserialize these values, we use `dbanydatecrack`.

To serialize datetime.time values, we use an expression like `CAST('12:34:56.789123' AS TIME)`. (pymssql 2.1.4 doesn't support use of datetime.time as a query parameter.)

DATETIMEOFFSET
DATETIMEOFFSET is like DATETIME2 but also includes the current offset, in minutes, between local time and UTC. The closest semantic equivalent in Python is an "aware" datetime object.
To deserialize these values, we use `dbanydatecrack`. Since Python 2.7 lacks the built-in `datetime.timezone` class, we provide an equivalent helper class `MSSQLTimezone`.

To serialize these values, we convert the value to an expression like `CAST('2000-01-01 12:34:56.789123 +01:00' AS DATETIMEOFFSET)`, as opposed to `{ts '2000-01-01 12:34:56.789'}`. For backward compatibility, this is only done if the TDS version is 7.3 or later.

Miscellaneous notes
I've also made several small fixes to the test suite, to enable these behaviors to be tested.
This fixes issues #608, #646, and #649.
This appears to fix issue #662, although I don't understand why. Perhaps a bug in `dbconvert`?

Comments welcome! I'm far from being an expert on SQL Server, FreeTDS, or SQL in general, so there may be better ways to accomplish these things, or subtleties that I've missed.