Skip to content

Improve TIME/DATETIME2 precision and handle DATETIMEOFFSET #665

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 16 commits into
base: master
Choose a base branch
from

Conversation

bemoody
Copy link

@bemoody bemoody commented Oct 29, 2019

These changes make a number of improvements in the handling of date/time data types.

DATETIME2

DATETIME2 values can have precision of up to 0.1 microsecond. However, pymssql 2.1.4 converts these values into DATETIME (both upon insertion and upon retrieval), so precision is lost both ways.

To correctly deserialize DATETIME2 values when they are retrieved from the database, we use dbanydatecrack (available since FreeTDS 0.95.73), as opposed to the combination of dbconvert and dbdatecrack.

To correctly serialize datetime values when they are inserted into the database, we convert the value to an expression like CAST('2000-01-01 12:34:56.789123' AS DATETIME2), as opposed to {ts '2000-01-01 12:34:56.789'}. For backward compatibility, however, this is only done if the TDS version is 7.3 or later.

TIME

TIME is like DATETIME2 but includes only the time of day.

To deserialize these values, we use dbanydatecrack.

To serialize datetime.time values, we use an expression like CAST('12:34:56.789123' AS TIME). (pymssql 2.1.4 doesn't support use of datetime.time as a query parameter.)

DATETIMEOFFSET

DATETIMEOFFSET is like DATETIME2 but also includes the current offset, in minutes, between local time and UTC. The closest semantic equivalent in Python is an "aware" datetime object.

To deserialize these values, we use dbanydatecrack. Since Python 2.7 lacks the built-in datetime.timezone class, we provide an equivalent helper class MSSQLTimezone.

To serialize these values, we convert the value to an expression like CAST('2000-01-01 12:34:56.789123 +01:00' AS DATETIMEOFFSET), as opposed to {ts '2000-01-01 12:34:56.789'}. For backward compatibility, this is only done if the TDS version is 7.3 or later.

Miscellaneous notes

I've also made several small fixes to the test suite, to enable these behaviors to be tested.

This fixes issues #608, #646, and #649.

This appears to fix issue #662, although I don't understand why. Perhaps a bug in dbconvert?

Comments welcome! I'm far from being an expert on SQL Server, FreeTDS, or SQL in general, so there may be better ways to accomplish these things, or subtleties that I've missed.

Benjamin Moody added 16 commits October 29, 2019 18:21
The dbanydatecrack function, provided since FreeTDS 0.95.73, can be
used to unpack any of the various date-and-time data types supported
by FreeTDS, without loss of precision and without the need to convert
into an intermediate format.
TIME and DATETIME2 values can have precision varying between one
second (TIME(0), DATETIME2(0)) up to 100 nanoseconds (TIME(7),
DATETIME2(7)).  (In contrast, the older DATETIME type has precision of
1/300 second, or about 3.333 milliseconds.)

When a TIME or DATETIME2 value is retrieved from the database
(assuming we are using TDS 7.3 or later), it is converted into a
datetime.time or datetime.datetime object, respectively.  When doing
so, the precision of the original value should be retained if
possible.

To accomplish this, use the new dbanydatecrack function, instead of
first converting the value into a DATETIME and then using dbdatecrack.

(Unfortunately, the Python datetime.time and datetime.datetime types
provide one-microsecond precision at best.  So TIME(7) and
DATETIME2(7) values are still rounded to the nearest microsecond; but
this is better than first rounding to the nearest 1/300 second and
then rounding to the nearest millisecond.)

As a result of this change:

  >>> cursor.execute("SELECT CAST('12:34:56.789123' AS TIME)")
  >>> cursor.fetchall()
  [(datetime.time(12, 34, 56, 789123),)]
  >>> cursor.execute("SELECT CAST('2000-01-01 12:34:56.789123' AS DATETIME2)")
  >>> cursor.fetchall()
  [(datetime.datetime(2000, 1, 1, 12, 34, 56, 789123),)]

whereas previously:

  >>> cursor.execute("SELECT CAST('12:34:56.789123' AS TIME)")
  >>> cursor.fetchall()
  [(datetime.time(12, 34, 56, 790000),)]
  >>> cursor.execute("SELECT CAST('2000-01-01 12:34:56.789123' AS DATETIME2)")
  >>> cursor.fetchall()
  [(datetime.datetime(2000, 1, 1, 12, 34, 56, 790000),)]
This class can be used to represent a time zone with an arbitrary
fixed offset from UTC.

This class is essentially a simpler version of the datetime.timezone
class (which is part of the standard library since Python 3.2); this
is needed because pymssql still supports Python 2.7.  In the future,
this class may be removed and replaced with datetime.timezone.
The built-in DATETIMEOFFSET type is similar to DATETIME2, but
explicitly represents the offset between UTC and the local time zone,
so that timestamps can be manipulated sensibly even if they originate
from different time zones (or the same time zone at different times of
the year.)

The natural equivalent in Python is an "aware" datetime object, i.e. a
timestamp with an associated "tzinfo" object indicating the time zone.

Thus, when a DATETIMEOFFSET value is retrieved from the database
(which is only possible if we are using TDS 7.3 or later), convert it
into an aware datetime object.

Since pymssql still supports Python 2.7, which lacks the built-in
datetime.timezone class, the MSSQLTimezone class is used instead.

As with DATETIME2, parsing a DATETIMEOFFSET value, without loss of
precision, requires using the dbanydatecrack function rather than
dbdatecrack.

As a result of this change:

  >>> cursor.execute(
  ...     "SELECT CAST('2000-01-01 12:34:56.789123 -05:00' AS DATETIMEOFFSET)")
  >>> cursor.fetchall()
  [(datetime.datetime(2000, 1, 1, 12, 34, 56, 789123, tzinfo=MSSQLTimezone(datetime.timedelta(days=-1, seconds=68400))),)]

whereas previously:

  >>> cursor.execute(
  ...     "SELECT CAST('2000-01-01 12:34:56.789123 -05:00' AS DATETIMEOFFSET)")
  >>> cursor.fetchall()
  [(b'\x1e\x89\xd3_\x93\x00\x00\x00\xac\x8e\x00\x00\xd4\xfe\x07\xe0',)]
When retrieving a huge number of values from the database, the time
and memory overhead of creating a new MSSQLTimezone object for every
value becomes significant.  In most cases, there will only be one or
two distinct time zones in use (and in the worst case, never more than
1,681.)  As a result, it is useful to memoize these objects.

Using a test database consisting of one million random DATETIMEOFFSET
values (with the offset randomly chosen as either -05:00 or -04:00),
this more than doubles parsing speed:

CPython 2.7, without memoization: ~4.2 seconds per 1,000,000 rows
CPython 2.7, with memoization:    ~1.5 seconds per 1,000,000 rows

CPython 3.7, without memoization: ~4.4 seconds per 1,000,000 rows
CPython 3.7, with memoization:    ~1.7 seconds per 1,000,000 rows
This can be used to vary the formatting of query parameters based on
the active TDS protocol version.

Each of the following internal functions now accepts an optional
tds_version_tuple argument:

 _substitute_params (called by MSSQLConnection.format_sql_command)
 _quote_data (called by _substitute_params)
 _quote_or_flatten (called by _quote_data)
 _quote_simple_value (called by _quote_or_flatten and _quote_data)

(_quote_data contains an apparent bug, in that the charset is ignored
for simple values.  This is not addressed here.)
If the TDS version is 7.3 or later, convert datetime parameters to
DATETIME2 or DATETIMEOFFSET, as appropriate.  Don't do this for TDS
7.2 and earlier; both for symmetry and for backward compatibility.

The historical behavior of pymssql is that when a datetime object is
used as a query parameter, it is serialized as something like:

  {ts '2000-01-01 12:34:56.789'}

which is more-or-less equivalent to:

  CAST('2000-01-01 12:34:56.789' AS DATETIME)

In particular, the "ts" format:
- forbids specifying more than 3 digits after the decimal point;
- forbids specifying a time zone offset;
- implicitly converts the value to DATETIME; i.e., the value is
  rounded to the nearest 1/300 second.

Moreover, note that the historical behavior is to truncate values to
the next lower millisecond, rather than rounding to the nearest.  So a
value such as datetime.datetime(2000, 1, 1, 12, 34, 56, 8999) is
converted to "{ts '2000-01-01 12:34:56.008'}", which is interpreted by
the server as 2000-01-01 12:34:56.0066667.

Here, we maintain the historical behavior as long as TDS 7.2 or
earlier is used.  If TDS 7.3 or newer is used, values are instead
serialized as something like:

  CAST('2000-01-01 12:34:56.789123' AS DATETIME2)

or:

  CAST('2000-01-01 12:34:56.789123-05:00' AS DATETIMEOFFSET)

(Note that SQL Server permits implicit conversions between all of
these types; for example, inserting either of these expressions into a
DATETIME column gives the same result as the "ts" expression above.)

As a result of this change, when using TDS 7.3:

  >>> cursor.execute("SELECT %s",
  ...                datetime.datetime(2000, 1, 1, 12, 34, 56, 8999))
  >>> cursor.fetchall()
  [(datetime.datetime(2000, 1, 1, 12, 34, 56, 8999),)]

  >>> cursor.execute("SELECT %s",
  ...     datetime.datetime(2000, 1, 1, 12, 34, 56, 8999,
  ...         datetime.timezone(datetime.timedelta(hours=1))))
  >>> cursor.fetchall()
  [(datetime.datetime(2000, 1, 1, 12, 34, 56, 8999, tzinfo=MSSQLTimezone(datetime.timedelta(days=seconds=3600))),)]

Previously, or when using TDS 7.2:

  >>> cursor.execute("SELECT %s",
  ...     datetime.datetime(2000, 1, 1, 12, 34, 56, 8999))
  >>> cursor.fetchall()
  [(datetime.datetime(2000, 1, 1, 12, 34, 56, 7000),)]

  >>> cursor.execute("SELECT %s",
  ...     datetime.datetime(2000, 1, 1, 12, 34, 56, 8999,
  ...         datetime.timezone(datetime.timedelta(hours=1))))
  >>> cursor.fetchall()
  [(datetime.datetime(2000, 1, 1, 12, 34, 56, 7000),)]

This has the desirable properties that:

- clients that are talking to old versions of SQL Server continue
  working as before;

- clients that explicitly select TDS 7.2 or earlier continue working
  as before;

- clients that are talking to new versions of SQL Server, using a new
  protocol version, obtain consistent and intuitive behavior.

It has some drawbacks:

- time zone information is (still) silently lost when using TDS 7.2 or
  earlier;

- clients that automatically select the TDS protocol version will get
  different rounding behavior depending on the server version;

- clients that explicitly select TDS 7.3 will get different (better)
  rounding behavior than they had with pymssql 2.1.4.
TIME values are deserialized as datetime.time objects.  For symmetry,
datetime.time objects should also be allowed as query parameters.

Since there are no historical (pre-TDS-7.3) data types corresponding
to TIME, this always uses the TIME type, and preserves microsecond
precision.
In a test case, if two values that are expected to be equal are not,
include the representations of both values as part of the
AssertionError message, to aid in debugging.
The internal function get_sql_server_version is used by some test
cases to check which actual version of SQL Server is in use.

Previously (due to a typo), for versions newer than SQL Server 2012,
this function would return 2000.  Instead, return 2014 in that case.

This function still ought to be updated to recognize the version codes
for newer server versions.
It is useful to be able to perform tests using different TDS protocol
versions.  Make this something that can be configured explicitly in
the tests.cfg file, or alternatively using the environment variable
PYMSSQL_TEST_TDS_VERSION.
Test that it is possible to insert a datetime.time value as a query
parameter, and that the value survives a round trip.
Test that it is possible to insert an aware datetime.datetime value as
a query parameter, and that the value (including the timezone offset,
and including microsecond precision) survives a round trip.

Furthermore, test that Python's interpretation of timezone offsets
matches the interpretation on the server: using Python's "astimezone"
to translate the value into a different timezone should yield a
DATETIMEOFFSET value that is considered "equal" by the server.
Test that when a datetime.datetime or datetime.time value is used as a
query parameter (with TDS 7.3 or later), the value survives a round
trip with microsecond precision.
@bemoody
Copy link
Author

bemoody commented Oct 30, 2019

After thinking more about issue #662, I realize that it's more complicated than I thought.

The issue there is indeed an arguable bug in dbconvert and/or dbdatecrack. But my implementation has the same fundamental problem:

>>> cursor.execute("select cast('2000-01-01 23:59:59.9999999' as datetime2)")
>>> cursor.fetchall()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "src/pymssql.pyx", line 549, in pymssql.Cursor.fetchall
    rows.append(self.getrow())
  File "src/pymssql.pyx", line 500, in pymssql.Cursor.getrow
    row = next(self._source._conn.get_iterator(row_format))
  File "src/_mssql.pyx", line 453, in _mssql.MSSQLRowIterator.__next__
    return self.conn.fetch_next_row(1, self.row_format)
  File "src/_mssql.pyx", line 1213, in _mssql.MSSQLConnection.fetch_next_row
    
  File "src/_mssql.pyx", line 1385, in _mssql.MSSQLConnection.get_row
    data, col_type, len)
  File "src/_mssql.pyx", line 863, in _mssql.MSSQLConnection.convert_db_value
    microsecond = (di2.nanosecond + 500) // 1000
ValueError: microsecond must be in 0..999999

Now, for DATETIME2 and DATETIMEOFFSET, this could in principle be solved by allowing the value to overflow to the next day (i.e. '2000-01-01 23:59:59.9999999' should be rounded to '2000-01-02 00:00:00.000000'). Intuitively this feels like the correct behavior.

But that doesn't work for TIME! Rounding '23:59:59.9999999' to '00:00:00.000000' would be very wrong in most circumstances.

I can see a few possible solutions:

  • Always truncate to the microsecond rather than rounding. This creates a tiny bias for high-precision timestamps, but it's internally consistent.

  • Round DATETIME2/DATETIMEOFFSET but truncate TIME. This creates an inconsistency between the data types, but arguably nobody should be using TIME anyway if they care about such things.

  • Round TIME, unless it is greater than 23:59:59.999999, in which case truncate it.

The third option appears to be what SQL Server itself does:

> SELECT CAST('2000-01-01 23:59:59.9999999' AS DATETIME2(0))
2000-01-02 00:00:00

> SELECT CAST('23:59:59.5' AS TIME(0))
23:59:59

> SELECT CAST('23:59:59.9999999' AS TIME(0))
23:59:59

What do other people think? Which of these options is least bad?

@psukys
Copy link

psukys commented Feb 24, 2022

@bemoody can this somehow be circumvented without the patch at the moment?

@bemoody
Copy link
Author

bemoody commented Feb 25, 2022

First of all, I'm not regularly using pymssql anymore and haven't kept up with recent releases, so I don't know if anything has changed since 2.1.4.

(If there's real interest in merging these changes, I'd certainly be willing to dust the code off and try to fix the rounding issue - but this issue is no longer a high priority for me personally.)

My practical recommendation, if you need to work with date/time values using pymssql 2.1.4, would be to convert all the values into strings and make the conversions explicit in your application.

For example, if you want to query DATETIME2 or DATETIMEOFFSET values, then set tds_version='7.2' (or lower) when calling pymssql.connect. Then the values will be returned as strings, which you'll need to parse somehow (perhaps using datetime.datetime.fromisoformat.)

If you have a datetime.datetime object that you want to insert into the database, then instead of passing t as an argument to cursor.execute, use str(t) instead. This will usually do what you want - but of course, you should always test to be sure your application is really doing what you think it's doing.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants