@tswast tswast commented Mar 21, 2022

feat: dbdate and dbtime support numpy.datetime64 values

Thank you for opening a Pull Request! Before submitting your PR, there are a few things you can do to make sure it goes smoothly:

  • Make sure to open an issue as a bug/issue before writing your code! That way we can discuss the change, evaluate designs, and agree on the general idea
  • Ensure the tests and linter pass
  • Code coverage does not decrease (if any source code was changed)
  • Appropriate docs were updated (if necessary)

Towards #28 🦕
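As a rough illustration of what "support numpy.datetime64 values" could mean for the scalar conversion step, here is a minimal, numpy-only sketch. It is not the code in this PR: `parse_date_scalar` and `time_of_day` are hypothetical names, the date regex is copied from the `_datetime` signature that appears in the test output, and `pandas.NA` and pyarrow scalars are deliberately left out.

```python
import datetime
import math
import re
from typing import Optional

import numpy

# Same pattern as the match_fn default shown for _datetime in the test output.
_DATE_PATTERN = re.compile(r"\s*(?P<year>\d+)-(?P<month>\d+)-(?P<day>\d+)\s*$")


def parse_date_scalar(scalar) -> Optional[numpy.datetime64]:
    """Hypothetical stand-in for the date conversion; not the library's code."""
    # Nulls: None, float NaN, and numpy NaT all map to None in this sketch.
    if scalar is None:
        return None
    if isinstance(scalar, float) and math.isnan(scalar):
        return None
    if isinstance(scalar, numpy.datetime64):
        if numpy.isnat(scalar):
            return None
        # Drop any time-of-day part, then widen to nanosecond storage ("M8[ns]").
        return scalar.astype("datetime64[D]").astype("datetime64[ns]")
    if isinstance(scalar, str):
        match = _DATE_PATTERN.match(scalar)
        if not match:
            raise ValueError(f"Bad date string: {scalar!r}")
        # datetime.date validates the month and day ranges.
        scalar = datetime.date(
            int(match.group("year")),
            int(match.group("month")),
            int(match.group("day")),
        )
    if isinstance(scalar, datetime.date):
        # datetime.datetime is a subclass of datetime.date; keep only the date part.
        day = datetime.date(scalar.year, scalar.month, scalar.day)
        return numpy.datetime64(day, "D").astype("datetime64[ns]")
    raise TypeError("Invalid value type", scalar)


def time_of_day(value: numpy.datetime64) -> Optional[datetime.time]:
    """Hypothetical helper: extract the wall-clock time from a datetime64 value."""
    if numpy.isnat(value):
        return None
    # At microsecond precision, .item() yields a datetime.datetime.
    return value.astype("datetime64[us]").item().time()
```

Under these assumptions, a value such as `numpy.datetime64("2012-02-29")` is converted to a nanosecond-precision `numpy.datetime64` instead of falling through to the `TypeError` branch, and `NaT` is treated as a null.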

@tswast tswast requested a review from a team as a code owner March 21, 2022 17:15
@tswast tswast requested review from a team and stephaniewang526 March 21, 2022 17:15
@product-auto-label product-auto-label bot added the api: bigquery Issues related to the googleapis/python-db-dtypes-pandas API. label Mar 21, 2022
tswast commented Mar 21, 2022

Without this change, we get the following test failures:

7 failed, 268 passed in 1.99s
(dev-3.9) ➜  python-db-dtypes-pandas git:(issue28-set-item) ✗ pytest tests/unit
===================================================== test session starts ======================================================
platform darwin -- Python 3.9.5, pytest-6.2.5, py-1.10.0, pluggy-0.13.1
rootdir: /Users/swast/src/github.com/googleapis/python-db-dtypes-pandas
plugins: cov-2.12.1, asyncio-0.15.1, anyio-3.3.0, requests-mock-1.9.3
collected 275 items                                                                                                            

tests/unit/test_arrow.py ...............................................................                                 [ 22%]
tests/unit/test_date.py ..........F..............F....FFFF...................                                            [ 42%]
tests/unit/test_dtypes.py .............................................................................................. [ 76%]
......................                                                                                                   [ 84%]
tests/unit/test_time.py ..................F........................                                                      [100%]

=========================================================== FAILURES ===========================================================
_____________________________________________ test_date_parsing[value7-expected7] ______________________________________________

value = numpy.datetime64('2012-02-29'), expected = datetime.date(2012, 2, 29)

    @pytest.mark.parametrize("value, expected", VALUE_PARSING_TEST_CASES)
    def test_date_parsing(value, expected):
>       assert pandas.Series([value], dtype="dbdate")[0] == expected

tests/unit/test_date.py:90: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
/usr/local/Caskroom/miniconda/base/envs/dev-3.9/lib/python3.9/site-packages/pandas/core/series.py:451: in __init__
    data = sanitize_array(data, index, dtype, copy)
/usr/local/Caskroom/miniconda/base/envs/dev-3.9/lib/python3.9/site-packages/pandas/core/construction.py:591: in sanitize_array
    subarr = _try_cast(data, dtype, copy, raise_cast_failure)
/usr/local/Caskroom/miniconda/base/envs/dev-3.9/lib/python3.9/site-packages/pandas/core/construction.py:754: in _try_cast
    subarr = array_type(arr, dtype=dtype, copy=copy)
db_dtypes/core.py:73: in _from_sequence
    return cls(cls.__ndarray(scalars))
db_dtypes/core.py:67: in __ndarray
    return numpy.array([cls._datetime(scalar) for scalar in scalars], "M8[ns]",)
db_dtypes/core.py:67: in <listcomp>
    return numpy.array([cls._datetime(scalar) for scalar in scalars], "M8[ns]",)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

scalar = numpy.datetime64('2012-02-29'), match_fn = <built-in method match of re.Pattern object at 0x19b588800>

    @staticmethod
    def _datetime(
        scalar,
        match_fn=re.compile(r"\s*(?P<year>\d+)-(?P<month>\d+)-(?P<day>\d+)\s*$").match,
    ) -> Optional[numpy.datetime64]:
        # Convert pyarrow values to datetime.date.
        if isinstance(scalar, (pyarrow.Date32Scalar, pyarrow.Date64Scalar)):
            scalar = scalar.as_py()
    
        if pandas.isna(scalar):
            return None
        elif isinstance(scalar, datetime.date):
            return pandas.Timestamp(
                year=scalar.year, month=scalar.month, day=scalar.day
            ).to_datetime64()
        elif isinstance(scalar, str):
            match = match_fn(scalar)
            if not match:
                raise ValueError(f"Bad date string: {repr(scalar)}")
            year = int(match.group("year"))
            month = int(match.group("month"))
            day = int(match.group("day"))
            return pandas.Timestamp(year=year, month=month, day=day).to_datetime64()
        else:
>           raise TypeError("Invalid value type", scalar)
E           TypeError: ('Invalid value type', numpy.datetime64('2012-02-29'))

db_dtypes/__init__.py:260: TypeError
_____________________________________________ test_date_set_item[value7-expected7] _____________________________________________

self = 0    NaT
dtype: dbdate, key = 0, value = numpy.datetime64('2012-02-29')

    def __setitem__(self, key, value) -> None:
        check_deprecated_indexers(key)
        key = com.apply_if_callable(key, self)
        cacher_needs_updating = self._check_is_chained_assignment_possible()
    
        if key is Ellipsis:
            key = slice(None)
    
        if isinstance(key, slice):
            indexer = self.index._convert_slice_indexer(key, kind="getitem")
            return self._set_values(indexer, value)
    
        try:
>           self._set_with_engine(key, value)

/usr/local/Caskroom/miniconda/base/envs/dev-3.9/lib/python3.9/site-packages/pandas/core/series.py:1085: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

self = 0    NaT
dtype: dbdate, key = 0, value = numpy.datetime64('2012-02-29')

    def _set_with_engine(self, key, value) -> None:
        loc = self.index.get_loc(key)
    
        # this is equivalent to self._values[key] = value
>       self._mgr.setitem_inplace(loc, value)

/usr/local/Caskroom/miniconda/base/envs/dev-3.9/lib/python3.9/site-packages/pandas/core/series.py:1149: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

self = SingleBlockManager
Items: RangeIndex(start=0, stop=1, step=1)
ExtensionBlock: 1 dtype: dbdate, indexer = 0
value = numpy.datetime64('2012-02-29')

    def setitem_inplace(self, indexer, value) -> None:
        """
        Set values with indexer.
    
        For Single[Block/Array]Manager, this backs s[indexer] = value
    
        This is an inplace version of `setitem()`, mutating the manager/values
        in place, not returning a new Manager (and Block), and thus never changing
        the dtype.
        """
        arr = self.array
    
        # EAs will do this validation in their own __setitem__ methods.
        if isinstance(arr, np.ndarray):
            # Note: checking for ndarray instead of np.dtype means we exclude
            #  dt64/td64, which do their own validation.
            value = np_can_hold_element(arr.dtype, value)
    
>       arr[indexer] = value

/usr/local/Caskroom/miniconda/base/envs/dev-3.9/lib/python3.9/site-packages/pandas/core/internals/base.py:190: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

self = <DateArray>
[NaT]
Length: 1, dtype: dbdate, key = 0, value = numpy.datetime64('2012-02-29')

    def __setitem__(self, key, value):
        if is_list_like(value):
            _datetime = self._datetime
            value = [_datetime(v) for v in value]
        elif not pandas.isna(value):
>           value = self._datetime(value)

db_dtypes/core.py:108: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

scalar = numpy.datetime64('2012-02-29'), match_fn = <built-in method match of re.Pattern object at 0x19b588800>

    @staticmethod
    def _datetime(
        scalar,
        match_fn=re.compile(r"\s*(?P<year>\d+)-(?P<month>\d+)-(?P<day>\d+)\s*$").match,
    ) -> Optional[numpy.datetime64]:
        # Convert pyarrow values to datetime.date.
        if isinstance(scalar, (pyarrow.Date32Scalar, pyarrow.Date64Scalar)):
            scalar = scalar.as_py()
    
        if pandas.isna(scalar):
            return None
        elif isinstance(scalar, datetime.date):
            return pandas.Timestamp(
                year=scalar.year, month=scalar.month, day=scalar.day
            ).to_datetime64()
        elif isinstance(scalar, str):
            match = match_fn(scalar)
            if not match:
                raise ValueError(f"Bad date string: {repr(scalar)}")
            year = int(match.group("year"))
            month = int(match.group("month"))
            day = int(match.group("day"))
            return pandas.Timestamp(year=year, month=month, day=day).to_datetime64()
        else:
>           raise TypeError("Invalid value type", scalar)
E           TypeError: ('Invalid value type', numpy.datetime64('2012-02-29'))

db_dtypes/__init__.py:260: TypeError

During handling of the above exception, another exception occurred:

value = numpy.datetime64('2012-02-29'), expected = datetime.date(2012, 2, 29)

    @pytest.mark.parametrize("value, expected", VALUE_PARSING_TEST_CASES)
    def test_date_set_item(value, expected):
        series = pandas.Series([None], dtype="dbdate")
>       series[0] = value

tests/unit/test_date.py:101: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
/usr/local/Caskroom/miniconda/base/envs/dev-3.9/lib/python3.9/site-packages/pandas/core/series.py:1140: in __setitem__
    self._set_with(key, value)
/usr/local/Caskroom/miniconda/base/envs/dev-3.9/lib/python3.9/site-packages/pandas/core/series.py:1167: in _set_with
    self._set_labels(key, value)
/usr/local/Caskroom/miniconda/base/envs/dev-3.9/lib/python3.9/site-packages/pandas/core/series.py:1179: in _set_labels
    self._set_values(indexer, value)
/usr/local/Caskroom/miniconda/base/envs/dev-3.9/lib/python3.9/site-packages/pandas/core/series.py:1185: in _set_values
    self._mgr = self._mgr.setitem(indexer=key, value=value)
/usr/local/Caskroom/miniconda/base/envs/dev-3.9/lib/python3.9/site-packages/pandas/core/internals/managers.py:337: in setitem
    return self.apply("setitem", indexer=indexer, value=value)
/usr/local/Caskroom/miniconda/base/envs/dev-3.9/lib/python3.9/site-packages/pandas/core/internals/managers.py:304: in apply
    applied = getattr(b, f)(**kwargs)
/usr/local/Caskroom/miniconda/base/envs/dev-3.9/lib/python3.9/site-packages/pandas/core/internals/blocks.py:1604: in setitem
    self.values[indexer] = value
db_dtypes/core.py:108: in __setitem__
    value = self._datetime(value)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

scalar = numpy.datetime64('2012-02-29'), match_fn = <built-in method match of re.Pattern object at 0x19b588800>

    @staticmethod
    def _datetime(
        scalar,
        match_fn=re.compile(r"\s*(?P<year>\d+)-(?P<month>\d+)-(?P<day>\d+)\s*$").match,
    ) -> Optional[numpy.datetime64]:
        # Convert pyarrow values to datetime.date.
        if isinstance(scalar, (pyarrow.Date32Scalar, pyarrow.Date64Scalar)):
            scalar = scalar.as_py()
    
        if pandas.isna(scalar):
            return None
        elif isinstance(scalar, datetime.date):
            return pandas.Timestamp(
                year=scalar.year, month=scalar.month, day=scalar.day
            ).to_datetime64()
        elif isinstance(scalar, str):
            match = match_fn(scalar)
            if not match:
                raise ValueError(f"Bad date string: {repr(scalar)}")
            year = int(match.group("year"))
            month = int(match.group("month"))
            day = int(match.group("day"))
            return pandas.Timestamp(year=year, month=month, day=day).to_datetime64()
        else:
>           raise TypeError("Invalid value type", scalar)
E           TypeError: ('Invalid value type', numpy.datetime64('2012-02-29'))

db_dtypes/__init__.py:260: TypeError
_______________________________________________ test_date_set_item_null[value1] ________________________________________________

self = 0    1970-01-01
dtype: dbdate, key = 0, value = NaT

    def __setitem__(self, key, value) -> None:
        check_deprecated_indexers(key)
        key = com.apply_if_callable(key, self)
        cacher_needs_updating = self._check_is_chained_assignment_possible()
    
        if key is Ellipsis:
            key = slice(None)
    
        if isinstance(key, slice):
            indexer = self.index._convert_slice_indexer(key, kind="getitem")
            return self._set_values(indexer, value)
    
        try:
>           self._set_with_engine(key, value)

/usr/local/Caskroom/miniconda/base/envs/dev-3.9/lib/python3.9/site-packages/pandas/core/series.py:1085: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

self = 0    1970-01-01
dtype: dbdate, key = 0, value = NaT

    def _set_with_engine(self, key, value) -> None:
        loc = self.index.get_loc(key)
    
        # this is equivalent to self._values[key] = value
>       self._mgr.setitem_inplace(loc, value)

/usr/local/Caskroom/miniconda/base/envs/dev-3.9/lib/python3.9/site-packages/pandas/core/series.py:1149: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

self = SingleBlockManager
Items: RangeIndex(start=0, stop=1, step=1)
ExtensionBlock: 1 dtype: dbdate, indexer = 0, value = NaT

    def setitem_inplace(self, indexer, value) -> None:
        """
        Set values with indexer.
    
        For Single[Block/Array]Manager, this backs s[indexer] = value
    
        This is an inplace version of `setitem()`, mutating the manager/values
        in place, not returning a new Manager (and Block), and thus never changing
        the dtype.
        """
        arr = self.array
    
        # EAs will do this validation in their own __setitem__ methods.
        if isinstance(arr, np.ndarray):
            # Note: checking for ndarray instead of np.dtype means we exclude
            #  dt64/td64, which do their own validation.
            value = np_can_hold_element(arr.dtype, value)
    
>       arr[indexer] = value

/usr/local/Caskroom/miniconda/base/envs/dev-3.9/lib/python3.9/site-packages/pandas/core/internals/base.py:190: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

self = <DateArray>
[datetime.date(1970, 1, 1)]
Length: 1, dtype: dbdate, key = 0, value = NaT

    def __setitem__(self, key, value):
        if is_list_like(value):
            _datetime = self._datetime
            value = [_datetime(v) for v in value]
        elif not pandas.isna(value):
            value = self._datetime(value)
>       return super().__setitem__(key, value)

db_dtypes/core.py:109: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

self = <DateArray>
[datetime.date(1970, 1, 1)]
Length: 1, dtype: dbdate, key = 0, value = NaT

    def __setitem__(self, key, value):
        key = check_array_indexer(self, key)
        value = self._validate_setitem_value(value)
>       self._ndarray[key] = value
E       ValueError: cannot convert float NaN to integer

/usr/local/Caskroom/miniconda/base/envs/dev-3.9/lib/python3.9/site-packages/pandas/core/arrays/_mixins.py:250: ValueError

During handling of the above exception, another exception occurred:

value = NaT

    @pytest.mark.parametrize("value", NULL_VALUE_TEST_CASES)
    def test_date_set_item_null(value):
        series = pandas.Series(["1970-01-01"], dtype="dbdate")
>       series[0] = value

tests/unit/test_date.py:108: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
/usr/local/Caskroom/miniconda/base/envs/dev-3.9/lib/python3.9/site-packages/pandas/core/series.py:1104: in __setitem__
    self.loc[key] = value
/usr/local/Caskroom/miniconda/base/envs/dev-3.9/lib/python3.9/site-packages/pandas/core/indexing.py:716: in __setitem__
    iloc._setitem_with_indexer(indexer, value, self.name)
/usr/local/Caskroom/miniconda/base/envs/dev-3.9/lib/python3.9/site-packages/pandas/core/indexing.py:1690: in _setitem_with_indexer
    self._setitem_single_block(indexer, value, name)
/usr/local/Caskroom/miniconda/base/envs/dev-3.9/lib/python3.9/site-packages/pandas/core/indexing.py:1938: in _setitem_single_block
    self.obj._mgr = self.obj._mgr.setitem(indexer=indexer, value=value)
/usr/local/Caskroom/miniconda/base/envs/dev-3.9/lib/python3.9/site-packages/pandas/core/internals/managers.py:337: in setitem
    return self.apply("setitem", indexer=indexer, value=value)
/usr/local/Caskroom/miniconda/base/envs/dev-3.9/lib/python3.9/site-packages/pandas/core/internals/managers.py:304: in apply
    applied = getattr(b, f)(**kwargs)
/usr/local/Caskroom/miniconda/base/envs/dev-3.9/lib/python3.9/site-packages/pandas/core/internals/blocks.py:1604: in setitem
    self.values[indexer] = value
db_dtypes/core.py:109: in __setitem__
    return super().__setitem__(key, value)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

self = <DateArray>
[datetime.date(1970, 1, 1)]
Length: 1, dtype: dbdate, key = 0, value = NaT

    def __setitem__(self, key, value):
        key = check_array_indexer(self, key)
        value = self._validate_setitem_value(value)
>       self._ndarray[key] = value
E       ValueError: cannot convert float NaN to integer

/usr/local/Caskroom/miniconda/base/envs/dev-3.9/lib/python3.9/site-packages/pandas/core/arrays/_mixins.py:250: ValueError
_________________________________________________ test_date_set_item_null[nan] _________________________________________________

self = 0    1970-01-01
dtype: dbdate, key = 0, value = nan

    def __setitem__(self, key, value) -> None:
        check_deprecated_indexers(key)
        key = com.apply_if_callable(key, self)
        cacher_needs_updating = self._check_is_chained_assignment_possible()
    
        if key is Ellipsis:
            key = slice(None)
    
        if isinstance(key, slice):
            indexer = self.index._convert_slice_indexer(key, kind="getitem")
            return self._set_values(indexer, value)
    
        try:
>           self._set_with_engine(key, value)

/usr/local/Caskroom/miniconda/base/envs/dev-3.9/lib/python3.9/site-packages/pandas/core/series.py:1085: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

self = 0    1970-01-01
dtype: dbdate, key = 0, value = nan

    def _set_with_engine(self, key, value) -> None:
        loc = self.index.get_loc(key)
    
        # this is equivalent to self._values[key] = value
>       self._mgr.setitem_inplace(loc, value)

/usr/local/Caskroom/miniconda/base/envs/dev-3.9/lib/python3.9/site-packages/pandas/core/series.py:1149: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

self = SingleBlockManager
Items: RangeIndex(start=0, stop=1, step=1)
ExtensionBlock: 1 dtype: dbdate, indexer = 0, value = nan

    def setitem_inplace(self, indexer, value) -> None:
        """
        Set values with indexer.
    
        For Single[Block/Array]Manager, this backs s[indexer] = value
    
        This is an inplace version of `setitem()`, mutating the manager/values
        in place, not returning a new Manager (and Block), and thus never changing
        the dtype.
        """
        arr = self.array
    
        # EAs will do this validation in their own __setitem__ methods.
        if isinstance(arr, np.ndarray):
            # Note: checking for ndarray instead of np.dtype means we exclude
            #  dt64/td64, which do their own validation.
            value = np_can_hold_element(arr.dtype, value)
    
>       arr[indexer] = value

/usr/local/Caskroom/miniconda/base/envs/dev-3.9/lib/python3.9/site-packages/pandas/core/internals/base.py:190: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

self = <DateArray>
[datetime.date(1970, 1, 1)]
Length: 1, dtype: dbdate, key = 0, value = nan

    def __setitem__(self, key, value):
        if is_list_like(value):
            _datetime = self._datetime
            value = [_datetime(v) for v in value]
        elif not pandas.isna(value):
            value = self._datetime(value)
>       return super().__setitem__(key, value)

db_dtypes/core.py:109: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

self = <DateArray>
[datetime.date(1970, 1, 1)]
Length: 1, dtype: dbdate, key = 0, value = nan

    def __setitem__(self, key, value):
        key = check_array_indexer(self, key)
        value = self._validate_setitem_value(value)
>       self._ndarray[key] = value
E       ValueError: Could not convert object to NumPy datetime

/usr/local/Caskroom/miniconda/base/envs/dev-3.9/lib/python3.9/site-packages/pandas/core/arrays/_mixins.py:250: ValueError

During handling of the above exception, another exception occurred:

value = nan

    @pytest.mark.parametrize("value", NULL_VALUE_TEST_CASES)
    def test_date_set_item_null(value):
        series = pandas.Series(["1970-01-01"], dtype="dbdate")
>       series[0] = value

tests/unit/test_date.py:108: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
/usr/local/Caskroom/miniconda/base/envs/dev-3.9/lib/python3.9/site-packages/pandas/core/series.py:1104: in __setitem__
    self.loc[key] = value
/usr/local/Caskroom/miniconda/base/envs/dev-3.9/lib/python3.9/site-packages/pandas/core/indexing.py:716: in __setitem__
    iloc._setitem_with_indexer(indexer, value, self.name)
/usr/local/Caskroom/miniconda/base/envs/dev-3.9/lib/python3.9/site-packages/pandas/core/indexing.py:1690: in _setitem_with_indexer
    self._setitem_single_block(indexer, value, name)
/usr/local/Caskroom/miniconda/base/envs/dev-3.9/lib/python3.9/site-packages/pandas/core/indexing.py:1938: in _setitem_single_block
    self.obj._mgr = self.obj._mgr.setitem(indexer=indexer, value=value)
/usr/local/Caskroom/miniconda/base/envs/dev-3.9/lib/python3.9/site-packages/pandas/core/internals/managers.py:337: in setitem
    return self.apply("setitem", indexer=indexer, value=value)
/usr/local/Caskroom/miniconda/base/envs/dev-3.9/lib/python3.9/site-packages/pandas/core/internals/managers.py:304: in apply
    applied = getattr(b, f)(**kwargs)
/usr/local/Caskroom/miniconda/base/envs/dev-3.9/lib/python3.9/site-packages/pandas/core/internals/blocks.py:1604: in setitem
    self.values[indexer] = value
db_dtypes/core.py:109: in __setitem__
    return super().__setitem__(key, value)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

self = <DateArray>
[datetime.date(1970, 1, 1)]
Length: 1, dtype: dbdate, key = 0, value = nan

    def __setitem__(self, key, value):
        key = check_array_indexer(self, key)
        value = self._validate_setitem_value(value)
>       self._ndarray[key] = value
E       ValueError: Could not convert object to NumPy datetime

/usr/local/Caskroom/miniconda/base/envs/dev-3.9/lib/python3.9/site-packages/pandas/core/arrays/_mixins.py:250: ValueError
_______________________________________________ test_date_set_item_null[value3] ________________________________________________

self = 0    1970-01-01
dtype: dbdate, key = 0, value = <NA>

    def __setitem__(self, key, value) -> None:
        check_deprecated_indexers(key)
        key = com.apply_if_callable(key, self)
        cacher_needs_updating = self._check_is_chained_assignment_possible()
    
        if key is Ellipsis:
            key = slice(None)
    
        if isinstance(key, slice):
            indexer = self.index._convert_slice_indexer(key, kind="getitem")
            return self._set_values(indexer, value)
    
        try:
>           self._set_with_engine(key, value)

/usr/local/Caskroom/miniconda/base/envs/dev-3.9/lib/python3.9/site-packages/pandas/core/series.py:1085: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

self = 0    1970-01-01
dtype: dbdate, key = 0, value = <NA>

    def _set_with_engine(self, key, value) -> None:
        loc = self.index.get_loc(key)
    
        # this is equivalent to self._values[key] = value
>       self._mgr.setitem_inplace(loc, value)

/usr/local/Caskroom/miniconda/base/envs/dev-3.9/lib/python3.9/site-packages/pandas/core/series.py:1149: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

self = SingleBlockManager
Items: RangeIndex(start=0, stop=1, step=1)
ExtensionBlock: 1 dtype: dbdate, indexer = 0, value = <NA>

    def setitem_inplace(self, indexer, value) -> None:
        """
        Set values with indexer.
    
        For Single[Block/Array]Manager, this backs s[indexer] = value
    
        This is an inplace version of `setitem()`, mutating the manager/values
        in place, not returning a new Manager (and Block), and thus never changing
        the dtype.
        """
        arr = self.array
    
        # EAs will do this validation in their own __setitem__ methods.
        if isinstance(arr, np.ndarray):
            # Note: checking for ndarray instead of np.dtype means we exclude
            #  dt64/td64, which do their own validation.
            value = np_can_hold_element(arr.dtype, value)
    
>       arr[indexer] = value

/usr/local/Caskroom/miniconda/base/envs/dev-3.9/lib/python3.9/site-packages/pandas/core/internals/base.py:190: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

self = <DateArray>
[datetime.date(1970, 1, 1)]
Length: 1, dtype: dbdate, key = 0, value = <NA>

    def __setitem__(self, key, value):
        if is_list_like(value):
            _datetime = self._datetime
            value = [_datetime(v) for v in value]
        elif not pandas.isna(value):
            value = self._datetime(value)
>       return super().__setitem__(key, value)

db_dtypes/core.py:109: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

self = <DateArray>
[datetime.date(1970, 1, 1)]
Length: 1, dtype: dbdate, key = 0, value = <NA>

    def __setitem__(self, key, value):
        key = check_array_indexer(self, key)
        value = self._validate_setitem_value(value)
>       self._ndarray[key] = value
E       ValueError: Could not convert object to NumPy datetime

/usr/local/Caskroom/miniconda/base/envs/dev-3.9/lib/python3.9/site-packages/pandas/core/arrays/_mixins.py:250: ValueError

During handling of the above exception, another exception occurred:

value = <NA>

    @pytest.mark.parametrize("value", NULL_VALUE_TEST_CASES)
    def test_date_set_item_null(value):
        series = pandas.Series(["1970-01-01"], dtype="dbdate")
>       series[0] = value

tests/unit/test_date.py:108: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
/usr/local/Caskroom/miniconda/base/envs/dev-3.9/lib/python3.9/site-packages/pandas/core/series.py:1104: in __setitem__
    self.loc[key] = value
/usr/local/Caskroom/miniconda/base/envs/dev-3.9/lib/python3.9/site-packages/pandas/core/indexing.py:716: in __setitem__
    iloc._setitem_with_indexer(indexer, value, self.name)
/usr/local/Caskroom/miniconda/base/envs/dev-3.9/lib/python3.9/site-packages/pandas/core/indexing.py:1690: in _setitem_with_indexer
    self._setitem_single_block(indexer, value, name)
/usr/local/Caskroom/miniconda/base/envs/dev-3.9/lib/python3.9/site-packages/pandas/core/indexing.py:1938: in _setitem_single_block
    self.obj._mgr = self.obj._mgr.setitem(indexer=indexer, value=value)
/usr/local/Caskroom/miniconda/base/envs/dev-3.9/lib/python3.9/site-packages/pandas/core/internals/managers.py:337: in setitem
    return self.apply("setitem", indexer=indexer, value=value)
/usr/local/Caskroom/miniconda/base/envs/dev-3.9/lib/python3.9/site-packages/pandas/core/internals/managers.py:304: in apply
    applied = getattr(b, f)(**kwargs)
/usr/local/Caskroom/miniconda/base/envs/dev-3.9/lib/python3.9/site-packages/pandas/core/internals/blocks.py:1604: in setitem
    self.values[indexer] = value
db_dtypes/core.py:109: in __setitem__
    return super().__setitem__(key, value)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

self = <DateArray>
[datetime.date(1970, 1, 1)]
Length: 1, dtype: dbdate, key = 0, value = <NA>

    def __setitem__(self, key, value):
        key = check_array_indexer(self, key)
        value = self._validate_setitem_value(value)
>       self._ndarray[key] = value
E       ValueError: Could not convert object to NumPy datetime

/usr/local/Caskroom/miniconda/base/envs/dev-3.9/lib/python3.9/site-packages/pandas/core/arrays/_mixins.py:250: ValueError
_____________________________________________________ test_date_set_slice ______________________________________________________

    def test_date_set_slice():
        series = pandas.Series([None, None, None], dtype="dbdate")
>       series[:] = [
            datetime.date(2022, 3, 21),
            "2011-12-13",
            numpy.datetime64("1998-09-04"),
        ]

tests/unit/test_date.py:114: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
/usr/local/Caskroom/miniconda/base/envs/dev-3.9/lib/python3.9/site-packages/pandas/core/series.py:1082: in __setitem__
    return self._set_values(indexer, value)
/usr/local/Caskroom/miniconda/base/envs/dev-3.9/lib/python3.9/site-packages/pandas/core/series.py:1185: in _set_values
    self._mgr = self._mgr.setitem(indexer=key, value=value)
/usr/local/Caskroom/miniconda/base/envs/dev-3.9/lib/python3.9/site-packages/pandas/core/internals/managers.py:337: in setitem
    return self.apply("setitem", indexer=indexer, value=value)
/usr/local/Caskroom/miniconda/base/envs/dev-3.9/lib/python3.9/site-packages/pandas/core/internals/managers.py:304: in apply
    applied = getattr(b, f)(**kwargs)
/usr/local/Caskroom/miniconda/base/envs/dev-3.9/lib/python3.9/site-packages/pandas/core/internals/blocks.py:1604: in setitem
    self.values[indexer] = value
db_dtypes/core.py:106: in __setitem__
    value = [_datetime(v) for v in value]
db_dtypes/core.py:106: in <listcomp>
    value = [_datetime(v) for v in value]
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

scalar = numpy.datetime64('1998-09-04'), match_fn = <built-in method match of re.Pattern object at 0x19b588800>

    @staticmethod
    def _datetime(
        scalar,
        match_fn=re.compile(r"\s*(?P<year>\d+)-(?P<month>\d+)-(?P<day>\d+)\s*$").match,
    ) -> Optional[numpy.datetime64]:
        # Convert pyarrow values to datetime.date.
        if isinstance(scalar, (pyarrow.Date32Scalar, pyarrow.Date64Scalar)):
            scalar = scalar.as_py()
    
        if pandas.isna(scalar):
            return None
        elif isinstance(scalar, datetime.date):
            return pandas.Timestamp(
                year=scalar.year, month=scalar.month, day=scalar.day
            ).to_datetime64()
        elif isinstance(scalar, str):
            match = match_fn(scalar)
            if not match:
                raise ValueError(f"Bad date string: {repr(scalar)}")
            year = int(match.group("year"))
            month = int(match.group("month"))
            day = int(match.group("day"))
            return pandas.Timestamp(year=year, month=month, day=day).to_datetime64()
        else:
>           raise TypeError("Invalid value type", scalar)
E           TypeError: ('Invalid value type', numpy.datetime64('1998-09-04'))

db_dtypes/__init__.py:260: TypeError
____________________________________________ test_time_parsing[value17-expected17] _____________________________________________

value = numpy.datetime64('1970-01-01T00:00:59.876543'), expected = datetime.time(0, 0, 59, 876543)

    @pytest.mark.parametrize(
        "value, expected",
        [
            # Midnight
            ("0", datetime.time(0)),
            ("0:0", datetime.time(0)),
            ("0:0:0", datetime.time(0)),
            ("0:0:0.", datetime.time(0)),
            ("0:0:0.0", datetime.time(0)),
            ("0:0:0.000000", datetime.time(0)),
            ("00:00:00", datetime.time(0, 0, 0)),
            ("  00:00:00  ", datetime.time(0, 0, 0)),
            # Short values
            ("1", datetime.time(1)),
            ("23", datetime.time(23)),
            ("1:2", datetime.time(1, 2)),
            ("23:59", datetime.time(23, 59)),
            ("1:2:3", datetime.time(1, 2, 3)),
            ("23:59:59", datetime.time(23, 59, 59)),
            # Non-octal values.
            ("08:08:08", datetime.time(8, 8, 8)),
            ("09:09:09", datetime.time(9, 9, 9)),
            # Fractional seconds can cause rounding problems if cast to float. See:
            # https://github.com/googleapis/python-db-dtypes-pandas/issues/18
            ("0:0:59.876543", datetime.time(0, 0, 59, 876543)),
            (
                numpy.datetime64("1970-01-01 00:00:59.876543"),
                datetime.time(0, 0, 59, 876543),
            ),
            ("01:01:01.010101", datetime.time(1, 1, 1, 10101)),
            (pandas.Timestamp("1970-01-01 01:01:01.010101"), datetime.time(1, 1, 1, 10101)),
            ("09:09:09.090909", datetime.time(9, 9, 9, 90909)),
            (datetime.time(9, 9, 9, 90909), datetime.time(9, 9, 9, 90909)),
            ("11:11:11.111111", datetime.time(11, 11, 11, 111111)),
            ("19:16:23.987654", datetime.time(19, 16, 23, 987654)),
            # Microsecond precision
            ("00:00:00.000001", datetime.time(0, 0, 0, 1)),
            ("23:59:59.999999", datetime.time(23, 59, 59, 999_999)),
            # TODO: Support nanosecond precision values without truncation.
            # https://github.com/googleapis/python-db-dtypes-pandas/issues/19
            ("0:0:0.000001001", datetime.time(0, 0, 0, 1)),
            ("23:59:59.999999000", datetime.time(23, 59, 59, 999_999)),
            ("23:59:59.999999999", datetime.time(23, 59, 59, 999_999)),
        ],
    )
    def test_time_parsing(value, expected):
>       assert pandas.Series([value], dtype="dbtime")[0] == expected

tests/unit/test_time.py:97: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
/usr/local/Caskroom/miniconda/base/envs/dev-3.9/lib/python3.9/site-packages/pandas/core/series.py:451: in __init__
    data = sanitize_array(data, index, dtype, copy)
/usr/local/Caskroom/miniconda/base/envs/dev-3.9/lib/python3.9/site-packages/pandas/core/construction.py:591: in sanitize_array
    subarr = _try_cast(data, dtype, copy, raise_cast_failure)
/usr/local/Caskroom/miniconda/base/envs/dev-3.9/lib/python3.9/site-packages/pandas/core/construction.py:754: in _try_cast
    subarr = array_type(arr, dtype=dtype, copy=copy)
db_dtypes/core.py:73: in _from_sequence
    return cls(cls.__ndarray(scalars))
db_dtypes/core.py:67: in __ndarray
    return numpy.array([cls._datetime(scalar) for scalar in scalars], "M8[ns]",)
db_dtypes/core.py:67: in <listcomp>
    return numpy.array([cls._datetime(scalar) for scalar in scalars], "M8[ns]",)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

cls = <class 'db_dtypes.TimeArray'>, scalar = numpy.datetime64('1970-01-01T00:00:59.876543')
match_fn = <built-in method match of re.Pattern object at 0x7fe5cd2360f0>

    @classmethod
    def _datetime(
        cls,
        scalar,
        match_fn=re.compile(
            r"\s*(?P<hours>\d+)"
            r"(?::(?P<minutes>\d+)"
            r"(?::(?P<seconds>\d+)"
            r"(?:\.(?P<fraction>\d*))?)?)?\s*$"
        ).match,
    ) -> Optional[numpy.datetime64]:
        # Convert pyarrow values to datetime.time.
        if isinstance(scalar, (pyarrow.Time32Scalar, pyarrow.Time64Scalar)):
            scalar = (
                scalar.cast(pyarrow.time64("ns"))
                .cast(pyarrow.int64())
                .cast(pyarrow.timestamp("ns"))
                .as_py()
            )
    
        if pandas.isna(scalar):
            return None
        if isinstance(scalar, datetime.time):
            return pandas.Timestamp(
                year=1970,
                month=1,
                day=1,
                hour=scalar.hour,
                minute=scalar.minute,
                second=scalar.second,
                microsecond=scalar.microsecond,
            ).to_datetime64()
        elif isinstance(scalar, pandas.Timestamp):
            return scalar.to_datetime64()
        elif isinstance(scalar, str):
            # iso string
            parsed = match_fn(scalar)
            if not parsed:
                raise ValueError(f"Bad time string: {repr(scalar)}")
    
            hour = parsed.group("hours")
            minute = parsed.group("minutes")
            second = parsed.group("seconds")
            fraction = parsed.group("fraction")
            nanosecond = int(fraction.ljust(9, "0")[:9]) if fraction else 0
            return pandas.Timestamp(
                year=1970,
                month=1,
                day=1,
                hour=int(hour),
                minute=int(minute) if minute else 0,
                second=int(second) if second else 0,
                nanosecond=nanosecond,
            ).to_datetime64()
        else:
>           raise TypeError("Invalid value type", scalar)
E           TypeError: ('Invalid value type', numpy.datetime64('1970-01-01T00:00:59.876543'))

db_dtypes/__init__.py:153: TypeError
=================================================== short test summary info ====================================================
FAILED tests/unit/test_date.py::test_date_parsing[value7-expected7] - TypeError: ('Invalid value type', numpy.datetime64('201...
FAILED tests/unit/test_date.py::test_date_set_item[value7-expected7] - TypeError: ('Invalid value type', numpy.datetime64('20...
FAILED tests/unit/test_date.py::test_date_set_item_null[value1] - ValueError: cannot convert float NaN to integer
FAILED tests/unit/test_date.py::test_date_set_item_null[nan] - ValueError: Could not convert object to NumPy datetime
FAILED tests/unit/test_date.py::test_date_set_item_null[value3] - ValueError: Could not convert object to NumPy datetime
FAILED tests/unit/test_date.py::test_date_set_slice - TypeError: ('Invalid value type', numpy.datetime64('1998-09-04'))
FAILED tests/unit/test_time.py::test_time_parsing[value17-expected17] - TypeError: ('Invalid value type', numpy.datetime64('1..
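
Both `TypeError` failures above come from `_datetime` having no branch for `numpy.datetime64` scalars. As a rough sketch of what such a branch could look like (not the code merged in this PR), the helpers below use only `numpy` and the standard library. The names `date_to_datetime64` and `time_to_datetime64` are hypothetical, and truncating to the calendar day for dates and re-basing times onto 1970-01-01 are assumptions about the intended behavior.

```python
import datetime
import re

import numpy

_EPOCH = numpy.datetime64("1970-01-01", "ns")
_DATE_RE = re.compile(r"\s*(?P<year>\d+)-(?P<month>\d+)-(?P<day>\d+)\s*$")


def date_to_datetime64(scalar):
    """Hypothetical stand-in for the date conversion; returns datetime64[ns]."""
    if scalar is None:
        return numpy.datetime64("NaT", "ns")
    if isinstance(scalar, numpy.datetime64):
        # Assumed policy: drop any time-of-day and keep only the calendar date.
        return scalar.astype("M8[D]").astype("M8[ns]")
    if isinstance(scalar, str):
        match = _DATE_RE.match(scalar)
        if not match:
            raise ValueError(f"Bad date string: {scalar!r}")
        scalar = datetime.date(
            *(int(match.group(k)) for k in ("year", "month", "day"))
        )
    if isinstance(scalar, datetime.date):
        day = datetime.date(scalar.year, scalar.month, scalar.day)
        return numpy.datetime64(day).astype("M8[ns]")
    raise TypeError("Invalid value type", scalar)


def time_to_datetime64(scalar):
    """Hypothetical numpy.datetime64 branch for the time conversion."""
    if not isinstance(scalar, numpy.datetime64):
        raise TypeError("Invalid value type", scalar)
    ns = scalar.astype("M8[ns]")
    # Keep only the time-of-day and attach it to the 1970-01-01 epoch date.
    return _EPOCH + (ns - ns.astype("M8[D]"))


# The mixed-type values from test_date_set_slice all convert to datetime64[ns].
converted = [
    date_to_datetime64(v)
    for v in (
        datetime.date(2022, 3, 21),
        "2011-12-13",
        numpy.datetime64("1998-09-04"),
    )
]
```

Checking `numpy.datetime64` before `datetime.date` is not strictly required, since the former is not a subclass of the latter, but it keeps the new branch easy to spot.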

@tswast tswast changed the title from "fix: dbdate and dbtime support set item" to "fix: dbdate and dbtime support set item with null values" on Mar 21, 2022
@@ -121,6 +113,16 @@ def _validate_scalar(self, value):
        """
        return self._datetime(value)

    def _validate_setitem_value(self, value):

Note: Per pandas-dev/pandas#45544 (comment) this is a required override, and will be documented as such in that PR. We had masked the need for this before with the __setitem__ override.
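
To illustrate the idea behind that override, here is a minimal toy sketch that does not subclass any pandas class: values are normalized in one `_validate_setitem_value` hook before they reach the backing `datetime64[ns]` array, so scalars, nulls, and list-likes all take the same path. `ToyDateArray` and `_to_ns` are hypothetical names, and the real pandas hook may have a different contract than the one assumed here.

```python
import datetime

import numpy


def _to_ns(value):
    """Hypothetical scalar converter: None and NaN become NaT."""
    if value is None or (isinstance(value, float) and value != value):
        return numpy.datetime64("NaT", "ns")
    # numpy parses ISO strings, datetime.date, and datetime64 scalars.
    return numpy.datetime64(value).astype("M8[ns]")


class ToyDateArray:
    """Toy array backed by a datetime64[ns] ndarray (not a pandas class)."""

    def __init__(self, size):
        self._ndarray = numpy.full(size, numpy.datetime64("NaT"), dtype="M8[ns]")

    def _validate_setitem_value(self, value):
        # List-likes are converted element by element; scalars directly.
        if isinstance(value, (list, tuple, numpy.ndarray)):
            return numpy.array([_to_ns(v) for v in value], dtype="M8[ns]")
        return _to_ns(value)

    def __setitem__(self, key, value):
        self._ndarray[key] = self._validate_setitem_value(value)


arr = ToyDateArray(3)
arr[:] = [datetime.date(2022, 3, 21), "2011-12-13", numpy.datetime64("1998-09-04")]
arr[0] = float("nan")  # a null scalar is stored as NaT instead of raising
```

Doing the conversion in the validation hook, rather than in a custom `__setitem__`, means the assignment logic itself stays generic.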

@tswast tswast merged commit 1db1357 into main Mar 21, 2022
@tswast tswast deleted the issue28-set-item branch March 21, 2022 20:19
gcf-merge-on-green bot pushed a commit that referenced this pull request Mar 24, 2022
🤖 I have created a release *beep* *boop*
---


## [0.4.0](v0.3.1...v0.4.0) (2022-03-24)


### ⚠ BREAKING CHANGES

* address failing compliance tests in DateArray and TimeArray
* dbdate and dbtime dtypes return NaT instead of None for missing values

### Features

* dbdate and dbtime support numpy.datetime64 values in array constructor ([1db1357](1db1357))


### Bug Fixes

* address failing 2D array compliance tests in DateArray ([#64](#64)) ([b771e05](b771e05))
* address failing tests with pandas 1.5.0 ([#82](#82)) ([38ac28d](38ac28d))
* allow comparison with scalar values ([#88](#88)) ([7495698](7495698))
* avoid TypeError when using sorted search ([#84](#84)) ([42bc2d9](42bc2d9))
* correct TypeError and comparison issues discovered in DateArray compliance tests ([#79](#79)) ([1e979cf](1e979cf))
* dbdate and dbtime support set item with null values ([#85](#85)) ([1db1357](1db1357))
* use `pandas.NaT` for missing values in dbdate and dbtime dtypes ([#67](#67)) ([f903c2c](f903c2c))
* use public pandas APIs where possible ([#60](#60)) ([e9d41d1](e9d41d1))


### Tests

* add dbtime compliance tests ([#90](#90)) ([f14fb2b](f14fb2b))
* add final dbdate compliance tests and sort ([#89](#89)) ([efe7e6d](efe7e6d))

---
This PR was generated with [Release Please](https://github.com/googleapis/release-please). See [documentation](https://github.com/googleapis/release-please#release-please).