Support type casting in ODF #494

Open
mauropagano opened this issue May 15, 2025 · 2 comments
Labels
enhancement New feature or request

Comments

@mauropagano

  1. Describe your new request in detail
    This is already somewhat known; I'm filing it as a separate issue so it's easy to track.

It's currently not possible to enforce a schema on top of OracleArrowArray; see the example below.

import pyarrow as pa
# conn is assumed to be an open python-oracledb connection
oracle_df = conn.fetch_df_all(statement="select 1 n1 from dual", arraysize=2)
df = pa.Table.from_arrays(arrays=oracle_df.column_arrays(), schema=pa.schema([("n1", pa.int8())]))

  File "pyarrow/table.pxi", line 4893, in pyarrow.lib.Table.from_arrays
  File "pyarrow/table.pxi", line 1622, in pyarrow.lib._sanitize_arrays
  File "pyarrow/array.pxi", line 405, in pyarrow.lib.asarray
  File "pyarrow/array.pxi", line 273, in pyarrow.lib.array
  File "src/oracledb/interchange/nanoarrow_bridge.pyx", line 500, in oracledb.interchange.nanoarrow_bridge.OracleArrowArray.__arrow_c_array__
NotImplementedError: requested_schema

This would be especially useful when casting to ints or date32. There are ways around it, but they are a bit annoying (and slow).

  2. Give supporting information about tools and operating systems. Give relevant product version numbers
    Not applicable
@mauropagano mauropagano added the enhancement New feature or request label May 15, 2025
@aosingh
Member

aosingh commented May 16, 2025

@mauropagano

I will check how to support schema requests.

From what I see in the Arrow spec:

If the caller requests a schema that is not compatible with the data, say requesting a schema with a different number of fields, the callee should raise an exception. The requested schema mechanism is only meant to negotiate between different representations of the same data and not to allow arbitrary schema transformations.

I guess that, for compatibility, we need to map each Arrow data type to its alternative representations (if it supports multiple representations of the same data).
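To make the mapping idea concrete, here is a minimal pure-Python sketch of how such a negotiation could look. Everything in it is hypothetical: the `ALTERNATIVES` table, the `negotiate` function, and the use of plain strings for type names are illustration only and are not part of python-oracledb or the Arrow API. In particular, whether a narrowing such as int64 to int8 counts as "a different representation of the same data" is a design decision the sketch simply assumes.

```python
# Hypothetical table: for each native Arrow type the driver produces, the set
# of alternative representations it would agree to export. Illustrative only.
ALTERNATIVES = {
    "int64": {"int64", "int32", "int16", "int8"},
    "double": {"double", "float"},
    "string": {"string", "large_string"},
    "timestamp[s]": {"timestamp[s]", "date32"},
}


def negotiate(native, requested):
    """Return the schema to export, or raise if the request is incompatible.

    Both arguments are lists of (column_name, type_name) pairs.
    """
    # The spec quote above: a different number of fields is an error.
    if len(native) != len(requested):
        raise ValueError("requested schema has a different number of fields")
    result = []
    for (name, native_type), (_, requested_type) in zip(native, requested):
        # Types missing from the table can only be exported as themselves.
        allowed = ALTERNATIVES.get(native_type, {native_type})
        if requested_type not in allowed:
            raise ValueError(
                f"cannot export {name!r} ({native_type}) as {requested_type}"
            )
        result.append((name, requested_type))
    return result
```

With this sketch, `negotiate([("n1", "int64")], [("n1", "int8")])` is accepted, while a request with an extra column or an unrelated type raises `ValueError` instead of silently transforming the data.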

@mauropagano
Author

I think the spec makes sense; I wouldn't expect some sort of magic projection (i.e. providing a subset of columns) to work here.

This is more to make it easy to disambiguate in those cases where the Oracle format is rather generous or lacking.

For the generous side, think of NUMBER, which can map to any number (no pun intended) of things, and today you need to enforce a scale of 0 to get an int64. Even that is quite wasteful when you could use something smaller if you know the data.
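As a rough illustration of "something smaller if you know the data", the sketch below picks the narrowest signed integer type that can hold a column's values. The helper name `smallest_int_type` is hypothetical and the type names are plain strings rather than pyarrow objects; it only shows the range check a caller would do before requesting, say, int8 (1 byte per value) instead of int64 (8 bytes per value).

```python
# Inclusive value ranges of Arrow's signed integer types, narrowest first.
INT_RANGES = [
    ("int8", -(2**7), 2**7 - 1),
    ("int16", -(2**15), 2**15 - 1),
    ("int32", -(2**31), 2**31 - 1),
    ("int64", -(2**63), 2**63 - 1),
]


def smallest_int_type(values):
    """Return the name of the narrowest signed int type that fits all values.

    None entries are treated as nulls and ignored.
    """
    present = [v for v in values if v is not None]
    low, high = (min(present), max(present)) if present else (0, 0)
    for name, type_min, type_max in INT_RANGES:
        if type_min <= low and high <= type_max:
            return name
    raise OverflowError("values do not fit in int64")
```

For the `select 1 n1 from dual` example in this issue, the single value 1 fits in int8, which is the type the requested schema asks for.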

For the lacking side, imagine a DATE in Oracle that you want to map to date32 because you know there is no time-of-day component. I'm not sure there is any way to specify something in the db to "dictionary encode" that your date stops at the day (you can add a check constraint on date = trunc(date), but that's not a datatype), so you end up with a timestamp in Arrow land that is, I believe, 2x the size, plus all the fun associated with it :-)
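The size difference can be checked with the standard library alone. Arrow's date32 stores a 32-bit count of days since 1970-01-01, while a timestamp stores a 64-bit count of time units since the same epoch. The sketch below is a hand-rolled illustration, not driver code: the helper `to_date32` is hypothetical, and it refuses values that carry a time of day, mirroring the `date = trunc(date)` assumption.

```python
import struct
from datetime import date, datetime, time

EPOCH = date(1970, 1, 1)


def to_date32(value: datetime) -> int:
    """Days since 1970-01-01, refusing values that carry a time of day."""
    if value.time() != time(0, 0):
        raise ValueError(f"{value!r} has a time-of-day component")
    return (value.date() - EPOCH).days


days = to_date32(datetime(2025, 5, 15))

# date32: one little-endian 32-bit integer per value.
date32_bytes = struct.pack("<i", days)

# timestamp (seconds unit assumed here): one 64-bit integer per value.
timestamp_bytes = struct.pack("<q", days * 86_400)
```

Here `len(date32_bytes)` is 4 and `len(timestamp_bytes)` is 8, which matches the "2x the size" estimate for the value buffer.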
