Skip to content

sqlite_bro experiment with duck_db #1257

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
stonebig opened this issue Aug 27, 2023 · 3 comments
Open

sqlite_bro experiment with duck_db #1257

stonebig opened this issue Aug 27, 2023 · 3 comments

Comments

@stonebig
Copy link
Contributor

stonebig commented Aug 27, 2023

with duckdb-0.8.0

pydef py_sin(s:int) ->float:
    "sinus function : example loading module, handling input/output as strings"
    import math as py_math
    return ( py_math.sin(s*1));
WITH RECURSIVE
  cnt(x) AS (
     SELECT 1
     UNION ALL
     SELECT x+1 FROM cnt
      where x < 300000
  )

select sum(py_sin(x))  from cnt 

takes 130 seconds, while 0.4 seconds with SQLite

WITH RECURSIVE
  cnt(x) AS (
     SELECT 1
     UNION ALL
     SELECT x+1 FROM cnt
      where x < 300000 
  )

select sum(sin(x))  from cnt 

takes also 110 seconds, while 0.1 second with SQLite integrated math

@stonebig
Copy link
Contributor Author

this takes 1 second (or less) on duckdB:
select sum(sin(x)) from range(1 ,300001)cnt(x)

so 10 seconds are due to not liking "recursive" CTE in a 300 000 loop (30 000 loop /second limit)

Then there is a big slowdown in using UDF, versus SQLite.
( 3 000 loop/second limit, versus 300 000 loop/second for SQLite)

not using "recursive" CTE and not using UDF (in this test) makes it quick like SQLite

@stonebig
Copy link
Contributor Author

stonebig commented Sep 3, 2023

when duckdb gets less bad at UDF, it will be an interesting change
becoming fluent with massive datas makes tcl/tk a problem.

@stonebig
Copy link
Contributor Author

This takes 10 seconds to duckdb

pydef py_sin(s:int) ->float:
    "sinus function : example loading module, handling input/output as strings"
    import math as py_math
    return ( py_math.sin(s*1));

WITH  
  cnt(x) AS (
     SELECT * from generate_series (1, 30000 )
  )

select sum(py_sin(x))  from cnt 
`````


This takes 0.1 second to duckdb
`````
WITH cnt(x)  AS (select * from generate_series(1, 3000000)  )

select sum(sin(x))  from cnt ;
````

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant