DuckDB (native vs SQLAlchemy)

DuckDB (native vs SQLAlchemy)#

Beginning in 0.9, JupySQL supports DuckDB via a native connection and SQLAlchemy, both with comparable performance. JupySQL adds a small overhead; however, this overhead is constant.

At the moment, the only difference is that some features are only available when using SQLAlchemy.

Performance comparison (pandas)#

import pandas as pd
import numpy as np
num_rows = 1_000_000
num_cols = 100
df = pd.DataFrame(np.random.randn(num_rows, num_cols))

Raw DuckDB#

import duckdb
conn = duckdb.connect()

%%timeit
conn.execute("SELECT * FROM df").df()

1.23 s ± 60 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

DuckDB + SQLALchemy#

%load_ext sql
%config SqlMagic.autopandas = True
%config SqlMagic.displaycon = False
%sql duckdb:// --alias duckdb-sqlalchemy

Connecting to 'duckdb-sqlalchemy'

%%timeit
_ = %sql SELECT * FROM df

1.18 s ± 5.87 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

DuckDB + native#

%sql conn --alias duckdb-native

%%timeit
_ = %sql SELECT * FROM df

1.16 s ± 3.52 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

Performance comparison (polars)#

%config SqlMagic.autopolars = True
%sql duckdb-sqlalchemy

Disabled 'autopandas' since 'autopolars' was enabled.

Switching to connection 'duckdb-sqlalchemy'

Raw DuckDB#

%%timeit
conn.execute("SELECT * FROM df").pl()

632 ms ± 2.8 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

DuckDB + SQLAlchemy#

%%timeit
_ = %sql SELECT * FROM df

612 ms ± 2.91 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

DuckDB + native#

%sql duckdb-native

Switching to connection 'duckdb-native'

%%timeit
_ = %sql SELECT * FROM df

603 ms ± 3.88 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

Limitations of using native connections#

As of version 0.9.0, the only caveat is that %sqlcmd won’t work with a native connection.

%sqlcmd

UsageError: Missing argument for %sqlcmd. Valid commands are: tables, columns, test, profile, explore, snippets, connect

DuckDB (native vs SQLAlchemy)

Contents

DuckDB (native vs SQLAlchemy)#

Performance comparison (pandas)#

Raw DuckDB#

DuckDB + SQLALchemy#

DuckDB + native#

Performance comparison (polars)#

Raw DuckDB#

DuckDB + SQLAlchemy#

DuckDB + native#

Limitations of using native connections#