The standard DuckDB Python API provides a SQL interface compliant with the DB-API 2.0 specification described by PEP 249, similar to the SQLite Python API.
Connection
To use the module, you must first create a DuckDBPyConnection object that represents a connection to a database. This is done through the duckdb.connect method.
The config keyword argument can be used to provide a dict that contains key-value pairs referencing settings understood by DuckDB.
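For example, a minimal sketch (the threads and memory_limit settings shown here are illustrative choices, not required values):

import duckdb
# the settings shown here are illustrative; any DuckDB configuration
# option can be passed via the config dict
con = duckdb.connect(config = {"threads": 4, "memory_limit": "1GB"})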
In-Memory Connection
The special value :memory: can be used to create an in-memory database. Note that for an in-memory database no data is persisted to disk (i.e., all data is lost when you exit the Python process).
Named In-Memory Connections
The special value :memory: can also be postfixed with a name, for example: :memory:conn3.
When a name is provided, subsequent duckdb.connect calls will create a new connection to the same database, sharing the catalog (views, tables, macros, etc.).
Using :memory: without a name will always create a new and separate database instance.
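A minimal sketch of this behavior (conn3 is the name from the example above; shared_tbl is a hypothetical table):

import duckdb
con1 = duckdb.connect(":memory:conn3")
con1.sql("CREATE TABLE shared_tbl AS SELECT 42 AS a")
# a second connection with the same name shares the catalog
con2 = duckdb.connect(":memory:conn3")
con2.sql("SELECT * FROM shared_tbl").show()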
Default Connection
By default we create an (unnamed) in-memory database that lives inside the duckdb module.
Every method of DuckDBPyConnection is also available on the duckdb module; this connection is what's used by these methods.
The special value :default: can be used to get this default connection.
File-Based Connection
If the database is a file path, a connection to a persistent database is established.
If the file does not exist, it will be created (the extension of the file is irrelevant and can be .db, .duckdb, or anything else).
read_only Connections
If you would like to connect in read-only mode, you can set the read_only flag to True. If the file does not exist, it is not created when connecting in read-only mode.
Read-only mode is required if multiple Python processes want to access the same database file at the same time.
The examples below illustrate the default connection and the various connection options described above:
import duckdb
duckdb.execute("CREATE TABLE tbl AS SELECT 42 a")
con = duckdb.connect(":default:")
con.sql("SELECT * FROM tbl")
# or
duckdb.default_connection().sql("SELECT * FROM tbl")
┌───────┐
│ a │
│ int32 │
├───────┤
│ 42 │
└───────┘
import duckdb
# to start an in-memory database
con = duckdb.connect(database = ":memory:")
# to use a database file (not shared between processes)
con = duckdb.connect(database = "my-db.duckdb", read_only = False)
# to use a database file (shared between processes)
con = duckdb.connect(database = "my-db.duckdb", read_only = True)
# to explicitly get the default connection
con = duckdb.connect(database = ":default:")
If you want to create a second connection to an existing database, you can use the cursor() method. This can be useful, for example, to allow parallel threads to run queries independently. A single connection is thread-safe but is locked for the duration of the queries, effectively serializing database access in this case.
Connections are closed implicitly when they go out of scope or when they are explicitly closed using close(). Once the last connection to a database instance is closed, the database instance is closed as well.
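A minimal sketch of both patterns (my-db.duckdb is just an illustrative file name):

import duckdb
con = duckdb.connect("my-db.duckdb")
# cursor() creates a second, independent connection to the same database
cur = con.cursor()
cur.sql("SELECT 42 AS a").show()
cur.close()
# closing the last connection also closes the database instance
con.close()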
Querying
SQL queries can be sent to DuckDB using the execute() method of connections. Once a query has been executed, results can be retrieved using the fetchone and fetchall methods on the connection. fetchall will retrieve all results and complete the transaction. fetchone will retrieve a single row of results each time it is invoked until no more results are available. The transaction will only close once fetchone is called and there are no more results remaining (the return value will be None). As an example, in the case of a query returning only a single row, fetchone should be called once to retrieve the results and a second time to close the transaction. Below are some short examples:
# create a table
con.execute("CREATE TABLE items (item VARCHAR, value DECIMAL(10, 2), count INTEGER)")
# insert two items into the table
con.execute("INSERT INTO items VALUES ('jeans', 20.0, 1), ('hammer', 42.2, 2)")
# retrieve the items again
con.execute("SELECT * FROM items")
print(con.fetchall())
# [('jeans', Decimal('20.00'), 1), ('hammer', Decimal('42.20'), 2)]
# retrieve the items one at a time
con.execute("SELECT * FROM items")
print(con.fetchone())
# ('jeans', Decimal('20.00'), 1)
print(con.fetchone())
# ('hammer', Decimal('42.20'), 2)
print(con.fetchone()) # This closes the transaction. Any subsequent calls to .fetchone will return None
# None
The description property of the connection object contains the column names as per the standard.
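For example, reusing the items table from above (per the DB-API, the first element of each description tuple is the column name):

con.execute("SELECT * FROM items")
# the first element of each description tuple is the column name
print([col[0] for col in con.description])
# ['item', 'value', 'count']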
Prepared Statements
DuckDB also supports prepared statements in the API with the execute and executemany methods. The values may be passed as an additional parameter after a query that contains ? or $1 (dollar symbol and a number) placeholders. Using the ? notation adds the values in the same sequence as passed within the Python parameter. Using the $ notation allows values to be reused within the SQL statement based on the number and index of the value found within the Python parameter. Values are converted according to the conversion rules.
Here are some examples. First, insert a row using a prepared statement:
con.execute("INSERT INTO items VALUES (?, ?, ?)", ["laptop", 2000, 1])
Second, insert several rows using a prepared statement:
con.executemany("INSERT INTO items VALUES (?, ?, ?)", [["chainsaw", 500, 10], ["iphone", 300, 2]] )
Query the database using a prepared statement:
con.execute("SELECT item FROM items WHERE value > ?", [400])
print(con.fetchall())
[('laptop',), ('chainsaw',)]
Query using the $ notation for a prepared statement and reused values:
con.execute("SELECT $1, $1, $2", ["duck", "goose"])
print(con.fetchall())
[('duck', 'duck', 'goose')]
Warning: Do not use executemany to insert large amounts of data into DuckDB. See the data ingestion page for better options.
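As a sketch of one such option (assuming pandas is installed), DuckDB can bulk-insert directly from a Pandas DataFrame referenced by its variable name:

import pandas as pd
df = pd.DataFrame({"item": ["pen"], "value": [1.50], "count": [100]})
# DuckDB resolves the name df to the local DataFrame via a replacement scan
con.execute("INSERT INTO items SELECT * FROM df")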
Named Parameters
Besides the standard unnamed parameters, like $1, $2, etc., it's also possible to supply named parameters, like $my_parameter.
When using named parameters, you have to provide a dictionary mapping of str to value in the parameters argument.
An example use is the following:
import duckdb
res = duckdb.execute(
    """
    SELECT
        $my_param,
        $other_param,
        $also_param
    """,
    {
        "my_param": 5,
        "other_param": "DuckDB",
        "also_param": [42]
    }
).fetchall()
print(res)
[(5, 'DuckDB', [42])]