Friendly SQL
DuckDB offers several advanced SQL features and syntactic sugar to make SQL queries more concise. We refer to these colloquially as “friendly SQL”.
Several of these features are also supported in other systems, while some are (currently) exclusive to DuckDB.
Clauses
- Creating tables and inserting data:
  - `CREATE OR REPLACE TABLE`: avoid `DROP TABLE IF EXISTS` statements in scripts.
  - `CREATE TABLE ... AS SELECT` (CTAS): create a new table from the output of a query without manually defining a schema.
  - `INSERT INTO ... BY NAME`: this variant of the `INSERT` statement allows using column names instead of positions.
  - `INSERT OR IGNORE INTO ...`: insert the rows that do not result in a conflict due to `UNIQUE` or `PRIMARY KEY` constraints.
  - `INSERT OR REPLACE INTO ...`: insert the rows that do not result in a conflict due to `UNIQUE` or `PRIMARY KEY` constraints. For those that do conflict, replace the columns of the existing row with the new values of the to-be-inserted row.
- Describing tables and computing statistics: `DESCRIBE`, `SUMMARIZE`
- Making SQL clauses more compact and readable:
  - `FROM`-first syntax with an optional `SELECT` clause: DuckDB allows queries in the form of `FROM tbl`, which selects all columns (performing a `SELECT *` statement).
  - `GROUP BY ALL`: omit the group-by columns by inferring them from the list of attributes in the `SELECT` clause.
  - `ORDER BY ALL`: shorthand to order on all columns (e.g., to ensure deterministic results).
  - `SELECT * EXCLUDE`: the `EXCLUDE` option allows excluding specific columns from the `*` expression.
  - `SELECT * REPLACE`: the `REPLACE` option allows replacing specific columns with different expressions in a `*` expression.
  - `UNION BY NAME`: perform the `UNION` operation along the names of columns (instead of relying on positions).
  - Prefix aliases in the `SELECT` and `FROM` clauses: write `x: 42` instead of `42 AS x` for improved readability.
  - Specifying a percentage of the table size for the `LIMIT` clause: write `LIMIT 10%` to return 10% of the query results.
- Transforming tables: `PIVOT`, `UNPIVOT`
- Defining SQL-level variables: `SET VARIABLE`, `RESET VARIABLE`
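A short sketch combining several of these clauses. The `cities` table and all names in it are hypothetical, chosen for illustration only:

```sql
-- hypothetical example table
CREATE OR REPLACE TABLE cities (id INTEGER PRIMARY KEY, name VARCHAR);
INSERT INTO cities VALUES (1, 'Amsterdam'), (2, 'Berlin');

-- INSERT ... BY NAME: match columns by name, not position
INSERT INTO cities BY NAME
    SELECT 'Copenhagen' AS name, 3 AS id;

-- id 1 already exists: skip the conflicting row ...
INSERT OR IGNORE INTO cities VALUES (1, 'Antwerp');
-- ... or overwrite the existing row instead
INSERT OR REPLACE INTO cities VALUES (1, 'Antwerp');

-- CTAS: the schema of t2 is inferred from the query
CREATE OR REPLACE TABLE t2 AS
    SELECT * FROM cities WHERE id > 1;

-- FROM-first query, equivalent to SELECT * FROM cities
FROM cities;

-- GROUP BY ALL infers the grouping column; ORDER BY ALL sorts on all columns
SELECT name[1] AS initial, count(*) AS n
FROM cities
GROUP BY ALL
ORDER BY ALL;

-- star modifiers: drop a column, transform another
SELECT * EXCLUDE (id) REPLACE (upper(name) AS name)
FROM cities;
```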
Query Features
- Column aliases in `WHERE`, `GROUP BY`, and `HAVING`. (Note that column aliases cannot be used in the `ON` clause of `JOIN` clauses.)
- The `COLUMNS()` expression can be used to execute the same expression on multiple columns.
- Reusable column aliases (also known as “lateral column aliases”), e.g.: `SELECT i + 1 AS j, j + 2 AS k FROM range(0, 3) t(i)`
- Advanced aggregation features for analytical (OLAP) queries
- `count()` shorthand for `count(*)`
- `IN` operator for lists and maps
- Specifying column names for common table expressions (`WITH`)
- Specifying column names in the `JOIN` clause
- Using `VALUES` in the `JOIN` clause
- Using `VALUES` in the anchor part of common table expressions
Literals and Identifiers
- Case-insensitivity while maintaining case of entities in the catalog
- Deduplicating identifiers
- Underscores as digit separators in numeric literals
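A quick illustration of these literal and identifier rules; the table and column names are made up for the example:

```sql
-- underscores as digit separators in numeric literals
SELECT 1_000_000 AS million;

-- identifiers match case-insensitively, but the catalog keeps
-- the casing used at creation time
CREATE OR REPLACE TABLE MyTable (MyColumn INTEGER);
SELECT mycolumn FROM MYTABLE;
```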
Data Types
- `MAP` data type
- `UNION` data type
Data Import
- Auto-detecting the headers and schema of CSV files
- Directly querying CSV files and Parquet files
- Replacement scans:
  - You can load from files using the syntax `FROM 'my.csv'`, `FROM 'my.csv.gz'`, `FROM 'my.parquet'`, etc.
  - In Python, you can access Pandas data frames using `FROM df`.
- Filename expansion (globbing), e.g.: `FROM 'my-data/part-*.parquet'`
Functions and Expressions
- Dot operator for function chaining: `SELECT ('hello').upper()`
- String formatters: the `format()` function with the `fmt` syntax and the `printf()` function
- List comprehensions
- List slicing and indexing from the back (`[-1]`)
- String slicing
- `STRUCT.*` notation
- Creating `LIST`s using square brackets
- Simple `LIST` and `STRUCT` creation
- Updating the schema of `STRUCT`s
Join Types
- `ASOF` joins
- `LATERAL` joins
- `POSITIONAL` joins
Trailing Commas
DuckDB allows trailing commas, both when listing entities (e.g., column and table names) and when constructing `LIST` items.
For example, the following query works:

```sql
SELECT
    42 AS x,
    ['a', 'b', 'c',] AS y,
    'hello world' AS z,
;
```
“Top-N in Group” Queries
Computing the “top-N rows in a group” ordered by some criteria is a common task in SQL that unfortunately often requires a complex query involving window functions and/or subqueries.
To aid in this, DuckDB provides the aggregate functions `max(arg, n)`, `min(arg, n)`, `arg_max(arg, val, n)`, `arg_min(arg, val, n)`, `max_by(arg, val, n)`, and `min_by(arg, val, n)` to efficiently return the “top” `n` rows in a group based on a specific column in either ascending or descending order.
For example, let's use the following table:

```sql
SELECT * FROM t1;
```

```text
┌─────────┬───────┐
│   grp   │  val  │
│ varchar │ int32 │
├─────────┼───────┤
│ a       │     2 │
│ a       │     1 │
│ b       │     5 │
│ b       │     4 │
│ a       │     3 │
│ b       │     6 │
└─────────┴───────┘
```
We want to get a list of the top-3 `val` values in each group `grp`.
The conventional way to do this is to use a window function in a subquery:
```sql
SELECT array_agg(rs.val), rs.grp
FROM
    (SELECT val, grp, row_number() OVER (PARTITION BY grp ORDER BY val DESC) AS rid
     FROM t1 ORDER BY val DESC) AS rs
WHERE rid < 4
GROUP BY rs.grp;
```

```text
┌───────────────────┬─────────┐
│ array_agg(rs.val) │   grp   │
│      int32[]      │ varchar │
├───────────────────┼─────────┤
│ [3, 2, 1]         │ a       │
│ [6, 5, 4]         │ b       │
└───────────────────┴─────────┘
```
But in DuckDB, we can do this much more concisely (and efficiently!):
```sql
SELECT max(val, 3) FROM t1 GROUP BY grp;
```

```text
┌─────────────┐
│ max(val, 3) │
│   int32[]   │
├─────────────┤
│ [3, 2, 1]   │
│ [6, 5, 4]   │
└─────────────┘
```
Related Blog Posts
- “Friendlier SQL with DuckDB” blog post
- “Even Friendlier SQL with DuckDB” blog post
- “SQL Gymnastics: Bending SQL into Flexible New Shapes” blog post