CARVIEW |
- Installation
- Documentation
- Getting Started
- Connect
- Data Import
- Overview
- Data Sources
- CSV Files
- JSON Files
- Overview
- Creating JSON
- Loading JSON
- Writing JSON
- JSON Type
- JSON Functions
- Format Settings
- Installing and Loading
- SQL to / from JSON
- Caveats
- Multiple Files
- Parquet Files
- Partitioning
- Appender
- INSERT Statements
- Client APIs
- Overview
- ADBC
- C
- Overview
- Startup
- Configuration
- Query
- Data Chunks
- Vectors
- Values
- Types
- Prepared Statements
- Appender
- Table Functions
- Replacement Scans
- API Reference
- C++
- CLI
- Dart
- Go
- Java (JDBC)
- Julia
- Node.js (Deprecated)
- Node.js (Neo)
- ODBC
- PHP
- Python
- Overview
- Data Ingestion
- Conversion between DuckDB and Python
- DB API
- Relational API
- Function API
- Types API
- Expression API
- Spark API
- API Reference
- Known Python Issues
- R
- Rust
- Swift
- Wasm
- SQL
- Introduction
- Statements
- Overview
- ANALYZE
- ALTER TABLE
- ALTER VIEW
- ATTACH and DETACH
- CALL
- CHECKPOINT
- COMMENT ON
- COPY
- CREATE INDEX
- CREATE MACRO
- CREATE SCHEMA
- CREATE SECRET
- CREATE SEQUENCE
- CREATE TABLE
- CREATE VIEW
- CREATE TYPE
- DELETE
- DESCRIBE
- DROP
- EXPORT and IMPORT DATABASE
- INSERT
- LOAD / INSTALL
- PIVOT
- Profiling
- SELECT
- SET / RESET
- SET VARIABLE
- SUMMARIZE
- Transaction Management
- UNPIVOT
- UPDATE
- USE
- VACUUM
- Query Syntax
- SELECT
- FROM and JOIN
- WHERE
- GROUP BY
- GROUPING SETS
- HAVING
- ORDER BY
- LIMIT and OFFSET
- SAMPLE
- Unnesting
- WITH
- WINDOW
- QUALIFY
- VALUES
- FILTER
- Set Operations
- Prepared Statements
- Data Types
- Overview
- Array
- Bitstring
- Blob
- Boolean
- Date
- Enum
- Interval
- List
- Literal Types
- Map
- NULL Values
- Numeric
- Struct
- Text
- Time
- Timestamp
- Time Zones
- Union
- Typecasting
- Expressions
- Overview
- CASE Expression
- Casting
- Collations
- Comparisons
- IN Operator
- Logical Operators
- Star Expression
- Subqueries
- TRY
- Functions
- Overview
- Aggregate Functions
- Array Functions
- Bitstring Functions
- Blob Functions
- Date Format Functions
- Date Functions
- Date Part Functions
- Enum Functions
- Interval Functions
- Lambda Functions
- List Functions
- Map Functions
- Nested Functions
- Numeric Functions
- Pattern Matching
- Regular Expressions
- Struct Functions
- Text Functions
- Time Functions
- Timestamp Functions
- Timestamp with Time Zone Functions
- Union Functions
- Utility Functions
- Window Functions
- Constraints
- Indexes
- Meta Queries
- DuckDB's SQL Dialect
- Overview
- Indexing
- Friendly SQL
- Keywords and Identifiers
- Order Preservation
- PostgreSQL Compatibility
- SQL Quirks
- Samples
- Configuration
- Extensions
- Overview
- Installing Extensions
- Advanced Installation Methods
- Distributing Extensions
- Versioning of Extensions
- Troubleshooting of Extensions
- Core Extensions
- Overview
- AutoComplete
- Avro
- AWS
- Azure
- Delta
- DuckLake
- Encodings
- Excel
- Full Text Search
- httpfs (HTTP and S3)
- Iceberg
- Overview
- Iceberg REST Catalogs
- Amazon S3 Tables
- Amazon SageMaker Lakehouse (AWS Glue)
- Troubleshooting
- ICU
- inet
- jemalloc
- MySQL
- PostgreSQL
- Spatial
- SQLite
- TPC-DS
- TPC-H
- UI
- VSS
- Guides
- Overview
- Data Viewers
- Database Integration
- File Formats
- Overview
- CSV Import
- CSV Export
- Directly Reading Files
- Excel Import
- Excel Export
- JSON Import
- JSON Export
- Parquet Import
- Parquet Export
- Querying Parquet Files
- File Access with the file: Protocol
- Network and Cloud Storage
- Overview
- HTTP Parquet Import
- S3 Parquet Import
- S3 Parquet Export
- S3 Iceberg Import
- S3 Express One
- GCS Import
- Cloudflare R2 Import
- DuckDB over HTTPS / S3
- Fastly Object Storage Import
- Meta Queries
- Describe Table
- EXPLAIN: Inspect Query Plans
- EXPLAIN ANALYZE: Profile Queries
- List Tables
- Summarize
- DuckDB Environment
- ODBC
- Performance
- Overview
- Environment
- Import
- Schema
- Indexing
- Join Operations
- File Formats
- How to Tune Workloads
- My Workload Is Slow
- Benchmarks
- Working with Huge Databases
- Python
- Installation
- Executing SQL
- Jupyter Notebooks
- marimo Notebooks
- SQL on Pandas
- Import from Pandas
- Export to Pandas
- Import from Numpy
- Export to Numpy
- SQL on Arrow
- Import from Arrow
- Export to Arrow
- Relational API on Pandas
- Multiple Python Threads
- Integration with Ibis
- Integration with Polars
- Using fsspec Filesystems
- SQL Editors
- SQL Features
- Snippets
- Creating Synthetic Data
- Dutch Railway Datasets
- Sharing Macros
- Analyzing a Git Repository
- Importing Duckbox Tables
- Copying an In-Memory Database to a File
- Troubleshooting
- Glossary of Terms
- Browsing Offline
- Operations Manual
- Overview
- DuckDB's Footprint
- Logging
- Securing DuckDB
- Non-Deterministic Behavior
- Limits
- Development
- DuckDB Repositories
- Profiling
- Building DuckDB
- Overview
- Build Configuration
- Building Extensions
- Android
- Linux
- macOS
- Raspberry Pi
- Windows
- Python
- R
- Troubleshooting
- Unofficial and Unsupported Platforms
- Benchmark Suite
- Testing
- Internals
- Why DuckDB
- Code of Conduct
- Release Calendar
- Roadmap
- Sitemap
- Live Demo
There are four separate approaches to pattern matching provided by DuckDB:
the traditional SQL LIKE
operator,
the more recent SIMILAR TO
operator (added in SQL:1999),
a GLOB
operator,
and POSIX-style regular expressions.
LIKE
The LIKE
expression returns true
if the string matches the supplied pattern. (As expected, the NOT LIKE
expression returns false
if LIKE
returns true
, and vice versa. An equivalent expression is NOT (string LIKE pattern)
.)
If pattern does not contain percent signs or underscores, then the pattern only represents the string itself; in that case LIKE
acts like the equals operator. An underscore (_
) in pattern stands for (matches) any single character; a percent sign (%
) matches any sequence of zero or more characters.
LIKE
pattern matching always covers the entire string. Therefore, if it's desired to match a sequence anywhere within a string, the pattern must start and end with a percent sign.
Some examples:
SELECT 'abc' LIKE 'abc'; -- true
SELECT 'abc' LIKE 'a%' ; -- true
SELECT 'abc' LIKE '_b_'; -- true
SELECT 'abc' LIKE 'c'; -- false
SELECT 'abc' LIKE 'c%' ; -- false
SELECT 'abc' LIKE '%c'; -- true
SELECT 'abc' NOT LIKE '%c'; -- false
The keyword ILIKE
can be used instead of LIKE
to make the match case-insensitive according to the active locale:
SELECT 'abc' ILIKE '%C'; -- true
SELECT 'abc' NOT ILIKE '%C'; -- false
To search within a string for a character that is a wildcard (%
or _
), the pattern must use an ESCAPE
clause and an escape character to indicate the wildcard should be treated as a literal character instead of a wildcard. See an example below.
Additionally, the function like_escape
has the same functionality as a LIKE
expression with an ESCAPE
clause, but using function syntax. See the Text Functions Docs for details.
Search for strings with 'a' then a literal percent sign then 'c':
SELECT 'a%c' LIKE 'a$%c' ESCAPE '$'; -- true
SELECT 'azc' LIKE 'a$%c' ESCAPE '$'; -- false
Case-insensitive ILIKE with ESCAPE:
SELECT 'A%c' ILIKE 'a$%c' ESCAPE '$'; -- true
There are also alternative characters that can be used as keywords in place of LIKE
expressions. These enhance PostgreSQL compatibility.
PostgreSQL-style | LIKE -style |
---|---|
~~ |
LIKE |
!~~ |
NOT LIKE |
~~* |
ILIKE |
!~~* |
NOT ILIKE |
SIMILAR TO
The SIMILAR TO
operator returns true or false depending on whether its pattern matches the given string. It is similar to LIKE
, except that it interprets the pattern using a regular expression. Like LIKE
, the SIMILAR TO
operator succeeds only if its pattern matches the entire string; this is unlike common regular expression behavior where the pattern can match any part of the string.
A regular expression is a character sequence that is an abbreviated definition of a set of strings (a regular set). A string is said to match a regular expression if it is a member of the regular set described by the regular expression. As with LIKE
, pattern characters match string characters exactly unless they are special characters in the regular expression language — but regular expressions use different special characters than LIKE
does.
Some examples:
SELECT 'abc' SIMILAR TO 'abc'; -- true
SELECT 'abc' SIMILAR TO 'a'; -- false
SELECT 'abc' SIMILAR TO '.*(b|d).*'; -- true
SELECT 'abc' SIMILAR TO '(b|c).*'; -- false
SELECT 'abc' NOT SIMILAR TO 'abc'; -- false
In PostgreSQL,
~
is equivalent toSIMILAR TO
and!~
is equivalent toNOT SIMILAR TO
. In DuckDB, these equivalences do not hold currently, see the PostgreSQL compatibility page.
Globbing
DuckDB supports file name expansion, also known as globbing, for discovering files.
DuckDB's glob syntax uses the question mark (?
) wildcard to match any single character and the asterisk (*
) to match zero or more characters.
In addition, you can use the bracket syntax ([...]
) to match any single character contained within the brackets, or within the character range specified by the brackets. An exclamation mark (!
) may be used inside the first bracket to search for a character that is not contained within the brackets.
To learn more, visit the “glob (programming)” Wikipedia page.
GLOB
The GLOB
operator returns true
or false
if the string matches the GLOB
pattern. The GLOB
operator is most commonly used when searching for filenames that follow a specific pattern (for example a specific file extension).
Some examples:
SELECT 'best.txt' GLOB '*.txt'; -- true
SELECT 'best.txt' GLOB '????.txt'; -- true
SELECT 'best.txt' GLOB '?.txt'; -- false
SELECT 'best.txt' GLOB '[abc]est.txt'; -- true
SELECT 'best.txt' GLOB '[a-z]est.txt'; -- true
The bracket syntax is case-sensitive:
SELECT 'Best.txt' GLOB '[a-z]est.txt'; -- false
SELECT 'Best.txt' GLOB '[a-zA-Z]est.txt'; -- true
The !
applies to all characters within the brackets:
SELECT 'Best.txt' GLOB '[!a-zA-Z]est.txt'; -- false
To negate a GLOB operator, negate the entire expression:
SELECT NOT 'best.txt' GLOB '*.txt'; -- false
Three tildes (~~~
) may also be used in place of the GLOB
keyword.
GLOB-style | Symbolic-style |
---|---|
GLOB |
~~~ |
Glob Function to Find Filenames
The glob pattern matching syntax can also be used to search for filenames using the glob
table function.
It accepts one parameter: the path to search (which may include glob patterns).
Search the current directory for all files:
SELECT * FROM glob('*');
file |
---|
duckdb.exe |
test.csv |
test.json |
test.parquet |
test2.csv |
test2.parquet |
todos.json |
Globbing Semantics
DuckDB's globbing implementation follows the semantics of Python's glob
and not the glob
used in the shell.
A notable difference is the behavior of the **/
construct: **/filename
will not return a file with filename
in top-level directory.
For example, with a README.md
file present in the directory, the following query finds it:
SELECT * FROM glob('README.md');
file |
---|
README.md |
However, the following query returns an empty result:
SELECT * FROM glob('**/README.md');
Meanwhile, the globbing of Bash, Zsh, etc. finds the file using the same syntax:
ls **/README.md
README.md
Regular Expressions
DuckDB's regular expression support is documented on the Regular Expressions page. DuckDB supports some PostgreSQL-style operators for regular expression matching:
PostgreSQL-style | Equivalent expression |
---|---|
~ |
regexp_full_match |
!~ |
NOT regexp_full_match |
~* |
(not supported) |
!~* |
(not supported) |