CARVIEW |
- Installation
- Documentation
- Getting Started
- Connect
- Data Import
- Overview
- Data Sources
- CSV Files
- JSON Files
- Overview
- Creating JSON
- Loading JSON
- Writing JSON
- JSON Type
- JSON Functions
- Format Settings
- Installing and Loading
- SQL to / from JSON
- Caveats
- Multiple Files
- Parquet Files
- Partitioning
- Appender
- INSERT Statements
- Client APIs
- Overview
- Tertiary Clients
- ADBC
- C
- Overview
- Startup
- Configuration
- Query
- Data Chunks
- Vectors
- Values
- Types
- Prepared Statements
- Appender
- Table Functions
- Replacement Scans
- API Reference
- C++
- CLI
- Overview
- Arguments
- Dot Commands
- Output Formats
- Editing
- Safe Mode
- Autocomplete
- Syntax Highlighting
- Known Issues
- Dart
- Go
- Java (JDBC)
- Julia
- Node.js (Deprecated)
- Node.js (Neo)
- ODBC
- PHP
- Python
- Overview
- Data Ingestion
- Conversion between DuckDB and Python
- DB API
- Relational API
- Function API
- Types API
- Expression API
- Spark API
- API Reference
- Known Python Issues
- R
- Rust
- Swift
- Wasm
- SQL
- Introduction
- Statements
- Overview
- ANALYZE
- ALTER DATABASE
- ALTER TABLE
- ALTER VIEW
- ATTACH and DETACH
- CALL
- CHECKPOINT
- COMMENT ON
- COPY
- CREATE INDEX
- CREATE MACRO
- CREATE SCHEMA
- CREATE SECRET
- CREATE SEQUENCE
- CREATE TABLE
- CREATE VIEW
- CREATE TYPE
- DELETE
- DESCRIBE
- DROP
- EXPORT and IMPORT DATABASE
- INSERT
- LOAD / INSTALL
- MERGE INTO
- PIVOT
- Profiling
- SELECT
- SET / RESET
- SET VARIABLE
- SUMMARIZE
- Transaction Management
- UNPIVOT
- UPDATE
- USE
- VACUUM
- Query Syntax
- SELECT
- FROM and JOIN
- WHERE
- GROUP BY
- GROUPING SETS
- HAVING
- ORDER BY
- LIMIT and OFFSET
- SAMPLE
- Unnesting
- WITH
- WINDOW
- QUALIFY
- VALUES
- FILTER
- Set Operations
- Prepared Statements
- Data Types
- Overview
- Array
- Bitstring
- Blob
- Boolean
- Date
- Enum
- Interval
- List
- Literal Types
- Map
- NULL Values
- Numeric
- Struct
- Text
- Time
- Timestamp
- Time Zones
- Union
- Typecasting
- Expressions
- Overview
- CASE Expression
- Casting
- Collations
- Comparisons
- IN Operator
- Logical Operators
- Star Expression
- Subqueries
- TRY
- Functions
- Overview
- Aggregate Functions
- Array Functions
- Bitstring Functions
- Blob Functions
- Date Format Functions
- Date Functions
- Date Part Functions
- Enum Functions
- Interval Functions
- Lambda Functions
- List Functions
- Map Functions
- Nested Functions
- Numeric Functions
- Pattern Matching
- Regular Expressions
- Struct Functions
- Text Functions
- Time Functions
- Timestamp Functions
- Timestamp with Time Zone Functions
- Union Functions
- Utility Functions
- Window Functions
- Constraints
- Indexes
- Meta Queries
- DuckDB's SQL Dialect
- Overview
- Indexing
- Friendly SQL
- Keywords and Identifiers
- Order Preservation
- PostgreSQL Compatibility
- SQL Quirks
- Samples
- Configuration
- Extensions
- Overview
- Installing Extensions
- Advanced Installation Methods
- Distributing Extensions
- Versioning of Extensions
- Troubleshooting of Extensions
- Core Extensions
- Overview
- AutoComplete
- Avro
- AWS
- Azure
- Delta
- DuckLake
- Encodings
- Excel
- Full Text Search
- httpfs (HTTP and S3)
- Iceberg
- Overview
- Iceberg REST Catalogs
- Amazon S3 Tables
- Amazon SageMaker Lakehouse (AWS Glue)
- Troubleshooting
- ICU
- inet
- jemalloc
- MySQL
- PostgreSQL
- Spatial
- SQLite
- TPC-DS
- TPC-H
- UI
- VSS
- Guides
- Overview
- Data Viewers
- Database Integration
- File Formats
- Overview
- CSV Import
- CSV Export
- Directly Reading Files
- Excel Import
- Excel Export
- JSON Import
- JSON Export
- Parquet Import
- Parquet Export
- Querying Parquet Files
- File Access with the file: Protocol
- Network and Cloud Storage
- Overview
- HTTP Parquet Import
- S3 Parquet Import
- S3 Parquet Export
- S3 Iceberg Import
- S3 Express One
- GCS Import
- Cloudflare R2 Import
- DuckDB over HTTPS / S3
- Fastly Object Storage Import
- Meta Queries
- Describe Table
- EXPLAIN: Inspect Query Plans
- EXPLAIN ANALYZE: Profile Queries
- List Tables
- Summarize
- DuckDB Environment
- ODBC
- Performance
- Overview
- Environment
- Import
- Schema
- Indexing
- Join Operations
- File Formats
- How to Tune Workloads
- My Workload Is Slow
- Benchmarks
- Working with Huge Databases
- Python
- Installation
- Executing SQL
- Jupyter Notebooks
- marimo Notebooks
- SQL on Pandas
- Import from Pandas
- Export to Pandas
- Import from Numpy
- Export to Numpy
- SQL on Arrow
- Import from Arrow
- Export to Arrow
- Relational API on Pandas
- Multiple Python Threads
- Integration with Ibis
- Integration with Polars
- Using fsspec Filesystems
- SQL Editors
- SQL Features
- Snippets
- Creating Synthetic Data
- Dutch Railway Datasets
- Sharing Macros
- Analyzing a Git Repository
- Importing Duckbox Tables
- Copying an In-Memory Database to a File
- Troubleshooting
- Glossary of Terms
- Browsing Offline
- Operations Manual
- Overview
- DuckDB's Footprint
- Logging
- Securing DuckDB
- Non-Deterministic Behavior
- Limits
- DuckDB Docker Container
- Development
- DuckDB Repositories
- Profiling
- Building DuckDB
- Overview
- Build Configuration
- Building Extensions
- Android
- Linux
- macOS
- Raspberry Pi
- Windows
- Python
- R
- Troubleshooting
- Unofficial and Unsupported Platforms
- Benchmark Suite
- Testing
- Internals
- Sitemap
- Live Demo
DuckDB offers pattern matching operators
(LIKE
,
SIMILAR TO
,
GLOB
),
as well as support for regular expressions via functions.
Regular Expression Syntax
DuckDB uses the RE2 library as its regular expression engine. For the regular expression syntax, see the RE2 docs.
Functions
All functions accept an optional set of options.
Name | Description |
---|---|
regexp_extract(string, pattern[, group = 0][, options]) |
If string contains the regexp pattern , returns the capturing group specified by optional parameter group ; otherwise, returns the empty string. The group must be a constant value. If no group is given, it defaults to 0. A set of optional options can be set. |
regexp_extract(string, pattern, name_list[, options]) |
If string contains the regexp pattern , returns the capturing groups as a struct with corresponding names from name_list ; otherwise, returns a struct with the same keys and empty strings as values. |
regexp_extract_all(string, regex[, group = 0][, options]) |
Finds non-overlapping occurrences of regex in string and returns the corresponding values of group . |
regexp_full_match(string, regex[, options]) |
Returns true if the entire string matches the regex . |
regexp_matches(string, pattern[, options]) |
Returns true if string contains the regexp pattern , false otherwise. |
regexp_replace(string, pattern, replacement[, options]) |
If string contains the regexp pattern , replaces the matching part with replacement . By default, only the first occurrence is replaced. A set of optional options , including the global flag g , can be set. |
regexp_split_to_array(string, regex[, options]) |
Alias of string_split_regex . Splits the string along the regex . |
regexp_split_to_table(string, regex[, options]) |
Splits the string along the regex and returns a row for each part. |
regexp_extract(string, pattern[, group = 0][, options])
Description | If string contains the regexp pattern , returns the capturing group specified by optional parameter group ; otherwise, returns the empty string. The group must be a constant value. If no group is given, it defaults to 0. A set of optional options can be set. |
Example | regexp_extract('abc', '([a-z])(b)', 1) |
Result | a |
regexp_extract(string, pattern, name_list[, options])
Description | If string contains the regexp pattern , returns the capturing groups as a struct with corresponding names from name_list ; otherwise, returns a struct with the same keys and empty strings as values. A set of optional options can be set. |
Example | regexp_extract('2023-04-15', '(\d+)-(\d+)-(\d+)', ['y', 'm', 'd']) |
Result | {'y':'2023', 'm':'04', 'd':'15'} |
regexp_extract_all(string, regex[, group = 0][, options])
Description | Finds non-overlapping occurrences of regex in string and returns the corresponding values of group . A set of optional options can be set. |
Example | regexp_extract_all('Peter: 33, Paul:14', '(\w+):\s*(\d+)', 2) |
Result | [33, 14] |
regexp_full_match(string, regex[, options])
Description | Returns true if the entire string matches the regex . A set of optional options can be set. |
Example | regexp_full_match('anabanana', '(an)*') |
Result | false |
regexp_matches(string, pattern[, options])
Description | Returns true if string contains the regexp pattern , false otherwise. A set of optional options can be set. |
Example | regexp_matches('anabanana', '(an)*') |
Result | true |
regexp_replace(string, pattern, replacement[, options])
Description | If string contains the regexp pattern , replaces the matching part with replacement . By default, only the first occurrence is replaced. A set of optional options , including the global flag g , can be set. |
Example | regexp_replace('hello', '[lo]', '-') |
Result | he-lo |
regexp_split_to_array(string, regex[, options])
Description | Alias of string_split_regex . Splits the string along the regex . A set of optional options can be set. |
Example | regexp_split_to_array('hello world; 42', ';? ') |
Result | ['hello', 'world', '42'] |
regexp_split_to_table(string, regex[, options])
Description | Splits the string along the regex and returns a row for each part. A set of optional options can be set. |
Example | regexp_split_to_table('hello world; 42', ';? ') |
Result | Three rows: 'hello' , 'world' , '42' |
The regexp_matches
function is similar to the SIMILAR TO
operator, however, it does not require the entire string to match. Instead, regexp_matches
returns true
if the string merely contains the pattern (unless the special tokens ^
and $
are used to anchor the regular expression to the start and end of the string). Below are some examples:
SELECT regexp_matches('abc', 'abc'); -- true
SELECT regexp_matches('abc', '^abc$'); -- true
SELECT regexp_matches('abc', 'a'); -- true
SELECT regexp_matches('abc', '^a$'); -- false
SELECT regexp_matches('abc', '.*(b|d).*'); -- true
SELECT regexp_matches('abc', '(b|c).*'); -- true
SELECT regexp_matches('abc', '^(b|c).*'); -- false
SELECT regexp_matches('abc', '(?i)A'); -- true
SELECT regexp_matches('abc', 'A', 'i'); -- true
Options for Regular Expression Functions
The regex functions support the following options
.
Option | Description |
---|---|
'c' |
Case-sensitive matching |
'i' |
Case-insensitive matching |
'l' |
Match literals instead of regular expression tokens |
'm' , 'n' , 'p' |
Newline sensitive matching |
'g' |
Global replace, only available for regexp_replace |
's' |
Non-newline sensitive matching |
For example:
SELECT regexp_matches('abcd', 'ABC', 'c'); -- false
SELECT regexp_matches('abcd', 'ABC', 'i'); -- true
SELECT regexp_matches('ab^/$cd', '^/$', 'l'); -- true
SELECT regexp_matches(E'hello\nworld', 'hello.world', 'p'); -- false
SELECT regexp_matches(E'hello\nworld', 'hello.world', 's'); -- true
Using regexp_matches
The regexp_matches
operator will be optimized to the LIKE
operator when possible. To achieve best performance, the 'c'
option (case-sensitive matching) should be passed if applicable. Note that by default the RE2
library doesn't match the .
character to newline.
Original | Optimized equivalent |
---|---|
regexp_matches('hello world', '^hello', 'c') |
prefix('hello world', 'hello') |
regexp_matches('hello world', 'world$', 'c') |
suffix('hello world', 'world') |
regexp_matches('hello world', 'hello.world', 'c') |
LIKE 'hello_world' |
regexp_matches('hello world', 'he.*rld', 'c') |
LIKE '%he%rld' |
Using regexp_replace
The regexp_replace
function can be used to replace the part of a string that matches the regexp pattern with a replacement string. The notation \d
(where d
is a number indicating the group) can be used to refer to groups captured in the regular expression in the replacement string. Note that by default, regexp_replace
only replaces the first occurrence of the regular expression. To replace all occurrences, use the global replace (g
) flag.
Some examples for using regexp_replace
:
SELECT regexp_replace('abc', '(b|c)', 'X'); -- aXc
SELECT regexp_replace('abc', '(b|c)', 'X', 'g'); -- aXX
SELECT regexp_replace('abc', '(b|c)', '\1\1\1\1'); -- abbbbc
SELECT regexp_replace('abc', '(.*)c', '\1e'); -- abe
SELECT regexp_replace('abc', '(a)(b)', '\2\1'); -- bac
Using regexp_extract
The regexp_extract
function is used to extract a part of a string that matches the regexp pattern.
A specific capturing group within the pattern can be extracted using the group
parameter. If group
is not specified, it defaults to 0, extracting the first match with the whole pattern.
SELECT regexp_extract('abc', '.b.'); -- abc
SELECT regexp_extract('abc', '.b.', 0); -- abc
SELECT regexp_extract('abc', '.b.', 1); -- (empty)
SELECT regexp_extract('abc', '([a-z])(b)', 1); -- a
SELECT regexp_extract('abc', '([a-z])(b)', 2); -- b
The regexp_extract
function also supports a name_list
argument, which is a LIST
of strings. Using name_list
, the regexp_extract
will return the corresponding capture groups as fields of a STRUCT
:
SELECT regexp_extract('2023-04-15', '(\d+)-(\d+)-(\d+)', ['y', 'm', 'd']);
{'y': 2023, 'm': 04, 'd': 15}
SELECT regexp_extract('2023-04-15 07:59:56', '^(\d+)-(\d+)-(\d+) (\d+):(\d+):(\d+)', ['y', 'm', 'd']);
{'y': 2023, 'm': 04, 'd': 15}
SELECT regexp_extract('duckdb_0_7_1', '^(\w+)_(\d+)_(\d+)', ['tool', 'major', 'minor', 'fix']);
Binder Error:
Not enough group names in regexp_extract
If the number of column names is less than the number of capture groups, then only the first groups are returned. If the number of column names is greater, then an error is generated.
Limitations
Regular expressions only support 9 capture groups: \1
, \2
, \3
, …, \9
.
Capture groups with two or more digits are not supported.