CARVIEW |
- Installation
- Documentation
- Getting Started
- Connect
- Data Import
- Overview
- Data Sources
- CSV Files
- JSON Files
- Overview
- Creating JSON
- Loading JSON
- Writing JSON
- JSON Type
- JSON Functions
- Format Settings
- Installing and Loading
- SQL to / from JSON
- Caveats
- Multiple Files
- Parquet Files
- Partitioning
- Appender
- INSERT Statements
- Client APIs
- Overview
- Tertiary Clients
- ADBC
- C
- Overview
- Startup
- Configuration
- Query
- Data Chunks
- Vectors
- Values
- Types
- Prepared Statements
- Appender
- Table Functions
- Replacement Scans
- API Reference
- C++
- CLI
- Overview
- Arguments
- Dot Commands
- Output Formats
- Editing
- Safe Mode
- Autocomplete
- Syntax Highlighting
- Known Issues
- Dart
- Go
- Java (JDBC)
- Julia
- Node.js (Deprecated)
- Node.js (Neo)
- ODBC
- PHP
- Python
- Overview
- Data Ingestion
- Conversion between DuckDB and Python
- DB API
- Relational API
- Function API
- Types API
- Expression API
- Spark API
- API Reference
- Known Python Issues
- R
- Rust
- Swift
- Wasm
- SQL
- Introduction
- Statements
- Overview
- ANALYZE
- ALTER DATABASE
- ALTER TABLE
- ALTER VIEW
- ATTACH and DETACH
- CALL
- CHECKPOINT
- COMMENT ON
- COPY
- CREATE INDEX
- CREATE MACRO
- CREATE SCHEMA
- CREATE SECRET
- CREATE SEQUENCE
- CREATE TABLE
- CREATE VIEW
- CREATE TYPE
- DELETE
- DESCRIBE
- DROP
- EXPORT and IMPORT DATABASE
- INSERT
- LOAD / INSTALL
- MERGE INTO
- PIVOT
- Profiling
- SELECT
- SET / RESET
- SET VARIABLE
- SUMMARIZE
- Transaction Management
- UNPIVOT
- UPDATE
- USE
- VACUUM
- Query Syntax
- SELECT
- FROM and JOIN
- WHERE
- GROUP BY
- GROUPING SETS
- HAVING
- ORDER BY
- LIMIT and OFFSET
- SAMPLE
- Unnesting
- WITH
- WINDOW
- QUALIFY
- VALUES
- FILTER
- Set Operations
- Prepared Statements
- Data Types
- Overview
- Array
- Bitstring
- Blob
- Boolean
- Date
- Enum
- Interval
- List
- Literal Types
- Map
- NULL Values
- Numeric
- Struct
- Text
- Time
- Timestamp
- Time Zones
- Union
- Typecasting
- Expressions
- Overview
- CASE Expression
- Casting
- Collations
- Comparisons
- IN Operator
- Logical Operators
- Star Expression
- Subqueries
- TRY
- Functions
- Overview
- Aggregate Functions
- Array Functions
- Bitstring Functions
- Blob Functions
- Date Format Functions
- Date Functions
- Date Part Functions
- Enum Functions
- Interval Functions
- Lambda Functions
- List Functions
- Map Functions
- Nested Functions
- Numeric Functions
- Pattern Matching
- Regular Expressions
- Struct Functions
- Text Functions
- Time Functions
- Timestamp Functions
- Timestamp with Time Zone Functions
- Union Functions
- Utility Functions
- Window Functions
- Constraints
- Indexes
- Meta Queries
- DuckDB's SQL Dialect
- Overview
- Indexing
- Friendly SQL
- Keywords and Identifiers
- Order Preservation
- PostgreSQL Compatibility
- SQL Quirks
- Samples
- Configuration
- Extensions
- Overview
- Installing Extensions
- Advanced Installation Methods
- Distributing Extensions
- Versioning of Extensions
- Troubleshooting of Extensions
- Core Extensions
- Overview
- AutoComplete
- Avro
- AWS
- Azure
- Delta
- DuckLake
- Encodings
- Excel
- Full Text Search
- httpfs (HTTP and S3)
- Iceberg
- Overview
- Iceberg REST Catalogs
- Amazon S3 Tables
- Amazon SageMaker Lakehouse (AWS Glue)
- Troubleshooting
- ICU
- inet
- jemalloc
- MySQL
- PostgreSQL
- Spatial
- SQLite
- TPC-DS
- TPC-H
- UI
- VSS
- Guides
- Overview
- Data Viewers
- Database Integration
- File Formats
- Overview
- CSV Import
- CSV Export
- Directly Reading Files
- Excel Import
- Excel Export
- JSON Import
- JSON Export
- Parquet Import
- Parquet Export
- Querying Parquet Files
- File Access with the file: Protocol
- Network and Cloud Storage
- Overview
- HTTP Parquet Import
- S3 Parquet Import
- S3 Parquet Export
- S3 Iceberg Import
- S3 Express One
- GCS Import
- Cloudflare R2 Import
- DuckDB over HTTPS / S3
- Fastly Object Storage Import
- Meta Queries
- Describe Table
- EXPLAIN: Inspect Query Plans
- EXPLAIN ANALYZE: Profile Queries
- List Tables
- Summarize
- DuckDB Environment
- ODBC
- Performance
- Overview
- Environment
- Import
- Schema
- Indexing
- Join Operations
- File Formats
- How to Tune Workloads
- My Workload Is Slow
- Benchmarks
- Working with Huge Databases
- Python
- Installation
- Executing SQL
- Jupyter Notebooks
- marimo Notebooks
- SQL on Pandas
- Import from Pandas
- Export to Pandas
- Import from Numpy
- Export to Numpy
- SQL on Arrow
- Import from Arrow
- Export to Arrow
- Relational API on Pandas
- Multiple Python Threads
- Integration with Ibis
- Integration with Polars
- Using fsspec Filesystems
- SQL Editors
- SQL Features
- Snippets
- Creating Synthetic Data
- Dutch Railway Datasets
- Sharing Macros
- Analyzing a Git Repository
- Importing Duckbox Tables
- Copying an In-Memory Database to a File
- Troubleshooting
- Glossary of Terms
- Browsing Offline
- Operations Manual
- Overview
- DuckDB's Footprint
- Logging
- Securing DuckDB
- Non-Deterministic Behavior
- Limits
- DuckDB Docker Container
- Development
- DuckDB Repositories
- Profiling
- Building DuckDB
- Overview
- Build Configuration
- Building Extensions
- Android
- Linux
- macOS
- Raspberry Pi
- Windows
- Python
- R
- Troubleshooting
- Unofficial and Unsupported Platforms
- Benchmark Suite
- Testing
- Internals
- Sitemap
- Live Demo
Text Functions and Operators
This section describes functions and operators for examining and manipulating STRING
values.
Function | Description |
---|---|
string[index] |
Extracts a single character using a (1-based) index . |
string[begin:end] |
Extracts a string using slice conventions similar to Python. Missing begin or end arguments are interpreted as the beginning or end of the list respectively. Negative values are accepted. |
string LIKE target |
Returns true if the string matches the like specifier (see Pattern Matching). |
string SIMILAR TO regex |
Returns true if the string matches the regex (see Pattern Matching). |
string ^@ search_string |
Alias for starts_with . |
arg1 || arg2 |
Concatenates two strings, lists, or blobs. Any NULL input results in NULL . See also concat(arg1, arg2, ...) and list_concat(list1, list2, ...) . |
array_extract(string, index) |
Extracts a single character from a string using a (1-based) index . |
array_slice(list, begin, end) |
Extracts a sublist or substring using slice conventions. Negative values are accepted. |
ascii(string) |
Returns an integer that represents the Unicode code point of the first character of the string . |
bar(x, min, max[, width]) |
Draws a band whose width is proportional to (x - min ) and equal to width characters when x = max . width defaults to 80. |
base64(blob) |
Alias for to_base64 . |
bin(string) |
Converts the string to binary representation. |
bit_length(string) |
Number of bits in a string . |
char_length(string) |
Alias for length . |
character_length(string) |
Alias for length . |
chr(code_point) |
Returns a character which is corresponding the ASCII code value or Unicode code point. |
concat(value, ...) |
Concatenates multiple strings or lists. NULL inputs are skipped. See also operator || . |
concat_ws(separator, string, ...) |
Concatenates many strings, separated by separator . NULL inputs are skipped. |
contains(string, search_string) |
Returns true if search_string is found within string . |
ends_with(string, search_string) |
Alias for suffix . |
format(format, ...) |
Formats a string using the fmt syntax. |
formatReadableDecimalSize(integer) |
Converts integer to a human-readable representation using units based on powers of 10 (KB, MB, GB, etc.). |
formatReadableSize(integer) |
Alias for format_bytes . |
format_bytes(integer) |
Converts integer to a human-readable representation using units based on powers of 2 (KiB, MiB, GiB, etc.). |
from_base64(string) |
Converts a base64 encoded string to a character string (BLOB ). |
from_binary(value) |
Alias for unbin . |
from_hex(value) |
Alias for unhex . |
greatest(arg1, ...) |
Returns the largest value. For strings lexicographical ordering is used. Note that lowercase characters are considered “larger” than uppercase characters and collations are not supported. |
hash(value, ...) |
Returns a UBIGINT with the hash of the value . Note that this is not a cryptographic hash. |
hex(string) |
Converts the string to hexadecimal representation. |
ilike_escape(string, like_specifier, escape_character) |
Returns true if the string matches the like_specifier (see Pattern Matching) using case-insensitive matching. escape_character is used to search for wildcard characters in the string . |
instr(string, search_string) |
Returns location of first occurrence of search_string in string , counting from 1. Returns 0 if no match found. |
lcase(string) |
Alias for lower . |
least(arg1, ...) |
Returns the smallest value. For strings lexicographical ordering is used. Note that uppercase characters are considered “smaller” than lowercase characters, and collations are not supported. |
left(string, count) |
Extracts the left-most count characters. |
left_grapheme(string, count) |
Extracts the left-most count grapheme clusters. |
len(string) |
Alias for length . |
length(string) |
Number of characters in string . |
length_grapheme(string) |
Number of grapheme clusters in string . |
like_escape(string, like_specifier, escape_character) |
Returns true if the string matches the like_specifier (see Pattern Matching) using case-sensitive matching. escape_character is used to search for wildcard characters in the string . |
lower(string) |
Converts string to lower case. |
lpad(string, count, character) |
Pads the string with the character on the left until it has count characters. Truncates the string on the right if it has more than count characters. |
ltrim(string[, characters]) |
Removes any occurrences of any of the characters from the left side of the string . characters defaults to space . |
md5(string) |
Returns the MD5 hash of the string as a VARCHAR . |
md5_number(string) |
Returns the MD5 hash of the string as a HUGEINT . |
md5_number_lower(string) |
Returns the lower 64-bit segment of the MD5 hash of the string as a UBIGINT . |
md5_number_upper(string) |
Returns the upper 64-bit segment of the MD5 hash of the string as a UBIGINT . |
nfc_normalize(string) |
Converts string to Unicode NFC normalized string. Useful for comparisons and ordering if text data is mixed between NFC normalized and not. |
not_ilike_escape(string, like_specifier, escape_character) |
Returns false if the string matches the like_specifier (see Pattern Matching) using case-insensitive matching. escape_character is used to search for wildcard characters in the string . |
not_like_escape(string, like_specifier, escape_character) |
Returns false if the string matches the like_specifier (see Pattern Matching) using case-sensitive matching. escape_character is used to search for wildcard characters in the string . |
ord(string) |
Alias for unicode . |
parse_dirname(path[, separator]) |
Returns the top-level directory name from the given path . separator options: system , both_slash (default), forward_slash , backslash . |
parse_dirpath(path[, separator]) |
Returns the head of the path (the pathname until the last slash) similarly to Python's os.path.dirname . separator options: system , both_slash (default), forward_slash , backslash . |
parse_filename(string[, trim_extension][, separator]) |
Returns the last component of the path similarly to Python's os.path.basename function. If trim_extension is true , the file extension will be removed (defaults to false ). separator options: system , both_slash (default), forward_slash , backslash . |
parse_path(path[, separator]) |
Returns a list of the components (directories and filename) in the path similarly to Python's pathlib.parts function. separator options: system , both_slash (default), forward_slash , backslash . |
position(search_string IN string) |
Return location of first occurrence of search_string in string , counting from 1. Returns 0 if no match found. |
position(string, search_string) |
Alias for instr . |
prefix(string, search_string) |
Returns true if string starts with search_string . |
printf(format, ...) |
Formats a string using printf syntax. |
read_text(source) |
Returns the content from source (a filename, a list of filenames, or a glob pattern) as a VARCHAR . The file content is first validated to be valid UTF-8. If read_text attempts to read a file with invalid UTF-8 an error is thrown suggesting to use read_blob instead. See the read_text guide for more details. |
regexp_escape(string) |
Escapes special patterns to turn string into a regular expression similarly to Python's re.escape function. |
regexp_extract(string, regex[, group][, options]) |
If string contains the regex pattern, returns the capturing group specified by optional parameter group ; otherwise, returns the empty string. The group must be a constant value. If no group is given, it defaults to 0. A set of optional regex options can be set. |
regexp_extract(string, regex, name_list[, options]) |
If string contains the regex pattern, returns the capturing groups as a struct with corresponding names from name_list ; otherwise, returns a struct with the same keys and empty strings as values. A set of optional regex options can be set. |
regexp_extract_all(string, regex[, group][, options]) |
Finds non-overlapping occurrences of the regex in the string and returns the corresponding values of the capturing group . A set of optional regex options can be set. |
regexp_full_match(string, regex[, col2]) |
Returns true if the entire string matches the regex . A set of optional regex options can be set. |
regexp_matches(string, regex[, options]) |
Returns true if string contains the regex , false otherwise. A set of optional regex options can be set. |
regexp_replace(string, regex, replacement[, options]) |
If string contains the regex , replaces the matching part with replacement . A set of optional regex options can be set. |
regexp_split_to_array(string, regex[, options]) |
Alias for string_split_regex . |
regexp_split_to_table(string, regex) |
Splits the string along the regex and returns a row for each part. |
repeat(string, count) |
Repeats the string count number of times. |
replace(string, source, target) |
Replaces any occurrences of the source with target in string . |
reverse(string) |
Reverses the string . |
right(string, count) |
Extract the right-most count characters. |
right_grapheme(string, count) |
Extracts the right-most count grapheme clusters. |
rpad(string, count, character) |
Pads the string with the character on the right until it has count characters. Truncates the string on the right if it has more than count characters. |
rtrim(string[, characters]) |
Removes any occurrences of any of the characters from the right side of the string . characters defaults to space . |
sha1(value) |
Returns a VARCHAR with the SHA-1 hash of the value . |
sha256(value) |
Returns a VARCHAR with the SHA-256 hash of the value |
split(string, separator) |
Alias for string_split . |
split_part(string, separator, index) |
Splits the string along the separator and returns the data at the (1-based) index of the list. If the index is outside the bounds of the list, return an empty string (to match PostgreSQL's behavior). |
starts_with(string, search_string) |
Returns true if string begins with search_string . |
str_split(string, separator) |
Alias for string_split . |
str_split_regex(string, regex[, options]) |
Alias for string_split_regex . |
string_split(string, separator) |
Splits the string along the separator . |
string_split_regex(string, regex[, options]) |
Splits the string along the regex . A set of optional regex options can be set. |
string_to_array(string, separator) |
Alias for string_split . |
strip_accents(string) |
Strips accents from string . |
strlen(string) |
Number of bytes in string . |
strpos(string, search_string) |
Alias for instr . |
substr(string, start[, length]) |
Alias for substring . |
substring(string, start[, length]) |
Extracts substring starting from character start up to the end of the string. If optional argument length is set, extracts a substring of length characters instead. Note that a start value of 1 refers to the first character of the string . |
substring_grapheme(string, start[, length]) |
Extracts substring starting from grapheme clusters start up to the end of the string. If optional argument length is set, extracts a substring of length grapheme clusters instead. Note that a start value of 1 refers to the first character of the string . |
suffix(string, search_string) |
Returns true if string ends with search_string . |
to_base(number, radix[, min_length]) |
Converts number to a string in the given base radix , optionally padding with leading zeros to min_length . |
to_base64(blob) |
Converts a blob to a base64 encoded string. |
to_binary(string) |
Alias for bin . |
to_hex(string) |
Alias for hex . |
translate(string, from, to) |
Replaces each character in string that matches a character in the from set with the corresponding character in the to set. If from is longer than to , occurrences of the extra characters in from are deleted. |
trim(string[, characters]) |
Removes any occurrences of any of the characters from either side of the string . characters defaults to space . |
ucase(string) |
Alias for upper . |
unbin(value) |
Converts a value from binary representation to a blob. |
unhex(value) |
Converts a value from hexadecimal representation to a blob. |
unicode(string) |
Returns an INTEGER representing the unicode codepoint of the first character in the string . |
upper(string) |
Converts string to upper case. |
url_decode(string) |
Decodes a URL from a representation using Percent-Encoding. |
url_encode(string) |
Encodes a URL to a representation using Percent-Encoding. |
string[index]
Description | Extracts a single character using a (1-based) index . |
Example | 'DuckDB'[4] |
Result | k |
Alias | array_extract |
string[begin:end]
Description | Extracts a string using slice conventions similar to Python. Missing begin or end arguments are interpreted as the beginning or end of the list respectively. Negative values are accepted. |
Example | 'DuckDB'[:4] |
Result | Duck |
Alias | array_slice |
string LIKE target
Description | Returns true if the string matches the like specifier (see Pattern Matching). |
Example | 'hello' LIKE '%lo' |
Result | true |
string SIMILAR TO regex
Description | Returns true if the string matches the regex (see Pattern Matching). |
Example | 'hello' SIMILAR TO 'l+' |
Result | false |
Alias | regexp_full_match |
arg1 || arg2
Description | Concatenates two strings, lists, or blobs. Any NULL input results in NULL . See also concat(arg1, arg2, ...) and list_concat(list1, list2, ...) . |
Example 1 | 'Duck' || 'DB' |
Result | DuckDB |
Example 2 | [1, 2, 3] || [4, 5, 6] |
Result | [1, 2, 3, 4, 5, 6] |
Example 3 | '\xAA'::BLOB || '\xBB'::BLOB |
Result | \xAA\xBB |
array_extract(string, index)
Description | Extracts a single character from a string using a (1-based) index . |
Example | array_extract('DuckDB', 2) |
Result | u |
array_slice(list, begin, end)
Description | Extracts a sublist or substring using slice conventions. Negative values are accepted. |
Example 1 | array_slice('DuckDB', 3, 4) |
Result | ck |
Example 2 | array_slice('DuckDB', 3, NULL) |
Result | NULL |
Example 3 | array_slice('DuckDB', 0, -3) |
Result | Duck |
Alias | list_slice |
ascii(string)
Description | Returns an integer that represents the Unicode code point of the first character of the string . |
Example | ascii('Ω') |
Result | 937 |
bar(x, min, max[, width])
Description | Draws a band whose width is proportional to (x - min ) and equal to width characters when x = max . width defaults to 80. |
Example | bar(5, 0, 20, 10) |
Result | ██▌ |
bin(string)
Description | Converts the string to binary representation. |
Example | bin('Aa') |
Result | 0100000101100001 |
Alias | to_binary |
bit_length(string)
Description | Number of bits in a string . |
Example | bit_length('abc') |
Result | 24 |
chr(code_point)
Description | Returns a character which is corresponding the ASCII code value or Unicode code point. |
Example | chr(65) |
Result | A |
concat(value, ...)
Description | Concatenates multiple strings or lists. NULL inputs are skipped. See also operator || . |
Example 1 | concat('Hello', ' ', 'World') |
Result | Hello World |
Example 2 | concat([1, 2, 3], NULL, [4, 5, 6]) |
Result | [1, 2, 3, 4, 5, 6] |
concat_ws(separator, string, ...)
Description | Concatenates many strings, separated by separator . NULL inputs are skipped. |
Example | concat_ws(', ', 'Banana', 'Apple', 'Melon') |
Result | Banana, Apple, Melon |
contains(string, search_string)
Description | Returns true if search_string is found within string . |
Example | contains('abc', 'a') |
Result | true |
format(format, ...)
Description | Formats a string using the fmt syntax. |
Example | format('Benchmark "{}" took {} seconds', 'CSV', 42) |
Result | Benchmark "CSV" took 42 seconds |
formatReadableDecimalSize(integer)
Description | Converts integer to a human-readable representation using units based on powers of 10 (KB, MB, GB, etc.). |
Example | formatReadableDecimalSize(16_000) |
Result | 16.0 kB |
format_bytes(integer)
Description | Converts integer to a human-readable representation using units based on powers of 2 (KiB, MiB, GiB, etc.). |
Example | format_bytes(16_000) |
Result | 15.6 KiB |
Alias | formatReadableSize |
from_base64(string)
Description | Converts a base64 encoded string to a character string (BLOB ). |
Example | from_base64('QQ==') |
Result | A |
greatest(arg1, ...)
Description | Returns the largest value. For strings lexicographical ordering is used. Note that lowercase characters are considered “larger” than uppercase characters and collations are not supported. |
Example 1 | greatest(42, 84) |
Result | 84 |
Example 2 | greatest('abc', 'bcd', 'cde', 'EFG') |
Result | cde |
hash(value, ...)
Description | Returns a UBIGINT with the hash of the value . Note that this is not a cryptographic hash. |
Example | hash('🦆') |
Result | 4164431626903154684 |
hex(string)
Description | Converts the string to hexadecimal representation. |
Example | hex('Hello') |
Result | 48656C6C6F |
Alias | to_hex |
ilike_escape(string, like_specifier, escape_character)
Description | Returns true if the string matches the like_specifier (see Pattern Matching) using case-insensitive matching. escape_character is used to search for wildcard characters in the string . |
Example | ilike_escape('A%c', 'a$%C', '$') |
Result | true |
instr(string, search_string)
Description | Returns location of first occurrence of search_string in string , counting from 1. Returns 0 if no match found. |
Example | instr('test test', 'es') |
Result | 2 |
Aliases | position , strpos |
least(arg1, ...)
Description | Returns the smallest value. For strings lexicographical ordering is used. Note that uppercase characters are considered “smaller” than lowercase characters, and collations are not supported. |
Example 1 | least(42, 84) |
Result | 42 |
Example 2 | least('abc', 'bcd', 'cde', 'EFG') |
Result | EFG |
left(string, count)
Description | Extracts the left-most count characters. |
Example | left('Hello🦆', 2) |
Result | He |
left_grapheme(string, count)
Description | Extracts the left-most count grapheme clusters. |
Example | left_grapheme('🤦🏼♂️🤦🏽♀️', 1) |
Result | 🤦🏼♂️ |
length(string)
Description | Number of characters in string . |
Example | length('Hello🦆') |
Result | 6 |
Aliases | char_length , character_length , len |
length_grapheme(string)
Description | Number of grapheme clusters in string . |
Example | length_grapheme('🤦🏼♂️🤦🏽♀️') |
Result | 2 |
like_escape(string, like_specifier, escape_character)
Description | Returns true if the string matches the like_specifier (see Pattern Matching) using case-sensitive matching. escape_character is used to search for wildcard characters in the string . |
Example | like_escape('a%c', 'a$%c', '$') |
Result | true |
lower(string)
Description | Converts string to lower case. |
Example | lower('Hello') |
Result | hello |
Alias | lcase |
lpad(string, count, character)
Description | Pads the string with the character on the left until it has count characters. Truncates the string on the right if it has more than count characters. |
Example | lpad('hello', 8, '>') |
Result | >>>hello |
ltrim(string[, characters])
Description | Removes any occurrences of any of the characters from the left side of the string . characters defaults to space . |
Example 1 | ltrim(' test ') |
Result | test |
Example 2 | ltrim('>>>>test<<', '><') |
Result | test<< |
md5(string)
Description | Returns the MD5 hash of the string as a VARCHAR . |
Example | md5('abc') |
Result | 900150983cd24fb0d6963f7d28e17f72 |
md5_number(string)
Description | Returns the MD5 hash of the string as a HUGEINT . |
Example | md5_number('abc') |
Result | 152195979970564155685860391459828531600 |
md5_number_lower(string)
Description | Returns the lower 64-bit segment of the MD5 hash of the string as a UBIGINT . |
Example | md5_number_lower('abc') |
Result | 8250560606382298838 |
md5_number_upper(string)
Description | Returns the upper 64-bit segment of the MD5 hash of the string as a UBIGINT . |
Example | md5_number_upper('abc') |
Result | 12704604231530709392 |
nfc_normalize(string)
Description | Converts string to Unicode NFC normalized string. Useful for comparisons and ordering if text data is mixed between NFC normalized and not. |
Example | nfc_normalize('ardèch') |
Result | ardèch |
not_ilike_escape(string, like_specifier, escape_character)
Description | Returns false if the string matches the like_specifier (see Pattern Matching) using case-insensitive matching. escape_character is used to search for wildcard characters in the string . |
Example | not_ilike_escape('A%c', 'a$%C', '$') |
Result | false |
not_like_escape(string, like_specifier, escape_character)
Description | Returns false if the string matches the like_specifier (see Pattern Matching) using case-sensitive matching. escape_character is used to search for wildcard characters in the string . |
Example | not_like_escape('a%c', 'a$%c', '$') |
Result | false |
parse_dirname(path[, separator])
Description | Returns the top-level directory name from the given path . separator options: system , both_slash (default), forward_slash , backslash . |
Example | parse_dirname('path/to/file.csv', 'system') |
Result | path |
parse_dirpath(path[, separator])
Description | Returns the head of the path (the pathname until the last slash) similarly to Python's os.path.dirname . separator options: system , both_slash (default), forward_slash , backslash . |
Example | parse_dirpath('path/to/file.csv', 'forward_slash') |
Result | path/to |
parse_filename(string[, trim_extension][, separator])
Description | Returns the last component of the path similarly to Python's os.path.basename function. If trim_extension is true , the file extension will be removed (defaults to false ). separator options: system , both_slash (default), forward_slash , backslash . |
Example | parse_filename('path/to/file.csv', true, 'forward_slash') |
Result | file |
parse_path(path[, separator])
Description | Returns a list of the components (directories and filename) in the path similarly to Python's pathlib.parts function. separator options: system , both_slash (default), forward_slash , backslash . |
Example | parse_path('path/to/file.csv', 'system') |
Result | [path, to, file.csv] |
position(search_string IN string)
Description | Return location of first occurrence of search_string in string , counting from 1. Returns 0 if no match found. |
Example | position('b' IN 'abc') |
Result | 2 |
Aliases | instr , strpos |
prefix(string, search_string)
Description | Returns true if string starts with search_string . |
Example | prefix('abc', 'ab') |
Result | true |
printf(format, ...)
Description | Formats a string using printf syntax. |
Example | printf('Benchmark "%s" took %d seconds', 'CSV', 42) |
Result | Benchmark "CSV" took 42 seconds |
read_text(source)
Description | Returns the content from source (a filename, a list of filenames, or a glob pattern) as a VARCHAR . The file content is first validated to be valid UTF-8. If read_text attempts to read a file with invalid UTF-8 an error is thrown suggesting to use read_blob instead. See the read_text guide for more details. |
Example | read_text('hello.txt') |
Result | hello\n |
regexp_escape(string)
Description | Escapes special patterns to turn string into a regular expression similarly to Python's re.escape function. |
Example | regexp_escape('https://duckdb.org') |
Result | https\:\/\/duckdb\.org |
regexp_extract(string, regex[, group][, options])
Description | If string contains the regex pattern, returns the capturing group specified by optional parameter group ; otherwise, returns the empty string. The group must be a constant value. If no group is given, it defaults to 0. A set of optional regex options can be set. |
Example | regexp_extract('ABC', '([a-z])(b)', 1, 'i') |
Result | A |
regexp_extract(string, regex, name_list[, options])
Description | If string contains the regex pattern, returns the capturing groups as a struct with corresponding names from name_list ; otherwise, returns a struct with the same keys and empty strings as values. A set of optional regex options can be set. |
Example | regexp_extract('John Doe', '([a-z]+) ([a-z]+)', ['first_name', 'last_name'], 'i') |
Result | {'first_name': John, 'last_name': Doe} |
regexp_extract_all(string, regex[, group][, options])
Description | Finds non-overlapping occurrences of the regex in the string and returns the corresponding values of the capturing group . A set of optional regex options can be set. |
Example | regexp_extract_all('Peter: 33, Paul:14', '(\w+):\s*(\d+)', 2) |
Result | [33, 14] |
regexp_full_match(string, regex[, col2])
Description | Returns true if the entire string matches the regex . A set of optional regex options can be set. |
Example | regexp_full_match('anabanana', '(an)*') |
Result | false |
regexp_matches(string, regex[, options])
Description | Returns true if string contains the regex , false otherwise. A set of optional regex options can be set. |
Example | regexp_matches('anabanana', '(an)*') |
Result | true |
regexp_replace(string, regex, replacement[, options])
Description | If string contains the regex , replaces the matching part with replacement . A set of optional regex options can be set. |
Example | regexp_replace('hello', '[lo]', '-') |
Result | he-lo |
regexp_split_to_table(string, regex)
Description | Splits the string along the regex and returns a row for each part. |
Example | regexp_split_to_table('hello world; 42', ';? ') |
Result | Multiple rows: 'hello' , 'world' , '42' |
repeat(string, count)
Description | Repeats the string count number of times. |
Example | repeat('A', 5) |
Result | AAAAA |
replace(string, source, target)
Description | Replaces any occurrences of the source with target in string . |
Example | replace('hello', 'l', '-') |
Result | he--o |
reverse(string)
Description | Reverses the string . |
Example | reverse('hello') |
Result | olleh |
right(string, count)
Description | Extract the right-most count characters. |
Example | right('Hello🦆', 3) |
Result | lo🦆 |
right_grapheme(string, count)
Description | Extracts the right-most count grapheme clusters. |
Example | right_grapheme('🤦🏼♂️🤦🏽♀️', 1) |
Result | 🤦🏽♀️ |
rpad(string, count, character)
Description | Pads the string with the character on the right until it has count characters. Truncates the string on the right if it has more than count characters. |
Example | rpad('hello', 10, '<') |
Result | hello<<<<< |
rtrim(string[, characters])
Description | Removes any occurrences of any of the characters from the right side of the string . characters defaults to space . |
Example 1 | rtrim('test ') |
Result | test |
Example 2 | rtrim('>>>>test<<', '><') |
Result | >>>>test |
sha1(value)
Description | Returns a VARCHAR with the SHA-1 hash of the value . |
Example | sha1('🦆') |
Result | 949bf843dc338be348fb9525d1eb535d31241d76 |
sha256(value)
Description | Returns a VARCHAR with the SHA-256 hash of the value |
Example | sha256('🦆') |
Result | d7a5c5e0d1d94c32218539e7e47d4ba9c3c7b77d61332fb60d633dde89e473fb |
split_part(string, separator, index)
Description | Splits the string along the separator and returns the data at the (1-based) index of the list. If the index is outside the bounds of the list, return an empty string (to match PostgreSQL's behavior). |
Example | split_part('a;b;c', ';', 2) |
Result | b |
starts_with(string, search_string)
Description | Returns true if string begins with search_string . |
Example | starts_with('abc', 'a') |
Result | true |
Alias | ^@ |
string_split(string, separator)
Description | Splits the string along the separator . |
Example | string_split('hello-world', '-') |
Result | [hello, world] |
Aliases | split , str_split , string_to_array |
string_split_regex(string, regex[, options])
Description | Splits the string along the regex . A set of optional regex options can be set. |
Example | string_split_regex('hello world; 42', ';? ') |
Result | [hello, world, 42] |
Aliases | regexp_split_to_array , str_split_regex |
strip_accents(string)
Description | Strips accents from string . |
Example | strip_accents('mühleisen') |
Result | muhleisen |
strlen(string)
Description | Number of bytes in string . |
Example | strlen('🦆') |
Result | 4 |
substring(string, start[, length])
Description | Extracts substring starting from character start up to the end of the string. If optional argument length is set, extracts a substring of length characters instead. Note that a start value of 1 refers to the first character of the string . |
Example 1 | substring('Hello', 2) |
Result | ello |
Example 2 | substring('Hello', 2, 2) |
Result | el |
Alias | substr |
substring_grapheme(string, start[, length])
Description | Extracts substring starting from grapheme clusters start up to the end of the string. If optional argument length is set, extracts a substring of length grapheme clusters instead. Note that a start value of 1 refers to the first character of the string . |
Example 1 | substring_grapheme('🦆🤦🏼♂️🤦🏽♀️🦆', 3) |
Result | 🤦🏽♀️🦆 |
Example 2 | substring_grapheme('🦆🤦🏼♂️🤦🏽♀️🦆', 3, 2) |
Result | 🤦🏽♀️🦆 |
suffix(string, search_string)
Description | Returns true if string ends with search_string . |
Example | suffix('abc', 'bc') |
Result | true |
Alias | ends_with |
to_base(number, radix[, min_length])
Description | Converts number to a string in the given base radix , optionally padding with leading zeros to min_length . |
Example | to_base(42, 16, 5) |
Result | 0002A |
to_base64(blob)
Description | Converts a blob to a base64 encoded string. |
Example | to_base64('A'::BLOB) |
Result | QQ== |
Alias | base64 |
translate(string, from, to)
Description | Replaces each character in string that matches a character in the from set with the corresponding character in the to set. If from is longer than to , occurrences of the extra characters in from are deleted. |
Example | translate('12345', '143', 'ax') |
Result | a2x5 |
trim(string[, characters])
Description | Removes any occurrences of any of the characters from either side of the string . characters defaults to space . |
Example 1 | trim(' test ') |
Result | test |
Example 2 | trim('>>>>test<<', '><') |
Result | test |
unbin(value)
Description | Converts a value from binary representation to a blob. |
Example | unbin('0110') |
Result | \x06 |
Alias | from_binary |
unhex(value)
Description | Converts a value from hexadecimal representation to a blob. |
Example | unhex('2A') |
Result | * |
Alias | from_hex |
unicode(string)
Description | Returns an INTEGER representing the unicode codepoint of the first character in the string . |
Example | [unicode('âbcd'), unicode('â'), unicode(''), unicode(NULL)] |
Result | [226, 226, -1, NULL] |
Alias | ord |
upper(string)
Description | Converts string to upper case. |
Example | upper('Hello') |
Result | HELLO |
Alias | ucase |
url_decode(string)
Description | Decodes a URL from a representation using Percent-Encoding. |
Example | url_decode('https%3A%2F%2Fduckdb.org%2Fwhy_duckdb%23portable') |
Result | https://duckdb.org/why_duckdb#portable |
url_encode(string)
Description | Encodes a URL to a representation using Percent-Encoding. |
Example | url_encode('this string has/ special+ characters>') |
Result | this%20string%20has%2F%20special%2B%20characters%3E |
Text Similarity Functions
These functions are used to measure the similarity of two strings using various similarity measures.
Function | Description |
---|---|
damerau_levenshtein(s1, s2) |
Extension of Levenshtein distance to also include transposition of adjacent characters as an allowed edit operation. In other words, the minimum number of edit operations (insertions, deletions, substitutions or transpositions) required to change one string to another. Characters of different cases (e.g., a and A ) are considered different. |
editdist3(s1, s2) |
Alias for levenshtein . |
hamming(s1, s2) |
The Hamming distance between to strings, i.e., the number of positions with different characters for two strings of equal length. Strings must be of equal length. Characters of different cases (e.g., a and A ) are considered different. |
jaccard(s1, s2) |
The Jaccard similarity between two strings. Characters of different cases (e.g., a and A ) are considered different. Returns a number between 0 and 1. |
jaro_similarity(s1, s2[, score_cutoff]) |
The Jaro similarity between two strings. Characters of different cases (e.g., a and A ) are considered different. Returns a number between 0 and 1. For similarity < score_cutoff , 0 is returned instead. score_cutoff defaults to 0. |
jaro_winkler_similarity(s1, s2[, score_cutoff]) |
The Jaro-Winkler similarity between two strings. Characters of different cases (e.g., a and A ) are considered different. Returns a number between 0 and 1. For similarity < score_cutoff , 0 is returned instead. score_cutoff defaults to 0. |
levenshtein(s1, s2) |
The minimum number of single-character edits (insertions, deletions or substitutions) required to change one string to the other. Characters of different cases (e.g., a and A ) are considered different. |
mismatches(s1, s2) |
Alias for hamming . |
damerau_levenshtein(s1, s2)
Description | Extension of Levenshtein distance to also include transposition of adjacent characters as an allowed edit operation. In other words, the minimum number of edit operations (insertions, deletions, substitutions or transpositions) required to change one string to another. Characters of different cases (e.g., a and A ) are considered different. |
Example | damerau_levenshtein('duckdb', 'udckbd') |
Result | 2 |
hamming(s1, s2)
Description | The Hamming distance between to strings, i.e., the number of positions with different characters for two strings of equal length. Strings must be of equal length. Characters of different cases (e.g., a and A ) are considered different. |
Example | hamming('duck', 'luck') |
Result | 1 |
Alias | mismatches |
jaccard(s1, s2)
Description | The Jaccard similarity between two strings. Characters of different cases (e.g., a and A ) are considered different. Returns a number between 0 and 1. |
Example | jaccard('duck', 'luck') |
Result | 0.6 |
jaro_similarity(s1, s2[, score_cutoff])
Description | The Jaro similarity between two strings. Characters of different cases (e.g., a and A ) are considered different. Returns a number between 0 and 1. For similarity < score_cutoff , 0 is returned instead. score_cutoff defaults to 0. |
Example | jaro_similarity('duck', 'duckdb') |
Result | 0.8888888888888888 |
jaro_winkler_similarity(s1, s2[, score_cutoff])
Description | The Jaro-Winkler similarity between two strings. Characters of different cases (e.g., a and A ) are considered different. Returns a number between 0 and 1. For similarity < score_cutoff , 0 is returned instead. score_cutoff defaults to 0. |
Example | jaro_winkler_similarity('duck', 'duckdb') |
Result | 0.9333333333333333 |
levenshtein(s1, s2)
Description | The minimum number of single-character edits (insertions, deletions or substitutions) required to change one string to the other. Characters of different cases (e.g., a and A ) are considered different. |
Example | levenshtein('duck', 'db') |
Result | 3 |
Alias | editdist3 |
Formatters
fmt
Syntax
The format(format, parameters...)
function formats strings, loosely following the syntax of the {fmt} open-source formatting library.
Format without additional parameters:
SELECT format('Hello world'); -- Hello world
Format a string using {}:
SELECT format('The answer is {}', 42); -- The answer is 42
Format a string using positional arguments:
SELECT format('I''d rather be {1} than {0}.', 'right', 'happy'); -- I'd rather be happy than right.
Format Specifiers
Specifier | Description | Example |
---|---|---|
{:d} |
integer | 654321 |
{:E} |
scientific notation | 3.141593E+00 |
{:f} |
float | 4.560000 |
{:o} |
octal | 2375761 |
{:s} |
string | asd |
{:x} |
hexadecimal | 9fbf1 |
{:tX} |
integer, X is the thousand separator |
654 321 |
Formatting Types
Integers:
SELECT format('{} + {} = {}', 3, 5, 3 + 5); -- 3 + 5 = 8
Booleans:
SELECT format('{} != {}', true, false); -- true != false
Format datetime values:
SELECT format('{}', DATE '1992-01-01'); -- 1992-01-01
SELECT format('{}', TIME '12:01:00'); -- 12:01:00
SELECT format('{}', TIMESTAMP '1992-01-01 12:01:00'); -- 1992-01-01 12:01:00
Format BLOB:
SELECT format('{}', BLOB '\x00hello'); -- \x00hello
Pad integers with 0s:
SELECT format('{:04d}', 33); -- 0033
Create timestamps from integers:
SELECT format('{:02d}:{:02d}:{:02d} {}', 12, 3, 16, 'AM'); -- 12:03:16 AM
Convert to hexadecimal:
SELECT format('{:x}', 123_456_789); -- 75bcd15
Convert to binary:
SELECT format('{:b}', 123_456_789); -- 111010110111100110100010101
Print Numbers with Thousand Separators
Integers:
SELECT format('{:,}', 123_456_789); -- 123,456,789
SELECT format('{:t.}', 123_456_789); -- 123.456.789
SELECT format('{:''}', 123_456_789); -- 123'456'789
SELECT format('{:_}', 123_456_789); -- 123_456_789
SELECT format('{:t }', 123_456_789); -- 123 456 789
SELECT format('{:tX}', 123_456_789); -- 123X456X789
Float, double and decimal:
SELECT format('{:,f}', 123456.789); -- 123,456.78900
SELECT format('{:,.2f}', 123456.789); -- 123,456.79
SELECT format('{:t..2f}', 123456.789); -- 123.456,79
printf
Syntax
The printf(format, parameters...)
function formats strings using the printf
syntax.
Format without additional parameters:
SELECT printf('Hello world');
Hello world
Format a string using arguments in a given order:
SELECT printf('The answer to %s is %d', 'life', 42);
The answer to life is 42
Format a string using positional arguments %position$formatter
, e.g., the second parameter as a string is encoded as %2$s
:
SELECT printf('I''d rather be %2$s than %1$s.', 'right', 'happy');
I'd rather be happy than right.
Format Specifiers
Specifier | Description | Example |
---|---|---|
%c |
character code to character | a |
%d |
integer | 654321 |
%Xd |
integer with thousand seperarator X from , , . , '' , _ |
654_321 |
%E |
scientific notation | 3.141593E+00 |
%f |
float | 4.560000 |
%hd |
integer | 654321 |
%hhd |
integer | 654321 |
%lld |
integer | 654321 |
%o |
octal | 2375761 |
%s |
string | asd |
%x |
hexadecimal | 9fbf1 |
Formatting Types
Integers:
SELECT printf('%d + %d = %d', 3, 5, 3 + 5); -- 3 + 5 = 8
Booleans:
SELECT printf('%s != %s', true, false); -- true != false
Format datetime values:
SELECT printf('%s', DATE '1992-01-01'); -- 1992-01-01
SELECT printf('%s', TIME '12:01:00'); -- 12:01:00
SELECT printf('%s', TIMESTAMP '1992-01-01 12:01:00'); -- 1992-01-01 12:01:00
Format BLOB:
SELECT printf('%s', BLOB '\x00hello'); -- \x00hello
Pad integers with 0s:
SELECT printf('%04d', 33); -- 0033
Create timestamps from integers:
SELECT printf('%02d:%02d:%02d %s', 12, 3, 16, 'AM'); -- 12:03:16 AM
Convert to hexadecimal:
SELECT printf('%x', 123_456_789); -- 75bcd15
Convert to binary:
SELECT printf('%b', 123_456_789); -- 111010110111100110100010101
Thousand Separators
Integers:
SELECT printf('%,d', 123_456_789); -- 123,456,789
SELECT printf('%.d', 123_456_789); -- 123.456.789
SELECT printf('%''d', 123_456_789); -- 123'456'789
SELECT printf('%_d', 123_456_789); -- 123_456_789
Float, double and decimal:
SELECT printf('%,f', 123456.789); -- 123,456.789000
SELECT printf('%,.2f', 123456.789); -- 123,456.79