The optional first parameter specifies the flags to add
to the regex (e.g. -i for a case-insensitive match).
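The sketch below gives a rough idea of what the `-i` flag changes. It compiles the same pattern with and without the `i` flag, using only the standard `RegExp` constructor. The pattern is written by hand in the style of regexgen's output for the hypothetical inputs `foobar`, `foobaz`, `foozap` and `fooza`; the tool does not produce it here. The `^`/`$` anchors are added only so that `test` checks whole strings.

```javascript
// Hand-written pattern for the hypothetical inputs
// "foobar", "foobaz", "foozap", "fooza" (not generated by regexgen here).
const source = "foo(?:zap?|ba[rz])";

// Anchors added for this demo so test() must match the whole string.
const plain = new RegExp(`^(?:${source})$`);
// The "i" flag (what -i asks for) makes the match case-insensitive.
const insensitive = new RegExp(`^(?:${source})$`, "i");

console.log(plain.test("foobar"));       // true
console.log(plain.test("FOOBAR"));       // false
console.log(insensitive.test("FOOBAR")); // true
```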
ES2015 and Unicode
By default regexgen will output a standard JavaScript regular expression, with Unicode code points above U+FFFF (outside the Basic Multilingual Plane) converted into UTF-16 surrogate pairs.
If desired, you can request an ES2015-compatible Unicode regular expression by supplying the -u flag, which results in those code points being retained.
Such regular expressions are compatible with current versions of Node, as well as the latest browsers, and may be more transferable to other languages.
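The sketch below uses only built-in JavaScript to show what the surrogate-pair conversion amounts to. The helper `toSurrogatePair` is a hypothetical name. The range U+1F600 to U+1F64F (the Emoticons block) is only an example, chosen because every code point in it shares one high surrogate. Neither is taken from regexgen's implementation.

```javascript
// Split a code point above U+FFFF into its UTF-16 surrogate pair.
function toSurrogatePair(codePoint) {
  const offset = codePoint - 0x10000;
  return [0xd800 + (offset >> 10), 0xdc00 + (offset & 0x3ff)];
}

const hex = (n) => n.toString(16).toUpperCase().padStart(4, "0");

// U+1F600..U+1F64F share one high surrogate, so the range can be written
// as that high surrogate followed by a class of low surrogates.
const [hiStart, loStart] = toSurrogatePair(0x1f600);
const [hiEnd, loEnd] = toSurrogatePair(0x1f64f);
console.log(hex(hiStart), hex(loStart), hex(hiEnd), hex(loEnd)); // D83D DE00 D83D DE4F

// Surrogate-pair form: works without the u flag (assumes hiStart === hiEnd).
const legacy = new RegExp(`^\\u${hex(hiStart)}[\\u${hex(loStart)}-\\u${hex(loEnd)}]$`);
console.log(legacy.source); // ^\uD83D[\uDE00-\uDE4F]$

// Code-point form: requires the u flag.
const unicode = /^[\u{1F600}-\u{1F64F}]$/u;

const grin = "\u{1F600}";
console.log(grin.length);                           // 2 (two UTF-16 code units)
console.log(legacy.test(grin), unicode.test(grin)); // true true
```

A range whose code points span several high surrogates would need one alternative per high surrogate in the legacy form. This is one reason the `-u` output can be shorter.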
How does it work?
Generate a Trie containing all of the input strings.
This is a tree structure where each edge represents a single character. Strings that share a prefix
share the same path from the root, which removes redundancy at the start of the strings; however,
identical branches further down (such as common suffixes) are not merged.
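Here is a minimal sketch of the trie step. It assumes a plain `Map`-based node; `TrieNode`, `buildTrie` and `countNodes` are hypothetical names, not regexgen's internals. For the sample strings, the shared prefix `foo` is stored once. The three leaf nodes that end `foobar`, `foobaz` and `foozap` stay separate, even though nothing follows any of them.

```javascript
// Each node maps a single character to a child node;
// `end` marks that some input string terminates at this node.
class TrieNode {
  constructor() {
    this.children = new Map();
    this.end = false;
  }
}

function buildTrie(strings) {
  const root = new TrieNode();
  for (const s of strings) {
    let node = root;
    for (const ch of s) { // for...of iterates by code point
      if (!node.children.has(ch)) node.children.set(ch, new TrieNode());
      node = node.children.get(ch);
    }
    node.end = true;
  }
  return root;
}

// Count nodes, including the root.
function countNodes(node) {
  let n = 1;
  for (const child of node.children.values()) n += countNodes(child);
  return n;
}

const trie = buildTrie(["foobar", "foobaz", "foozap", "fooza"]);
console.log(trie.children.size); // 1 (every string starts with "f")
console.log(countNodes(trie));   // 11, versus 23 characters of input
```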
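A trie can be seen as a tree-shaped deterministic finite automaton (DFA), so DFA algorithms
can be applied. In this case, we apply Hopcroft's DFA minimization algorithm
to merge indistinguishable states, i.e. states that accept exactly the same set of remaining
suffixes. This is what merges the common branches that the trie leaves separate.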
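The following is a straightforward sketch of Hopcroft's partition refinement, not tuned for asymptotic speed. It assumes the trie for the sample strings has already been numbered by hand into a DFA whose transitions are `edges[state][char]`. A hypothetical sink state is added so that every state has a transition on every character, and its block is dropped from the result. The function name `hopcroft` is illustrative only.

```javascript
// Hand-numbered DFA for the trie of "foobar", "foobaz", "foozap", "fooza".
// edges[state] maps a character to the next state; missing keys mean no edge.
const edges = [
  { f: 1 }, { o: 2 }, { o: 3 }, { b: 4, z: 8 }, { a: 5 }, { r: 6, z: 7 },
  {}, {}, { a: 9 }, { p: 10 }, {},
];
const accepting = new Set([6, 7, 9, 10]);

function hopcroft(edges, accepting) {
  const dead = edges.length; // extra sink state makes the DFA complete
  const n = dead + 1;
  const alphabet = [...new Set(edges.flatMap((e) => Object.keys(e)))];
  const next = (q, c) => (q === dead ? dead : edges[q][c] ?? dead);

  // inverse[c][t] lists the states q with next(q, c) === t.
  const inverse = {};
  for (const c of alphabet) {
    inverse[c] = Array.from({ length: n }, () => []);
    for (let q = 0; q < n; q++) inverse[c][next(q, c)].push(q);
  }

  // Start from the accepting / non-accepting split.
  const all = [...Array(n).keys()];
  const blocks = [
    new Set(all.filter((q) => accepting.has(q))),
    new Set(all.filter((q) => !accepting.has(q))),
  ].filter((b) => b.size > 0);
  const work = new Set(blocks.keys()); // indices of splitter blocks

  while (work.size > 0) {
    const a = work.values().next().value;
    work.delete(a);
    const splitter = [...blocks[a]]; // snapshot; blocks[a] may be split below
    for (const c of alphabet) {
      // X: states that move into the splitter block on character c.
      const x = new Set(splitter.flatMap((t) => inverse[c][t]));
      const count = blocks.length; // new halves cannot be split by the same X
      for (let i = 0; i < count; i++) {
        const inside = [...blocks[i]].filter((q) => x.has(q));
        if (inside.length === 0 || inside.length === blocks[i].size) continue;
        const outside = [...blocks[i]].filter((q) => !x.has(q));
        blocks[i] = new Set(inside);
        blocks.push(new Set(outside));
        const j = blocks.length - 1;
        // Keep both halves queued if the old block was; otherwise queue the smaller.
        if (work.has(i)) work.add(j);
        else work.add(inside.length <= outside.length ? i : j);
      }
    }
  }
  return blocks.filter((b) => !b.has(dead)); // drop the sink's block
}

const classes = hopcroft(edges, accepting).map((b) => [...b].sort((x, y) => x - y));
console.log(classes.length);                     // 9
console.log(classes.find((b) => b.length > 1)); // [ 6, 7, 10 ]
```

Of the 11 trie states, the three accepting leaves (6, 7 and 10) collapse into one, which leaves 9 states.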
Convert the resulting minimized DFA to a regular expression. This is done using
Brzozowski's algebraic method,
which is quite elegant. It expresses the DFA as a system of equations, one per state, each describing
the strings accepted from that state; solving the system for the start state yields the regex.
Along the way, some additional optimizations are made, such as hoisting common substrings out of
an alternation and using character class ranges. This produces an Abstract Syntax Tree
(AST) for the regex, which is then converted to a string and compiled to a JavaScript
RegExp object.
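Below is a sketch of the equation-solving idea, applied to the minimized DFA of the sample strings. It assumes that states are numbered so that state 0 is the start. Each state q gets an equation R_q = (sum of c·R_next over its outgoing edges) + (ε if q is accepting). The equations are solved from the last state back to the first, and a self-loop is removed with Arden's lemma (R = A·R + B has the solution R = A*·B).

Regexes are kept as plain strings rather than an AST, and none of regexgen's additional optimizations are applied. `dfaToRegex` and the helper names are hypothetical.

```javascript
// Regex fragments are strings; null stands for the empty language.
const union = (a, b) =>
  a === null ? b : b === null ? a : a === b ? a : `(?:${a}|${b})`;
const concat = (a, b) => (a === null || b === null ? null : a + b);
const star = (a) => (a === null || a === "" ? "" : `(?:${a})*`);
const escape = (c) => c.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");

function dfaToRegex(edges, accepting) {
  // eqs[q]: coef maps next state -> regex on that edge; constant is ε or null.
  const eqs = edges.map((out, q) => {
    const coef = new Map();
    for (const [c, t] of Object.entries(out)) {
      coef.set(t, union(coef.get(t) ?? null, escape(c)));
    }
    return { coef, constant: accepting.has(q) ? "" : null };
  });

  for (let i = eqs.length - 1; i >= 0; i--) {
    const eq = eqs[i];
    // Arden's lemma: R_i = A·R_i + B  =>  R_i = A*·B
    if (eq.coef.has(i)) {
      const loop = star(eq.coef.get(i));
      eq.coef.delete(i);
      for (const [j, r] of eq.coef) eq.coef.set(j, concat(loop, r));
      eq.constant = concat(loop, eq.constant);
    }
    // Substitute the solved R_i into every earlier equation.
    for (let k = 0; k < i; k++) {
      const c = eqs[k].coef.get(i);
      if (c === undefined) continue;
      eqs[k].coef.delete(i);
      for (const [j, r] of eq.coef) {
        eqs[k].coef.set(j, union(eqs[k].coef.get(j) ?? null, concat(c, r)));
      }
      eqs[k].constant = union(eqs[k].constant, concat(c, eq.constant));
    }
  }
  return eqs[0].constant; // the start state's equation is now fully solved
}

// Minimized DFA for "foobar", "foobaz", "foozap", "fooza" (renumbered by hand).
const edges = [
  { f: 1 }, { o: 2 }, { o: 3 }, { b: 4, z: 6 }, { a: 5 },
  { r: 8, z: 8 }, { a: 7 }, { p: 8 }, {},
];
const source = dfaToRegex(edges, new Set([7, 8]));
console.log(source); // foo(?:za(?:|p)|ba(?:r|z))

const re = new RegExp(`^(?:${source})$`);
console.log(["foobar", "foobaz", "foozap", "fooza"].every((s) => re.test(s))); // true
console.log(re.test("fooz")); // false
```

The result is correct but unpolished. The extra passes described above are the kind of step that would tidy a piece such as `(?:r|z)` into the character class `[rz]`.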
License
MIT
About
Generate regular expressions that match a set of strings