RegExp.escape()
On Sun, Jun 13, 2010 at 7:50 AM, Jordan Osete <jordan.osete at yahoo.fr> wrote:
> Hello everybody.
>
> How about standardizing something like RegExp.escape() ?
> https://simonwillison.net/2006/Jan/20/escape/
>
> It is trivial to implement, but it seems to me that this functionality
> belongs to the language - the implementation obviously knows better
> which characters must be escaped, and which ones don't need to.

+1
+1, again.
Although this is only a minor convenience since you can do something like text.replace(/[-[\]{}()*+?.,\\^$|]/g, "\\$&"), the list of special characters is subject to change. E.g., if ES adds /x, whitespace (and possibly #) must be added.
-- Steven Levithan
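The one-liner above can be wrapped in a helper, roughly as sketched here. The name `escapeRegExp` is an assumption made for illustration; the character class is the one quoted in the message, and nothing here is meant as the definitive list of characters.

```javascript
// Hypothetical helper name; the character class is copied from the message above.
function escapeRegExp(text) {
  // "$&" in the replacement string stands for the matched character,
  // so every listed special character gets a backslash in front of it.
  return text.replace(/[-[\]{}()*+?.,\\^$|]/g, "\\$&");
}

const escaped = escapeRegExp("1+1=2 (approx.)");
const pattern = new RegExp("^" + escaped + "$");

console.log(pattern.test("1+1=2 (approx.)")); // true
console.log(pattern.test("11=2 (approx.)"));  // false
```

If the regular expression syntax ever gains new special characters, a helper like this has to be updated by hand, which is the concern raised in the message.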
In perl the recommended version is
text.replace(/([^a-zA-Z0-9])/g, "\\$1")
which is future-proof and safe and I think this also works for JS.
-- Erik Corry
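A small sketch of the "escape everything except ASCII letters and digits" approach, for comparison. The helper name `escapeAllButAlnum` is hypothetical. One caveat that postdates this message: the `u` flag added in ES2015 rejects backslash escapes of characters that are not syntax characters, so the output of this approach is only usable without that flag.

```javascript
// Hypothetical helper name; escapes every character that is not an ASCII letter or digit.
function escapeAllButAlnum(text) {
  return text.replace(/[^a-zA-Z0-9]/g, "\\$&");
}

const source = escapeAllButAlnum("a.b c");
console.log(source); // a\.b\ c

// Without the "u" flag the extra backslashes are harmless identity escapes.
console.log(new RegExp("^" + source + "$").test("a.b c")); // true

// With the "u" flag, escaping a non-syntax character such as a space is rejected.
let rejected = false;
try {
  new RegExp(source, "u");
} catch (err) {
  rejected = err instanceof SyntaxError;
}
console.log(rejected); // true
```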
It's probably future-proof and safe, but it escapes 65,520 characters more than necessary.
Anyway, no big deal if this isn't added. I have, however, seen a lot of developers get this wrong when trying to do it themselves (e.g., the blog post that started this thread was not safe until it was updated 4+ years later, and it wasn't the worst I've seen).
-- Steven Levithan
For the record, most languages with modern regular expressions include a built in method for this.
For instance:
- Perl: quotemeta(str)
- PHP: preg_quote(str)
- Python: re.escape(str)
- Ruby: Regexp.escape(str)
- Java: Pattern.quote(str)
- C#, VB.NET: Regex.Escape(str)
-- Steven Levithan
On 23 March 2012 12:12, Steven Levithan wrote:
> It's probably future-proof and safe, but it escapes 65,520 characters more than necessary.
>
> Anyway, no big deal if this isn't added. I have, however, seen a lot of developers get this wrong when trying to do it themselves (e.g., the blog post that started this thread was not safe until it was updated 4+ years later, and it wasn't the worst I've seen).

I've seen at least three that missed things out as well. The "all but alnums" approach doesn't seem to occur to people.

On 23 March 2012 12:37, Steven Levithan wrote:
> For the record, most languages with modern regular expressions include a built in method for this.

Indeed. +1 for RegExp.escape in ES.
-- T.J.
YES. PLEASE put this in!
https://stackoverflow.com/a/6969486/151312
function escapeRegExp(str) {
  return str.replace(/[\-\[\]\/\{\}\(\)\*\+\?\.\\\^\$\|]/g, "\\$&");
}
I'm doing my best to reply to every single question that pops up on stackoverflow and point them to this answer... but there are just too many wrong answers out there.
AJ ONeal
On Mar 23, 2012, at 8:16 AM, Steven Levithan wrote:
> Although this is only a minor convenience since you can do something like text.replace(/[-[\]{}()*+?.,\\^$|]/g, "\\$&"), the list of special characters is subject to change.
That sounds like another good argument for standardizing.
The only challenge I see is how to fudge the spec enough to mandate that any extended, non-standard operators that the engine provides should also be escaped.
Dave
One more for the "it's too late for ES6" train: most other programming languages have a convenient "safe" way to turn a string into a regular expression matching that string. RegExp.escape is often suggested as the function name. I think this is worth adding to the standard library because it helps patch a "candy machine interface" (https://www.approxion.com/?p=123) -- that is, I've seen the following too often (including again today, hence this message):
function replaceTitle(title, str) {
  return str.replace(new RegExp(title), "...");
}
There ought to be a standard simple way of writing this correctly.
Has this been discussed in the context of ES6/ES7 before?
--scott
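To make the problem with replaceTitle concrete, here is a minimal sketch that puts the version from the message next to one that escapes the title first. The names `escapeRegExp`, `replaceTitleUnsafe` and `replaceTitleSafe` are assumptions for illustration, and the character class is just one plausible choice.

```javascript
// Hypothetical escaping helper; any correct escape function would do here.
function escapeRegExp(text) {
  return text.replace(/[\\^$.*+?()[\]{}|]/g, "\\$&");
}

// The version from the message: `title` is interpreted as a pattern.
function replaceTitleUnsafe(title, str) {
  return str.replace(new RegExp(title), "...");
}

// Same function with the title escaped first, so it is matched literally.
function replaceTitleSafe(title, str) {
  return str.replace(new RegExp(escapeRegExp(title)), "...");
}

const text = "Reading C++ (2nd ed.) today";
console.log(replaceTitleSafe("(2nd ed.)", text));   // Reading C++ ... today
console.log(replaceTitleUnsafe("(2nd ed.)", text)); // Reading C++ (...) today
```

In the unsafe version the parentheses are read as a capturing group, so they stay in the output. A title such as "C++" is worse: the unsafe version throws a SyntaxError because the second `+` has nothing to repeat.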
On 21 Mar 2014, at 16:38, C. Scott Ananian <ecmascript at cscott.net> wrote:
> function replaceTitle(title, str) {
>   return str.replace(new RegExp(title), "...");
> }
>
> There ought to be a standard simple way of writing this correctly.
I’ve used something like this in the past:
RegExp.escape = function(text) {
  return text.replace(/[-[\]{}()*+?.,\\^$|#\s]/g, '\\$&');
};
It escapes some characters that do not strictly need escaping to avoid bugs in ancient JavaScript engines. A standardized version could be even simpler, and would indeed be very welcome IMHO.
Continuing a 2 year old thread.
https://esdiscuss.org/topic/regexp-escape
Thanks for the back-reference, Kris. So, everyone seemed to be in favor of this, it just never got formally added.
@rwaldron, are you interested in championing this for ES7 as well?
--scott
Not until someone writes something like this: https://gist.github.com/WebReflection/9353781, which I don't have the time to do myself.
- Rick
How about this -- https://gist.github.com/kangax/9698100
Made it loosely based on B.2.1.1 (escape)
-- kangax
Reviving this: a year has passed and I think we still want this.
We have even more validation than we had a year ago (added by libraries like lodash) and this is still useful.
What would be the required steps in order to push this forward to the ES2016 spec?
At the risk of bikeshedding, I think I would prefer syntax for it, personally, e.g.:
let myRegExp = /\d+\./{arbitrary.js(expression)}/SOMETHING$/;
(ASI issues notwithstanding) vaguely matching the idea of template strings. I prefer this kind of thing to be structured at the parse-level rather than relying on runtime string stitching and hoping for a valid parse.
Cheers
The primary advantage to making it be a function (also doing it as syntax would be great too!) is that it's polyfillable, which means that all browsers could instantly take advantage of known-safe regex escaping.
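The "polyfillable" point can be sketched as simple feature detection: use a built-in RegExp.escape where the engine provides one, and fall back to a hand-written function otherwise. The names `fallbackEscape` and `escapeForRegExp` are hypothetical, and the fallback's character class is an assumption borrowed from earlier in the thread rather than any agreed-upon list.

```javascript
// Fallback used only when the engine has no RegExp.escape of its own.
// The character class is an assumption borrowed from earlier in the thread.
function fallbackEscape(text) {
  return String(text).replace(/[\\^$.*+?()[\]{}|]/g, "\\$&");
}

// Pick the built-in when present, otherwise the fallback.
const escapeForRegExp =
  typeof RegExp.escape === "function" ? RegExp.escape : fallbackEscape;

const exact = new RegExp("^" + escapeForRegExp("a.b") + "$");
console.log(exact.test("a.b")); // true
console.log(exact.test("axb")); // false
```

The exact escaped string may differ between a built-in and the fallback, so callers should rely on what the resulting pattern matches and not on the escaped text itself.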
I believe you need to champion the issue. Create a GitHub repository and start editing the fragment of the spec. I do not believe that the issue is contentious. The color of the shed is obvious. The only thing missing is a champion willing to do the writing.
You know what? Why not. I'm going to try to champion this.
I talked to Domenic and he said he's willing to help me with this which is a big help (this would be my first time).
I'll open a GitHub repo and see what I can come up with.
I made this gist back in the day (https://gist.github.com/kangax/9698100) and I believe Rick was going to bring it up at one of the meetings. I don't have time to set up a repo and work with a TC39 member, so if you can continue carrying that torch, that would be awesome!
-- kangax
I made an initial repo: https://github.com/benjamingr/RexExp.escape/blob/master/README.md
I've added a reference to that gist - I'll start poking around and have scheduled to meet with some local people interested in helping next week. I'll keep you updated.
@Alexander Jones: no new syntax is needed to implement what you want; we already have Template Strings. For example, you could define a new function RegExp.join (which takes either an array or a string as its first argument, see below):

var pattern = RegExp.join`^(abc${ "someString, escaped" }|def${ /a reg|exp/ })$`
// apply flags?
var pattern = RegExp.join('x')` abc | def \$`;

RegExp.escape() would be used internally to handle the interpolation of a string into the regexp. But these features are orthogonal.
--scott
Ok, with a ton of help from Domenic I've put up https://benjamingr.github.io/RexExp.escape/
Less cool coloring but more links and motivating examples and so on at https://github.com/benjamingr/RexExp.escape
As this is my first attempt at this sort of thing - any non-bikeshed feedback would be appreciated :)
Nice! Inspired

// Based on
// https://github.com/benjamingr/RexExp.escape/blob/master/polyfill.js
function re(template, ...subs) {
  const parts = [];
  const numSubs = subs.length;
  for (let i = 0; i < numSubs; i++) {
    parts.push(template.raw[i]);
    parts.push(subs[i].replace(/[\/\\^$*+?.()|[\]{}]/g, '\\$&'));
  }
  parts.push(template.raw[numSubs]);
  return RegExp(parts.join(''));
}
Cheers,
--MarkM
A slight tweak allows you to pass flags:
function re(flags, ...args) {
  if (typeof template !== 'string') {
    // no flags given
    return re(undefined)(flags, ...args);
  }
  return function(template, ...subs) {
    const parts = [];
    const numSubs = subs.length;
    for (let i = 0; i < numSubs; i++) {
      parts.push(template.raw[i]);
      parts.push(subs[i].replace(/[\/\\^$*+?.()|[\]{}]/g, '\\$&'));
    }
    parts.push(template.raw[numSubs]);
    return RegExp(parts.join(''), flags);
  };
}
Use like this:
var r = re('i')`cAsEiNsEnSiTiVe`;
--scott
Good idea but infinite recursion bug. Fixed:

function re(first, ...args) {
  let flags = first;
  function tag(template, ...subs) {
    const parts = [];
    const numSubs = subs.length;
    for (let i = 0; i < numSubs; i++) {
      parts.push(template.raw[i]);
      parts.push(subs[i].replace(/[\/\\^$*+?.()|[\]{}]/g, '\\$&'));
    }
    parts.push(template.raw[numSubs]);
    return RegExp(parts.join(''), flags);
  }
  if (typeof first === 'string') {
    return tag;
  } else {
    flags = void 0; // Should this be '' ?
    return tag(first, ...args);
  }
}
Perfection?
function re(first, ...args) {
  let flags = first;
  function tag(template, ...subs) {
    const parts = [];
    const numSubs = subs.length;
    for (let i = 0; i < numSubs; i++) {
      parts.push(template.raw[i]);
      const subst = subs[i] instanceof RegExp ? subs[i].source :
          subs[i].replace(/[\/\\^$*+?.()|[\]{}]/g, '\\$&');
      parts.push(subst);
    }
    parts.push(template.raw[numSubs]);
    return RegExp(parts.join(''), flags);
  }
  if (typeof first === 'string') {
    return tag;
  } else {
    flags = void 0; // Should this be '' ?
    return tag(first, ...args);
  }
}
The point of this last variant is that data gets escaped but RegExp objects
do not -- allowing you to compose RegExps: re`${re1}|${re2}*|${data}`
But this requires one more adjustment:

function re(first, ...args) {
  let flags = first;
  function tag(template, ...subs) {
    const parts = [];
    const numSubs = subs.length;
    for (let i = 0; i < numSubs; i++) {
      parts.push(template.raw[i]);
      // Adjustment: wrap an interpolated RegExp's source in a non-capturing
      // group so that a following quantifier applies to the whole thing.
      const subst = subs[i] instanceof RegExp ?
          `(?:${subs[i].source})` :
          subs[i].replace(/[\/\\^$*+?.()|[\]{}]/g, '\\$&');
      parts.push(subst);
    }
    parts.push(template.raw[numSubs]);
    return RegExp(parts.join(''), flags);
  }
  if (typeof first === 'string') {
    return tag;
  } else {
    flags = void 0; // Should this be '' ?
    return tag(first, ...args);
  }
}
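To make the reason for the adjustment concrete, here is a small usage sketch. It repeats the `re` tag so it runs on its own; the sample patterns `re1`, `re2`, and `data`, and the hand-built `ungrouped` pattern used for comparison, are illustrative assumptions and not part of the original message.

```javascript
// The `re` tag from the message above, repeated so this sketch is self-contained.
function re(first, ...args) {
  let flags = first;
  function tag(template, ...subs) {
    const parts = [];
    const numSubs = subs.length;
    for (let i = 0; i < numSubs; i++) {
      parts.push(template.raw[i]);
      const subst = subs[i] instanceof RegExp
        ? `(?:${subs[i].source})`
        : subs[i].replace(/[\/\\^$*+?.()|[\]{}]/g, '\\$&');
      parts.push(subst);
    }
    parts.push(template.raw[numSubs]);
    return RegExp(parts.join(''), flags);
  }
  if (typeof first === 'string') {
    return tag;
  } else {
    flags = void 0;
    return tag(first, ...args);
  }
}

// Illustrative inputs (assumptions, not from the thread).
const re1 = /ab|cd/;
const re2 = /xy/;
const data = 'a.b';

// RegExp objects are spliced in as groups; the string is escaped as data.
const composed = re`^(?:${re1}|${re2}*|${data})$`;
console.log(composed.source);        // ^(?:(?:ab|cd)|(?:xy)*|a\.b)$
console.log(composed.test('xyxy'));  // true: the * applies to all of re2
console.log(composed.test('axb'));   // false: the dot in data is literal

// What splicing the bare sources would give: the * binds only to "y".
const ungrouped = new RegExp(`^(?:${re1.source}|${re2.source}*|a\\.b)$`);
console.log(ungrouped.test('xyxy')); // false

// Passing a flags string first returns a tag that uses those flags.
console.log(re('i')`^${data}$`.test('A.B')); // true
```

Without the `(?:...)` wrapper, the alternation inside `re1` and the quantifier after `re2` would both interact with the surrounding template text rather than with the interpolated pattern as a unit.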
All of these should be building on top of RegExp.escape :P
On Sat, Jun 13, 2015 at 9:17 AM, Domenic Denicola <d at domenic.me> wrote:
> All of these should be building on top of RegExp.escape :P
I am not yet agreeing or disagreeing with this. Were both to become std, clearly they should be consistent with each other. At the time I wrote this, it had not occurred to me that the tag itself might be stdized at the same time as RegExp.escape. Now that this possibility has been proposed, I am realizing lots of flaws with my polyfill. It's funny how, by considering it as leading to a proposal, I quickly saw deep flaws that I was previously missing.
- The big one is that the literal template parts that are taken to represent the regexp pattern fragments being expressed should be syntactically valid fragments, in the sense that it makes semantic sense to inject data between these fragments. Escaping the data + validating the overall result does not do this. For example:

const data = ':x';
const rebad = RegExp.tag`(?${data})`;
console.log(rebad.test('x')); // true

is nonsense. Since the RegExp grammar can be extended per platform, the same argument that says we should have the platform provide RegExp.escape says we should have the platform provide RegExp.tag -- so that they can consistently reflect these platform extensions.

- Now that we have modules, I would like to see us stop having each proposal for new functionality come at the price of further global namespace pollution. I would like to see us transition towards having most new std library entry points be provided by std modules. I understand why we haven't yet, but something needs to go first.

- ES6 made RegExp subclassable with most methods delegating to a common @exec method, so that a subclass only needs to consistently override a small number of things to stay consistent. Neither RegExpSubclass.escape nor RegExpSubclass.tag can be derived from aRegExpSubclass[@exec]. Because of the first bullet, RegExpSubclass.tag also cannot be derived from RegExpSubclass.escape. But having RegExpSubclass.escape delegate to RegExpSubclass.tag seems weird.

- The instanceof in the polyfill prevents it from working cross-frame. Also, when doing RegExpSubclass1.tag`xx${aRegExpSubclass2}yy`, where RegExpSubclass2.source produces a regexp grammar string that RegExpSubclass1 does not understand, I have no idea what the composition should do other than reject with an error. But what if the strings happen to be mutually valid but with conflicting meaning between these subclasses?
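The first bullet can be reproduced with a small sketch. `RegExp.tag` is not an existing API, so the hypothetical `reTag` below stands in for it: it escapes each interpolated string with the same character class as the polyfill and concatenates the result with the raw template parts. The names `escapeData` and `reTag` are assumptions made for this illustration only.

```javascript
// Escape data using the same character class as the polyfill in this thread.
function escapeData(text) {
  return text.replace(/[\/\\^$*+?.()|[\]{}]/g, '\\$&');
}

// Hypothetical stand-in for the discussed tag: escapes every substitution
// as data, then validates only the final, concatenated pattern.
function reTag(template, ...subs) {
  let pattern = template.raw[0];
  subs.forEach((sub, i) => {
    pattern += escapeData(String(sub)) + template.raw[i + 1];
  });
  return new RegExp(pattern);
}

const data = ':x';
const rebad = reTag`(?${data})`;

// ':' is not a syntax character, so the data passes through unchanged and
// completes the "(?" fragment into a non-capturing group.
console.log(rebad.source);    // (?:x)
console.log(rebad.test('x')); // true
```

The overall pattern is valid, so validating only the final result cannot catch this; the template part `(?` would have to be rejected on its own.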
Would it help subclassing to have the list of syntax characters/code points be on a well-known-symbol property? Like `RegExp.prototype[@@syntaxCharacters] = Object.freeze('^$\\.*+?()[]{}|'.split(''));` or something? Then @exec could reference that, and similarly `RegExp.escape` and `RegExpSubclass.escape` could reference it as well?
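A rough sketch of how that suggestion might look in practice follows. No `@@syntaxCharacters` well-known symbol exists, so an ordinary symbol named `syntaxCharacters` is used as a hypothetical stand-in, and the classes `BaseRegExp` and `SlashRegExp` are invented for the example rather than patching the built-in `RegExp`.

```javascript
// Hypothetical stand-in for the suggested well-known symbol.
const syntaxCharacters = Symbol('syntaxCharacters');

class BaseRegExp extends RegExp {
  // Static methods are inherited by subclasses, and `this` is the class the
  // method was called on, so each subclass consults its own character list.
  static escape(text) {
    const chars = this.prototype[syntaxCharacters];
    return Array.from(String(text), (ch) =>
      chars.includes(ch) ? '\\' + ch : ch
    ).join('');
  }
}
BaseRegExp.prototype[syntaxCharacters] =
  Object.freeze('^$\\.*+?()[]{}|'.split(''));

// A subclass with an extended grammar only has to override the list.
class SlashRegExp extends BaseRegExp {}
SlashRegExp.prototype[syntaxCharacters] =
  Object.freeze([...BaseRegExp.prototype[syntaxCharacters], '/']);

console.log(BaseRegExp.escape('a.b/c'));  // a\.b/c
console.log(SlashRegExp.escape('a.b/c')); // a\.b\/c
```

This covers only single-character escaping; it does not address the fragment-validity problem raised for the tag.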
On Sat, Jun 13, 2015 at 9:07 PM, Mark S. Miller <erights at google.com> wrote:
> On Sat, Jun 13, 2015 at 9:17 AM, Domenic Denicola <d at domenic.me> wrote:
>> All of these should be building on top of RegExp.escape :P
>
> It's funny how, by considering it as leading to a proposal, I quickly saw deep flaws that I was previously missing.

That was a big part of making a proposal out of it - to find these things :)

> the overall result does not do this. For example:
>
> const data = ':x';
> const rebad = RegExp.tag`(?${data})`;
> console.log(rebad.test('x')); // true
>
> is nonsense. Since the RegExp grammar can be extended per platform, the same argument that says we should have the platform provide RegExp.escape says we should have the platform provide RegExp.tag -- so that they can consistently reflect these platform extensions.
This is a good point; I considered whether or not `-` should be included for a similar reason. I think it is reasonable to only include syntax identifiers and expect users to deal with parts of patterns of more than one character themselves (by wrapping the string with `()` in the constructor). This is what practically every other language does.

That said - I'm very open to allowing implementations to escape more than `SyntaxCharacter` in their implementations and to even recommend that they do so in such a way that is consistent with their regular expressions. What do you think about doing that?

I'm also open to `.tag` wrapping with `()` to avoid these issues but I'm not sure if we have a way in JavaScript to not make a capturing group out of it.
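On the capturing-group question: JavaScript regular expressions do support non-capturing groups, written `(?:...)`, which is what the adjusted polyfill earlier in the thread uses. A short comparison, with sample patterns chosen only for illustration:

```javascript
// A capturing group adds an entry to the match result...
const capturing = /(ab)+c/.exec('ababc');
// ...while a non-capturing group only groups.
const nonCapturing = /(?:ab)+c/.exec('ababc');

console.log(capturing.length);    // 2: the full match plus one capture
console.log(capturing[1]);        // ab
console.log(nonCapturing.length); // 1: the full match only
console.log(nonCapturing[0]);     // ababc
```

Wrapping with `(?:...)` therefore leaves the numbering of any groups written by the template author unchanged.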
> - Now that we have modules, I would like to see us stop having each proposal for new functionality come at the price of further global namespace pollution. I would like to see us transition towards having most new std library entry points be provided by std modules. I understand why we haven't yet, but something needs to go first.

I think that doing this should be an eventual target but I don't think adding a single much-asked-for static function to the RegExp function would be a good place to start. I think the committee first needs to agree about how this form of modularisation should be done - there are much bigger targets first and I would not like to see this proposal tied and held back by that (useful) goal.

> - ES6 made RegExp subclassable with most methods delegating to a common @exec method, so that a subclass only needs to consistently override a small number of things to stay consistent. Neither RegExpSubclass.escape nor RegExpSubclass.tag can be derived from aRegExpSubclass[@exec]. Because of the first bullet, RegExpSubclass.tag also cannot be derived from RegExpSubclass.escape. But having RegExpSubclass.escape delegating to RegExpSubclass.tag seems weird.
Right, but it makes sense that `escape` does not play in this game since it is a static method that takes a string argument - I'm not sure how it could use @exec.
> - The instanceof below prevents this polyfill from working cross-frame. Also, when doing RegExpSubclass1.tag`xx${aRegExpSubclass2}yy`, where RegExpSubclass2.source produces a regexp grammar string that RegExpSubclass1 does not understand, I have no idea what the composition should do other than reject with an error. But what if the strings happen to be mutually valid but with conflicting meaning between these subclasses?

This is hacky, but in my code I just did `argument.exec ? treatAsRegExp : treatAsString`.
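As a sketch of that duck-typing check, and of why it is hacky, the helper below treats anything with a callable `exec` as a RegExp. The function names `looksLikeRegExp` and `toPatternPart`, and the `impostor` object, are assumptions made up for this illustration; they do not come from the thread.

```javascript
// Duck-typed test: any object with a callable `exec` counts as a RegExp.
// Unlike instanceof, this does not depend on which frame created the value.
function looksLikeRegExp(value) {
  return value !== null &&
    typeof value === 'object' &&
    typeof value.exec === 'function';
}

// Turn one template substitution into pattern text.
function toPatternPart(value) {
  return looksLikeRegExp(value)
    ? `(?:${value.source})`
    : String(value).replace(/[\/\\^$*+?.()|[\]{}]/g, '\\$&');
}

console.log(toPatternPart(/a|b/)); // (?:a|b)
console.log(toPatternPart('a|b')); // a\|b

// The weakness: any object shaped like a RegExp is trusted and its
// `source` is spliced in unescaped.
const impostor = { exec() { return null; }, source: '.*' };
console.log(toPatternPart(impostor)); // (?:.*)
```

The check sidesteps the cross-frame failure of `instanceof`, at the cost of accepting arbitrary objects as pattern sources.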
On Sat, Jun 13, 2015 at 11:39 AM, Benjamin Gruenbaum <benjamingr at gmail.com> wrote:
> On Sat, Jun 13, 2015 at 9:07 PM, Mark S. Miller <erights at google.com> wrote:
>> On Sat, Jun 13, 2015 at 9:17 AM, Domenic Denicola <d at domenic.me> wrote:
>>> All of these should be building on top of RegExp.escape :P
>>
>> It's funny how, by considering it as leading to a proposal, I quickly saw deep flaws that I was previously missing.
>
> That was a big part of making a proposal out of it - to find these things :)
Indeed! Much appreciated.
>> the overall result does not do this. For example:
>>
>> const data = ':x';
>> const rebad = RegExp.tag`(?${data})`;
>> console.log(rebad.test('x')); // true
>>
>> is nonsense. Since the RegExp grammar can be extended per platform, the same argument that says we should have the platform provide RegExp.escape says we should have the platform provide RegExp.tag -- so that they can consistently reflect these platform extensions.
>
> This is a good point; I considered whether or not `-` should be included for a similar reason. I think it is reasonable to only include syntax identifiers and expect users to deal with parts of patterns of more than one character themselves (by wrapping the string with `()` in the constructor). This is what practically every other language does.
>
> That said - I'm very open to allowing implementations to escape more than `SyntaxCharacter` in their implementations and to even recommend that they do so in such a way that is consistent with their regular expressions. What do you think about doing that?
>
> I'm also open to `.tag` wrapping with `()` to avoid these issues but I'm not sure if we have a way in JavaScript to not make a capturing group out of it.

Better or different escaping is not the issue in this first bullet. The issue is validating that a fragment is a valid fragment for that regexp grammar. For the std grammar, "(?" is not a valid fragment and the tag should have rejected the template with an error on that basis alone.

>> - Now that we have modules, I would like to see us stop having each proposal for new functionality come at the price of further global namespace pollution. I would like to see us transition towards having most new std library entry points be provided by std modules. I understand why we haven't yet, but something needs to go first.
>
> I think that doing this should be an eventual target but I don't think adding a single much-asked-for static function to the RegExp function would be a good place to start. I think the committee first needs to agree about how this form of modularisation should be done - there are much bigger targets first and I would not like to see this proposal tied and held back by that (useful) goal.
I agree, but this will be true for any individual proposal.
Perhaps we need a sacrificial "first penguin through the ice" proposal whose only purpose is to arrive as a std import rather than a std primordial. (Just kidding.)
>> - ES6 made RegExp subclassable with most methods delegating to a common @exec method, so that a subclass only needs to consistently override a small number of things to stay consistent. Neither RegExpSubclass.escape nor RegExpSubclass.tag can be derived from aRegExpSubclass[@exec]. Because of the first bullet, RegExpSubclass.tag also cannot be derived from RegExpSubclass.escape. But having RegExpSubclass.escape delegating to RegExpSubclass.tag seems weird.
>
> Right, but it makes sense that `escape` does not play in this game since it is a static method that takes a string argument - I'm not sure how it could use @exec.
I agree that defining a class-side method to delegate to an instance-side method is unpleasant. But because we have class-side inheritance, static methods should be designed with this larger game in mind.
>> - The instanceof below prevents this polyfill from working cross-frame. Also, when doing RegExpSubclass1.tag`xx${aRegExpSubclass2}yy`, where RegExpSubclass2.source produces a regexp grammar string that RegExpSubclass1 does not understand, I have no idea what the composition should do other than reject with an error. But what if the strings happen to be mutually valid but with conflicting meaning between these subclasses?
>
> This is hacky, but in my code I just did `argument.exec ? treatAsRegExp : treatAsString`.
Yes, as with instanceof, that's the difference between the quality needed in a polyfill for personal use vs a proposed std.
Perhaps. I encourage you to draft a possible concrete proposal.
Cheers, --MarkM
What about that part in particular?
> That said - I'm very open to allowing implementations to escape _more_ than `SyntaxCharacter` in their implementations and to even recommend that they do so in such a way that is consistent with their regular expressions. What do you think about doing that?
If we go with `.escape` (and not tag at this stage) - implementations extending the regexp syntax (which is apparently allowed?) to add identifiers should be allowed to add identifiers to escape?
This sounds like the biggest barrier at this point from what I understand.
I'm also considering a bit of `as if` to allow implementations to, for example, not escape some characters inside `[...]` as long as the end result is the same.
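For readers following along, the baseline being discussed (escaping only `SyntaxCharacter`) might look roughly like the sketch below. The name `escapeSyntaxCharacters` is made up for illustration, and this is not the algorithm the proposal finally settles on; an implementation that escapes more than this set would still be consistent with the idea above.

```js
// Hypothetical helper: escapes only the characters that the ES2015 grammar
// lists as SyntaxCharacter, i.e. ^ $ \ . * + ? ( ) [ ] { } |
function escapeSyntaxCharacters(text) {
  return String(text).replace(/[\^$\\.*+?()[\]{}|]/g, '\\$&');
}

// Untrusted data is escaped before being spliced into a pattern.
const userInput = 'a.b*c';
const pattern = new RegExp('^' + escapeSyntaxCharacters(userInput) + '$');

console.log(escapeSyntaxCharacters(userInput)); // a\.b\*c
console.log(pattern.test('a.b*c'));             // true
console.log(pattern.test('aXbbc'));             // false
```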
To throw some more paint on the bikeshed:
The "instanceof RegExp" and "RegExp(...)" parts of the "perfect" implementation of `RegExp.tag` should also be fixed to play nicely with species.
I think Allen and I would say that you should *not* use the species pattern for instantiating the new regexp (because this is a factory), but you should be doing `new this(...)` to create the result, instead of `RegExp(...)`. (Domenic might disagree, but this is the pattern the ES6 spec is currently consistent with.)
The `instanceof RegExp` test might also be reviewed. It might be okay, but perhaps you want to invoke a `toRegExp` method instead, or just look at `source`, so that we used duck typing instead of a fixed inheritance chain. You could even define `String#toRegExp` and have that handle proper escaping. This pattern might not play as nicely with subtyping, though, so perhaps using `this.escape(string)` (returning an instance of `this`) is in fact preferable. Everything other than a string might be passed through `new this(somethingelse).source` which could cast it from a "base RegExp" to a subclass as necessary. You could handle whatever conversions are necessary in the constructor for your subclass.
If we did want to use the species pattern, the best way (IMO) would be to expose the fundamental alternation/concatenation/etc operators. For example, `RegExp.prototype.concat(x)` would use the species pattern to produce a result, and would also handle escaping `x` if it was a string. The set of instance methods needed is large but not *too* large: `concat`, `alt`, `mult`, and `group` (with options) might be sufficient.
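A rough sketch of what the `new this(...)` suggestion could look like in practice follows. The class names and the static `tag` and `escape` methods are hypothetical, not a standardized API, and regexp-like substitutions are recognized here simply by having a string `source` (the duck-typing variant mentioned above).

```js
// Sketch only: `ComposableRegExp`, `tag` and `escape` are illustrative names.
class ComposableRegExp extends RegExp {
  static escape(text) {
    return String(text).replace(/[\^$\\.*+?()[\]{}|]/g, '\\$&');
  }

  static tag(template, ...subs) {
    const parts = [template.raw[0]];
    subs.forEach((sub, i) => {
      // Duck typing: anything with a string `source` is treated as a regexp.
      const isRegExpLike =
        typeof sub === 'object' && sub !== null && typeof sub.source === 'string';
      parts.push(isRegExpLike ? `(?:${sub.source})` : this.escape(sub));
      parts.push(template.raw[i + 1]);
    });
    // `new this(...)` builds an instance of whichever constructor the method
    // was invoked on, so subclasses inherit a working factory.
    return new this(parts.join(''));
  }
}

class LoggingRegExp extends ComposableRegExp {}

const re = LoggingRegExp.tag`^${/\d+/}${'.'}$`;
console.log(re instanceof LoggingRegExp); // true
console.log(re.source);                   // ^(?:\d+)\.$
console.log(re.test('42.'));              // true
console.log(re.test('42x'));              // false
```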
--scott
On Jun 13, 2015, at 1:18 PM, C. Scott Ananian wrote:
> To throw some more paint on the bikeshed:
> The "instanceof RegExp" and "RegExp(...)" parts of the "perfect" implementation of `RegExp.tag` should also be fixed to play nicely with species.
> I think Allen and I would say that you should *not* use the species pattern for instantiating the new regexp (because this is a factory), but you should be doing `new this(...)` to create the result, instead of `RegExp(...)`. (Domenic might disagree, but this is the pattern the ES6 spec is currently consistent with.)
Absolutely, `new this(...)` is a pattern that everyone who wants to create inheritable "static" factory methods needs to learn. It's how such a factory says: I need to create an instance of the constructor I was invoked upon.
`species` is very different. It is how an instance method says: I need to create a new instance that is similar to this instance, but lacks its specialized behavior.
> The `instanceof RegExp` test might also be reviewed. It might be okay, but perhaps you want to invoke a `toRegExp` method instead, or just look at `source`, so that we used duck typing instead of a fixed inheritance chain. You could even define `String#toRegExp` and have that handle proper escaping. This pattern might not play as nicely with subtyping, though, so perhaps using `this.escape(string)` (returning an instance of `this`) is in fact preferable. Everything other than a string might be passed through `new this(somethingelse).source` which could cast it from a "base RegExp" to a subclass as necessary. You could handle whatever conversions are necessary in the constructor for your subclass.
Originally, ES6 had a @@isRegExp property that was used to brand objects that could be used in contexts where RegExp instances were expected. It was used by methods like String.prototype.match/split/search/replace to determine if the "pattern" argument was a "regular expression" rather than a string. Later, @@isRegExp was eliminated and replaced with @@match, @@search, @@split, and @@replace because we realized that the corresponding methods didn't depend upon the full generality of regular expressions, but only upon the more specific behaviors. When we did that, we also decided that we would use the presence of a @@match property as the brand to identify regular-expression-like objects. This is captured in the ES6 spec by the IsRegExp abstract operation (https://people.mozilla.org/~jorendorff/es6-draft.html#sec-isregexp), which is used at several places within the ES6 spec.
So, the proper ES6 way to do a cross-realm friendly check for RegExp-like behavior is to check for the existence of a property whose key is Symbol.match.
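As an illustration, a user-land approximation of that check could be written as below. The helper name `isRegExpLike` is invented here; the spec's IsRegExp additionally falls back to an internal slot that ordinary code cannot observe, so this sketch simply returns `false` in that case.

```js
// Approximates the spec's IsRegExp operation using only Symbol.match.
function isRegExpLike(value) {
  const isObject =
    value !== null && (typeof value === 'object' || typeof value === 'function');
  if (!isObject) {
    return false;
  }
  const matcher = value[Symbol.match];
  // A defined @@match property acts as the brand; its truthiness decides.
  return matcher !== undefined ? Boolean(matcher) : false;
}

console.log(isRegExpLike(/a/));                      // true
console.log(isRegExpLike('a'));                      // false
console.log(isRegExpLike({ [Symbol.match]: true })); // true
console.log(isRegExpLike({ source: 'a' }));          // false
```

Because `Symbol.match` is looked up through the prototype chain, a regular expression created in another frame or realm passes this check, which `instanceof RegExp` would not allow.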
> If we did want to use the species pattern, the best way (IMO) would be to expose the fundamental alternation/concatenation/etc operators. For example, `RegExp.prototype.concat(x)` would use the species pattern to produce a result, and would also handle escaping `x` if it was a string. The set of instance methods needed is large but not *too* large: `concat`, `alt`, `mult`, and `group` (with options) might be sufficient.
yes.
Again, the key difference is whether we are talking about a request to a constructor object or a request to an instance object.
MyArraySubclass.of(1,2,3,4) //I expect to get an instance of MyArraySubclass, not of Array or some other Array-like species
aMyArraySubclassObject.map(from=>f(from)) //I expect to get something with Array behavior but it may not be an instance of MyArraySubclass
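The two comments above can be made concrete with a small sketch. The `range` factory and the `FancyArray` subclass are hypothetical names added for illustration; the point is only that a static factory uses `new this()`, while instance methods such as `map` consult `Symbol.species`.

```js
class MyArraySubclass extends Array {
  // "Static" factory: `new this()` instantiates whichever constructor the
  // method was called on, including further subclasses.
  static range(start, end) {
    const result = new this();
    for (let i = start; i < end; i++) {
      result.push(i);
    }
    return result;
  }

  // Instance methods such as map() look here to decide what to create;
  // returning Array asks for plain arrays instead of subclass instances.
  static get [Symbol.species]() {
    return Array;
  }
}

class FancyArray extends MyArraySubclass {}

const made = FancyArray.range(0, 3);
const mapped = made.map((n) => n * 2);

console.log(made instanceof FancyArray);   // true
console.log(mapped instanceof FancyArray); // false
console.log(Array.isArray(mapped));        // true
```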
Allen
(typo correction)
Err, that should have read "Absolutely".
As a cross-cutting concern I'd like the feedback of more people on benjamingr/RegExp.escape#29
Basically we've got to make a design choice of readable output vs. potentially safer output.
I have never written a proposal before, but I would love if it was possible to do the following in JavaScript:
// This code exposes a function that when called bound to an `object` inserts a method in that `object`.
// The following are 3 ways to do this, the last one being my proposal. Note that I am using CommonJS
// require style to include modules, but that is simply to illustrate how I came up with this. Using `export default function…`
// does not really address my use case, but just for completeness I have included an example using the new `import`
// `export` system as well.
// Disclaimer: This is a simple plugin-like system example, but notice I am not checking whether the
// method already exists in `object`. You can ignore this.
// Idiomatic JavaScript
module.exports = function () {
  let method = require("./lib/method")
  this.method = function (...args) {
    return args.map((arg) => method(arg))
  }
}
// Clumsy, but without using `let` variables
module.exports = function () {
  this.method = function (...args) {
    return (function (method) {
      return args.map((arg) => method(arg))
    }(
      require("./lib/method")
    ))
  }
}
// My proposal is to allow blocks to receive arguments (similar to the non-standard let blocks) and
// whatever you return inside the block is the return value of the enclosing function. Also, there
// should be no need to write `return` as it will always return the result of the last expression.
module.exports = function () {
  this.method = function (...args) {
    (method) { args.map((arg) => method(arg)) }(require("./lib/method"))
  }
}
// Just for completeness, but not really related to what I am proposing (although it
// does solve this particular case very neatly) using import & exports
import method from "./method"
export default function () {
  this.method = function (...args) {
    return args.map((arg) => method(arg))
  }
}
If you think this is a good or bad idea, please let me know your comments and observations.
J
Why is this a comment on the RegExp.escape discussion?
I'd like to give https://github.com/benjamingr/RegExp.escape/issues/29 another week. *Please, if you have a strong opinion, voice it*; after that we'll settle on a hopefully *final* API for RegExp.escape in terms of the escaped parts.
Some parts so you won't have to read the whole thread (debated issues):
- Numeric literals are escaped at the start of the string to not interfere with capturing groups (yes/no)
- Hex characters ([0-9a-f]) are escaped at the start of the string to not interfere with unicode escape sequences (yes/no)
- `/` is escaped to support passing a RegExp string to eval (yes/no)
And so on.
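To make the first debated point concrete, here is a small sketch of how an unescaped leading digit can change the meaning of a preceding backreference. Both helper names are invented, and hex-escaping the leading digit is just one possible mitigation, not necessarily what the proposal will choose.

```js
const escapeSyntax = (text) =>
  String(text).replace(/[\^$\\.*+?()[\]{}|]/g, '\\$&');

// Hypothetical variant: additionally hex-escapes a leading decimal digit.
function escapeWithLeadingDigit(text) {
  return escapeSyntax(text).replace(
    /^[0-9]/,
    (digit) => '\\x' + digit.charCodeAt(0).toString(16)
  );
}

// Intended meaning: group 1 repeated, followed by a literal "0".
const naive = new RegExp('(a)\\1' + escapeSyntax('0'));
const safer = new RegExp('(a)\\1' + escapeWithLeadingDigit('0'));

console.log(naive.source);      // (a)\10    ("\10" is no longer "\1" then "0")
console.log(safer.source);      // (a)\1\x30
console.log(naive.test('aa0')); // false
console.log(safer.test('aa0')); // true
```

In the naive version the pattern text ends in `\10`; with only one capturing group, non-Unicode-mode engines read that as a legacy octal escape and not as `\1` followed by `0`.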
Safety over readability please. If there is a single fully escaped form that is safe to use in all the expected contexts, let's choose that. The results of RegExp.escape are not very readable anyway, and rarely read. So compromising safety for some contexts in exchange for incremental readability improvements of something that won't be read is not a good idea.
If there is not a clearly most escaped form that is safe in all expected contexts, then first, let us enumerate all the relevant contexts and the escaping demands of each.
Cheers, --MarkM
On Jun 27, 2015, at 9:17 AM, Mark S. Miller wrote:
Safety over readability please. If there is a single fully escaped form that is safe to use in all the expected contexts, let's choose that. The results of RegExp.escape are not very readable anyway, and rarely read. So compromising safety for some contexts in exchange for incremental readability improvements of something that won't be read is not a good idea.
If there is not a clearly most escaped form that is safe in all expected contexts, then first, let us enumerate all the relevant contexts and the escaping demands of each.
Alternatively, an optional options object argument could be used to adapt a single 'escape' function to differing use case requirements.
Allen
This is currently discussed at benjamingr/RegExp.escape#29.
Adding my comment from there to here too:
Some languages (PHP, for example) do this (an optional parameter naming additional characters to escape), so it's not unprecedented.
The question we should ask ourselves is whether it would be a significant improvement over the user just doing a `.replace` to escape these characters for the more fine-grained cases (where the language will support the more general one).
Also, I wonder what that options argument would look like (I think accepting any iterable over characters would be good and would allow an array or string or set etc).
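To make the "iterable over characters" idea concrete, here is a minimal sketch. The name `escapeWithExtras` and the choice to emit `\u{…}` escapes for the extra characters are assumptions made for illustration; nothing in this thread settles either. Because of the `\u{…}` form, the result is only meant for patterns compiled with the `u` flag.

```javascript
// Hypothetical helper, not a proposed or standard API.
// Escapes every SyntaxCharacter, plus any code point listed in `extras`.
const SYNTAX_CHARACTERS = /[\^$\\.*+?()[\]{}|]/g;

function escapeWithExtras(str, extras = "") {
  // Any iterable of single code points works: a string, an array, a Set.
  const extraSet = new Set(extras);
  let out = "";
  // Iterating a string with for...of walks it by code point.
  for (const ch of String(str)) {
    if (extraSet.has(ch)) {
      // Assumption: represent extra characters as \u{hex}, valid under the `u` flag.
      out += "\\u{" + ch.codePointAt(0).toString(16) + "}";
    } else {
      out += ch.replace(SYNTAX_CHARACTERS, "\\$&");
    }
  }
  return out;
}

const source = escapeWithExtras("a.b☺", "☺");
const re = new RegExp(source, "u");
```

Here `re` matches the literal text `a.b☺` and nothing else, since both the dot and the extra character are escaped.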
Please, not an iterable over characters. (Or at least, "not only".) Use a RegExp. Imagine trying to ensure that any characters over \u007f were escaped. You don't want an iterable over ~64k characters.
In addition, a RegExp would allow you to concisely specify "hex digits, but only at the start of the string" and some of the other oddities we've considered. --scott
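A rough sketch of what a RegExp-valued second argument could look like, to compare with the iterable form. The name `escapeMatching`, the `\u{…}` output form, and the decision to apply the extra pattern to the original (unescaped) string are all assumptions for illustration, not anything agreed in this thread.

```javascript
// Hypothetical helper, not a proposed or standard API.
// Escapes every SyntaxCharacter, plus whatever `extra` matches in the input.
function escapeMatching(str, extra) {
  const s = String(str);
  const syntaxCharacter = /[\^$\\.*+?()[\]{}|]/; // no `g` flag, so test() is stateless

  // matchAll needs a global pattern; copy the caller's pattern with `g` added.
  const flags = extra.flags.includes("g") ? extra.flags : extra.flags + "g";
  const extraGlobal = new RegExp(extra.source, flags);

  // Record every code-unit index of the original string covered by a match.
  const marked = new Set();
  for (const m of s.matchAll(extraGlobal)) {
    for (let i = m.index; i < m.index + m[0].length; i++) marked.add(i);
  }

  let out = "";
  let index = 0;
  for (const ch of s) {
    if (marked.has(index)) {
      // Assumption: emit \u{hex}, which requires the `u` flag on the final RegExp.
      out += "\\u{" + ch.codePointAt(0).toString(16) + "}";
    } else {
      out += syntaxCharacter.test(ch) ? "\\" + ch : ch;
    }
    index += ch.length; // astral code points occupy two code units
  }
  return out;
}

// "Hex digits, but only at the start of the string":
const leadingHex = escapeMatching("1a.2", /^[0-9a-f]/);
// "Any characters over \u007f":
const nonAscii = escapeMatching("é.", /[^\x00-\x7f]/g);
```

The anchored pattern only touches the first character, which is the kind of positional rule an iterable of characters cannot express.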
I meant something like `RegExp.escape(str, "☺")` (also escapes `☺`). Since strings are iterable by code points via the new iteration protocol this sounds like the natural choice. I'm not sure such a second argument would be a good idea.
And I'm suggesting that `RegExp.escape(str, /☺/ug)` is a much better idea.
--scott
Why? What advantage would it offer?
On Mon, Jun 29, 2015 at 9:04 PM, Benjamin Gruenbaum <benjamingr at gmail.com> wrote:
Why? What advantage would it offer?
See Scott’s previous email:
On Mon, Jun 29, 2015 at 8:42 PM, C. Scott Ananian <ecmascript at cscott.net> wrote:
Imagine trying to ensure that any characters over \u007f were escaped. You don't want an iterable over ~64k characters.
In addition, a RegExp would allow you to concisely specify "hex digits, but only at the start of the string" and some of the other oddities we've considered.
I'm still not sure if it's worth it, after all it's just sugar for ``RegExp.escape(str).replace(/[a-z]/gu, m => `\\${m}`)``
On Tue, Jun 30, 2015 at 3:46 AM, Benjamin Gruenbaum <benjamingr at gmail.com> wrote:
I'm still not sure if it's worth it, after all it's just sugar for ``RegExp.escape(str).replace(/[a-z]/gu, m => `\\${m}`)``
I think you're making my point! And I hope your version of `RegExp.escape` doesn't use hexadecimal or unicode escapes. (And that no one extends it to do so in the future.)
Over at benjamingr/RegExp.escape#29 I also suggested that you consider:
RegExp.escape(str, /[0-9]$/)
versus:
RegExp.escape(str).replace(/[0-9]$/, /* what goes here? */);
and then what happens with the latter code if `str` is `"\\010"` (i.e., using a literal backslash) or `$` (since `RegExp.escape('$') == "\\024"`).
It would also be nice to be able to do:
str.replace(/something/, (c) => RegExp.escape(c, /[^]/g));
that is, to be able to easily get a version of `RegExp.escape` that safely encodes every character it is given.
Let's give programmers powerful tools that aren't footguns, instead of making them play with `String#replace` after the fact and risk losing toes on the corner cases.
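One way to see the corner-case risk is to run the "sugar" from earlier in the thread. This is only a sketch: `simpleEscape` below is an assumed stand-in that backslash-escapes SyntaxCharacters, not the behaviour of any agreed `RegExp.escape`.

```javascript
// Assumed stand-in for RegExp.escape: backslash-escape each SyntaxCharacter.
const simpleEscape = (s) => String(s).replace(/[\^$\\.*+?()[\]{}|]/g, "\\$&");

const escaped = simpleEscape("n.d");                           // source: n\.d
// Post-processing: put a backslash in front of every lowercase letter.
const patched = escaped.replace(/[a-z]/gu, (m) => `\\${m}`);   // source: \n\.\d

const before = new RegExp(escaped); // matches the literal text "n.d"
const after = new RegExp(patched);  // \n is a newline, \d is a digit class
```

After the extra `.replace`, the pattern no longer matches the original text `n.d`; it matches a newline, a dot, and a digit instead, because `\n` and `\d` already have meanings of their own.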
--scott
Reviving this [thread] a third time, is there any love left for introducing RegExp.escape? The previous attempt was abandoned because of a so-called "[even-odd problem]", but that can be fixed: backslash-escape every SyntaxCharacter, then wrap the full result in a new form of non-capturing group that is only valid as a unit (and therefore protected from otherwise dangerous preceding fragments). For example, `(?](?)…)` is a good candidate because preceding such content with a left bracket (starting a CharacterClass) and/or a backslash (escaping special treatment of the initial parenthesis) would produce invalid syntax by exposing the "(?)".
As a result, `new RegExp(RegExp.escape("foo.bar"))` is valid (i.e., `/(?](?)foo\.bar)/`, equivalent in evaluation to `/(?:foo\.bar)/`) but `new RegExp("\\" + RegExp.escape("foo.bar"))` and even `new RegExp("([\\" + RegExp.escape("foo.bar"))` would throw SyntaxErrors.
The upside of such a change is getting safe access to desired language functionality. The downside, of course, is the new pattern's supreme ugliness.
Sample polyfill:
const regExpSyntaxCharacter = /[\^$\\.*+?()[\]{}|]/g;
RegExp.escape = function( value ) {
    return "(?](?)" + (value + "").replace(regExpSyntaxCharacter, "\\$&") + ")";
};
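For readers who have not followed the linked issue, the following sketch shows the kind of failure the even-odd problem refers to, using a plain backslash-escaping function without the wrapping group. `simpleEscape` is an assumed stand-in for illustration, not the polyfill above.

```javascript
// Assumed stand-in: backslash-escape each SyntaxCharacter, with no wrapping group.
const simpleEscape = (s) => String(s).replace(/[\^$\\.*+?()[\]{}|]/g, "\\$&");

const escapedDot = simpleEscape(".");      // the two characters \.
const re = new RegExp("\\" + escapedDot);  // pattern source is now \\.

// The preceding backslash pairs up with the escaping backslash, so the dot
// is left unescaped: the pattern means "a literal backslash, then any character".
const matchesBackslashX = re.test("\\x");
const matchesLiteralDot = re.test(".");
```

With an odd number of backslashes in front, the escaped output silently changes meaning instead of throwing, which is what the wrapping group in the proposal is intended to prevent.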
[thread]: https://esdiscuss.org/topic/regexp-escape
[even-odd problem]: https://github.com/benjamingr/RegExp.escape/issues/37
Or even better: `(?]**foo)` ("]" still terminates character classes; "**" is a less ugly normally-invalid sequence).
How about standardizing something like RegExp.escape() ? simonwillison.net/2006/Jan/20/escape
It is trivial to implement, but it seems to me that this functionality belongs to the language - the implementation obviously knows better which characters must be escaped, and which ones don't need to.