cross-platform line-endings (was: Removing files called minus) - nntp.perl.org
perl.perl5.porters: postings from July 2008
nntp.perl.org: Perl Programming lists via nntp and http.
Comments to Ask Bjørn Hansen at ask@perl.org | Group listing | About

cross-platform line-endings (was: Removing files called minus)
From: Tom Christiansen
Date: July 30, 2008 13:12
Subject: cross-platform line-endings (was: Removing files called minus)
Message ID: 5953.1217448746@chthon
In-Reply-To: Message from Tels <nospam-abuse@bloodgate.com> of "Wed, 30 Jul 2008 20:04:58 +0200." <200807302005.05971@bloodgate.com>

> I'd like to inject here that chomp() is not enough to get rid of
> Windows or Macintosh-style newlines.

Well, it depends, but that's a good injection.

> So you want:

> $_ =~ s/[\r\n]//g; # or whatever

If you're working on text-records, perhaps with $/ = q##, I think you
might prefer:

    s/[\r\n]+/ /g;

So you still have a break between "foo\nbar" when it becomes "foo bar".
The join algorithm in fmt or (n)vi (but not vim the dratted!) is much
more clever about whether it should change newlines into q## or to q# #,
depending on whether there's already a space or sentence-terminal
punctuation there. That way your ( and ) motion-targets still work.

Larry mentioned last week how often he uses \v now; it's certainly
useful. The new \v \V and \h and \H are rather nice, although I was a
bit surprised that \h didn't include the backspace character, and that
while "\t" is HT anywhere and "\b" is a BS in a string or charclass but
not regex, "\v" is only a VT in a regex or charclass, not in a string.

Even more useful still is \R, I think, standing for "any return-sequence".

    "\R" will atomically match a linebreak, including the network
    line-ending "\x0D\x0A". Specifically, "\R" is exactly equivalent to

        (?>\x0D\x0A?|[\x0A-\x0C\x85\x{2028}\x{2029}])

    Note: "\R" has no special meaning inside of a character class;
    use "\v" instead (vertical whitespace).

It's a tiny pity that \v only works in patterns, though. And I have to
admit that I tend to think of \R as the simpler-written:

    (?:\x0D\x0A|\v)

Hm, now I *am* curious. Why the no-backtracking (?>...|...) there? Is
it the | with 0x0A on both sides? Would \x0D\x0A?* work as well?

> to get really rid of them. Otherwise, the first time someone feeds
> your code a textfile that has been "converted" on Win32, your code
> will fail in interesting ways.
> Just one of these captchas that beginners only learn after painful
> experience :(

Hm. Meaning "Captcha as in /(capture)/" || "Captcha as in Gotcha"?

Native files on their native systems aren't the problem, of course, due
to the "\n" abstraction. It's the alien ones you get where cross-platform
annoyances set in. This is easily enough encountered, though, whether
from mail, samba, or NFS.

Somewhere I have a many-linked aaa2zzz program that looks at (aaa) and
(zzz) in $0 to figure out which way you're going, then looks those
systems up in a nice hash to find the line-endings. I sure do run a lot
of commands like this:

    perl -pe 's/\n/\r\n/' < README > README.txt
    perl -i -pe 's/\r//g' README.txt
    perl -i -pe 'tr[\n\r][\r\n]' Mac-README

Although now that I think about it,

    perl -00 -i.pre-munge -pE 's/\R/\n/g' plaintextfile

might be better. And you can't set $/ to a regex, or else we could just
use qr{\R} and be done with it. Instead, you have to sniff at the file a
bit to see what it feels like. And even then, $/ only affects chomp and
readline, not . or ^ or $.

I'd rather like a line-discipline-sniffing module that did something
like /usr/bin/file does via /etc/magic, and let you then binmode and/or
$/-mangle appropriately. Hey, if *that's* not a magic open (ie, using
/etc/magic), I sure dunno what is! :-)

I don't suppose anyone might know whether one exists already?

--tom

    % perl -wE 'say "\v"'
    Unrecognized escape \v passed through at -e line 1.
    v
    % perl -wE 'say chr(11) =~ /\v/'
    1
    % perl -wE 'say "\n" =~ /\v/'
    1
    % perl -wE 'say "\r" =~ /\v/'
    1
    % perl -wE 'say chr(0x2029) =~ /\v/'
    1
    % perl -Mcharnames=:full -wE \
        'say "\N{PARAGRAPH SEPARATOR}" =~ /\v/'
    1
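To make the difference between Tels' s/[\r\n]//g, the s/[\r\n]+/ /g variant for text records, and an s/\R/\n/g normalization concrete, here is a small sketch that runs all three over the same Windows-style sample. It assumes Perl 5.10 or later (for \R); the sub names strip_breaks, breaks_to_space, normalize_eol, and visible are made up for illustration and come from no module.

```perl
#!/usr/bin/perl
# Sketch: compare three ways of handling foreign line endings.
# Requires Perl 5.10+ for \R. All sub names here are illustrative only.
use strict;
use warnings;

# Delete every CR and LF outright: the lines run together.
sub strip_breaks    { my ($s) = @_; $s =~ s/[\r\n]//g;   return $s }

# For text records: turn each run of CRs/LFs into one space.
sub breaks_to_space { my ($s) = @_; $s =~ s/[\r\n]+/ /g; return $s }

# Normalize: \R takes CRLF atomically, so it becomes one "\n", not two.
sub normalize_eol   { my ($s) = @_; $s =~ s/\R/\n/g;     return $s }

# Show CR and LF as visible backslash escapes when printing.
sub visible { my ($s) = @_; $s =~ s/\r/\\r/g; $s =~ s/\n/\\n/g; return $s }

my $sample = "foo\x0D\x0Abar\x0D\x0A";    # a Windows-style two-line record
my @tries = (
    [ strip_breaks    => \&strip_breaks    ],
    [ breaks_to_space => \&breaks_to_space ],
    [ normalize_eol   => \&normalize_eol   ],
);
for my $try (@tries) {
    my ($name, $code) = @$try;
    printf "%-16s %s\n", $name, visible($code->($sample));
}
```

On that sample the first sub gives "foobar", the second "foo bar " (the final CRLF also becomes a space), and the third "foo\nbar\n". Because [\r\n]+ also folds blank lines into a single space, it fits joining the lines of one record better than rewriting a whole file, where s/\R/\n/g keeps "a\r\n\r\nb" as "a\n\nb".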
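The many-linked aaa2zzz program isn't shown in the post, so the following is only a guess at its general shape: a minimal sketch that takes the direction from the name it was invoked under ($0) and looks both ends up in a hash of line endings. The system names unix, dos, and mac and the subs direction and convert are hypothetical. Endings are written as explicit bytes rather than "\n", since what "\n" means is exactly the platform-dependent part.

```perl
#!/usr/bin/perl
# Hypothetical "aaa2zzz" filter: link it as dos2unix, unix2dos, mac2unix,
# and so on, and it reads the conversion direction from its own name.
use strict;
use warnings;
use File::Basename qw(basename);

# Line endings spelled out as bytes, independent of what "\n" means locally.
my %EOL = (
    unix => "\x0A",
    dos  => "\x0D\x0A",
    mac  => "\x0D",        # classic Mac OS
);

# Parse a program name like "/usr/local/bin/dos2unix" into ("dos", "unix").
# Returns the empty list if the name doesn't fit or a system is unknown.
sub direction {
    my ($name) = @_;
    my ($from, $to) = basename($name) =~ /^([a-z]+)2([a-z]+)$/;
    return unless defined $to && exists $EOL{$from} && exists $EOL{$to};
    return ($from, $to);
}

# Replace every $from line ending in $text with the $to line ending.
sub convert {
    my ($from, $to, $text) = @_;
    $text =~ s/\Q$EOL{$from}\E/$EOL{$to}/g;
    return $text;
}

# Act as a filter only when invoked under an aaa2zzz-style name, so the
# subs above can also be loaded and exercised on their own.
if (my ($from, $to) = direction($0)) {
    binmode STDIN;                 # raw bytes in and out: no CRLF layer
    binmode STDOUT;
    local $/;                      # slurp all of STDIN at once
    my $text = <STDIN>;
    print convert($from, $to, $text) if defined $text;
}
```

Installed as, say, dos2unix, it would be used as `dos2unix < README.txt > README`. Like the `s/\n/\r\n/` one-liner, unix2dos here does not check whether its input already has CRLF endings.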
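The post leaves open whether a line-ending-sniffing module already exists. As a rough, hypothetical sketch of the idea only (not an existing module), one could read the first few kilobytes in raw mode, count CRLF, bare LF, and bare CR, and then set $/ to the winner so that readline and chomp behave. The names guess_eol and sniff_file and the 8192-byte sample size are assumptions.

```perl
#!/usr/bin/perl
# Hypothetical line-ending sniffer: guess a file's line terminator from a
# raw sample, then read the file with $/ set to match. Needs Perl 5.10+.
use strict;
use warnings;

# Return the commonest terminator in $buf ("\x0D\x0A", "\x0A" or "\x0D"),
# or undef if $buf holds no line breaks. Ties go to the earlier entry.
sub guess_eol {
    my ($buf) = @_;
    my @kinds = (
        [ "\x0D\x0A" => qr/\x0D\x0A/      ],
        [ "\x0A"     => qr/(?<!\x0D)\x0A/ ],   # LF not preceded by CR
        [ "\x0D"     => qr/\x0D(?!\x0A)/  ],   # CR not followed by LF
    );
    my ($best, $most) = (undef, 0);
    for my $kind (@kinds) {
        my ($eol, $re) = @$kind;
        my $n = () = $buf =~ /$re/g;           # count the matches
        ($best, $most) = ($eol, $n) if $n > $most;
    }
    return $best;
}

# Guess from up to $bytes raw bytes at the start of $path. A CRLF split
# across the end of the sample is simply counted once as a bare CR.
sub sniff_file {
    my ($path, $bytes) = @_;
    $bytes //= 8192;
    open(my $fh, '<:raw', $path) or die "can't open $path: $!\n";
    defined(read($fh, my $buf, $bytes)) or die "can't read $path: $!\n";
    close $fh;
    return guess_eol($buf);
}

# Example: print each line of the named files with its own terminator removed.
for my $path (@ARGV) {
    my $eol = sniff_file($path) // "\x0A";
    open(my $fh, '<:raw', $path) or die "can't open $path: $!\n";
    local $/ = $eol;               # readline and chomp now use this terminator
    while (my $line = <$fh>) {
        chomp $line;
        print "$line\n";
    }
    close $fh;
}
```

Opening with :raw keeps a CRLF-translating layer from hiding the very bytes being counted. As the post notes, this only helps readline and chomp; `.`, `^`, and `$` in patterns still know nothing about $/.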