nntp.perl.org: Perl Programming lists via nntp and http.
Comments to Ask Bjørn Hansen at ask@perl.org | Group listing | About

fixing the silently failing implicit-close bug (was magic crud)
From: Tom Christiansen
Date: July 29, 2008 19:07
Subject: fixing the silently failing implicit-close bug (was magic crud)
Message ID: 19185.1217383672@chthon
In-Reply-To: Message from Ed Avis <eda@waniasset.com> of "Tue, 29 Jul 2008 23:19:47 -0000." <loom.20080729T222121-334@post.gmane.org>

> Tom Christiansen <tchrist <at> perl.com> writes:

>>> But if you think this is an important issue, wouldn't it make much
>>> more sense to teach people RIGHT NOW that they shouldn't rely on
>>> while (<>) automatically opening files for them,

>> Because while you can lead a horse to water, you can't make it drink.
>> And if you let it drink freely on its own, it may drink itself to death.
>>
>> Which just shows that you can't win for trying.
>> I spent years on it--YEARS.  They still just won't learn.  Things
>> like this have been in the FAQ since BEFORE Camel-1 even; they've
>> been in plenty of the available documentation since then--

> [illustrative examples of how the documentation has deteriorated snipped]

> You are certainly right that you've spent a lot of effort on this and
> other aspects of the perl documentation, which is greatly appreciated
> BTW, and that it was probably clearer in older versions.

> But isn't all this effort on documenting the caveats and gotchas of
> <>, with only moderate success in educating the perl-using public, an
> indicator that it might be better to step back and think of a better
> way of doing it?  The mere fact that <> requires so much documentation
> and controversy, and arguments about when it is safe to use and when
> it isn't, and discussions about exactly the right incantation to add
> \0 to the end of your filenames, surely shows that its semantics are
> a bit hairy at the very least: too hairy for such a simple-looking
> operator that everyone uses without a second thought.

It's not the input operator that requires that explanation, save for
whether it's a fileglob or a readline.

The perl5.001 perlfunc manpage under open ended with:

    The filename that is passed to open will have leading and trailing
    whitespace deleted.
    In order to open a file with arbitrary weird characters in it, it's
    necessary to protect any leading and trailing whitespace thusly:

        $file =~ s#^(\s)#./$1#;
        open(FOO, "< $file\0");

Although since then, that's gotten harder and harder to find as the PC
police have had their own way when nobody was looking.  It today survives
only in the perlopentut section I referenced before, which includes a
very clear explanation of why we have the behavior we have, and it's
merely what I've said before:

    The other important thing to notice is that, just as in the shell,
    any whitespace before or after the filename is ignored.  This is
    good, because you wouldn't want these to do different things:

        open INFO,   "<datafile"
        open INFO,   "< datafile"
        open INFO,   "<  datafile"

    Ignoring surrounding whitespace also helps for when you read a
    filename in from a different file, and forget to trim it before
    opening:

        $filename = <INFO>;          # oops, \n still there
        open(EXTRA, "< $filename") || die "can't open $filename: $!";

    This is not a bug, but a feature.  Because "open" mimics the shell in
    its style of using redirection arrows to specify how to open the file,
    it also does so with respect to extra whitespace around the filename
    itself as well.

And that's all it takes.  Period.  Did you read it, or not?

> (Of course, whatever warning appears under 'open' also needs to be
> duplicated under <>, since it is by no means obvious that <> is using
> the magical open internally.)

It always has been.  Sheesh!  Don't people read?  Here's the perl1
manpage on it:

    v1.0  The null filehandle <> is special and can be used to emu-
    v1.0  late the behavior of sed and awk.  Input from <> comes
    v1.0  either from standard input, or from each file listed on
    v1.0  the command line.  Here's how it works: the first time <>
    v1.0  is evaluated, the ARGV array is checked, and if it is
    v1.0  null, $ARGV[0] is set to '-', which when opened gives you
    v1.0  standard input.  The ARGV array is then processed as a
    v1.0  list of filenames.
    v1.0  The loop
    v1.0
    v1.0      while (<>) {
    v1.0          ...       # code for each line
    v1.0      }
    v1.0
    v1.0  is equivalent to
    v1.0
    v1.0      unshift(@ARGV, '-') if $#ARGV < $[;
    v1.0      while ($ARGV = shift) {
    v1.0          open(ARGV, $ARGV);
    v1.0          while (<ARGV>) {
    v1.0              ...   # code for each line
    v1.0          }
    v1.0      }
    v1.0
    v1.0  except that it isn't as cumbersome to say.  It really does
    v1.0  shift array ARGV and put the current filename into vari-
    v1.0  able ARGV.  It also uses filehandle ARGV internally.  You
    v1.0  can modify @ARGV before the first <> as long as you leave
    v1.0  the first filename at the beginning of the array.
    v1.0
    v1.0  If you want to set @ARGV to your own list of files, go
    v1.0  right ahead.  If you want to pass switches into your
    v1.0  script, you can put a loop on the front like this:
    v1.0
    v1.0      while ($_ = $ARGV[0], /^-/) {
    v1.0          shift;
    v1.0          last if /^--$/;
    v1.0          /^-D(.*)/ && ($debug = $1);
    v1.0          /^-v/ && $verbose++;
    v1.0          ...   # other switches
    v1.0      }
    v1.0      while (<>) {
    v1.0          ...   # code for each line
    v1.0      }
    v1.0
    v1.0  The <> symbol will return FALSE only once.  If you call it
    v1.0  again after this it will assume you are processing another
    v1.0  @ARGV list, and if you haven't set @ARGV, will input from
    v1.0  stdin.

And here we have the perlop manpage from 5.10.0:

    v5.10  The null filehandle <> is special: it can be used to
    v5.10  emulate the behavior of sed and awk.  Input from <> comes
    v5.10  either from standard input, or from each file listed on
    v5.10  the command line.  Here's how it works: the first time <>
    v5.10  is evaluated, the @ARGV array is checked, and if it is
    v5.10  empty, $ARGV[0] is set to "-", which when opened gives you
    v5.10  standard input.  The @ARGV array is then processed as a
    v5.10  list of filenames.  The loop
    v5.10
    v5.10      while (<>) {
    v5.10          ...   # code for each line
    v5.10      }
    v5.10
    v5.10  is equivalent to the following Perl-like pseudo code:
    v5.10
    v5.10      unshift(@ARGV, '-') unless @ARGV;
    v5.10      while ($ARGV = shift) {
    v5.10          open(ARGV, $ARGV);
    v5.10          while (<ARGV>) {
    v5.10              ...
    v5.10              ...   # code for each line
    v5.10          }
    v5.10      }
    v5.10
    v5.10  except that it isn't so cumbersome to say, and will
    v5.10  actually work.  It really does shift the @ARGV array and
    v5.10  put the current filename into the $ARGV variable.  It also
    v5.10  uses filehandle ARGV internally--<> is just a synonym for
    v5.10  <ARGV>, which is magical.  (The pseudo code above doesn't
    v5.10  work because it treats <ARGV> as non-magical.)
    v5.10
    v5.10  You can modify @ARGV before the first <> as long as the
    v5.10  array ends up containing the list of filenames you really
    v5.10  want.  Line numbers ($.) continue as though the input
    v5.10  were one big happy file.  See the example in "eof" in
    v5.10  perlfunc for how to reset line numbers on each file.
    v5.10
    v5.10  If you want to set @ARGV to your own list of files, go
    v5.10  right ahead.  This sets @ARGV to all plain text files if
    v5.10  no @ARGV was given:
    v5.10
    v5.10      @ARGV = grep { -f && -T } glob('*') unless @ARGV;
    v5.10
    v5.10  You can even set them to pipe commands.  For example, this
    v5.10  automatically filters compressed arguments through gzip:
    v5.10
    v5.10      @ARGV = map { /\.(gz|Z)$/ ? "gzip -dc < $_ |" : $_ } @ARGV;
    v5.10
    v5.10  If you want to pass switches into your script, you can use
    v5.10  one of the Getopts modules or put a loop on the front like
    v5.10  this:
    v5.10
    v5.10      while ($_ = $ARGV[0], /^-/) {
    v5.10          shift;
    v5.10          last if /^--$/;
    v5.10          if (/^-D(.*)/) { $debug = $1 }
    v5.10          if (/^-v/)     { $verbose++ }
    v5.10          # ... # other switches
    v5.10      }
    v5.10
    v5.10      while (<>) {
    v5.10          # ... # code for each line
    v5.10      }
    v5.10
    v5.10  The <> symbol will return "undef" for end-of-file only
    v5.10  once.  If you call it again after this, it will assume you
    v5.10  are processing another @ARGV list, and if you haven't set
    v5.10  @ARGV, will read input from STDIN.

I get the very strong idea you haven't read that.

> Why not fix <> so it doesn't need all the explanation?  Make it just
> read the files, which can be adequately documented in one sentence.
> And add a new operator, just a couple of characters longer, that
> people can use to get all the handy magic behaviour, *if they want it*
> and having read the documentation.

It's not <> that's at issue, but that readline on the synthetic
filehandle ARGV does what it has always said it does.  And it does.

No New Operators, thank you very much.  Better to add something by the
gz map in the doc (that nobody reads?).  Better to create some ARGV::
modules for autoinstalling the filters, or "autoprotecting" the
filenames.  Better to let people still use gz stuff and such if they
want to--many of us clearly do.  Not having a readonly open be anything
else than that is fine.

>> Which brings me to my final complaint of the day.
>>
>>     % perl -i.orig -pe 'print "this is more stuff\n"' it
>>
>>     % echo $?
>>     0
>>
>>     % ls -l it*
>>     0 -rw-r--r--  1 tchrist  wheel  0 Jul 29 15:05 it
>>     0 -rw-r--r--  1 tchrist  wheel  0 Jul 29 14:06 it.orig
>>
>> To this day, Perl's implicit closing of files doesn't warn you of
>> errors, let alone exit nonzero.  This makes it do wrong things and not
>> even tell you it did them wrong.

> I am glad you mentioned this, because I also think it's a pretty
> egregious bug, but I was rather imagining you'd come out to defend it,

Huh?

> point out that it has always been that way and why change it now, and
> anyway is clearly mentioned as a note to an addendum to a particular
> perlfaq answer.  I am glad we agree on this issue at least :-(.

What, you mean in section 8?  I'm the one who came up with the END{}
block with closing STDOUT, remember.  I couldn't get people to believe
that silent implicit closes were a problem.  I'm especially amazed that
they didn't fix this when they created autovivving filehandles, since
those also autoclose.

> It is not sufficient; I think it might sometimes be necessary, see
> below.
> Since nobody in practice checks the return status of every print()
> call, I would like to see an enhancement to perl where if any print()
> on a filehandle fails, it sets a flag which is then checked by
> close().  So you could be sure to catch file I/O errors sooner or
> later, if not immediately.  But that is for another thread ;-p.

And just what do you *think* happens!?  See your local clearerr and
ferror manpages in section 3 or 3S of your local mantree.  I'd be
aghast if this behavior weren't carried out for the perlio
implementation.  The IO::Handle documentation suggests that it has
been, or is otherwise emulated:

    =item $io->error

    Returns a true value if the given handle has experienced any errors
    since it was opened or since the last call to C<clearerr>, or if the
    handle is invalid.  It only returns false for a valid handle with no
    outstanding errors.

    =item $io->clearerr

    Clear the given handle's error indicator.  Returns -1 if the handle
    is invalid, 0 otherwise.

>> I've never convinced anybody this is important.  Since *every*
>> program should do this for correctness, it has to be in the run-time
>> system to avoid it ever being forgotten.
>>
>> Sure, there's stuff like this you can do:
>>
>>     END { close(STDOUT) || die "can't close stdout: $!" }

> Is even that sufficient?  Is it not possible a print() might fail but
> the next print() succeed (perhaps the disk was full, but then some
> space became available) and the error flag gets cleared?  Does it
> depend on whether stdout is opened to a file, a pipe, a tty, etc?

See above.

> [further example of incorrect behaviour]

>>     +-------------------------------------------------------+
>>     | Notice how even my cat is smarter than your perl!? :( |
>>     +-------------------------------------------------------+

> I would also note that cat has absolutely no problem handling arbitrary
> filenames, be they >>>LOL<<<, |^^|, or anything else.  The only
> exception is the standard Unix convention of - for stdin.
> Of course, a perl filter program can be made to work with all possible
> filenames if you write the code longhand instead of with <>, and it
> can be made to handle file I/O robustly if you take the trouble to do
> open and close by hand and check everything scrupulously.  But you
> shouldn't have to.  The default should just work and be the safe,
> checked code unless you specify otherwise.

The system is "safe".  This is a tempest in a teapot.

As for arbitrary filenames, you're forgetting some history again.
Notice why there's no aux.sh hints file.  Try creating con.tmp or
aux.plx on some systems.

If Larry had wanted Perl's open() to be identical to fopen(), he would
simply have called it that, and that's all it would ever have done.
But he wanted it to also be popen(), and he wanted it to deal with an
idea of a standard synonym for standard streams.  And please don't
tell me that you really believe /dev/stdin or /dev/fd/0 or whatnot
were in vogue during the 80s.  They weren't.  You can't rely on them
today, either.  The addition of dup2() and fdopen() handling to Perl's
open(), not to mention the fork-open, all came later.  The reason it
all went under open() was to make open() easier to use, not harder.
But then, people have been slowly pulling things out of it to make it
more like C.

We didn't even have the C version of open for aeons and aeons.  No,
really.  Perl1 had no sys*() functions at all, save I guess for
system(), but there's no SYS_tem #define in the jumptable in
<sys/syscall.h> as there is for things like SYS_open, SYS_read, and
SYS_close.  And yes, this is a bit of a jest.

Perl4, meanwhile, *had* syscall() -- and sysread() and syswrite(), so
we wouldn't have to use syscall() to get them.  Oddly, its manpage
used SYS_write for the example.
    require 'syscall.ph';        # may need to run h2ph
    syscall(&SYS_write, fileno(STDOUT), "hi there\n", 9);

Which, oddly, still persists there, whereas the SYS_close example from
perlfaq5 might be more appropriate (despite the horribly malformatted
"helpful" bit appended after that).

But Perl4 certainly did *not* have sysopen() or sysseek().  Then
again, neither did perl5!  Well, not perl 5.000.  Nor did perl5.001 or
even perl5.001m.  We didn't get sysopen() until 5.002.  The change
entry reads:

    NETaa15210: sysopen()
    From: Chip Salzenberg
    Files patched: doio.c keywords.pl lib/ExtUtils/typemap opcode.pl
                   pod/perlfunc.pod pp_hot.c pp_sys.c proto.h toke.c
    Applied suggested patch.  Hope it works...

We briefly had a systell() function, but no more.  Anybody else
remember systell()?  You have to look very closely and not blink.  It
was there in 5.003_97f, which was what was supposed to become 5.004.
The change log read:

    -----------------
    Version 5.003_97f
    -----------------

    This is it before _98.  No more last-minute features.  Really, I
    mean it this time.  No kidding.

    CORE LANGUAGE CHANGES

    Title:  "New operator systell()"
    From:   Chip Salzenberg
    Files:  doio.c ext/Opcode/Opcode.pm keywords.pl opcode.pl
            pod/perldelta.pod pod/perldiag.pod pod/perlfunc.pod
            pp_sys.c t/op/sysio.t toke.c

But that wasn't to last long at all, and Chip got to eat his words
very shortly thereafter with the *g* sub-sub-release of 5.003_97!

    -----------------
    Version 5.003_97g
    -----------------

    This one has two security bug fixes for buffer overflows.  Perl
    has not yet been searched to see if more fixes are needed.

    CORE LANGUAGE CHANGES

    Title:  "Improve sysseek(), remove systell(), fix Opcode"
    From:   Chip Salzenberg
    Files:  doio.c ext/Opcode/Makefile.PL ext/Opcode/Opcode.pm
            ext/Opcode/Opcode.xs global.sym keywords.pl opcode.pl
            pod/perldelta.pod pod/perldiag.pod pod/perlfunc.pod
            pp_sys.c proto.h t/op/sysio.t toke.c

Seems to me there's still no sysreadline, but I've published source
code for that.
You need it if you're using sysread(), syswrite(), and C-style select.
Otherwise you'll get wrong answers.  I don't know if the raw,
unbuffered I/O layer makes any of this any easier.  I doubt it.

As for the input operator, go back and reread *its* purpose.  It does
a higher-level thing than fgets(), and with good intent and effect.

>> I firmly believe that this needs to be a part of *EVERY* Perl
>> program, which consequently means it should *not* be there, but
>> in the core itself:
>>
>>     END { close(STDOUT) || die "can't close stdout: $!" }

> Yup, and also for STDERR (if closing STDERR fails you can attempt to
> print a warning message to it, which might or might not get through,
> but you can at least exit with nonzero status).

Possibly.  Probably.  Maybe.

>> Furthermore, I also believe all *IMPLICIT* closes that fail need to
>> generate a mandatory io warning,

> This also makes perfect sense.

Easy for you to say.  Others have said otherwise in the past, and I've
not prevailed.  I don't know if those people are still alive anymore,
though, for I don't recall who they were nor the thrust of their
arguments against such sanity.  I don't recall ever having been
convinced of whatever "reasoning" they used, but absence of evidence
is not to be confused with evidence of absence, so perhaps I was and
that memory is now hidden from me.

> Does anyone really object to this?

They clearly did, or it would have been done.  But maybe this is one
of those rare periods in history where we can fix it when the PC
police aren't breathing down our throats.  One shall see.

> In principle it is a backwards incompatible change (code that
> previously chugged along without visible problems will now start
> producing warnings or exiting with nonzero status) but I can't see
> the point of taking compatibility to such extremes.

A bug is a bug, and this is.  Always has been.

--tom
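The END-block guard under discussion is easy to try. A minimal sketch, again assuming a Linux /dev/full so the stdout flush is guaranteed to fail: without the guard the data is silently lost; with it, the failure becomes a fatal, visible error and a nonzero exit:

```shell
# Without the guard: stdout's data is lost and nobody is told.
perl -e 'print "precious output\n"' > /dev/full \
    && echo "no guard: exit 0, data discarded without a word"

# With the guard: the failed flush turns into a die in END, so the
# process exits nonzero and complains on stderr.
perl -e 'END { close(STDOUT) or die "can'\''t close stdout: $!" }
         print "precious output\n"' > /dev/full \
    || echo "with guard: nonzero exit"
```

The same idiom extends to STDERR, as the quoted reply suggests, though a failed close of STDERR can only be reported through the exit status.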