O'Reilly Media | Java I/O


Java I/O

By Elliotte Rusty Harold


Chapter 1: Introducing I/O


Input and output, I/O for short, are fundamental to any computer operating system or programming language. Only theorists find it interesting to write programs that don't require input or produce output. At the same time, I/O hardly qualifies as one of the more "thrilling" topics in computer science. It's something in the background, something you use every day—but for most developers, it's not a topic with much sex appeal.

There are plenty of reasons for Java programmers to find I/O interesting. Java includes a particularly rich set of I/O classes in the core API, mostly in the java.io package. For the most part I/O in Java is divided into two types: byte- and number-oriented I/O, which is handled by input and output streams; and character and text I/O, which is handled by readers and writers. Both types provide an abstraction for external data sources and targets that allows you to read from and write to them, regardless of the exact type of the source. You use the same methods to read from a file that you do to read from the console or from a network connection.

But that's just the tip of the iceberg. Once you've defined abstractions that let you read or write without caring where your data is coming from or where it's going to, you can do a lot of very powerful things. You can define I/O streams that automatically compress, encrypt, and filter from one data format to another, and more. Once you have these tools, programs can send encrypted data or write zip files with almost no knowledge of what they're doing; cryptography or compression can be isolated in a few lines of code that say, "Oh yes, make this an encrypted output stream."

In this book, I'll take a thorough look at all parts of Java's I/O facilities. This includes all the different kinds of streams you can use. We're also going to investigate Java's support for Unicode (the standard multilingual character set). We'll look at Java's powerful facilities for formatting I/O—oddly enough, not part of the java.io package proper. (We'll see the reasons for this design decision later.) Finally, we'll take a brief look at the Java Communications API (

Additional content appearing in this section has been removed.
Purchase this book now or read it online at Safari to get the whole thing!

What Is a Stream?


A stream is an ordered sequence of bytes of undetermined length. Input streams move bytes of data into a Java program from some generally external source. Output streams move bytes of data from Java to some generally external target. (In special cases streams can also move bytes from one part of a Java program to another.)

The word stream is derived from an analogy with a stream of water. An input stream is like a siphon that sucks up water; an output stream is like a hose that sprays out water. Siphons can be connected to hoses to move water from one place to another. Sometimes a siphon may run out of water if it's drawing from a finite source like a bucket. On the other hand, if the siphon is drawing water from a river, it may well provide water indefinitely. So too an input stream may read from a finite source of bytes like a file or an unlimited source of bytes like System.in. Similarly an output stream may have a definite number of bytes to output or an indefinite number of bytes.

Input to a Java program can come from many sources. Output can go to many different kinds of destinations. The power of the stream metaphor and in turn the stream classes is that the differences between these sources and destinations are abstracted away. All input and output are simply treated as streams.
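As a minimal sketch of this abstraction, the hypothetical countBytes() method below is written against InputStream alone. The class and method names are illustrative and not from the book, and an in-memory ByteArrayInputStream stands in for the data source so that the example is self-contained.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;

public class StreamSourceDemo {

  // Counts the bytes remaining in any input stream, whatever its source.
  public static long countBytes(InputStream in) throws IOException {
    long count = 0;
    while (in.read() != -1) count++;  // read() returns -1 at end of stream
    return count;
  }

  public static void main(String[] args) throws IOException {
    byte[] data = {10, 20, 30, 40, 50};
    // An in-memory source; a file stream or System.in would be passed the same way.
    InputStream in = new ByteArrayInputStream(data);
    System.out.println(countBytes(in));  // prints 5
  }
}
```

Because countBytes() never asks where its bytes come from, the same call should work unchanged with a file or network stream.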

The first source of input most programmers encounter is System.in. This is the same thing as stdin in C, generally some sort of console window, probably the one in which the Java program was launched. If input is redirected so the program reads from a file, then System.in is changed as well. For instance, on Unix, the following command redirects stdin so that when the MessageServer program reads from System.in, the actual data comes from the file data.txt instead of the console:

% java MessageServer < data.txt

The console is also available for output through the static field

Additional content appearing in this section has been removed.

Numeric Data


Input streams read bytes and output streams write bytes. Readers read characters and writers write characters. Therefore, to understand input and output, you first need a solid understanding of how Java deals with bytes, integers, characters, and other primitive data types, and when and why one is converted into another. In many cases Java's behavior is not obvious.

The fundamental integer data type in Java is the int, a four-byte, big-endian, two's complement integer. An int can take on all values between -2,147,483,648 and 2,147,483,647. When you type a literal integer like 7, -8345, or 3000000000 in Java source code, the compiler treats that literal as an int. In the case of 3000000000 or similar numbers too large to fit in an int, the compiler emits an error message citing "Numeric overflow."

longs are eight-byte, big-endian, two's complement integers with ranges from -9,223,372,036,854,775,808 to 9,223,372,036,854,775,807. long literals are indicated by suffixing the number with a lower- or uppercase L. An uppercase L is preferred because the lowercase l is too easily confused with the numeral 1 in most fonts. For example, 7L, -8345L, and 3000000000L are all 64-bit long literals.

There are two more integer data types available in Java, the short and the byte. shorts are two-byte, big-endian, two's complement integers with ranges from -32,768 to 32,767. They're rarely used in Java and are included mainly for compatibility with C.

bytes, however, are very much used in Java. In particular they're used in I/O. A byte is an eight-bit, two's complement integer that ranges from -128 to 127. Note that like all numeric data types in Java, a byte is signed. The maximum byte value is 127. 128, 129, and so on through 255 are not legal values for bytes.
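The signed range can be seen in a short sketch. The class and the toUnsigned() helper are hypothetical names chosen for illustration; masking with 0xFF is a common idiom for recovering the 0 to 255 value and is an addition here, not something introduced in the text above.

```java
public class SignedByteDemo {

  // Converts a signed byte (-128 to 127) to the corresponding unsigned value (0 to 255).
  public static int toUnsigned(byte b) {
    return b & 0xFF;
  }

  public static void main(String[] args) {
    byte b = (byte) 200;                // 200 does not fit; the cast keeps the low eight bits
    System.out.println(b);              // prints -56
    System.out.println(toUnsigned(b));  // prints 200
  }
}
```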

Additional content appearing in this section has been removed.

Character Data


Numbers are only part of the data a typical Java program needs to read and write. Most programs also need to handle text, which is composed of characters. Since computers only really understand numbers, characters are encoded by matching each character in a given script to a particular number. For example, in the common ASCII encoding, the character A is mapped to the number 65; the character B is mapped to the number 66; the character C is mapped to the number 67; and so on. Different encodings may encode different scripts or may encode the same or similar scripts in different ways.

Java understands several dozen different character sets for a variety of languages, ranging from ASCII to Shift JIS (SJIS, a Japanese Industrial Standards encoding) to Unicode. Internally, Java uses the Unicode character set. Unicode is a two-byte extension of the one-byte ISO Latin-1 character set, which in turn is an eight-bit superset of the seven-bit ASCII character set.

ASCII, the American Standard Code for Information Interchange, is a seven-bit character set. Thus it defines 2⁷ or 128 different characters whose numeric values range from 0 to 127. These characters are sufficient for handling most of American English and can make reasonable approximations to most European languages (with the notable exceptions of Russian and Greek). It's an often used lowest common denominator format for different computers. If you were to read a byte value between 0 and 127 from a stream, then cast it to a char, the result would be the corresponding ASCII character.

ASCII characters 0-31 and character 127 are nonprinting control characters. Characters 32-47 are various punctuation and space characters. Characters 48-57 are the digits 0-9. Characters 58-64 are another group of punctuation characters. Characters 65-90 are the capital letters A-Z. Characters 91-96 are a few more punctuation marks. Characters 97-122 are the lowercase letters a-z. Finally, characters 123 through 126 are a few remaining punctuation symbols. The complete ASCII character set is shown in Table 2.1 in Appendix B.
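The byte-to-char cast described above can be sketched as follows. The class and method names are hypothetical, and the byte values are simply the ASCII codes for the letters of "Hello".

```java
public class AsciiCastDemo {

  // Interprets a byte value in the ASCII range (0 to 127) as a character.
  public static char toAsciiChar(byte b) {
    return (char) b;
  }

  public static void main(String[] args) {
    byte[] data = {72, 101, 108, 108, 111};  // ASCII codes for H, e, l, l, o
    StringBuilder sb = new StringBuilder();
    for (byte b : data) sb.append(toAsciiChar(b));
    System.out.println(sb);  // prints Hello
  }
}
```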

Additional content appearing in this section has been removed.

Readers and Writers


In Java 1.1 and later, streams are primarily intended for data that can be read as pure bytes—basically byte data and numeric data encoded as binary numbers of one sort or another. Streams are specifically not intended for use when reading and writing text, including both ASCII text, like "Hello World," and numbers formatted as text, like "3.1415929." For these purposes, you should use readers and writers.

Input and output streams are fundamentally byte-based. Readers and writers are based on characters, which can have varying widths depending on the character set. For example, ASCII and ISO Latin-1 use one-byte characters. Unicode uses two-byte characters. UTF-8 uses characters of varying width (between one and three bytes). Since characters are ultimately composed of bytes, readers take their input from streams. However, they convert those bytes into chars according to a specified encoding format before passing them along. Similarly, writers convert chars to bytes according to a specified encoding before writing them onto some underlying stream.
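A minimal sketch of this conversion, assuming a modern JDK (java.nio.charset.StandardCharsets arrived in Java 7, long after the Java 1.1 classes discussed here): the same two bytes decode to one character as UTF-8 but to two characters as ISO Latin-1. The class and method names are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class ReaderDecodingDemo {

  // Reads every byte through a reader that decodes in the given encoding.
  public static String decode(byte[] bytes, Charset charset) throws IOException {
    Reader reader = new InputStreamReader(new ByteArrayInputStream(bytes), charset);
    StringBuilder sb = new StringBuilder();
    int c;
    while ((c = reader.read()) != -1) sb.append((char) c);  // read() returns -1 at end
    reader.close();
    return sb.toString();
  }

  public static void main(String[] args) throws IOException {
    byte[] bytes = {(byte) 0xC3, (byte) 0xA9};  // UTF-8 encoding of U+00E9
    System.out.println(decode(bytes, StandardCharsets.UTF_8).length());       // prints 1
    System.out.println(decode(bytes, StandardCharsets.ISO_8859_1).length());  // prints 2
  }
}
```

The example prints lengths rather than the characters themselves so that the result does not depend on the console's own encoding.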

The java.io.Reader and java.io.Writer classes are abstract superclasses for classes that read and write character-based data. The subclasses are notable for handling the conversion between different character sets. There are nine reader and eight writer classes in the core Java API, all in the java.io package:

`BufferedReader`	`BufferedWriter`
`CharArrayReader`	`CharArrayWriter`

Additional content appearing in this section has been removed.

The Ubiquitous IOException


As computer operations go, input and output are unreliable. They are subject to problems completely outside the programmer's control. Disks can develop bad sectors while a file is being read; construction workers drop backhoes through the cables that connect your WAN; users unexpectedly cancel their input; telephone repair crews shut off your modem line while trying to repair someone else's. (This last one actually happened to me while writing this chapter. My modem kept dropping the connection and then not getting a dial tone; I had to hunt down the telephone "repairman" in my building's basement and explain to him that he was working on the wrong line.)

Because of these potential problems and many more, almost every method that performs input or output is declared to throw IOException. IOException is a checked exception, so you must either declare that your methods throw it or enclose the call that can throw it in a try/catch block. The only real exceptions to this rule are the PrintStream and PrintWriter classes. Because it would be inconvenient to wrap a try/catch block around each call to System.out.println(), Sun decided to have PrintStream (and later PrintWriter) catch and eat any exceptions thrown inside a print() or println() method. If you do want to check for exceptions inside a print() or println() method, you can call checkError():

public boolean checkError()

The checkError() method returns true if an exception has occurred on this print stream, false if one hasn't. It only tells you that an error occurred. It does not tell you what sort of error occurred. If you need to know more about the error, you'll have to use a different output stream or writer class.
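To see checkError() in action, here is a hedged sketch that wraps a deliberately broken stream in a PrintStream. FailingOutputStream is a hypothetical class invented for this illustration; it stands in for something like a dropped network connection.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.io.PrintStream;

public class CheckErrorDemo {

  // Hypothetical stream whose every write fails.
  static class FailingOutputStream extends OutputStream {
    @Override
    public void write(int b) throws IOException {
      throw new IOException("simulated failure");
    }
  }

  // Prints one line to the failing stream and reports whether an error was recorded.
  public static boolean printToBrokenStream() {
    PrintStream out = new PrintStream(new FailingOutputStream());
    out.println("Hello");     // no exception reaches the caller
    return out.checkError();  // flushes the stream, then reports its error state
  }

  public static void main(String[] args) {
    System.out.println(printToBrokenStream());  // prints true
  }
}
```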

IOException has many subclasses—15 in java.io—and methods often throw a more specific exception that subclasses IOException. (However, methods usually only declare that they throw an IOException.) Here are the subclasses of IOException that you'll find in

Additional content appearing in this section has been removed.

The Console: System.out, System.in, and System.err


The console is the default destination for output written to System.out or System.err and the default source of input for System.in. On most platforms the console is the command-line environment from which the Java program was initially launched, perhaps an xterm (Figure 1.1) or a DOS shell window (Figure 1.2). The word console is something of a misnomer, since on Unix systems the console refers to a very specific command-line shell, rather than being a generic term for command-line shells overall.

Figure 1.1: An xterm console on Unix

Figure 1.2: A DOS shell console on Windows NT

Many common misconceptions about I/O occur because most programmers' first exposure to I/O is through the console. The console is convenient for quick hacks and toy examples commonly found in textbooks, and I will use it for that in this book, but it's really a very unusual source of input and destination for output, and good Java programs avoid it. It behaves almost, but not completely, unlike anything else you'd want to read from or write to. While consoles make convenient examples in programming texts like this one, they're a horrible user interface and really have little place in modern programs. Users are more comfortable with a well-defined graphical user interface. Furthermore, the console is unreliable across platforms. The Mac, for example, has no native console. Macintosh Runtime for Java 2 and earlier has a console window that works only for output, but not for input; that is, System.out works but System.in does not. Figure 1.3 shows the Mac console window.

Figure 1.3: The Mac console, used exclusively by Java programs

Additional content appearing in this section has been removed.

Security Checks on I/O


One of the original fears about downloading executable content like applets from the Internet was that a hostile applet could erase your hard disk or read your Quicken files. Nothing's happened to change that since Java was introduced. This is why Java applets run under the control of a security manager that checks each operation an applet performs to prevent potentially hostile acts.

The security manager is particularly careful about I/O operations. For the most part, the checks are related to these questions:

Can an applet read a file?
Can an applet write a file?
Can an applet delete a file?
Can an applet determine whether a file exists?
Can an applet make a network connection to a particular host?
Can an applet accept an incoming connection from a particular host?

The short answer to all these questions is "No, it cannot." A slightly more elaborate answer would specify a few exceptions. Applets can make network connections to the host they came from; applets can read a few very specific files that contain information about the Java environment; and trusted applets may sometimes run without these restrictions. But for almost all practical purposes, the answer is almost always no.

For more exotic situations, such as trusted applets, see Java Security by Scott Oaks (O'Reilly & Associates, 1998). Trusted applets are useful on corporate networks, but you shouldn't waste a lot of time laboring under the illusion that anyone on the Internet at large will trust your applets.

Because of these security issues, you need to be careful when using code fragments and examples from this book in an applet. Everything shown here works when run in an application, but when run in an applet, it may fail with a

Additional content appearing in this section has been removed.

Chapter 2: Output Streams


The java.io.OutputStream class declares the three basic methods you need to write bytes of data onto a stream. It also has methods for closing and flushing streams.

public abstract void write(int b) throws IOException
public void write(byte[] data) throws IOException
public void write(byte[] data, int offset, int length) throws IOException
public void flush() throws IOException
public void close() throws IOException

OutputStream is an abstract class. Subclasses provide implementations of the abstract write(int b) method. They may also override the four nonabstract methods. For example, the FileOutputStream class overrides all five methods with native methods that know how to write bytes into files on the host platform. Although OutputStream is abstract, often you only need to know that the object you have is an OutputStream; the more specific subclass of OutputStream is hidden from you. For example, the getOutputStream() method of java.net.URLConnection has the signature:

public OutputStream getOutputStream() throws IOException

Depending on the type of URL associated with this URLConnection object, the actual class of the output stream that's returned may be a sun.net.TelnetOutputStream, a sun.net.smtp.SmtpPrintStream, a sun.net.www.http.KeepAliveStream, or something else completely. All you know as a programmer, and all you need to know, is that the object returned is in fact some instance of OutputStream. That's why the detailed classes that handle particular kinds of connections are hidden inside the sun packages.

Furthermore, even when working with subclasses whose types you know, you still need to be able to use the methods inherited from OutputStream. And since methods that are inherited are not included in the online documentation, it's important to remember that they're there. For example, the java.io.DataOutputStream class does not declare a

Additional content appearing in this section has been removed.

The OutputStream Class



Additional content appearing in this section has been removed.

Writing Bytes to Output Streams


The fundamental method of the OutputStream class is write():

public abstract void write(int b) throws IOException

This method writes a single unsigned byte of data whose value should be between 0 and 255. If you pass a number larger than 255 or smaller than zero, it's reduced modulo 256 before being written.
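The modulo-256 behavior can be checked with a small sketch. It uses a ByteArrayOutputStream as a convenient in-memory destination; the class name and the writtenValue() helper are hypothetical.

```java
import java.io.ByteArrayOutputStream;

public class WriteModuloDemo {

  // Writes an int to an in-memory stream and returns the single byte stored, as 0 to 255.
  public static int writtenValue(int b) {
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    out.write(b);  // only the low-order eight bits are kept
    return out.toByteArray()[0] & 0xFF;
  }

  public static void main(String[] args) {
    System.out.println(writtenValue(65));   // prints 65
    System.out.println(writtenValue(321));  // prints 65, since 321 = 256 + 65
    System.out.println(writtenValue(-1));   // prints 255
  }
}
```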

Example 2.1, AsciiChart, is a simple program that writes the printable ASCII characters (32 to 126) on the console. The console interprets the numeric values as ASCII characters, not as numbers. This is a feature of the console, not of the OutputStream class or the specific subclass of which System.out is an instance. The write() method merely sends a particular bit pattern to a particular output stream. How that bit pattern is interpreted depends on what's connected to the other end of the stream.

Example 2.1. The AsciiChart Program

import java.io.*;

public class AsciiChart {
  public static void main(String[] args) {
    for (int i = 32; i < 127; i++) {
      System.out.write(i);
      // Break the line after every eight characters.
      if (i % 8 == 7) System.out.write('\n');
      else System.out.write('\t');
    }
    System.out.write('\n');
  }
}

Notice the use of the char literals '\t' and '\n'. The compiler converts these to the numbers 9 and 10, respectively. When these numbers are written on the console, the console interprets those numbers as a tab and a linefeed, respectively. The same effect could have been achieved by writing the if clause like this:

if (i % 8 == 7) System.out.write(10);
else System.out.write(9);

Here's the output:

% java AsciiChart

!       "       #       $       %       &       '
(       )       *       +       ,       -       .       /
0       1       2       3       4       5       6       7
8       9       :       ;       <       =       >       ?
@       A       B       C       D       E       F       G
H       I       J       K       L       M       N       O
P       Q       R       S       T       U       V       W
X       Y       Z       [       \       ]       ^       _
`       a       b       c       d       e       f       g
h       i       j       k       l       m       n       o
p       q       r       s       t       u       v       w
x       y       z       {       |       }       ~
%

Additional content appearing in this section has been removed.

Writing Arrays of Bytes


It's often faster to write larger chunks of data than to write byte by byte. Two overloaded variants of the write() method do this:

public void write(byte[] data) throws IOException
public void write(byte[] data, int offset, int length) throws IOException

The first variant writes the entire byte array data. The second writes only the sub-array of data starting at offset and continuing for length bytes. For example, the following code fragment blasts the bytes in a string onto System.out:

String s = "How are streams treating you?";
byte[] data = s.getBytes();
System.out.write(data);

Conversely, you may run into performance problems if you attempt to write too much data at a time. The exact turnaround point depends on the eventual destination of the data. Files are often best written in small multiples of the block size of the disk, typically 512, 1024, or 2048 bytes. Network connections often require smaller buffer sizes, 128 or 256 bytes. The optimal buffer size depends on too many system-specific details for anything to be guaranteed, but I often use 128 bytes for network connections and 1024 bytes for files.
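One way to act on this advice is to hand a large array to the stream in fixed-size pieces using the three-argument write(). The sketch below is an illustration only: writeInChunks() is a hypothetical helper, and the 1024-byte chunk size is just the file figure mentioned above, not a tuned value.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;

public class ChunkedWriteDemo {

  // Writes data to out in pieces of at most chunkSize bytes.
  // Returns the number of write() calls that were made.
  public static int writeInChunks(OutputStream out, byte[] data, int chunkSize)
      throws IOException {
    int calls = 0;
    for (int offset = 0; offset < data.length; offset += chunkSize) {
      int length = Math.min(chunkSize, data.length - offset);  // last piece may be short
      out.write(data, offset, length);
      calls++;
    }
    return calls;
  }

  public static void main(String[] args) throws IOException {
    byte[] data = new byte[2500];
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    System.out.println(writeInChunks(out, data, 1024));  // prints 3
    System.out.println(out.size());                      // prints 2500
  }
}
```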

Example 2.2 is a simple program that constructs a byte array filled with an ASCII chart, then blasts it onto the console in one call to write().

Example 2.2. The AsciiArray Program

import java.io.*;
public class AsciiArray {
  public static void main(String[] args) {
    
    // 95 printable characters, each followed by a tab or newline, plus one final newline
    byte[] b = new byte[(127-32)*2 + 1];
    int index = 0;
    for (int i = 32; i < 127; i++) {
      b[index++] = (byte) i;
      // Break line after every eight characters.
      if (i % 8 == 7) b[index++] = (byte) '\n';
      else b[index++] = (byte) '\t';
    }
    b[index++] = (byte) '\n';
    try {
      System.out.write(b);
    }
    catch (IOException e) { System.err.println(e); }
  }
}

The output is the same as in Example 2.1. Because of the nature of the console, this particular program probably isn't a lot faster than Example 2.1, but it certainly could be if you were writing data into a file rather than onto the console. The difference in performance between writing a

Additional content appearing in this section has been removed.

Flushing and Closing Output Streams


Many output streams buffer writes to improve performance. Rather than sending each byte to its destination as it's written, the bytes are accumulated in a memory buffer ranging in size from several bytes to several thousand bytes. When the buffer fills up, all the data is sent at once. The flush() method forces the data to be written whether or not the buffer is full:

public void flush() throws IOException

This is not the same as any buffering performed by the operating system or the hardware. These buffers will not be emptied by a call to flush(). (The sync() method in the FileDescriptor class, discussed in Chapter 12, can sometimes be used to empty these buffers.) For example, assuming out is an OutputStream of some sort, you would call out.flush() to empty the stream's own buffers.
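The effect of flush() on a stream's own buffer can be observed with a small sketch. It assumes a BufferedOutputStream layered over a ByteArrayOutputStream, chosen here only because both are easy to inspect; the class and method names are hypothetical.

```java
import java.io.BufferedOutputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;

public class FlushDemo {

  // Returns how many bytes have reached the destination before and after flush().
  public static int[] sizesAroundFlush() throws IOException {
    ByteArrayOutputStream destination = new ByteArrayOutputStream();
    BufferedOutputStream out = new BufferedOutputStream(destination);
    out.write(new byte[] {1, 2, 3});  // held in the buffer, not yet delivered
    int before = destination.size();
    out.flush();                      // forces the buffered bytes through
    int after = destination.size();
    return new int[] {before, after};
  }

  public static void main(String[] args) throws IOException {
    int[] sizes = sizesAroundFlush();
    System.out.println(sizes[0]);  // prints 0
    System.out.println(sizes[1]);  // prints 3
  }
}
```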

If you only use a stream for a short time, you don't need to flush it explicitly. It should be flushed when the stream is closed. This should happen when the program exits or when you explicitly invoke the close() method:

public void close() throws IOException

For example, again assuming out is an OutputStream of some sort, calling out.close() closes the stream and implicitly flushes it. Once you have closed an output stream, you can no longer write to it. Attempting to do so will throw an IOException.
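A hedged sketch of this behavior, assuming a FileOutputStream opened on a temporary file (the file name prefix and the class and method names are arbitrary choices for the illustration):

```java
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;

public class WriteAfterCloseDemo {

  // Returns true if writing to a closed FileOutputStream throws an IOException.
  public static boolean writeAfterCloseFails() throws IOException {
    File temp = File.createTempFile("closedemo", ".tmp");
    temp.deleteOnExit();
    FileOutputStream out = new FileOutputStream(temp);
    out.write(65);
    out.close();      // the stream can no longer be written to
    try {
      out.write(66);  // attempt a write on the closed stream
      return false;
    } catch (IOException e) {
      return true;
    }
  }

  public static void main(String[] args) throws IOException {
    System.out.println(writeAfterCloseFails());  // prints true
  }
}
```

Not every stream class enforces this; ByteArrayOutputStream, for instance, documents that closing it has no effect, which is why a file stream is used here.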

Again, System.out is a partial exception because as a PrintStream, all exceptions it throws are eaten. Once you close System.out, you can't write to it, but trying to do so won't throw any exceptions. However, your output will not appear on the console.

You only need to flush an output stream explicitly if you want to make sure data is sent before you're through with the stream. For example, a program that sends a burst of data across the network periodically should flush after each burst of data is written to the stream.

Flushing is often important when you're trying to debug a crashing program. All streams flush automatically when their buffers fill up, and all streams should be flushed when a program terminates normally. If a program terminates abnormally, however, buffers may not get flushed. In this case, unless there is an explicit call to

Additional content appearing in this section has been removed.

Subclassing OutputStream


OutputStream is an abstract class that mainly describes the operations available with any particular OutputStream object. Specific subclasses know how to write bytes to particular destinations. For instance, a FileOutputStream uses native code to write data in files. A ByteArrayOutputStream uses pure Java to write its output in a potentially expanding byte array.

Recall that there are three overloaded variants of the write() method in OutputStream, one abstract, two concrete:

public abstract void write(int b) throws IOException
public void write(byte[] data) throws IOException
public void write(byte[] data, int offset, int length) throws IOException

Subclasses must implement the abstract write(int b) method. They often choose to override the third variant, write(byte[] data, int offset, int length), for reasons of performance. The implementation of the three-argument version of the write() method in OutputStream simply invokes write(int b) repeatedly; that is:

public void write(byte[] data, int offset, int length) throws IOException {
  for (int i = offset; i < offset+length; i++) write(data[i]);
}

Most subclasses can provide more efficient implementations of this method. The one-argument variant of write() merely invokes write(data, 0, data.length); if the three-argument variant has been overridden, this method will perform reasonably well. However, a few subclasses may override it anyway.
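As an illustration of this pattern, the following sketch defines a hypothetical CountingOutputStream (not a class from the book or the JDK) that discards its data but counts the bytes. It implements write(int) and overrides the three-argument write() so that an array is handled in one step; the one-argument variant is inherited.

```java
import java.io.IOException;
import java.io.OutputStream;

public class CountingOutputStream extends OutputStream {

  private long count = 0;

  // The one abstract method every subclass must supply.
  @Override
  public void write(int b) {
    count++;
  }

  // Overridden for efficiency: counts the whole range without a per-byte loop.
  @Override
  public void write(byte[] data, int offset, int length) {
    if (offset < 0 || length < 0 || length > data.length - offset) {
      throw new IndexOutOfBoundsException();
    }
    count += length;
  }

  public long getCount() {
    return count;
  }

  public static void main(String[] args) throws IOException {
    CountingOutputStream out = new CountingOutputStream();
    out.write(65);
    out.write(new byte[10]);             // inherited; delegates to write(data, 0, 10)
    out.write(new byte[10], 2, 5);
    System.out.println(out.getCount());  // prints 16
  }
}
```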

Example 2.3 is a simple program called NullOutputStream that mimics the behavior of /dev/null on Unix operating systems. Data written into a null output stream is lost.

Example 2.3. The NullOutputStream Class

package com.macfaq.io;
import java.io.*;
public class NullOutputStream extends OutputStream {
  public void write(int b) { }
  public void write(byte[] data) { }
  public void write(byte[] data, int offset, int length) { }
}

Additional content appearing in this section has been removed.

A Graphical User Interface for Output Streams


As a useful example, I'm going to show a subclass of java.awt.TextArea that can be connected to an output stream. As data is written onto the stream, it is appended to the text area in the default character set (generally ISO Latin-1). (This isn't ideal. Since text areas contain text, a writer would be a better source for this data; in later chapters I'll expand on this class to use a writer instead. For now this makes a neat example.) This subclass is shown in Example 2.4.

The actual output stream is contained in an inner class inside the StreamedTextArea class. Each StreamedTextArea component contains a TextAreaOutputStream object in its theOutput field. Client programmers access this object via the getOutputStream() method of the StreamedTextArea class. The StreamedTextArea class has five overloaded constructors that imitate the five constructors in the java.awt.TextArea class, each taking a different combination of text, rows, columns, and scrollbar information. The first four constructors merely pass their arguments and suitable defaults to the most general fifth constructor using this(). The fifth constructor calls the most general superclass constructor, then calls setEditable(false) to ensure that the user doesn't change the text while output is streaming into it.

I've chosen not to override any methods in the TextArea superclass. However, you might want to do so if you feel a need to change the normal abilities of a text area. For example, you could include a do-nothing append() method so that data can only be moved into the text area via the provided output stream or a setEditable() method that doesn't allow the client programmer to make this area editable.

Example 2.4. The StreamedTextArea Component

package com.macfaq.awt;
import java.awt.*;
import java.io.*;
public class StreamedTextArea extends TextArea {
  OutputStream theOutput = new TextAreaOutputStream();
  public StreamedTextArea() {
    this("", 0, 0, SCROLLBARS_BOTH);
  }
  public StreamedTextArea(String text) {
    this(text, 0, 0, SCROLLBARS_BOTH);
  }
  public StreamedTextArea(int rows, int columns) {
    this("", rows, columns, SCROLLBARS_BOTH);
  }
  public StreamedTextArea(String text, int rows, int columns) {
    this(text, rows, columns, SCROLLBARS_BOTH);
  }
  public StreamedTextArea(String text, int rows, int columns, int scrollbars) {
    super(text, rows, columns, scrollbars);
    setEditable(false);
  }
  public OutputStream getOutputStream() {
    return theOutput;
  }
  class TextAreaOutputStream extends OutputStream {
    public synchronized void write(int b) {
      // recall that the int should really just be a byte
      b &= 0x000000FF;
      // must convert byte to a char in order to append it
      char c = (char) b;
      append(String.valueOf(c));
    }
    public synchronized void write(byte[] data, int offset, int length) {
      append(new String(data, offset, length));
    }
  }
}

Chapter 3: Input Streams

The java.io.InputStream class is the abstract superclass for all input streams. It declares the three basic methods needed to read bytes of data from a stream. It also has methods for closing and flushing streams, checking how many bytes of data are available to be read, skipping over input, marking a position in a stream and resetting back to that position, and determining whether marking and resetting are supported.

The InputStream Class

public abstract int read() throws IOException
public int read(byte[] data) throws IOException
public int read(byte[] data, int offset, int length) throws IOException
public long skip(long n) throws IOException
public int available() throws IOException
public void close() throws IOException
public synchronized void mark(int readlimit)
public synchronized void reset() throws IOException
public boolean markSupported()

The read( ) Method

The fundamental method of the InputStream class is read(), which reads a single unsigned byte of data and returns the integer value of the unsigned byte. This is a number between 0 and 255:

public abstract int read() throws IOException

The following code reads 10 bytes from the System.in input stream and stores them in the int array data:

int[] data = new int[10];
for (int i = 0; i < data.length; i++) {
  data[i] = System.in.read();
}

Notice that although read() is reading a byte, it returns an int. If you want to store the raw bytes instead, you can cast the int to a byte. For example:

byte[] b = new byte[10];
for (int i = 0; i < b.length; i++) {
  b[i] = (byte) System.in.read();
}

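Of course, this produces a signed byte instead of the unsigned byte returned by the read() method (that is, a byte in the range -128 to 127 instead of 0 to 255). As long as you're clear in your mind and your code about whether you're working with signed or unsigned data, you won't have any trouble. Given a single signed byte b, you can convert it back to an int in the range 0 to 255 like this:

int i = (b >= 0) ? b : 256 + b;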
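The round trip between the int that read() returns and a stored signed byte can be sketched as follows. The UnsignedBytes class and its method names are hypothetical; the bit-mask form is an equivalent alternative to the conditional expression, offered here as an assumption about what many readers will find more familiar:

```java
// Hypothetical helper class; names are illustrative only.
final class UnsignedBytes {

  // Same arithmetic as the conditional expression: negative values wrap by 256.
  static int toUnsigned(byte b) {
    return (b >= 0) ? b : 256 + b;
  }

  // Equivalent form: masking keeps only the low-order 8 bits.
  static int toUnsignedMask(byte b) {
    return b & 0xFF;
  }

  public static void main(String[] args) {
    int fromStream = 200;            // a value read() might return
    byte stored = (byte) fromStream; // the cast yields a signed byte
    System.out.println(stored);                 // prints -56
    System.out.println(toUnsigned(stored));     // prints 200
    System.out.println(toUnsignedMask(stored)); // prints 200
  }
}
```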

When you call read(), you also have to catch the IOException that it might throw. As I've observed, input and output are often subject to problems outside of your control: disks fail, network cables break, and so on. Therefore, virtually any I/O method can throw an IOException, and read() is no exception. You don't get an IOException if read() encounters the end of the input stream; in this case, it returns -1. You use this as a flag to watch for the end of stream. The following code shows how to catch the IOException and test for the end of the stream:

try {
  int[] data = new int[10];
  for (int i = 0; i < data.length; i++) {
    int datum = System.in.read();
    if (datum  == -1) break;
    data[i] = datum;
  }
}
catch (IOException e) {System.err.println("Couldn't read from System.in!");}

Reading Chunks of Data from a Stream

Input and output are often the performance bottlenecks in a program. Reading from or writing to disk can be hundreds of times slower than reading from or writing to memory; network connections and user input are even slower. While disk capacities and speeds have increased over time, they have never kept pace with CPU speeds. Therefore, it's important to minimize the number of reads and writes a program actually performs.

All input streams have overloaded read() methods that read chunks of contiguous data into a byte array. The first variant tries to read enough data to fill the array data. The second variant tries to read length bytes of data starting at position offset into the array data. Neither of these methods is guaranteed to read as many bytes as you ask for. Both methods return the number of bytes actually read, or -1 on end of stream.

public int read(byte[] data) throws IOException
public int read(byte[] data, int offset, int length) throws IOException

The default implementation of these methods in the java.io.InputStream class merely calls the basic read() method enough times to fill the requested array or subarray. Thus, reading 10 bytes of data takes 10 times as long as reading one byte of data. However, most subclasses of InputStream override these methods with more efficient methods, perhaps native, that read the data from the underlying source as a block.

For example, to attempt to read 10 bytes from System.in, you could write the following code:

try {
  byte[] b = new byte[10];
  System.in.read(b);
}
catch (IOException e) {System.err.println("Couldn't read from System.in!");}

Reads don't always succeed in getting as many bytes as you want. Conversely, there's nothing to stop you from trying to read more data into the array than will fit. If you read more data than the array can hold, an ArrayIndexOutOfBoundsException will be thrown. For example, the following code loops repeatedly until it either fills the array or sees the end of stream:
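A loop of this kind can be sketched as below. This is not the book's own listing (which is not included in this preview); the FillBuffer class and its fill() method are hypothetical names, and a ByteArrayInputStream stands in for System.in so that the example runs without console input:

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical helper; not the original example from the text.
final class FillBuffer {

  // Keeps reading until the array is full or the stream ends.
  // Returns the number of bytes stored in the array.
  static int fill(InputStream in, byte[] buffer) throws IOException {
    int offset = 0;
    while (offset < buffer.length) {
      // Never ask for more than the remaining space in the array.
      int bytesRead = in.read(buffer, offset, buffer.length - offset);
      if (bytesRead == -1) break; // end of stream
      offset += bytesRead;
    }
    return offset;
  }

  public static void main(String[] args) throws IOException {
    InputStream in = new ByteArrayInputStream(new byte[] {10, 20, 30, 40});
    byte[] buffer = new byte[10];
    int stored = fill(in, buffer);
    System.out.println(stored); // prints 4
  }
}
```

Passing buffer.length - offset as the length keeps every read inside the array, so the loop cannot run past the end of the array.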

Counting the Available Bytes

It's sometimes convenient to know how many bytes are available to be read before you attempt to read them. The InputStream class's available() method tells you how many bytes you can read without blocking. It returns 0 if there's no data available to be read.

public int available() throws IOException

For example:

try {
  byte[] b = new byte[100];
  int offset = 0;
  while (offset < b.length) {
    int a = System.in.available();
    int bytesRead = System.in.read(b, offset, a);
    if (bytesRead == -1) break; // end of stream
    offset += bytesRead;
  }
}
catch (IOException e) {System.err.println("Couldn't read from System.in!");}

There's a potential bug in this code. There may be more bytes available than there's space in the array to hold them. One common idiom is to size the array according to the number available() returns, like this:

try {
  byte[] b = new byte[System.in.available()];
  System.in.read(b);
}
catch (IOException e) {System.err.println("Couldn't read from System.in!");}

This works well if you're only going to perform a single read. For multiple reads, however, the overhead of creating multiple arrays is excessive. You should probably reuse the array and only create a new array if more bytes are available than will fit in the array.

The available() method in java.io.InputStream always returns 0. Subclasses are supposed to override it, but I've seen a few that don't. You may be able to read more bytes from the underlying stream without blocking than available() suggests; you just can't guarantee that you can. If this is a concern, you can place input in a separate thread so that blocked input doesn't block the rest of the program.
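One way to reuse a fixed array without overrunning it is to cap each request at the space that remains. The sketch below assumes a hypothetical AvailableRead.readAvailable() helper and uses a ByteArrayInputStream in place of System.in so the result is predictable:

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical helper; names are illustrative only.
final class AvailableRead {

  // Reads no more than what is available without blocking and no more
  // than what still fits in the array. Returns the number of bytes read,
  // 0 if nothing could be read right now, or -1 at end of stream.
  static int readAvailable(InputStream in, byte[] buffer, int offset)
      throws IOException {
    int space = buffer.length - offset;
    int wanted = Math.min(in.available(), space);
    if (wanted <= 0) return 0; // nothing available, or no room left
    return in.read(buffer, offset, wanted);
  }

  public static void main(String[] args) throws IOException {
    InputStream in = new ByteArrayInputStream(new byte[12]);
    byte[] buffer = new byte[8];
    int n = readAvailable(in, buffer, 0);
    System.out.println(n);              // prints 8
    System.out.println(in.available()); // prints 4
  }
}
```

A return value of 0 here only means that nothing could be read at the moment; since available() may return 0 both before data arrives and at the end of the stream, this helper alone cannot tell the two apart.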

Skipping Bytes

Although you can just read from a stream and ignore the bytes read, Java provides a skip() method that jumps over a certain number of bytes in the input:

public long skip(long bytesToSkip) throws IOException

The argument to skip() is the number of bytes to skip. The return value is the number of bytes actually skipped, which may be less than bytesToSkip and may be 0, for instance when the end of the stream has already been reached. Both the argument and return value are longs, allowing skip() to handle extremely long input streams. Skipping is often faster than reading and discarding the data you don't want. For example, when an input stream is attached to a file, skipping bytes just requires that an integer called the file pointer be changed, whereas reading involves copying bytes from the disk into memory. The following code tries to skip the next 80 bytes of the input stream in:

try {
  long bytesSkipped = 0;
  long bytesToSkip = 80;
  while (bytesSkipped < bytesToSkip) {
    long n = in.skip(bytesToSkip - bytesSkipped);
    if (n <= 0) break; // no progress; possibly the end of the stream
    bytesSkipped += n;
  }
}
catch (IOException e) {System.err.println(e);}
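A skip loop can be made more robust by falling back to read() whenever skip() makes no progress, since read() reports the end of the stream unambiguously with -1. This sketch assumes, as the InputStream API documentation describes, that skip() may return 0 for reasons other than end of stream; the SkipFully class and its method are hypothetical names:

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical helper; names are illustrative only.
final class SkipFully {

  // Skips up to bytesToSkip bytes and returns how many were actually
  // skipped; the result is smaller only if the stream ended first.
  static long skip(InputStream in, long bytesToSkip) throws IOException {
    long skipped = 0;
    while (skipped < bytesToSkip) {
      long n = in.skip(bytesToSkip - skipped);
      if (n > 0) {
        skipped += n;
        continue;
      }
      // skip() made no progress: read one byte to test for end of stream.
      if (in.read() == -1) break;
      skipped++;
    }
    return skipped;
  }

  public static void main(String[] args) throws IOException {
    byte[] data = new byte[100];
    for (int i = 0; i < data.length; i++) data[i] = (byte) i;
    InputStream in = new ByteArrayInputStream(data);
    System.out.println(skip(in, 80)); // prints 80
    System.out.println(in.read());    // prints 80, the byte at index 80
    System.out.println(skip(in, 80)); // prints 19, the stream ended early
  }
}
```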

Closing Input Streams

When you're through with a stream, you should close it. This allows the operating system to free any resources associated with the stream; exactly what these resources are depends on your platform and varies with the type of the stream. However, systems only have finite resources. For example, on most personal computer operating systems, no more than several hundred files can be open at once. Multiuser operating systems have larger limits, but limits nonetheless.

To close a stream, you invoke its close() method:

public void close() throws IOException

Not all streams need to be closed—System.in generally does not need to be closed, for example. However, streams associated with files and network connections should always be closed when you're done with them. For example:

try {
  URL u = new URL("https://www.javasoft.com/");
  InputStream in = u.openStream();
  // Read from the stream...
  in.close();
}
catch (IOException e) {System.err.println(e);}

Once you have closed an input stream, you can no longer read from it. Attempting to do so will throw an IOException.
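In Java 7 and later, which postdate this text, a try-with-resources statement closes the stream even when an exception is thrown while reading. The sketch below is an assumption about how a reader might apply that feature today, not the book's approach; TrackedStream is a hypothetical subclass that exists only so the example can show that close() was called:

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical demo class; names are illustrative only.
final class CloseDemo {

  // Records whether close() has been invoked.
  static final class TrackedStream extends ByteArrayInputStream {
    boolean closed = false;

    TrackedStream(byte[] data) {
      super(data);
    }

    @Override
    public void close() throws IOException {
      closed = true;
      super.close();
    }
  }

  // Reads one byte; the stream is closed when the try block exits,
  // whether it exits normally or because of an exception.
  static int firstByte(InputStream source) throws IOException {
    try (InputStream in = source) {
      return in.read();
    }
  }

  public static void main(String[] args) throws IOException {
    TrackedStream stream = new TrackedStream(new byte[] {42, 43});
    System.out.println(firstByte(stream)); // prints 42
    System.out.println(stream.closed);     // prints true
  }
}
```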

Marking and Resetting

It's often useful to be able to read a few bytes and then back up and reread them. For example, in a Java compiler, you don't know for sure whether you're reading the token <, <<, or <<= until you've read one too many characters. It would be useful to be able to back up and reread the token once you know which token you've read. Compiler design and other parsing problems provide many more examples, and this need occurs in other domains as well.

Some (but not all) input streams allow you to mark a particular position in the stream and then return to it. Three methods in the java.io.InputStream class handle marking and resetting:

public synchronized void mark(int readLimit)
public synchronized void reset() throws IOException
public boolean markSupported()

The boolean markSupported() method returns true if this stream supports marking and false if it doesn't. If marking is not supported, reset() throws an IOException and mark() does nothing. Assuming the stream does support marking, the mark() method places a bookmark at the current position in the stream. You can rewind the stream to this position later with reset() as long as you haven't read more than readLimit bytes. There can be only one mark in the stream at any given time. Marking a second location erases the first mark.

The only two input stream classes in java.io that always support marking are BufferedInputStream (of which System.in is an instance) and ByteArrayInputStream. However, other input streams, like DataInputStream, may support marking if they're chained to a buffered input stream first.
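The tokenizing problem described earlier can be sketched with mark() and reset() as follows. The ShiftTokenReader class is hypothetical, and it assumes the stream is positioned at a < character; a BufferedInputStream wrapped around a ByteArrayInputStream stands in for real source code:

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical tokenizer fragment; names are illustrative only.
final class ShiftTokenReader {

  // Reads "<", "<<", or "<<=" and leaves the stream positioned
  // immediately after the token.
  static String readToken(InputStream in) throws IOException {
    if (!in.markSupported()) {
      throw new IOException("mark/reset not supported by this stream");
    }
    if (in.read() != '<') throw new IOException("expected '<'");
    in.mark(1); // bookmark; valid as long as at most 1 more byte is read
    if (in.read() != '<') {
      in.reset(); // read one byte too many, so back up
      return "<";
    }
    in.mark(1);   // a second mark replaces the first
    if (in.read() != '=') {
      in.reset();
      return "<<";
    }
    return "<<=";
  }

  public static void main(String[] args) throws IOException {
    byte[] source = "<<x".getBytes(StandardCharsets.US_ASCII);
    InputStream in = new BufferedInputStream(new ByteArrayInputStream(source));
    System.out.println(readToken(in));    // prints <<
    System.out.println((char) in.read()); // prints x
  }
}
```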

Subclassing InputStream

Immediate subclasses of InputStream must provide an implementation of the abstract read() method. They may also override some of the nonabstract methods. For example, the default markSupported() method returns false, mark() does nothing, and reset() throws an IOException. Any class that allows marking and resetting must override these three methods. Furthermore, they may want to override methods that perform functions like skip() and the other two read() methods to provide more efficient implementations.

Example 3.2 is a simple class called RandomInputStream that "reads" random bytes of data. This provides a useful source of unlimited data you can use in testing. A java.util.Random object provides the data.

Example 3.2. The RandomInputStream Class

package com.macfaq.io;
import java.util.*;
import java.io.*;
public class RandomInputStream extends InputStream {
  private transient Random generator = new Random();
  public int read() {
    int result = generator.nextInt() % 256;
    if (result < 0) result = -result;
    return result;
  }
  public int read(byte[] data, int offset, int length) throws IOException {
    byte[] temp = new byte[length];
    generator.nextBytes(temp);
    System.arraycopy(temp, 0, data, offset, length);
    return length;
  }
  public int read(byte[] data) throws IOException {
    generator.nextBytes(data);
    return data.length;
  }
  public long skip(long bytesToSkip) throws IOException {
  
    // It's all random so skipping has no effect.
    return bytesToSkip;
  
  }
}

The no-argument read() method returns a random int in the range of an unsigned byte (0 to 255). The other two read() methods fill a specified part of an array with random bytes. They return the number of bytes read (in this case the number of bytes created).

An Efficient Stream Copier

As a useful example of both input and output streams, in Example 3.3 I'll present a StreamCopier class that copies data between two streams as quickly as possible. (I'll reuse this class in later chapters.) Its copy() method reads from the input stream and writes onto the output stream until the input stream is exhausted. A 256-byte buffer is used to try to make the reads efficient. A main() method provides a simple test for this class by reading from System.in and copying to System.out.

Example 3.3. The StreamCopier Class

package com.macfaq.io;
import java.io.*;
public class StreamCopier {
  public static void main(String[] args) {
    try {
      copy(System.in, System.out);
    }
    catch (IOException e) {System.err.println(e);}
  }
  public static void copy(InputStream in, OutputStream out) 
   throws IOException {
    // Do not allow other threads to read from the input
    // or write to the output while copying is taking place
    synchronized (in) {
      synchronized (out) {
        byte[] buffer = new byte[256];
        while (true) {
          int bytesRead = in.read(buffer);
          if (bytesRead == -1) break;
          out.write(buffer, 0, bytesRead);
        }
      }
    }
  }
}

Here's a simple test run:

D:\JAVA\ioexamples\03>java com.macfaq.io.StreamCopier

this is a test
this is a test
0987654321
0987654321
^Z

Input was not fed from the console (DOS prompt) to the StreamCopier program until the end of each line. Since I ran this in Windows, the end-of-stream character is Ctrl-Z. On Unix it would have been Ctrl-D.

Chapter 4: File Streams

Until now, most of the examples in this book have used the streams System.in and System.out. These are convenient for examples, but in real life, you'll more commonly attach streams to data sources like files and network connections. To read and write files, you'll use the java.io.FileInputStream and java.io.FileOutputStream classes, which are concrete subclasses of java.io.InputStream and java.io.OutputStream. We'll discuss these classes in detail in this chapter; they provide the standard methods for reading and writing data. What they don't provide is a mechanism for file-specific operations, like finding out whether a file is readable or writable. For that, you may want to look forward to Chapter 12, which talks about the File class itself and the way Java works with files.

java.io.FileInputStream is a concrete subclass of java.io.InputStream. It provides an input stream connected to a particular file.

public class FileInputStream extends InputStream

FileInputStream has all the usual methods of input streams, such as read(), available(), skip(), and close(), which are used exactly as they are for any other input stream.

public native int read() throws IOException
public int read(byte[] data) throws IOException
public int read(byte[] data, int offset, int length) throws IOException
public native long skip(long n) throws IOException
public native int available() throws IOException
public native void close() throws IOException

These methods are all implemented in native code, except for the two multibyte read() methods. These, however, just pass their arguments on to a private native method called readBytes(), so effectively all these methods are implemented with native code.

Reading Files

There are three FileInputStream() constructors, which differ only in how the file to be read is specified:

public FileInputStream(String fileName) throws IOException
public FileInputStream(File file) throws FileNotFoundException
public FileInputStream(FileDescriptor fdObj)

The first constructor uses a string containing the name of the file. The second constructor uses a java.io.File object. The third constructor uses a java.io.FileDescriptor object. Filenames are platform-dependent, so hardcoded filenames should be avoided where possible. Using the first constructor violates Sun's rules for "100% Pure Java" immediately. Therefore, the latter two constructors are much preferred. Nonetheless, they will have to wait until File objects and file descriptors are discussed in Chapter 12. For now, I will use only the first.
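As a rough sketch of the first constructor in use, the hypothetical FileReadDemo class below counts the bytes in a named file. To avoid hardcoding a platform-specific filename, the example assumes it may borrow File.createTempFile() just to obtain a usable name; everything else goes through the String constructors:

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

// Hypothetical demo class; names are illustrative only.
final class FileReadDemo {

  // Opens the named file and counts its bytes one read() at a time.
  static int countBytes(String fileName) throws IOException {
    try (InputStream in = new FileInputStream(fileName)) {
      int count = 0;
      while (in.read() != -1) count++;
      return count;
    }
  }

  public static void main(String[] args) throws IOException {
    // A temporary file supplies a name that is valid on this platform.
    File temp = File.createTempFile("filereaddemo", ".bin");
    temp.deleteOnExit();
    try (OutputStream out = new FileOutputStream(temp.getPath())) {
      out.write(new byte[] {1, 2, 3, 4, 5});
    }
    System.out.println(countBytes(temp.getPath())); // prints 5
  }
}
```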

Writing Files

The java.io.FileOutputStream class is a concrete subclass of java.io.OutputStream that provides output streams connected to files.

public class FileOutputStream extends OutputStream

This class has all the usual methods of output streams, such as write(), flush(), and close(), which are used exactly as they are for any other output stream.

public native void write(int b) throws IOException
public void write(byte[] data) throws IOException
public void write(byte[] data, int offset, int length) throws IOException
public native void close() throws IOException

These are all implemented in native code except for the two multibyte write() methods. These, however, just pass their arguments on to a private native method called writeBytes(), so effectively all these methods are implemented with native code.

There are three main FileOutputStream() constructors, differing primarily in how the file is specified:

public FileOutputStream(String filename) throws IOException
public FileOutputStream(File file) throws IOException
public FileOutputStream(FileDescriptor fd)

The first constructor uses a string containing the name of the file; the second constructor uses a java.io.File object; the third constructor uses a java.io.FileDescriptor object. I will avoid using the second and third constructors until I've discussed File objects and file descriptors (Chapter 12). To write data to a file, just pass the name of the file to the FileOutputStream() constructor, then use the write() methods as normal. If the file does not exist, all three constructors will create it. If the file does exist, any data inside it will be overwritten.
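The overwrite behavior can be seen in a short sketch. The FileWriteDemo class and its writeFile() method are hypothetical; File.createTempFile() and File.length() are used only to get a portable filename and to inspect the result:

```java
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStream;

// Hypothetical demo class; names are illustrative only.
final class FileWriteDemo {

  // Creates the named file if necessary and replaces its contents.
  static void writeFile(String filename, byte[] data) throws IOException {
    try (OutputStream out = new FileOutputStream(filename)) {
      out.write(data);
    }
  }

  public static void main(String[] args) throws IOException {
    File temp = File.createTempFile("filewritedemo", ".bin");
    temp.deleteOnExit();
    writeFile(temp.getPath(), new byte[] {1, 2, 3, 4, 5});
    System.out.println(temp.length()); // prints 5
    writeFile(temp.getPath(), new byte[] {6, 7});
    System.out.println(temp.length()); // prints 2: the old data is gone
  }
}
```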

A fourth constructor also lets you specify whether the file's contents should be erased before data is written into it (append == false) or whether data is to be tacked onto the end of the file (append == true).

File Viewer, Part 1

I often find it useful to be able to open an arbitrary file and interpret it in an arbitrary fashion. Most commonly I want to view a file as text, but occasionally it's useful to interpret it as hexadecimal integers, IEEE 754 floating-point data, or something else. In this book, I'm going to develop a program that lets you open any file and view its contents in a variety of different ways. In each chapter, I'll add a piece to the program until it's fully functional. Since this is only the beginning of the program, it's important to keep the code as general and adaptable as possible.

Example 4.3 reads a series of filenames from the command line in the main() method. Each filename is passed to a method that opens the file. The file's data is read and printed on System.out. Exactly how the data is printed on System.out is determined by a command-line switch. If the user selects ASCII format (-a), then the data will be assumed to be ASCII (more properly, ISO Latin-1) text and printed as chars. If the user selects decimal dump (-d), then each byte should be printed as an unsigned decimal number between 0 and 255, 16 to a line. For example:

000 234 127 034 234 234 000 000 000 002 004 070 000 234 127 098

Leading zeros are used to maintain a constant width for the printed byte values and for each line. A simple selection algorithm is used to determine how many leading zeros to attach to each number. For hex dump format (-h), each byte should be printed as two hexadecimal digits. For example:

CA FE BA BE 07 89 9A 65 45 65 43 6F F6 7F 8F EE E5 67 63 26 98 9E 9C

Hexadecimal encoding is easier, because each byte is always exactly two hex digits. The static Integer.toHexString() method is used to convert each byte read into two hexadecimal digits.

ASCII format is the default and is the simplest to implement. This conversion can be accomplished merely by copying the input data to the console.
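The two dump formats can be sketched as below. This is not the book's Example 4.3; DumpFormats and its method names are hypothetical. Note that Integer.toHexString() returns a single digit for values below 16, so this sketch assumes a leading zero must be added in that case to keep every byte two digits wide:

```java
// Hypothetical formatting helpers; not the book's file viewer.
final class DumpFormats {

  // Formats one byte as three decimal digits, e.g. 2 becomes "002".
  static String toDecimal(int b) {
    String s = Integer.toString(b & 0xFF);
    if (s.length() == 1) return "00" + s;
    if (s.length() == 2) return "0" + s;
    return s;
  }

  // Formats one byte as two uppercase hex digits, e.g. 7 becomes "07".
  static String toHex(int b) {
    String s = Integer.toHexString(b & 0xFF).toUpperCase();
    return (s.length() == 1) ? "0" + s : s;
  }

  // Joins the formatted bytes of one line with single spaces.
  static String line(byte[] data, boolean hex) {
    StringBuilder sb = new StringBuilder();
    for (int i = 0; i < data.length; i++) {
      if (i > 0) sb.append(' ');
      sb.append(hex ? toHex(data[i]) : toDecimal(data[i]));
    }
    return sb.toString();
  }

  public static void main(String[] args) {
    byte[] data = {(byte) 0xCA, (byte) 0xFE, (byte) 0xBA, (byte) 0xBE, 7};
    System.out.println(line(data, true));  // prints CA FE BA BE 07
    System.out.println(line(data, false)); // prints 202 254 186 190 007
  }
}
```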

Chapter 5: Network Streams

From its first days, Java has had the network in mind, more so than any other common programming language. Java is the first programming language to provide as much support for network I/O as it does for file I/O, perhaps even more—Java's URL, URLConnection, Socket, and ServerSocket classes are all fertile sources of streams. The exact type of the stream used by a network connection is typically hidden inside the undocumented sun classes. Thus, network I/O relies primarily on the basic InputStream and OutputStream methods, which you can wrap with any higher-level stream that suits your needs: buffering, cryptography, compression, or whatever your application requires.

The java.net.URL class represents a Uniform Resource Locator like https://metalab.unc.edu/javafaq/. Each URL unambiguously identifies the location of a resource on the Internet. The URL class has four constructors. All are declared to throw MalformedURLException, a subclass of IOException.

public URL(String u) throws MalformedURLException
public URL(String protocol, String host, String file) 
 throws MalformedURLException
public URL(String protocol, String host, int port, String file) 
 throws MalformedURLException
public URL(URL context, String u) throws MalformedURLException

A MalformedURLException is thrown if the constructor's arguments do not specify a valid URL. Often this means a particular Java implementation does not have the right protocol handler installed. Thus, given a complete absolute URL like https://www.poly.edu/schedule/fall97/bgrad.html#cs, you construct a URL object like this:

URL u = null;
try {
  u = new URL("https://www.poly.edu/schedule/fall97/bgrad.html#cs");
}
catch (MalformedURLException e) { }

You can also construct the URL object by passing its pieces to the constructor:

URL u = null;
try {
  u = new URL("http", "www.poly.edu", "/schedule/fall97/bgrad.html#cs");
}
catch (MalformedURLException e) { }

URLs

You don't normally need to specify a port for a URL; most protocols have default ports. For instance, the HTTP port is 80. Sometimes the port used does change, and in that case you can use the third constructor:

URL u = null;
try {
  u = new URL("http", "www.poly.edu", 80, "/schedule/fall97/bgrad.html#cs");
}
catch (MalformedURLException e) { }
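Once constructed, a URL object can report its pieces, which is a convenient way to see what each constructor produced. The UrlParts class is a hypothetical sketch; it makes no network connection. Recent JDK releases deprecate the URL constructors, so a compiler may warn about them, but they remain available:

```java
import java.net.MalformedURLException;
import java.net.URL;

// Hypothetical demo class; names are illustrative only.
final class UrlParts {

  // Summarizes the protocol, host, port, path, and fragment of a URL.
  static String describe(URL u) {
    return u.getProtocol() + " | " + u.getHost() + " | " + u.getPort()
        + " | " + u.getPath() + " | " + u.getRef();
  }

  public static void main(String[] args) throws MalformedURLException {
    // No port in the string, so getPort() reports -1 (use the default).
    URL whole = new URL("https://www.poly.edu/schedule/fall97/bgrad.html#cs");
    // The port is given explicitly, so getPort() reports it.
    URL pieces = new URL("http", "www.poly.edu", 80,
        "/schedule/fall97/bgrad.html#cs");
    System.out.println(describe(whole));
    System.out.println(pieces.getPort()); // prints 80
  }
}
```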

Finally, many HTML files contain relative URLs. The fourth constructor in the previous code creates URLs relative to a given URL and is particularly useful when parsing HTML. For example, the following code creates a URL pointing to the file 08.html, taking the rest of the URL from

URL Connections

URL connections are closely related to URLs, as their name implies. Indeed, you get a reference to a URLConnection by using the openConnection() method of a URL object; in many ways, the URL class is only a wrapper around the URLConnection class. However, URL connections provide more control over the communication between the client and the server. In particular, URL connections provide not just input streams by which the client can read data from the server, but also output streams to send data from the client to the server. This is essential for protocols like mailto.

The java.net.URLConnection class is an abstract class that handles communication with different kinds of servers, like FTP servers and web servers. Protocol-specific subclasses of URLConnection, hidden inside the sun classes, handle different kinds of servers.

URL connections take place in five steps:

1. The URL object is constructed.
2. The openConnection() method of the URL object creates the URLConnection object.
3. The parameters for the connection and the request properties that the client sends to the server are set up.
4. The connect() method makes the connection to the server, perhaps using a socket for a network connection or a file input stream for a local connection. The response header information is read from the server.
5. Data is read from the connection by using the input stream returned by getInputStream() or through a content handler with getContent(). Data can be sent to the server using the output stream provided by getOutputStream().

This scheme is very much based on the HTTP/1.0 protocol. It does not fit other schemes that have a more interactive "request, response, request, response, request, response" pattern instead of HTTP/1.0's "single request, single response, close connection" pattern. In particular, FTP and even HTTP/1.1 aren't well suited to this pattern. I wouldn't be surprised to see this replaced with something more general in a future version of Java.
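
The five steps can be sketched in a few lines. To keep the example runnable without a network, it assumes a file: URL pointing at a temporary file, so the "local connection" case applies; the class name ConnectionSteps, the helper fetch(), and the choice of setUseCaches(false) as the parameter to set are all illustrative assumptions.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.URL;
import java.net.URLConnection;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

class ConnectionSteps {

  // Steps 2 through 5 for an already constructed URL.
  static String fetch(URL u) throws IOException {
    URLConnection uc = u.openConnection();        // step 2: create the connection object
    uc.setUseCaches(false);                       // step 3: set parameters before connecting
    uc.connect();                                 // step 4: actually connect
    try (InputStream in = uc.getInputStream()) {  // step 5: read the data
      return new String(in.readAllBytes(), StandardCharsets.ISO_8859_1);
    }
  }

  public static void main(String[] args) throws IOException {
    // A temporary local file stands in for a remote resource.
    Path tmp = Files.createTempFile("urlconn", ".txt");
    Files.write(tmp, "hello from a URLConnection".getBytes(StandardCharsets.ISO_8859_1));
    try {
      URL u = tmp.toUri().toURL();                // step 1: construct the URL
      System.out.println(fetch(u));
    } finally {
      Files.delete(tmp);
    }
  }
}
```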

Sockets

Before data is sent across the Internet from one host to another, it is split into packets of varying but finite size called datagrams. Datagrams range in size from a few dozen bytes to about 60,000 bytes. Anything larger, and often things smaller, must be split into smaller pieces before it can be transmitted. The advantage of this scheme is that if one packet is lost, it can be retransmitted without requiring redelivery of all other packets. Furthermore, if packets arrive out of order, they can be reordered at the receiving end of the connection.

Fortunately, packets are invisible to the Java programmer. The host's native networking software splits data into packets on the sending end and reassembles packets on the receiving end. Instead, the Java programmer is presented with a higher-level abstraction called a socket. The socket represents a reliable connection for the transmission of data between two hosts. It isolates you from the details of packet encodings, lost and retransmitted packets, and packets that arrive out of order. A socket performs four fundamental operations:

Connect to a remote machine
Send data
Receive data
Close the connection

A socket may not be connected to more than one host at a time. However, a socket may both send data to and receive data from the host to which it's connected.
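
The four operations map directly onto a handful of calls. The sketch below is only an illustration: it assumes a stand-in peer running on the loopback interface of the same machine (so no real remote host is needed), and the class name FourOperations and the helpers roundTrip() and echoOneLine() are invented for this example.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStream;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.nio.charset.StandardCharsets;

class FourOperations {

  // Sends one line to a local stand-in peer and returns the line it sends back.
  static String roundTrip(String message) throws IOException {
    try (ServerSocket listener = new ServerSocket(0, 1, InetAddress.getLoopbackAddress())) {
      new Thread(() -> echoOneLine(listener)).start();

      // 1. Connect to the (stand-in) remote machine.
      Socket socket = new Socket(listener.getInetAddress(), listener.getLocalPort());
      try {
        // 2. Send data.
        OutputStream out = socket.getOutputStream();
        out.write((message + "\n").getBytes(StandardCharsets.UTF_8));
        out.flush();
        // 3. Receive data.
        BufferedReader in = new BufferedReader(
            new InputStreamReader(socket.getInputStream(), StandardCharsets.UTF_8));
        return in.readLine();
      } finally {
        // 4. Close the connection.
        socket.close();
      }
    }
  }

  // The stand-in peer: accepts one connection and echoes one line back.
  private static void echoOneLine(ServerSocket listener) {
    try (Socket s = listener.accept()) {
      BufferedReader r = new BufferedReader(
          new InputStreamReader(s.getInputStream(), StandardCharsets.UTF_8));
      String line = r.readLine();
      OutputStream out = s.getOutputStream();
      out.write((line + "\n").getBytes(StandardCharsets.UTF_8));
      out.flush();
    } catch (IOException e) {
      // The demonstration simply ends if the peer fails.
    }
  }

  public static void main(String[] args) throws IOException {
    System.out.println(roundTrip("hello, socket"));
  }
}
```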

The java.net.Socket class is Java's interface to a network socket and allows you to perform all four fundamental socket operations. It provides raw, uninterpreted communication between two hosts. You can connect to remote machines; you can send data; you can receive data; you can close the connection. No part of the protocol is abstracted out, as it is with URL and URLConnection

Server Sockets

There are two ends to each connection: the client, which initiates the connection, and the server, which responds to the connection. So far, we've only discussed the client side and assumed that a server existed out there for the client to talk to. To implement a server, you need to write a program that waits for other hosts to connect to it. A server socket binds to a particular port on the local machine (the server); once it has successfully bound to a port, it listens for incoming connection attempts from remote machines (the clients). When the server detects a connection attempt, it accepts the connection. This creates a socket between the two machines over which the client and the server communicate.

Many clients can connect to a port on the server simultaneously. Incoming data is distinguished by the port to which it is addressed and the client host and port from which it came. The server can tell for which service (like HTTP or FTP) the data is intended by inspecting the port. It knows where to send any response by looking at the client address and port stored with the data.

No more than one server socket can listen to a particular port at one time. Therefore, since a server may need to handle many connections at once, server programs tend to be heavily multithreaded. Generally, the server socket listening on the port only accepts the connections. It passes off the actual processing of each connection to a separate thread. Incoming connections are stored in a queue until the server can accept them. On most systems, the default queue length is between 5 and 50. Once the queue fills up, further incoming connections are refused until space in the queue opens up.

The java.net.ServerSocket class represents a server socket. Three constructors let you specify the port to bind to, the queue length for incoming connections, and the IP address:

public ServerSocket(int port) throws IOException
public ServerSocket(int port, int backlog) throws IOException
public ServerSocket(int port, int backlog, InetAddress bindAddr) 
 throws IOException
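
A minimal sketch of the pattern described above, in which the server socket only accepts connections and hands each one to its own thread, might look like the following. It uses the three-argument constructor with port 0 (which asks the system for any free port), a backlog of 50, and the loopback address; those values, the class name TinyServer, and the fixed greeting are assumptions chosen so the example can run on a single machine.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStream;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.nio.charset.StandardCharsets;

class TinyServer {

  // Accepts up to `count` connections; each is processed on a separate thread.
  static Thread serve(ServerSocket server, int count) {
    Thread acceptor = new Thread(() -> {
      for (int i = 0; i < count; i++) {
        try {
          Socket client = server.accept();
          new Thread(() -> greet(client)).start();
        } catch (IOException e) {
          return; // the server socket was closed or failed
        }
      }
    });
    acceptor.start();
    return acceptor;
  }

  // Per-connection work: send one line, then close the connection.
  private static void greet(Socket client) {
    try (Socket c = client; OutputStream out = c.getOutputStream()) {
      out.write("hello\n".getBytes(StandardCharsets.US_ASCII));
    } catch (IOException e) {
      // Nothing more to do for this client.
    }
  }

  // A small client used to exercise the server.
  static String readGreeting(int port) throws IOException {
    try (Socket s = new Socket(InetAddress.getLoopbackAddress(), port);
         BufferedReader in = new BufferedReader(
             new InputStreamReader(s.getInputStream(), StandardCharsets.US_ASCII))) {
      return in.readLine();
    }
  }

  public static void main(String[] args) throws IOException {
    try (ServerSocket server = new ServerSocket(0, 50, InetAddress.getLoopbackAddress())) {
      serve(server, 2);
      System.out.println(readGreeting(server.getLocalPort()));
      System.out.println(readGreeting(server.getLocalPort()));
    }
  }
}
```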

URLViewer

Example 5.6 is an improved version of the URLViewer you first encountered in Chapter 2. This is a simple application that provides a window in which you can view the contents of a URL. It assumes that those contents are more or less ASCII text. (In future chapters, I'll remove that restriction.) Figure 5.1 shows the result. Our application has a text field in which the user can type a URL, a Load button that the user presses to load the specified URL, and a StreamedTextArea component that displays the text from the URL. Each of these corresponds to a field in the URLViewer class.

Figure 5.1: The URLViewer

Example 5.6. The URLViewer Program

import java.awt.*;
import java.awt.event.*;
import java.io.*;
import java.net.*;
import com.macfaq.awt.*;
import com.macfaq.io.*;
public class URLViewer extends Frame 
 implements WindowListener, ActionListener {
  TextField theURL = new TextField();
  Button loadButton = new Button("Load");
  StreamedTextArea theDisplay = new StreamedTextArea();
  
  public URLViewer() {
    super("URL Viewer");
  }
  public void init() {
  
    this.add("North", theURL);
    this.add("Center", theDisplay);
    Panel south = new Panel();
    south.add(loadButton);
    this.add("South", south);
    theURL.addActionListener(this);
    loadButton.addActionListener(this);
    this.addWindowListener(this);
    this.setLocation(50, 50);
    this.pack();
    this.setVisible(true);
  }
  public void actionPerformed(ActionEvent evt) {
  
    try {
      URL u = new URL(theURL.getText());
      InputStream in = u.openStream();
      OutputStream out = theDisplay.getOutputStream();
      StreamCopier.copy(in, out);
      in.close();
      out.close();
    }
    catch (MalformedURLException ex) {theDisplay.setText("Invalid URL");}
    catch (IOException ex) {theDisplay.setText("Could not load URL: " + ex.getMessage());}
  }
  
  public void windowClosing(WindowEvent e) {
  
    this.setVisible(false);
    this.dispose();
  }
  
  public void windowOpened(WindowEvent e) {}
  public void windowClosed(WindowEvent e) {}
  public void windowIconified(WindowEvent e) {}
  public void windowDeiconified(WindowEvent e) {}
  public void windowActivated(WindowEvent e) {}
  public void windowDeactivated(WindowEvent e) {}
  public static void main(String args[]) {
    URLViewer me = new URLViewer();
    me.init();
  }
}

Chapter 6: Filter Streams

Filter input streams read data from a preexisting input stream like a FileInputStream and have an opportunity to work with or change the data before it is delivered to the client program. Filter output streams write data to a preexisting output stream such as a FileOutputStream and have an opportunity to work with or change the data before it is written onto the underlying stream. Multiple filters can be chained onto a single underlying stream. Filter streams are used for encryption, compression, translation, buffering, and much more.

The word filter is derived by analogy from a water filter. A water filter sits between the pipe and faucet, pulling out impurities. A stream filter sits between the source of the data and its eventual destination and applies a specific algorithm to the data. As drops of water are passed through the water filter and modified, so too are bytes of data passed through the stream filter. Of course, there are some big differences—most notably, a stream filter can add data or some other kind of annotation to the stream, in addition to removing things you don't want; it may even produce a stream that is completely different from its original input (for example, by compressing the original data).

The Filter Stream Classes

java.io.FilterInputStream and java.io.FilterOutputStream are concrete superclasses for input and output stream subclasses that somehow modify or manipulate data of an underlying stream:

public class FilterInputStream extends InputStream 
public class FilterOutputStream extends OutputStream

Each of these classes has a single protected constructor that specifies the underlying stream from which the filter stream reads or writes data:

protected FilterInputStream(InputStream in)
protected FilterOutputStream(OutputStream out)

These constructors set protected InputStream and OutputStream fields, called in and out, inside the FilterInputStream and FilterOutputStream classes, respectively.

protected InputStream in
protected OutputStream out

Since the constructors are protected, filter streams may only be created by subclasses. Each subclass implements a particular filtering operation. Normally, such a pattern suggests that polymorphism is going to be used heavily, with subclasses standing in for the common superclass; however, it is uncommon to use filter streams polymorphically as instances of FilterInputStream or FilterOutputStream. Most of the time, references to a filter stream are either references to a more specific subclass like BufferedInputStream or they're polymorphic references to InputStream or OutputStream with no hint of the filter left.

Beyond the constructors, both FilterInputStream and FilterOutputStream declare exactly the methods of their respective superclasses. For FilterInputStream, these are:

public int read() throws IOException
public int read(byte[] data) throws IOException
public int read(byte[] data, int offset, int length) throws IOException
public long skip(long n) throws IOException
public int available() throws IOException
public void close() throws IOException
public synchronized void mark(int readlimit)
public synchronized void reset() throws IOException
public boolean markSupported()
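
To make the subclassing pattern concrete, here is a small hypothetical filter, not one of the java.io classes, that converts lowercase ASCII letters to uppercase as they pass through. It calls the protected constructor, reads from the protected in field, and overrides the two read() methods that do the real work; the class name UpperCaseInputStream is an assumption of this sketch.

```java
import java.io.ByteArrayInputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

class UpperCaseInputStream extends FilterInputStream {

  UpperCaseInputStream(InputStream in) {
    super(in); // stores the underlying stream in the protected field "in"
  }

  @Override
  public int read() throws IOException {
    int b = in.read();
    return (b >= 'a' && b <= 'z') ? b - ('a' - 'A') : b;
  }

  @Override
  public int read(byte[] data, int offset, int length) throws IOException {
    int n = in.read(data, offset, length);
    // When n is -1 (end of stream) the loop body never runs.
    for (int i = offset; i < offset + n; i++) {
      if (data[i] >= 'a' && data[i] <= 'z') {
        data[i] -= ('a' - 'A');
      }
    }
    return n;
  }

  public static void main(String[] args) throws IOException {
    InputStream in = new UpperCaseInputStream(
        new ByteArrayInputStream("filter streams".getBytes(StandardCharsets.US_ASCII)));
    System.out.println(new String(in.readAllBytes(), StandardCharsets.US_ASCII));
  }
}
```

The inherited read(byte[] data) method calls the three-argument version, so it does not need its own override.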

The Filter Stream Subclasses

The java.io package contains many useful filter stream classes. The BufferedInputStream and BufferedOutputStream classes buffer reads and writes by first putting data into a buffer (an internal array of bytes). Thus, an application can read or write bytes to the stream without necessarily calling the underlying native methods. The data is read from or written into the buffer in blocks; subsequent accesses go straight to the buffer. This improves performance in many situations. Buffered input streams also allow the reader to back up and reread data.

The java.io.PrintStream class, which System.out and System.err are instances of, allows very simple printing of primitive values, objects, and string literals. It uses the platform's default character encoding to convert characters into bytes. This class traps all IOExceptions and is primarily intended for debugging. System.out and System.err are the most popular examples of the PrintStream class, but you can connect a PrintStream filter to other output streams as well. For example, you can chain a PrintStream to a FileOutputStream to easily write text into a file.

The PushbackInputStream class has a one-byte pushback buffer so a program can "unread" the last character read. The next time data is read from the stream, the unread character is reread.

The DataInputStream and DataOutputStream classes read and write primitive Java data types and strings in a machine-independent way: big-endian for integer types, IEEE 754 for floats and doubles, and a modified form of UTF-8 for strings. These are important enough to justify a chapter of their own and will be discussed in the next chapter. The ObjectInputStream and ObjectOutputStream classes provide the same methods as DataInputStream and DataOutputStream, plus methods to read and write arbitrary Java objects. These will be taken up in Chapter 11.

The java.util.zip package also includes several filter stream classes. The filter input streams in this package decompress compressed data; the filter output streams compress raw data. These will be discussed in Chapter 9.

Buffered Streams

Buffered input streams read more data than they initially need into a buffer (an internal array of bytes). When the stream's read() methods are invoked, the data is removed from the buffer rather than the underlying stream. When the buffer runs out of data, the buffered stream refills its buffer from the underlying stream. Likewise, buffered output streams store data in an internal byte array until the buffer is full or the stream is flushed; then the data is written out to the underlying output stream in one swoop. In situations where it's almost as fast to read or write several hundred bytes from the underlying stream as it is to read or write a single byte, a buffered stream can provide a significant performance gain.

There are two BufferedInputStream constructors and two BufferedOutputStream constructors:

public BufferedInputStream(InputStream in)
public BufferedInputStream(InputStream in, int size)
public BufferedOutputStream(OutputStream out)
public BufferedOutputStream(OutputStream out, int size)

The first argument is the underlying stream from which data will be read or to which data will be written. The size argument is the number of bytes in the buffer. If a size isn't specified, a default is used; its value depends on the Java version (2048 bytes for buffered input streams in Java 1.1, 8192 bytes in current JDKs). The best size for the buffer depends on the platform and is generally related to the block size of the disk (at least for file streams). Less than 512 bytes is probably too small and more than 4096 bytes is probably too large. Ideally, you want an integral multiple of the block size of the disk. However, you might want to use smaller buffer sizes for unreliable network connections. For example:

URL u = new URL("https://java.developer.com");
BufferedInputStream bis = new BufferedInputStream(u.openStream(), 256);
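
One way to see what the buffer buys you is to count how often the underlying stream is actually touched. The following sketch wraps an in-memory stream in a hypothetical CountingInputStream and then reads it one byte at a time through a BufferedInputStream. The class names, the 1,000-byte payload, and the 256-byte buffer are assumptions for illustration; the exact number of underlying calls depends on the JDK's implementation, so the example prints it rather than promising a value.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;

class BufferingDemo {

  // Counts every read request that reaches the underlying stream.
  static class CountingInputStream extends FilterInputStream {
    int calls = 0;

    CountingInputStream(InputStream in) {
      super(in);
    }

    @Override
    public int read() throws IOException {
      calls++;
      return in.read();
    }

    @Override
    public int read(byte[] data, int offset, int length) throws IOException {
      calls++;
      return in.read(data, offset, length);
    }
  }

  // Reads dataSize bytes one at a time through a buffer of bufferSize bytes
  // and returns how many read requests reached the underlying stream.
  static int underlyingCalls(int dataSize, int bufferSize) throws IOException {
    CountingInputStream counter =
        new CountingInputStream(new ByteArrayInputStream(new byte[dataSize]));
    try (InputStream in = new BufferedInputStream(counter, bufferSize)) {
      while (in.read() != -1) {
        // Each call is served from the buffer whenever it still holds data.
      }
    }
    return counter.calls;
  }

  public static void main(String[] args) throws IOException {
    System.out.println("1000 single-byte reads caused "
        + underlyingCalls(1000, 256) + " reads of the underlying stream");
  }
}
```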

Example 6.4 copies files named on the command line to System.out with buffered reads and writes.

Example 6.4. A BufferedStreamCopier

PushbackInputStream

The java.io.PushbackInputStream class provides a pushback buffer so a program can "unread" the last several bytes read. The next time data is read from the stream, the unread bytes are reread.

public void unread(int b) throws IOException
public void unread(byte[] data, int offset, int length) throws IOException
public void unread(byte[] data) throws IOException

By default the buffer is only one byte long, and trying to unread more than one byte throws an IOException. However, you can change the default buffer size with the second constructor:

public PushbackInputStream(InputStream in)
public PushbackInputStream(InputStream in, int size)

Although both PushbackInputStream and BufferedInputStream use buffers, only a PushbackInputStream allows unreading, and only a BufferedInputStream allows marking and resetting. In a PushbackInputStream, markSupported() returns false.

public boolean markSupported()

The read() and available() methods work exactly as with normal input streams. However, they first attempt to read from the pushback buffer.

public int read() throws IOException
public int read(byte[] data, int offset, int length) throws IOException
public int available() throws IOException
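
A typical use of unreading is a parser that has to read one byte too far to discover where a token ends. The sketch below is a hypothetical helper, not part of java.io: readNumber() consumes ASCII digits and pushes the first non-digit back so the caller can read it next. It relies only on the default one-byte pushback buffer.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.PushbackInputStream;
import java.nio.charset.StandardCharsets;

class PushbackDemo {

  // Reads a run of ASCII digits and leaves the first non-digit in the stream.
  static int readNumber(PushbackInputStream in) throws IOException {
    int value = 0;
    int c;
    while ((c = in.read()) != -1) {
      if (c < '0' || c > '9') {
        in.unread(c); // put the byte back for the next reader
        break;
      }
      value = value * 10 + (c - '0');
    }
    return value;
  }

  public static void main(String[] args) throws IOException {
    PushbackInputStream in = new PushbackInputStream(
        new ByteArrayInputStream("42+7".getBytes(StandardCharsets.US_ASCII)));
    int left = readNumber(in);
    char operator = (char) in.read(); // the byte that was unread
    int right = readNumber(in);
    System.out.println(left + " " + operator + " " + right);
  }
}
```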

Print Streams

System.out and System.err are instances of the java.io.PrintStream class. This is a subclass of FilterOutputStream that converts numbers and objects to text. System.out is primarily used for simple, character-mode applications and for debugging. Its raison d'être is convenience, not robustness; print streams ignore many issues involved in internationalization and error checking. This makes System.out easy to use in quick and dirty hacks and simple examples, while simultaneously making it unsuitable for production code, which should use the java.io.PrintWriter class (discussed in Chapter 15) instead.

The PrintStream class has print() and println() methods that handle every Java data type. The print() and println() methods differ only in that println() prints a platform-specific line terminator after printing its arguments and print() does not. These methods are:

public void print(boolean b)
public void print(char c)
public void print(int i)
public void print(long l)
public void print(float f)
public void print(double d)
public void print(char[] s)
public void print(String s)
public void print(Object o)
public void println()
public void println(boolean b)
public void println(char c)
public void println(int i)
public void println(long l)
public void println(float f)
public void println(double d)
public void println(char[] s)
public void println(String s)
public void println(Object o)

Anything at all can be passed to a print() method; whatever argument you give is guaranteed to match at least one of these methods. Object types are converted to strings by invoking their toString() method. Primitive types are converted with the appropriate String.valueOf() method.
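
A short sketch can show several of these overloads at work. It assumes a PrintStream chained to a ByteArrayOutputStream so the printed text can be inspected afterward; the class name PrintDemo and the particular values are illustrative only. It uses print() rather than println() so the result does not depend on the platform's line terminator.

```java
import java.io.ByteArrayOutputStream;
import java.io.PrintStream;

class PrintDemo {

  // Prints one value of several different types and returns the resulting text.
  static String render() {
    ByteArrayOutputStream bytes = new ByteArrayOutputStream();
    PrintStream out = new PrintStream(bytes);
    out.print(true);                    // print(boolean)
    out.print(' ');                     // print(char)
    out.print(42);                      // print(int)
    out.print(' ');
    out.print(2.5);                     // print(double)
    out.print(' ');
    out.print(new char[] {'o', 'k'});   // print(char[])
    out.print(' ');
    out.print((Object) null);           // print(Object) prints "null"
    out.flush();
    return bytes.toString();
  }

  public static void main(String[] args) {
    System.out.println(render());
  }
}
```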

One aspect of making System.out simple for quick jobs is not in the PrintStream class at all but in the compiler. Because Java overloads the + operator to signify concatenation of strings, primitive data types, and objects, you can pass multiple variables to the

Multitarget Output Streams

As a final example, I present two slightly unusual filter output streams that direct their data to multiple underlying streams. The TeeOutputStream class, given in Example 6.5, has not one but two underlying streams. The TeeOutputStream does not modify the data that's written in any way; it merely writes it on both of its underlying streams.

Example 6.5. The TeeOutputStream Class

package com.macfaq.io;
import java.io.*;
public class TeeOutputStream extends FilterOutputStream {
  OutputStream out1;
  OutputStream out2;
  public TeeOutputStream(OutputStream stream1, OutputStream stream2) {
    super(stream1);
    out1 = stream1;
    out2 = stream2;
  }
  public synchronized void write(int b) throws IOException {
    out1.write(b);
    out2.write(b);  
  }
  public synchronized void write(byte[] data, int offset, int length) 
   throws IOException {
    out1.write(data, offset, length);
    out2.write(data, offset, length);
  }
  public void flush() throws IOException {
    out1.flush();
    out2.flush();  
  }
  
  public void close() throws IOException {
    // Close the second stream even if closing the first one fails.
    try {
      out1.close();
    }
    finally {
      out2.close();
    }
  }
}

It would be possible to store one of the output streams in FilterOutputStream's protected out field and the other in a field in this class. However, it's simpler and cleaner to maintain the parallelism between the two streams by storing them both in the TeeOutputStream class.

I've synchronized the write() methods to make sure that two different threads don't try to write to the same TeeOutputStream at the same time. Depending on unpredictable thread-scheduling issues, this could lead to data being written out of order or in different orders on different streams. It's important to make sure that one write is completely finished on all streams before the next write begins.
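
As a usage sketch, the following writes the same bytes to two in-memory streams at once. Because the com.macfaq.io package isn't assumed to be on the classpath, the example carries its own compact stand-in called Tee that follows the same two-field design; the names TeeSketch, Tee, and writeToBoth() are hypothetical.

```java
import java.io.ByteArrayOutputStream;
import java.io.FilterOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;

class TeeSketch {

  // Compact stand-in for the two-stream filter described above.
  static class Tee extends FilterOutputStream {
    private final OutputStream out1;
    private final OutputStream out2;

    Tee(OutputStream stream1, OutputStream stream2) {
      super(stream1);
      out1 = stream1;
      out2 = stream2;
    }

    @Override
    public synchronized void write(int b) throws IOException {
      out1.write(b);
      out2.write(b);
    }

    @Override
    public synchronized void write(byte[] data, int offset, int length) throws IOException {
      out1.write(data, offset, length);
      out2.write(data, offset, length);
    }

    @Override
    public void flush() throws IOException {
      out1.flush();
      out2.flush();
    }

    @Override
    public void close() throws IOException {
      try {
        out1.close();
      } finally {
        out2.close();
      }
    }
  }

  // Writes text once and returns what arrived on each of the two targets.
  static String[] writeToBoth(String text) throws IOException {
    ByteArrayOutputStream first = new ByteArrayOutputStream();
    ByteArrayOutputStream second = new ByteArrayOutputStream();
    try (OutputStream tee = new Tee(first, second)) {
      tee.write(text.getBytes(StandardCharsets.US_ASCII));
    }
    return new String[] {
      new String(first.toByteArray(), StandardCharsets.US_ASCII),
      new String(second.toByteArray(), StandardCharsets.US_ASCII)
    };
  }

  public static void main(String[] args) throws IOException {
    String[] copies = writeToBoth("written once, stored twice");
    System.out.println(copies[0]);
    System.out.println(copies[1]);
  }
}
```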

Example 6.6 demonstrates how one might use this class to write a TeeCopier

File Viewer, Part 2

There's a saying among object-oriented programmers that you should create one design just to throw away. Now that we've got filter streams in hand, I'm ready to throw out the monolithic design for the FileDumper program used in Chapter 4. I'm going to rewrite it using a more flexible, extensible, object-oriented approach that relies on multiple chained filters. This allows us to extend the system to handle new formats without rewriting all the old classes. (It also makes some of the examples in subsequent chapters smaller, since I won't have to repeat all the code each time.) The basic idea is to make each interpretation of the data a filter input stream. Bytes from the underlying stream move into the filter; the filter converts the bytes into strings. Since more bytes generally come out of the filter than go into it (for instance, the single byte 32 is replaced by the four bytes "0", "3", "2", " " in decimal dump format), our filter streams buffer the data as necessary.

The architecture revolves around the abstract DumpFilter class shown in Example 6.9. The public interface of this class is identical to that of FilterInputStream. Internally, a buffer holds the string interpretation of each byte as an array of bytes. The read() method returns bytes from this array as long as possible. An index field tracks the next available byte. When index reaches the length of the array, the abstract fill() method is invoked to read from the underlying stream and place data in the buffer. By changing how the fill() method translates the bytes it reads into the bytes in the buffer, you can change how the data is interpreted.

Example 6.9. DumpFilter

package com.macfaq.io;
import java.io.*;
public abstract class DumpFilter extends FilterInputStream {
  // This is really an array of unsigned bytes.
  protected int[] buf = new int[0];
  protected int index = 0;
  
  public DumpFilter(InputStream in) {
    super(in);
  }
  public int read() throws IOException {
  
    int result;
    if (index < buf.length) {
      result = buf[index];
      index++;
    }  // end if
    else {
      try {
        this.fill();
        // fill is required to put at least one byte 
        // in the buffer or throw an EOF or IOException.
        result = buf[0];
        index = 1;
      }
      catch (EOFException e) {result = -1;}
    }  // end else
    
    return result;
  }
  protected abstract void fill() throws IOException;
  
  public int read(byte[] data, int offset, int length) throws IOException {
  
    if (data == null) {
      throw new NullPointerException();
    } 
    else if ((offset < 0) || (offset > data.length) || (length < 0) 
     || ((offset + length) > data.length) || ((offset + length) < 0)) {
      throw new ArrayIndexOutOfBoundsException();
    } 
    else if (length == 0) {
      return 0;
    }
    // Check for end of stream.
    int datum = this.read();
    if (datum == -1) {
      return -1;
    }
    
    data[offset] = (byte) datum;
    int bytesRead = 1;
    try {
      for (; bytesRead < length ; bytesRead++) {
      
        datum = this.read();
        
        // In case of end of stream, return as much as we've got,
        // then wait for the next call to read to return -1.
        if (datum == -1) break;
        data[offset + bytesRead] = (byte) datum;
      }
    }
    catch (IOException e) {
      // Return what's already in the data array.
    }
    return bytesRead;   
  }
  
  public int available() throws IOException {
    return buf.length - index;
  }
  
  public long skip(long bytesToSkip) throws IOException {
  
    long bytesSkipped = 0;
    for (; bytesSkipped < bytesToSkip; bytesSkipped++) {
      int c = this.read();
      if (c == -1) break;
    }
    return bytesSkipped;
  }
  public synchronized void mark(int readlimit) {}
  public synchronized void reset() throws IOException {
    throw new IOException("marking not supported");
  }
  public boolean markSupported() {
    return false;
  }
}
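
To show how a concrete fill() method plugs into this design, here is a self-contained sketch of a decimal dump. It does not reuse the DumpFilter class itself; instead it declares a trimmed-down stand-in, MiniDumpFilter, with the same buf, index, and fill() contract, plus a hypothetical DecimalFilter subclass that turns each input byte into three decimal digits and a space. All names here are assumptions for the example.

```java
import java.io.ByteArrayInputStream;
import java.io.EOFException;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;

class DecimalDumpSketch {

  // Trimmed-down stand-in for the abstract filter: same buf/index/fill() contract.
  abstract static class MiniDumpFilter extends FilterInputStream {
    protected int[] buf = new int[0];
    protected int index = 0;

    MiniDumpFilter(InputStream in) {
      super(in);
    }

    // Must put at least one byte in buf or throw an EOFException.
    protected abstract void fill() throws IOException;

    @Override
    public int read() throws IOException {
      if (index >= buf.length) {
        try {
          fill();
          index = 0;
        } catch (EOFException e) {
          return -1;
        }
      }
      return buf[index++];
    }

    @Override
    public int read(byte[] data, int offset, int length) throws IOException {
      int n = 0;
      for (; n < length; n++) {
        int b = read();
        if (b == -1) break;
        data[offset + n] = (byte) b;
      }
      return (n == 0 && length > 0) ? -1 : n;
    }
  }

  // Interprets each underlying byte as three decimal digits followed by a space.
  static class DecimalFilter extends MiniDumpFilter {
    DecimalFilter(InputStream in) {
      super(in);
    }

    @Override
    protected void fill() throws IOException {
      int b = in.read();
      if (b == -1) throw new EOFException();
      String text = "" + (b / 100) + (b / 10 % 10) + (b % 10) + " ";
      buf = new int[text.length()];
      for (int i = 0; i < buf.length; i++) {
        buf[i] = text.charAt(i);
      }
    }
  }

  static String dump(byte[] data) throws IOException {
    InputStream in = new DecimalFilter(new ByteArrayInputStream(data));
    StringBuilder result = new StringBuilder();
    for (int c = in.read(); c != -1; c = in.read()) {
      result.append((char) c);
    }
    return result.toString();
  }

  public static void main(String[] args) throws IOException {
    System.out.println(dump(new byte[] {32, 65, (byte) 200}));
  }
}
```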

Chapter 7: Data Streams

Data streams read and write strings, integers, floating-point numbers, and other data that's commonly presented at a higher level than mere bytes. The java.io.DataInputStream and java.io.DataOutputStream classes read and write the primitive Java data types (boolean, int, double, etc.) and strings in a particular, well-defined, platform-independent format. Since DataInputStream and DataOutputStream use the same formats, they're complementary. What a data output stream writes, a data input stream can read. These classes are especially useful when you need to move data between platforms that may use different native formats for integers or floating-point numbers.

The java.io.DataInputStream and java.io.DataOutputStream classes are subclasses of FilterInputStream and FilterOutputStream, respectively.

public class DataInputStream extends FilterInputStream implements DataInput
public class DataOutputStream extends FilterOutputStream 
             implements DataOutput

They have all the usual methods you've come to associate with input and output stream classes, such as read(), write(), flush(), available(), skip(), close(), markSupported(), and reset(). (Data input streams support marking if, and only if, their underlying input stream supports marking.) However, the real purpose of DataInputStream and DataOutputStream is not to read and write raw bytes using the standard input and output stream methods. It's to read and interpret multibyte data like ints, floats, doubles, and chars.

The Data Stream Classes

The java.io.DataInputStream and java.io.DataOutputStream classes are subclasses of FilterInputStream and FilterOutputStream, respectively.

public class DataInputStream extends FilterInputStream implements DataInput
public class DataOutputStream extends FilterOutputStream 
             implements DataOutput

The java.io.DataInput interface declares 15 methods that read various kinds of data:

public abstract boolean readBoolean() throws IOException
public abstract byte readByte() throws IOException
public abstract int readUnsignedByte() throws IOException
public abstract short readShort() throws IOException
public abstract int readUnsignedShort() throws IOException
public abstract char readChar() throws IOException
public abstract int readInt() throws IOException
public abstract long readLong() throws IOException
public abstract float readFloat() throws IOException
public abstract double readDouble() throws IOException
public abstract String readLine() throws IOException
public abstract String readUTF() throws IOException
public void readFully(byte[] data) throws IOException
public void readFully(byte[] data, int offset, int length) throws IOException
public int skipBytes(int n) throws IOException

These methods are all available from the DataInputStream class and any other class that implements

Reading and Writing Integers

The DataOutputStream class has methods for writing all of Java's primitive integer data types: byte, short, int, and long. The DataInputStream class has methods to read these types. It also has methods for reading two integer data types not directly supported by Java or the DataOutputStream class: the unsigned byte and the unsigned short.
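
The difference between the signed and unsigned readers is easiest to see by reading the same bytes both ways. In this sketch, which uses in-memory streams and an invented class name IntegerStreams, a byte of -1 and a short of -2 are written along with the int 1, then read back once as signed values and once as unsigned ones.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;

class IntegerStreams {

  // Writes a byte, a short, and an int: seven bytes in all, most significant byte first.
  static byte[] write() throws IOException {
    ByteArrayOutputStream bytes = new ByteArrayOutputStream();
    DataOutputStream out = new DataOutputStream(bytes);
    out.writeByte(-1);
    out.writeShort(-2);
    out.writeInt(1);
    out.flush();
    return bytes.toByteArray();
  }

  public static void main(String[] args) throws IOException {
    byte[] raw = write();

    DataInputStream signed = new DataInputStream(new ByteArrayInputStream(raw));
    System.out.println("signed byte:    " + signed.readByte());
    System.out.println("signed short:   " + signed.readShort());
    System.out.println("int:            " + signed.readInt());

    DataInputStream unsigned = new DataInputStream(new ByteArrayInputStream(raw));
    System.out.println("unsigned byte:  " + unsigned.readUnsignedByte());
    System.out.println("unsigned short: " + unsigned.readUnsignedShort());
  }
}
```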

While Java's platform independence guarantees that you don't have to worry about precise data formats when working exclusively in Java, you frequently need to read data created by a program written in another language. Similarly, it's not unusual to have to write data that will be read by a program written in a different language. For example, most Java network clients (like HotJava) talk primarily to servers written in other languages, and most Java network servers (like the Java Web Server) talk primarily to clients written in other languages. You cannot naively assume that the data format Java uses is the data format other programs will understand; you must take care to understand and recognize the data formats being used.

Although other schemes are possible, almost all modern computers have standardized on binary arithmetic performed on integers composed of an integral number of bytes. Furthermore, they've standardized on two's complement arithmetic for signed numbers. In two's complement arithmetic, the most significant bit is 1 for a negative number and 0 for a positive number; the absolute value of a negative number is calculated by taking the complement of the number and adding 1. In Java terms, this means (-n == ~n + 1) is true where n is a negative int.
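
Both properties can be checked directly in Java. The class and method names in this sketch are made up for the example; it tests the sign bit with an unsigned shift and evaluates the negation identity for a few sample values.

```java
class TwosComplement {

  // True when the most significant of the 32 bits is 1.
  static boolean signBitSet(int n) {
    return (n >>> 31) == 1;
  }

  // The identity from the text: negating is complementing and adding 1.
  static boolean negationIdentityHolds(int n) {
    return -n == ~n + 1;
  }

  public static void main(String[] args) {
    int[] samples = {-42, -1, 7};
    for (int n : samples) {
      System.out.println(n + " = " + Integer.toBinaryString(n)
          + ", sign bit set: " + signBitSet(n)
          + ", -n == ~n + 1: " + negationIdentityHolds(n));
    }
  }
}
```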

Regrettably, this is about all that's been standardized. One big difference between computer architectures is the size of an int. Probably the majority of modern computers use four-byte integers that can hold a number between -2,147,483,648 and 2,147,483,647. However, some systems are moving to 64-bit architectures where the native integer ranges from -9,223,372,036,854,775,808 to 9,223,372,036,854,775,807 and takes eight bytes. And many older systems use 16-bit integers that only range from -32,768 to 32,767. Exactly how many bytes a C compiler uses for each

Additional content appearing in this section has been removed.

Reading and Writing Floating-Point Numbers

Java understands two floating-point number formats, both specified by the IEEE 754 standard. Floats are stored in four bytes with a 1-bit sign, a 24-bit mantissa, and an 8-bit exponent. Float values range from 1.40129846432481707×10^-45 to 3.40282346638528860×10^38, either positive or negative. Doubles take up eight bytes with a 1-bit sign, a 53-bit mantissa, and an 11-bit exponent. This gives them a range of 4.94065645841246544×10^-324 to 1.79769313486231570×10^308, either positive or negative. Both floats and doubles also have representations of positive and negative zero, positive and negative infinity, and not a number (or NaN).

Astute readers will notice that the number of bits given for floats and doubles adds up to 33 and 65 bits, respectively, one too many for the width of the number. A trick is used whereby the first bit of the mantissa of a nonzero number is assumed to be 1. With this trick, it is unnecessary to include the first bit of the mantissa. Thus, an extra bit of precision is gained for free.
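To make the layout concrete, the following sketch pulls the three fields out of a float with Float.floatToIntBits(). The class name and the fields() helper are hypothetical; the masks follow the 1-bit sign, 8-bit exponent, and 23 stored mantissa bits described above.

```java
class FloatBitsDemo {

  // Splits a float into {sign, biased exponent, stored mantissa bits}.
  static int[] fields(float f) {
    int bits = Float.floatToIntBits(f);
    int sign = bits >>> 31;               // 1 bit
    int exponent = (bits >>> 23) & 0xFF;  // 8 bits, biased by 127
    int mantissa = bits & 0x7FFFFF;       // 23 stored bits; the leading 1 is implied
    return new int[] {sign, exponent, mantissa};
  }

  public static void main(String[] args) {
    // -6.5 is -1.101 (binary) times 2 to the power 2
    int[] parts = fields(-6.5f);
    System.out.println(parts[0]);        // 1, the number is negative
    System.out.println(parts[1] - 127);  // 2, the unbiased exponent
    System.out.println(Integer.toBinaryString(parts[2]));  // 101 followed by 20 zeros
  }
}
```

Only the fraction bits 101 are stored; the 1 before the binary point is the implied bit that accounts for the 24th bit of precision.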

The details of this format are too complicated to discuss here. You can order the actual specification from the IEEE for about $29.00. That's approximately $1.50 a page, more than a little steep in my opinion. The specification isn't available online, but it was published in the February 1987 issue of ACM SIGPLAN Notices (Volume 22, #2, pp. 9-18), which should be available in any good technical library. The main thing you need to know is that these formats are supported by most modern RISC architectures and by all Pentium and Motorola 680x0 chips with either external or internal floating-point units (FPUs). Nowadays the only chips that don't natively support this format are a few embedded processors and some old 486SX, 68LC040, and other earlier FPU-less chips in legacy hardware. And even these systems are able to emulate IEEE 754 floating-point arithmetic in software.

The DataInputStream class reads and the DataOutputStream class writes floating-point numbers of either four or eight bytes in length, as specified in the IEEE 754 standard. They do not support the 10-byte and longer long

Additional content appearing in this section has been removed.

Reading and Writing Booleans

The DataOutputStream class has a writeBoolean() method and the DataInputStream class has a corresponding readBoolean() method:

public final void writeBoolean(boolean b) throws IOException
public final boolean readBoolean() throws IOException

Although theoretically a single bit could be used to indicate the value of a boolean, in practice a whole byte is used. This makes alignment much simpler and doesn't waste enough space to be an issue on modern machines. The writeBoolean() method writes a zero byte (0x00) to indicate false, a one byte (0x01) to indicate true. The readBoolean() method interprets 0 as false and any positive number as true. Negative numbers indicate end of stream and lead to an EOFException being thrown.
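A short round trip through byte array streams shows both halves of this behavior. This is a sketch with hypothetical names (BooleanStreamDemo, encode(), decode()), assuming Java 8 or later.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

class BooleanStreamDemo {

  // Returns the bytes writeBoolean() produces for each argument.
  static byte[] encode(boolean... values) {
    ByteArrayOutputStream bout = new ByteArrayOutputStream();
    try (DataOutputStream dout = new DataOutputStream(bout)) {
      for (boolean b : values) dout.writeBoolean(b);
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    return bout.toByteArray();
  }

  // Interprets a single raw byte the way readBoolean() does.
  static boolean decode(byte raw) {
    byte[] one = {raw};
    try (DataInputStream din = new DataInputStream(new ByteArrayInputStream(one))) {
      return din.readBoolean();
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  public static void main(String[] args) {
    byte[] bytes = encode(true, false);
    System.out.println(bytes[0] + " " + bytes[1]);  // 1 0
    System.out.println(decode((byte) 0));           // false
    System.out.println(decode((byte) 42));          // true, any nonzero byte counts
  }
}
```

A byte such as 0xFF also decodes as true, since the underlying read() returns it as the positive value 255.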

Additional content appearing in this section has been removed.

Reading Byte Arrays

As already mentioned, the DataInputStream class has the usual two methods for reading bytes into a byte array:

public int read(byte[] data) throws IOException
public int read(byte[] data, int offset, int length) throws IOException

Neither of these methods guarantees that all the bytes requested will be read. Instead, you're expected to check the number of bytes actually read, then call read() again for a different part of the array as necessary. For example, to read 1024 bytes from the InputStream in into the byte array data:

int offset = 0;
while (offset < data.length) {
  int bytesRead = in.read(data, offset, data.length - offset);
  if (bytesRead == -1) break;  // end of stream before the array was filled
  offset += bytesRead;         // only advance by bytes actually read
}

The DataInputStream class has two readFully() methods that provide this logic. Each reads repeatedly from the underlying input stream until the array data or specified portion thereof is filled.

public final void readFully(byte[] data) throws IOException
public final void readFully(byte[] data, int offset, int length) 
                  throws IOException

If the data runs out before the array is filled and no more data is forthcoming, then an EOFException (a subclass of IOException) is thrown.
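The sketch below exercises readFully() against an in-memory source, once with enough data and once without. The names ReadFullyDemo and fill() are made up for the example; in current JDKs the exception raised on a short stream is an EOFException, which is a subclass of IOException.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.UncheckedIOException;

class ReadFullyDemo {

  // Tries to fill a buffer of the given size from the supplied bytes.
  // Returns true if the buffer was filled, false if the data ran out first.
  static boolean fill(byte[] source, int size) {
    byte[] data = new byte[size];
    try (DataInputStream din = new DataInputStream(new ByteArrayInputStream(source))) {
      din.readFully(data);  // loops internally until data is full
      return true;
    } catch (EOFException e) {
      return false;         // stream ended before size bytes arrived
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  public static void main(String[] args) {
    System.out.println(fill(new byte[1024], 1024));  // true
    System.out.println(fill(new byte[100], 1024));   // false
  }
}
```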

Additional content appearing in this section has been removed.

Reading and Writing Text

Because of the difficulties caused by different character sets, reading and writing text is one of the trickiest things you can do with streams. Most of the time, text should be handled with readers and writers, a subject we'll take up in Chapter 15. However, the DataInputStream and DataOutputStream classes do provide methods a Java program can use to read and write text that another Java program will understand. The text format used is a compressed form of Unicode called UTF-8. It's unlikely that other, non-Java programs will understand this format unless they've been specially coded to interoperate with text data written by Java, especially since Java's UTF-8 differs slightly from the standard UTF-8 used in XML and elsewhere.

Java strings and chars are Unicode. However, Unicode isn't particularly efficient. Most files of English text contain almost nothing but ASCII characters. Thus, using two bytes for these characters is really overkill. UTF-8 solves this problem by encoding the ASCII characters in a single byte at the expense of having to use three bytes for many more of the less common characters. For the purposes of this chapter, UTF-8 provides a more efficient way to read and write strings; it is used by the readUTF() and writeUTF() methods implemented by the DataInputStream and DataOutputStream classes. For a full description of UTF-8, see Chapter 14.

The variant form of UTF-8 that these classes use is intended for string literals embedded in compiled byte code and serialized Java objects and for communication between two Java programs. It is not intended for reading and writing arbitrary UTF-8 text. To read standard UTF-8, you should use an InputStreamReader; to write it, you should use an OutputStreamWriter. These classes do not improperly encode the null character and will be discussed in Chapter 15.
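The difference in how the null character is encoded is easy to see by comparing the bytes from writeUTF() with those from the standard encoder. This is a sketch with hypothetical names; it uses java.nio.charset.StandardCharsets, which assumes Java 7 or later.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

class ModifiedUtf8Demo {

  // Bytes written by writeUTF(): a two-byte length, then Java's UTF-8 variant.
  static byte[] writeUtfBytes(String s) {
    ByteArrayOutputStream bout = new ByteArrayOutputStream();
    try (DataOutputStream dout = new DataOutputStream(bout)) {
      dout.writeUTF(s);
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    return bout.toByteArray();
  }

  // Bytes produced by the standard UTF-8 encoder, with no length prefix.
  static byte[] standardUtf8Bytes(String s) {
    return s.getBytes(StandardCharsets.UTF_8);
  }

  public static void main(String[] args) {
    String s = "A\u0000";
    // writeUTF(): 00 03 (length), 41 (A), C0 80 (null as two bytes)
    System.out.println(writeUtfBytes(s).length);      // 5
    // Standard UTF-8: 41 (A), 00 (null as one byte)
    System.out.println(standardUtf8Bytes(s).length);  // 2
  }
}
```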

Additional content appearing in this section has been removed.

Miscellaneous Methods

The DataInputStream and DataOutputStream classes each have one method left to discuss, skipBytes() and size(), respectively.

The DataOutputStream class has a protected field called written that stores the number of bytes written to the output stream since it was constructed. The value of this field is returned by the public size() method:

protected int written
public final int size()

Every time you invoke writeInt(), writeBytes(), writeUTF(), or some other write method, the written field is incremented by the number of bytes written. This might be useful if for some reason you're trying to limit the number of bytes you write. For instance, you may prefer to open a new file when you reach some preset size rather than continuing to write into a very large file.
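One way the size limit idea might look in code is sketched here. The class SizeLimitDemo and the writeUpTo() method are hypothetical, and an in-memory stream stands in for the file so that the example is self-contained.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

class SizeLimitDemo {

  // Writes ints until the next one would push the stream past limit bytes.
  // Returns how many ints were written.
  static int writeUpTo(int limit) {
    try (DataOutputStream dout = new DataOutputStream(new ByteArrayOutputStream())) {
      int count = 0;
      while (dout.size() + 4 <= limit) {  // an int occupies four bytes
        dout.writeInt(count);
        count++;
      }
      return count;
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  public static void main(String[] args) {
    System.out.println(writeUpTo(10));  // 2, since a third int would need 12 bytes
    System.out.println(writeUpTo(16));  // 4
  }
}
```

In a real program, the point where the loop stops is where you would close the current file and open the next one.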

The DataInputStream class's skipBytes() method skips over a specified number of bytes without reading them. Unlike the skip() method of java.io.InputStream that DataInputStream inherits, skipBytes() either skips over all the bytes it's asked to skip or it throws an exception:

public final int skipBytes(int n) throws IOException
public long skip(long n) throws IOException

skipBytes() blocks and waits for more data until n bytes have been skipped (successful execution) or an exception is thrown. The method returns the number of bytes skipped, which is always n (because if it's not n , an exception is thrown and nothing is returned). On end of stream, an EOFException is thrown. An IOException is thrown if the underlying stream throws an IOException.

Additional content appearing in this section has been removed.

Reading and Writing Little-Endian Numbers

It's likely that at some point in time you'll need to read a file full of little-endian data, especially if you're working on Intel hardware or with data written by native code on such a platform. Java has essentially no support for little-endian numbers. The LittleEndianOutputStream class in Example 7.8 and the LittleEndianInputStream class in Example 7.9 provide the support you need to do this. These classes are closely modeled on the java.io.DataInputStream and java.io.DataOutputStream classes. Some of the methods in these classes do exactly the same thing as the same methods in the DataInputStream and DataOutputStream classes. After all, a big-endian byte is no different from a little-endian byte. In fact, these two classes come very close to implementing the java.io.DataInput and java.io.DataOutput interfaces. Actually doing so would have been a bad idea, however, because client programmers will expect objects implementing DataInput and DataOutput to use big-endian numbers, and it's best not to go against such common assumptions.

I also considered making the little-endian classes subclasses of DataInputStream and DataOutputStream. While this would have eliminated some duplicated methods like readBoolean() and writeBoolean(), it would also have required the new, little-endian methods to have unwieldy names like readLittleEndianInt() and writeLittleEndianInt(). Furthermore, it's unlikely you'll need to read or write both little-endian and big-endian numbers from the same stream. Most streams will contain one or the other but not both.

Example 7.8. A LittleEndianOutputStream Class

/*
 * @(#)LittleEndianOutputStream.java  1.0 98/08/29
 */
package com.macfaq.io;
import java.io.*;
/**
 * A little-endian output stream writes primitive Java numbers 
 * and characters to an output stream in a little-endian format. 
 * The standard java.io.DataOutputStream class which this class
 * imitates uses big-endian integers.
 *
 * @author  Elliotte Rusty Harold
 * @version 1.0, 29 Aug 1998
 * @see     com.macfaq.io.LittleEndianInputStream
 * @see     java.io.DataOutputStream
 */
public class LittleEndianOutputStream extends FilterOutputStream {
  /**
   * The number of bytes written so far to the little-endian output stream. 
   */
  protected int written;
  /**
   * Creates a new little-endian output stream and chains it to the  
   * output stream specified by the out argument. 
   *
   * @param   out   the underlying output stream.
   * @see     java.io.FilterOutputStream#out
   */
  public LittleEndianOutputStream(OutputStream out) {
    super(out);
  }
  /**
   * Writes the specified byte value to the underlying output stream. 
   *
   * @param      b   the <code>byte</code> value to be written.
   * @exception  IOException  if the underlying stream throws an IOException.
   */
  public synchronized void write(int b) throws IOException {
    out.write(b);
    written++;
  }
  /**
   * Writes <code>length</code> bytes from the specified byte array 
   * starting at <code>offset</code> to the underlying output stream.
   *
   * @param      data     the data.
   * @param      offset   the start offset in the data.
   * @param      length   the number of bytes to write.
   * @exception  IOException  if the underlying stream throws an IOException.
   */
  public synchronized void write(byte[] data, int offset, int length) 
   throws IOException {
    out.write(data, offset, length);
    written += length;
  }
  /**
   * Writes a <code>boolean</code> to the underlying output stream as 
   * a single byte. If the argument is true, the byte value 1 is written.
   * If the argument is false, the byte value <code>0</code> is written.
   *
   * @param      b   the <code>boolean</code> value to be written.
   * @exception  IOException  if the underlying stream throws an IOException.
   */
  public void writeBoolean(boolean b) throws IOException {
  
    if (b) this.write(1);
    else this.write(0);
  }
  /**
   * Writes out a <code>byte</code> to the underlying output stream
   *
   * @param      b   the <code>byte</code> value to be written.
   * @exception  IOException  if the underlying stream throws an IOException.
   */
  public void writeByte(int b) throws IOException {
    out.write(b);
    written++;
  }
  /**
   * Writes a two byte <code>short</code> to the underlying output stream in
   * little-endian order, low byte first. 
   *
   * @param      s   the <code>short</code> to be written.
   * @exception  IOException  if the underlying stream throws an IOException.
   */
  public void writeShort(int s) throws IOException {
    out.write(s & 0xFF);
    out.write((s >>> 8) & 0xFF);
    written += 2;
  }
  /**
   * Writes a two byte <code>char</code> to the underlying output stream 
   * in little-endian order, low byte first. 
   *
   * @param      c   the <code>char</code> value to be written.
   * @exception  IOException  if the underlying stream throws an IOException.
   */
  public void writeChar(int c) throws IOException {
    out.write(c & 0xFF);
    out.write((c >>> 8) & 0xFF);
    written += 2;
  }
  /**
   * Writes a four-byte <code>int</code> to the underlying output stream 
   * in little-endian order, low byte first, high byte last
   *
   * @param      i   the <code>int</code> to be written.
   * @exception  IOException  if the underlying stream throws an IOException.
   */
  public void writeInt(int i) throws IOException {
    out.write(i & 0xFF);
    out.write((i >>> 8) & 0xFF);
    out.write((i >>> 16) & 0xFF);
    out.write((i >>> 24) & 0xFF);
    written += 4;
  }
  /**
   * Writes an eight-byte <code>long</code> to the underlying output stream 
   * in little-endian order, low byte first, high byte last
   *
   * @param      l   the <code>long</code> to be written.
   * @exception  IOException  if the underlying stream throws an IOException.
   */
  public void writeLong(long l) throws IOException {
    out.write((int) l & 0xFF);
    out.write((int) (l >>> 8) & 0xFF);
    out.write((int) (l >>> 16) & 0xFF);
    out.write((int) (l >>> 24) & 0xFF);
    out.write((int) (l >>> 32) & 0xFF);
    out.write((int) (l >>> 40) & 0xFF);
    out.write((int) (l >>> 48) & 0xFF);
    out.write((int) (l >>> 56) & 0xFF);
    written += 8;
  }
  /**
   * Writes a 4 byte Java float to the underlying output stream in
   * little-endian order.
   *
   * @param      f   the <code>float</code> value to be written.
   * @exception  IOException  if an I/O error occurs.
   */
  public final void writeFloat(float f) throws IOException {
    this.writeInt(Float.floatToIntBits(f));
  }
  /**
   * Writes an 8 byte Java double to the underlying output stream in
   * little-endian order.
   *
   * @param      d   the <code>double</code> value to be written.
   * @exception  IOException  if an I/O error occurs.
   */
  public final void writeDouble(double d) throws IOException {
    this.writeLong(Double.doubleToLongBits(d));
  }
  /**
   * Writes a string to the underlying output stream as a sequence of 
   * bytes. Each character is written to the data output stream as 
   * if by the <code>writeByte()</code> method. 
   *
   * @param      s   the <code>String</code> value to be written.
   * @exception  IOException  if the underlying stream throws an IOException.
   * @see        com.macfaq.io.LittleEndianOutputStream#writeByte(int)
   * @see        java.io.FilterOutputStream#out
   */
  public void writeBytes(String s) throws IOException {
    int length = s.length();
    for (int i = 0; i < length; i++) {
      out.write((byte) s.charAt(i));
    }
    written += length;
  }
  /**
   * Writes a string to the underlying output stream as a sequence of 
   * characters. Each character is written to the data output stream as 
   * if by the <code>writeChar</code> method. 
   *
   * @param      s   a <code>String</code> value to be written.
   * @exception  IOException  if the underlying stream throws an IOException.
   * @see        com.macfaq.io.LittleEndianOutputStream#writeChar(int)
   * @see        java.io.FilterOutputStream#out
   */
  public void writeChars(String s) throws IOException {
    int length = s.length();
    for (int i = 0; i < length; i++) {
      int c = s.charAt(i);
      out.write(c & 0xFF);
      out.write((c >>> 8) & 0xFF);
    }
    written += length * 2;
  }
  /**
   * Writes a string whose encoded form is no more than 65,535 bytes 
   * to the underlying output stream using Java's variant of UTF-8. 
   * This method first writes a two byte short in <b>big</b> endian 
   * order, matching the format that java.io.DataOutputStream's 
   * writeUTF() method produces. This gives the number of bytes in the 
   * UTF-8 encoded version of the string, not the number of characters
   * in the string. Next each character of the string is written
   * using the UTF-8 encoding for the character.
   *
   * @param      s   the string to be written.
   * @exception  UTFDataFormatException if the encoded string is longer 
   *             than 65,535 bytes.
   * @exception  IOException  if the underlying stream throws an IOException.
   */
  public void writeUTF(String s) throws IOException {
    int numchars = s.length();
    int numbytes = 0;
    for (int i = 0 ; i < numchars ; i++) {
      int c = s.charAt(i);
      if ((c >= 0x0001) && (c <= 0x007F)) numbytes++;
      else if (c > 0x07FF) numbytes += 3;
      else numbytes += 2;
    }
    if (numbytes > 65535) throw new UTFDataFormatException();     
    out.write((numbytes >>> 8) & 0xFF);
    out.write(numbytes & 0xFF);
    for (int i = 0 ; i < numchars ; i++) {
      int c = s.charAt(i);
      if ((c >= 0x0001) && (c <= 0x007F)) {
        out.write(c);
      }
      else if (c > 0x07FF) {
        out.write(0xE0 | ((c >> 12) & 0x0F));
        out.write(0x80 | ((c >>  6) & 0x3F));
        out.write(0x80 | (c & 0x3F));
        written += 2;
      } 
      else {
        out.write(0xC0 | ((c >> 6) & 0x1F));
        out.write(0x80 | (c & 0x3F));
        written += 1;
      }
    }
    written += numchars + 2;
  }
  /**
   * Returns the number of bytes written to this little-endian output stream.
   * (This class is not thread-safe with respect to this method. It is 
   * possible that this number is temporarily less than the actual 
   * number of bytes written.)
   * @return  the value of the <code>written</code> field.
   * @see     com.macfaq.io.LittleEndianOutputStream#written
   */
  public int size() {
    return this.written;
  }
}
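As a sanity check on the byte order, the sketch below repeats the same shift-and-mask logic used for a four-byte int and compares the result with java.nio.ByteBuffer set to little-endian order. ByteBuffer arrived in Java 1.4, after this class was written, so treat it here only as an independent reference; the class and method names are hypothetical.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.Arrays;

class LittleEndianCheck {

  // Shift-and-mask encoding of an int, low byte first.
  static byte[] intToLittleEndian(int i) {
    return new byte[] {
      (byte) (i & 0xFF),
      (byte) ((i >>> 8) & 0xFF),
      (byte) ((i >>> 16) & 0xFF),
      (byte) ((i >>> 24) & 0xFF)
    };
  }

  // Reference encoding produced by java.nio.
  static byte[] viaByteBuffer(int i) {
    return ByteBuffer.allocate(4).order(ByteOrder.LITTLE_ENDIAN).putInt(i).array();
  }

  public static void main(String[] args) {
    int value = 0x12345678;
    byte[] manual = intToLittleEndian(value);
    // The low byte 0x78 comes first and the high byte 0x12 comes last.
    System.out.println(Integer.toHexString(manual[0] & 0xFF));          // 78
    System.out.println(Integer.toHexString(manual[3] & 0xFF));          // 12
    System.out.println(Arrays.equals(manual, viaByteBuffer(value)));    // true
  }
}
```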

Additional content appearing in this section has been removed.

Thread Safety

The LittleEndianInputStream class is not perfectly thread-safe. Consider the readInt() method:

public int readInt() throws IOException {
    int byte1 = in.read();
    int byte2 = in.read();
    int byte3 = in.read();
    int byte4 = in.read();
    if (byte4 == -1  || byte3 == -1 || byte2 == -1 || byte1 == -1) {
      throw new EOFException();
    }
    return (byte4 << 24) + (byte3 << 16) + (byte2 << 8) + byte1;
  }

If two threads are trying to read from this input stream at the same time, there is no guarantee that bytes 1 through 4 will be read in order. The first thread might read bytes 1 and 2, then the second thread could preempt it and read any number of bytes. When the first thread regained control, it would no longer be able to read bytes 3 and 4, but would read whichever bytes happened to be next in line. It would then return an erroneous result.

A synchronized block would solve this problem neatly:

public int readInt() throws IOException {
  int byte1, byte2, byte3, byte4;
    
  synchronized (this) {
    byte1 = in.read();
    byte2 = in.read();
    byte3 = in.read();
    byte4 = in.read();
  }
  if (byte4 == -1  || byte3 == -1 || byte2 == -1 || byte1 == -1) {
    throw new EOFException();
  }
  return (byte4 << 24) + (byte3 << 16) + (byte2 << 8) + byte1;
}

It isn't necessary to synchronize the entire method, only the four lines that read from the underlying stream. However, this solution is still imperfect. It is remotely possible that another thread has a reference to the underlying stream rather than the little-endian input stream and will try to read directly from that. Therefore, you might be better off synchronizing on the underlying input stream in.

However, this would only prevent another thread from reading from the underlying input stream if the second thread also synchronized on the underlying input stream. In general you can't count on this, so it's not really a solution. In fact, Java really doesn't provide a good means to guarantee thread safety when you have to modify objects you don't control passed as arguments to your methods.
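For comparison, here is roughly what synchronizing on the underlying stream might look like. It is a sketch only: SyncOnSourceDemo is a hypothetical stand-in for the little-endian input stream, and, as just noted, the lock helps only against threads that also synchronize on the same stream object.

```java
import java.io.ByteArrayInputStream;
import java.io.EOFException;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;

class SyncOnSourceDemo extends FilterInputStream {

  SyncOnSourceDemo(InputStream in) {
    super(in);
  }

  // Reads a little-endian int while holding the lock of the underlying stream.
  public int readInt() throws IOException {
    int byte1, byte2, byte3, byte4;
    synchronized (in) {  // lock the source, not this filter
      byte1 = in.read();
      byte2 = in.read();
      byte3 = in.read();
      byte4 = in.read();
    }
    if (byte4 == -1 || byte3 == -1 || byte2 == -1 || byte1 == -1) {
      throw new EOFException();
    }
    return (byte4 << 24) + (byte3 << 16) + (byte2 << 8) + byte1;
  }

  // Convenience wrapper for the demonstration: decodes four bytes from an array.
  static int decode(byte[] raw) {
    try {
      return new SyncOnSourceDemo(new ByteArrayInputStream(raw)).readInt();
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  public static void main(String[] args) {
    byte[] raw = {0x78, 0x56, 0x34, 0x12};
    System.out.println(Integer.toHexString(decode(raw)));  // 12345678
  }
}
```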

Additional content appearing in this section has been removed.

File Viewer, Part 3

In Chapter 4, I introduced a FileDumper program that could print the raw bytes of a file in ASCII, hexadecimal, or decimal. In this chapter, I'm going to expand that program so that it can interpret the file as containing binary numbers of varying widths. In particular I'm going to make it possible to dump a file as shorts, unsigned shorts, ints, longs, floats, and doubles. Integer types may be either big-endian or little-endian. The main class, FileDumper3, is shown in Example 7.10. As in Chapter 4, this program reads a series of filenames and arguments from the command line in the main() method. Each filename is passed to a method that opens a file input stream from the file. Depending on the command-line arguments, a particular subclass of DumpFilter from Chapter 6 is selected and chained to the input stream. Finally, the StreamCopier.copy() method pours data from the input stream onto System.out.

Example 7.10. The FileDumper3 Class

import java.io.*;
import com.macfaq.io.*;
public class FileDumper3 {
  public static final int ASC = 0;
  public static final int DEC = 1;
  public static final int HEX = 2;
  public static final int SHORT = 3;
  public static final int INT = 4;
  public static final int LONG = 5;
  public static final int FLOAT = 6;
  public static final int DOUBLE = 7;
  
  public static void main(String[] args) {
    if (args.length < 1) {
      System.err.println(
       "Usage: java FileDumper3 [-ahdsilfx] [-little] file1 file2...");
    }
    boolean bigEndian = true; 
    int firstFile = 0;
    int mode = ASC;
    // Process command-line switches.
    for (firstFile = 0; firstFile < args.length; firstFile++) {
      if (!args[firstFile].startsWith("-")) break;
      if (args[firstFile].equals("-h")) mode = HEX;
      else if (args[firstFile].equals("-d")) mode = DEC;
      else if (args[firstFile].equals("-s")) mode = SHORT;
      else if (args[firstFile].equals("-i")) mode = INT;
      else if (args[firstFile].equals("-l")) mode = LONG;
      else if (args[firstFile].equals("-f")) mode = FLOAT;
      else if (args[firstFile].equals("-x")) mode = DOUBLE;
      else if (args[firstFile].equals("-little")) bigEndian = false;
    }
    
    for (int i = firstFile; i < args.length; i++) {
      try {
        InputStream in = new FileInputStream(args[i]);
        dump(in, System.out, mode, bigEndian);
        
        if (i < args.length-1) {  // more files to dump
          System.out.println();
          System.out.println("--------------------------------------");
          System.out.println();
        }
      }
      catch (Exception e) {
        System.err.println(e);
        e.printStackTrace();
      }
    }
  }
  
  public static void dump(InputStream in, OutputStream out, int mode, 
   boolean bigEndian) throws IOException {
    
    // The reference variable in may point to several different objects
    // within the space of the next few lines. We can attach
    // more filters here to do decompression, decryption, and more.
      
    if (bigEndian) {
      DataInputStream din = new DataInputStream(in);
      switch (mode) {
        case HEX: 
          in = new HexFilter(in);
          break;
        case DEC: 
          in = new DecimalFilter(in);
          break;
        case INT: 
          in = new IntFilter(din);
          break;
        case SHORT: 
          in = new ShortFilter(din);
          break;
        case LONG: 
          in = new LongFilter(din);
          break;
        case DOUBLE: 
          in = new DoubleFilter(din);
          break;
        case FLOAT: 
          in = new FloatFilter(din);
          break;
        default:
      }
    }
    else {
      LittleEndianInputStream lin = new LittleEndianInputStream(in);
      switch (mode) {
        case HEX: 
          in = new HexFilter(in);
          break;
        case DEC: 
          in = new DecimalFilter(in);
          break;
        case INT: 
          in = new LEIntFilter(lin);
          break;
        case SHORT: 
          in = new LEShortFilter(lin);
          break;
        case LONG: 
          in = new LELongFilter(lin);
          break;
        case DOUBLE: 
          in = new LEDoubleFilter(lin);
          break;
        case FLOAT: 
          in = new LEFloatFilter(lin);
          break;
        default:  
      }
    }   
    
    StreamCopier.copy(in, out);
    in.close();
  }
}

Additional content appearing in this section has been removed.

Chapter 8: Streams in Memory

In the last several chapters, you've learned how to use streams to move data between a running Java program and external programs and stores. Streams can also be used to move data from one part of a Java program to another. This chapter explores three such methods. Sequence input streams chain several input streams together so that they appear as a single stream. Byte array streams allow output to be stored in byte arrays and input to be read from byte arrays. Finally, piped input and output streams allow output from one thread to become input for another thread.

The java.io.SequenceInputStream class connects multiple input streams together in a particular order:

public class SequenceInputStream extends InputStream

Reads from a SequenceInputStream first read all the bytes from the first stream in the sequence, then all the bytes from the second stream in the sequence, then all the bytes from the third stream, and so on. When the end of one of the streams is reached, that stream is closed; the next data comes from the next stream. Of course, this assumes that the streams in the sequence are in fact finite. There are two constructors for this class:

public SequenceInputStream(Enumeration e)
public SequenceInputStream(InputStream in1, InputStream in2)

The first constructor creates a sequence out of all the elements of the Enumeration e. This assumes all objects in the enumeration are input streams. If this isn't the case, a ClassCastException will be thrown the first time a read is attempted from an object that is not an InputStream. The second constructor creates a sequence input stream that reads first from in1, then from in2. Note that in1 or in2 may themselves be sequence input streams, so repeated application of this constructor allows a sequence input stream with an indefinite number of underlying streams to be created. For example, to read the home pages of both JavaSoft and AltaVista, you might do this:

Additional content appearing in this section has been removed.

Sequence Input Streams

The java.io.SequenceInputStream class connects multiple input streams together in a particular order:

public class SequenceInputStream extends InputStream

public SequenceInputStream(Enumeration e)
public SequenceInputStream(InputStream in1, InputStream in2)

try {
  URL u1 = new URL("https://java.sun.com/");
  URL u2 = new URL("https://www.altavista.com");
  SequenceInputStream sin = new SequenceInputStream(u1.openStream(), 
    u2.openStream());
}
catch (IOException e) {
  //...
}
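The chaining behavior can also be tried without a network connection. The sketch below builds a sequence input stream from an Enumeration of in-memory streams and reads it to the end; the class SequenceDemo and the readAll() helper are hypothetical, and the generic Vector assumes Java 7 or later.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.SequenceInputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Vector;

class SequenceDemo {

  // Reads every part, in order, through a single SequenceInputStream.
  static String readAll(String... parts) {
    Vector<InputStream> streams = new Vector<>();
    for (String part : parts) {
      streams.add(new ByteArrayInputStream(part.getBytes(StandardCharsets.US_ASCII)));
    }
    ByteArrayOutputStream collected = new ByteArrayOutputStream();
    try (InputStream sin = new SequenceInputStream(streams.elements())) {
      int b;
      while ((b = sin.read()) != -1) {  // -1 only after the last stream ends
        collected.write(b);
      }
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    return new String(collected.toByteArray(), StandardCharsets.US_ASCII);
  }

  public static void main(String[] args) {
    System.out.println(readAll("abc", "def", "g"));  // abcdefg
  }
}
```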

Example 8.1 reads a series of filenames from the command line, creates a sequence input stream from file input streams for each file named, then copies the contents of all the files onto System.out. The SequenceInputStream class already provides the necessary layer of abstraction for this problem. There's nothing to be gained by constructing a new object that chains streams together and prints them. Therefore, this class only has a

Additional content appearing in this section has been removed.

Byte Array Streams

It's sometimes convenient to use stream methods to manipulate data in byte arrays. For example, you might receive an array of raw bytes that you want to interpret as double-precision, floating-point numbers. (This is common when using UDP to transfer data across the Internet, for one example.) The quickest way to do this is to use a DataInputStream. However, before you can create a data input stream, you first need to create a raw, byte-oriented stream. This is what the java.io.ByteArrayInputStream class gives you. Similarly, you might want to send a group of double-precision, floating-point numbers across the network with UDP. Before you can do this, you have to convert the numbers into bytes. The simplest solution is to use a data output stream chained to a java.io.ByteArrayOutputStream. By chaining the data output stream to a byte array output stream, you can write the binary form of the floating-point numbers into a byte array, then send the entire array in a single packet.

Byte array input and output streams are commonly used when sending and receiving UDP data over the Internet. Unlike the more common TCP data, which acts like the streams I discuss in this book, UDP data arrives in raw packets of bytes, which do not necessarily have any relation to the previous packet or the next packet. Each packet is just a group of bytes to be processed in isolation from other packets. Thus, you may get nothing for several seconds, or even minutes, and then suddenly have a few hundred numbers to deal with.

In Java, UDP data is sent and received via the java.net.DatagramSocket and java.net.DatagramPacket classes. The receive() method of the DatagramSocket class returns its data in a DatagramPacket, which is little more than a wrapper around a byte array. This byte array can be easily used as the source of a ByteArrayInputStream. UDP is discussed in more detail in Chapter 9 of my book Java Network Programming (O'Reilly & Associates, 1997).
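The conversion between doubles and a byte array might be sketched as follows. The names DoublePacketDemo, pack(), and unpack() are hypothetical, and the sockets are left out so that the example runs on its own; the array returned by pack() is what you would hand to a datagram packet, and the array taken from a received packet is what you would pass to unpack().

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

class DoublePacketDemo {

  // Converts the doubles to their eight-byte binary form, one after another.
  static byte[] pack(double... values) {
    ByteArrayOutputStream bout = new ByteArrayOutputStream(values.length * 8);
    try (DataOutputStream dout = new DataOutputStream(bout)) {
      for (double d : values) dout.writeDouble(d);
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    return bout.toByteArray();
  }

  // Interprets a byte array as a run of eight-byte doubles.
  static double[] unpack(byte[] payload) {
    double[] result = new double[payload.length / 8];
    try (DataInputStream din = new DataInputStream(new ByteArrayInputStream(payload))) {
      for (int i = 0; i < result.length; i++) result[i] = din.readDouble();
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    return result;
  }

  public static void main(String[] args) {
    byte[] payload = pack(1.5, -2.25, Math.PI);
    System.out.println(payload.length);  // 24
    for (double d : unpack(payload)) System.out.println(d);
  }
}
```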

Additional content appearing in this section has been removed.
Purchase this book now or read it online at Safari to get the whole thing!

Communicating Between Threads with Piped Streams

The java.io.PipedInputStream class and java.io.PipedOutputStream class provide a convenient means to move streaming data from one thread to another. Output from one thread becomes input for the other thread, as shown in Figure 8.1.

Figure 8.1: Data moving between threads with piped streams

public class PipedInputStream extends InputStream 
public class PipedOutputStream extends OutputStream

The PipedInputStream class has two constructors:

public PipedInputStream()
public PipedInputStream(PipedOutputStream source) throws IOException

The no-argument constructor creates a piped input stream that is not yet connected to a piped output stream. The second constructor creates a piped input stream that's connected to the piped output stream source.

The PipedOutputStream class also has two constructors:

public PipedOutputStream()
public PipedOutputStream(PipedInputStream sink) throws IOException

The no-argument constructor creates a piped output stream that is not yet connected to a piped input stream. The second constructor creates a piped output stream that's connected to the piped input stream sink.

Piped streams are normally created in pairs. The piped output stream becomes the underlying source for the piped input stream. For example:

PipedOutputStream pout = new PipedOutputStream();
PipedInputStream pin = new PipedInputStream(pout);

This simple example is a little deceptive, because these lines of code will normally be in different methods and perhaps even different classes. Some mechanism must be established to pass a reference to the PipedOutputStream into the thread that handles the PipedInputStream. Or you can create them in the same thread, then pass a reference to the connected stream into a separate thread. Alternatively, you can reverse the order:
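As a sketch of the reversed order, the class below creates the unconnected PipedInputStream first and passes it to the PipedOutputStream constructor, then hands the output end to a second thread. The class name PipeDemo and the method sumThroughPipe are hypothetical, and the lambda syntax assumes a JDK far newer than the one this chapter describes.

```java
import java.io.IOException;
import java.io.PipedInputStream;
import java.io.PipedOutputStream;
import java.io.UncheckedIOException;

// Hypothetical demo class; the names are not from the book.
class PipeDemo {

  // A producer thread writes the bytes 1..count (count <= 255) into the pipe;
  // the calling thread reads them back out and returns their sum.
  static int sumThroughPipe(int count) {
    try {
      PipedInputStream pin = new PipedInputStream();        // not yet connected
      PipedOutputStream pout = new PipedOutputStream(pin);  // connects the pair

      Thread producer = new Thread(() -> {
        // Closing the output end is what lets the reader see end of stream.
        try (PipedOutputStream out = pout) {
          for (int i = 1; i <= count; i++) out.write(i);
        } catch (IOException e) {
          throw new UncheckedIOException(e);
        }
      });
      producer.start();

      int sum = 0;
      try (PipedInputStream in = pin) {
        for (int b = in.read(); b != -1; b = in.read()) sum += b;
      }
      producer.join();
      return sum;
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException(e);
    }
  }
}
```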

Chapter 9: Compressing Streams

The java.util.zip package, shown in Figure 9.1, contains six stream classes and another half dozen assorted classes that read and write data in zip, gzip, and inflate/deflate formats. Java uses these classes to read and write JAR archives and to display PNG images. You can also use the java.util.zip classes as general-purpose compression and decompression utilities. Among other things, these classes make it trivial to write a simple file compression or decompression program.

Figure 9.1: The java.util.zip package hierarchy

The java.util.zip.Deflater and java.util.zip.Inflater classes provide compression and decompression services for all other classes. They are Java's compression and decompression engines. These classes support several related compression formats, including zlib, deflate, and gzip. These formats are documented in RFCs 1950, 1951, and 1952. (See ftp://ftp.uu.net/graphics/png/documents/zlib/zdoc-index.html.) They all use the Lempel-Ziv 1977 (LZ77) compression algorithm (named after the inventors, Jacob Ziv and Abraham Lempel), though each has a different way of storing metadata that describes an archive's contents. Since compression and decompression are extremely CPU-intensive operations, for the most part these classes are Java wrappers around native methods written in C. More precisely, these are wrappers around the zlib compression library written by Jean-Loup Gailly and Mark Adler. According to Greg Roelofs, writing on the zlib web page at https://www.cdrom.com/pub/infozip/zlib/, "zlib is designed to be a free, general-purpose, legally unencumbered—that is, not covered by any patents—lossless data-compression library for use on virtually any computer hardware and operating system."
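To give a feel for how the two engines are driven directly, here is a minimal sketch that compresses a byte array with a Deflater and restores it with an Inflater, using the default zlib format. The class name RawZlib, the method names, and the 512-byte working buffer are assumptions chosen for illustration.

```java
import java.io.ByteArrayOutputStream;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

// Hypothetical helper class; the names are not from the book.
class RawZlib {

  static byte[] deflate(byte[] input) {
    Deflater deflater = new Deflater();
    deflater.setInput(input);
    deflater.finish();                       // no more input will follow
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    byte[] chunk = new byte[512];
    while (!deflater.finished()) {
      int n = deflater.deflate(chunk);       // fills as much of chunk as it can
      out.write(chunk, 0, n);
    }
    deflater.end();                          // release native resources
    return out.toByteArray();
  }

  static byte[] inflate(byte[] compressed) {
    Inflater inflater = new Inflater();
    inflater.setInput(compressed);
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    byte[] chunk = new byte[512];
    try {
      while (!inflater.finished()) {
        int n = inflater.inflate(chunk);
        // Guard against looping forever on truncated input.
        if (n == 0 && (inflater.needsInput() || inflater.needsDictionary())) {
          throw new IllegalArgumentException("truncated or dictionary-dependent data");
        }
        out.write(chunk, 0, n);
      }
    } catch (DataFormatException e) {
      throw new IllegalArgumentException("not valid zlib data", e);
    } finally {
      inflater.end();
    }
    return out.toByteArray();
  }
}
```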

Inflaters and Deflaters

Without going into excessive detail, zip, gzip, and zlib all compress data in more or less the same way. Repeated bit sequences in the input data are replaced with pointers back to the first occurrence of that bit sequence. Other tricks are used, but this is basically how these compression schemes work and has certain implications for compression and decompression code. First, you can't randomly access data in a compressed file. To decompress the nth byte of data, you must first decompress bytes 1 through n-1 of the data. Second, a single twiddled bit doesn't just change the meaning of the byte it's part of. It also changes the meaning of bytes that come after it in the data, since subsequent bytes may be stored as copies of the previous bytes. Therefore, compressed files are much more susceptible to corruption than uncompressed files. For more general information about compression and archiving algorithms and formats, the

Compressing and Decompressing Streams

The Inflater and Deflater classes are a little raw for easy digestion. It would be more convenient to write uncompressed data onto an output stream and have it compressed by the stream itself, without having to worry about the mechanics of deflation. Similarly, it would be useful to have an input stream class that could read from a compressed file but return the uncompressed data. Java, in fact, has several classes that do exactly this. The java.util.zip.DeflaterOutputStream class is a filter stream that compresses the data it receives in deflated format before writing it out to the underlying stream. The java.util.zip.InflaterInputStream class inflates deflated data before passing it to the reading program. java.util.zip.GZIPInputStream and java.util.zip.GZIPOutputStream do the same thing except with the gzip format.
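The following sketch shows the stream classes doing that work in memory: data written to a GZIPOutputStream or DeflaterOutputStream comes out compressed, and reading the result through the matching input stream restores it. The class name StreamCompression and its method names are hypothetical; a byte array stands in for the file or socket a real program would use.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.util.zip.DeflaterOutputStream;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;
import java.util.zip.InflaterInputStream;

// Hypothetical helper class; the names are not from the book.
class StreamCompression {

  static byte[] gzip(byte[] data) {
    ByteArrayOutputStream sink = new ByteArrayOutputStream();
    try (OutputStream out = new GZIPOutputStream(sink)) {
      out.write(data);                 // compressed on the way through
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    return sink.toByteArray();
  }

  static byte[] gunzip(byte[] compressed) {
    try (InputStream in = new GZIPInputStream(new ByteArrayInputStream(compressed))) {
      return readFully(in);
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  static byte[] deflate(byte[] data) {
    ByteArrayOutputStream sink = new ByteArrayOutputStream();
    try (OutputStream out = new DeflaterOutputStream(sink)) {
      out.write(data);
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    return sink.toByteArray();
  }

  static byte[] inflate(byte[] compressed) {
    try (InputStream in = new InflaterInputStream(new ByteArrayInputStream(compressed))) {
      return readFully(in);
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  // Drain a stream into a byte array.
  private static byte[] readFully(InputStream in) throws IOException {
    ByteArrayOutputStream result = new ByteArrayOutputStream();
    byte[] buf = new byte[512];
    for (int n = in.read(buf); n != -1; n = in.read(buf)) result.write(buf, 0, n);
    return result.toByteArray();
  }
}
```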

DeflaterOutputStream is a filter stream that deflates data before writing it onto the underlying stream:

public class DeflaterOutputStream extends FilterOutputStream

Each stream uses a protected Deflater object called def to compress data stored in a protected internal buffer called buf:

protected Deflater def;
protected byte[] buf;

The same deflater must not be used in multiple streams at the same time, though Java takes no steps to guarantee this.

The underlying output stream that receives the deflated data, the deflater object def, and the length of the byte array buf are all set by one of the three DeflaterOutputStream constructors:

public DeflaterOutputStream(OutputStream out, Deflater def, int bufferLength)
public DeflaterOutputStream(OutputStream out, Deflater def)
public DeflaterOutputStream(OutputStream out)

The underlying output stream must be specified. The buffer length defaults to 512 bytes, and the

Working with Zip Files

Gzip and deflate are compression formats. Zip is both a compression and an archive format. This means that a single zip file may contain more than one file, along with the name, permissions, creation and modification dates, and other information about each file in the archive. This makes reading and writing zip archives somewhat more complex and somewhat less amenable to a stream metaphor than reading and writing deflated or gzipped files.

The java.util.zip.ZipFile class represents a file in the zip format. Such a file might be created by zip, PKZip, ZipIt, WinZip, or any of the many other zip programs. The java.util.zip.ZipEntry class represents a single file stored in such an archive.

public class ZipFile extends Object implements ZipConstants 
public class ZipEntry extends Object implements ZipConstants

The java.util.zip.ZipConstants interface that both these classes implement is a rare nonpublic interface that contains constants useful for reading and writing zip files. Most of these constants define the positions in a zip file where particular information, like the compression method used, is found. You don't need to concern yourself with it.

The ZipFile class contains two constructors. The first takes a filename as an argument. The second takes a java.io.File object as an argument. File objects will be discussed in Chapter 12; for now, I'll just use the constructor that accepts a filename. Functionally, these two constructors are similar.

public ZipFile(String filename) throws IOException
public ZipFile(File file) throws ZipException, IOException

ZipException is a subclass of IOException that generally indicates that data in the zip file doesn't fit the zip format. In this case, the zip exception's message will contain more details, like "invalid END header signature" or "cannot have more than one drive." While these may be useful to a zip expert, in general they indicate that the file is corrupted, and there's not much that can be done about it.
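Here is a small sketch of the filename constructor in use. It opens an archive, walks its entries with the entries() method, and collects their names. The class name ZipLister and both method names are hypothetical; writeSampleZip() exists only so the example has an archive to open, and it writes to a temporary file.

```java
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Enumeration;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;
import java.util.zip.ZipOutputStream;

// Hypothetical helper class; the names are not from the book.
class ZipLister {

  // List the names of the entries in the named zip file.
  static List<String> entryNames(String filename) {
    List<String> names = new ArrayList<>();
    try (ZipFile zip = new ZipFile(filename)) {
      Enumeration<? extends ZipEntry> entries = zip.entries();
      while (entries.hasMoreElements()) names.add(entries.nextElement().getName());
    } catch (IOException e) {   // also covers ZipException for malformed archives
      throw new UncheckedIOException(e);
    }
    return names;
  }

  // Test scaffolding: write a temporary archive with one entry per name.
  static String writeSampleZip(String... names) {
    try {
      File file = File.createTempFile("sample", ".zip");
      file.deleteOnExit();
      try (ZipOutputStream zout = new ZipOutputStream(new FileOutputStream(file))) {
        for (String name : names) {
          zout.putNextEntry(new ZipEntry(name));
          zout.write(name.getBytes(StandardCharsets.UTF_8));
          zout.closeEntry();
        }
      }
      return file.getPath();
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }
}
```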

Checksums

Compressed files are especially susceptible to corruption. While changing a bit from 0 to 1 or vice versa in a text file generally only affects a single character, changing a single bit in a compressed file often makes the entire file unreadable. Therefore, it's customary to store a checksum with the compressed file so that the recipient can verify that the file is intact. The zip format does this automatically, but you may wish to use manual checksums in other circumstances as well.

There are many different checksum schemes. A particularly simple example adds a parity bit to the data, typically 1 if the number of 1 bits is odd, 0 if the number of 1 bits is even. This checksum can be calculated by summing up the number of 1 bits and taking the remainder when that sum is divided by two. However, this scheme isn't very robust. It can detect single-bit errors, but in the face of bursts of errors, as often occur in transmissions over modems and other noisy connections, there's a 50/50 chance that corrupt data will be reported as correct.
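The parity calculation can be sketched in a few lines. The class name Parity and the method parityBit are hypothetical names used only for illustration.

```java
// Hypothetical helper class; the names are not from the book.
class Parity {

  // Returns 1 if the data contains an odd number of 1 bits, 0 if even.
  static int parityBit(byte[] data) {
    int ones = 0;
    for (byte b : data) {
      ones += Integer.bitCount(b & 0xFF);  // mask so sign extension adds no bits
    }
    return ones % 2;
  }
}
```

Flipping one bit of the data changes the result, but flipping two bits leaves it unchanged, which is why a parity bit misses half of all multi-bit errors.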

Better checksum schemes use more bits. For example, a 16-bit checksum could sum up the number of 1 bits and take the remainder modulo 65,536. This means that in the face of completely random data, there's only a 1 in 65,536 chance of corrupt data being reported as correct. This chance drops exponentially as the number of bits in the checksum increases. More mathematically sophisticated schemes can reduce the likelihood of a false positive even further. For more details about checksums, see "Everything you wanted to know about CRC algorithms, but were afraid to ask for fear that errors in your understanding might be detected," by Ross Williams, available from https://www.geocities.com/CapeCanaveral/Launchpad/3632/crcguide.htm. Of course, the advantage of a class library is that you only really need to understand the interface of the classes you use and what they do in broad perspective. You don't necessarily have to know all the technical details of the algorithms used inside the classes.
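As a sketch of both ideas, the class below implements the simple 16-bit bit-counting checksum described above and also shows the class-library route, feeding data to any java.util.zip.Checksum implementation such as CRC32 or Adler32. The class name Checksums and its method names are hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.util.zip.Adler32;
import java.util.zip.CRC32;
import java.util.zip.Checksum;

// Hypothetical helper class; the names are not from the book.
class Checksums {

  // The simple 16-bit scheme: count the 1 bits, modulo 65,536.
  static int onesCount16(byte[] data) {
    long ones = 0;
    for (byte b : data) ones += Integer.bitCount(b & 0xFF);
    return (int) (ones % 65536);
  }

  // Run the data through any Checksum implementation and return its value.
  static long of(Checksum algorithm, byte[] data) {
    algorithm.reset();
    algorithm.update(data, 0, data.length);
    return algorithm.getValue();
  }

  public static void main(String[] args) {
    byte[] data = "123456789".getBytes(StandardCharsets.US_ASCII);
    System.out.println("bit count: " + onesCount16(data));
    System.out.println("CRC-32:    " + Long.toHexString(of(new CRC32(), data)));
    System.out.println("Adler-32:  " + Long.toHexString(of(new Adler32(), data)));
  }
}
```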

JAR Files

Java 1.1 added support for Java ARchive files, JAR files for short. JAR files bundle the many different classes, images, and sound files an applet requires into a single file. It is generally faster for a web browser to download one JAR file than to download the individual files the archive contains, since only one HTTP connection is required. An applet stored in a JAR file, instead of as merely loose .class files, is embedded in a web page with an <applet> tag with an archive attribute pointing to the JAR file. For example:

<applet code=NavigationMenu archive="NavigationMenu.jar" width=400 height=80>
</applet>

The code attribute still says that the main class of this applet is called NavigationMenu. However, a Java 1.1 web browser, rather than asking the web server for the file NavigationMenu.class as a Java 1.0 web browser would, asks the web server for the file NavigationMenu.jar. Then the browser looks inside NavigationMenu.jar to find the file NavigationMenu.class. Only if it doesn't find NavigationMenu.class inside NavigationMenu.jar does it then go back to the web server and ask for NavigationMenu.class. Now suppose the NavigationMenu applet tries to load an image called menu.gif. The applet will look for this file inside the JAR archive too. It only has to make a new connection to the web server if it can't find menu.gif in the archive.

Sun wisely decided not to attempt to define a new file format for JAR files. Instead, they stuck with the tried-and-true zip format. This means that the classes, images, sounds, and other files stored inside a JAR archive can be compressed, making the applet even faster to download. This also means that standard tools like PKZip and standard zip libraries like java.util.zip can work with JAR files.
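A quick way to see this is to write an archive with the JAR classes and read it back with the plain zip classes. The sketch below does so in memory; the class name JarAsZip and its method names are hypothetical, and the manifest it writes contains only a version attribute.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import java.util.jar.Attributes;
import java.util.jar.JarEntry;
import java.util.jar.JarOutputStream;
import java.util.jar.Manifest;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;

// Hypothetical helper class; the names are not from the book.
class JarAsZip {

  // Build a one-entry JAR archive in memory.
  static byte[] makeJar(String entryName, byte[] content) {
    Manifest manifest = new Manifest();
    manifest.getMainAttributes().put(Attributes.Name.MANIFEST_VERSION, "1.0");
    ByteArrayOutputStream sink = new ByteArrayOutputStream();
    try (JarOutputStream jout = new JarOutputStream(sink, manifest)) {
      jout.putNextEntry(new JarEntry(entryName));
      jout.write(content);
      jout.closeEntry();
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    return sink.toByteArray();
  }

  // Read the archive with java.util.zip only; no JAR-specific classes.
  static List<String> zipEntryNames(byte[] archive) {
    List<String> names = new ArrayList<>();
    try (ZipInputStream zin = new ZipInputStream(new ByteArrayInputStream(archive))) {
      for (ZipEntry e = zin.getNextEntry(); e != null; e = zin.getNextEntry()) {
        names.add(e.getName());
      }
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    return names;
  }
}
```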

JAR files have also become Java's preferred means of distributing Java Beans and class libraries. For instance, the Java Cryptography Extension, discussed in the next chapter, is mostly a set of classes packed up in the file

File Viewer, Part 4

Because of the nature of filter streams, it is relatively straightforward to add decompression services to the FileDumper program last seen in Chapter 7. Generally, you'll want to decompress a file before dumping it. Adding decompression does not require a new dump filter. Instead, it simply requires passing the file through an inflater input stream before passing it to one of the dump filters. We'll let the user choose between gzipped and deflated files with the command-line switches -gzip and -deflated. When one of these switches is seen, the appropriate inflater input stream is selected; it is an error to select both. Example 9.15, FileDumper4, demonstrates.

Example 9.15. FileDumper4

import java.io.*;
import java.util.zip.*;
import com.macfaq.io.*;
public class FileDumper4 {
  public static final int ASC = 0;
  public static final int DEC = 1;
  public static final int HEX = 2;
  public static final int SHORT = 3;
  public static final int INT = 4;
  public static final int LONG = 5;
  public static final int FLOAT = 6;
  public static final int DOUBLE = 7;
  
  public static void main(String[] args) {
    if (args.length < 1) {
      System.err.println("Usage: java FileDumper4 [-ahdsilfx] [-little] "
                                      + "[-gzip|-deflated] file1...");
      return;  // nothing to dump
    }
    boolean bigEndian = true; 
    int firstFile = 0;
    int mode = ASC;
    boolean deflated = false;
    boolean gzipped = false;
    
    // Process command-line switches.
    for (firstFile = 0; firstFile < args.length; firstFile++) {
      if (!args[firstFile].startsWith("-")) break;
      if (args[firstFile].equals("-h")) mode = HEX;
      else if (args[firstFile].equals("-d")) mode = DEC;
      else if (args[firstFile].equals("-s")) mode = SHORT;
      else if (args[firstFile].equals("-i")) mode = INT;
      else if (args[firstFile].equals("-l")) mode = LONG;
      else if (args[firstFile].equals("-f")) mode = FLOAT;
      else if (args[firstFile].equals("-x")) mode = DOUBLE;
      else if (args[firstFile].equals("-little")) bigEndian = false;
      else if (args[firstFile].equals("-deflated") && !gzipped) deflated = true;
      else if (args[firstFile].equals("-gzip") && !deflated) gzipped = true;
    }
    
    for (int i = firstFile; i < args.length; i++) {
      try {
        InputStream in = new FileInputStream(args[i]);
        dump(in, System.out, mode, bigEndian, deflated, gzipped);
        
        if (i < args.length-1) {  // more files to dump
          System.out.println();
          System.out.println("--------------------------------------");
          System.out.println();
        }
      }
      catch (Exception e) {
        System.err.println(e);
        e.printStackTrace();
      }
    }
  }
  public static void dump(InputStream in, OutputStream out, int mode, 
   boolean bigEndian, boolean deflated, boolean gzipped) throws IOException {
    
    // The reference variable in may point to several different objects
    // within the space of the next few lines. We can attach
    //  more filters here to do decompression, decryption, and more.
    if (deflated) {
      in = new InflaterInputStream(in);
    }
    else if (gzipped) {
      in = new GZIPInputStream(in);
    }
    // could really pass to FileDumper3 at this point
    if (bigEndian) {
      DataInputStream din = new DataInputStream(in);
      switch (mode) {
        case HEX: 
          in = new HexFilter(in);
          break;
        case DEC: 
          in = new DecimalFilter(in);
          break;
        case INT: 
          in = new IntFilter(din);
          break;
        case SHORT: 
          in = new ShortFilter(din);
          break;
        case LONG: 
          in = new LongFilter(din);
          break;
        case DOUBLE: 
          in = new DoubleFilter(din);
          break;
        case FLOAT: 
          in = new FloatFilter(din);
          break;
        default:
      }
    }
    else {
      LittleEndianInputStream lin = new LittleEndianInputStream(in);
      switch (mode) {
        case HEX: 
          in = new HexFilter(in);
          break;
        case DEC: 
          in = new DecimalFilter(in);
          break;
        case INT: 
          in = new LEIntFilter(lin);
          break;
        case SHORT: 
          in = new LEShortFilter(lin);
          break;
        case LONG: 
          in = new LELongFilter(lin);
          break;
        case DOUBLE: 
          in = new LEDoubleFilter(lin);
          break;
        case FLOAT: 
          in = new LEFloatFilter(lin);
          break;
        default:  
      }
    }   
    StreamCopier.copy(in, out);
    in.close();
  }
}

Chapter 10: Cryptographic Streams

This chapter discusses filter streams that implement some sort of cryptography. The Java core API contains two of these in the java.security package, DigestInputStream and DigestOutputStream. There are two more cryptography streams in the javax.crypto package, CipherInputStream and CipherOutputStream. All four of these streams use an engine object to handle the filtering. DigestInputStream and DigestOutputStream use a MessageDigest object, while CipherInputStream and CipherOutputStream use a Cipher object. The streams rely on the programmer to properly initialize and, in the case of the digest streams, clean up after the engines. Therefore, we'll first look at the engine classes, then at the streams built around these engines.

In a sane world, these classes would all be part of the core API in a java.crypto package. Regrettably, U.S. export laws prohibit the export of cryptographic software without special permission. Therefore, the cryptography API and associated classes must be downloaded separately from the main JDK. Collectively these are called the Java Cryptography Extension, or JCE for short. To protect national security, you'll have to fill out a form promising you're not an international terrorist before you can download it. I feel safer already. If you're outside the United States and Canada, and you're one of the three people worldwide who actually respect U.S. export laws or who can't figure out how to penetrate the incredible security Sun has placed around JCE to make sure it doesn't fall into the hands of international terrorists, there are several third-party implementations of the JCE created outside the United States and thus not subject to its laws, including at least two free ones. These may not be completely synced with the beta release of the JCE 1.2 discussed here, but they should be close by the time you read this.

Although the initial version of the JCE worked with Java 1.1, the only version available from Sun at the time of this writing, JCE 1.2, requires Java 2 to run. The material in this chapter about message digests, hash functions, and digest streams applies to both Java 1.1 and 2. The remainder of the chapter, encryption and decryption mostly, only works in Java 2.

Hash Function Basics

Sometimes it's essential to know whether data has changed. For instance, crackers invading Unix systems often replace crucial files like /etc/passwd or /usr/ucb/cc with their own hacked versions that allow them to regain access to the system if the original hole they entered through is plugged. Therefore, if you discover your system has been penetrated, one of the first things you need to do is to replace any changed files. Of course, this raises the question of how you identify the changed files, especially since anybody who's capable of replacing system executables is more than capable of resetting the last-modified date of the files. You can keep an offline copy of the system files, but this is costly and difficult, especially since multiple copies need to be stored for long periods of time. If you don't discover a penetration until several months after it occurred, you may need to roll back the system files to that point in time. Recent backups are likely to have been made after the penetration occurred and thus are also likely to be compromised.

As a less threatening example, suppose you want to be notified whenever a particular web page changes. It's not hard to write a robot that connects to the site at periodic intervals, downloads the page, and compares it to a previously retrieved copy for changes. However, if you need to do this for hundreds or thousands of web pages, the space to store the pages becomes prohibitive. Email clients have similar needs. Many broken mail clients and mailing list managers send multiple copies of the same message. A mail client should recognize when multiple copies of the same message are being passed through the system and delete them. On an ISP level, it might be possible to use this as a spam filter by comparing messages sent to different customers.

All these tasks need a way to compare files at different times without storing the files themselves. You can write a special kind of method called a hash function that reads an indefinite number of sequential bytes and assigns a number to that sequence of bytes. This number is called a

The MessageDigest Class

The java.security.MessageDigest class is an abstract class that represents a hash code and its associated algorithm. Concrete subclasses (actually concrete subclasses of java.security.MessageDigestSpi, though the difference isn't relevant from a client's point of view) implement particular, professionally designed, well-known hash code algorithms. Thus, rather than constructing instances of this class directly, you ask the static MessageDigest.getInstance() factory method to provide an implementation of an algorithm with a particular name. Table 10.1 lists the standard names for message digest algorithms. Depending on which service providers are installed, you may or may not have all of these. The JDK 1.1 includes SHA-1 (which is the same as SHA) and MD5 but not MD2. RSA's payware Crypto-J cryptography library also supports MD2. (See https://www.rsa.com/rsa/products/jsafe/.)

Table 10.1: Message Digest Algorithms in Java 1.1
Name	Algorithm
SHA-1	The Secure Hash Algorithm, as defined in Secure Hash Standard, NIST FIPS 180-1 (National Institute of Standards and Technology Federal Information Processing Standards Publications 180-1); produces 20-byte digests; see `https://www.itl.nist.gov/div897/pubs/fip180-1.htm`
SHA	Another name for SHA-1
MD2	RSA-MD2 as defined in RFC 1319 and RFC 1423 (RFC 1423 corrects a mistake in RFC 1319); produces 16-byte digests; suitable for use with digital signatures; see `https://www.faqs.org/rfcs/rfc1319.html`
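As a brief sketch of the factory method in use, the class below asks for the SHA-1 algorithm by name, digests a byte array in one call, and formats the 20-byte result as hexadecimal. The class name Sha1 and the method sha1Hex are hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper class; the names are not from the book.
class Sha1 {

  static String sha1Hex(byte[] data) {
    try {
      MessageDigest md = MessageDigest.getInstance("SHA-1");
      byte[] digest = md.digest(data);       // 20 bytes for SHA-1
      StringBuilder hex = new StringBuilder(digest.length * 2);
      for (byte b : digest) hex.append(String.format("%02x", b & 0xFF));
      return hex.toString();
    } catch (NoSuchAlgorithmException e) {
      // Thrown only if no installed provider supplies the named algorithm.
      throw new IllegalStateException(e);
    }
  }

  public static void main(String[] args) {
    System.out.println(sha1Hex("abc".getBytes(StandardCharsets.US_ASCII)));
  }
}
```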

Digest Streams

The MessageDigest class isn't particularly hard to use, as I hope Example 10.1 and Example 10.2 demonstrated. It's flexible and can be used to calculate a digest for anything that can be converted into a byte array, such as a string, an array of floating point numbers, or the contents of a text area. Nonetheless, the input data almost always comes from streams. Therefore, the java.security package contains an input stream and an output stream class that each possess a MessageDigest object to calculate a digest for the stream as it is read or written. These are DigestInputStream and DigestOutputStream.

The DigestInputStream class is a subclass of FilterInputStream:

public class DigestInputStream extends FilterInputStream

DigestInputStream has all the usual methods of any input stream, like read(), skip(), and close(). It overrides two read() methods to do its filtering. Clients use these methods exactly as they use the read() methods of other input streams:

public int read() throws IOException
public int read(byte[] data, int offset, int length) throws IOException

DigestInputStream does not change the data it reads in any way. However, as each byte or group of bytes is read, it is fed as input to a MessageDigest object stored in the class as the protected digest field:

protected MessageDigest digest;

The digest field is normally set in the constructor:

public DigestInputStream(InputStream stream, MessageDigest digest)

For example:

URL u = new URL("https://java.sun.com");
DigestInputStream din = new DigestInputStream(u.openStream(),
           MessageDigest.getInstance("SHA"));

The digest is not cloned inside the class. Only a reference to it is stored. Therefore, the message digest used inside the stream should only be used by the stream. Simultaneous or interleaved use by other objects will corrupt the digest.
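The sketch below reads a stream to its end through a DigestInputStream and then asks the attached MessageDigest for the result. Because the stream holds the only reference to the digest object, nothing else can interfere with it. The class name StreamDigest and its method names are hypothetical, and a ByteArrayInputStream can stand in for the URL stream when trying it out.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.security.DigestInputStream;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper class; the names are not from the book.
class StreamDigest {

  // Read the whole stream and return its SHA-1 digest.
  static byte[] digestOf(InputStream source) {
    try {
      MessageDigest sha = MessageDigest.getInstance("SHA-1");
      try (DigestInputStream din = new DigestInputStream(source, sha)) {
        byte[] buf = new byte[1024];
        while (din.read(buf, 0, buf.length) != -1) {
          // Nothing to do here: each read() feeds the bytes to the digest.
        }
        return din.getMessageDigest().digest();
      }
    } catch (NoSuchAlgorithmException e) {
      throw new IllegalStateException(e);
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  static String toHex(byte[] bytes) {
    StringBuilder hex = new StringBuilder(bytes.length * 2);
    for (byte b : bytes) hex.append(String.format("%02x", b & 0xFF));
    return hex.toString();
  }
}
```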

Encryption Basics

In this section we begin discussing cryptography. The packages, classes, and methods discussed in this and following sections are part of Sun's separately available Java Cryptography Extension (JCE). As a standard extension to Java, the JCE cryptography classes live in the javax package rather than the java package. They are not part of the core API. You will need to download JCE from https://java.sun.com/products/jce/index.html and install it before continuing.

Because Sun is not legally allowed to export the JCE outside the U.S. and Canada, a number of third parties in other countries have implemented their own versions. In particular, Austria's Institute for Applied Information Processing and Communications has released the IAIK-JCE, which is free for noncommercial use and can be retrieved from https://jcewww.iaik.tu-graz.ac.at/products/jce/index.php. Also notable is the more-or-less open source Cryptix package, which can be downloaded from many mirror sites worldwide. See https://www.cryptix.org/.

There are many different kinds of codes and ciphers, both for digital and nondigital data. To be precise, a code encrypts data at word or higher levels. Ciphers encrypt data at the level of letters or, in the case of digital ciphers, bytes. Most ciphers replace each byte in the original, unencrypted data, called plaintext, with a different byte, thus producing encrypted data, called ciphertext. There are many different possible algorithms for determining how plaintext is transformed into ciphertext (encryption) and how the ciphertext is transformed back into plaintext (decryption).

All the algorithms discussed here, and included in the JCE, are key-based. The key is a sequence of bytes used to parameterize the cipher. The same algorithm will encrypt the same plaintext differently when a different key is used. Decryption also requires a key. Good algorithms make it effectively impossible to decrypt ciphertext without knowing the right key.
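The effect of the key can be seen in a short sketch. It encrypts the same plaintext with DES under two different eight-byte keys and decrypts with the matching key. The class name KeyedCipherDemo, the method crypt, and the choice of ECB mode with PKCS5Padding are assumptions made for illustration, and the code presumes a JDK whose bundled provider still supplies DES.

```java
import java.security.GeneralSecurityException;
import javax.crypto.Cipher;
import javax.crypto.spec.SecretKeySpec;

// Hypothetical demo class; the names are not from the book.
class KeyedCipherDemo {

  // mode is Cipher.ENCRYPT_MODE or Cipher.DECRYPT_MODE; keyBytes must be 8 bytes.
  static byte[] crypt(int mode, byte[] keyBytes, byte[] input) {
    try {
      Cipher des = Cipher.getInstance("DES/ECB/PKCS5Padding");
      des.init(mode, new SecretKeySpec(keyBytes, "DES"));
      return des.doFinal(input);
    } catch (GeneralSecurityException e) {
      // Covers a missing algorithm, a bad key, and bad padding on decryption.
      throw new IllegalStateException(e);
    }
  }
}
```

Encrypting one message under the keys "password" and "PASSWORD" yields two unrelated ciphertexts, and only the key used for encryption recovers the plaintext.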

The Cipher Class

The javax.crypto.Cipher class is a concrete class that encrypts arrays of bytes. The default implementation performs no encryption, but you'll never see this. You'll only receive subclasses that implement particular algorithms.

public class Cipher extends Object

The subclasses of Cipher that do real encryption are supplied by providers. Different providers can provide different sets of algorithms. For instance, an authoritarian government might only allow the installation of algorithms it knew how to crack, and create a provider that provided those algorithms and only those algorithms. A corporation might want to install algorithms that allowed for key recovery in the event that an employee left the company or forgot their password.

JDK 1.2 only includes the Sun provider that supplies no encryption schemes, though it does supply several digest algorithms. The JCE adds one more provider, SunJCE, which provides DES, triple DES (DESede), and password-based encryption (PBE). RSA's payware JSafe product has a security provider that provides the RSA, DES, DESede, RC2, RC4, and RC5 cipher algorithms. Ireland's Baltimore Technologies payware J/Crypto software has a security provider that provides the RSA, DES, DESede, RC2, RC4, and PBE cipher algorithms. Table 10.2 lists several of the available security providers and the algorithms they implement.

Table 10.2: Security Providers
Product (Company, Country)	URL	Digests	Ciphers	License

Cipher Streams

The Cipher class is the engine that powers encryption. Chapter 10 and Example 10.7 showed how this class could be used to encrypt and decrypt data read from a stream. The javax.crypto package also provides CipherInputStream and CipherOutputStream filter streams that use a Cipher object to encrypt or decrypt data passed through the stream. Like DigestInputStream and DigestOutputStream, they aren't a great deal of use in themselves. However, you can chain them in the middle of several other streams. For example, if you chain a GZIPOutputStream to a CipherOutputStream that is chained to a FileOutputStream, you can compress, encrypt, and write to a file, all with a single call to write(). This is shown in Figure 10.3. Similarly, you might read from a URL with the input stream returned by openStream(), decrypt the data read with a CipherInputStream, then check the decrypted data with a DigestInputStream, then finally pass it all into an InputStreamReader for conversion from ISO Latin-1 to Unicode. On the other side of the connection, a web server could read a file from its hard drive, write the file onto a socket with an output stream, calculate a digest with a DigestOutputStream, and encrypt the file with a CipherOutputStream.

Figure 10.3: The CipherOutputStream in the middle of a chain of filters
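The chain in Figure 10.3 can be sketched roughly as follows. This is a minimal illustration rather than the book's own listing: the class and method names are hypothetical, the cipher is assumed to be DES in ECB mode with PKCS5Padding, and the sink is any OutputStream (a FileOutputStream would be passed in to write to a file).

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.security.GeneralSecurityException;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;
import javax.crypto.Cipher;
import javax.crypto.CipherInputStream;
import javax.crypto.CipherOutputStream;
import javax.crypto.SecretKey;

// Hypothetical helper class; not part of the original text.
class CipherChainSketch {

  // Compresses the data, encrypts the compressed bytes, and writes them to the sink.
  static void writeCompressedAndEncrypted(byte[] data, SecretKey key, OutputStream sink)
      throws IOException, GeneralSecurityException {
    Cipher des = Cipher.getInstance("DES/ECB/PKCS5Padding");
    des.init(Cipher.ENCRYPT_MODE, key);
    // Closing the outermost stream finishes the GZIP data, then the cipher, then the sink.
    try (OutputStream out = new GZIPOutputStream(new CipherOutputStream(sink, des))) {
      out.write(data);
    }
  }

  // Reverses the chain: decrypt first, then decompress.
  static byte[] readDecryptedAndDecompressed(InputStream source, SecretKey key)
      throws IOException, GeneralSecurityException {
    Cipher des = Cipher.getInstance("DES/ECB/PKCS5Padding");
    des.init(Cipher.DECRYPT_MODE, key);
    ByteArrayOutputStream result = new ByteArrayOutputStream();
    try (InputStream in = new GZIPInputStream(new CipherInputStream(source, des))) {
      byte[] buffer = new byte[1024];
      int n;
      while ((n = in.read(buffer)) != -1) {
        result.write(buffer, 0, n);
      }
    }
    return result.toByteArray();
  }
}
```

A key for experimenting with this sketch could come from KeyGenerator.getInstance("DES").generateKey(); the same key must be supplied to both methods.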

CipherInputStream is a subclass of FilterInputStream.

public class CipherInputStream extends FilterInputStream

CipherInputStream has all the usual methods of any input stream, like read(), skip(), and close(). It overrides seven of these methods to do its filtering:

public int read() throws IOException
public int read(byte[] data) throws IOException
public int read(byte[] data, int offset, int length) throws IOException
public long skip(long n) throws IOException
public int available() throws IOException
public void close() throws IOException
public boolean markSupported()

Additional content appearing in this section has been removed.

File Viewer, Part 5

Handling a particular form of encryption in the FileDumper program is not hard. Handling the general case is. It's not that decryption is difficult. In fact, it's quite easy. However, most encryption schemes require more than simply providing a key. You also need to know an assortment of algorithm parameters, like initialization vector, salt, iteration count, and more. Higher-level protocols are usually used to pass this information between the encryption program and the decryption program. The most common type of protocol is to simply store the information unencrypted at the beginning of the encrypted file. You saw an example of this in the FileDecryptor and FileEncryptor programs. The FileEncryptor chose a random initialization vector and placed its length and the vector itself at the beginning of the encrypted file so the decryptor could easily find it.

For the next iteration of the FileDumper program, I am going to use the simplest available encryption scheme, DES in ECB mode with PKCS5Padding. Furthermore, the key will simply be the first eight bytes of the password. This is probably the least secure algorithm discussed in this chapter; however, it doesn't require an initialization vector, salt, or other meta-information to be passed between the encryptor and the decryptor. Because of the nature of filter streams, it is relatively straightforward to add decryption services to the FileDumper program, assuming you know the format in which the encrypted data is stored. Generally, you'll want to decrypt a file before dumping it. This does not require a new dump filter. Instead, I simply pass the file through a cipher input stream before passing it to one of the dump filters.
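As a rough sketch of the scheme just described, the following helper builds a DES/ECB/PKCS5Padding cipher whose key is the first eight bytes of a password and wraps an input stream for decryption. The class and method names are hypothetical, and converting the password to bytes with ISO Latin-1 is an assumption; the original program may do this differently.

```java
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import javax.crypto.Cipher;
import javax.crypto.CipherInputStream;
import javax.crypto.SecretKey;
import javax.crypto.SecretKeyFactory;
import javax.crypto.spec.DESKeySpec;

// Hypothetical helper class; not the FileDumper source.
class PasswordDesSketch {

  // Builds a DES cipher in ECB mode keyed by the first eight bytes of the password.
  static Cipher desCipher(int mode, String password) throws GeneralSecurityException {
    // Assumption: password characters map to bytes via ISO Latin-1.
    byte[] passwordBytes = password.getBytes(StandardCharsets.ISO_8859_1);
    // DESKeySpec reads only bytes 0 through 7 and rejects shorter arrays
    // with an InvalidKeyException.
    DESKeySpec spec = new DESKeySpec(passwordBytes);
    SecretKey key = SecretKeyFactory.getInstance("DES").generateSecret(spec);
    Cipher cipher = Cipher.getInstance("DES/ECB/PKCS5Padding");
    cipher.init(mode, key);
    return cipher;
  }

  // Wraps a stream so that everything read from it is decrypted on the fly.
  // The result can be handed to any dump filter in place of the raw stream.
  static InputStream decrypting(InputStream in, String password)
      throws GeneralSecurityException {
    return new CipherInputStream(in, desCipher(Cipher.DECRYPT_MODE, password));
  }
}
```

Because only the first eight bytes matter, two passwords that share an eight-character prefix produce the same key, which is one reason this scheme is weak.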

When a file is both compressed and encrypted, compression is usually performed first. Therefore, we'll always decompress after decrypting. The reason is twofold. Since encryption schemes make data appear random, and compression works by taking advantage of redundancy in nonrandom data, it is difficult, if not impossible, to compress encrypted files. In fact, one quick test of how good an encryption scheme is checks whether encrypted files are compressible; if they are, it's virtually certain the encryption scheme is flawed and can be broken. Conversely, compressing files before encrypting them removes redundancy from the data that a code breaker can exploit. Therefore, it may serve to shore up some weaker algorithms. On the other hand, some algorithms have been broken by taking advantage of magic numbers and other known plaintext sequences that some compression programs insert into the compressed data. Thus, there's no guarantee that compressing files before encrypting them will make them harder to penetrate. The best option is simply to use the strongest encryption that's available to you.

Additional content appearing in this section has been removed.

Chapter 11: Object Serialization

The last several chapters have shown you how to read and write Java's fundamental data types (byte, int, String, etc.). However, there's been one glaring omission. Java is a fully object-oriented language; and yet aside from the special case of strings, you haven't seen any general-purpose methods for reading or writing objects.

Object serialization, first used in the context of Remote Method Invocation (RMI) and later for JavaBeans, addresses this need. The java.io.ObjectOutputStream class provides a writeObject() method you can use to write a Java object onto a stream. The java.io.ObjectInputStream class has a readObject() method you can use to read an object from a stream. In this chapter you'll learn how to use these two classes to read and write objects as well as how to customize the format used for serialization.

Object serialization saves an object's state in a sequence of bytes so that the object can be reconstituted from those bytes at a later time. Serialization in Java was first developed for use in RMI. RMI allows an object in one virtual machine to invoke methods in an object in another virtual machine, possibly in a different computer on the other side of the planet, by sending arguments and return values across the Internet. This requires a way to convert those arguments and return values to and from byte streams. It's a trivial task for primitive data types, but you need to be able to convert objects as well. That's what object serialization provides.

Object serialization is also used in the JavaBeans component software architecture. Bean classes are loaded into visual builder tools like the BeanBox (shown in Figure 11.1) or Borland's JBuilder. The designer then customizes the beans by assigning fonts, sizes, text, and other properties to each bean and connects them together with events. For instance, a button bean generally has a label property that is encoded as a string of text ("Start" in the button in Figure 11.1). The designer can change this text.

Additional content appearing in this section has been removed.

Reading and Writing Objects

Figure 11.1: The BeanBox showing a Juggler bean and an ExplicitButton bean

Once the designer has assembled and customized the beans, the form containing all the beans must be saved. It's not enough to save the bean classes themselves; the customizations that have been applied to the beans must also be saved. That's where serialization comes in: it stores the bean as an object and thus includes any customizations, which are nothing more than the values of the bean's fields. The customized beans are stored in a .ser file, which is often placed inside a JAR archive. This JAR archive can then be loaded into web browsers as an applet; then both the classes and the objects used by the applet are loaded into the virtual machine. Thus, instead of having to write long

Additional content appearing in this section has been removed.

Object Streams

Objects are serialized by object output streams. They are deserialized by object input streams. These are instances of java.io.ObjectOutputStream and java.io.ObjectInputStream, respectively:

public class ObjectOutputStream extends OutputStream 
  implements ObjectOutput, ObjectStreamConstants 
public class ObjectInputStream extends InputStream 
  implements ObjectInput, ObjectStreamConstants

The ObjectOutput interface is a subinterface of java.io.DataOutput that declares the basic methods used to write objects and data. The ObjectInput interface is a subinterface of java.io.DataInput that declares the basic methods used to read objects and data. java.io.ObjectStreamConstants is an unimportant interface that merely declares mnemonic constants for "magic numbers" used in the object serialization. (A major goal of the object stream classes is shielding client programmers from details of the format used to serialize objects such as magic numbers.)

Although these classes are not technically filter streams, since they do not extend FilterOutputStream and FilterInputStream, they are chained to underlying streams in the constructors:

public ObjectOutputStream(OutputStream out) throws IOException
public ObjectInputStream(InputStream in) throws IOException

To write an object onto a stream, you chain an object output stream to the stream, then pass the object to the object output stream's writeObject() method:

public final void writeObject(Object o) throws IOException

For example:

try {
  Point p = new Point(34, 22);
  FileOutputStream fout = new FileOutputStream("point.ser");
  ObjectOutputStream oout = new ObjectOutputStream(fout);
  oout.writeObject(p);
  oout.close();
}
catch (Exception e) {System.err.println(e);}

Later, the object can be read back using the readObject() method of the ObjectInputStream class:

public final Object readObject() 
            throws OptionalDataException, ClassNotFoundException, IOException
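Putting the two halves together, a round trip through a file might look like the sketch below. The class and method names are hypothetical; the file name point.ser and the java.awt.Point value are taken from the example above.

```java
import java.awt.Point;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;

// Hypothetical helper class; not part of the original text.
class PointFileSketch {

  // Serializes one object into the given file, replacing any previous contents.
  static void save(Object o, File file) throws IOException {
    try (ObjectOutputStream oout = new ObjectOutputStream(new FileOutputStream(file))) {
      oout.writeObject(o);
    }
  }

  // Reads back the first object stored in the file.
  static Object load(File file) throws IOException, ClassNotFoundException {
    try (ObjectInputStream oin = new ObjectInputStream(new FileInputStream(file))) {
      return oin.readObject();
    }
  }

  public static void main(String[] args) throws IOException, ClassNotFoundException {
    File file = new File("point.ser");
    save(new Point(34, 22), file);
    // readObject() returns Object, so the caller casts to the expected type.
    Point p = (Point) load(file);
    System.out.println(p.x + ", " + p.y);
  }
}
```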

Additional content appearing in this section has been removed.

How Object Serialization Works

Objects possess state. This state is stored in the values of the nonstatic, nontransient fields of an object's class. Consider this TwoDPoint class:

public class TwoDPoint {
  public double x;
  public double y;
}

Every object of this class has a state defined by the values of the double fields x and y. If you know the values of those fields, you know the value of the TwoDPoint. Nothing changes if you add some methods to the class or make the fields private, as in Example 11.1.

Example 11.1. The TwoDPoint Class

public class TwoDPoint {
  private double x;
  private double y;
  public TwoDPoint(double x, double y) {
    this.x = x;
    this.y = y;
  }
  public double getX() {
    return x;
  }
  public double getY() {
    return y;
  }
  
  public void setX(double x) {
    this.x = x;
  }
  public void setY(double y) {
    this.y = y;
  }
  
  public String toString() {
    return "[TwoDPoint:x=" + this.x + ", y=" + this.y + "]";
  }
}

The object information, the information stored in the fields, is still the same. If you know the values of x and y, you know everything there is to know about the state of the object. The methods only affect the actions an object can perform. They do not change what an object is. Now suppose you wanted to save the state of a particular point object by writing a sequence of bytes onto a stream. This process is called serialization, since the object is serialized into a sequence of bytes. You could add a writeState() method to your class that looked something like this:

public void writeState(OutputStream out) throws IOException {
  DataOutputStream dout = new DataOutputStream(out);
  dout.writeDouble(x);
  dout.writeDouble(y);
}

To restore the state of a TwoDPoint object, you could add a readState() method like this:

public void readState(InputStream in) throws IOException {
  DataInputStream din = new DataInputStream(in);
  this.x = din.readDouble();
  this.y = din.readDouble();
}
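To see the two methods work together, here is a self-contained sketch that writes a point into a byte array and reads it back into a second point. The trimmed-down TwoDPoint below and its main() method are illustrative only; the byte array streams stand in for whatever streams a real program would use.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

// Trimmed-down version of Example 11.1 with the hand-written state methods added.
class TwoDPoint {
  private double x;
  private double y;

  TwoDPoint(double x, double y) {
    this.x = x;
    this.y = y;
  }

  double getX() { return x; }
  double getY() { return y; }

  // Writes the two fields in a fixed order: x first, then y.
  void writeState(OutputStream out) throws IOException {
    DataOutputStream dout = new DataOutputStream(out);
    dout.writeDouble(x);
    dout.writeDouble(y);
    dout.flush();
  }

  // Reads the fields back in exactly the order writeState() wrote them.
  void readState(InputStream in) throws IOException {
    DataInputStream din = new DataInputStream(in);
    this.x = din.readDouble();
    this.y = din.readDouble();
  }

  public static void main(String[] args) throws IOException {
    TwoDPoint original = new TwoDPoint(3.5, -1.25);
    ByteArrayOutputStream buffer = new ByteArrayOutputStream();
    original.writeState(buffer);

    TwoDPoint copy = new TwoDPoint(0, 0);
    copy.readState(new ByteArrayInputStream(buffer.toByteArray()));
    System.out.println(buffer.size() + " bytes: " + copy.getX() + ", " + copy.getY());
  }
}
```

Each double occupies eight bytes, so the saved state of one point is sixteen bytes with no class name or other metadata attached.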

Additional content appearing in this section has been removed.

Performance

Serialization is often the easiest way to save the state of your program. You simply write out the objects you're using, then read them back in when you're ready to restore the document. There are downsides, however. First of all, serialization is slow. If you can define a custom file format for your application's documents, using that format will almost certainly be much faster than object serialization.

Second, serialization can slow or prevent garbage collection. Every time an object is written onto an object output stream, the stream holds on to a reference to the object. Then, if the same object is written onto the same stream again, it can be replaced with a reference to its first occurrence in the stream. However, this means that your program holds on to live references to the objects it has written until the stream is reset or closed—which means these objects won't be garbage-collected. The worst-case scenario is when you keep a stream open as long as your program runs and write every object you create onto the stream. This prevents any objects from being garbage-collected.

The easy solution is to avoid keeping a running stream of the objects you create. Instead, save the entire state only when the entire state is available, and then close the stream immediately.

If this isn't possible, you have the option to reset the stream by invoking its reset() method:

public void reset() throws IOException

reset() flushes the ObjectOutputStream object's internal cache of the objects it has already written so they can be garbage-collected. However, this also means that an object may be written onto the stream more than once, so use this method with caution.
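The effect of reset() on the stream's cache can be observed with a small experiment. In this sketch (the class name and the use of an int array as the mutable object are illustrative choices), the same array is written twice, with a change in between, and then both copies are read back.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;

// Hypothetical demonstration class; not part of the original text.
class ResetSketch {

  // Writes the same array twice, changing it in between, and returns
  // the two objects that come back when the stream is read.
  static int[][] writeTwice(boolean resetBetween) throws IOException, ClassNotFoundException {
    int[] counter = {1};
    ByteArrayOutputStream buffer = new ByteArrayOutputStream();
    try (ObjectOutputStream out = new ObjectOutputStream(buffer)) {
      out.writeObject(counter);
      counter[0] = 2;
      if (resetBetween) {
        out.reset(); // forget that counter was already written
      }
      out.writeObject(counter);
    }
    try (ObjectInputStream in =
        new ObjectInputStream(new ByteArrayInputStream(buffer.toByteArray()))) {
      int[] first = (int[]) in.readObject();
      int[] second = (int[]) in.readObject();
      return new int[][] {first, second};
    }
  }
}
```

Without the reset, the second write is only a back-reference, so both reads return the same array holding the original value. With the reset, the array is written in full a second time and the reader gets two distinct arrays, the second one reflecting the change.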

Additional content appearing in this section has been removed.

The Serializable Interface

Unlimited serialization would introduce some security problems. For one thing, it allows unrestricted access to an object's private fields. By chaining an object output stream to a byte array output stream, a hacker can convert an object into a byte array. The byte array can be manipulated and modified without any access protection or security manager checks. Then the byte array can be reconstituted into a Java object by using it as the source of a byte array input stream.

Security isn't the only potential problem. Some objects exist only as long as the current program is running. A java.net.Socket object represents an active connection to a remote host. Suppose a socket is serialized to a file, and the program exits. Later the socket is deserialized from the file in a new program—but the connection it represents no longer exists. Similar problems arise with file descriptors, I/O streams, and many more classes.

For these and other reasons, Java does not allow instances of arbitrary classes to be serialized. You can only serialize instances of classes that implement the java.io.Serializable interface. By implementing this interface, a class indicates that it may be serialized without undue problems.

public interface Serializable

This interface does not declare any methods or fields; it serves purely to indicate that a class may be serialized. You should recall, however, that subclasses of a class that implements a particular interface also implement that interface by inheritance. Thus, many classes that do not explicitly declare that they implement Serializable are in fact serializable. For instance, java.awt.Component implements Serializable. Therefore, its direct and indirect subclasses, including Button, Scrollbar, TextArea, List, Container, Panel, java.applet.Applet, all subclasses of Applet, and all Swing components may be serialized. java.lang.Throwable implements Serializable. Therefore, all exceptions and errors are serializable.
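A quick way to see the marker interface at work is to try writing different objects and watch for NotSerializableException. The classes below are hypothetical stand-ins invented for this sketch.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.NotSerializableException;
import java.io.ObjectOutputStream;
import java.io.Serializable;

// Hypothetical demonstration class; not part of the original text.
class SerializableCheck {

  // Does not implement Serializable, so writeObject() rejects it.
  static class Plain {
    int value = 1;
  }

  // Implements the marker interface explicitly.
  static class Marked implements Serializable {
    int value = 1;
  }

  // Never mentions Serializable, but inherits it from Marked.
  static class MarkedChild extends Marked {
  }

  // Returns true if the object can be written onto an object output stream.
  static boolean canSerialize(Object o) throws IOException {
    try (ObjectOutputStream out = new ObjectOutputStream(new ByteArrayOutputStream())) {
      out.writeObject(o);
      return true;
    } catch (NotSerializableException e) {
      return false;
    }
  }
}
```

With these definitions, canSerialize() should report false for a Plain instance and true for Marked, MarkedChild, and any exception object, since java.lang.Throwable implements Serializable.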

Additional content appearing in this section has been removed.

The ObjectInput and ObjectOutput Interfaces

As well as the ObjectInputStream and ObjectOutputStream classes, the java.io package also provides ObjectInput and ObjectOutput interfaces:

public interface ObjectInput extends DataInput 
public interface ObjectOutput extends DataOutput

These interfaces are not much used in Java 1.1 and 2. The only classes in the core API that actually implement them are ObjectInputStream and ObjectOutputStream. However, several methods used for customization of the serialization process are declared to accept ObjectInput or ObjectOutput objects as arguments, rather than specifically ObjectInputStream or ObjectOutputStream objects. This provides a little wiggle room for Java to grow in unforeseen ways.

The ObjectInput interface declares seven methods, all of which ObjectInputStream faithfully implements:

public abstract Object readObject() 
  throws ClassNotFoundException, IOException
public abstract int read() throws IOException
public abstract int read(byte[] data) throws IOException
public abstract int read(byte[] data, int offset, int length) 
  throws IOException
public abstract long skip(long n) throws IOException
public abstract int available() throws IOException
public abstract void close() throws IOException

The readObject() method has already been discussed in the context of object input streams. The other six methods behave exactly as they do for all input streams. In fact, at first glance, all these methods except readObject() appear superfluous, since any InputStream subclass will possess read(), skip(), available(), and close() methods with these signatures. However, this interface may be implemented by classes that aren't subclasses of InputStream.

The ObjectOutput interface declares the following six methods, all of which ObjectOutputStream faithfully implements. Except for writeObject(), which has already been discussed in the context of object output streams, these methods should behave exactly as they do for all output streams:

Additional content appearing in this section has been removed.

Versioning

When an object is written onto a stream, only the state of the object and the name of the object's class are stored; the byte codes for the object's class are not stored with the object. There's no guarantee that a serialized object will be deserialized into the same environment from which it was serialized. It's possible for the class definition to change between the time the object is written and the time it's read. For instance, a Component object may be written in Java 1.1 but read in Java 2. However, in Java 2 the Component class has three nonstatic, nontransient fields the 1.1 version of Component does not:

boolean inputMethodsEnabled;
DropTarget dropTarget;
private PropertyChangeSupport changeSupport;

There are even more differences when methods, constructors, and static and transient fields are considered. Not all changes, however, prevent deserialization. For instance, the values of static fields aren't saved when an object is serialized. Therefore, you don't have to worry about adding or deleting a static field to or from a class. Similarly, serialization completely ignores the methods in a class, so changing method bodies or adding or removing methods does not affect serialization. However, removing an instance field does affect serialization, because deserializing an object saved by the earlier version of the class will result in an attempt to set the value of a field that no longer exists.
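The point about static fields can be checked directly. In the sketch below (a hypothetical Greeting class invented for illustration), the static field is changed after the object has been serialized; deserialization restores the instance field but leaves the static field at whatever value the running program currently has.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

// Hypothetical demonstration class; not part of the original text.
class StaticFieldSketch {

  static class Greeting implements Serializable {
    static String language = "en"; // belongs to the class; never written to the stream
    String text;                   // belongs to the object; part of its serialized state

    Greeting(String text) {
      this.text = text;
    }
  }

  static byte[] toBytes(Object o) throws IOException {
    ByteArrayOutputStream buffer = new ByteArrayOutputStream();
    try (ObjectOutputStream out = new ObjectOutputStream(buffer)) {
      out.writeObject(o);
    }
    return buffer.toByteArray();
  }

  static Object fromBytes(byte[] bytes) throws IOException, ClassNotFoundException {
    try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(bytes))) {
      return in.readObject();
    }
  }

  public static void main(String[] args) throws Exception {
    byte[] saved = toBytes(new Greeting("hello"));
    Greeting.language = "fr"; // changed after the object was saved
    Greeting restored = (Greeting) fromBytes(saved);
    System.out.println(restored.text + " / " + Greeting.language);
  }
}
```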

Changes to a class are divided into two groups: compatible changes and incompatible changes. Compatible changes are those that do not affect the serialization format of the object, like adding a method or deleting a static field. Incompatible changes are those that do prevent a previously serialized object from being restored. Examples include deleting an instance field or changing the type of a field. As a general rule, any change that affects the signatures of the nontransient instance fields of a class is incompatible, while any change that does not affect the signatures of the nontransient instance fields of a class is compatible. However, there are a couple of exceptions. The following is a complete list of compatible changes:

Additional content appearing in this section has been removed.

Customizing the Serialization Format

The default serialization procedure does not always produce the results you want. Most often, a nonserializable field like a Socket or a FileOutputStream needs to be excluded from serialization. Sometimes, a class may contain data in nonserializable fields like a Socket that you nonetheless want to save—for example, the host that the socket's connected to. Or perhaps a singleton object wants to verify that no other instance of itself exists in the virtual machine before it's reconstructed. Or perhaps an incompatible change to a class (such as changing a Font field to three separate fields storing the font's name, style, and size) can be made compatible with a little programmer-supplied logic. Or perhaps you want an exceptionally large array of image data to be compressed before being written to disk. For these or many other reasons, you're allowed to customize the serialization process.

The simplest way to customize serialization is to declare certain fields transient. The values of transient fields will not be written onto the underlying output stream when an object in the class is serialized. However, this only goes as far as excluding certain information from serialization; it doesn't help you change the format that's used to store the data or take action on deserialization or ensure that no more than one instance of a singleton class is created.
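A minimal sketch of a transient field follows. The Login class and its fields are hypothetical; the point is only that the transient field comes back with its default value (null for a reference) after a round trip.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

// Hypothetical example class; not part of the original text.
class Login implements Serializable {
  String user;               // saved with the object
  transient String password; // skipped by the default serialization mechanism

  Login(String user, String password) {
    this.user = user;
    this.password = password;
  }

  // Serializes the object to a byte array and immediately reads a copy back.
  static Login roundTrip(Login original) throws IOException, ClassNotFoundException {
    ByteArrayOutputStream buffer = new ByteArrayOutputStream();
    try (ObjectOutputStream out = new ObjectOutputStream(buffer)) {
      out.writeObject(original);
    }
    try (ObjectInputStream in =
        new ObjectInputStream(new ByteArrayInputStream(buffer.toByteArray()))) {
      return (Login) in.readObject();
    }
  }
}
```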

For more control over the details of your class's serialization, you can provide custom readObject() and writeObject() methods. These are private methods that the virtual machine uses to read and write the data for your class. This gives you complete control over how objects in your class are written onto the underlying stream but does not require you to handle data stored in your objects' superclasses.
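Here is one way such a pair of private methods might look, using the socket scenario mentioned earlier. The HostLink class is hypothetical, and for simplicity it never actually opens a connection; it only shows the host and port being saved by hand while the Socket field itself is left out of the stream.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.net.Socket;

// Hypothetical example class; not part of the original text.
class HostLink implements Serializable {
  private transient Socket socket; // not serializable, so never written
  private transient String host;   // written by hand in writeObject()
  private transient int port;      // written by hand in writeObject()

  HostLink(String host, int port) {
    this.host = host;
    this.port = port;
    // Sketch only: a real class would open the socket here or on first use.
  }

  String getHost() { return host; }
  int getPort() { return port; }
  boolean isConnected() { return socket != null; }

  // Called by the serialization machinery instead of the default field writer.
  private void writeObject(ObjectOutputStream out) throws IOException {
    out.defaultWriteObject(); // writes any nontransient fields (none in this sketch)
    out.writeUTF(host);
    out.writeInt(port);
  }

  // Must read back exactly what writeObject() wrote, in the same order.
  private void readObject(ObjectInputStream in) throws IOException, ClassNotFoundException {
    in.defaultReadObject();
    this.host = in.readUTF();
    this.port = in.readInt();
    this.socket = null; // the old connection is gone; reconnect later if needed
  }

  static HostLink roundTrip(HostLink original) throws IOException, ClassNotFoundException {
    ByteArrayOutputStream buffer = new ByteArrayOutputStream();
    try (ObjectOutputStream out = new ObjectOutputStream(buffer)) {
      out.writeObject(original);
    }
    try (ObjectInputStream in =
        new ObjectInputStream(new ByteArrayInputStream(buffer.toByteArray()))) {
      return (HostLink) in.readObject();
    }
  }
}
```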

If you need even more control over the superclasses and everything else, you can implement the java.io.Externalizable interface, a subinterface of java.io.Serializable. When serializing an externalizable object, the virtual machine does almost nothing except identify the class. The class itself is completely responsible for reading and writing its state and its superclass's state in whatever format it chooses.
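As a sketch of the externalizable approach, the hypothetical class below writes and reads its two fields entirely by hand. Note the public no-argument constructor, which the serialization machinery needs in order to create the object before calling readExternal().

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.Externalizable;
import java.io.IOException;
import java.io.ObjectInput;
import java.io.ObjectInputStream;
import java.io.ObjectOutput;
import java.io.ObjectOutputStream;

// Hypothetical example class; not part of the original text.
class ExternalPoint implements Externalizable {
  private double x;
  private double y;

  // Required: deserialization creates the object with this constructor.
  public ExternalPoint() {
  }

  ExternalPoint(double x, double y) {
    this.x = x;
    this.y = y;
  }

  double getX() { return x; }
  double getY() { return y; }

  // The class alone decides what goes onto the stream and in what order.
  @Override
  public void writeExternal(ObjectOutput out) throws IOException {
    out.writeDouble(x);
    out.writeDouble(y);
  }

  @Override
  public void readExternal(ObjectInput in) throws IOException {
    this.x = in.readDouble();
    this.y = in.readDouble();
  }

  static ExternalPoint roundTrip(ExternalPoint original)
      throws IOException, ClassNotFoundException {
    ByteArrayOutputStream buffer = new ByteArrayOutputStream();
    try (ObjectOutputStream out = new ObjectOutputStream(buffer)) {
      out.writeObject(original);
    }
    try (ObjectInputStream in =
        new ObjectInputStream(new ByteArrayInputStream(buffer.toByteArray()))) {
      return (ExternalPoint) in.readObject();
    }
  }
}
```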

Additional content appearing in this section has been removed.

Resolving Classes

The readObject() method of java.io.ObjectInputStream only creates new objects from known classes. It doesn't load classes. If a class for an object can't be found, readObject() throws a ClassNotFoundException. It specifically does not attempt to read the class data from the object stream. This is limiting for some things you might want to do, particularly RMI. Therefore, trusted subclasses of ObjectInputStream may be allowed to load classes from the stream or some other source like a URL. Specifically, a class is trusted if, and only if, it was loaded from the local class path; that is, the ClassLoader object returned by getClassLoader() is null.

Two protected methods are involved. The first is the annotateClass() method of ObjectOutputStream:

protected void annotateClass(Class c) throws IOException

In ObjectOutputStream this is a do-nothing method. A subclass of ObjectOutputStream can provide a different implementation that provides data for the class. For instance, this might be the byte code of the class itself or a URL where the class can be found.

Standard object input streams cannot read and resolve the class data written by annotateClass(). For each subclass of ObjectOutputStream that overrides annotateClass(), there will normally be a corresponding subclass of ObjectInputStream that implements the resolveClass() method:

protected Class resolveClass(ObjectStreamClass v) 
  throws IOException, ClassNotFoundException

In java.io.ObjectInputStream, the default implementation of this method simply looks for a local class with the name recorded in the stream; it does not read any extra class data. A subclass of ObjectInputStream can provide an implementation that loads a class based on the data read from the stream. For instance, if annotateClass() wrote byte code to the stream, then the resolveClass() method would need to have a class loader that read the data from the stream. If annotateClass() wrote the URL of the class to the stream, then the resolveClass() method would need a class loader that read the URL from the stream and downloaded the class from that URL.
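A matching pair of subclasses might be sketched as follows. Everything here is illustrative: the class names and the code base URL are made up, and resolveClass() only records the URL it reads before falling back to the superclass's local lookup, where a fuller implementation might hand the URL to a class loader.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.ObjectStreamClass;
import java.io.OutputStream;
import java.io.Serializable;
import java.util.ArrayList;
import java.util.List;

// Hypothetical demonstration class; not part of the original text.
class AnnotatedStreams {

  // Made-up location where class files are assumed to live.
  static final String CODEBASE = "http://www.example.com/classes/";

  static class Note implements Serializable {
    String text = "hello";
  }

  // Writes a URL after every class descriptor it puts on the stream.
  static class AnnotatingOutputStream extends ObjectOutputStream {
    AnnotatingOutputStream(OutputStream out) throws IOException {
      super(out);
    }

    @Override
    protected void annotateClass(Class<?> c) throws IOException {
      writeUTF(CODEBASE);
    }
  }

  // Reads the URL back at the matching point in the stream.
  static class ResolvingInputStream extends ObjectInputStream {
    final List<String> seen = new ArrayList<>();

    ResolvingInputStream(InputStream in) throws IOException {
      super(in);
    }

    @Override
    protected Class<?> resolveClass(ObjectStreamClass desc)
        throws IOException, ClassNotFoundException {
      String codebase = readUTF(); // must mirror what annotateClass() wrote
      seen.add(desc.getName() + " from " + codebase);
      // Sketch only: load the class locally instead of downloading it.
      return super.resolveClass(desc);
    }
  }
}
```

The two methods have to agree exactly on what is written and read; a stream produced by AnnotatingOutputStream should be read with ResolvingInputStream, not with a plain ObjectInputStream.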

Additional content appearing in this section has been removed.

Resolving Objects

There may be occasions where you want to replace the objects read from the stream with other, alternative objects. Perhaps an old version of a program whose data you need to read used Franc objects, but the new version of the program uses Euro objects. The ObjectInputStream can replace each Franc object read with the equivalent Euro object.

Only trusted subclasses of ObjectInputStream may replace objects. A class is only trusted if it was loaded from the local class path; that is, the class loader returned by getClassLoader() is null. To make it possible for a trusted subclass to replace objects, you must first pass true to its enableResolveObject() method:

protected final boolean enableResolveObject(boolean enable) 
  throws SecurityException

Generally, you would do this in the constructor of any class that needed to replace objects. Once object replacement is enabled, whenever an object is read, it is passed to the ObjectInputStream subclass's resolveObject() method before readObject() returns:

protected Object resolveObject(Object o) throws IOException

The resolveObject() method may return the object itself (the default behavior) or return a different object. Resolving objects is a tricky business. The substituted object must be compatible with the use of the original object, or errors will soon surface as the program tries to invoke methods or access fields that don't exist. Most of the time, the replacing object is an instance of a subclass of the class of the replaced object. Another possibility is that the replacing object and the object it replaces are both instances of different subclasses of a common superclass or interface, where the original object was only used as an instance of that superclass or interface.
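The Franc-to-Euro replacement described above could be sketched like this. The Franc and Euro classes are hypothetical stand-ins with a single amount field, and the conversion uses the fixed rate of 6.55957 francs per euro.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.ObjectInputStream;
import java.io.Serializable;

// Hypothetical demonstration class; not part of the original text.
class CurrencyStreams {

  static final double FRANCS_PER_EURO = 6.55957;

  static class Franc implements Serializable {
    final double amount;
    Franc(double amount) { this.amount = amount; }
  }

  static class Euro implements Serializable {
    final double amount;
    Euro(double amount) { this.amount = amount; }
  }

  // Replaces every Franc read from the stream with the equivalent Euro.
  static class EuroInputStream extends ObjectInputStream {
    EuroInputStream(InputStream in) throws IOException {
      super(in);
      enableResolveObject(true); // without this, resolveObject() is never called
    }

    @Override
    protected Object resolveObject(Object o) throws IOException {
      if (o instanceof Franc) {
        return new Euro(((Franc) o).amount / FRANCS_PER_EURO);
      }
      return o; // everything else passes through unchanged
    }
  }
}
```

This works when the Franc is read directly by readObject(). If a Franc were stored in a field declared with type Franc, the substituted Euro could not be assigned to it, which is the kind of compatibility problem the text warns about.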

Additional content appearing in this section has been removed.

Validation

It is not always enough to merely restore the state of a serialized object. You may need to verify that the value of a field still makes sense, you may need to notify another object that this object has come into existence, or you simply may need to have the entire graph of the object available before you can finish initializing it.

For example, valid XML documents are essentially trees of elements combined with a document type definition (DTD). The DTD defines a grammar the document must follow. The Document Object Model (DOM) defines a means of representing XML (and HTML) documents as instances of Java classes and interfaces, including XMLNode, EntityReference, EntityDeclaration, DocumentType, ElementDefinition, AttributeDefinition, and others.

An XML document could be saved as a set of these serialized objects. In that case, when you deserialized the document, you would want to check that the deserialized document is still valid; that is, that the document adheres to the grammar given in the DTD. You can't do this until the entire document—all its elements, and its entire DTD—has been read. There are also a number of smaller checks you might want to perform. For instance, well-formedness (well-formedness is a slightly less stringent requirement than validity) requires that all entity references like &date; be defined in the DTD. To check this, it's not enough to have deserialized the EntityReference object. You must also have deserialized the corresponding DocumentType object that contains the necessary EntityDeclaration objects.

You can use the ObjectInputStream class's registerValidation() method to specify an ObjectInputValidation object that will be notified of the object after its entire graph has been reconstructed but before readObject() has returned it. This gives the validator an opportunity to make sure that the object doesn't violate any implicit assertions about the state of the system.

public synchronized void registerValidation(ObjectInputValidation oiv, 
  int priority) throws NotActiveException, InvalidObjectException
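A much smaller case than XML validation shows the mechanics. In this sketch, a hypothetical CheckedRange class registers itself as its own validator from inside a private readObject() method and rejects any deserialized instance whose bounds are out of order. The constructor deliberately performs no check so that an invalid object can be produced for testing.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InvalidObjectException;
import java.io.ObjectInputStream;
import java.io.ObjectInputValidation;
import java.io.ObjectOutputStream;
import java.io.Serializable;

// Hypothetical example class; not part of the original text.
class CheckedRange implements Serializable, ObjectInputValidation {
  int low;
  int high;

  CheckedRange(int low, int high) {
    this.low = low;
    this.high = high;
  }

  // registerValidation() may only be called while an object is being read,
  // which is why it appears inside readObject().
  private void readObject(ObjectInputStream in) throws IOException, ClassNotFoundException {
    in.registerValidation(this, 0);
    in.defaultReadObject();
  }

  // Invoked after the whole object graph is rebuilt, before readObject() returns it.
  @Override
  public void validateObject() throws InvalidObjectException {
    if (low > high) {
      throw new InvalidObjectException("low (" + low + ") exceeds high (" + high + ")");
    }
  }

  static byte[] toBytes(Object o) throws IOException {
    ByteArrayOutputStream buffer = new ByteArrayOutputStream();
    try (ObjectOutputStream out = new ObjectOutputStream(buffer)) {
      out.writeObject(o);
    }
    return buffer.toByteArray();
  }

  static Object fromBytes(byte[] bytes) throws IOException, ClassNotFoundException {
    try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(bytes))) {
      return in.readObject();
    }
  }
}
```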

Additional content appearing in this section has been removed.

Sealed Objects

The JCE standard extension to Java 2, discussed in the last chapter, provides a SealedObject class that lets you encrypt objects written onto an object output stream using any available cipher. Most of the time, I suspect, you'll either encrypt the entire object output stream by chaining it to a cipher output stream, or you won't encrypt anything at all. However, if there's some reason to encrypt only some of the objects you're writing to the stream, you can make them sealed objects.

The javax.crypto.SealedObject class wraps a serializable object in an encrypted digital lockbox. The sealed object is serializable so it can be written onto object output streams and read from object input streams as normal. However, the object inside the sealed object can only be deserialized by someone who knows the key.

public class SealedObject extends Object implements Serializable

The big advantage to using sealed objects rather than encrypting the entire output stream is that the sealed objects contain all necessary parameters for decryption (algorithm used, initialization vector, salt, iteration count). All the receiver of the sealed object needs to know is the key. Thus, there doesn't necessarily have to be any prior agreement about these other aspects of encryption.

You seal an object with the SealedObject() constructor. The constructor takes as arguments the object to be sealed, which must be serializable, and the properly initialized Cipher object with which to encrypt the object:

public SealedObject(Serializable object, Cipher c) 
  throws IOException, IllegalBlockSizeException
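A minimal sketch of sealing and unsealing follows. The class and method names are hypothetical, and DES in ECB mode with PKCS5Padding is assumed only because it needs no extra parameters; any properly initialized cipher should work the same way.

```java
import java.io.IOException;
import java.io.Serializable;
import java.security.GeneralSecurityException;
import javax.crypto.Cipher;
import javax.crypto.KeyGenerator;
import javax.crypto.SealedObject;
import javax.crypto.SecretKey;

// Hypothetical helper class; not part of the original text.
class SealedSketch {

  // Encrypts a serializable object inside a SealedObject.
  static SealedObject seal(Serializable payload, SecretKey key)
      throws IOException, GeneralSecurityException {
    Cipher cipher = Cipher.getInstance("DES/ECB/PKCS5Padding");
    cipher.init(Cipher.ENCRYPT_MODE, key);
    return new SealedObject(payload, cipher);
  }

  // Recovers the original object; only the key has to be supplied,
  // because the sealed object records the algorithm it was sealed with.
  static Object unseal(SealedObject sealed, SecretKey key)
      throws IOException, ClassNotFoundException, GeneralSecurityException {
    return sealed.getObject(key);
  }

  public static void main(String[] args) throws Exception {
    SecretKey key = KeyGenerator.getInstance("DES").generateKey();
    SealedObject sealed = seal("for your eyes only", key);
    System.out.println(unseal(sealed, key));
  }
}
```

The SealedObject returned by seal() is itself serializable, so it can be passed to writeObject() like any other object.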

Inside the constructor, the object is immediately serialized by an object output stream chained to a byte array output stream. The byte array is then encrypted using the Cipher object c and stored in a private field. The cipher's algorithms and parameters are also stored. Thus, the state of the original object written onto the ultimate object output stream is the state of the object when it was sealed; subsequent changes it may undergo between being sealed and being written are not reflected in the sealed object. Since serialization takes place immediately inside the constructor, the constructor throws a

Additional content appearing in this section has been removed.

Chapter 12: Working with Files

You've already learned how to read and write data in files using file input streams and file output streams. That's not all there is to files. Files can be created, moved, renamed, copied, deleted, and otherwise manipulated without respect to their contents. Files are also often associated with meta-information that's not strictly part of the contents of the file, such as the time the file was created, the icon for the file, the permissions that determine which users can read or write to the file, and even the name of the file.

While the abstraction of the contents of a file as an ordered sequence of bytes used by file input and output streams is almost standard across platforms, the meta-information is not. The java.io.File class attempts to provide a platform-independent abstraction for common file operations and meta-information. Unfortunately, this class really shows its Unix roots. It works well on Unix, adequately on Windows and OS/2—with a few caveats—and fails miserably on the Macintosh. Java 2 improves things, but there's still a lot of history—and coming up with something that genuinely works on all platforms is an extremely difficult problem.

File manipulation is thus one of the real difficulties of cross-platform Java programming. Before you can hope to write truly cross-platform code, you need a solid understanding of the filesystem basics on all the target platforms. This chapter tries to cover those basics for the major platforms that support Java (Unix; DOS/Windows 3.x; Windows 95, 98, and NT; OS/2; and the Mac), then shows you how to write your file code so that it's as portable as possible.

As far as a Java program knows, a file is a sequential set of bytes stored on a disk like a hard drive or a CD-ROM. There is a first byte in the file, a second byte, and so on, until the end of the file. In this way a file is similar to a stream. However, a program can jump around in a file, reading first one part of a file, then another. This isn't possible with a stream.

Understanding Files

Macintosh files are a little different. Mac files are divided into two forks, each of which is equivalent to a separate file on other platforms. The first part of a Mac file is called the data fork and contains the text, image data, or other basic information of the file. The second part of the file is called the resource fork and typically contains localizable strings, pictures, icons, graphical user interface components like menubars and dialogs, executable code, and more. On a Macintosh, all the standard java.io classes work exclusively with the data fork.

Every file has a name. The format of the filename is determined by the operating system. For example, in DOS and Windows 3.1, filenames are case-insensitive (though generally rendered as all capitals) and consist of up to eight ASCII characters followed by an extension of up to three letters. README.TXT is a valid DOS filename, but Read me before you run this program or your hard drive will get trashed is not. All ASCII characters from 32 up (that is, noncontrol characters), except for the 15 punctuation characters (+=/][":;,?*\<>|) and the space character, may be used in filenames. A period may be used only as a separator between the eight-character name and the three-letter extension. Furthermore, the complete path to the file, including the disk drive and all directories, may not exceed 80 characters in length.

On the other hand, Read me before you run this program or your hard drive will get trashed is a valid Win32 (Windows 95, 98, and NT) filename. On those systems filenames may contain up to 255 characters, though room also has to be left for the path to the file. The full pathname may not exceed 255 characters. Furthermore, Win32 filenames are stored in Unicode, though in most circumstances only the ISO Latin-1 character set is actually used to name files. Win32 systems allow any Unicode character with value 32 or above to be used, except

Directories and Paths

Modern operating systems organize files into hierarchical directories. Each directory contains zero or more files or other directories. Like files, directories have names and attributes, though, depending on the operating system, those names and attributes may be different from those allowed for files. For example, on the Macintosh, a file or directory name can be up to 31 bytes long, but a volume name can be no more than 27 bytes long.

To specify a file completely, you don't just give its name. You also give the directory the file lives in. Of course, that directory may itself be inside another directory, which may be in another directory, until you reach the root of the filesystem. The complete list of directories from the root to a specified file plus the name of the file itself is called the absolute path to the file. The exact syntax of absolute paths varies from system to system. Here are a few examples:

DOS	`C:\PUBLIC\HTML\JAVAFAQ\INDEX.HTM`
Win32	`C:\public\html\javafaq\index.html`
MacOS	`Macintosh HD:public:html:javafaq:index.html`
Unix	`/public/html/javafaq/index.html`
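
Because the separator differs in each of these examples, code that builds paths by hand tends not to be portable. The following is a minimal sketch of assembling a path from its parts with the platform's own separator; the class name, the join() helper, and the directory names are hypothetical and chosen only for illustration.

```java
import java.io.File;

class PathDemo {

  // Joins directory names and a filename using the host platform's separator
  static String join(String... parts) {
    return String.join(File.separator, parts);
  }

  public static void main(String[] args) {
    // A relative path; the result depends on the platform the program runs on
    File f = new File(join("public", "html", "javafaq", "index.html"));
    System.out.println(f.getPath());
    System.out.println(f.isAbsolute());

    // Resolved against the current working directory; no file needs to exist
    System.out.println(f.getAbsolutePath());
  }
}
```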

The File Class

Instances of the java.io.File class represent filenames on the local system, not actual files. Occasionally, this distinction is crucial. For instance, File objects can represent directories as well as files. Also, you cannot assume that a file exists just because you have a File object for a file.

public class File extends Object implements Serializable

In Java 2, the File class also implements the java.lang.Comparable interface:

public class File extends Object implements Serializable, Comparable // Java 2

Although there are no guarantees that a file named by a File object actually exists, the File class does contain many methods for getting information about the attributes of a file and for manipulating those files. The File class attempts to account for system-dependent features like the file separator character and file attributes, though in practice it doesn't do a very good job, especially in Java 1.0 and 1.1.

Each File object contains a single String field called path that contains either a relative or absolute path to the file, including the name of the file or directory itself:

private String path

Many methods in this class work solely by looking at this string. They do not necessarily look at any part of the filesystem.

The java.io.File class has three constructors. Each accepts some variation of a filename as an argument. This one is the simplest:

public File(String path)

The path argument should be either an absolute or relative path to the file in a format understood by the host operating system. For example, using Unix filename conventions:

File uf1 = new File("25.html");
File uf2 = new File("course/week2/25.html");
File uf3 = new File("/public/html/course/week2/25.html");
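
The distinction between a filename and an actual file can be seen in a short sketch like the one below. The class name, the summarize() helper, and the path are hypothetical. The getName() call works purely on the stored path string, while exists() has to consult the filesystem.

```java
import java.io.File;

class FileNameDemo {

  // Combines a string-only query (getName) with a filesystem query (exists)
  static String summarize(File f) {
    return f.getName() + " exists=" + f.exists();
  }

  public static void main(String[] args) {
    // Constructing the File object does not create or open anything on disk
    File f = new File("no-such-dir-xyz/25.html");
    System.out.println(summarize(f));
    System.out.println(f.getParent());   // also derived from the path string
  }
}
```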

Filename Filters

You often want to look for a particular kind of file—for example, text files. To do this, you need a FilenameFilter object that specifies which files you'll accept. FilenameFilter is an interface in the java.io package:

public interface FilenameFilter

This interface declares a single method, accept():

public abstract boolean accept(File directory, String name);

The directory argument is a File object pointing to a directory, and the name argument is the name of a file. The method should return true if a file with this name in this directory passes through the filter and false if it doesn't. Because FilenameFilter is an interface, it must be implemented in a class. Example 12.6 is a class that filters out everything that is not an HTML file.

Example 12.6. HTMLFilter

import java.io.*;
public class HTMLFilter implements FilenameFilter {
 public boolean accept(File directory, String name) {
 
   if (name.endsWith(".html")) return true;
   if (name.endsWith(".htm")) return true;
   return false;
 }
}

Files can be filtered using any criteria you like. An accept() method may test modification date, permissions, file size, and any other attribute Java supports. (You can't filter by attributes Java does not support, like Macintosh file type and creator codes, at least not without native methods or some sort of access to the native API.) This accept() method tests whether the filename ends with .html and whether the file is in a directory where the program can read files:

public boolean accept(File directory, String name) {
 
  if (name.endsWith(".html") && directory.canRead()) {
    return true;
  }
  return false;
}
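
One way to try out a filename filter without a file dialog is the list() method of java.io.File, which accepts a FilenameFilter. The sketch below is an illustration only: the class name and file names are hypothetical, the filter repeats the test of Example 12.6 as a lambda, and the scratch directory is created with java.nio.file.Files, which is much newer than the Java versions this chapter discusses.

```java
import java.io.File;
import java.io.FilenameFilter;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.util.Arrays;

class ListHtmlNames {

  // The same test as the HTMLFilter class, written as a lambda
  static final FilenameFilter HTML =
      (dir, name) -> name.endsWith(".html") || name.endsWith(".htm");

  // Creates a scratch directory with three empty files and returns
  // the names accepted by the filter, in sorted order
  static String[] demo() {
    try {
      File dir = Files.createTempDirectory("filterdemo").toFile();
      dir.deleteOnExit(); // registered first so it is deleted after its files
      for (String name : new String[] {"index.html", "old.htm", "notes.txt"}) {
        File f = new File(dir, name);
        f.createNewFile();
        f.deleteOnExit();
      }
      String[] matches = dir.list(HTML);
      Arrays.sort(matches);
      return matches;
    } catch (IOException ex) {
      throw new UncheckedIOException(ex);
    }
  }

  public static void main(String[] args) {
    System.out.println(Arrays.toString(demo())); // prints [index.html, old.htm]
  }
}
```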

Filename filters are primarily intended for the use of file dialogs, which will be discussed in the next chapter. However, in Java 2 the File class has a listFiles() method that takes a FilenameFilter

File Filters

Java 2 adds a new java.io.FileFilter interface that's very similar to FilenameFilter:

public abstract interface FileFilter  // Java 2

The accept() method of FileFilter takes a single File object as an argument, rather than a File giving the directory and a String giving the name:

public boolean accept(File pathname)  // Java 2

Example 12.7 is a filter that only passes HTML files. Its logic is essentially the same as the filter of Example 12.6.

Example 12.7. HTMLFileFilter

import java.io.*;
public class HTMLFileFilter implements FileFilter {
 public boolean accept(File pathname) {
 
   if (pathname.getName().endsWith(".html")) return true;
   if (pathname.getName().endsWith(".htm")) return true;
   return false;
 }
}

This class appears as an argument in one of the listFiles() methods of java.io.File:

public File[] listFiles(FileFilter filter)      // Java 2

Example 12.8 uses the HTMLFileFilter to list the HTML files in the current working directory.

Example 12.8. List HTML Files

import java.io.*; 
public class HTMLFiles {
  public static void main(String[] args) {
    
    File cwd = new File(System.getProperty("user.dir"));
    File[] htmlFiles = cwd.listFiles(new HTMLFileFilter());
    for (int i = 0; i < htmlFiles.length; i++) {
      System.out.println(htmlFiles[i]);
    }
  }
}

There's a nasty name conflict between the java.io.FileFilter interface and the abstract javax.swing.filechooser.FileFilter class discussed in the next chapter. I would not be surprised if this interface were replaced by a new abstract FileFilter class more like javax.swing.filechooser.FileFilter.

File Descriptors

As I've said several times so far, the existence of a java.io.File object doesn't imply the existence of the file it represents. A java.io.FileDescriptor object does, however, refer to an actual file:

public final class FileDescriptor extends Object

A FileDescriptor object is an abstraction of an underlying machine-specific structure that represents an open file. While file descriptors are very important for the underlying OS and filesystem, their only real use in Java is to guarantee that data that's been written to a stream is in fact committed to disk; that is, to synchronize between the program and the hardware.

In addition to open files, file descriptors can also represent open sockets, though this use won't be emphasized in this book. There are also three file descriptors for the console: System.in, System.out, and System.err. These are available as the three mnemonic constants FileDescriptor.in, FileDescriptor.out, and FileDescriptor.err:

public static final FileDescriptor in
public static final FileDescriptor out
public static final FileDescriptor err

Because file descriptors are very closely tied to the native operating system, you never construct your own file descriptors. Various methods in other classes that refer to open files or sockets may return them. The FileInputStream, FileOutputStream, and RandomAccessFile classes each have a getFD() method that returns the file descriptor associated with the open stream or file:

public final FileDescriptor getFD() throws IOException
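
As a rough sketch of the synchronization use mentioned above, the following code writes to a temporary file and then calls the sync() method of the stream's FileDescriptor before closing it. The class and method names are hypothetical, and the temporary file is only a stand-in for whatever file a real program would write.

```java
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

class SyncDemo {

  // Writes the data to a temporary file, forces it out to the storage
  // device, and returns the resulting file length
  static long writeAndSync(byte[] data) {
    try {
      File f = File.createTempFile("syncdemo", ".dat");
      f.deleteOnExit();
      try (FileOutputStream out = new FileOutputStream(f)) {
        out.write(data);
        out.flush();
        // Blocks until the operating system's buffers for this file have been
        // written; throws SyncFailedException if that cannot be guaranteed
        out.getFD().sync();
      }
      return f.length();
    } catch (IOException ex) {
      throw new UncheckedIOException(ex);
    }
  }

  public static void main(String[] args) {
    System.out.println(writeAndSync(new byte[] {1, 2, 3})); // prints 3
  }
}
```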

The java.net.SocketImpl class stores the file descriptor for a socket in a protected field called fd:

protected FileDescriptor fd

This field is returned by SocketImpl's protected getFileDescriptor() method:

protected FileDescriptor getFileDescriptor()

Since file descriptors are only associated with open

Random-Access Files

File input and output streams require you to start reading or writing at the beginning of a file and then read or write the file in order, possibly skipping over some bytes or backing up but more or less moving from start to finish. Sometimes, however, you need to read parts of a file in a more or less random order, where the data near the beginning of the file isn't necessarily read before the data nearer the end. Other times you need to both read and write the same file. For example, in record-oriented applications like databases, the actual data may be indexed; you would use the index to determine where in the file to find the record you need to read or write. While you could do this by constantly opening and closing the file and skipping to the point where you needed to read, this is far from efficient. Writes are even worse, since you would need to read and rewrite the entire file, even to change just one byte of data.

Random-access files can be read from, written to, or both, starting at any byte position in the file. A single random-access file can be both read and written without first being closed. The position in the file where reads and writes start is indicated by an integer called the file pointer. Each read or write advances the file pointer by the number of bytes read or written. Furthermore, the programmer can reposition the file pointer at different bytes in the file without closing the file.

In Java, random file access is performed through the java.io.RandomAccessFile class. This is not a subclass of java.io.File:

public class RandomAccessFile extends Object implements DataInput, DataOutput

Among other differences between File objects and RandomAccessFile objects, the RandomAccessFile constructors actually open the file in question and throw an IOException if it doesn't exist:

public RandomAccessFile(String filename, String mode) throws FileNotFoundException
public RandomAccessFile(File file, String mode) throws IOException
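
The record-oriented access described above can be sketched with the seek() and getFilePointer() methods of RandomAccessFile. This is a minimal illustration, not a database: the class name, the fixed four-byte record size, and the temporary file are all assumptions made for the example.

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.io.UncheckedIOException;

class RecordDemo {

  static final int RECORD_SIZE = 4; // one int per record, assumed for this sketch

  // Writes ten int records, overwrites record 7 in place, and reads it back
  static int overwriteAndRead() {
    try {
      File f = File.createTempFile("records", ".dat");
      f.deleteOnExit();
      try (RandomAccessFile raf = new RandomAccessFile(f, "rw")) {
        for (int i = 0; i < 10; i++) {
          raf.writeInt(i);              // each write advances the file pointer by 4
        }
        raf.seek(7L * RECORD_SIZE);     // jump straight to record 7
        raf.writeInt(700);              // change only those four bytes
        raf.seek(7L * RECORD_SIZE);     // move back; the file stays open throughout
        return raf.readInt();
      }
    } catch (IOException ex) {
      throw new UncheckedIOException(ex);
    }
  }

  public static void main(String[] args) {
    System.out.println(overwriteAndRead()); // prints 700
  }
}
```

Opening the file in "rw" mode lets the same object serve both the write and the read, which is the point of the comparison with plain file streams.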

General Techniques for Cross-Platform File Access Code

File manipulation vies with AWT for being the part of Java where it's hardest to write truly cross-platform, robust code. Until Java 2, Sun really didn't pay a lot of attention to differences between filesystems on different platforms. The situation is getting better, however. The java.io.File class does work much more reliably across Windows and Unix in Java 2 and has hooks to allow it to work more naturally on other platforms as well. Of course, Java 1.1 is still the primary delivery platform for most Java applications that work with files. To help you achieve greater serenity and overall cross-platform nirvana, I've summarized some basic rules from this chapter to help you write file manipulation code that's robust across a multitude of platforms:

Never, never, never hardcode pathnames in your application.
Ask the user to name your files. If you must provide a name for a file, try to make it fit in an 8.3 DOS filename with only pure ASCII characters.
Do not assume the file separator is "/" (or anything else). Use File.separatorChar instead.
Do not parse pathnames to find directories. Use the methods of the java.io.File class instead.
Do not use renameTo() for anything except renaming a file. In particular, do not use it to move a file.
Try to avoid moving and copying files from within Java programs if at all possible.
Do not use . to refer to the current directory. Use System.getProperty("user.dir") instead.
Do not use .. to refer to the parent directory. Use getParent() instead.
Do not assume the current working directory is the one where your

Chapter 13: File Dialogs and Choosers

Filenames are problematic, even if you don't have to worry about cross-platform idiosyncrasies. Users forget filenames, mistype them, can't remember the exact path to files they need, and more. The proper way to ask a user to select a file is to show them a list of the files in the current directory and get them to select from that list. You also need to allow them to navigate between directories, insert and remove floppy disks, mount network servers, and more.

Most graphical user interfaces (and not a few nongraphical ones) provide standard widgets for selecting a file. In Java the platform's native file selector widget is exposed through the java.awt.FileDialog class. Like many native peer-based classes, however, FileDialog doesn't behave exactly the same on all platforms. Therefore, Swing (part of the Java Foundation Classes) provides a pure Java implementation of a file dialog, the javax.swing.JFileChooser class. JFileChooser (and Swing in general) has much more reliable cross-platform behavior.

I'm going to jump out of the java.io package for a minute to pick up one file-related class from the AWT, java.awt.FileDialog. File dialogs are the standard Open and Save dialogs provided by the host GUI. Users use them to pick a directory and a name under which to save a file or to choose a file to open. The appearance varies from platform to platform, but the intent is the same. Figure 13.1 shows a standard Save dialog on the Mac; Figure 13.2 shows a standard Open dialog on Solaris.

Figure 13.1: The Mac's standard Save dialog

Figure 13.2: Motif standard Open dialog

FileDialog is a subclass of java.awt.Dialog

File Dialogs

FileDialog is a subclass of java.awt.Dialog that represents the native save and open dialog boxes:

public class FileDialog extends Dialog

A file dialog is almost completely implemented by a native peer. Your program doesn't add components to a file dialog or handle user interaction with event listeners. It just displays the dialog and retrieves the name and directory of the file the user chose after the dialog is dismissed.

Since applets normally can't read or write files, file dialogs are primarily useful only in applications. Nonetheless, there is no specific security manager check to see whether file dialogs are allowed. Sun's applet viewer, HotJava, and some recent versions of Netscape Navigator do allow untrusted applets to display file dialogs, retrieve the name and path of the file selected, and send that information back to the originating host over the network. Although this is a very minor security hole, since it only exposes the name and path of a single file selected by the user, it's still on the worrisome side for the paranoid. Internet Explorer 4.0 and Navigator 4.0.3 and earlier do not allow applets to display file dialogs. You can't count on being allowed to use a file dialog in an applet, but you can't count on it being forbidden either.

JFileChooser

Swing, part of the Java Foundation Classes, provides a much more sophisticated and useful file chooser component written in pure Java, javax.swing.JFileChooser:

public class JFileChooser extends JComponent implements Accessible

JFileChooser is not an independent, free-standing window like FileDialog. Instead, it is a component you can add to your own frame, dialog, or other container or window. You can, however, ask the JFileChooser class to create a modal dialog just for your file chooser. Figure 13.3 shows a file chooser embedded in a JFrame window with the Metal look and feel. Of course, like all Swing components, the exact appearance depends on the look and feel currently selected.

Figure 13.3: A JFileChooser with the Metal look and feel

For the most part, the file chooser works as you expect, especially if you're accustomed to Windows. You select a file with the mouse. Double-clicking the filename or pressing the Open button returns the currently selected file. You can change which files are displayed by selecting different filters from the pop-up list of choosable file filters. All the components have tooltips to help users who are a little thrown by an unfamiliar look and feel. One difference between a Swing file chooser and a standard, native chooser may surprise you. While double-clicking on a directory will open the directory as you expect, selecting a directory and then pressing the Open button returns the selected directory as a File object.

The JFileChooser class relies on support from several classes in the javax.swing.filechooser package, including:

public abstract class FileFilter
public abstract class FileSystemView
public abstract class FileView

Unfortunately, these classes still have a few rough edges as of Java 2. They still don't support the Macintosh (though an early access release is available), and they have to jump through some hoops to account for the different levels of support for I/O in Java 1.1 and Java 2.

File Viewer, Part 6

We've now got the tools needed to put a graphical user interface onto the FileViewer application we've been developing. The back end doesn't need to change at all. It's still based on the same filter streams we've used for the last several chapters. However, instead of reading filenames from the command line, we can get them from a file chooser. Instead of dumping the files on System.out, we can display them in a text area. And instead of relying on the user remembering a lot of confusing command-line switches, we can provide simple radio buttons for the user to choose from. This has the added advantage of making it easy to repeatedly interpret the same file according to different filters.

Figure 13.6 shows the finished application. This will give you some idea of what the code is aiming at. Initially, I started with a pencil-and-paper sketch, but I'll spare you my inartistic renderings. The single JFrame window is organized with a border layout. The west panel contains various controls for determining how the data is interpreted. The east panel contains the JFileChooser used to select the file. Notice that the Approve button has been customized to say "View File" rather than "Open". Ideally, I'd like to make the Cancel button say "Quit" instead, but the JFileChooser class doesn't allow you to do that without using resource bundles, a subject I would prefer to leave for another book. The south panel contains a scroll pane. Inside the scroll pane is a streamed text area.

Figure 13.6: The FileViewer

One fact I discovered while developing this application was that Swing components don't get along well with standard AWT components like Frame and TextArea. My initial attempts that mixed AWT components with the Swing JFileChooser rapidly crashed the VM. Replacing all components with their Swing equivalents solved the problem.

Chapter 14: Multilingual Character Sets and Unicode

We live on a planet on which many languages are spoken. I can walk out my front door in Brooklyn on any given day and hear people conversing in French, Creole, Hebrew, Arabic, Spanish, and languages I don't even recognize. And the Internet is even more diverse than Brooklyn. A local doctor's office that sets up a storefront on the Web to sell vitamins may soon find itself shipping to customers whose native language is Chinese, Gujarati, Turkish, German, Portuguese, or something else. There's no such thing as a local business on the Internet.

However, the first computers and the first programming languages were mostly designed by English-speaking programmers in countries where English was the native language. These programmers designed character sets that worked well for English text, though not much else. The preeminent such set is ASCII. Since ASCII is a seven-bit character set, each ASCII character can easily be represented as a single byte, signed or unsigned. Thus, it's natural for ASCII-based programming languages to equate the character data type with the byte data type. In these languages, such as C, the same operations that read and write bytes also read and write characters.

Unfortunately, ASCII is inadequate for almost all non-English languages. It contains no cedillas, umlauts, betas, thorns, or any of the other thousands of non-English characters that are used to read and write text around the world. Fairly shortly after the development of ASCII, there was an explosion of extended character sets around the world, each of which encoded the basic ASCII characters as well as the additional characters needed for another language like Greek, Turkish, Arabic, Chinese, Japanese, or Russian. Many of these character sets are still used today, and much existing data is encoded in them.

However, these character sets are still inadequate for many needs. For one thing, most assume that you only want to encode English plus one other language. This makes it difficult for a Russian classicist to write a commentary on an ancient Greek text, for example. Furthermore, documents are limited by their character sets. Email sent from Morocco may become illegible in India if the sender is using an Arabic character set but the recipient is using Devanagari.

Unicode

Unicode is Java's native character set. Each Unicode character is a two-byte, unsigned number with a value between 0 and 65,535. This provides enough space for characters from all the world's alphabetic scripts and the most common characters from the ideographic scripts of Chinese and Japanese. The current version of Unicode (2.1) defines 38,887 different characters from many languages, including English, Russian, Arabic, Hebrew, Greek, Thai, Korean, and Sanskrit. The most common ideographic characters from Japanese and Chinese are also included. However, Chinese alone contains over 80,000 different ideograms, so it's impossible to include them all in a two-byte set. A four-byte Universal Character Set (UCS) that will include the full Chinese and Japanese scripts is under development. Java does not yet support UCS.

The first 128 Unicode characters (characters 0 through 127) are identical to the ASCII character set. 32 is the ASCII space; therefore, 32 is the Unicode space. 33 is the ASCII exclamation point, so 33 is the Unicode exclamation point, and so on. Table 2.1, in Appendix B, shows this character set. The next 128 Unicode characters (characters 128 through 255) have the same values as the equivalent characters in the Latin-1 character set defined by ISO standard 8859-1. Latin-1, a slight variation of which is used by Windows, adds the various accented characters, umlauts, cedillas, upside-down question marks, and other characters needed to write text in most Western European languages. Table 2.2 shows these characters. The first 128 characters in Latin-1 are identical to the ASCII character set.
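
The correspondence between ASCII, Latin-1, and the first 256 Unicode values can be checked with a few lines of Java. This is only a sketch: the class name and the latin1ToUnicode() helper are hypothetical, and it relies on java.nio.charset.StandardCharsets, which is newer than the Java versions discussed here.

```java
import java.nio.charset.StandardCharsets;

class CodePointDemo {

  // Decodes one byte as ISO 8859-1 and returns the numeric value
  // of the resulting Java char
  static int latin1ToUnicode(int b) {
    String s = new String(new byte[] {(byte) b}, StandardCharsets.ISO_8859_1);
    return s.charAt(0);
  }

  public static void main(String[] args) {
    System.out.println((int) ' ');                 // prints 32
    System.out.println((int) '!');                 // prints 33
    System.out.println(latin1ToUnicode(65));       // prints 65, the letter A
    System.out.println(latin1ToUnicode(233));      // prints 233, e with acute accent
    System.out.println((int) Character.MAX_VALUE); // prints 65535
  }
}
```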

Values beyond 255 encode characters from various other character sets. Where possible, character blocks describing a particular group of characters map onto established encodings for that set of characters by simple transposition. For instance, Unicode characters 884 through 1011 encode the Greek alphabet and associated characters like the Greek question mark (;). The Greek letters are a direct transposition by 720 of the corresponding characters in the upper half (128 through 255) of the ISO 8859-7 character set, which is in turn based on the Greek national standard ELOT 928. For example, the small letter delta,

Displaying Unicode Text

Although internally Java can handle full Unicode data (it's just numbers, after all), not all Java environments can display all Unicode characters. In fact, I'll go so far as to say none of the current Java environments, whether standalone virtual machines or web browsers, can display all Unicode characters.

Unicode is divided into blocks. For example, characters 0 through 127 are the Basic Latin block and contain ASCII. Characters 128 through 255 are the Latin-1 Supplement block and contain the upper 128 characters of the Latin-1 character set. Characters 9984 through 10,175 are the Dingbats block and contain the characters in the popular Zapf Dingbats font. Characters 19,968 through 40,959 are the unified Chinese-Japanese-Korean ideograph block. Each block represents a script or a subset of a script. As a rule of thumb, most runtime environments can display only some of these blocks. Occasionally, a particular runtime may be able to display some characters from a block but not others. For instance, most Macintoshes can display the entire Latin-1 Supplement block except for the Icelandic characters þ, Þ, Ý, Ð, and ð.

The biggest problem is the lack of fonts. Few computers have fonts for all the scripts Java supports. Even computers that possess the necessary fonts can't install a lot of them because of their size. A normal, 8-bit outline font ranges from about 30-60K. A Unicode font that omits the Han ideographs will be about 10 times that size. And a full Unicode font that includes the full range of Han ideographs will occupy between five and seven megabytes. Furthermore, text display algorithms based on English often break down when faced with right-to-left languages like Hebrew and Arabic, vertical languages like the traditional Chinese still used in Taiwan, or context-sensitive languages like Arabic.

Finally, even web browsers that can handle Chinese, Cyrillic, Arabic, Japanese, or other non-Roman scripts in HTML don't necessarily support those same scripts in applets. (HotJava 1.1 and earlier is a notable offender here.) It's even sometimes the case that characters an applet can draw directly using a

Unicode Escapes

Currently, there isn't a large installed base of Unicode text editors. There's an even smaller installed base of machines with full Unicode fonts installed. Therefore, it's essential that all valid Java programs can be written using nothing more than ASCII characters.

All Java keywords and operators as well as the names of all the classes, methods, and fields in the core API may be written in pure ASCII. This is by deliberate design on the part of JavaSoft. However, Unicode characters are explicitly allowed in comments, string and char literals, and identifiers. The following, the opening line from Homer's Odyssey, should be legal Java:

/* Οδυσσεια */
String αρχη = 
 "Άνδρα μοι "
 + "έννεπε, " 
 + "Μουσα, " 
 + " ος μάλα πολλα";

To enable statements like that in Java source, non-ASCII characters are embedded through Unicode escape sequences. The escape sequence for a character is a backslash ( \ ) followed by a small u, followed by the four-digit hexadecimal code for the character. For example:

char tab = '\u0009';
char softHyphen = '\u00AD';
char sigma = '\u03C3';
char squareKeesu = '\u30B9';

Using Unicode escapes, the opening line from Homer's Odyssey would be rendered as:

/* \u039F\u03B4\u03C5\u03C3\u03C3\u03B5\u03B9\u03B1 */
String \u03B1\u03C1\u03C7\u03B7 = 
 "\u0386\u03BD\u03B4\u03C1\u03B1 \u03BC\u03BF\u03B9 "
 + "\u03AD\u03BD\u03BD\u03B5\u03C0\u03B5, " 
 + "\u039C\u03BF\u03C5\u03C3\u03B1, " 
 + " \u03BF\u03C2 \u03BC\u03AC\u03BB\u03B1 \u03C0\u03BF\u03BB\u03BB\u03B1";

Obviously, this is horribly inconvenient for anything more than an occasional non-ASCII character.
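A small sketch may help show that an escape is only another spelling of the same character, whether it appears in a literal or in an identifier. The class and method names here (EscapeDemo, codeOfSigma, escapeEqualsLiteral) are made up for illustration; the source file itself is pure ASCII.

```java
final class EscapeDemo {
    // This identifier is spelled entirely with escapes (four Greek letters).
    static String \u03B1\u03C1\u03C7\u03B7 = "\u0386\u03BD\u03B4\u03C1\u03B1";

    // A char widens to its numeric Unicode value.
    static int codeOfSigma() {
        return '\u03C3';
    }

    // An escape for an ASCII character equals the plain character.
    static boolean escapeEqualsLiteral() {
        return "\u0041".equals("A");
    }

    public static void main(String[] args) {
        System.out.println(codeOfSigma());          // 963
        System.out.println(escapeEqualsLiteral());  // true
        System.out.println(\u03B1\u03C1\u03C7\u03B7.length()); // 5
    }
}
```

Because escapes are translated before the compiler looks at anything else, they are processed inside comments too, so a malformed escape in a comment is a compile-time error.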

Many Java compilers assume that source files are written in ASCII and that the only Unicode characters present are Unicode escapes. During a single-pass preprocessing phase, the compiler converts each raw ASCII character or Unicode escape sequence to a two-byte Unicode character it stores in memory. Only after preprocessing is complete and the ASCII file has been converted to in-memory Unicode, is the file actually compiled. Some compilers and runtimes will also compile the upper 128 characters of the ISO Latin-1 character set. However, some do not. Worse yet, some Java virtual machines can compile files containing non-ASCII, ISO Latin-1 characters but can't run the files they've compiled. For safety's sake and maximum portability, you should escape all non-ASCII characters.

UTF-8

Since every Unicode character is encoded in exactly two bytes, Unicode is a fairly simple encoding. The first two bytes of a file are the first character. The next two bytes are the second character, and so on. This makes parsing Unicode data relatively simple compared to schemes that use variable-width characters. The downside is that Unicode is far from the most efficient encoding possible. In a file containing mostly English text, the high bytes of almost all the characters will be 0. These bytes can occupy as much as half of the file. If you're sending data across the network, Unicode data can take twice as long.

A more efficient encoding can be achieved for files that are composed primarily of ASCII text by encoding the more common characters in fewer bytes. UTF-8 is one such format that encodes the non-null ASCII characters in a single byte, characters between 128 and 2047 and ASCII null in two bytes, and the remaining characters in three bytes. While theoretically this encoding might expand a file's size by 50%, because most text files contain primarily ASCII, in practice it's almost always a huge savings. Therefore, Java uses UTF-8 in string literals, identifiers, and other text data in compiled byte code. UTF-8 is also a common encoding for XML files and the native encoding of Bell Labs' experimental Plan 9 operating system.
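The byte counts above can be checked with DataOutputStream.writeUTF(), which writes this Java-flavored UTF-8 preceded by a two-byte length. The following is a minimal sketch; the class name Utf8Widths and the helper encodedLength() are hypothetical.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

final class Utf8Widths {
    // Number of bytes writeUTF() uses for the text itself,
    // not counting its two-byte length prefix.
    static int encodedLength(String s) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(bytes);
            out.writeUTF(s);
            out.flush();
            return bytes.size() - 2;
        } catch (IOException ex) {
            throw new UncheckedIOException(ex);
        }
    }

    public static void main(String[] args) {
        System.out.println(encodedLength("A"));       // 1: non-null ASCII
        System.out.println(encodedLength("\u0000"));  // 2: ASCII null
        System.out.println(encodedLength("\u03C3"));  // 2: between 128 and 2047
        System.out.println(encodedLength("\u30B9"));  // 3: everything else
    }
}
```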

To better understand UTF-8, consider a typical Unicode character as a sequence of 16 bits:

x15  x14  x13  x12  x11  x10

The char Data Type

The char primitive data type in Java is a two-byte unsigned integer whose values range from 0 to 65,535. char variables may be assigned from int literals, like this:

char exclamationPoint = 33;

In the virtual machine, chars are promoted to ints in arithmetic operations like addition and multiplication. Therefore, operations more complicated than a simple assignment require an explicit cast to char, like this:

char a = 97;
char b = (char) (a + 1);

In practice, chars are rarely used in arithmetic operations. Instead, they're given symbolic meanings through mappings to particular elements of the Unicode character set. For instance, 33 is the Unicode (and ASCII) character for the exclamation point (!). 97 is the Unicode (and ASCII) character for the small letter a. When the Unicode and printable ASCII characters converge, as they do for values between 32 and 126, a char may be written in Java source code as a char literal. This is the desired ASCII character between single quote marks, like this:

char exclamationPoint = '!';
char a = 'a';
char b = 'b';

For characters outside this range, you can assign values to chars using Unicode escape sequences, like this:

char tab = '\u0009';
char softHyphen = '\u00AD';
char sigma = '\u03C3';
char squareKeesu = '\u30B9';
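The three ways of writing a char shown so far (int literal, char literal, and Unicode escape) are interchangeable. As a quick sketch, with the class name CharDemo and the helper next() invented for the example:

```java
final class CharDemo {
    // c + 1 is an int, so the result must be cast back to char.
    static char next(char c) {
        return (char) (c + 1);
    }

    public static void main(String[] args) {
        char a = 97;
        char b = next(a);
        System.out.println(a == 'a');        // true
        System.out.println(b);               // b
        System.out.println((int) '!');       // 33
        System.out.println((int) '\u00AD');  // 173
    }
}
```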

As for the other primitive data types, the core API includes a type wrapper class for char values. This is java.lang.Character :

public final class Character implements Serializable

In Java 2, Character also implements Comparable:

public final class Character implements Serializable, Comparable // Java 2

Section 14.5.1.1: Constructor

This class has a single constructor:

Other Encodings

Although Unicode is the most advanced and comprehensive character set yet designed on this planet, it has not taken the world by storm. Compared to the vast quantities of ASCII data, there are virtually no Unicode files on today's computers. Although Unicode support is growing, there will doubtless be legacy data in other encodings that must be read for centuries to come. A lot of it is in the Unicode subsets ASCII and ISO Latin-1, but a lot of it is also in less popular encoding schemes like EBCDIC and MacRoman. Those only cover English and a few Western European languages. There are multiple encodings in use for Arabic, Turkish, Hebrew, Greek, Cyrillic, Chinese, Japanese, Korean, and many other languages and scripts. The Reader and Writer classes (discussed in the next chapter) allow you to read and write data in these different character sets. The String class also has a number of methods that convert between different encodings (though a String object itself is always represented in Unicode). Furthermore, the JDK includes a character mode tool based on these classes called native2ascii that performs such conversions on existing files.

The name native2ascii is a misnomer. Rather than converting to ASCII, it converts to ISO Latin-1 with Unicode characters embedded with Unicode escape sequences like \u020F. It can also work in reverse, converting an ISO Latin-1 file with embedded Unicode to a native character set. For example, to copy the contents of the file macdata.txt from the MacRoman encoding into a new file called isodata.txt encoded with ISO Latin-1 with Unicode escapes, you would type:

% native2ascii -encoding MacRoman macdata.txt isodata.txt

You can convert it back with the -reverse option:

% native2ascii -encoding MacRoman -reverse isodata.txt macdata.txt

If you don't specify a particular encoding, native2ascii makes its best guess as to the platform's native encoding. This best guess is read from the system property file.encoding

Converting Between Byte Arrays and Strings

The java.lang.String class has several constructors that form strings from byte arrays and several methods that return a byte array corresponding to a given string. Anytime a Unicode string is converted to bytes or vice versa, that conversion happens according to one of the encodings listed in Table 2.4. The same string can produce different byte arrays if different encodings are used. Six constructors form a new String object from a byte array:

public String(byte[] ascii, int highByte)
public String(byte[] ascii, int highByte, int offset, int length)
public String(byte[] data, String encoding) 
  throws UnsupportedEncodingException
public String(byte[] data, int offset, int length, String encoding) 
  throws UnsupportedEncodingException
public String(byte[] data)
public String(byte[] data, int offset, int length)
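As a rough illustration of the encoding-aware constructors, the sketch below round-trips a short string through two encodings. It assumes the encoding names "ISO-8859-1" and "UTF-8", and it uses String's getBytes(String encoding) method for the reverse direction; BytesAndStrings, encode(), and decode() are hypothetical names.

```java
import java.io.UnsupportedEncodingException;

final class BytesAndStrings {
    static byte[] encode(String s, String encoding) {
        try {
            return s.getBytes(encoding);
        } catch (UnsupportedEncodingException ex) {
            throw new IllegalArgumentException(ex);
        }
    }

    static String decode(byte[] data, String encoding) {
        try {
            return new String(data, encoding);
        } catch (UnsupportedEncodingException ex) {
            throw new IllegalArgumentException(ex);
        }
    }

    public static void main(String[] args) {
        String s = "\u00E9t\u00E9";  // three characters, two of them accented
        System.out.println(encode(s, "ISO-8859-1").length);  // 3
        System.out.println(encode(s, "UTF-8").length);       // 5
        System.out.println(decode(encode(s, "UTF-8"), "UTF-8").equals(s)); // true
    }
}
```

The same three-character string yields byte arrays of different lengths, which is why the encoding has to be known when the bytes are turned back into a string.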

The first two constructors, the ones with the highByte argument, are leftovers from Java 1.0 that are deprecated in Java 1.1. These two constructors do not accurately translate non-Latin-1 character sets into Unicode. Instead, they read each byte in the ascii array as the low-order byte of a two-byte character, then fill in the high-order byte with the highByte argument. For example:

byte[] isoLatin1 = new byte[256];
for (int i = 0; i < 256; i++) isoLatin1[i] = (byte) i;
String s = new String(isoLatin1, 0);

Frankly, this is a kludge; it's deprecated for good reason. This scheme works quite well for Latin-1 data with a high byte of 0. However, it's extremely difficult to use for character sets where different characters need to have different high bytes, and it's completely unworkable for character sets like MacRoman that also need to adjust bits in the low-order byte to conform to Unicode. The only approach that genuinely works for the broad range of character sets Java programs may be asked to handle is table lookup. Each character set in Table 2.4 is associated with a table mapping characters in the set to Unicode characters. These tables are hidden inside the

Chapter 15: Readers and Writers

A language that supports international text must separate the reading and writing of raw bytes from the reading and writing of characters, since in an international system they are no longer the same thing. Classes that read characters must be able to parse a variety of character encodings, not just ASCII, and translate them into the language's native character set. Classes that write characters must be able to translate the language's native character set into a variety of formats and write those. In Java this task is performed by the Reader and Writer classes.

You're probably going to experience a little déjà vu. The java.io.Writer class is modeled on the java.io.OutputStream class. The java.io.Reader class is modeled on the java.io.InputStream class. The names and signatures of the members of the Reader and Writer classes are similar (sometimes identical) to the names and signatures of the members of the InputStream and OutputStream classes. The patterns these classes follow are similar as well. Filtered input and output streams are chained to other streams in their constructors. Similarly, filtered readers and writers are chained to other readers and writers in their constructors. InputStream and OutputStream are abstract superclasses that identify common functionality in the concrete subclasses. Likewise, Reader and Writer are abstract superclasses that identify common functionality in the concrete subclasses. The difference between readers and writers and input and output streams is that streams are fundamentally byte based, while readers and writers are fundamentally character based. Where an input stream reads a byte, a reader reads a character; where an output stream writes a byte, a writer writes a character.
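To make the byte/character distinction concrete, here is a minimal sketch that reads the same two bytes first through an input stream and then through a reader. The two bytes are the UTF-8 encoding of the single character sigma; the names BytesVersusChars, countBytes(), and countChars() are made up for the example.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;

final class BytesVersusChars {
    // UTF-8 encoding of the single character sigma (U+03C3).
    static final byte[] SIGMA_UTF8 = {(byte) 0xCF, (byte) 0x83};

    // Counts how many times a byte stream's read() succeeds.
    static int countBytes(byte[] data) {
        try {
            InputStream in = new ByteArrayInputStream(data);
            int count = 0;
            while (in.read() != -1) count++;
            return count;
        } catch (IOException ex) {
            throw new UncheckedIOException(ex);
        }
    }

    // Counts how many times a reader's read() succeeds on the same data.
    static int countChars(byte[] data, String encoding) {
        try {
            Reader in = new InputStreamReader(new ByteArrayInputStream(data), encoding);
            int count = 0;
            while (in.read() != -1) count++;
            return count;
        } catch (IOException ex) {
            throw new UncheckedIOException(ex);
        }
    }

    public static void main(String[] args) {
        System.out.println(countBytes(SIGMA_UTF8));           // 2
        System.out.println(countChars(SIGMA_UTF8, "UTF-8"));  // 1
    }
}
```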

While bytes are a more or less universal concept, characters are not. As you learned in the last chapter, the same character can be encoded differently in different character sets. Different character sets encode different characters. Characters can even have different widths in different character sets. For example, ASCII and ISO Latin-1 use one-byte characters. Unicode uses two-byte characters. UTF-8 uses characters of varying width between one and three bytes. Concrete subclasses of the

The java.io.Writer Class

The Writer class is abstract, just like OutputStream is abstract. You won't have any pure instances of Writer that are not also instances of some concrete subclass of Writer. However, many of the subclasses of Writer differ primarily in the targets of the text they write, just as many concrete subclasses of OutputStream differ only in the targets of the data they write. Most of the time you don't care about the difference between FileOutputStream and ByteArrayOutputStream. Similarly, most of the time you won't care about the differences between FileWriter and StringWriter. You'll just use the methods of the common superclass, java.io.Writer.

You use a writer almost exactly as you use an output stream. Rather than writing bytes, you write chars. The write() method writes a subarray from the char array text starting at offset and continuing for length characters:

public abstract void write(char[] text, int offset, int length) 
  throws IOException

For example, given some Writer object w, you can write the string Testing 1-2-3 like this:

char[] test = {'T', 'e', 's', 't', 'i', 'n', 'g', ' ', 
               '1', '-', '2', '-', '3'};
w.write(test, 0, test.length);

This method is abstract. Concrete subclasses that convert chars into bytes according to a specified encoding and write those bytes onto an underlying stream must override this method. An IOException may be thrown if the underlying stream's write() method throws an IOException. You can also write a single character, an entire array of characters, a string, or a substring:

public void write(int c) throws IOException
public void write(char[] text) throws IOException
public void write(String s) throws IOException
public void write(String s, int offset, int length) throws IOException
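All five write() methods can be tried against any concrete writer. The sketch below uses a StringWriter as a convenient target so that the result can be inspected; WriterDemo and demo() are hypothetical names.

```java
import java.io.IOException;
import java.io.StringWriter;
import java.io.UncheckedIOException;
import java.io.Writer;

final class WriterDemo {
    static String demo() {
        try {
            Writer w = new StringWriter();
            char[] test = {'T', 'e', 's', 't', 'i', 'n', 'g', ' '};
            w.write(test, 0, test.length);   // a subarray (here, the whole array)
            w.write('1');                    // a single character passed as an int
            w.write(new char[] {'-', '2'});  // an entire array
            w.write("-3");                   // a string
            w.write("!?", 0, 1);             // a substring: only the first character
            w.flush();
            return w.toString();
        } catch (IOException ex) {
            throw new UncheckedIOException(ex);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo());  // Testing 1-2-3!
    }
}
```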

The default implementations of these four methods convert their first argument into an array of chars and pass that to

The OutputStreamWriter Class

java.io.Writer is an abstract class. Its most basic concrete subclass is OutputStreamWriter :

public class OutputStreamWriter extends Writer

Its constructor connects a character writer to an underlying output stream:

public OutputStreamWriter(OutputStream out)
public OutputStreamWriter(OutputStream out, String encoding) throws
  UnsupportedEncodingException

The first constructor assumes that the text in the stream is to be written using the platform's default encoding. The second constructor specifies an encoding. There's no easy way to determine which encodings are supported, but the ones listed in Table 2.4 in Appendix B are supported by most VMs. For example, this code attaches an OutputStreamWriter to System.out with the default encoding:

OutputStreamWriter osw = new OutputStreamWriter(System.out);

The default encoding is normally ISO Latin-1, except on Macs, where it is MacRoman. Whatever it is, you can find it in the system property file.encoding:

String defaultEncoding = System.getProperty("file.encoding");

On the other hand, if you want to write a file encoded in ISO 8859-7 (ASCII plus Greek) you might do this:

FileOutputStream fos = new FileOutputStream("greek.txt");
OutputStreamWriter greekWriter = new OutputStreamWriter(fos, "8859_7");

The write() methods convert characters to bytes according to a specified character encoding and write those bytes onto the underlying output stream:

public void write(int c) throws IOException
public void write(char[] text, int offset, int length) throws IOException
public void write(String s, int offset, int length) throws IOException

Once the Writer is constructed, writing the characters is easy. For example:

String  arete = "\u03B1\u03C1\u03B5\u03C4\u03B7";
greekWriter.write(arete, 0, arete.length());
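To see what actually reaches the underlying stream, you can chain the writer to a ByteArrayOutputStream instead of a file. This is only a sketch: GreekBytes and encode() are invented names, and it assumes the VM accepts the encoding names "ISO-8859-7" and "UTF-8" rather than the 8859_7 spelling used above.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.UncheckedIOException;

final class GreekBytes {
    static byte[] encode(String text, String encoding) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            OutputStreamWriter out = new OutputStreamWriter(bytes, encoding);
            out.write(text, 0, text.length());
            out.close();  // pushes any buffered characters through the encoder
            return bytes.toByteArray();
        } catch (IOException ex) {
            throw new UncheckedIOException(ex);
        }
    }

    public static void main(String[] args) {
        String arete = "\u03B1\u03C1\u03B5\u03C4\u03B7";
        System.out.println(encode(arete, "ISO-8859-7").length);  // 5: one byte per letter
        System.out.println(encode(arete, "UTF-8").length);       // 10: two bytes per letter
    }
}
```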

The java.io.Reader Class

You use a reader almost exactly as you use an input stream. Rather than reading bytes, you read characters. The basic read() method reads a specified number of characters from the underlying input stream into an array starting at a given offset:

public abstract int read(char[] buffer, int offset, int length) 
  throws IOException

This read() method returns the number of characters actually read. As with input streams reading bytes, there may not be as many characters available as you requested. Also like the read() method of an input stream, it returns -1 when it detects the end of the data.

This read() method is abstract. Concrete subclasses that read bytes from some source must override this method. An IOException may be thrown if the underlying stream's read() method throws an IOException or an encoding error is detected.

You can also fill an array with characters using this method:

public int read(char[] buffer) throws IOException

This is equivalent to invoking read(buffer, 0, buffer.length). Thus, it also returns the number of characters read and throws an IOException when the underlying stream throws an IOException or when an encoding error is detected. The following method reads a single character and returns it:

public int read() throws IOException

Although an int is returned, this int is always between 0 and 65,535 (or -1 at the end of the stream) and, when it is not -1, may be cast to a char without losing information. All three read() methods block until some input is available, an I/O error occurs, or the end of the stream is reached.
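The usual pattern for the array-based read() is a loop that keeps going until -1 comes back, using the return value to find out how much of the buffer was filled. Here is a minimal sketch with a StringReader as the source; ReadLoop, drain(), and firstChar() are hypothetical names.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;

final class ReadLoop {
    // Drains a reader through a deliberately small buffer.
    // bufferSize must be positive.
    static String drain(Reader in, int bufferSize) {
        try {
            char[] buffer = new char[bufferSize];
            StringBuilder result = new StringBuilder();
            int n;
            while ((n = in.read(buffer, 0, buffer.length)) != -1) {
                result.append(buffer, 0, n);  // only the first n chars are valid
            }
            return result.toString();
        } catch (IOException ex) {
            throw new UncheckedIOException(ex);
        }
    }

    // Returns the value of a single no-argument read().
    static int firstChar(Reader in) {
        try {
            return in.read();
        } catch (IOException ex) {
            throw new UncheckedIOException(ex);
        }
    }

    public static void main(String[] args) {
        System.out.println(drain(new StringReader("Testing 1-2-3"), 4));
        System.out.println(firstChar(new StringReader("!")));  // 33
        System.out.println(firstChar(new StringReader("")));   // -1
    }
}
```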

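You can skip a certain number of characters. This method also blocks until some characters are available. It returns the number of characters actually skipped, which may be fewer than requested and is zero if the reader is already at the end of the stream; unlike read(), it does not return -1.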

public long skip(long n) throws IOException

The ready() method returns true if the reader is ready to be read from,

The InputStreamReader Class

The most basic concrete subclass of Reader is InputStreamReader:

public class InputStreamReader extends Reader

The constructor connects a character reader to an underlying input stream:

public InputStreamReader(InputStream in)
public InputStreamReader(InputStream in, String encoding) 
 throws UnsupportedEncodingException

The first constructor uses the platform's default encoding, as given by the system property file.encoding. The second one uses the specified encoding. For example, to attach an InputStreamReader to System.in with the default encoding (generally ISO Latin-1):

InputStreamReader isr = new InputStreamReader(System.in);

If you want to read a file encoded in Latin-5 (ASCII plus Turkish, as specified by ISO 8859-9), you might do this:

FileInputStream fin = new FileInputStream("symbol.txt");
InputStreamReader isr = new InputStreamReader(fin, "8859_9");

There's no easy way to determine which encodings are supported, but the ones listed in Table 2.4 are supported by most VMs.
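The effect of the encoding argument is easiest to see when the same byte is decoded twice. In the sketch below, the byte 0xFD is read once as Latin-5 and once as Latin-1. DecodeDemo and decodeFirstChar() are invented names, and the encoding names "ISO-8859-9" and "ISO-8859-1" are assumed to be accepted by the VM.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;

final class DecodeDemo {
    // Decodes the data with the named encoding and returns the first character.
    static int decodeFirstChar(byte[] data, String encoding) {
        try {
            InputStreamReader in =
                new InputStreamReader(new ByteArrayInputStream(data), encoding);
            int c = in.read();
            in.close();
            return c;
        } catch (IOException ex) {
            throw new UncheckedIOException(ex);
        }
    }

    public static void main(String[] args) {
        byte[] data = {(byte) 0xFD};
        // Latin-5: the Turkish dotless i, character 0x131
        System.out.println(Integer.toHexString(decodeFirstChar(data, "ISO-8859-9")));
        // Latin-1: y with an acute accent, character 0xfd
        System.out.println(Integer.toHexString(decodeFirstChar(data, "ISO-8859-1")));
    }
}
```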

The read() methods read bytes from an underlying input stream and convert those bytes to characters according to the specified encoding:

public int read() throws IOException
public int read(char c[], int off, int length) throws IOException

The getEncoding() method returns a string containing the name of the encoding used by this reader:

public String getEncoding()

The remaining two methods just override methods from java.io.Reader but behave identically from the perspective of the programmer:

public boolean ready() throws IOException
public void close() throws IOException

Example 15.2 uses an InputStreamReader to read a file in a user-specified encoding. The FileConverter reads the name of the input file, the name of the output file, the input encoding, and the output encoding. Characters that are not available in the output character set are replaced by the substitution character, generally the question mark.

Character Array Readers and Writers

The java.io.ByteArrayInputStream and java.io.ByteArrayOutputStream classes let programmers use stream methods to read and write arrays of bytes. The java.io.CharArrayReader and java.io.CharArrayWriter classes allow programmers to use Reader and Writer methods to read and write arrays of chars. Since char arrays are purely internal to Java and thus composed of true Unicode characters, this is one of the few uses of readers and writers where you don't need to concern yourself with conversions between different encodings. If you want to read arrays of text encoded in some non-Unicode encoding, you should chain a ByteArrayInputStream to an InputStreamReader instead. Similarly, to write text into a byte array in a non-Unicode encoding, just chain an OutputStreamWriter to a ByteArrayOutputStream.

The CharArrayWriter maintains an internal array of chars into which successive characters are written. The array is expanded as needed. This array is stored in a protected field called buf:

protected char[] buf

For efficiency, the array generally contains more components than characters. The number of characters actually written is stored in a protected int field called count:

protected int count

The value of the count field is always less than or equal to buf.length.

The no-argument constructor creates a CharArrayWriter object with a 32-character buffer. This is on the small side, so you can expand it with the second constructor:

public CharArrayWriter()
public CharArrayWriter(int initialSize)

The write() methods write their characters into the buffer. If there's insufficient space in buf to hold the characters, its size is doubled.

public void write(int c)
public void write(char[] text, int offset, int length)
public void write(String s, int offset, int length)
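As a small sketch of these methods in use, the following collects text with all three write() variants and then retrieves it with toCharArray(), which returns a copy holding only the characters written so far. The names CharArrayDemo and collect() are hypothetical.

```java
import java.io.CharArrayWriter;

final class CharArrayDemo {
    static char[] collect() {
        CharArrayWriter w = new CharArrayWriter(64);  // larger than the default
        w.write('T');
        w.write(new char[] {'e', 's', 't'}, 0, 3);
        w.write("ing 1-2-3", 0, 9);
        return w.toCharArray();
    }

    public static void main(String[] args) {
        char[] text = collect();
        System.out.println(text.length);       // 13
        System.out.println(new String(text));  // Testing 1-2-3
    }
}
```

Note that none of these write() methods declares an IOException, since nothing leaves the VM.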

String Readers and Writers

The java.io.StringReader and java.io.StringWriter classes allow programmers to use Reader and Writer methods to read and write strings. Like char arrays, Java strings are also composed of pure Unicode characters. Therefore, they're good sources of data for readers and good targets for writers. This is the other common case where readers and writers don't need to convert between different encodings.

This class would more accurately be called StringBufferWriter, but StringWriter is more poetic. A StringWriter maintains an internal java.lang.StringBuffer object to which written characters are appended. This buffer can easily be converted to a string as necessary.

public class StringWriter extends Writer

There is a single public constructor:

public StringWriter()

There is also a constructor that allows you to specify the initial size of the internal string buffer. This isn't too important, because string buffers (and, by extension, string writers) are expanded as necessary. Still, if you can estimate the size of the string in advance, it's marginally more efficient to select a size big enough to hold all characters that will be written. The constructor is protected in Java 1.1 and public in Java 2:

protected StringWriter(int initialSize)
public StringWriter(int initialSize)  // Java 2

The StringWriter class has the usual collection of write() methods, all of which just append their data to the StringBuffer:

public void write(int c)
public void write(char[] text, int offset, int length)
public void write(String s) 
public void write(String s, int offset, int length)

There are flush() and close() methods, but both have empty method bodies, as string writers operate completely internal to Java and do not require flushing or closing:

public void flush()
public void close()
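A short sketch of a StringWriter in use follows. It relies on toString() to retrieve the accumulated text; the names StringWriterDemo and build() are invented for the example.

```java
import java.io.StringWriter;

final class StringWriterDemo {
    static String build() {
        StringWriter sw = new StringWriter(32);  // initial size is only a hint
        sw.write('a');
        sw.write("bcd");
        sw.write("--ef--", 2, 2);                // just the substring "ef"
        sw.write(new char[] {'g', 'h'}, 0, 2);
        return sw.toString();
    }

    public static void main(String[] args) {
        System.out.println(build());  // abcdefgh
    }
}
```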

Reading and Writing Files

You've already learned how to chain an OutputStreamWriter to a FileOutputStream and an InputStreamReader to a FileInputStream. Although this isn't hard, Java provides two simple utility classes that take care of the details, java.io.FileWriter and java.io.FileReader.

The FileWriter class is a subclass of OutputStreamWriter that writes text files using the platform's default character encoding and buffer size. If you need to change these values, construct an OutputStreamWriter on a FileOutputStream instead.

public class FileWriter extends OutputStreamWriter

This class has four constructors:

public FileWriter(String fileName) throws IOException
public FileWriter(String fileName, boolean append) throws IOException
public FileWriter(File file) throws IOException
public FileWriter(FileDescriptor fd)

The first constructor opens a file and positions the file pointer at the beginning of the file. Any text in the file is overwritten. For example:

FileWriter fw = new FileWriter("36.html");

The second constructor allows you to specify that new text is appended to the existing contents of the file rather than overwriting them by setting the second argument to true. For example:

FileWriter fw = new FileWriter("36.html", true);

The third and fourth constructors use a File object and a FileDescriptor, respectively, instead of a filename to identify the file to be written to. Any pre-existing contents in a file so opened are overwritten.

No methods other than the constructors are declared in this class. You use the standard Writer methods like write(), flush(), and close() to write the text in the file.
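The difference between overwriting and appending can be sketched with a temporary file. The example sticks to ASCII text so that the platform's default encoding does not matter; FileWriterDemo and overwriteThenAppend() are hypothetical names, and the temporary file name is arbitrary.

```java
import java.io.File;
import java.io.FileReader;
import java.io.FileWriter;
import java.io.IOException;
import java.io.UncheckedIOException;

final class FileWriterDemo {
    static String overwriteThenAppend() {
        try {
            File file = File.createTempFile("filewriter-demo", ".txt");
            file.deleteOnExit();

            FileWriter fw = new FileWriter(file);       // overwrites any contents
            fw.write("first");
            fw.close();

            fw = new FileWriter(file.getPath(), true);  // appends instead
            fw.write(" second");
            fw.close();

            // Read the file back one character at a time.
            FileReader fr = new FileReader(file);
            StringBuilder text = new StringBuilder();
            int c;
            while ((c = fr.read()) != -1) text.append((char) c);
            fr.close();
            return text.toString();
        } catch (IOException ex) {
            throw new UncheckedIOException(ex);
        }
    }

    public static void main(String[] args) {
        System.out.println(overwriteThenAppend());  // first second
    }
}
```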

The FileReader class is a subclass of InputStreamReader that reads text files using the platform's default character encoding. If you need to change the encoding, construct an

Buffered Readers and Writers

Input and output can be time-consuming operations. It's often quicker to read or write text in large chunks rather than in many separate smaller pieces, even when you only process the text in the smaller pieces. The java.io.BufferedReader and java.io.BufferedWriter classes provide internal character buffers. Text that's written to a buffered writer is stored in the internal buffer and only written to the underlying writer when the buffer fills up or is flushed. Likewise, reading text from a buffered reader may cause more characters to be read than were requested; the extra characters are stored in an internal buffer. Future reads first access characters from the internal buffer and only access the underlying reader when the buffer is emptied.

The java.io.BufferedWriter class is a subclass of java.io.Writer that you chain to another Writer class to buffer characters. This allows more efficient writing of text.

public class BufferedWriter extends Writer

There are two constructors. One has a default buffer size (8192 characters); the other lets you specify the buffer size:

public BufferedWriter(Writer out)
public BufferedWriter(Writer out, int size)

Each time you write to an unbuffered writer, there's a matching write to the underlying output stream. Therefore, it's a good idea to wrap a BufferedWriter around each writer whose write() operations are expensive, such as a FileWriter. For example:

BufferedWriter bw = new BufferedWriter(new FileWriter("37.html"));

BufferedWriter overrides most of its superclass's methods, including:

public void write(int c) throws IOException
public void write(char[] text,int offset, int length) throws IOException
public void write(String s, int offset, int length) throws IOException
public void flush() throws IOException
public void close() throws IOException
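The buffering is easy to observe if the underlying writer is a StringWriter, whose contents can be inspected at any time. In this sketch (BufferDemo and beforeAndAfterFlush() are made-up names), a short string stays in the buffer until flush() is called:

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.StringWriter;
import java.io.UncheckedIOException;

final class BufferDemo {
    // Returns what the underlying writer holds before and after flush().
    static String[] beforeAndAfterFlush() {
        try {
            StringWriter target = new StringWriter();
            BufferedWriter bw = new BufferedWriter(target);
            bw.write("buffered", 0, 8);
            String before = target.toString();  // nothing has arrived yet
            bw.flush();
            String after = target.toString();   // now the text is there
            return new String[] {before, after};
        } catch (IOException ex) {
            throw new UncheckedIOException(ex);
        }
    }

    public static void main(String[] args) {
        String[] result = beforeAndAfterFlush();
        System.out.println("[" + result[0] + "]");  // []
        System.out.println("[" + result[1] + "]");  // [buffered]
    }
}
```

Forgetting the final flush() or close() is a common way to lose the tail end of a file.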

Print Writers

The java.io.PrintWriter class is a subclass of java.io.Writer that contains the familiar print() and println() methods from System.out and other instances of PrintStream. It's deliberately similar to the java.io.PrintStream class. In Java 1.0 PrintStream was used for text-oriented output, but it didn't handle multiple-byte character sets particularly well (or really at all). In Java 1.1 and later, streams are only for byte-oriented and numeric output; writers should be used when you want to output text.

The main difference between PrintStream and PrintWriter is that PrintWriter handles multiple-byte and other non-ISO Latin-1 character sets properly. The other, more minor difference is that automatic flushing is performed only when println() is invoked, not every time a newline character is seen. Sun would probably like to deprecate PrintStream and use PrintWriter instead, but that would break too much existing code. (In fact, Sun did deprecate the PrintStream() constructors in 1.1, but they undeprecated them in Java 2.)

There are four constructors in this class:

public PrintWriter(Writer out)
public PrintWriter(Writer out, boolean autoFlush)
public PrintWriter(OutputStream out)
public PrintWriter(OutputStream out, boolean autoFlush)

The PrintWriter can send text either to an output stream or to another writer. If autoFlush is set to true, the PrintWriter is flushed every time println() is invoked.
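The autoFlush behavior can be sketched by placing a BufferedWriter between the PrintWriter and a StringWriter, so that unflushed text is held back where you can see its absence. The names AutoFlushDemo and visibleAfterPrintAndPrintln() are hypothetical.

```java
import java.io.BufferedWriter;
import java.io.PrintWriter;
import java.io.StringWriter;

final class AutoFlushDemo {
    // Reports whether any text has reached the target after print()
    // and then after println(), with autoFlush turned on.
    static boolean[] visibleAfterPrintAndPrintln() {
        StringWriter target = new StringWriter();
        PrintWriter pw = new PrintWriter(new BufferedWriter(target), true);
        pw.print("no flush yet");
        boolean afterPrint = target.toString().length() > 0;
        pw.println();
        boolean afterPrintln = target.toString().length() > 0;
        return new boolean[] {afterPrint, afterPrintln};
    }

    public static void main(String[] args) {
        boolean[] visible = visibleAfterPrintAndPrintln();
        System.out.println(visible[0]);  // false
        System.out.println(visible[1]);  // true
    }
}
```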

The PrintWriter class implements the abstract write() method from java.io.Writer and overrides five other methods:

public void write(int c)
public void write(char[] text)
public void write(String s)
public void write(String s, int offset, int length)
public void flush()
public void close()

These methods are used almost identically to their equivalents in any other Writer class. The one difference is that none of them throw IOExceptions; in fact, no method in the PrintWriter

Piped Readers and Writers

Piped readers and writers do for character streams what piped input and output streams do for byte streams: they allow two threads to communicate. Character output from one thread becomes character input for the other thread:

public class PipedWriter extends Writer
public class PipedReader extends Reader

The PipedWriter class has two constructors. The first constructs an unconnected PipedWriter object. The second constructs one that's connected to the PipedReader object sink:

public PipedWriter()
public PipedWriter(PipedReader sink) throws IOException

The PipedReader class also has two constructors. Again, the first constructor creates an unconnected PipedReader object. The second constructs one that's connected to the PipedWriter object source:

public PipedReader()
public PipedReader(PipedWriter source) throws IOException

Piped readers and writers are normally created in pairs. The piped writer becomes the underlying source for the piped reader. This is one of the few cases where a reader does not have an underlying input stream. For example:

PipedWriter pw = new PipedWriter();
PipedReader pr = new PipedReader(pw);

This simple example is a little deceptive, because these lines of code will normally be in different methods and perhaps even different classes. Some mechanism must be established to pass a reference to the PipedWriter into the thread that handles the PipedReader, or you can create them in the same thread, then pass a reference to the connected stream into a separate thread.

Alternatively, you can start with a PipedReader and then connect a PipedWriter to it:

PipedReader pr = new PipedReader();
PipedWriter pw = new PipedWriter(pr);

Or you can create them both unconnected, then use one or the other's connect() method to link them:

public void connect(PipedReader sink) throws IOException
public void connect(PipedWriter source) throws IOException
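
As a rough illustration of the pieces above working together, the sketch below creates both ends unconnected, links them with connect(), and passes a message from a producer thread to the calling thread. The class name PipeDemo and the method sendThroughPipe() are hypothetical names chosen for this example.

```java
import java.io.IOException;
import java.io.PipedReader;
import java.io.PipedWriter;
import java.io.UncheckedIOException;

// Hypothetical demo class showing two threads communicating through a pipe.
public class PipeDemo {

  // Writes the message in one thread and reads it back in the calling thread.
  public static String sendThroughPipe(String message) {
    PipedWriter pw = new PipedWriter();
    PipedReader pr = new PipedReader();
    StringBuilder received = new StringBuilder();
    try {
      pw.connect(pr);  // pr.connect(pw) would link the same pair
      Thread producer = new Thread(() -> {
        // Closing the writer signals end of stream to the reader.
        try (PipedWriter w = pw) {
          w.write(message);
        } catch (IOException ex) {
          ex.printStackTrace();
        }
      });
      producer.start();
      try (PipedReader r = pr) {
        int c;
        while ((c = r.read()) != -1) received.append((char) c);
      }
      producer.join();
    } catch (IOException ex) {
      throw new UncheckedIOException(ex);
    } catch (InterruptedException ex) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException(ex);
    }
    return received.toString();
  }

  public static void main(String[] args) {
    System.out.println(sendThroughPipe("characters sent through a pipe"));
  }
}
```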

Additional content appearing in this section has been removed.

Filtered Readers and Writers

The java.io.FilterReader and java.io.FilterWriter classes are abstract classes that read characters and filter them in some way before passing the text along. You can imagine a FilterReader that converts all characters to uppercase.

public abstract class FilterReader extends Reader 
public abstract class FilterWriter extends Writer

Although FilterReader and FilterWriter are modeled after java.io.FilterInputStream and java.io.FilterOutputStream, they are much less commonly used than those classes. There are no concrete subclasses of FilterWriter in the java packages and only one concrete subclass of FilterReader (PushbackReader, discussed later). These classes exist so you can write your own filters.

FilterReader has a single constructor, which is protected:

protected FilterReader(Reader in)

The in argument is the Reader to which this filter is chained. This reference is stored in a protected field called in, from which the filter reads its text. The field is null after the filter has been closed.

protected Reader in

Since FilterReader is an abstract class, only subclasses may be instantiated. Therefore, it doesn't matter that the constructor is protected, since it may only be invoked from subclass constructors.
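
The uppercase filter imagined at the start of this section could look roughly like the following. This is a sketch under stated assumptions rather than a class from java.io: the name UpperCaseReader and the static helper upperCase() are invented for illustration.

```java
import java.io.FilterReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;

// Hypothetical FilterReader subclass that converts every character to uppercase.
public class UpperCaseReader extends FilterReader {

  public UpperCaseReader(Reader in) {
    super(in);  // the protected constructor stores the reader in the field "in"
  }

  @Override
  public int read() throws IOException {
    int c = in.read();
    return (c == -1) ? -1 : Character.toUpperCase((char) c);
  }

  @Override
  public int read(char[] text, int offset, int length) throws IOException {
    int count = in.read(text, offset, length);
    // When count is -1 (end of stream) the loop body never runs.
    for (int i = offset; i < offset + count; i++) {
      text[i] = Character.toUpperCase(text[i]);
    }
    return count;
  }

  // Convenience helper for the demo: filters a whole string.
  public static String upperCase(String s) {
    try (Reader r = new UpperCaseReader(new StringReader(s))) {
      StringBuilder result = new StringBuilder();
      int c;
      while ((c = r.read()) != -1) result.append((char) c);
      return result.toString();
    } catch (IOException ex) {
      throw new UncheckedIOException(ex);
    }
  }

  public static void main(String[] args) {
    System.out.println(upperCase("filtered readers"));
  }
}
```

Only the two read() methods need to be overridden here; the remaining methods can be inherited unchanged.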

FilterReader provides the usual collection of read(), skip(), ready(), markSupported(), mark(), reset(), and close() methods:

public int read() throws IOException
public int read(char[] text, int offset, int length) throws IOException
public long skip(long n) throws IOException
public boolean ready() throws IOException
public boolean markSupported()
public void mark(int readAheadLimit) throws IOException
public void reset() throws IOException
public void close() throws IOException

These all simply invoke the equivalent method in the

Additional content appearing in this section has been removed.

File Viewer Finis

As a final example of working with readers and writers, we return for the last time to the FileDumper application last seen in Chapter 13. At that point, we had a GUI program that allowed any file to be opened and interpreted in one of several formats, including ASCII, decimal, hexadecimal, short, regular, and long integers in both big- and little-endian formats, floating point, and double-precision floating point.

In this section we expand the program to read many different text formats besides ASCII. The user interface must be adjusted to allow a binary choice of whether the file contains text or numeric data. If the user chooses text, you'll need to use a reader to read the file instead of an input stream. You'll also need to provide some means for the user to pick the encoding in which the text should be read (e.g., MacRoman, ISO Latin-1, Unicode, etc.). Since there are several dozen text encodings, the best choice is a list box. All of this can be integrated into the mode panel. Figure 15.1 shows the revised ModePanel2 class. The code is given in Example 15.9. Two new public methods are added, isText() and getEncoding(). The rest of the changes are fairly minor ones to set up the GUI.

Figure 15.1: A mode panel with a list box for encodings

Example 15.9. ModePanel2

import java.awt.*;
import javax.swing.*;
public class ModePanel2 extends JPanel {
  JCheckBox bigEndian = new JCheckBox("Big Endian", true);
  JCheckBox deflated  = new JCheckBox("Deflated", false);
  JCheckBox gzipped   = new JCheckBox("GZipped", false);
  
  ButtonGroup dataTypes     = new ButtonGroup();
  JRadioButton asciiRadio   = new JRadioButton("Text");
  JRadioButton decimalRadio = new JRadioButton("Decimal");
  JRadioButton hexRadio     = new JRadioButton("Hexadecimal");
  JRadioButton shortRadio   = new JRadioButton("Short");
  JRadioButton intRadio     = new JRadioButton("Int");
  JRadioButton longRadio    = new JRadioButton("Long");
  JRadioButton floatRadio   = new JRadioButton("Float");
  JRadioButton doubleRadio  = new JRadioButton("Double");
  
  JTextField password = new JTextField();
  
  final static String[] encodings = {"8859_1", "8859_2", "8859_3", "8859_4", 
   "8859_5", "8859_6", "8859_7", "8859_8", "8859_9", "Big5", "CNS11643", 
   "Cp037", "Cp273", "Cp277", "Cp278", "Cp280", "Cp284", "Cp285", "Cp297", 
   "Cp420", "Cp424", "Cp437", "Cp500", "Cp737", "Cp775", "Cp850", "Cp852", 
   "Cp855", "Cp856", "Cp857", "Cp860", "Cp861", "Cp862", "Cp863", "Cp864", 
   "Cp865", "Cp866", "Cp868", "Cp869", "Cp870", "Cp871", "Cp874", "Cp875", 
   "Cp918", "Cp921", "Cp922", "Cp1006", "Cp1025", "Cp1026", "Cp1046", 
   "Cp1097", "Cp1098", "Cp1112", "Cp1122", "Cp1123", "Cp1124", "Cp1250", 
   "Cp1251", "Cp1252", "Cp1253", "Cp1254", "Cp1255", "Cp1256", "Cp1257", 
   "Cp1258", "EUCJIS", "GB2312", "JIS", "JIS0208", "KSC5601", "MacArabic", 
   "MacCentralEurope", "MacCroatian", "MacCyrillic", "MacDingbat", "MacGreek", 
   "MacHebrew", "MacIceland", "MacRoman", "MacRomania", "MacSymbol", "MacThai", 
   "MacTurkish", "MacUkraine", "SJIS", "UTF8", "Unicode" };
  
  JList theEncoding = new JList(encodings);
  
  public ModePanel2() {
  
    this.setLayout(new GridLayout(1, 2));
    
    JPanel left = new JPanel();
    JScrollPane right = new JScrollPane(theEncoding);
    left.setLayout(new GridLayout(13, 1));
    left.add(bigEndian);
    left.add(deflated);
    left.add(gzipped);
    
    left.add(asciiRadio);
    asciiRadio.setSelected(true);
    left.add(decimalRadio);
    left.add(hexRadio);
    left.add(shortRadio);
    left.add(intRadio);
    left.add(longRadio);
    left.add(floatRadio);
    left.add(doubleRadio);
    
    dataTypes.add(asciiRadio);
    dataTypes.add(decimalRadio);
    dataTypes.add(hexRadio);
    dataTypes.add(shortRadio);
    dataTypes.add(intRadio);
    dataTypes.add(longRadio);
    dataTypes.add(floatRadio);
    dataTypes.add(doubleRadio);
    
    left.add(password);
    this.add(left);
    this.add(right);
  }
  public boolean isBigEndian() {
    return bigEndian.isSelected();
  }
  
  public boolean isDeflated() {
    return deflated.isSelected();
  }
  
  public boolean isGZipped() {
    return gzipped.isSelected();
  }
  
  public boolean isText() {
    return this.getMode() == FileDumper6.ASC;
  }
  
  public String getEncoding() {
    return (String) theEncoding.getSelectedValue();
  }
  
  public int getMode() {
    if (asciiRadio.isSelected()) return FileDumper6.ASC;
    else if (decimalRadio.isSelected()) return FileDumper6.DEC;
    else if (hexRadio.isSelected()) return FileDumper6.HEX;
    else if (shortRadio.isSelected()) return FileDumper6.SHORT;
    else if (intRadio.isSelected()) return FileDumper6.INT;
    else if (longRadio.isSelected()) return FileDumper6.LONG;
    else if (floatRadio.isSelected()) return FileDumper6.FLOAT;
    else if (doubleRadio.isSelected()) return FileDumper6.DOUBLE;
    else return FileDumper6.ASC;
  }
  
  public String getPassword() {
    return password.getText();
  }
  
  // A simple test method.
  public static void main(String[] args) {
  
    JFrame jf = new JFrame("Test Mode Panel");
    ModePanel2 mp2 = new ModePanel2();
    jf.getContentPane().add(mp2);
    jf.pack();
    jf.setVisible(true);  // replaces the deprecated show() method
    System.out.println("done");
  }
}
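
Once the panel reports an encoding name, the file viewer has to hand that name to a reader. The following sketch shows the general idea on an in-memory byte array instead of a file. The class EncodedTextDemo and its decode() method are hypothetical, and the example uses the standard names "ISO-8859-1" and "UTF-8" rather than the older aliases listed in Example 15.9.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;

// Hypothetical helper showing how an encoding name selects the byte-to-char conversion.
public class EncodedTextDemo {

  // Decodes raw bytes into a string using the named encoding.
  public static String decode(byte[] data, String encoding) {
    try (Reader in = new InputStreamReader(new ByteArrayInputStream(data), encoding)) {
      StringBuilder text = new StringBuilder();
      int c;
      while ((c = in.read()) != -1) text.append((char) c);
      return text.toString();
    } catch (IOException ex) {
      // An unsupported encoding name also ends up here.
      throw new UncheckedIOException(ex);
    }
  }

  public static void main(String[] args) {
    byte[] latin1 = {(byte) 0xE9};             // e with acute accent in ISO Latin-1
    byte[] utf8 = {(byte) 0xC3, (byte) 0xA9};  // the same character in UTF-8
    System.out.println(decode(latin1, "ISO-8859-1").equals(decode(utf8, "UTF-8")));
  }
}
```

In the viewer itself, the second argument would presumably come from getEncoding(), and the byte array would be replaced by a file input stream.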

Additional content appearing in this section has been removed.

Chapter 16: Formatted I/O with java.text

One of the most obvious differences between Java and C is that Java has no equivalent of printf() or scanf() . Part of the reason is that Java doesn't support the variable length argument lists on which these functions depend. However, the real reason Java doesn't have equivalents to C's formatted I/O routines is a difference in philosophy. C's printf() and the like combine number formatting with I/O in an inflexible manner. Java separates number formatting and I/O into separate packages and by so doing produces a much more general and powerful system.

More than one programmer has attempted to recreate printf() and scanf() in Java. This task is difficult, since those functions are designed around variable length argument lists, which Java does not support. However, overloading the + signs for string concatenation is easily as effective, probably more so, since it doesn't share the problems of mismatched argument lists. For example, which is clearer to you? This:

printf("%s worked %d hours at $%d per/hour for a total of %d dollars.\n", 
 hours, salary, hours*salary);

or this:

System.out.println(employee + " worked " + hours + " hours at $" + salary 
 + "per/hour for a total of $" + hours*salary + ".");

I'd argue that the second is clearer. Among other advantages, it avoids problems with mismatched format strings and argument lists. (Did you notice that an argument is missing from the previous printf() statement?) On the flip side, the format string approach is a little less prone to missing spaces. (Did you notice that the println() statement would print pay scales as "$5.35per/hour" rather than "$5.35 per/hour"?) However, this is only a cosmetic problem and is easily fixed. A mismatched argument list in a printf() or scanf() statement may crash the computer, especially if pointers are involved.

The real advantage of the printf()/scanf() family of functions is not the format string. It's number formatting:

Additional content appearing in this section has been removed.

The Old Way

Traditional computer languages have combined input of text with the parsing of numeric strings. For example, to read a decimal number into the variable x, programmers are accustomed to writing C code like this:

scanf("%d", &x);

In C++, that line would become:

cin >> x;

In Pascal:

READLN (X);

In Fortran:

READ 2, X
   2 FORMAT (F5.1)

Similarly, formatting numeric strings for output tends to be mixed up with writing the string to the screen. For instance, consider the simple task of writing the double variable salary with two decimal digits of precision. In C, you'd write this:

printf("%.2f", salary);

In C++:

cout.setf(ios::fixed, ios::floatfield);
cout.precision(2);
cout << salary;

In Fortran:

PRINT 20, SALARY
   20 FORMAT(F10.2)

This conflation of basic input and output with number formatting is so ingrained in most programmers today that we rarely stop to think whether it actually makes sense. What, precisely, does the formatting of numbers as text strings have to do with input and output? It's certainly true that you often need to format numbers to print numbers on the console, but you also need to format numbers to write data in files, to include numbers in text fields and text areas, and to send data across the network. What makes the console so special that it has to have a group of number-formatting routines all to itself? In C, the printf() and scanf() functions are supplemented by fprintf() and fscanf() for formatted I/O to files and by sprintf() and sscanf() for formatted I/O to strings. Perhaps the conflation of I/O with number formatting is really a relic of a time when command-line interfaces were a lot more important than they are today, and it's simply that nobody's thought to challenge this assumption, at least until Java. When you think about it, there's no fundamental connection between converting a binary number like 11010100110110100100011101011011 to a text string like " -7.500E+12" and writing that string onto an output stream. These are two different operations, and in Java they're handled by separate classes. Input and output are handled by all the streams and readers and writers I've been discussing, while number formatting is handled by a few

Additional content appearing in this section has been removed.

Choosing a Locale

Number formats are dependent on the locale; that is, the country/language/culture group of the local operating system. Most English-speaking Americans are accustomed to number formats that use a period as a decimal point, a comma to separate every three orders of magnitude, a dollar sign for currency, and numbers in base 10 that read from left to right. In this locale, Bill Gates's personal fortune, in Microsoft stock alone as of January 12, 1998, is represented as $74,741,086,650.

However, in Egypt this number would be written as:

[Image not preserved: the same amount written with Arabic-Indic digits]

The primary difference here is that Egyptians use a different set of glyphs for the digits 0 through 9. For example, in Egypt zero is a ٠ and the ٦ glyph means 6. There are other differences in how Arabic and English treat numbers, and these vary from country to country. In most of the rest of North Africa, this number would be $74,741,086,650 as it is in the U.S. These are just two different scripts; there are several dozen more to go!

Java encapsulates many of the common differences between language/script/culture/country combinations in a loosely defined group called a locale. There's really no better word for it. You can't just rely on language or country or culture alone. Many languages are shared between countries (English is only the most obvious example) but with subtle differences between how they are used in different places: Do commas and periods belong inside or outside of quotation marks? Is it color or colour? Many countries have no clearly dominant tongue: Is Canada an English- or a French-speaking nation? Switzerland has four official languages. Almost all countries have significant minority populations with their own languages. The New York City public school system has to hire teachers fluent in over 100 different languages.

Additional content appearing in this section has been removed.

Number Formats

To print a formatted number in Java, perform these two steps:

1. Format the number as a string.
2. Print the string.

Simple, right? Of course, this is a little like the old recipe for rabbit stew:

1. Catch a rabbit.
2. Boil rabbit in pot with vegetables and spices.

Obviously, step 1 is the tricky part. Fortunately, formatting numbers as strings is somewhat easier than catching a rabbit. The key class that formats numbers as strings is java.text.NumberFormat. This is an abstract subclass of java.text.Format. Concrete subclasses such as java.text.DecimalFormat implement formatting policies for particular kinds of numbers.

public abstract class NumberFormat extends Format implements Cloneable

The static NumberFormat.getAvailableLocales() method returns a list of all locales installed that provide number formats. (There may be a few locales installed that only provide date or text formats, not number formats.)

public static Locale[] getAvailableLocales()

You can request a NumberFormat object for the default locale of the host computer or for one of the specified locales in Table 16.1 using the static NumberFormat.getInstance() method. For example:

NumberFormat myFormat = NumberFormat.getInstance();
NumberFormat canadaFormat = NumberFormat.getInstance(Locale.CANADA);
Locale turkey = new Locale("tr", "");
NumberFormat turkishFormat = NumberFormat.getInstance(turkey);
Locale swissItalian = new Locale("it", "CH");
NumberFormat swissItalianFormat = NumberFormat.getInstance(swissItalian);
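
To see what difference the locale makes, the same value can be run through formats for two locales. The following is a small sketch; the class LocaleFormatDemo and its format() helper are hypothetical names, and the exact separators produced depend on the locale data shipped with the Java runtime.

```java
import java.text.NumberFormat;
import java.util.Locale;

// Hypothetical demo: formats one value with the number format of a given locale.
public class LocaleFormatDemo {

  public static String format(double value, Locale locale) {
    NumberFormat nf = NumberFormat.getInstance(locale);
    return nf.format(value);
  }

  public static void main(String[] args) {
    double value = 1234567.5;
    System.out.println("US:      " + format(value, Locale.US));
    System.out.println("Germany: " + format(value, Locale.GERMANY));
    System.out.println("Default: " + NumberFormat.getInstance().format(value));
  }
}
```

The U.S. format groups digits with commas and uses a period as the decimal point, while the German format swaps the two characters.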

The number format returned by NumberFormat.getInstance() should do a reasonable job of formatting most numbers. However, there's at least a theoretical possibility that the instance returned will format numbers as currencies or percentages. Therefore, it wouldn't hurt to use

Additional content appearing in this section has been removed.

Specifying Width with FieldPosition

The Java core API does not include any classes that pad numbers with spaces like the traditional I/O APIs in Fortran, C, and other languages. Part of the reason is that it's no longer a valid assumption that all output is written in a monospaced font on a VT-100 terminal. Therefore, spaces are insufficient to line up numbers in tables. Ideally, if you're writing tabular data in a GUI, you can use a real table component like JTable in the Java Foundation Classes. If that's not possible, you can measure the width of the string using a FontMetrics object and offset the position at which you draw the string. And if you are outputting to a terminal or a monospaced font, then you can manually prefix the string with the right number of spaces.
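
For the monospaced case, prefixing the string by hand takes only a few lines. The sketch below assumes a helper named padLeft() in a class named PadDemo; both names are made up for this example and are not part of the core API.

```java
import java.text.NumberFormat;
import java.util.Locale;

// Hypothetical helper for right-aligning formatted numbers in a fixed-width column.
public class PadDemo {

  // Prefixes s with spaces until it is at least width characters long.
  public static String padLeft(String s, int width) {
    StringBuilder padded = new StringBuilder();
    for (int i = s.length(); i < width; i++) {
      padded.append(' ');
    }
    return padded.append(s).toString();
  }

  public static void main(String[] args) {
    NumberFormat nf = NumberFormat.getInstance(Locale.US);
    double[] values = {3.14, 1234.5, 42};
    for (double value : values) {
      // Each number is right-aligned in a column 10 characters wide.
      System.out.println(padLeft(nf.format(value), 10));
    }
  }
}
```

Strings that are already wider than the column are returned unchanged rather than truncated.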

The java.text.FieldPosition class separates strings into their component parts, called fields. (This is another unfortunate example of an overloaded term. These fields have nothing to do with the fields of a Java class.) For example, a typical date string can be separated into 18 fields including era, year, month, day, date, hour, minute, second, and so on. Of course, not all of these may be present in any given string. For example, 1999 CE includes only a year and an era field. The different fields that can be parsed are represented as public final static int fields (there's that annoying overloading again) in the corresponding format class. The java.text.DateFormat class defines these kinds of fields as mnemonic constants:

public static final int ERA_FIELD
public static final int YEAR_FIELD
public static final int MONTH_FIELD
public static final int DATE_FIELD
public static final int HOUR_OF_DAY1_FIELD
public static final int HOUR_OF_DAY0_FIELD
public static final int MINUTE_FIELD
public static final int SECOND_FIELD
public static final int MILLISECOND_FIELD
public static final int DAY_OF_WEEK_FIELD
public static final int DAY_OF_YEAR_FIELD
public static final int DAY_OF_WEEK_IN_MONTH_FIELD
public static final int WEEK_OF_YEAR_FIELD
public static final int WEEK_OF_MONTH_FIELD
public static final int AM_PM_FIELD
public static final int HOUR1_FIELD
public static final int HOUR0_FIELD
public static final int TIMEZONE_FIELD
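
A FieldPosition constructed with one of these constants reports where that field landed in the formatted string. The following sketch locates the year in a formatted date. The class YearFieldDemo and the method yearOf() are hypothetical, and the pattern, locale, and time zone are fixed only to keep the result predictable.

```java
import java.text.DateFormat;
import java.text.FieldPosition;
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Locale;
import java.util.TimeZone;

// Hypothetical demo: uses a FieldPosition to find the year inside a formatted date.
public class YearFieldDemo {

  public static String yearOf(Date date) {
    SimpleDateFormat format = new SimpleDateFormat("MMMM d, yyyy", Locale.US);
    format.setTimeZone(TimeZone.getTimeZone("UTC"));

    // Ask the formatter to track the position of the year field.
    FieldPosition year = new FieldPosition(DateFormat.YEAR_FIELD);
    StringBuffer text = format.format(date, new StringBuffer(), year);

    // getBeginIndex() and getEndIndex() delimit the field within the output.
    return text.substring(year.getBeginIndex(), year.getEndIndex());
  }

  public static void main(String[] args) {
    System.out.println(yearOf(new Date(0L)));
  }
}
```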

Additional content appearing in this section has been removed.

Parsing Input

Number formats also handle input. When used for input, a number format converts a string in the appropriate format to a binary number, achieving more flexible conversions than you can get with the methods in the type wrapper classes (like Integer.parseInt()). For instance, a percent format parse() method can interpret 57% as 0.57 instead of 57. A currency format can read (12.45) as -12.45.

There are three parse() methods in the NumberFormat class. All do roughly the same thing:

public Number parse(String text) throws ParseException
public abstract Number parse(String text, ParsePosition parsePosition)
public final Object parseObject(String source, ParsePosition parsePosition)

The first parse() method attempts to parse a number from the given text. If the text represents an integer, it's returned as an instance of java.lang.Long. Otherwise, it's returned as an instance of java.lang.Double. If a string contains multiple numbers, only the first one is returned. For instance, if you parse "32 meters" you'll get the number 32 back. Java throws away everything after the number finishes. If the text cannot be interpreted as a number in the given format, a ParseException is thrown.

The second parse() method specifies where in the text parsing starts. The position is given by a ParsePosition object. This is a little more complicated than using a simple int but does have the advantage of allowing one to read successive numbers from the same string.

The third parse() method merely invokes the second. It's declared to return Object rather than Number so that it can override the method of the same signature in java.text.Format. If you know you're working with a NumberFormat rather than a DateFormat or some other nonnumeric format, there's no reason to use it.
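
The behavior described above can be tried out with a few calls. This is a minimal sketch: the class ParseDemo and its three helpers are hypothetical, the U.S. locale is chosen only to make the input format predictable, and the ParseException is wrapped so the helpers are easy to call.

```java
import java.text.NumberFormat;
import java.text.ParseException;
import java.text.ParsePosition;
import java.util.Locale;

// Hypothetical demo of NumberFormat parsing.
public class ParseDemo {

  // Parses the number at the start of the text and ignores what follows it.
  public static Number parseLeading(String text) {
    try {
      return NumberFormat.getInstance(Locale.US).parse(text);
    } catch (ParseException ex) {
      throw new IllegalArgumentException(ex);
    }
  }

  // Parses a percentage such as "57%" into a fraction.
  public static Number parsePercent(String text) {
    try {
      return NumberFormat.getPercentInstance(Locale.US).parse(text);
    } catch (ParseException ex) {
      throw new IllegalArgumentException(ex);
    }
  }

  // Reads two numbers separated by a single space from the same string.
  public static Number[] parseTwo(String text) {
    NumberFormat nf = NumberFormat.getInstance(Locale.US);
    ParsePosition pos = new ParsePosition(0);
    Number first = nf.parse(text, pos);   // pos now points just past the first number
    pos.setIndex(pos.getIndex() + 1);     // step over the separating space
    Number second = nf.parse(text, pos);
    return new Number[] {first, second};
  }

  public static void main(String[] args) {
    System.out.println(parseLeading("32 meters"));
    System.out.println(parsePercent("57%"));
    Number[] pair = parseTwo("12 34");
    System.out.println(pair[0] + " and " + pair[1]);
  }
}
```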

The java.text.ParsePosition class has one constructor and two public methods:

public ParsePosition(int index)
public int getIndex()
public void setIndex(int index)

This whole class is just a wrapper around an

Additional content appearing in this section has been removed.

Decimal Formats

The java.text package contains a single concrete subclass of NumberFormat, DecimalFormat. The DecimalFormat class provides even more control over how floating point numbers are formatted:

public class DecimalFormat extends NumberFormat

Most number formats are in fact decimal formats. Generally, you can simply cast any number format to a decimal format, like this:

DecimalFormat df = (DecimalFormat) NumberFormat.getCurrencyInstance();

At least in theory, you might encounter a nondecimal format. Therefore, you should use instanceof to test whether or not you've got a DecimalFormat:

NumberFormat nf = NumberFormat.getCurrencyInstance();
if (nf instanceof DecimalFormat) {
  DecimalFormat df = (DecimalFormat) nf;  // cast the object that was just tested
  //...
}

Alternatively, you can place the cast and associated operations in a try/catch block that catches ClassCastExceptions:

try {
  DecimalFormat df = (DecimalFormat) NumberFormat.getCurrencyInstance();
  //...
}
catch (ClassCastException e) {System.err.println(e);}

Every DecimalFormat object has a pattern that describes how numbers are formatted and a list of symbols that describes with which characters they're formatted. This allows the single DecimalFormat class to be parameterized so that it can handle many different formats for different kinds of numbers in many locales. The pattern is given as an ASCII string. The symbols are provided by a DecimalFormatSymbols object. These are accessed and manipulated through the following six methods:

public DecimalFormatSymbols getDecimalFormatSymbols()
public void setDecimalFormatSymbols(DecimalFormatSymbols newSymbols)
public String toPattern()
public String toLocalizedPattern()
public void applyPattern(String pattern)
public void applyLocalizedPattern(String pattern)
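
As a rough illustration of how the pattern and the symbols combine, the sketch below builds a DecimalFormat from an explicit pattern and the symbols of a chosen locale, then swaps in a different pattern with applyPattern(). The class PatternDemo and its methods are hypothetical names; the patterns "#,##0.00" and "0.000" are ordinary examples, not ones taken from the book.

```java
import java.text.DecimalFormat;
import java.text.DecimalFormatSymbols;
import java.util.Locale;

// Hypothetical demo of DecimalFormat patterns and symbols.
public class PatternDemo {

  // Grouped thousands and exactly two fraction digits, using the locale's symbols.
  public static String twoPlaces(double value, Locale locale) {
    DecimalFormat df = new DecimalFormat("#,##0.00", new DecimalFormatSymbols(locale));
    return df.format(value);
  }

  // Replaces the pattern of an existing format: no grouping, three fraction digits.
  public static String threePlaces(double value) {
    DecimalFormat df = new DecimalFormat("#,##0.00", new DecimalFormatSymbols(Locale.US));
    df.applyPattern("0.000");
    return df.format(value);
  }

  public static void main(String[] args) {
    System.out.println(twoPlaces(1234.5, Locale.US));
    System.out.println(twoPlaces(1234.5, Locale.GERMANY));
    System.out.println(threePlaces(3.14159));
  }
}
```

The pattern stays the same in the first two calls; only the symbols object decides which characters stand for the grouping separator and the decimal point.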

Additional content appearing in this section has been removed.

An Exponential Number Format

The DecimalFormat class is useful for medium-sized numbers, but it doesn't work very well for exceptionally large numbers like Avogadro's number (602,209,430,000,000,000,000,000) or exceptionally small numbers like Planck's constant (0.00000000000000000000000000625 erg-seconds). These are traditionally written in scientific notation as a decimal number times 10 to a certain power, positive or negative; for example, 6.0220943 × 10²³ and 6.25 × 10⁻²⁷ erg-seconds. In most programming languages, including Java, an E followed by either a + or a - is used to represent "× 10 to the power"; for example, 6.0220943E+23 or 6.25E-27 erg-seconds.

The java.text package does not provide support for formatting numbers in scientific notation, so as the final example of this chapter, I'll develop a new subclass of NumberFormat that does use scientific notation. Technically, scientific notation requires exactly one nonzero digit before the decimal point, but I'll be a little more general than that, providing for numbers like 13.2E-8 as well.

The NumberFormat class is abstract. It declares three abstract methods any subclass must implement:

public abstract StringBuffer format(double number, StringBuffer toAppendTo, 
                                    FieldPosition pos)
public abstract StringBuffer format(long number, StringBuffer toAppendTo, 
                                    FieldPosition pos)
public abstract Number parse(String text, ParsePosition parsePosition)

The two format methods must format a double and a long, respectively, update the FieldPosition object with the locations of the different fields, append the formatted string to the string buffer toAppendTo, and return that same string buffer. The parse() method must read a number in scientific notation, convert it to a java.lang.Number (that is, a java.lang.Long or a java.lang.Double) and return that.
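
To make the shape of such a subclass concrete, here is a deliberately small sketch. It is not the class developed in the book: the name SketchExponentialFormat is invented, digit generation is delegated to String.format() as a shortcut, the FieldPosition argument is left untouched, and parse() always returns a Double.

```java
import java.text.FieldPosition;
import java.text.NumberFormat;
import java.text.ParsePosition;
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical minimal NumberFormat subclass that uses scientific notation.
public class SketchExponentialFormat extends NumberFormat {

  // Optional sign, digits with optional fraction, optional exponent.
  private static final Pattern NUMBER =
      Pattern.compile("[+-]?(\\d+(\\.\\d*)?|\\.\\d+)([eE][+-]?\\d+)?");

  private final int fractionDigits;

  public SketchExponentialFormat(int fractionDigits) {
    this.fractionDigits = fractionDigits;
  }

  @Override
  public StringBuffer format(double number, StringBuffer toAppendTo, FieldPosition pos) {
    // Shortcut: String.format does the conversion; pos is not updated in this sketch.
    String text = String.format(Locale.ROOT, "%." + fractionDigits + "E", number);
    return toAppendTo.append(text);
  }

  @Override
  public StringBuffer format(long number, StringBuffer toAppendTo, FieldPosition pos) {
    return format((double) number, toAppendTo, pos);
  }

  @Override
  public Number parse(String text, ParsePosition parsePosition) {
    int start = parsePosition.getIndex();
    Matcher m = NUMBER.matcher(text);
    if (start > text.length() || !m.region(start, text.length()).lookingAt()) {
      parsePosition.setErrorIndex(start);  // signal failure as the contract expects
      return null;
    }
    parsePosition.setIndex(m.end());       // advance past the parsed number
    return Double.valueOf(m.group());
  }

  public static void main(String[] args) {
    NumberFormat nf = new SketchExponentialFormat(7);
    System.out.println(nf.format(6.0220943e23));
    System.out.println(nf.parse("6.25E-27 erg-seconds", new ParsePosition(0)));
  }
}
```

Because the inherited format(double) and format(long) convenience methods call the abstract ones, implementing these three methods is enough to make the class usable.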

The concrete formatting methods in NumberFormat all invoke these methods, so they may be kept as is rather than being overridden. However, it would not hurt to override

Additional content appearing in this section has been removed.

Chapter 17: The Java Communications API

This chapter covers the Java Communications API 2.0, a standard extension available in Java 1.1 and later that allows Java applications (but not applets) to send and receive data to and from the serial and parallel ports of the host computer. The Java Communications API allows Java programs to communicate with essentially any device connected to a serial or parallel port, like a printer, a scanner, a modem, a tape backup unit, and so on. The Comm API operates at a very low level. It only understands how to send and receive bytes to these ports. It does not understand anything about what these bytes mean. Doing useful work generally requires not only understanding the Java Communications API (which is actually quite simple) but also the protocols spoken by the devices connected to the ports (which can be almost arbitrarily complex).

Because the Java Communications API is a standard extension, it is not installed by default with the JDK. You have to download it from https://java.sun.com/products/javacomm/index.html and install it separately.

This chapter is based on the first beta of the Java Communications API. It is almost certain that some parts of this chapter will become inaccurate by the time you read this. Indeed, throughout the process of writing this chapter, I identified a number of bugs and inconsistencies that I forwarded to Sun. They even fixed a few in between early access 3 and beta 1. If you have trouble with anything you see here, cross-check it with the most up-to-date documentation from Sun. I'll also try to post minor corrections on my web site at https://metalab.unc.edu/javafaq/books/javaio/.

Additional content appearing in this section has been removed.

The Architecture of the Java Communications API

The Java Communications API contains a single package, javax.comm, which holds a baker's dozen of classes, exceptions, and interfaces. Because the Comm API is a standard extension, the javax prefix is used instead of the java prefix. The Java Comm API also includes a DLL, or shared library, containing the native code to communicate with the ports, and a few driver classes in the com.sun.comm package that mostly handle the vagaries of Unix or Wintel ports. Other vendors may need to muck around with these if they're porting the Comm API to another platform (e.g., the Mac or OS/2), but as a user of the API, you'll only concern yourself with the documented classes in javax.comm.

javax.comm is divided into high-level and low-level classes. High-level classes are responsible for controlling access to and ownership of the communication ports and performing basic I/O. The CommPortIdentifier class lets you find and open the ports available on a system. The CommPort class provides input and output streams connected to the ports. Low-level classes—javax.comm.SerialPort and javax.comm.ParallelPort, for example—manage interaction with particular kinds of ports and help you read and write the control wires on the ports. They also provide event-based notification of changes to the state of the port.

Additional content appearing in this section has been removed.

Identifying Ports

The javax.comm.CommPortIdentifier class is the control room for the ports on a system. It has methods that list the available ports, figure out which program owns them, take control of a port, and open a port so you can perform I/O with it. The actual I/O, stream-based or otherwise, is performed through an instance of javax.comm.CommPort that represents the port in question. The purpose of CommPortIdentifier is to mediate between different programs, objects, or threads that want to use the same port.

Before you can use a port, you need a port identifier for the port. Because the possible port identifiers are closely tied to the physical ports on the system, you cannot simply construct an arbitrary CommPortIdentifier object. (For instance, Macs have no parallel ports, and iMacs don't have serial or parallel ports.) Instead, you use one of several static methods in javax.comm.CommPortIdentifier that use native methods and nonpublic constructors to find and create the right port. These include:

public static Enumeration getPortIdentifiers()
public static CommPortIdentifier getPortIdentifier(String portName) 
              throws NoSuchPortException
public static CommPortIdentifier getPortIdentifier(CommPort port) 
              throws NoSuchPortException

The most general of these is CommPortIdentifier.getPortIdentifiers(), which returns a java.util.Enumeration containing one CommPortIdentifier for each of the ports on the system. Example 17.1 uses this method to list all the ports on the system.

Example 17.1. PortLister

import javax.comm.*;
import java.util.*;
public class PortLister {
  public static void main(String[] args) {
    Enumeration e = CommPortIdentifier.getPortIdentifiers();
    while (e.hasMoreElements()) {
      System.out.println((CommPortIdentifier) e.nextElement());
    }
  }
}

Additional content appearing in this section has been removed.

Communicating with a Device on a Port

The open() method of the CommPortIdentifier class returns a CommPort object. The javax.comm.CommPort class has methods for getting input and output streams from a port and for closing the port. There are also a number of driver-dependent methods for adjusting the properties of the port.

There are five basic steps to communicating with a port:

Open the port using the open() method of CommPortIdentifier. If the port is available, this returns a CommPort object. Otherwise, a PortInUseException is thrown.
Get the port's output stream using the getOutputStream() method of CommPort.
Get the port's input stream using the getInputStream() method of CommPort.
Read and write data onto those streams as desired.
Close the port using the close() method of CommPort.

Steps 2 through 4 are new. However, they're not particularly complex. Once the connection has been established, you simply use the normal methods of any input or output stream to read and write data. The getInputStream() and getOutputStream() methods of CommPort are similar to the methods of the same name in the java.net.URL class. The primary difference is that with Comm ports, you're completely responsible for understanding and handling the data that's sent to you. There are no content or protocol handlers that perform any manipulation of the data. If the device attached to the port requires a complicated protocol—for example, a fax modem—then you'll have to handle the protocol manually.

public abstract InputStream getInputStream() throws IOException
public abstract OutputStream getOutputStream() throws IOException
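
Steps 2 through 4 depend only on the stream interfaces, so they can be sketched without any port hardware. In the minimal sketch below, the class name PortDialog and the sendCommand() helper are hypothetical, and an in-memory pair of byte array streams stands in for the streams that getOutputStream() and getInputStream() would return from a real CommPort. The "AT" command and "OK" reply are only sample data.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

class PortDialog {

  // Hypothetical helper covering step 4: write a command, then read the
  // reply until maxReply bytes have arrived or the stream ends.
  // Note that read() on a real port blocks until data is available.
  static byte[] sendCommand(OutputStream out, InputStream in,
                            byte[] command, int maxReply) {
    try {
      out.write(command);
      out.flush(); // make sure buffered bytes actually leave for the device
      ByteArrayOutputStream reply = new ByteArrayOutputStream();
      int b;
      while (reply.size() < maxReply && (b = in.read()) != -1) {
        reply.write(b);
      }
      return reply.toByteArray();
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  public static void main(String[] args) {
    // Stand-ins for the results of getOutputStream() and getInputStream()
    ByteArrayOutputStream toDevice = new ByteArrayOutputStream();
    InputStream fromDevice = new ByteArrayInputStream(
        "OK\r\n".getBytes(StandardCharsets.US_ASCII));
    byte[] command = "AT\r".getBytes(StandardCharsets.US_ASCII);
    byte[] reply = sendCommand(toDevice, fromDevice, command, 16);
    System.out.println(new String(reply, StandardCharsets.US_ASCII).trim());
  }
}
```

Because the helper accepts any InputStream and OutputStream, the same code should work unchanged once the stand-in streams are replaced by the ones obtained from an open port.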

Serial Ports

The javax.comm.SerialPort class is an abstract subclass of CommPort that provides various methods and constants useful for working with RS-232 serial ports and devices. The main purposes of the class are to allow the programmer to inspect, adjust, and monitor changes in the settings of the serial port. Simple input and output is accomplished with the methods of the superclass, CommPort. SerialPort has a public constructor, but that shouldn't be used by applications. Instead, you should call the open() method of a CommPortIdentifier that maps to the port you want to communicate with, then cast the result to SerialPort. For example:

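// getPortIdentifier() throws NoSuchPortException if COM2 does not exist
CommPortIdentifier cpi = CommPortIdentifier.getPortIdentifier("COM2");
if (cpi.getPortType() == CommPortIdentifier.PORT_SERIAL) {
  try {
    // open() takes the name of the requesting application and the number
    // of milliseconds to wait for the port; both values here are examples
    SerialPort modem = (SerialPort) cpi.open("SerialExample", 2000);
    // configure and use the port here, then call modem.close()
  }
  catch (PortInUseException e) {
    System.err.println("COM2 is in use by " + cpi.getCurrentOwner());
  }
}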

Methods in the SerialPort class fall into roughly three categories:

Methods that return the state of the port
Methods that set the state of the port
Methods that listen for the changes in the state of the port

Data cannot simply be sent over a wire; you need to deal with many issues, like timing, noise, and the fundamentally analog nature of electronics. Therefore, there's a host of layered protocols so that the receiving end can recognize when data is being sent, whether the data was received correctly, and more.

Serial communication uses some very basic, simple protocols. RS-232 data lines use negative logic: holding the line between 3 and 25 volts for a number of nanoseconds inversely proportional to the baud rate of the connection signals a zero bit, and holding it between -3 and -25 volts for the same amount of time signals a one bit. These bits are grouped into serial data units, SDUs for short. Common SDU lengths are 8 (used for binary data) and 7 (used for basic ASCII text). Most modern devices use eight data bits per SDU. However, some older devices use seven, six, or even five data bits per SDU. Once an SDU is begun, the rest of the SDU follows in close order. However, there may be gaps of indeterminate length between SDUs.
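
The relationship between line speed and bit duration, and the effect of a shorter SDU, can be worked through in a few lines. This is a minimal sketch, assuming one bit per signaling event so that the baud rate equals the bit rate. The class SerialTiming and both of its methods are hypothetical names introduced only for illustration.

```java
class SerialTiming {

  // Duration of a single bit in nanoseconds, rounded down.
  // The duration is inversely proportional to the line speed.
  static long bitDurationNanos(int bitsPerSecond) {
    if (bitsPerSecond <= 0) {
      throw new IllegalArgumentException("rate must be positive");
    }
    return 1_000_000_000L / bitsPerSecond;
  }

  // Keeps only the low dataBits bits of a value, which is all that a
  // device using five to eight data bits per SDU can carry.
  static int toSdu(int value, int dataBits) {
    if (dataBits < 5 || dataBits > 8) {
      throw new IllegalArgumentException("data bits must be 5 through 8");
    }
    return value & ((1 << dataBits) - 1);
  }

  public static void main(String[] args) {
    System.out.println(bitDurationNanos(9600)); // prints 104166
    System.out.println(toSdu(0xC1, 7));         // prints 65
  }
}
```

The second call shows why seven-bit SDUs are suitable only for basic ASCII text: the high bit of each byte is simply lost.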

Parallel Ports

Parallel ports are most common on PCs. Sun SparcStations from the Sparc V on also have them. However, Macs do not have them, nor do many non-x86 workstations. Parallel ports are sometimes called printer ports, because their original purpose was to support printers. The names of the parallel ports—"LPT1," "LPT2," etc.—stand for "Line PrinTer," reflecting this usage. Nowadays, parallel ports are also used for Zip drives, tape drives, and various other devices. However, parallel ports are still largely limited by their original goal of providing simple printing. A parallel port sends data eight bits at a time on eight wires. These bits are sent at the same time in parallel, hence the name. The original parallel ports only allowed data to flow one way, from the PC to the printer. The printer could only respond by sending a few standard messages on other wires. Each return wire corresponded to a particular message, like "Out of paper" or "Printer busy." Modern parallel ports allow full, bidirectional communication.

The javax.comm.ParallelPort class is a concrete subclass of javax.comm.CommPort that provides various methods and constants useful for working with parallel ports and devices. The main purposes of the class are to allow the programmer to inspect, adjust, and monitor changes in the settings of the parallel port. Simple input and output are accomplished with the methods of the superclass, CommPort. ParallelPort has a single public constructor, but that shouldn't be used by applications. Instead, you should simply call the open() method of a CommPortIdentifier that maps to the port you want to communicate with, then cast it to ParallelPort:

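// getPortIdentifier() throws NoSuchPortException if LPT2 does not exist
CommPortIdentifier cpi = CommPortIdentifier.getPortIdentifier("LPT2");
if (cpi.getPortType() == CommPortIdentifier.PORT_PARALLEL) {
  try {
    // open() takes the name of the requesting application and the number
    // of milliseconds to wait for the port; both values here are examples
    ParallelPort printer = (ParallelPort) cpi.open("ParallelExample", 2000);
    // configure and use the port here, then call printer.close()
  }
  catch (PortInUseException e) {
    System.err.println("LPT2 is in use by " + cpi.getCurrentOwner());
  }
}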

Methods in the ParallelPort class fall into roughly four categories:

Appendix A: Additional Resources

When I began work on this book, I thought it would take me about 200 pages and about two months. Now, more than a year and 500 pages later, I can see that I/O is a far larger, more important, and more encompassing topic than I originally guessed. Many chapters could easily lead to books of their own. Indeed, several (Chapter 5 and Chapter 10) already are other books.

Since I can't possibly say everything there is to say about all these fascinating topics I've touched on in one page or another in this tome, I'd like to point you to several books, mailing lists, and web sites that explore some of the issues raised in this book in greater detail. Some of these are I/O-specific; some are mostly tangential. However, they're all interesting and worthy of further study and thought.

Section A.1: Digital Think

Section A.2: Design Patterns

Section A.3: The java.io Package

Section A.4: Network Programming

Section A.5: Data Compression

Section A.6: Encryption and Related Technology

Section A.7: Object Serialization

Section A.8: International Character Sets and Unicode

Section A.9: Java Communications API

Section A.10: Updates and Breaking News

Digital Think

Digital Think (https://www.digitalthink.com/) offers web-based training courses for programmers, developers, system administrators, and end users in C, C++, Java, Windows, web development, object-oriented programming, and more. This book grew out of two web-based courses I wrote for Digital Think, Java Streams (https://www.digitalthink.com/catalog/cs/cs108/) and Java Readers and Writers (https://www.digitalthink.com/catalog/cs/cs208/). Although this book is far more comprehensive than those two courses, they're a good way to get started with this material, especially if you think you need a personal helping hand or a leg up. Each course includes graded exercises, a hands-on course project, and tutors to answer your questions and assist you with the difficult parts.

Design Patterns

At the time I was writing the first draft of this book, I also happened to be learning about design patterns. Gradually, it became obvious that much of the AWT was written by programmers who had patterns on the brain. The java.awt.Toolkit class is a textbook example of the "abstract factory" pattern. The URL class's openConnection() method is a factory method. The Reader and Writer classes are decorators on top of InputStream and OutputStream. The engine classes in the JCE are proxies, and I could cite many more examples. Much of the class library—including the java.io package—has been designed with design patterns, and it will all make a lot more sense if you're familiar with the standard patterns.
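
The layering described above is easiest to see when the wrappers are stacked explicitly. This is a minimal sketch, assuming UTF-8 input. The class LayeredStreams and its firstLine() method are hypothetical names, while the stream and reader classes are the standard java.io ones.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

class LayeredStreams {

  // Wraps a byte stream in a reader that decodes characters, then wraps
  // that reader in another one that adds buffering and readLine().
  static String firstLine(InputStream in) {
    try (BufferedReader reader = new BufferedReader(
        new InputStreamReader(in, StandardCharsets.UTF_8))) {
      return reader.readLine();
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  public static void main(String[] args) {
    byte[] data = "hello\nworld".getBytes(StandardCharsets.UTF_8);
    System.out.println(firstLine(new ByteArrayInputStream(data))); // prints hello
  }
}
```

Each layer adds one capability while presenting a stream-like interface to the layer above it, which is what makes the wrappers freely combinable.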

The seminal text on the subject is Design Patterns, by Erich Gamma, Richard Helm, Ralph Johnson, and John Vlissides (Addison-Wesley, 1995). The four authors are colloquially known as the "Gang of Four," and the book is often cited informally as "GoF." The 23 patterns covered in GoF are rapidly becoming part of the vocabulary of the object-oriented programming community. Design patterns are also beginning to be covered in many more introductory books about object-oriented programming and Java.

There are also several extremely active mailing lists and web sites devoted to design patterns. To subscribe to the patterns@cs.uiuc.edu list, send email to patterns-request@cs.uiuc.edu with the word "subscribe" in the Subject: field. Archives of this and several related lists may be perused at https://www.DistributedObjects.com/portfolio/archives/patterns/index.html.

The java.io Package

The original source for much of the information contained herein about I/O is the javadoc documentation for the java.io package. You should have downloaded this with the JDK, but it's also available online at:

https://java.sun.com/products/jdk/1.2/docs/api/java/io/package-summary.html (Java 1.2)
https://java.sun.com/products/jdk/1.1/docs/api/Package-java.io.html (Java 1.1)
https://java.sun.com/products/jdk/1.0.2/api/Package-java.io.html (Java 1.0)

The class library documentation is, however, woefully incomplete. While it explains what each method does, it often fails to explain how, why, or when you should use those methods. Furthermore, it only occasionally discusses assumptions about the behavior of those methods—assumptions that are crucial for anyone not merely using but also subclassing particular classes. There are many implicit assumptions about what particular methods should do (for instance, that a close() method of a filter input stream also closes any other streams it's connected to), and these are generally not documented anywhere (or at least they weren't until I wrote this book).
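
The close() assumption mentioned above can be checked directly. This is a minimal sketch: CloseChain and TrackingInputStream are hypothetical classes written for this illustration, and BufferedInputStream serves as the filter stream.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;

class CloseChain {

  // Hypothetical stream that records whether close() was called on it.
  static class TrackingInputStream extends ByteArrayInputStream {
    boolean closed = false;

    TrackingInputStream(byte[] data) {
      super(data);
    }

    @Override
    public void close() throws IOException {
      closed = true;
      super.close();
    }
  }

  // Closes only the outer filter stream, then reports whether the
  // underlying source stream was closed as a side effect.
  static boolean closingFilterClosesSource() {
    TrackingInputStream source = new TrackingInputStream(new byte[] {1, 2, 3});
    try {
      InputStream filter = new BufferedInputStream(source);
      filter.close();
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    return source.closed;
  }

  public static void main(String[] args) {
    System.out.println(closingFilterClosesSource()); // prints true
  }
}
```

A subclass of a filter stream that overrides close() without calling the superclass version would silently break this expectation, which is why the assumption matters to anyone subclassing.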

I've tried to document all of these assumptions in this book, but if you're faced with a new class not covered here, the canonical reference is the source code itself. The JDK includes Java source code for the java packages. You'll find it in a file called src.zip in your JDK distribution. Sometimes the only way to figure out exactly what Sun intended particular classes to do or how they expected them to do it is to read the source code for those classes.

Network Programming

In many ways this book is a prequel to my previous book with O'Reilly, Java Network Programming. Although written first, Java Network Programming presumes a solid familiarity with input and output, streams, and readers and writers as discussed in this book. Java Network Programming explains the fundamental protocols and technology that underlie the Internet, shows you how to communicate with sockets, provides detailed examples of working network clients and servers, and even develops content and protocol handlers. If you want to learn more about TCP/IP, HTTP, URLs, sockets and server sockets, and other elements of Internet programming in Java, you should definitely pick up Java Network Programming. (There's probably an ad for it in the back of this very book.)

The Centre for Distance-spanning Technology (CDT) runs the unmoderated java-networking@cdt.luth.se list for informal discussion of Java network programming, which I participate in. To subscribe, send an email containing the word "subscribe" in the body of the message to java-networking-request@cdt.luth.se. An archive of the list and complete instructions are available from https://www.cdt.luth.se/~peppar/java/java-networking-list/.

Data Compression

Java supports several related compression formats, including zlib, deflate, and gzip. These formats are documented in RFCs 1950, 1951, and 1952, and are available wherever RFCs are found, including https://www.faqs.org/rfcs/. The master site for these particular RFCs is ftp://ftp.uu.net/graphics/png/documents/zlib/zdoc-index.html.

Java's compression classes are native wrappers around the ZLIB compression library written by Jean-Loup Gailly and Mark Adler. You can learn about this library at https://www.cdrom.com/pub/infozip/zlib/.
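
As a quick illustration of these classes, the following minimal sketch compresses a byte array into the gzip format and expands it again, entirely in memory. The class GzipRoundTrip and its two methods are hypothetical names. GZIPOutputStream and GZIPInputStream are the standard java.util.zip classes.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

class GzipRoundTrip {

  // Compresses data into the gzip format.
  static byte[] compress(byte[] data) {
    ByteArrayOutputStream bytes = new ByteArrayOutputStream();
    try (GZIPOutputStream gz = new GZIPOutputStream(bytes)) {
      gz.write(data);
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    // Closing the gzip stream above wrote the trailer, so the data is complete.
    return bytes.toByteArray();
  }

  // Expands gzip data back into the original bytes.
  static byte[] decompress(byte[] compressed) {
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    try (GZIPInputStream gz =
        new GZIPInputStream(new ByteArrayInputStream(compressed))) {
      byte[] buffer = new byte[1024];
      int n;
      while ((n = gz.read(buffer)) != -1) {
        out.write(buffer, 0, n);
      }
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    return out.toByteArray();
  }

  public static void main(String[] args) {
    byte[] original = "aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa".getBytes(StandardCharsets.US_ASCII);
    byte[] restored = decompress(compress(original));
    System.out.println(Arrays.equals(original, restored)); // prints true
  }
}
```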

For more general information about compression and archiving algorithms and formats, the comp.compression FAQ is a good place to start. See https://www.faqs.org/faqs/compression-faq/part1/preamble.html. More technical details and sample code in C for a variety of algorithms are available in The Data Compression Book, by Mark Nelson and Jean-Loup Gailly (M&T Books, 1996, ISBN 1-55851-434-1).

The JAR file format was developed by Sun for Java. The full specification can be found at https://java.sun.com/products/jdk/1.2/docs/guide/jar/jarGuide.html (Java 2) or https://java.sun.com/products/jdk/1.1/docs/guide/jar/jarGuide.html (Java 1.1). Aside from the name, the only thing that really distinguishes a JAR file from a zip file is the optional manifest of the contents. The manifest format specification can be found at https://java.sun.com/products/jdk/1.2/docs/guide/jar/manifest.html.
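
The close kinship between JAR and zip files can be demonstrated by writing an archive with the JAR classes and reading it back with the plain zip classes. This is a minimal sketch. The class JarIsZip, its entryNames() method, and the hello.txt entry are hypothetical names chosen for the illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.jar.Attributes;
import java.util.jar.JarEntry;
import java.util.jar.JarOutputStream;
import java.util.jar.Manifest;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;

class JarIsZip {

  // Builds a small JAR in memory, then lists its entries using only
  // ZipInputStream, which knows nothing about manifests.
  static List<String> entryNames() {
    Manifest manifest = new Manifest();
    manifest.getMainAttributes().put(Attributes.Name.MANIFEST_VERSION, "1.0");
    ByteArrayOutputStream bytes = new ByteArrayOutputStream();
    List<String> names = new ArrayList<>();
    try {
      try (JarOutputStream jar = new JarOutputStream(bytes, manifest)) {
        jar.putNextEntry(new JarEntry("hello.txt"));
        jar.write("hello".getBytes(StandardCharsets.US_ASCII));
        jar.closeEntry();
      }
      try (ZipInputStream zip = new ZipInputStream(
          new ByteArrayInputStream(bytes.toByteArray()))) {
        ZipEntry entry;
        while ((entry = zip.getNextEntry()) != null) {
          names.add(entry.getName());
        }
      }
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    return names;
  }

  public static void main(String[] args) {
    // The manifest shows up as an ordinary zip entry next to hello.txt
    System.out.println(entryNames());
  }
}
```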

Encryption and Related Technology

Chapter 10 only began to explore the fascinating subject of cryptography. The JCE is explicated in much more detail by Jonathan Knudsen in Java Cryptography (O'Reilly & Associates, 1998). Java Cryptography expands on the coverage of the Cipher and MessageDigest classes you'll find in this book. It also includes thorough discussions of the java.security package and the Java Cryptography Extension (JCE), showing you how to use security providers and even implement your own provider. It discusses authentication, key management, and public and private key encryption and includes a secure talk application that encrypts all data sent over the network. If you write Java programs that communicate sensitive data, you'll find this book indispensable.
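
As a reminder of what working with the MessageDigest class looks like, here is a minimal sketch that computes a digest and prints it in hexadecimal. The class DigestSketch and its sha256Hex() method are hypothetical names, and the choice of SHA-256 is an assumption; any algorithm supported by the installed security providers could be substituted.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

class DigestSketch {

  // Returns the SHA-256 digest of the UTF-8 bytes of text as lowercase hex.
  static String sha256Hex(String text) {
    try {
      MessageDigest md = MessageDigest.getInstance("SHA-256");
      byte[] hash = md.digest(text.getBytes(StandardCharsets.UTF_8));
      StringBuilder hex = new StringBuilder();
      for (byte b : hash) {
        hex.append(String.format("%02x", b));
      }
      return hex.toString();
    } catch (NoSuchAlgorithmException e) {
      // Thrown only if no installed provider offers the algorithm
      throw new IllegalStateException(e);
    }
  }

  public static void main(String[] args) {
    System.out.println(sha256Hex("abc"));
  }
}
```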

For a more in-depth look at the mathematics and protocols that underlie the JCE, you'll want to check out Bruce Schneier's Applied Cryptography (John Wiley & Sons, 1995). This is the standard practical text on cryptographic protocols and algorithms, and the attacks on them. Schneier discusses a wide range of cryptographic algorithms, key management and exchange schemes, one-way hash functions, signature algorithms, and many other problems in sufficient detail to allow a competent programmer to implement them. Although Schneier's language of choice is C, the techniques discussed are applicable in any language.

The formal specification of the Java Cryptography API is available from Sun at https://java.sun.com/products/jdk/1.2/docs/guide/security/CryptoSpec.html. The actual implementation is in beta at the time of this writing and can be downloaded from https://developer.java.sun.com/developer/earlyAccess/jdk12/jce.html.

Object Serialization

Sun's serialization web page at https://java.sun.com/products/jdk/1.2/docs/guide/serialization/ includes a FAQ list, sample code, and the complete object serialization specification. The specification covers serialization as implemented in Java 1.2, which is mostly upward-compatible with the Java 1.1 serialization discussed in Chapter 11. An earlier prebeta specification that covers Java 1.0.2 serialization is posted at https://java.sun.com/products/jdk/rmi/doc/serial-spec/serialTOC.doc.html. A formal specification of Java 1.1 serialization was never published. However, the Java 1.2 spec is mostly the same, with the addition of a few extra features like the readResolve() method.
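
The readResolve() method mentioned above lets a class substitute a different object for the one just read from the stream. A common use is keeping a singleton unique across serialization. This is a minimal sketch: ReadResolveSketch, Registry, and roundTrip() are hypothetical names made up for the illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.ObjectStreamException;
import java.io.Serializable;

class ReadResolveSketch {

  // Hypothetical singleton that must remain unique after deserialization.
  static final class Registry implements Serializable {
    private static final long serialVersionUID = 1L;
    static final Registry INSTANCE = new Registry();

    private Registry() {}

    // Called by ObjectInputStream after the object has been read;
    // the returned object replaces the freshly deserialized copy.
    private Object readResolve() throws ObjectStreamException {
      return INSTANCE;
    }
  }

  // Serializes an object to a byte array and reads it straight back.
  static Object roundTrip(Object o) {
    try {
      ByteArrayOutputStream bytes = new ByteArrayOutputStream();
      try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
        out.writeObject(o);
      }
      try (ObjectInputStream in = new ObjectInputStream(
          new ByteArrayInputStream(bytes.toByteArray()))) {
        return in.readObject();
      }
    } catch (IOException | ClassNotFoundException e) {
      throw new IllegalStateException(e);
    }
  }

  public static void main(String[] args) {
    System.out.println(roundTrip(Registry.INSTANCE) == Registry.INSTANCE); // prints true
  }
}
```

Without readResolve(), the round trip would return a second Registry object and the identity comparison would print false.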

Sun's formal specification for object serialization is not always clear, especially when it comes to motivating the more esoteric areas of serialization like ObjectInputValidation. However, it is complete and does add some to what I discussed in Chapter 11, including the binary protocol for serialized objects and .ser files.
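
One small piece of the binary protocol is easy to observe: every serialization stream starts with a two-byte magic number followed by a two-byte version. This is a minimal sketch that reads those values back. The class StreamHeader and its header() method are hypothetical names, while the constants they are compared against come from java.io.ObjectStreamConstants.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.ObjectStreamConstants;
import java.io.UncheckedIOException;

class StreamHeader {

  // Serializes a string, then returns the first two shorts of the stream:
  // the magic number and the protocol version.
  static short[] header() {
    try {
      ByteArrayOutputStream bytes = new ByteArrayOutputStream();
      try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
        out.writeObject("x");
      }
      DataInputStream in = new DataInputStream(
          new ByteArrayInputStream(bytes.toByteArray()));
      return new short[] {in.readShort(), in.readShort()};
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  public static void main(String[] args) {
    short[] h = header();
    System.out.println(h[0] == ObjectStreamConstants.STREAM_MAGIC);   // prints true
    System.out.println(h[1] == ObjectStreamConstants.STREAM_VERSION); // prints true
  }
}
```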

Object serialization was originally developed to support Remote Method Invocation (RMI), an architecture that allows Java objects in one virtual machine to invoke methods on objects in another virtual machine, possibly running on a different computer somewhere else on the Internet. RMI is discussed briefly in Chapter 14 of my Java Network Programming and at great length in Jim Farley's Java Distributed Computing (O'Reilly & Associates, 1998, ISBN 1-56592-206-9).

Object serialization is also used extensively as part of the JavaBeans component software architecture, a standard part of Java 1.1 and later. To learn more about this, I recommend you pick up Robert Englander's Developing Java Beans (O'Reilly & Associates, 1997, ISBN 1-56592-289-1) or my own JavaBeans: Developing Component Software in Java (IDG Books, 1997, ISBN 0-76458-052-3).

International Character Sets and Unicode

The canonical reference to Unicode is The Unicode Standard, Version 2.0 (Addison-Wesley, 1996, ISBN 0-201-48345-9). This book features detailed analysis of the Unicode standard as well as discussion of the difficulties of defining character sets for all the world's different languages. It's also got tables of almost all the defined characters in Unicode, including about 20,000 Han ideographs. The size of the book and the large number of interesting tables of different scripts from around the world make it a good choice for a techie coffee-table book that can even amuse your liberal arts friends. Updates, corrections, and errata to that volume are available on the Web at https://www.unicode.org/.

There's no single source of information for all the different non-Unicode character sets Java readers and writers can translate. However, most of the Windows character sets are enumerated in Developing International Software for Windows 95 and NT, by Nadine Kano (Microsoft Press, 1995, ISBN 1-55615-840-8). Kano ignores non-Windows platforms, and she does occasionally sound too much like a Microsoft press release. Nonetheless, this book contains a lot of useful details about how various localized versions of Windows operate. This book is also available on the MSDN Online Library web site at https://premium.microsoft.com/msdn/library/. Registration is required, but otherwise it's free. Assuming Microsoft hasn't added an actually navigable interface to MSDN by the time you read this, you'll find it by clicking on "Books" in the lefthand frame, then clicking on "Developing International Software." (I normally wouldn't bother you with such details, but the interface really is painfully obscure.)

Roman Czyborra maintains a lot of useful information about various ISO 8859 and Cyrillic character sets on his web site at https://czyborra.com/, including charts of a wide range of character sets and code pages.

Ken Lunde's CJKV Information Processing: Chinese, Japanese, Korean & Vietnamese Computing

Java Communications API

This may well be the first book to cover the Java Communications API. Sun includes a limited amount of documentation with the Java Communications API itself, mostly javadoc class library documentation. The latter is also available from Sun's web site at https://java.sun.com/products/javacomm/javadocs/Package-javax.comm.html.

The RS-232 serial port and IEEE 1284 parallel port standards predate the Web and widespread use of the Internet. Thus, these standards are still available only on dead trees for the moment. A number of books do cover them in reasonable detail, including Scott Mueller's Upgrading and Repairing PCs, 10th edition (Que, 1998, ISBN 0-7897-1636-4).

Several books discuss writing port-aware programs in a variety of languages. Although none yet use Java, it's generally not hard to translate from the low-level C or Basic code to the equivalent code that uses the Java Communications API. The best book I've found for parallel ports is Jan Axelson's Parallel Port Complete (Lakeview Research, 1996, ISBN 096508191-5).

There are more choices for serial port books, but the most comprehensive one is certainly Joe Campbell's C Programmer's Guide to Serial Communications (Sams, 1993, ISBN 0-672-30286-1). Despite the title, the first half of this 900-page tome is an exhaustive treatment of more or less language-independent serial communication hardware and protocols from 19th-century telegraphy to the present day.

Updates and Breaking News

In the fast-moving world of Java, it's an effort to publish a book that isn't out of date by the time it reaches store shelves. Most of what I've written about in this book seems fairly stable. However, there will undoubtedly be many new developments after publication. The following three web sites can help you stay abreast of new technologies and strategies for Java I/O.

My Café au Lait site at https://metalab.unc.edu/javafaq/ features almost daily news updates about Java topics. I pay special attention to new material that's closely related to my books, like I/O and networking libraries. Café au Lait also features many resources to help you develop your Java programming skills, including FAQ lists, tutorials, course notes, examples, exercises, book reviews, and more. Of particular interest will be the Java I/O page at https://metalab.unc.edu/javafaq/books/javaio/. I'll post corrections and updates to this book there as necessary.

O'Reilly's official Java site at https://java.oreilly.com/ contains feature articles and links to the official O'Reilly sites for all our Java books. You can peruse the rather impressive O'Reilly Java catalog (18 books and counting) and view descriptions, author bios, tables of contents, indexes, reviews, exercises, examples, errata, and reader comments for all the books (including this one).

I/O isn't the sexiest topic in the programming community, but it is one of the most important. IDG's JavaWorld (https://www.javaworld.com/) is to be commended for treating I/O on an equal footing with sexier topics like JavaBeans and the Java Media APIs. JavaWorld publishes monthly how-to articles, book reviews, news, and more. They're particularly notable for providing short, technical articles that show you how to do things Sun's only hinted at and how to work around common problems programmers face.

Appendix B: Character Sets

The first 128 Unicode characters (that is, characters 0 through 127) are identical to the ASCII character set. 32 is the ASCII space; therefore, 32 is the Unicode space. 33 is the ASCII exclamation point; therefore, 33 is the Unicode exclamation point, and so on. Table 2.1 lists this character set.
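
This correspondence is easy to confirm from Java, where a char is a Unicode value. This is a minimal sketch. The class AsciiInUnicode and its sameAsAscii() method are hypothetical names, and US_ASCII is the standard charset constant from java.nio.charset.StandardCharsets.

```java
import java.nio.charset.StandardCharsets;

class AsciiInUnicode {

  // True if c is in the ASCII range and its ASCII encoding is a single
  // byte with the same numeric value as the Unicode character.
  static boolean sameAsAscii(char c) {
    byte[] ascii = String.valueOf(c).getBytes(StandardCharsets.US_ASCII);
    return c < 128 && ascii.length == 1 && ascii[0] == c;
  }

  public static void main(String[] args) {
    System.out.println((int) ' ');        // prints 32
    System.out.println((int) '!');        // prints 33
    System.out.println(sameAsAscii('A')); // prints true
  }
}
```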

In the first column, characters 0 through 31 are referred to as control characters, because they're traditionally entered by holding down the Control key and a letter key (on at least some dumb terminals). For instance, Ctrl-H is often ASCII 8, backspace. Ctrl-S is often mapped to ASCII 19, DC3 or XOFF. Ctrl-Q is often mapped to ASCII 17, DC1 or XON. Generally, each control character is entered by pressing the Control key and the printable character whose ASCII value is the ASCII value of the character you want plus 64 (or plus 96, if you count from the lowercase letters rather than the capitals). Character 127, delete, is also a control character.
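
The arithmetic behind the Control key can be written out in a few lines. This is a minimal sketch. The class ControlCodes and its controlCode() method are hypothetical names, and the method treats a lowercase letter the same as its capital, which is an assumption about how a terminal handles the Control key.

```java
class ControlCodes {

  // Returns the ASCII control code produced by Control plus the given key.
  // Accepts the printable characters '@' (64) through '_' (95); lowercase
  // letters are first converted to capitals.
  static int controlCode(char key) {
    char upper = Character.toUpperCase(key);
    if (upper < '@' || upper > '_') {
      throw new IllegalArgumentException("no control code for " + key);
    }
    return upper - 64;
  }

  public static void main(String[] args) {
    System.out.println(controlCode('H')); // prints 8, backspace
    System.out.println(controlCode('S')); // prints 19, DC3 or XOFF
    System.out.println(controlCode('q')); // prints 17, DC1 or XON
  }
}
```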

The common abbreviation for the character is given first, followed by its common meaning. Some of these codes are pretty much obsolete. For instance, I'm not aware of any modern OS that actually uses characters 28 through 31 as file, group, record, and unit separators. Those control codes that are still used often have different meanings on different platforms. For example, character 10, the linefeed, originally meant move the platen on the printer up one line, while character 13, the carriage return, meant return the print-head to the beginning of the line. On paper-based teletype terminals, this could be used to position the print-head anywhere on a page and perhaps overtype characters that had already been typed. This no longer makes sense in an era of glass terminals and GUIs, so linefeed has come to mean a generic end-of-line character.

The next 128 Unicode characters (that is, 128 through 255) have the same values as the equivalent characters in the Latin-1 character set defined in ISO standard 8859-1. Latin-1, a slight variation of which is used by Windows, adds the various accented characters, umlauts, cedillas, upside-down question marks, and other characters needed to write text in most Western European languages. Table 2.2 shows these characters. The first 128 characters in Latin-1 are the ASCII characters shown in Table 2.1.
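
Because Latin-1 occupies the first 256 Unicode values, decoding a Latin-1 byte amounts to widening it to a char. This is a minimal sketch that checks the claim for every byte value. The class Latin1InUnicode and its method are hypothetical names, and ISO_8859_1 is the standard charset constant from java.nio.charset.StandardCharsets.

```java
import java.nio.charset.StandardCharsets;

class Latin1InUnicode {

  // Decodes each possible byte as Latin-1 and checks that the resulting
  // character has the same numeric value as the byte.
  static boolean latin1MatchesUnicode() {
    for (int value = 0; value < 256; value++) {
      String decoded =
          new String(new byte[] {(byte) value}, StandardCharsets.ISO_8859_1);
      if (decoded.length() != 1 || decoded.charAt(0) != value) {
        return false;
      }
    }
    return true;
  }

  public static void main(String[] args) {
    System.out.println(latin1MatchesUnicode()); // prints true
    // Byte 0xE9 is e with an acute accent in Latin-1 and U+00E9 in Unicode
    String eAcute = new String(new byte[] {(byte) 0xE9}, StandardCharsets.ISO_8859_1);
    System.out.println((int) eAcute.charAt(0)); // prints 233
  }
}
```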
