by Alan Silverstein, Hewlett-Packard Company
Driving a web browser or other computer application is as easy
as driving a car -- or at least it should be. "You don't have
to know how the computer works, just how to work the computer."
This presentation is a quick peek "under the hood", one level
deeper, to give you some idea what's going on behind the scenes.
The story of web browsers and web sites builds up from many
different parts to a grand finale. Here are the parts...
Client/Server Concepts:
- A client computer initiates a service request. A server
computer waits to reply, kind of like a person who waits to
answer the phone so you can make airline reservations.
- A client program can be directed by a human being (at a
screen, keyboard, and mouse), or it can run automatically.
- A server is a program that knows a protocol for communication
(see below), but often doesn't know much about networking --
it just exchanges bytes of information with the client, a
back-and-forth conversation that might be human-readable
ASCII, or binary code; kind of like when you call someone to
make a reservation, and they don't know how the phone system
works, just how to use the telephone to talk with you.
- It's entirely possible for the client and server programs to
be on the same computer, as well as on two different computers
connected by a network. The client/server model blurs the
boundaries between computers, to where "the network IS the
computer."
- Once a virtual connection is established between a client and
server, the two systems are peers, but the client/server
asymmetry usually continues through the protocol; just like
when you're done dialing someone, it doesn't matter much who
placed the call, you can talk to each other as equals, though
quite often the caller and the callee have very different
roles. (With phone calls the caller usually pays for any long
distance charges, but there's usually no equivalent in network
connections.)
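
Here is a minimal sketch of the ideas above, assuming both the
client and the server run on the same computer (address 127.0.0.1)
and that the system is allowed to pick any free port. The function
names and the message text are made up for illustration; this is
one way to show a request and a reply, not a real reservation
service.

```python
import socket
import threading

def serve_once(listener):
    """Server side: wait for one client, read its request, send a reply."""
    conn, _addr = listener.accept()
    with conn:
        request = conn.recv(1024)
        conn.sendall(b"You said: " + request)

def ask(port, message):
    """Client side: initiate the connection, send a request, return the reply."""
    with socket.create_connection(("127.0.0.1", port)) as client:
        client.sendall(message)
        client.shutdown(socket.SHUT_WR)      # tell the server we are done talking
        chunks = []
        while True:
            data = client.recv(1024)
            if not data:                     # server closed the connection
                break
            chunks.append(data)
        return b"".join(chunks)

# The server starts first and waits, like someone sitting by the phone.
listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
listener.bind(("127.0.0.1", 0))              # port 0 = let the system choose
listener.listen(1)
port = listener.getsockname()[1]

server_thread = threading.Thread(target=serve_once, args=(listener,))
server_thread.start()
reply = ask(port, b"one seat to Denver, please")
server_thread.join()
listener.close()
print(reply.decode("ascii"))
```

Notice that neither function knows anything about cables or
gateways; each one just exchanges bytes with the other.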

Internet Concepts:
- A TCP/IP network is a common connection between two or more
computers. (There's a lot more to it than that, of course.)
- The connection is often through coaxial cable (coax), the same
sort of stuff that's used for TV antenna signals, but it
doesn't have to be. Network connections can be made by radio,
infra-red light, carrier pigeon, you name it. Nowadays it's
more common to see UTP (unshielded twisted pair) wiring, which
is "star shaped" from a central hub out to each system; and
the hubs talk with each other over fiber-optic cable.
- A network connection is shared by every computer on the same
Local Area Net (LAN). They can all hear each other; imagine a
bunch of people standing in a small room together.
- Only one network host (computer) on a LAN can talk at a time
and be understood, unless the arrangement is something like
UTP, where the hub can route traffic just between the
interested parties.
- A networked computer talks by sending a packet of data (a
series of bits/bytes) addressed either to one other computer
(normal TCP/IP) or to everyone on the LAN who's listening
(often UDP/IP).
- Everyone on the LAN takes turns talking. On coax, if two
computers start to talk at the same time, there's a protocol
for deciding who gets to talk when. You can imagine a bunch
of people in a discussion group, except most of the time
they're talking to just one other person and the rest of the
people aren't listening (until their name is called). On UTP,
any pair of people can whisper to each other without bothering
others in the room who are doing the same thing.
Hardware addresses:
- Every networked computer has at least one LAN interface (card)
that contains a world-wide unique hardware address set by the
manufacturer (48 bits = 6 bytes long). They look like this:
080009352D52.
- You can think of hardware addresses as being like the numbers
on the wires that leave the local phone company to go to your
home telephone. You don't usually deal with them, but they
are necessary to route your calls.
- Hardware addresses are assigned to manufacturers in blocks of
numbers at a time.
- A network interface's hardware (station) address is only known
to the other computers on the local network.
- Two computers talking on a local network actually address each
other by hardware address.
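
As a small illustration, here is a sketch that takes the sample
address above apart into its 6 bytes. The function name is made
up, and the idea that the first three bytes identify the
manufacturer's block is the usual convention rather than something
stated above.

```python
def split_hardware_address(text):
    """Turn 12 hex digits into the 6 raw bytes of a hardware address."""
    raw = bytes.fromhex(text)
    if len(raw) != 6:
        raise ValueError("a hardware address is 48 bits = 6 bytes")
    return raw

raw = split_hardware_address("080009352D52")

# A common way to write the same address for humans to read.
pretty = ":".join(f"{b:02X}" for b in raw)

# By convention, the first 3 bytes are the block given to the manufacturer.
manufacturer_block = raw[:3].hex().upper()

print(pretty, manufacturer_block)
```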
Gateways:
- Local networks are connected by gateways, which are either
general-purpose or dedicated computers that are connected to
two or more different LANs or WANs (wide area networks). They
know which data packets should be forwarded from one network
to another. Imagine two groups of people in two adjacent
rooms, with one person at the common door who passes messages
between the rooms. Separate conversations can take place in
each room at the same time, but messages can also be relayed
between the rooms.
IP addresses:
- Every LAN interface is assigned a world-wide unique IP address
(32 bits = 4 bytes). You can think of this address as the
interface's "phone number". IP addresses look like this:
15.1.50.9.
- IP addresses are assigned to companies as blocks of subnet
addresses. For example, HP owns net 15, and also some parts
of net 192 such as 192.6.40.
- Since a computer can have more than one LAN interface, it can
have more than one IP address (phone number) -- just like your
house.
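
To see that the dotted form really is just 32 bits, here is a
short sketch using Python's standard ipaddress module and the
sample address above. Writing "net 15" as 15.0.0.0/8 is an
assumption about how that block would be written in modern
notation.

```python
import ipaddress

addr = ipaddress.IPv4Address("15.1.50.9")

as_number = int(addr)      # the same address as a single 32-bit integer
as_bytes = addr.packed     # the same address as 4 raw bytes

# "Net 15" means every address whose first byte is 15.
net_15 = ipaddress.IPv4Network("15.0.0.0/8")
on_net_15 = addr in net_15

print(as_number, list(as_bytes), on_net_15)
```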
Hostnames and routes:
- We associate "hostnames" with IP addresses, just like we
associate human or business names with phone numbers. (See
below about domain names.)
- Since a computer can have more than one LAN interface and IP
address, it can have more than one hostname, just like people
can be addressed in different ways, depending on their roles
and relationships. However, each LAN interface does not
necessarily have a different name, because that could be
confusing.
- When a client computer connects to a server computer, it looks
up the server's domain name in a directory through a
nameserver (see below) to find the server computer's IP
address, and then figures out a route to that address by using
routing services on the same system or on other systems,
including a gateway.
- A simple form of a route is: "To reach any machine not on the
local net, go through the gateway at 15.1.50.1." The client
addresses a data packet to that gateway (using the gateway's
hardware address). The gateway in turn figures out how to
send the packet one step closer to the intended destination on
a different LAN.
- When a connection request (data packet) reaches a gateway on
the same LAN as the target server, the gateway talks to the
server using the server's hardware address, since it knows
that number. (If it doesn't, it can use ARP (address
resolution protocol) to find out, kind of like a waiter in a
restaurant announcing, "phone call for Mr. Liu.")
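
The simple route described above can be sketched in a few lines.
The gateway address comes from the example; the size of the local
net (15.1.50.0 through 15.1.50.255) is an assumption made only for
this illustration, and next_hop is a made-up name.

```python
import ipaddress

LOCAL_NET = ipaddress.IPv4Network("15.1.50.0/24")   # assumed local net
GATEWAY = ipaddress.IPv4Address("15.1.50.1")        # from the example route

def next_hop(destination):
    """Return the address the packet should be handed to next."""
    dest = ipaddress.IPv4Address(destination)
    if dest in LOCAL_NET:
        return dest        # same LAN: talk to the machine directly
    return GATEWAY         # anything else: let the gateway pass it along

print(next_hop("15.1.50.9"))     # a neighbor on the local net
print(next_hop("192.6.40.7"))    # somewhere else, so go through the gateway
```

A real system keeps a table of several such routes, but the
decision for each packet is the same kind of test.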

Port Numbers:
- Once a client computer reaches (connects to) a server over the
network, it needs a way to tell the server which service
(server program) it wants to talk with. It does this by
specifying a port number (at least in TCP/IP). You can think
of port numbers as telephone extensions. Every call to a
server system reaches an "operator" who asks for an extension
number to put the call through.
- On a server, when each server program starts running, it
attaches to one or more port numbers and receives traffic on
those ports.
- On a client, when a service is needed, the client program
looks up the standard port number for the service before
connecting to the server system. It's also possible for a
human talking with the client system to specify the port
number to use for a given connection. For example, you can
"telnet" to a server's email or HTTP port and do useful
things, since those services speak ASCII (not binary).
- Obviously there is a lot of agreement in the world about
standard port numbers, or clients would never be able to find
their servers! But in fact any server computer can attach any
server programs it likes to any port numbers.
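
Here is a sketch of how a client might pick the "extension" to
ask for. The table holds only the three port numbers mentioned in
this presentation; a real system keeps a much longer list, and
choose_port is a made-up name.

```python
# Standard port numbers mentioned in this presentation.
STANDARD_PORTS = {"ftp": 21, "smtp": 25, "http": 80}

def choose_port(service, override=None):
    """Use the human's choice if there is one, else the standard port."""
    if override is not None:
        return override
    return STANDARD_PORTS[service]

print(choose_port("smtp"))          # the usual email port
print(choose_port("http", 8080))    # a human asked for a different port
```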

Protocols:
- Just like in real life, a computer protocol is a formal
description of how to talk to (or interact with) someone else.
- TCP/IP stands for "transmission control protocol / internet
protocol". Along with other protocols, it describes the way
that virtual connections are made over networks, including
hardware addresses, IP addresses, hostnames, etc.
- Once a connection is made between a client and a server, how
they talk with each other is described by a service protocol.
For example, FTP (file transfer protocol) is a simple language
for asking a server to get and send back to the client all the
bytes in a specified file.
- As you can see, to carry on a conversation, a whole stack of
protocols is in play at the same time. It's just like when you
make a phone call to a business. There's a protocol you don't
know about (because you can ignore it) that says how to build
and wire up telephones. There's a higher level protocol (that
you do know) about how to dial phone numbers. And once you
get through, there is an even higher level protocol for how
you might make a reservation or leave a message for someone.
- When you call someone on the phone, you're talking to a human
(or to an answering machine that will be heard by a human).
Humans are pretty flexible and interactive, so you don't have
to be real precise about protocols. But computer services are
not so smart, so you must know and obey the protocols to talk
with them.

Domain Names:
- Domains are a way of dividing up the world of computers so
each one can have a unique name (that includes its location in
the network) that's easy for people to use because it's made
of words, not numbers.
- Each domain can contain (know about) subdomains or individual
computers.
- A computer's full domain name starts with its hostname and
ends with its top level domain. For example: "ajs.fc.hp.com"
is a computer ("ajs", named for its owner) that lives in Fort
Collins, Colorado ("fc") on an HP computer network ("hp"),
which is a kind of commercial network ("com").
- You can think of domain names as being like postal addresses
-- name, apartment, street, city, state, country. (Though
most people don't realize it, the ZIP code is actually just
another way to get to the name, apartment, and street parts,
or sometimes just to the name, while ignoring the city, state,
and country parts. In exchange for making you deal with a few
numbers, the post office can deliver your mail a lot faster
and more accurately.)
- Just like there are lots of different kinds of mail addresses,
there are lots of different kinds of domain addresses.
They're all of the form of words separated by dots, but the
meanings of the first words in the name depend on which domain
they're in, that is, which words appear later in the domain
name.
- Just like you can have both a street address and a post office
box, a computer can be in different domains at the same time,
although for various reasons this isn't very common.
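
A full domain name can be taken apart mechanically, as this
sketch shows for the example name above. Both function names are
made up for illustration.

```python
def domain_parts(name):
    """Split a full domain name into hostname, subdomains, top level."""
    labels = name.split(".")
    return {
        "hostname": labels[0],
        "subdomains": labels[1:-1],
        "top_level": labels[-1],
    }

def enclosing_domains(name):
    """List the domains that contain the name, from the outermost inward."""
    labels = name.split(".")
    return [".".join(labels[i:]) for i in range(len(labels) - 1, -1, -1)]

print(domain_parts("ajs.fc.hp.com"))
print(enclosing_domains("ajs.fc.hp.com"))
```

Reading the second list from left to right is like reading a
postal address from the country down to the name.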
Nameservers:
- Every domain or subdomain has at least one computer running a
"nameserver" program, on a "well known port number", that can
do name lookup. For example, if a client program on a
computer named "jlk.co.edu" wants to talk with a service on
server machine "ajs.fc.com", the steps work like this:
- The client program on jlk.co.edu tells the computer system
(jlk) it wants to make the connection.
- jlk looks at "ajs.fc.com". Since it doesn't know anything
about this address, it asks a world-wide top-level nameserver
(at a known address) for the address of "ajs.fc.com". It gets
the IP address of a nameserver for "com".
- jlk asks the "com" server if it knows "ajs.fc.com". jlk gets
an IP address for the nameserver for the "fc.com" subdomain.
- jlk asks the "fc.com" server if it knows "ajs.fc.com". jlk
gets an IP address for that system.
- Now jlk connects to ajs by its IP address.
- Suppose the client on jlk was a mailer (mail program) that
wanted to send email to a person named "ajs" on the computer
named "ajs". It would connect to the mail server's port
(normally port 25), and say that it had mail for
"ajs@ajs.fc.com". Note that while jlk might talk to a number
of different systems in order to send the email, the letter
itself would go directly to the destination system (across
some patchwork of internet segments and gateways).
- If the client was trying to send mail to "ajs@fc.hp.com", a
computer named hpfcla.fc.hp.com might "take the call." It
would say, "I know how to reach ajs@fc.hp.com" (who's really
on ajs@ajs.fc.hp.com). hpfcla would accept the letter from
jlk, then connect with ajs.fc.hp.com and forward it. This is
especially common when the source system can't talk directly
to the destination system because it's behind a "firewall", as
described later.
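
The lookup steps for "ajs.fc.com" can be imitated with a toy
directory. Everything in the table below is invented for the
sketch: the nameservers are plain dictionary entries, not real
machines, and 192.0.2.10 is an address reserved for examples. No
network traffic is involved.

```python
# Each toy nameserver either refers you to another server or answers.
NAMESERVERS = {
    "root":   {"com": ("refer", "com")},
    "com":    {"fc.com": ("refer", "fc.com")},
    "fc.com": {"ajs.fc.com": ("answer", "192.0.2.10")},
}

def lookup(name):
    """Follow referrals from the top-level server down to an answer."""
    server = "root"
    asked = []
    while True:
        asked.append(server)
        for suffix, (kind, value) in NAMESERVERS[server].items():
            if name == suffix or name.endswith("." + suffix):
                break                 # this server knows something useful
        else:
            raise LookupError(f"{server} does not know {name}")
        if kind == "answer":
            return value, asked       # the IP address, and who we asked
        server = value                # follow the referral one step down

address, asked = lookup("ajs.fc.com")
print(address, asked)
```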

Mark-up Languages:
- A computer file is a series of bytes, usually in a code like
ASCII that represents text (letters and digits), like English
text.
- When the text is simply lines of words, and each line ends
with an ASCII "newline" character, we call that "flat ASCII".
- A long time ago people realized that computers could be used
to do document and book preparation. They are great at
storing and manipulating text, and with printers they can
print that text on paper.
- "Flat ASCII" text is pretty flat-looking. Documents and books
have lots of fancy features like layout, pagination, different
fonts, chapter headings, drawings, etc. So people developed
what are called "mark-up languages." These are ways of
writing text intermingled with various formatting control
commands that affect how the text is displayed or printed.
For example, I might do something \fIlike this\fR to make some
words appear in italics and the words after them return to
normal (Roman) font.
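
To make the idea concrete, here is a sketch of a very small
"formatter" that understands only the two control commands in the
example above. Since plain text has no italics, it marks the
italic words with asterisks as a stand-in; that choice, and the
function name, are assumptions for illustration.

```python
import re

# Matches \fI ... \fR (italic on ... back to Roman) in the marked-up text.
ITALIC_SPAN = re.compile(r"\\fI(.*?)\\fR")

def pretty_print(marked_up):
    """Replace each italic span with *...* as a stand-in for italics."""
    return ITALIC_SPAN.sub(lambda m: "*" + m.group(1) + "*", marked_up)

source = r"I might do something \fIlike this\fR to make some words italic."
formatted = pretty_print(source)
print(formatted)
```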
Viewing and editing:
- Once you have a document that is marked up in some way, you
can view it two different ways. You can edit it as flat ASCII
and hand-modify the formatting control commands, or you can
view or print the document "pretty printed" so you can see
what the control commands do.
- Nowadays most mark-up languages are supported by WYSIWYG
("what you see is what you get" or "whizzywig") editors that
let you edit the document in a form similar to how it will
look when it's displayed or printed by the reader. You never
need to see the control codes, but you have to know a language
for talking with the editor about fancy features!
- Often a WYSIWYG editor will let you "view the source" of the
document so you can see the complicated control language.
- Computers don't have to stop with pretty-printed words. It's
possible for them to display pictures and symbols mixed with
the words. Some of the symbols or words can even be made
active, so when you "click on them" with a mouse, something
happens.
Hyperlinks:
- One useful action is called a "hyperlink". This is a way of
changing the display to show you a different document, or a
different place in the current document. Pictures and words
can both be made into hyperlinks. Text that is marked up to
include hyperlinks is called "hypertext". Here's an example
that shows you the control codes:
<A HREF="#URL">Click here</A> to page-align the following
table of contents. <P> <a name="URL">First entry</a>...
This means, turn "Click here" into a hyperlink with a hidden
reference of "#URL", and if I click on it, move me down to the
next paragraph (after the "<P>"), at "First entry...", which
has a hidden name of "URL".
- The "language" called HTML (hypertext mark-up language) is the
common basis of the World Wide Web. Millions of HTML
documents are stored on millions of networked computers.
The example above is a simple little bit of HTML.
- Currently, most people who write HTML documents edit flat
ASCII forms of the documents, and then view them with a Web
browser to see how they look formatted. This is kind of like
writing software and then compiling it and running it to see
how it works.
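
A browser has to find the control codes before it can act on
them. This sketch uses Python's standard html.parser module to
pick out the two anchors from the hyperlink example above; the
class name is made up, and a real browser does far more than
this.

```python
from html.parser import HTMLParser

class AnchorCollector(HTMLParser):
    """Remember every hyperlink reference and every named place."""

    def __init__(self):
        super().__init__()
        self.links = []    # values of HREF attributes
        self.names = []    # values of NAME attributes

    def handle_starttag(self, tag, attrs):
        # The parser hands us tag and attribute names in lower case.
        if tag != "a":
            return
        for key, value in attrs:
            if key == "href":
                self.links.append(value)
            elif key == "name":
                self.names.append(value)

sample = ('<A HREF="#URL">Click here</A> to page-align the following '
          'table of contents. <a name="URL">First entry</a>...')

collector = AnchorCollector()
collector.feed(sample)
collector.close()

# A link starting with "#" points at a named place in the same document.
resolved = [link[1:] in collector.names for link in collector.links]
print(collector.links, collector.names, resolved)
```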

What's a URL?
- A Uniform Resource Locator is a fancy name for a line of
text that uniquely specifies a resource world-wide. A
resource is usually, but not necessarily, a computer file.
- A URL starts with the name of a server or service; for
example:
ftp:
http:
The first service is FTP, the file transfer protocol you read
about earlier. The second service is HTTP, "hypertext
transfer protocol". It's a simple way to ask HTTP servers for
HTML documents!
(By convention, the FTP server always listens on port 21 on a
server system, while the HTTP server lives on port 80.)
- Normally the resource (text or image file) you want to access
(view) is not on your own system, so the next piece of the URL
is the domain name of the system where it lives; for example:
https://ajs.fc.hp.com
- The exact form of the URL depends on the type of service! For
HTTP, the next piece is a path to the file, usually relative
to the "home directory" for the HTTP server; for example:
https://ajs.fc.hp.com/images/Megan.gif
This means to return an image file (GIF, graphics interchange
format) that lives in the HTTP home location under
images/Megan.gif.
- Finally, HTTP understands that after this part of the URL
there can be a wide range of other symbols that aren't a file
path name, but which carry information about the service
request; such as the ticker symbol for a stock whose price to
look up and return.
- One of the cool things about URLs is that you don't have to
keep local copies of documents. They only live at the source
(on the source system), and any time you need to see them, you
retrieve (download) a copy of the latest version -- assuming
the server system is awake and you can reach it! In the past,
people shared around many copies of on-line documents, and
they developed elaborate schemes to try to ensure the latest
copies were always distributed to all users.
- But you should know that most web browsers "cache" local
copies of files for a while. If you visit a website (URL) and
return to it, often the return is pretty fast because a
temporary local copy of the file is used. The only reason you
need to know this is so it doesn't surprise you, and so you
know what the "reload" button does for you.
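
The pieces of a URL described in this section can be pulled apart
with Python's standard urllib.parse module. The first URL is the
example from above; the second, with its "symbol=HWP" part, is a
hypothetical stock lookup invented to show the "other symbols"
that can follow the path.

```python
from urllib.parse import urlsplit, parse_qs

parts = urlsplit("https://ajs.fc.hp.com/images/Megan.gif")
service = parts.scheme       # which protocol to speak
system = parts.hostname      # domain name of the server system
path = parts.path            # where the file lives on that server

# Hypothetical URL carrying extra information after the path.
lookup = urlsplit("https://www.whatever.com/quote?symbol=HWP")
extra = parse_qs(lookup.query)

print(service, system, path, extra)
```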

Bitmapped Graphics:
- Once upon a time, to talk with a computer you had to press or
flip switches and watch lights. ("Once upon a time" tells you
this is a fairy tale and real life was much more complicated,
but anyway...)
- Later people figured out how to build mechanical "teletypes"
that were like typewriters, except the computer listened to
your key presses and typed a reply to you on paper.
- Later still, people figured out how to have computers display
text and simple symbols on CRTs (cathode ray tubes, like TV
sets but not exactly the same).
- And even later, people figured out that computers didn't have
to just display lines of text on these display screens. They
could address individual bits on the screen to draw pictures,
do all sorts of different sizes and fonts of text, etc. They
could even put up "windows" that each acted like a separate
physical display.
- Along with bitmapped displays came new kinds of input devices
besides keyboards, such as mice and trackballs.
With these "pointing devices" it was possible to "point and
click" on a screen, choosing actions from "menus" instead of
having to type "commands" to get things done.
- Guess what! A hyperlink, in a hypertext document, displayed
"pretty printed" on a screen, is a kind of "menu item"! The
person who creates the document can easily "program" the
"menu" without having to write any software (unless you count
the mark-up language itself).

Putting it All Together:
- In the early 1990s all this technology -- client/server,
internet, domain names, mark-up languages, and graphical
displays -- began to come together. People were starting to
store lots of documents, marked up in various ways for
pretty-printing, on networked computers, and running
application programs that used windowed, graphical (bitmapped)
displays and point-and-click interfaces.
- Some people at CERN in Switzerland put together a markup
language (HTML) with computer networking (client/server) and a
new kind of server and protocol (HTTP), and with some new
window-oriented, graphical, point-and-click software on the
client side (a browser), to create the beginnings of the World
Wide Web. Things really got going around 1993.
- The Web grew like wildfire! Within just a few years, millions
of documents had been created or converted to HTML and made
available through HTTP servers. A variety of different web
browsers (client-side programs) like Mosaic and Netscape were
created and improved.
- So what exactly is the World Wide Web? It doesn't really
exist! It's "just" a collection of networked computers,
internet connections, services and servers with lots of
marked-up documents and pictures to share, and client-side
browsers with which to view them. But when you put them all
together, it's magic! The Web seems to have tangible
substance.
- What's a "homepage"? It's just an HTML document that is
brought up by default when you connect to a particular server
for HTTP. Often there are lots of homepages available through
one server, say, one for each user on the computer. You
specify which one (whose homepage) you want, as part of the
URL; for example: https://udltools.fc.hp.com/~ajs is a way of
retrieving the homepage for user "ajs" from the HTTP server on
"udltools". Homepages are usually starting places for
following hyperlinks to details about a person or
organization.
- Suppose someone emails you a URL. "Check out this cool
website/homepage." What can you do?
- You can cut and paste the URL (on your local graphical,
window-smart display, using a mouse) into a data entry form
provided by the web browser, say, Netscape, on your display
screen.
- Netscape figures out from the URL that the protocol is HTTP;
then as a client program, it asks the local computer to
connect it to the domain name (computer) specified in the URL.
- Along the way, if necessary, the client system looks up the IP
address of the server computer from nameservers (and you get
an error if that computer can't be reached).
- After reaching the server system, the client system asks for
the HTTP server by port number.
- Then it tells the HTTP server program the rest of the
gibberish in the URL.
- Some time later the server responds with an HTML document (or
with an error). This document is shipped back over the net to
your web browser, and the connection is broken. (HTTP is
"stateless", that is, it doesn't maintain a long-term
relationship between the client and server. Each time the
client needs something from the server, it makes a new,
independent request. However, the new request often
includes saved information based on the previous request, such
as a form filled out by the user.)
- The client-side web browser (Netscape) "pretty prints" this
HTML document and displays it on the screen for you.
- All of this takes place in just a few seconds. How long
depends on how busy the network(s) between your client
system and the server system are, how busy the two systems
themselves are, and how much data is to be transferred. (A
picture might be worth a thousand words, but it's often worth
a hundred thousand bytes.)
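
The first few steps a browser takes with a URL can be sketched
without touching the network. The function below only works out
which system to call, which port to ask for, and what to say to
the HTTP server; it does not connect to anything. The function
name is made up, the request text is a bare-bones HTTP/1.0 style
request, and port 443 for "https" is the standard value but does
not appear in the text above.

```python
from urllib.parse import urlsplit

DEFAULT_PORTS = {"http": 80, "https": 443}

def plan_request(url):
    """Work out the server system, the port, and the request text."""
    parts = urlsplit(url)
    port = parts.port or DEFAULT_PORTS[parts.scheme]
    path = parts.path or "/"              # no path means the default page
    if parts.query:
        path += "?" + parts.query         # the "rest of the gibberish"
    request = f"GET {path} HTTP/1.0\r\nHost: {parts.hostname}\r\n\r\n"
    return parts.hostname, port, request

host, port, request = plan_request("https://udltools.fc.hp.com/~ajs")
print(host, port)
print(request)
```

Since this kind of request is plain ASCII, it is the same sort of
text a person could type by hand after connecting to the server's
port.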

Going Even Further:
- Much of the time a website's URL is of the common form:
https://www.whatever.com
and when people talk about the website, they leave off the
"https://" part, or even all but the "whatever" part. Bear in
mind that this is rather like giving a phone number without
the area code.
- Many computer networks are behind "firewalls". This means the
computers in the organization can talk to each other and can
connect to computers outside the firewall, but outside
computers can't make inbound connections. This is why, for
example, you can't reach some of the URLs I quoted above, from
outside of HP.
- Remember that automatic programs, not just people running
programs, can be network clients. What happens when you write
a "robot" that follows all the links it can, throughout the
Web, and remembers in a database the URLs and titles and
documents it's seen? You get a "web search engine", like the
ones at:
https://www.altavista.digital.com
https://searcher.fc.hp.com/arachnophilia (HP-internal)
You can tell these servers some words, and they locate all the
web pages they know about that contain those words. Then they
create (very fast, while you wait) a new, customized web page
(HTML document) that includes hyperlinks to the other web
pages that contain the words you wanted to find. Click! Off
you go!
- Most web browsers also have a way for you to record favorite
URLs and their document titles, as "bookmarks" or "hotlists".
- Remember that there is a protocol for every networked
service... And there are lots of different kinds of computer
services in common use. Guess what -- most web browsers know
lots of protocols! They can not only talk HTTP/HTML, they can
also talk FTP (bring back and display files for you), send and
receive email, and read and post netnews. That is, they can
be clients for a lot of different servers, presenting them all
to you, the user, through a common style of graphical display.
- The three most common types of computer services, which people
often confuse with each other, are these:
- Electronic mail (email), exchanged using SMTP (simple mail
transfer protocol) or other protocols. This is good for
point-to-point communications, and for "broadcasting"
using "mailing lists" or "mail reflectors" to lists of
people. To send someone email, you need their email
address, which is usually of this form:
username@domain
Conversations by email are slower and less interruptive
than by telephone, but can be more precise, more easily
shared widely, more easily saved and reused, etc. Email
combines features of both paper mail and telephones.
- Netnews (formerly called Usenet), exchanged using NNTP
(network news transfer protocol) or other protocols. This
is like a public bulletin board where anyone passing by
can read what's on the board, tack up their own sheet of
paper, and even send email to people who posted other
notices. To achieve some sanity, old notices are
automatically "taken down" (removed by the computer);
discussions are grouped into "newsgroups" and then into
"threads" (common titles or subjects) within each
newsgroup.
Newsgroups are great for widely sharing information,
especially if it is periodic in nature, like a newsletter,
or is well-suited for group discussion and debate.
However, people often forget that all they see locally is
a COPY of the bulletin board, with whatever "sheets of
paper" have been copied and posted locally (to their
system, or to a local news server system). There are
delays, postings can get out of order, etc.
- Web pages -- usually HTML documents. This medium is
rather like a highly interactive encyclopedia where anyone
with access to a networked computer can add their own
pages (and they are NOT alphabetized). It's great for
finding information when you need it, but it's not so
great for discussions or for knowing when new material
arrives. People forget this and often share information
by web pages that would be better shared by netnews.
- What is "link rot"? That's when a hyperlink (in one HTML
document) points to a computer or document that is moved,
deleted, or no longer accessible. The link looks fine, but
when you click on it you get an error. One of the sad truths
about an anarchy like the World Wide Web is that link rot is
common and unavoidable. This is one reason that in many cases
it's better to search for a resource (using a web search
engine) when you need it, than to record the URL.
- What's "web surfing"? This means following hyperlinks, either
from a search, or through a series of linked pages, or even at
random, in the course of learning something, or just having
fun. Kind of like flipping TV channels -- but less random!
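
The "web search engine" described earlier in this section boils
down to a database of which words appear on which pages. Here is
a toy sketch of that idea. The three pages and their URLs are
invented, the "robot" step of following links is left out, and
real search engines are far more elaborate.

```python
from collections import defaultdict

# Hypothetical pages a robot might have collected: URL -> text.
PAGES = {
    "https://www.whatever.com/air": "cheap airline reservations",
    "https://www.whatever.com/rail": "cheap train reservations",
    "https://www.whatever.com/pets": "carrier pigeon care",
}

def build_index(pages):
    """Remember, for every word, the set of pages that contain it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in text.lower().split():
            index[word].add(url)
    return index

def search(index, query):
    """Return the pages that contain every word in the query."""
    words = query.lower().split()
    if not words:
        return []
    hits = set.intersection(*(index.get(w, set()) for w in words))
    return sorted(hits)

def results_page(query, urls):
    """Create a new HTML document with a hyperlink to each hit."""
    items = "".join(f'<LI><A HREF="{u}">{u}</A>\n' for u in urls)
    return f"<H1>Results for {query}</H1>\n<UL>\n{items}</UL>\n"

index = build_index(PAGES)
hits = search(index, "cheap reservations")
print(results_page("cheap reservations", hits))
```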
WHEW! Happy Web surfing!
Learn the Net (www.learnthenet.com) is
Copyright 1996-2009. Michael Lerner Productions.
All Rights Reserved.