Tools of Change for Publishing: Laura Dawson: February 2009

Laura Dawson: February 2009

Taxonomies and Starting With XML

Laura Dawson
February 25, 2009

This is an excerpt from a blog post I wrote last week on taxonomies and chunking.

Last October, the StartWithXML team wrote a post called "To Chunk or Not To Chunk," which covered tagging and infrastructure issues. A discussion ensued about what happens when you don't know what you'll be using chunks for. How do you tag those?

Later, in our StartWithXML One-Day Forum, we included a presentation on tagging and chunking best practices, where it was pointed out that no taxonomy for chunk-level content currently exists.

We have taxonomies for book-level content. These include formalized code sets such as the Library of Congress subject codes, the BISAC codes, and the Dewey Decimal System, among others. There are also informal code sets, like the tag sets on Shelfari or LibraryThing. There are proprietary taxonomies at Amazon and B&N.com that enable effective browsing.

But nothing like this exists for sub-book-level content. It's never been traded before. We've never really needed a taxonomy for it before.

Other industries that traditionally distribute "chunks" have their own taxonomies that might prove useful in building a book-chunk schema. These include the IPTC news codes, which identify the content of a particular news story -- that's the closest analogy I can find for small gobbets of content that require organization.

Industries have proprietary taxonomies to identify certain concepts -- culinary arts, music, agriculture, engineering, the sciences, literature and criticism, education, and on and on and on. But these do not necessarily identify concepts within a book.

Some might argue that we don't necessarily need taxonomies -- why can't we use natural-language search and the semantic Web to "bubble up" the "right" concepts? I'd argue that words don't always mean what we think they mean. A classic example from my library days is the term "mercury." That could mean the planet, the car or the element. Proponents of semantic search would say that the context in which "mercury" is mentioned should take care of defining that term. I'd say that's true in about 50 percent of all cases, but it is not reliable enough to hold in 75 to 100 percent of them.
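To make the "mercury" problem concrete, here is a minimal sketch in Python of the difference between a plain keyword match and a lookup that also uses taxonomy codes. Everything in it is an assumption made for illustration: the concept codes, the `Chunk` structure, the sample sentences and the two search functions are hypothetical and are not drawn from BISAC, the IPTC news codes or any other real scheme.

```python
from dataclasses import dataclass

# Hypothetical concept codes, invented for this illustration only.
TAXONOMY = {
    "ASTRO.PLANET": "Astronomy / Planets",
    "AUTO.MAKE": "Automobiles / Makes",
    "CHEM.ELEMENT": "Chemistry / Elements",
}


@dataclass(frozen=True)
class Chunk:
    chunk_id: str
    text: str
    codes: frozenset  # taxonomy codes a cataloger assigned to this chunk


# Three sample chunks that all mention the same word in different senses.
CHUNKS = [
    Chunk("c1", "Mercury completes an orbit of the Sun in 88 days.",
          frozenset({"ASTRO.PLANET"})),
    Chunk("c2", "The Mercury was sold alongside Ford models.",
          frozenset({"AUTO.MAKE"})),
    Chunk("c3", "Mercury is liquid at room temperature.",
          frozenset({"CHEM.ELEMENT"})),
]


def keyword_search(chunks, term):
    """Return ids of chunks whose text contains the term, ignoring case."""
    needle = term.lower()
    return [c.chunk_id for c in chunks if needle in c.text.lower()]


def taxonomy_search(chunks, term, code):
    """Return ids of chunks that contain the term AND carry the given code."""
    if code not in TAXONOMY:
        raise KeyError(f"unknown taxonomy code: {code}")
    needle = term.lower()
    return [
        c.chunk_id
        for c in chunks
        if code in c.codes and needle in c.text.lower()
    ]


if __name__ == "__main__":
    print(keyword_search(CHUNKS, "mercury"))
    print(taxonomy_search(CHUNKS, "mercury", "CHEM.ELEMENT"))
```

The keyword match returns all three chunks, because the word alone cannot tell the planet from the car from the element. The coded lookup returns only the chemistry chunk. The disambiguation comes from a person having assigned the code in advance, which is the work a chunk-level taxonomy would have to support.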

My original post gets into more detail about why taxonomies are important search tools, and how the digitization of books requires a good taxonomy ... and who should do it.

Tools of Change for Publishing is a division of O'Reilly Media, Inc.
