Re: "canonical" URIs from Joseph Reagle on 2002-02-19 (www-tag@w3.org from February 2002)

Re: "canonical" URIs

From: Joseph Reagle <reagle@w3.org>
Date: Tue, 19 Feb 2002 14:39:59 -0500
To: www-tag@w3.org
Cc: PhillipHallam-Baker <pbaker@verisign.com>, xme <stephen.farrell@baltimore.ie>, Merlin Hughes <merlin@baltimore.ie>, duerst@w3.org
Message-Id: <200202191939.OAA10552@tux.w3.org>

Stephen has asked an interesting question below that I expect will be
important to any activity that uses URIs as identifiers in the context of
a semantic/security application: when are two URI variants considered
identical?
My first impulse was to check the XML Namespaces spec, which says:
"[Definition:] URI references which identify namespaces are considered
identical when they are exactly the same character-for-character." [a]
[a] https://www.w3.org/TR/REC-xml-names/
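As a minimal sketch of what that definition implies (the function name and
the namespace names below are made up for illustration), a literal
comparison treats names that differ only in case or in escaping as
different namespaces:

```python
def same_namespace(a: str, b: str) -> bool:
    """Namespace names match only if identical character for character."""
    return a == b


# Hypothetical namespace names, chosen only for illustration.
base = "http://www.example.org/wine"

print(same_namespace(base, "http://www.example.org/wine"))    # True
print(same_namespace(base, "http://www.Example.org/wine"))    # False: host case differs
print(same_namespace(base, "http://www.example.org/%77ine"))  # False: "w" is escaped
```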
However, this could benefit from further specificity. What about issues
like the following, from the XML-Signature spec?
  The URI attribute identifies a data object using a URI-Reference,
  as specified by RFC2396 [URI]. The set of allowed characters for 
  URI attributes is the same as for XML, namely [Unicode]. However,
  some Unicode characters are disallowed from URI references
  including all non-ASCII characters and the excluded characters
  listed in RFC2396 [URI, section 2.4]. However, the number sign (#),
  percent sign (%), and square bracket characters re-allowed in RFC 2732
  [URI-Literal] are permitted. Disallowed characters must be escaped as
  follows: ...
  https://www.w3.org/TR/2002/REC-xmldsig-core-20020212/#sec-URI
I spoke to TimBL briefly about the question. He enumerated many of the
places one might look for equivalence in the "URI stack", *while* stating
that one clearly wouldn't want to address all these layers, given the
complexity and processing required:
  URI spec
	string = string
  HTTP DNS
	W3.org = w3.org
  DNS LOOKUP
	www.w3.org   <-- CNAME --  w3.org
  HTTP REDIRECT
	/foo --REDIRECT--> /foo/
  RDF
	/foo = /bar
Consequently, character by character comparison is probably the most 
straightforward approach -- assuming one addresses the character encoding 
issues well. 
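To make the trade-off concrete, here is a rough Python sketch, not a
proposed algorithm. It contrasts plain character comparison with the
host-case layer from the list above and with one possible character
encoding issue. Treating Unicode normalization (NFC) as such an issue is
an assumption of this sketch, and all function names are hypothetical.

```python
import unicodedata
from urllib.parse import urlsplit


def same_string(a: str, b: str) -> bool:
    """URI-spec layer: string = string."""
    return a == b


def same_string_nfc(a: str, b: str) -> bool:
    """Compare after Unicode NFC normalization (an assumed encoding policy)."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


def same_host(a: str, b: str) -> bool:
    """Host layer: W3.org = w3.org (urlsplit lowercases .hostname)."""
    return urlsplit(a).hostname == urlsplit(b).hostname


print(same_string("http://W3.org/foo", "http://w3.org/foo"))  # False
print(same_host("http://W3.org/foo", "http://w3.org/foo"))    # True

composed = "http://example.org/caf\u00e9"     # precomposed e-acute
decomposed = "http://example.org/cafe\u0301"  # "e" plus combining acute accent
print(same_string(composed, decomposed))      # False
print(same_string_nfc(composed, decomposed))  # True
```

The CNAME, redirect, and RDF layers are left out because they need network
access or outside knowledge, which is the processing cost mentioned above.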
Stephen is presently using "absolute URIs" with RFC2396 equivalence (see
below). This seems fairly straightforward as well, though it says, "if
the URI is case insensitive ...", and I think it might be useful to specify
whether case *is* relevant or not for that app. Any thoughts?
My broader question to the TAG is: does this seem like a worthwhile
issue to address for all of our specifications? I expect the
validation/augmentation of URIs of type anyURI in XML Schema might also be
relevant to this question, but I haven't thought about it too carefully.
[1] On Thursday 14 February 2002 06:01, Stephen Farrell wrote:
> ...
> The OASIS security committee's [1] SAML spec [2] is about access
> control. One of its messages is of the form "can fred see
> http://foo.com/stuff" with a minimal answer being "yes/no".
>
> Now, we're trying to figure a good way to tell implementors not
> to fall for the following scenario:
>
> Q: "can fred see http://foo.com/stuff" A: no
> Q: "can fred see HTTP://Foo.COM:80/stuff" A: no
> Q: "can fred see http://foo.com/otherstuff/../stuff" A: yes
>
> Which involves us in giving some guidance for a "canonical
> form" of URI, at least for the de-referencable via HTTP
> URLs.
>
> My best bet so far is the following:
>
>    By the "canonical form" of a URI we mean an absolute URI (i.e. no
>    relative URIs) which is the shortest of all the equivalent URI
>    strings, where URI equivalence is defined according to [RFC2396].
>    For example, the URI "http://foo.com:80/go/../go/to/" is not in
>    canonical form, but "http://foo.com/go/to" is in canonical form.
>    Note that if a URI is partly or entirely case-insensitive, then
>    there will be more than one "canonical form" for that URI such
>    that a case sensitive matching rule would consider that the
>    strings differ (e.g. "HTTP://Foo.cOm/go/to" is "another" canonical
>    form of the URL above).
>
>
> Ta,
> Stephen.
>
> [1] https://www.oasis-open.org/committees/security/
> [2]
> https://www.oasis-open.org/committees/security/docs/draft-sstc-core-25.pdf
> [RFC2396] ftp://ftp.isi.edu/in-notes/rfc2396.txt
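
Stephen's scenario can be sketched in Python as follows. This is a rough
illustration, not the SAML committee's rule. The function names are
hypothetical, the three URIs are assumed to be http URIs on the default
port 80, and the dot-segment handling is a simplified take on path
resolution. Userinfo and IPv6 literals are not handled.

```python
from urllib.parse import urlsplit, urlunsplit

# Default ports assumed for this sketch.
DEFAULT_PORTS = {"http": 80, "https": 443}


def remove_dot_segments(path: str) -> str:
    """Resolve "." and ".." segments in an absolute path."""
    segments = path.split("/")
    out = []
    for i, seg in enumerate(segments):
        last = i == len(segments) - 1
        if seg == ".":
            if last:
                out.append("")  # keep the trailing slash
        elif seg == "..":
            if len(out) > 1:    # never climb above the root
                out.pop()
            if last:
                out.append("")
        else:
            out.append(seg)
    return "/".join(out) or "/"


def canonical(uri: str) -> str:
    """Lowercase scheme and host, drop a default port, resolve dot segments."""
    parts = urlsplit(uri)
    scheme = parts.scheme.lower()
    host = (parts.hostname or "").lower()  # userinfo is dropped here
    port = parts.port
    if port is None or port == DEFAULT_PORTS.get(scheme):
        netloc = host
    else:
        netloc = f"{host}:{port}"
    path = remove_dot_segments(parts.path)
    return urlunsplit((scheme, netloc, path, parts.query, parts.fragment))


for uri in ("http://foo.com/stuff",
            "HTTP://Foo.COM:80/stuff",
            "http://foo.com/otherstuff/../stuff"):
    print(canonical(uri))  # each prints http://foo.com/stuff
```

An access check that compares canonical(requested) with canonical(protected)
gives the same answer to all three questions. The sketch keeps a trailing
slash, so it still treats /go/to and /go/to/ as different strings.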
-- 
Joseph Reagle Jr.                 https://www.w3.org/People/Reagle/
W3C Policy Analyst                mailto:reagle@w3.org
IETF/W3C XML-Signature Co-Chair   https://www.w3.org/Signature/
W3C XML Encryption Chair          https://www.w3.org/Encryption/2001/

Received on Tuesday, 19 February 2002 14:40:06 UTC
