Re: "canonical" URIs from Joseph Reagle on 2002-02-19 (www-tag@w3.org from February 2002)
Re: "canonical" URIs
- From: Joseph Reagle <reagle@w3.org>
- Date: Tue, 19 Feb 2002 14:39:59 -0500
- To: www-tag@w3.org
- Cc: PhillipHallam-Baker <pbaker@verisign.com>, xme <stephen.farrell@baltimore.ie>, Merlin Hughes <merlin@baltimore.ie>, duerst@w3.org
- Message-Id: <200202191939.OAA10552@tux.w3.org>
Stephen has asked an interesting question below that I expect will be important to any activity that uses URIs as identifiers in the context of a semantic/security application: when are two URI variants considered identical?

My first impulse was to check the XML namespace spec,

    "[Definition:] URI references which identify namespaces are
    considered identical when they are exactly the same
    character-for-character." [a]

    [a] https://www.w3.org/TR/REC-xml-names/

However, this could benefit from further specificity. What about the following sort of issues?

    The URI attribute identifies a data object using a URI-Reference,
    as specified by RFC2396 [URI]. The set of allowed characters for
    URI attributes is the same as for XML, namely [Unicode]. However,
    some Unicode characters are disallowed from URI references
    including all non-ASCII characters and the excluded characters
    listed in RFC2396 [URI, section 2.4]. However, the number sign (#),
    percent sign (%), and square bracket characters re-allowed in
    RFC 2732 [URI-Literal] are permitted. Disallowed characters must
    be escaped as follows: ...

    https://www.w3.org/TR/2002/REC-xmldsig-core-20020212/#sec-URI

I spoke to TimBL briefly about the question, he enumerated many of the places one might look for equivalence in the "URI stack" *while* stating that clearly one wouldn't want to address all these layers for the complexity and processing required:

    URI spec         string = string
    HTTP DNS         W3.org = w3.org
    DNS LOOKUP       www.w3.org <-- CNAME -- w3.org
    HTTP REDIRECT    /foo --REDIRECT--> /foo/
    RDF              /foo = /bar

Consequently, character by character comparison is probably the most straightforward approach -- assuming one addresses the character encoding issues well. Stephen is presently using "absolute URIs" with RFC2396 equivalence (see below). This seems fairly straightforward as well -- though it says, "if the URI is case insensitive ..." I think it might be useful to specify whether case *is* relevant or not for that app. Any thoughts?
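As an illustration of the two ends of that choice, here is a minimal Python sketch (not part of the original message). It contrasts the character-for-character rule from the XML namespace spec with a comparison that folds case only where URI syntax treats it as insignificant, namely the scheme and the host. The function names `identical_per_xml_names` and `fold_scheme_and_host` and the example URIs are hypothetical, chosen only for this sketch; whether path case matters is left untouched because that depends on the server, which is exactly the point that seems worth specifying per application.

```python
from urllib.parse import urlsplit, urlunsplit


def identical_per_xml_names(a: str, b: str) -> bool:
    """Character-for-character comparison: no normalization at all."""
    return a == b


def fold_scheme_and_host(uri: str) -> str:
    """Lowercase the scheme and host only (hypothetical helper).

    Userinfo, path, query, and fragment keep their original case,
    since their case sensitivity is not defined by generic URI syntax.
    """
    parts = urlsplit(uri)
    # Keep any "user@" prefix as-is; fold only the host[:port] part.
    userinfo, at, hostport = parts.netloc.rpartition("@")
    netloc = userinfo + at + hostport.lower()
    return urlunsplit(
        (parts.scheme.lower(), netloc, parts.path, parts.query, parts.fragment)
    )


if __name__ == "__main__":
    a, b = "http://W3.org/foo", "http://w3.org/foo"
    print(identical_per_xml_names(a, b))                        # strict rule
    print(fold_scheme_and_host(a) == fold_scheme_and_host(b))   # host folded
    # Path case is deliberately preserved:
    print(fold_scheme_and_host("http://w3.org/Foo"))
```

Neither function looks at DNS aliases, redirects, or RDF-level assertions, so the higher layers of the list above remain out of scope for this sketch.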
Also, my broader question to the TAG is, does this seem like a worthwhile issue to address for all of our specifications? I also expect the validation/augmentation of URIs of type anyURI in schema might also be relevant to this question but haven't thought about it too carefully. [1]

On Thursday 14 February 2002 06:01, Stephen Farrell wrote:
> ...
> The OASIS security committes's [1] SAML spec [2] is about access
> control. One of its messages is of the form "can fred see
> https://foo.com/stuff" with a minimal answer being "yes/no".
>
> Now, we're trying to figure a good way to tell implementors not
> to fall for the following scenario:
>
> Q: "can fred see https://foo.com/stuff" A: no
> Q: "can fred see HTTP://Foo.COM:80/stuff" A: no
> Q: "can fred see https://foo.com/otherstuff/../stuff" A: yes
>
> Which involves us in giving some guidance for a "canonical
> form" or URI, at least for the de-referencable via HTTP
> URLs.
>
> My best bet so far is the following:
>
> By the "canonical form" of a URI we mean an absolute URI (i.e. no
> relative URIs) which is the shortest of all the equivalent URI
> strings, where URI equivalence is defined according to [RFC2396].
> For example, the URI "https://foo.com:80/go/../go/to/" is not in
> canonical form, but "https://foo.com/go/to" is in canonical form.
> Note that if a URI is partly or entirely case-insensitive, then
> there will be more than one "canonical form" for that URI such
> that a case sensitive matching rule would consider that the
> strings differ (e.g. "HTTP://Foo.cOm/go/to" is "another" canonical
> form of the URL above).
>
>
> Ta,
> Stephen.
>
> [1] https://www.oasis-open.org/committees/security/
> [2]
> https://www.oasis-open.org/committees/security/docs/draft-sstc-core-25.pdf
> [RFC2396] ftp://ftp.isi.edu/in-notes/rfc2396.txt

--
Joseph Reagle Jr.                 https://www.w3.org/People/Reagle/
W3C Policy Analyst                mailto:reagle@w3.org
IETF/W3C XML-Signature Co-Chair   https://www.w3.org/Signature/
W3C XML Encryption Chair          https://www.w3.org/Encryption/2001/
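The access-control scenario in the quoted message can be sketched in Python as follows (an illustration added to this archive copy, not code from SAML or from either author). It assumes one possible reading of "canonical form": lowercase scheme and host, drop the scheme's default port, and remove "." and ".." path segments. The names `canonicalize`, `remove_dot_segments`, `naive_can_see`, and `can_see` are hypothetical, and the example URIs use the `http` scheme throughout so that port 80 is in fact the default port. Userinfo is dropped for brevity, and only absolute URIs are handled.

```python
from urllib.parse import urlsplit, urlunsplit

# Assumption: only these two schemes have a known default port here.
DEFAULT_PORTS = {"http": 80, "https": 443}


def remove_dot_segments(path: str) -> str:
    """Resolve "." and ".." segments in an absolute path."""
    segments = path.split("/")
    out = []
    for seg in segments:
        if seg == ".":
            continue
        if seg == "..":
            if len(out) > 1:  # never pop the leading empty segment
                out.pop()
            continue
        out.append(seg)
    if segments[-1] in (".", ".."):
        out.append("")  # a trailing dot segment leaves a trailing slash
    return "/".join(out)


def canonicalize(uri: str) -> str:
    """One possible canonical form (sketch; userinfo is discarded)."""
    parts = urlsplit(uri)
    scheme = parts.scheme.lower()
    host = parts.hostname or ""  # urlsplit returns the host lowercased
    port = parts.port
    netloc = host if port in (None, DEFAULT_PORTS.get(scheme)) else f"{host}:{port}"
    path = remove_dot_segments(parts.path) or "/"
    return urlunsplit((scheme, netloc, path, parts.query, parts.fragment))


DENIED_RAW = {"http://foo.com/stuff"}
DENIED_CANONICAL = {canonicalize(u) for u in DENIED_RAW}


def naive_can_see(uri: str) -> bool:
    """String lookup without canonicalization: the mistake to avoid."""
    return uri not in DENIED_RAW


def can_see(uri: str) -> bool:
    """Canonicalize the query before consulting the deny list."""
    return canonicalize(uri) not in DENIED_CANONICAL


if __name__ == "__main__":
    for q in ("http://foo.com/stuff",
              "HTTP://Foo.COM:80/stuff",
              "http://foo.com/otherstuff/../stuff"):
        print(q, naive_can_see(q), can_see(q))
```

Note that this sketch keeps a trailing slash, so "/go/to/" and "/go/to" stay distinct; treating them as the same resource would need knowledge of the server's redirects, which is one of the layers TimBL suggested leaving alone.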
Received on Tuesday, 19 February 2002 14:40:06 UTC