Welcome to the geekiest corner of the web! If you're here, you're probably familiar with Uniform Resource Identifier (URI) - and if you're not, don't worry! You'll be an expert by the end of this blog. But first, let's lighten up the mood with some URI humor. Why did the URI go to the doctor? To get a URIscope!
What is URI?
A compact sequence of characters that identifies an abstract or physical resource.
- A resource is not necessarily available on the Web.
- URIs can be assigned even to objects from the real world or to concepts.
Current standard:
Tim Berners-Lee, Roy Fielding, Larry Masinter, Uniform Resource Identifier (URI): Generic Syntax, RFC 3986, January 2005. rfc-editor.org/rfc/rfc3986
Each URI begins with a scheme name that is separated by a ':' character from the scheme-specific part of the URI.
- Scheme specifications can define their scheme-specific syntax within certain limits.
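To make the scheme split concrete, here is a minimal Python sketch. The helper name `split_scheme` and the sample addresses are made up for illustration; the scheme pattern (a letter followed by letters, digits, '+', '-', or '.') follows the generic syntax in RFC 3986.

```python
import re

# Scheme grammar from RFC 3986: a letter, then letters, digits, "+", "-", or ".".
_SCHEME = re.compile(r"[A-Za-z][A-Za-z0-9+.-]*")


def split_scheme(uri: str):
    """Split a URI at the first ':' into (scheme, scheme-specific part).

    Hypothetical helper for illustration only.
    """
    scheme, sep, rest = uri.partition(":")
    if not sep or not _SCHEME.fullmatch(scheme):
        raise ValueError(f"no valid scheme in {uri!r}")
    # Schemes are case-insensitive, so lowercase is used as the canonical form.
    return scheme.lower(), rest


print(split_scheme("mailto:someone@example.com"))
print(split_scheme("HTTPS://www.rfc-editor.org/rfc/rfc3986"))
```

Everything after the first ':' is left untouched here, because its syntax is up to the individual scheme.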
The organization responsible for the administration of the URI schemes:
Internet Assigned Numbers Authority (IANA) iana.org
- See: Uniform Resource Identifier (URI) Schemes iana.org/assignments/uri-schemes/uri-scheme..
Well-Known URI Schemes
file
Matthew Kerwin, The "file" URI Scheme, RFC 8089, February 2017. rfc-editor.org/rfc/rfc8089
http/https
Roy T. Fielding (ed.), Mark Nottingham (ed.), Julian F. Reschke (ed.), HTTP Semantics, RFC 9110, June 2022. rfc-editor.org/rfc/rfc9110
mailto
Martin Dürst, Larry Masinter, Jamie Zawinski, The 'mailto' URI Scheme, RFC 6068, October 2010. rfc-editor.org/rfc/rfc6068
about
S. Moonesamy (ed.), The “about” URI Scheme, RFC 6694, August 2012. rfc-editor.org/rfc/rfc6694
URI Characters
Characters allowed in URIs
The following are reserved characters:
- ':', '/', '?', '#', '[', ']', '@', '!', '$', '&', ''', '(', ')', '*', '+', ',', ';', '='
- Characters used as delimiters.
The following are unreserved characters:
- 'A', ..., 'Z', 'a', ..., 'z'
- '0', ..., '9'
- '-', '.', '_', '~'
The specification does not mandate any particular character encoding.
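The two character sets above are small enough to write down directly. The following Python sketch does so; the `classify` function and its "other" label are assumptions made for this example and are not terms from the specification.

```python
import string

# Reserved characters: general delimiters plus sub-delimiters.
GEN_DELIMS = ":/?#[]@"
SUB_DELIMS = "!$&'()*+,;="
RESERVED = set(GEN_DELIMS + SUB_DELIMS)

# Unreserved characters: letters, digits, and four marks.
UNRESERVED = set(string.ascii_letters + string.digits + "-._~")


def classify(ch: str) -> str:
    """Return 'unreserved', 'reserved', or 'other' for a single character."""
    if ch in UNRESERVED:
        return "unreserved"
    if ch in RESERVED:
        return "reserved"
    # Anything else (a space, for instance) has to be percent-encoded.
    return "other"


for ch in "a~/? ":
    print(repr(ch), classify(ch))
```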
Percent-encoding
Percent-encoding is used to represent a data octet in a component when that octet's corresponding character is outside the allowed set or is being used as a delimiter of, or within, the component.
- A percent-encoded octet is encoded as a character triplet %hh, consisting of the '%' character followed by the two hexadecimal digits representing that octet's numeric value.
- For example, %20 is the percent-encoding of the space character.
- Both the uppercase ('A', ..., 'F') and the lowercase ('a', ...,'f') hexadecimal digits can be used.
- If two URIs differ only in the case of hexadecimal digits used in percent-encoded octets, they are equivalent.
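Python's standard library can do the encoding and decoding for you. The sketch below uses `urllib.parse.quote` and `urllib.parse.unquote`; note that these functions default to UTF-8, which is a choice of the library, not something the URI specification mandates. The helper `normalize_hex_case` is a made-up name for illustrating the case rule for hexadecimal digits.

```python
import re
from urllib.parse import quote, unquote

# A space is outside the allowed set, so it becomes %20.
print(quote("hello world"))   # hello%20world

# By default quote() leaves '/' alone; safe="" encodes it as well.
print(quote("a/b", safe=""))  # a%2Fb

# Decoding accepts both uppercase and lowercase hexadecimal digits.
print(unquote("%7Euser"), unquote("%7euser"))


def normalize_hex_case(text: str) -> str:
    """Uppercase the hex digits of every percent-encoded triplet."""
    return re.sub(r"%[0-9A-Fa-f]{2}", lambda m: m.group(0).upper(), text)


print(normalize_hex_case("a%2fb") == normalize_hex_case("a%2Fb"))
```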
URI Syntax
Syntax is organized hierarchically.
- Components are listed in order of decreasing significance from left to right.
Generic syntax
scheme ':' hier-part ['?' query] ['#' fragment]
- The hier-part component consists of an optional authority and a path component. Its syntax is:
- '//' authority path, or just path
- When authority is present, the path must either be empty or begin with a '/' character.
- When authority is not present, the path cannot begin with two '/' characters.
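The generic syntax can be taken apart with a single regular expression. The pattern below is the one given in Appendix B of RFC 3986; the wrapper function `split_uri` and the sample URIs are assumptions for this sketch. It only splits a string into components and does not validate them.

```python
import re

# Regular expression from RFC 3986, Appendix B.
URI_REFERENCE = re.compile(
    r"^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?"
)


def split_uri(uri: str) -> dict:
    """Break a URI reference into its five generic components.

    Components that are absent come back as None; the path is always a string.
    """
    m = URI_REFERENCE.match(uri)
    return {
        "scheme": m.group(2),
        "authority": m.group(4),
        "path": m.group(5),
        "query": m.group(7),
        "fragment": m.group(9),
    }


print(split_uri("https://example.org/a/b?x=1#top"))
print(split_uri("mailto:someone@example.com"))
```

In everyday code `urllib.parse.urlsplit` does a similar job; the regular expression is shown because it maps one-to-one onto the syntax line above.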
Path
A sequence of path segments separated by a '/' character. Terminated by the first '?' or '#', or by the end of the URI. The path segments '.' and '..' can be used just as in some operating systems' file directory structures.
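To see what '.' and '..' do to a path, here is a Python sketch that follows the "remove dot segments" routine described in RFC 3986, Section 5.2.4. The function name and the test paths are chosen for illustration; treat it as a teaching aid and not a hardened implementation.

```python
def remove_dot_segments(path: str) -> str:
    """Resolve '.' and '..' segments in a path (sketch of RFC 3986, 5.2.4)."""
    output = []  # finished segments, each carrying its leading "/"
    while path:
        if path.startswith("../"):
            path = path[3:]
        elif path.startswith("./"):
            path = path[2:]
        elif path.startswith("/./"):
            path = path[2:]            # "/./" collapses to "/"
        elif path == "/.":
            path = "/"
        elif path.startswith("/../"):
            path = path[3:]            # "/../" collapses to "/"
            if output:
                output.pop()           # and drops the previous segment
        elif path == "/..":
            path = "/"
            if output:
                output.pop()
        elif path in (".", ".."):
            path = ""
        else:
            # Move the first segment (with its leading "/", if any) to output.
            start = 1 if path.startswith("/") else 0
            end = path.find("/", start)
            if end == -1:
                output.append(path)
                path = ""
            else:
                output.append(path[:end])
                path = path[end:]
    return "".join(output)


print(remove_dot_segments("/a/b/c/./../../g"))   # /a/g
print(remove_dot_segments("mid/content=5/../6"))  # mid/6
```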
Query
Indicated by the first '?' character and terminated by a '#' character or by the end of the URI. Contains non-hierarchical data. Often contains name/value pairs of the form name '=' value delimited by an '&' character.
- In the case of the http and https URI schemes the query component is used for submitting form data (see the application/x-www-form-urlencoded format).
- See: HTML Standard – URL-encoded form data
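Name/value pairs in a query are easy to build and read with Python's `urllib.parse`. In this sketch the search address and parameter names are invented; `urlencode` produces the application/x-www-form-urlencoded form, which is why spaces show up as '+'.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

# Hypothetical form data.
params = {"q": "uniform resource identifier", "lang": "en"}

query = urlencode(params)
print(query)  # q=uniform+resource+identifier&lang=en

# Hypothetical address; only the part after '?' matters here.
uri = "https://example.org/search?" + query

# parse_qs maps each name to a list, since a name may appear more than once.
print(parse_qs(urlsplit(uri).query))
```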
Fragment Identifier
Indicated by a '#' character and terminated by the end of the URI. Allows indirect identification of a secondary resource by reference to a primary resource and additional identifying information.
- The identified secondary resource may be some portion or subset of the primary resource, some view on representations of the primary resource, or some other resource defined or described by those representations.
The semantics of a fragment identifier are defined by the set of representations that might result from a retrieval action on the primary resource.
- Media types may also define their own restrictions on or structures within the fragment identifier syntax.
The fragment identifier is separated from the rest of the URI prior to a dereference.
URI scheme specifications must define their own syntax so that every string matching the scheme-specific syntax is also an absolute URI, that is, a URI without a fragment identifier.
- Scheme specifications will not define fragment identifier syntax or usage, regardless of its applicability to resources identifiable via that scheme, as fragment identification is orthogonal to scheme definition.
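Since the fragment identifier is separated from the rest of the URI prior to a dereference, a client typically splits it off first. A small Python sketch using `urllib.parse.urldefrag` (the address is hypothetical):

```python
from urllib.parse import urldefrag

# Hypothetical URI with a fragment identifier.
uri = "https://example.org/doc.html#section-2"

# urldefrag returns the URI without the fragment, plus the fragment itself.
primary, fragment = urldefrag(uri)

print(primary)   # https://example.org/doc.html  (what gets retrieved)
print(fragment)  # section-2  (interpreted afterwards, per the media type)
```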
Meaning of the Fragment Identifier
text/html media type:
- Fragment identifiers either refer to the indicated part of the document or provide state information for in-page scripts. iana.org/assignments/media-types/text/html
- Detailed processing for fragment identifiers is defined in the HTML5 specification.
- See: Navigating to a fragment html.spec.whatwg.org/multipage/browsing-the..
- For example, the fragment identifier in the w3.org/blog/news/#w3c_footer URI refers to the element with id="w3c_footer".
- For example, the fragment identifier in the youtube.com/watch?v=w0ffwDYo00Q#t=77 URI indicates the position from which playback will be started (at the 77th second).
application/xml, text/xml, and +xml media types:
- The +xml media types include, for example: application/xhtml+xml, image/svg+xml, model/x3d+xml
- The syntax and semantics of fragment identifiers are based on the XPointer Framework specification. iana.org/assignments/media-types/text/xml
- XPointer Framework (W3C Recommendation, 25 March 2003) w3.org/TR/xptr-framework
- For example, the fragment identifier in the w3.org/TR/xml/#sec-bibliography URI refers to the element with identifier sec-bibliography in the document.
Absolute URI, URI-reference, relative reference
Absolute URI: a URI without a fragment identifier.
- Only absolute URIs can be used as a base URI.
URI-reference: a URI or a relative reference.
Relative reference: a reference without a scheme, consisting of the scheme-specific part of a URI or a suffix of it (it can be empty).
- The specification does not use the term “relative URI” at all!
- URIs are interpreted consistently regardless of context, whereas relative references are interpreted in a context.
- Relative references are resolved to a URI against a base URI. The resulting URI is also known as the target URI.
- The specification describes an algorithm for resolving relative references.
URI-reference Examples
- gnu.org/licenses/licenses.html
- w3.org/TR/xml/#abstract
- en.wikipedia.org/wiki/The_Beatles#History
- /pub/linux/kernel/v3.x/testing/
- ../../images/bullet.png
- index.html#contents
- contacts.xml#element(/1/2)
- #nav
- gpl.html
- <empty string>
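A quick way to tell the kinds of URI-reference apart is to look at how the string starts. The Python sketch below is a rough classifier under that assumption; the function name is made up, the labels borrow RFC 3986 terminology, and "same-document reference" is applied here only to the empty string and fragment-only references, which are the most common cases.

```python
import re

# A reference that starts with "scheme:" is a URI, not a relative reference.
_SCHEME_PREFIX = re.compile(r"[A-Za-z][A-Za-z0-9+.-]*:")


def classify_reference(ref: str) -> str:
    """Roughly classify a URI-reference by its leading characters."""
    if _SCHEME_PREFIX.match(ref):
        return "URI"
    if ref.startswith("//"):
        return "network-path reference"
    if ref.startswith("/"):
        return "absolute-path reference"
    if ref == "" or ref.startswith("#"):
        return "same-document reference"
    return "relative-path reference"


for ref in ["https://example.org/a", "/pub/linux/kernel/v3.x/testing/",
            "../../images/bullet.png", "index.html#contents", "#nav", ""]:
    print(repr(ref), "->", classify_reference(ref))
```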
URI Comparison
The scheme and host components are case-insensitive. The other syntax components are assumed to be case-sensitive unless specifically defined otherwise by the scheme. For example, the w3.org and W3.org URIs are equivalent.
A possible definition of equivalence:
- URIs should be considered equivalent when they identify the same resource.
- This definition is not of much practical use, because in general there is no way to compare two resources.
In practice, equivalence is determined by string comparison.
- Normalization is applied before comparison. For example, uppercase letters are converted to lowercase letters in case-insensitive components.
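Here is a minimal Python sketch of normalize-then-compare. It only lowercases the scheme and host and uppercases the hexadecimal digits in percent-encoded octets; real-world normalization can involve more steps. The function names and sample URIs are assumptions for this example.

```python
import re
from urllib.parse import urlsplit, urlunsplit

_PCT = re.compile(r"%[0-9A-Fa-f]{2}")


def _upper_hex(text: str) -> str:
    """Uppercase the hex digits of percent-encoded triplets."""
    return _PCT.sub(lambda m: m.group(0).upper(), text)


def normalize(uri: str) -> str:
    """Apply case normalization to the case-insensitive parts of a URI."""
    parts = urlsplit(uri)
    # Userinfo (before '@') is case-sensitive; only host and port are lowered.
    userinfo, at, hostport = parts.netloc.rpartition("@")
    netloc = userinfo + at + hostport.lower()
    return urlunsplit((
        parts.scheme.lower(),
        netloc,
        _upper_hex(parts.path),
        _upper_hex(parts.query),
        _upper_hex(parts.fragment),
    ))


def equivalent(a: str, b: str) -> bool:
    """Compare two URIs as strings after normalization."""
    return normalize(a) == normalize(b)


print(equivalent("HTTP://www.W3.org/%7euser", "http://www.w3.org/%7Euser"))
print(equivalent("http://www.w3.org/TR", "http://www.w3.org/tr"))
```

The second comparison is false because the path is case-sensitive unless the scheme says otherwise.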
Relative Reference Resolution Examples
Let example/a/b/c?q be the base URI
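Python's `urllib.parse.urljoin` resolves relative references against a base URI, so it is handy for trying this out. The sketch below assumes the base URI uses the http scheme (written out as http://example/a/b/c?q); the relative references are sample inputs picked for illustration.

```python
from urllib.parse import urljoin

# Assumption: the base URI with an explicit http scheme.
BASE = "http://example/a/b/c?q"

# Sample relative references and the target URIs they resolve to.
EXPECTED = {
    "g": "http://example/a/b/g",
    "./g": "http://example/a/b/g",
    "/g": "http://example/g",
    "../g": "http://example/a/g",
    "../../g": "http://example/g",
    "?y": "http://example/a/b/c?y",
    "#s": "http://example/a/b/c?q#s",
    "": "http://example/a/b/c?q",
}

for ref in EXPECTED:
    print(f"{ref!r:12} => {urljoin(BASE, ref)}")
```

Note how the last path segment of the base ("c") is replaced by the reference, while an absolute-path reference such as "/g" replaces the whole path.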
Example:
<!DOCTYPE html>
<html lang="en">
  <head>
    <title>Example</title>
    <base href="http://example/docs/howto/">
    <link rel="stylesheet" type="text/css" href="theme.css">
  </head>
  <body>
    <a href="/about">
      <img src="../images/logo.png" alt="Logo">
    </a>
  </body>
</html>
Resolution of the relative references:
- theme.css => http://example/docs/howto/theme.css
- /about => http://example/about
- ../images/logo.png => http://example/docs/images/logo.png
Conclusion
Well, there you have it! With the help of Uniform Resource Identifier (URI), you can locate any digital resource on the internet with ease. So, if you're ever in need of a quick answer to your online questions, just remember to #URIit! And don't forget to follow me for more fun and informative blogs!