Uniform Resource Identifier (URI)

Welcome to the geekiest corner of the web! If you're here, you're probably already familiar with the Uniform Resource Identifier (URI), and if you're not, don't worry! You'll be an expert by the end of this post. But first, let's lighten the mood with some URI humor. Why did the URI go to the doctor? To get a URIscope!

What is a URI?

A URI is a compact sequence of characters that identifies an abstract or physical resource.

  • A resource is not necessarily available on the Web.
  • URIs can be assigned even to objects from the real world or to concepts.

Current standard:

Tim Berners-Lee, Roy Fielding, Larry Masinter, Uniform Resource Identifier (URI): Generic Syntax, RFC 3986, January 2005. rfc-editor.org/rfc/rfc3986

Each URI begins with a scheme name, which is separated from the scheme-specific part of the URI by a ':' character.

  • Scheme specifications can define their scheme-specific syntax within certain limits.
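
The split between the scheme and the scheme-specific part can be sketched in a few lines of Python. This is only an illustration: the helper name split_scheme and the mailto address are made up for the example, and the function does not validate the scheme name.

```python
def split_scheme(uri: str) -> tuple[str, str]:
    """Split a URI at its first ':' into (scheme, scheme-specific part)."""
    scheme, separator, rest = uri.partition(":")
    if not separator:
        # No ':' at all, so there is no scheme to speak of.
        raise ValueError(f"no scheme found in {uri!r}")
    return scheme, rest


# Hypothetical sample URIs, chosen only for illustration.
mailto_parts = split_scheme("mailto:someone@example.com")
http_parts = split_scheme("http://example/docs/howto/")
print(mailto_parts)
print(http_parts)
```

Only the first ':' matters here, which is why later colons (for example in a port number) stay in the scheme-specific part.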

The organization responsible for the administration of the URI schemes:

Internet Assigned Numbers Authority (IANA) iana.org

Well-Known URI Schemes

file

Matthew Kerwin, The "file" URI Scheme, RFC 8089, February 2017. rfc-editor.org/rfc/rfc8089

http/https

Roy T. Fielding (ed.), Mark Nottingham (ed.), Julian F. Reschke (ed.), HTTP Semantics, RFC 9110, June 2022. rfc-editor.org/rfc/rfc9110

mailto

Martin Dürst, Larry Masinter, Jamie Zawinski, The 'mailto' URI Scheme, RFC 6068, October 2010. rfc-editor.org/rfc/rfc6068

about

S. Moonesamy (ed.), The “about” URI Scheme, RFC 6694, August 2012. rfc-editor.org/rfc/rfc6694

URI Characters

Characters allowed in URIs

The following are reserved characters:

  • ':', '/', '?', '#', '[', ']', '@', '!', '$', '&', ''', '(', ')', '*', '+', ',', ';', '='
    • Characters used as delimiters.

The following are unreserved characters:

  • 'A', ..., 'Z', 'a', ..., 'z'
  • '0', ..., '9'
  • '-', '.', '_', '~'

The specification does not mandate any particular character encoding.

Percent-encoding

Percent-encoding is used to represent a data octet in a component when that octet's corresponding character is outside the allowed set or is being used as a delimiter of, or within, the component.

  • A percent-encoded octet is encoded as a character triplet %hh, consisting of the '%' character followed by the two hexadecimal digits representing that octet's numeric value.
  • For example, %20 is the percent-encoding of the space character.
  • Both the uppercase ('A', ..., 'F') and the lowercase ('a', ...,'f') hexadecimal digits can be used.
  • If two URIs differ only in the case of hexadecimal digits used in percent-encoded octets, they are equivalent.
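
To see these rules in action, here is a small sketch using Python's standard urllib.parse module. The sample strings are arbitrary, and passing safe="" is a choice made for this example so that reserved characters get encoded too (by default, quote leaves '/' alone).

```python
from urllib.parse import quote, unquote

# A space is outside the allowed set, so it becomes a %hh triplet.
encoded_space = quote(" ", safe="")

# Reserved characters are encoded when they must not act as delimiters.
encoded_reserved = quote("/?#", safe="")

# Unreserved characters never need encoding.
unreserved_kept = quote("Az09-._~", safe="")

# Upper- and lowercase hexadecimal digits decode to the same octet.
same_octet = unquote("%7e") == unquote("%7E")

print(encoded_space, encoded_reserved, unreserved_kept, same_octet)
```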

URI Syntax

The syntax is organized hierarchically.

  • Components are listed in order of decreasing significance from left to right.

Generic syntax

scheme ':' hier-part ['?' query] ['#' fragment]

  • The hier-part component consists of an optional authority and a path component. Its syntax is:
    • '//' authority path, or just path
    • When authority is present, the path must either be empty or begin with a '/' character.
    • When authority is not present, the path cannot begin with two '/' characters.
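
As a rough illustration of the generic syntax, Python's urllib.parse.urlsplit breaks a URI into these components. The sample URIs below are invented for the example, and urlsplit is a lenient parser rather than a strict RFC 3986 validator.

```python
from urllib.parse import urlsplit

# With an authority: scheme ':' '//' authority path '?' query '#' fragment
with_authority = urlsplit("http://example/docs/howto/?topic=uri#syntax")

# Without an authority: scheme ':' path
without_authority = urlsplit("mailto:someone@example.com")

print(with_authority)
print(without_authority)
```

In urlsplit's result, the authority is called netloc; it is empty for the mailto example because that URI has no '//' part.
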
Path

The path is a sequence of path segments separated by '/' characters. It is terminated by the first '?' or '#' character, or by the end of the URI. The path segments '.' and '..' can be used just as in some operating systems' file directory structures.
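
How '.' and '..' segments collapse can be sketched as follows. The function is my own reading of the "remove dot segments" procedure in RFC 3986 (section 5.2.4), not code taken from the specification, so treat it as an illustration rather than a reference implementation.

```python
def remove_dot_segments(path: str) -> str:
    """Collapse '.' and '..' segments, following RFC 3986 section 5.2.4."""
    output: list[str] = []
    while path:
        if path.startswith("../"):
            path = path[3:]
        elif path.startswith("./"):
            path = path[2:]
        elif path.startswith("/./"):
            path = path[2:]              # "/./x" becomes "/x"
        elif path == "/.":
            path = "/"
        elif path.startswith("/../"):
            path = path[3:]              # "/../x" becomes "/x"
            if output:
                output.pop()             # and the previous segment is dropped
        elif path == "/..":
            path = "/"
            if output:
                output.pop()
        elif path in (".", ".."):
            path = ""
        else:
            # Move the first segment (with its leading '/', if any) to output.
            start = 1 if path.startswith("/") else 0
            next_slash = path.find("/", start)
            if next_slash == -1:
                output.append(path)
                path = ""
            else:
                output.append(path[:next_slash])
                path = path[next_slash:]
    return "".join(output)


print(remove_dot_segments("/a/b/c/./../../g"))
```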

Query

The query is indicated by the first '?' character and terminated by a '#' character or by the end of the URI. It contains non-hierarchical data, often as name/value pairs of the form name '=' value delimited by '&' characters.
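
For the common name/value case, a quick sketch with Python's urllib.parse shows how the query is cut out and split into pairs. The URI and its parameter names (topic, lang) are made up for the example.

```python
from urllib.parse import parse_qsl, urlsplit

# Hypothetical URI with a query and a fragment.
uri = "http://example/search?topic=uri&lang=en#results"

# The query runs from the first '?' up to the '#'.
query = urlsplit(uri).query

# Split the name=value pairs at '&'.
pairs = parse_qsl(query)

print(query)
print(pairs)
```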

Fragment Identifier

The fragment identifier is indicated by a '#' character and terminated by the end of the URI. It allows indirect identification of a secondary resource by reference to a primary resource and additional identifying information.

  • The identified secondary resource may be some portion or subset of the primary resource, some view on representations of the primary resource, or some other resource defined or described by those representations.

The semantics of a fragment identifier are defined by the set of representations that might result from a retrieval action on the primary resource.

  • Media types may also define their own restrictions on or structures within the fragment identifier syntax.

The fragment identifier is separated from the rest of the URI prior to a dereference.
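
In Python, that separation step can be sketched with urllib.parse.urldefrag. The URI is an invented example; only the part without the fragment would be used for the retrieval.

```python
from urllib.parse import urldefrag

# Hypothetical URI pointing at a secondary resource inside a document.
result = urldefrag("http://example/docs/howto/#syntax")

# The part used for the retrieval, and the fragment kept on the client side.
uri_to_dereference = result.url
fragment = result.fragment

print(uri_to_dereference, fragment)
```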

URI scheme specifications must define their own syntax so that every string matching the scheme-specific syntax is also an absolute URI, that is, a URI without a fragment identifier.

  • Scheme specifications will not define fragment identifier syntax or usage, regardless of its applicability to resources identifiable via that scheme, as fragment identification is orthogonal to scheme definition.

Meaning of the Fragment Identifier

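text/html media type: the fragment identifier refers to the element in the document whose id attribute matches the fragment.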

Absolute URI, URI-reference, relative reference

Absolute URI: a URI without a fragment identifier.

  • Only absolute URIs can be used as a base URI.

URI-reference: a URI or a relative reference.

Relative reference: a scheme-specific subpart of a URI or a suffix of it (can be empty).

  • The specification does not use the term “relative URI” at all!
  • URIs are interpreted consistently regardless of context, whereas relative references are interpreted in a context.
  • Relative references are resolved to a URI against a base URI. The resulting URI is also known as the target URI.
  • The specification describes an algorithm for resolving relative references.
URI-reference Examples
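
To make the three terms concrete, here is a small classifier sketch. The function name classify and the sample references are my own, and the check only looks for a leading scheme and a '#'; it does not validate the rest of the string against the full grammar.

```python
import re

# A scheme is a letter followed by letters, digits, '+', '-' or '.', then ':'.
SCHEME_PREFIX = re.compile(r"^[A-Za-z][A-Za-z0-9+.\-]*:")


def classify(reference: str) -> str:
    """Return the most specific term that applies to a URI-reference."""
    if not SCHEME_PREFIX.match(reference):
        return "relative reference"
    if "#" in reference:
        return "URI"            # has a scheme, but also a fragment
    return "absolute URI"       # has a scheme and no fragment


for sample in ("http://example/docs/howto/",
               "http://example/docs/howto/#syntax",
               "../images/logo.png",
               "//example/about",
               "#top",
               ""):
    print(repr(sample), "is a(n)", classify(sample))
```

Every absolute URI is of course also a URI; the function simply reports the narrowest label.
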
URI Comparison

The scheme and host components are case-insensitive. The other syntax components are assumed to be case-sensitive unless specifically defined otherwise by the scheme. For example, two URIs that differ only in writing the host as w3.org or W3.org are equivalent.

A possible definition of equivalence:

  • URIs should be considered equivalent when they identify the same resource.
  • This definition is not of much practical use, because in general there is no way to compare two resources.

In practice, equivalence is determined by string comparison.

  • Normalization is applied before comparison; for example, uppercase letters are converted to lowercase letters in case-insensitive components.
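
A minimal sketch of such a comparison is shown below. It covers only case normalization (scheme, host, and the hexadecimal digits of percent-encoded octets); the helper name normalize and the sample URIs are assumptions for the example, and a real implementation would likely apply further normalization steps.

```python
import re
from urllib.parse import urlsplit, urlunsplit

PERCENT_TRIPLET = re.compile(r"%[0-9a-fA-F]{2}")


def normalize(uri: str) -> str:
    """Apply case normalization only; other normalization steps are omitted."""
    parts = urlsplit(uri)
    # Any userinfo before '@' is case-sensitive, so only host[:port] is lowered.
    userinfo, at, hostport = parts.netloc.rpartition("@")
    netloc = userinfo + at + hostport.lower()
    rebuilt = urlunsplit((parts.scheme.lower(), netloc,
                          parts.path, parts.query, parts.fragment))
    # Hex digits in percent-encoded octets are case-insensitive: use uppercase.
    return PERCENT_TRIPLET.sub(lambda match: match.group(0).upper(), rebuilt)


def equivalent(first: str, second: str) -> bool:
    """Compare two URIs as strings after normalization."""
    return normalize(first) == normalize(second)


print(equivalent("HTTP://W3.org/a%7eb", "http://w3.org/a%7Eb"))
print(equivalent("http://w3.org/A", "http://w3.org/a"))
```

The second pair differs in the path, which is case-sensitive, so those two URIs are not considered equivalent.
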
Relative Reference Resolution Examples

Example:

<!DOCTYPE html>
<html lang="en">
  <head>
    <title>Example</title>
    <base href="http://example/docs/howto/">
    <link rel="stylesheet" type="text/css" href="theme.css">
  </head>
  <body>
    <a href="/about">
      <img src="../images/logo.png" alt="Logo">
    </a>
  </body>
</html>

Resolution of the relative references:
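
One way to check the resolution is with Python's urllib.parse.urljoin, using the base URI from the base element above and the three relative references that appear in the document. This is a sketch of the outcome, not of the resolution algorithm itself.

```python
from urllib.parse import urljoin

# Base URI taken from the <base href="..."> element in the example.
base = "http://example/docs/howto/"

# The relative references used by the link, a and img elements.
references = ("theme.css", "/about", "../images/logo.png")

resolved = {reference: urljoin(base, reference) for reference in references}

for reference, target in resolved.items():
    print(f"{reference:<22} -> {target}")
```

Note how "/about" replaces the whole path of the base URI, while "../images/logo.png" climbs one level up from howto/ before descending into images/.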

Conclusion

Well, there you have it! With the help of the Uniform Resource Identifier (URI), you can identify any resource with ease, whether it lives on the web or not. So, if you're ever in need of a quick answer to your online questions, just remember to #URIit! And don't forget to follow me for more fun and informative posts!
