HTTP Fundamentals

HTTP Fundamentals

HTTP Fundamentals

Welcome to my blog about HTTP Fundamentals! Here, I will delve into the basics of Hypertext Transfer Protocol (HTTP) and how it impacts the way we use the internet. We'll cover the history of HTTP, how it works, and the various aspects of the protocol and its importance in today's digital world. I'll also discuss the different types of HTTP requests and responses, as well as common HTTP status codes. Additionally, I'll provide tips and best practices for those interested in developing applications that use HTTP. By the end of this blog, you will have a comprehensive understanding of HTTP and its various moving parts. So, let's get started!

Characteristics

Stateless protocol

each request message’s semantics can be understood in isolation.

Extensible

HTTP defines a number of generic extension points that can be used to introduce capabilities to the protocol without introducing a new version, including methods, status codes, fields.

Uniform interface

provides a uniform interface for interacting with resources by sending messages that manipulate or transfer representations.

Request/response protocol based on the client-server model

operates by exchanging messages between clients and servers.

http and https URI Schemes

HTTP defines the http and https URI schemes.

  • The origin server for an http or https URI is identified by the host identifier and the optional port number.
  • The path component and the optional query component identify a potential target resource within that origin server’s namespace.

The presence of an http or https URI does not imply that there is always an HTTP server at the identified origin server listening for connections. Resources made available via the https scheme have no shared identity with the http scheme.

http URI Scheme

The http URI scheme is defined for the purpose of identifying resources on a potential origin server listening for TCP connections on a given port.

URI syntax:

  • "http://" host [":" port] [path] ["?" query]
    • If the port subcomponent is not given or empty, TCP port 80 is the default.
https URI Scheme

The https URI scheme is defined for the purpose of identifying resources on a potential origin server listening for TCP connections on a given port and capable of establishing a TLS connection that has been secured for HTTP communication.

URI syntax:

  • "https://" host [":" port] [path] ["?" query]
    • If the port subcomponent is not given or empty, TCP port 443 is the default.
http and https URI comparison

An empty path component is equivalent to a path of " / ".

The scheme and host are case-insensitive and normally provided in lowercase. All other components are compared in a case-sensitive manner.

Characters other than those in the “reserved” set are equivalent to their percent-encoded octets. HTTP URIs that are equivalent can be assumed to identify the same resource.

Message Abstraction

RFC 9110 provides an abstraction over messages, according to which a message consists of the following:

  • Control data
  • Header section
  • Content
  • Trailer section

This message abstraction is a generalization across many versions of HTTP, including features that might not be found in some versions.

Messages are intended to be self-descriptive, i.e., everything a recipient needs to know about the message can be determined by looking at the (decoded) message itself.

Control Data

Messages start with control data that describe its primary purpose. Every HTTP message has a protocol version.

Header Section

Fields that are sent or received before the content are referred to as header fields (or just headers, colloquially). The header section of a message consists of a sequence of header field lines.

Content

HTTP messages can carry a complete or partial representation as the message content.

Content is transferred as a stream of octets after the header section.

It is in a format and encoding defined by the the header fields Content-Type and Content-Encoding.

Content semantics: The purpose of content in a request is defined by the method semantics.

For example, a representation in the content of a POST request represents information to be processed by the target resource.

In a response, the content’s purpose is defined by the request method, response status code, and response fields describing that content.

For example, the content of a 200 (OK) response to GET represents the current state of the target resource, as observed at the time of the message origination date.

Trailer Section

Fields that are sent or received after the content are referred to as trailer fields (or just trailers, colloquially). Trailer fields can be used to carry checksums, digital signatures, delivery metrics, or post-processing status information. The trailer section of a message consists of a sequence of trailer field lines. Trailer fields should be processed and stored separately from header fields.

Fields

HTTP uses fields to provide data in the form of name/value pairs.

Fields are used to carry:

  • message metadata both in requests and responses (e.g., Date)
  • representation metadata both in requests and responses (e.g., Content-Type),
  • client information in requests (e.g., User-Agent),
  • server information in responses (e.g., Server),
  • resource metadata in responses (e.g., Last-Modified)

Fields are sent and received within the header and trailer sections of messages.

A field sent in the header or trailer section of a message is a called a header or trailer field, respectively. Certain fields (such as, e.g., ETag) can occur either as a header or a trailer field.

Field names:

  • A field name is a sequence of one or more characters from a subset of the US-ASCII character set.
  • Field names are case-insensitive.

Field values:

  • A field value is a sequence of one or more visible US-ASCII characters, spaces, and horizontal tabs.
  • Leading and trailing whitespaces must be stripped before consuming.
  • Fields can be defined to carry either a single member or a comma-separated list of members.
  • Each field can constrain the set of allowed values.

Field sections: Field sections are composed of any number of field lines, each with a field name identifying the field, and a field line value that conveys data for that instance of the field.

When a field name is repeated within a section, its value consists of the list of corresponding field line values within that section, concatenated in order, with each field line value separated by a comma.

  • In practice, the Set-Cookie header field often appears in a response message across multiple field lines, however, Set-Cookie field line values of are not combined into a single field value.

The order in which field lines with differing field names occur in a section is not significant. HTTP specifications define many standard fields. Many other specifications define fields beyond the ones specified by HTTP. Unless specified otherwise, fields are defined for all versions of HTTP. IANA maintains the registry of HTTP fields.

The User-Agent Header Field

Contains information about the user agent originating the request.

Can be used for tailoring responses or analytics regarding browser or operating system use.

A user agent should send a User-Agent header field in each request.

The field value consists of one or more product identifiers, each followed by zero or more comments.

  • Product identifiers are listed in decreasing order of their significance.
  • Each product identifier consists of a name and optional version.
  • Comments are surrounded by parentheses.

Examples:

  • curl: curl/7.84.0
  • Firefox: User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:102.0) Gecko/20100101 Firefox/102.0
  • Google Chrome: User-Agent: Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/103.0.0.0 Safari/537.36
  • Chromium-based Edge: Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/103.0.5060.114 Safari/537.36 Edg/103.0.1264.51

HTTP methods: GET, HEAD, POST, PUT, DELETE

image

GET method

Requests transfer of a current selected representation for the target resource. The primary mechanism of information retrieval.

HEAD method

Identical to the GET method except that the server must not send content in the response.

Can be used to obtain metadata about the selected representation without transferring its representation data.

  • Example:
$ curl --head https://www.r-project.org/
HTTP/1.1 200 OK
Date: Tue, 13 Sep 2022 19:37:14 GMT
Server: Apache
Last-Modified: Thu, 23 Jun 2022 08:10:02 GMT
ETag: "18a3-5e218fb0442da"
Accept-Ranges: bytes
Content-Length: 6307
Vary: Accept-Encoding
Content-Type: text/html
Post method

Requests that the target resource process the representation enclosed in the request according to the resource’s own specific semantics.

Common uses include:

  • Submitting data (e.g., form data) to a data-handling process.
  • Posting a message to a newsgroup, mailing list, or blog.
  • Creating a new resource.
  • Appending data to a resource’s existing representation(s).
  • Example: httpbin.org
$ http --form https://httpbin.org/post number=42 text="Hello, World!" -v
POST /post HTTP/1.1
Accept: */*
Accept-Encoding: gzip, deflate, br
Connection: keep-alive
Content-Length: 34
Content-Type: application/x-www-form-urlencoded; charset=utf-8
Host: httpbin.org
User-Agent: HTTPie/3.2.1
number=42&text=Hello%2C+World%21
PUT method

Requests that the state of the target resource be created or replaced with the state defined by the representation enclosed in the request message content.

A successful PUT of a given representation would suggest that a subsequent GET on that same target resource will result in an equivalent representation being sent in a 200 (OK) response.

$ echo "Hello, World!" | http PUT https://transfer.sh/hello.txt \
Content-Type:text/plain -v
PUT /hello.txt HTTP/1.1
Accept: application/json, */*;q=0.5
Accept-Encoding: gzip, deflate, br
Connection: keep-alive
Content-Length: 14
Content-Type: text/plain
Host: transfer.sh
User-Agent: HTTPie/3.2.1
Hello, World!
Difference between PUT and POST

The fundamental difference between the POST and PUT methods is highlighted by the different intent for the enclosed representation:

  • The target resource in a POST request is intended to handle the enclosed representation according to the resource’s own semantics.
  • The enclosed representation in a PUT request is defined as replacing the state of the target resource.
DELETE method

Requests that the origin server remove the association between the target resource and its current functionality.

If the target resource has one or more current representations, they might or might not be destroyed by the origin server, and the associated storage might or might not be reclaimed, depending entirely on the nature of the resource and its implementation by the origin server.

Relatively few resources allow the DELETE method.

$ http DELETE https://transfer.sh/g3sYJs/hello.txt/FU5Qoc5iam6a -v
DELETE /g3sYJs/hello.txt/FU5Qoc5iam6a HTTP/1.1
Accept: */*
Accept-Encoding: gzip, deflate, br
Connection: keep-alive
Content-Length: 0
Host: transfer.sh
User-Agent: HTTPie/3.2.1
HTTP/1.1 200 OK
Connection: keep-alive
Content-Length: 0
Date: Fri, 16 Sep 2022 11:38:12 GMT
Server: nginx/1.18.0
Strict-Transport-Security: max-age=63072000
X-Made-With: <3 by DutchCoders
X-Served-By: Proudly served by DutchCoders
OPTIONS method

Requests information about the communication options available for the target resource, at either the origin server or an intervening intermediary.

* as the request-target applies to the server in general rather than to a specific resource.

In a successful response to an OPTIONS request a server should send any header fields that might indicate optional features implemented by the server and applicable to the target resource.

  • For example, the Allow header field lists the set of methods advertised as supported by the target resource.

Example:

curl -v --request OPTIONS https://apache.org/foundation/contact.html
> OPTIONS /foundation/contact.html HTTP/1.1
> Host: apache.org
> User-Agent: curl/7.83.1
> Accept: */*
>
< HTTP/1.1 200 OK
< Connection: keep-alive
< Content-Length: 0
< Server: Apache
< Allow: GET,POST,OPTIONS,HEAD,TRACE
< Cache-Control: max-age=3600
< Expires: Thu, 23 Jun 2022 08:33:33 GMT
< Content-Type: text/html
< Accept-Ranges: bytes
< Via: 1.1 varnish, 1.1 varnish
< Date: Thu, 23 Jun 2022 07:33:33 GMT
< Vary: Accept-Encoding

Status Codes

The status code of a response is a three-digit integer code that describes the result of the request and the semantics of the response, including whether the request was successful and what content is enclosed (if any).

All valid status codes are within the range of 100 to 599, inclusive.

The first digit of the status code defines the class of response:

  • 1xx (Informational): indicates an interim response for communicating connection status or request progress prior to sending a final response.
  • 2xx (Successful): indicates that the request was successfully received, understood, and accepted.
  • 3xx (Redirection): indicates that further action needs to be taken by the user agent in order to fulfill the request that can be performed automatically by the user agent.
  • 4xx (Client Error): indicates that the request contains bad syntax or cannot be fulfilled.
  • 5xx (Server Error): indicates that the server failed to fulfill an apparently valid request.

A client is not required to understand the meaning of all registered status codes.

  • However, a client must understand the class of any status code, as indicated by the first digit.
  • An unrecognized status code must be treated as being equivalent to the x00 status code of that class.

Except when responding to a HEAD request in a response with a status code of 4xx or 5xx, the server should send a representation containing an explanation of the error situation, and whether it is a temporary or permanent condition.

Status codes are extensible.

IANA maintains the registry of status codes.

Redirection

Example:

$ curl -v http://w3.org/
> GET / HTTP/1.1
> Host: w3.org
> User-Agent: curl/7.83.1
> Accept: */*
>
< HTTP/1.1 301 Moved Permanently
< content-length: 0
< location: http://www.w3.org/
<

Example:

$ curl -v -L http://w3.org/
> Host: w3.org
> User-Agent: curl/7.83.1
> Accept: */*
>
< HTTP/1.1 301 Moved Permanently
< content-length: 0
< location: http://www.w3.org/
<
> GET / HTTP/1.1
> Host: www.w3.org
> User-Agent: curl/7.83.1
> Accept: */*
>
< HTTP/1.1 200 OK
< ...

Content Negotiation

When responses convey content, whether indicating a success or an error, the origin server often has different ways of representing that information; for example, in different formats, languages, or encodings.

Likewise, different users or user agents might have differing capabilities, characteristics, or preferences that could influence which representation, among those available, would be best to deliver.

For this reason, HTTP provides mechanisms for content negotiation.

Proactive negotiation

The server selects the representation based upon the user agent’s stated preferences.

It is also known as server-driven negotiation.

The origin server uses an algorithm to select the preferred representation based on the preferences of the user agent.

Selection is based on the available representations for a response compared to various information supplied in the request, including both the explicit negotiation header fields and implicit characteristics, such as the client’s network address or parts of the User-Agent field.

A Vary header field is often sent in a response subject to proactive negotiation to indicate what parts of the request information were used in the selection algorithm.

Advantageous:

  • When the algorithm for selecting from among the available representations is difficult to describe to a user agent, or
  • when the server desires to send its “best guess” to the user agent along with the first response, to avoid a subsequent request.

Disadvantages:

  • It is impossible for the server to accurately determine what might be “best” for any given user, since that would require complete knowledge of both the capabilities of the user agent and the intended use for the response.
  • Having the user agent describe its capabilities in every request can be both very inefficient and a potential risk to the user’s privacy.
  • Complicates the implementation of an origin server and the algorithms for generating responses to a request.
  • Limits the reusability of responses for shared caching.

Conclusion

In conclusion, HTTP is an important part of the web and is used to request and transport data between clients and servers. It is the foundation for the world wide web and allows for communication between computers across the internet. Understanding HTTP and its fundamentals is essential for any web developer, as it is the basis for a successful website or application.

Did you find this article valuable?

Support Mojtaba Maleki by becoming a sponsor. Any amount is appreciated!