Tallaght Campus

Department of Computing

Hypertext Transfer Protocol (HTTP)
  1. HTTP Overview

      ____________
     /   //||\\   \
    /   ////\\\\   \
    \   \\\\////   /   
     \___\\||//___/
    				


    • Hypertext Transfer Protocol (HTTP) is the protocol used for communication on the World Wide Web (WWW)
    • It is at layer 7 of the OSI RM, above the Internet's transport protocol, TCP
      Open Systems Interconnection Reference Model (OSI RM), source unknown
    • HTTP operates within a client-server model and involves two types of messages:
      • request - sent by the client (e.g. browser) and received by the web server
      • response - sent by the server in reply to the client's request message
    • The request-response pair of messages constitutes an HTTP session, which takes place over a TCP connection
      HTTP Session (by E. Lee)
      • Early implementations of HTTP opened a new TCP connection for each HTTP session, which had negative effects
        • on responsiveness, owing to the TCP 3-way handshake and slow-start congestion control approach
        • on server resource usage, owing to the multiple TCP connections opened for an HTTP page (if it referred to other files)
      • To improve this situation, starting with HTTP version 1.1 persistent connections were introduced, allowing
        • TCP connections to be kept open for multiple HTTP sessions
        • pipelining (asynchronous sending of requests i.e. without waiting for responses)
        • chunked response (sending of responses in parts)
    • In HTTP 1.x the request line, status line and header fields are text based; the body can carry any media, including binary formats, with its type declared using Multipurpose Internet Mail Extensions (MIME)
      More about MIME
      • Multipurpose Internet Mail Extensions (MIME) is used with HTTP (on the WWW) and with SMTP (email)
      • It was originally devised for the encoding of email attachments
      • The MIME type is declared (in HTTP in the header) to inform the recipient of the original file format
      • MIME defines base64 encoding for carrying binary data over text-only channels such as email, whereby three bytes (3 x 8 = 24 bits) are encoded as 4 text characters (representing 6 bits each)
        Base64 encoding example (by E. Lee)
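
      The 3-bytes-to-4-characters mapping described above can be checked with a short sketch. The input bytes b"Man" are an arbitrary choice for illustration (not taken from the original figure), and the manual bit-splitting is only there to show the 6-bit grouping; in practice the standard library call is enough.

```python
import base64

# Three input bytes (3 x 8 = 24 bits), chosen purely for illustration.
raw = b"Man"

# View the 24 bits as four groups of 6 bits each.
bits = "".join(f"{byte:08b}" for byte in raw)
groups = [bits[i:i + 6] for i in range(0, len(bits), 6)]

# Each 6-bit value is an index into the standard base64 alphabet.
alphabet = (
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    "abcdefghijklmnopqrstuvwxyz"
    "0123456789+/"
)
manual = "".join(alphabet[int(group, 2)] for group in groups)

# The standard library produces the same four characters.
encoded = base64.b64encode(raw).decode("ascii")

print(groups)           # ['010011', '010110', '000101', '101110']
print(manual, encoded)  # TWFu TWFu
```

      Four output characters for every three input bytes means the encoded form is roughly one third larger than the original data.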
    • HTTP communication is stateless
      • what does this mean?
        • the server does not have to maintain any information about communication sessions with clients
        • the server receives a request, processes it, sends a response and then forgets all about it
      • advantages of stateless communication:
        • a lot simpler to implement
        • does not leave the system in an inconsistent state that needs to be dealt with, in case of a crash
      • disadvantages of stateless communication:
        • applications where the server needs to 'remember' something, e.g. those including 'shopping carts', cannot be implemented with HTTP alone
        • the use of authentication is impractical (user would have to log in with every HTTP request!)
      • solutions that do not violate statelessness:
        • cookies
        • URL re-writing
        • hidden form fields
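
    As a rough illustration of the cookie approach listed above, the sketch below keeps a shopping cart across two independent requests. The protocol itself stays stateless: every request carries the session identifier in its Cookie header. The names handle_request, SESSIONS and the cookie name sid are hypothetical and chosen only for this example; the in-memory dictionary stands in for whatever storage a real application would use.

```python
import secrets
from http.cookies import SimpleCookie

# Hypothetical in-memory store: session id -> list of cart items.
SESSIONS = {}


def handle_request(cookie_header, item=None):
    """Handle one request; return (Set-Cookie value or None, cart contents)."""
    cookie = SimpleCookie()
    if cookie_header:
        cookie.load(cookie_header)
    session_id = cookie["sid"].value if "sid" in cookie else None

    set_cookie = None
    if session_id not in SESSIONS:
        # Unknown client: create a session and ask the client to remember its id.
        session_id = secrets.token_hex(8)
        SESSIONS[session_id] = []
        out = SimpleCookie()
        out["sid"] = session_id
        out["sid"]["httponly"] = True
        set_cookie = out["sid"].OutputString()

    if item is not None:
        SESSIONS[session_id].append(item)
    return set_cookie, list(SESSIONS[session_id])


# First request: no cookie yet, so the response carries Set-Cookie.
set_cookie, cart = handle_request(None, "book")

# Second request: the client sends back only the name=value pair.
cookie_header = set_cookie.split(";")[0]
_, cart = handle_request(cookie_header, "pen")
print(cart)  # ['book', 'pen']
```

    The server does not rely on the connection to recognise the client; everything it needs arrives with each request, which is why this does not violate statelessness.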
  2. HTTP Request and Response

     ----req---->
     <---res-----
    				


    While studying this section, make sure to open the developer tools in your browser (key F12) and see for yourself some real examples of HTTP requests and responses.


    Structure of an HTTP Request

    Parts of an HTTP request (by E. Lee)
    1. The request line includes
      • the method, which can be (see Method Definitions in the standard):
        GET
        Request for a document on a server
        More about GET
        • If sending form data using this method, the browser appends it to the URL
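        • The standard sets no maximum length for such a URL, but browsers and servers impose their own limits (commonly a few thousand characters)
        • Server will respond with error 414, URI Too Long, if the URL is longer than it is willing to process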

        HEAD
        Retrieve only the header fields but not the document itself (purpose is to check links, test for page modifications and page size)
        POST
        Send information to the server (most often information collected from a form)
        More about POST
        • Data is sent in the request body (not as part of the URL)
        • The amount of data that can be sent in a POST request is not limited by the protocol, although servers usually impose their own limit
        • Three different content types are supported (type used is specified in a header field)
          • application/x-www-form-urlencoded - key-value pairs, as in GET

            Request header:
            POST /abc.php HTTP/1.1
            Host: www.xyz.com
            Content-Type: application/x-www-form-urlencoded

            Request payload (content):
            hl=en&as_q=url+length
          • multipart/form-data - binary data can be encoded


            In request header:
            Content-Type: multipart/form-data;
            boundary=---------------------------21387795301410495893260618559

            Request payload (content):
            -----------------------------21387795301410495893260618559
            Content-Disposition: form-data; name="syx_sid"
            
            I6QlewQNnbwgrzJYhdUWBUS3ta40O3
            -----------------------------21387795301410495893260618559
            Content-Disposition: form-data; name="syx_sov"
            
            jfgdL4
            -----------------------------21387795301410495893260618559
            Content-Disposition: form-data; name="syx_efi"
            
            
            -----------------------------21387795301410495893260618559
            Content-Disposition: form-data; name="u_69x_4607"
            
            Jelena
            -----------------------------21387795301410495893260618559
            Content-Disposition: form-data; name="u_69x_338354"
            
            Vasic
            -----------------------------21387795301410495893260618559
            Content-Disposition: form-data; name="u_69x_4608"
            
            abc@xyz.com
            -----------------------------21387795301410495893260618559
            Content-Disposition: form-data; name="u_69x_338367"
            
            Website Feedback
            -----------------------------21387795301410495893260618559
            Content-Disposition: form-data; name="u_69x_4609"
            
            Great!
            -----------------------------21387795301410495893260618559
            Content-Disposition: form-data; name="submit"
            
            Continue →
            -----------------------------21387795301410495893260618559--
          • text/plain
        PUT
        Upload resource to specified URI
        DELETE
        Delete resource with specified URI
        TRACE
        Echo the received request back to the client (used for diagnostics)
      • the URL
      • the HTTP version
    2. Header fields (see Header Field Definitions in the standard), for example:
      User-Agent
      client software, e.g. browser
      Referer
      URL that caused the page to be requested
      Accept
      media types that the client can handle (the related Accept-Encoding, Accept-Language and Accept-Charset fields cover encodings, languages and character sets)
      Authorization
      credentials for access to resource (especially after a 401 response)
      If-Modified-Since
      for use with caching (response 304 i.e. 'not modified' is returned if not modified since date)
      Host
      domain name of the requested server, needed when several hosts are collocated on one IP address (mandatory in HTTP 1.1)
      Connection
      whether to keep the connection alive for further use or close it
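
    To make the request structure above concrete, here is a minimal sketch that assembles a GET and a POST request as text, using the host, path and form data from the x-www-form-urlencoded example. The function names build_get and build_post are made up for this illustration, and the Connection: close header is an added assumption; a real client would normally leave this work to a library.

```python
from urllib.parse import urlencode

CRLF = "\r\n"


def build_get(host, path, form):
    """GET: the form data is appended to the URL as a query string."""
    target = f"{path}?{urlencode(form)}"
    lines = [
        f"GET {target} HTTP/1.1",  # request line: method, URL, version
        f"Host: {host}",           # header fields follow
        "Connection: close",
    ]
    # An empty line ends the header section; a GET has no body here.
    return CRLF.join(lines) + CRLF + CRLF


def build_post(host, path, form):
    """POST: the same key-value pairs travel in the request body."""
    body = urlencode(form)
    lines = [
        f"POST {path} HTTP/1.1",
        f"Host: {host}",
        "Content-Type: application/x-www-form-urlencoded",
        f"Content-Length: {len(body.encode('ascii'))}",
        "Connection: close",
    ]
    return CRLF.join(lines) + CRLF + CRLF + body


form = {"hl": "en", "as_q": "url length"}
print(build_get("www.xyz.com", "/abc.php", form))
print(build_post("www.xyz.com", "/abc.php", form))
```

    Comparing the two outputs shows the difference described under GET and POST: the encoded pairs hl=en&as_q=url+length appear in the request line for GET and after the blank line for POST.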

    Structure of an HTTP response

    Parts of an HTTP response (by E. Lee)
    1. The status line includes
      • the HTTP version
      • the status code (see Status Code and Reason Phrase), classified as follows:
        • 1xx: informational, request received
        • 2xx: success, request accepted

          200 OK (request succeeded, requested object appears further in the response message)
        • 3xx: redirection

          301 Moved Permanently (requested object moved, new location specified later in this message with Location:...)
        • 4xx: client error

          400 Bad Request (request message not understood by server)
          404 Not Found (requested document not found on this server)
        • 5xx: server error

          505 HTTP Version Not Supported
      • the reason phrase (a short message in English explaining the status code)
    2. Header fields (see Header Field Definitions in the standard), for example:
      Location
      redirection
      Server
      server software
      WWW-Authenticate
      request for authentication
      Allow
      list of supported methods (GET, HEAD etc.), must be present in a 405 Method Not Allowed response
      Content-Encoding
      tells the requestor how the response content is encoded, for example gzip
      Content-Length
      decimal number representing the number of octets in the body of the response
      Content-Type
      the media type of the response content, for example text/html; charset=ISO-8859-4
      Expires
      date and time after which the response is considered stale (for caching purposes)
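
    The response structure above can be explored with a small parser. This is a simplified sketch, not a complete implementation: it assumes a well-formed HTTP 1.x response with a reason phrase and ignores details such as repeated header fields and chunked bodies. The names parse_response and status_class, and the sample URL, are hypothetical.

```python
STATUS_CLASSES = {
    1: "informational",
    2: "success",
    3: "redirection",
    4: "client error",
    5: "server error",
}


def parse_response(raw):
    """Split a raw response into version, code, reason, headers and body."""
    # A blank line (CRLF CRLF) separates the header section from the body.
    head, _, body = raw.partition(b"\r\n\r\n")
    lines = head.decode("iso-8859-1").split("\r\n")

    # Status line: HTTP version, status code, reason phrase.
    version, code, reason = lines[0].split(" ", 2)

    # Header fields: "Name: value", with case-insensitive names.
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return version, int(code), reason, headers, body


def status_class(code):
    """Classify a status code by its first digit."""
    return STATUS_CLASSES.get(code // 100, "unknown")


raw = (
    b"HTTP/1.1 301 Moved Permanently\r\n"
    b"Location: http://www.xyz.com/new.html\r\n"
    b"Content-Length: 0\r\n"
    b"\r\n"
)
version, code, reason, headers, body = parse_response(raw)
print(version, code, reason)  # HTTP/1.1 301 Moved Permanently
print(status_class(code))     # redirection
print(headers["location"])    # http://www.xyz.com/new.html
```

    Pasting a response copied from the browser's developer tools into raw is a quick way to see the same three parts in real traffic.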
  3. HTTP History

    :-(   ???   :-)   !!!   t
    -+-----+-----+-----+----->
    				


    early 1990s
    HTTP 0.9 (numbered retrospectively)
    • one line protocol, with a single keyword, GET
    • fulfilled Tim Berners-Lee's vision for a simple protocol that would help with the adoption of the WWW
    late 1990s
    More complexity
    • HTTP 1.0 (RFC 1945 in 1996) - informational, not a formal specification
      • headers for request and response introduced to carry metadata
      • status (success, error etc.) included in response
      • response content not limited to HTML, for example could be a JPG image
      • multi-part types are supported
      • the TCP connection between client and server closes after every request-response pair of messages is exchanged
    • HTTP 1.1 (RFC 2068 in 1997 and RFC 2616 in 1999) - formal specification i.e. standard
      • keep-alive feature: connection can be reused for multiple request-response pairs of messages
      • pipelining: multiple requests can be sent before responses are received
      • chunked response support
      • content negotiation, including language, encoding and content type
      • the Host header line allows for collocation of domains on a single IP address
    • other concepts
      • security was added with the Secure Sockets Layer (SSL) on top of TCP and eventually standardised as Transport Layer Security (TLS); HTTP operating over a security layer is referred to as HTTPS (formally specified in 2000)
      • the Web and HTTP in particular are based on the representational state transfer (REST) architectural style, defined by Roy Fielding in 2000; REST recommends a loosely coupled style that simultaneously allows the provision of rich content by the distributed system
      • WebSocket (standardised in 2011) is a protocol for full-duplex communication between client and server, different from HTTP; it allows JavaScript code on a webpage to communicate with the server in a lightweight manner, including 'server push'
    2010s
    HTTP 2 (RFC 7540 in 2015) - based on Google's experimental protocol SPDY; implemented by about 30% of all websites (see graph on W3Techs)
    • binary protocol (cannot be created or read manually)
    • multiplexed (no restriction on the number or order of requests on a single TCP connection)
    • compressed headers
    • server push (send before client requests)
    • all of the above greatly improve responsiveness
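
    To give a feel for what 'binary protocol' and 'multiplexed' mean in practice, the sketch below packs and unpacks the 9-byte frame header that precedes every HTTP 2 frame, following the layout in RFC 7540, section 4.1: a 24-bit payload length, an 8-bit type, 8-bit flags, and a reserved bit followed by a 31-bit stream identifier. The function names are made up for this example and the values are illustrative; it does not build a complete frame or perform header compression.

```python
import struct

# Frame type and flag values as defined in RFC 7540.
TYPE_HEADERS = 0x1
FLAG_END_STREAM = 0x1
FLAG_END_HEADERS = 0x4


def pack_frame_header(length, frame_type, flags, stream_id):
    """Pack the 9-byte frame header: length(24) type(8) flags(8) R(1)+id(31)."""
    if not 0 <= length < 2 ** 24:
        raise ValueError("length must fit in 24 bits")
    return length.to_bytes(3, "big") + struct.pack(
        ">BBI", frame_type, flags, stream_id & 0x7FFFFFFF
    )


def unpack_frame_header(header):
    """Reverse of pack_frame_header; the reserved bit is masked off."""
    length = int.from_bytes(header[:3], "big")
    frame_type, flags, stream_id = struct.unpack(">BBI", header[3:9])
    return length, frame_type, flags, stream_id & 0x7FFFFFFF


# A HEADERS frame with a 16-byte payload on stream 1.
header = pack_frame_header(
    16, TYPE_HEADERS, FLAG_END_HEADERS | FLAG_END_STREAM, 1
)
print(header.hex())                 # 000010010500000001
print(unpack_frame_header(header))  # (16, 1, 5, 1)
```

    The stream identifier in every frame is what lets requests and responses be interleaved on one TCP connection, and the fixed binary layout is why such messages are not practical to write or read by hand.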
  4. References