Saturday, 21 February 2015

Post 32: Difference between URI, URL, and URN

A URI (Uniform Resource Identifier) identifies a resource on the Internet by location, by name, or by both. URIs have two subsets: the URL (Uniform Resource Locator) and the URN (Uniform Resource Name).

The URL specifies where an identified resource is located and the mechanism for retrieving it. In the case of an HTTP URL, that mechanism is the HTTP protocol. But the mechanism doesn't have to be HTTP, i.e. “http://”; a URL can also start with “ftp://” (File Transfer Protocol, for transferring computer files) or “smb://” (Server Message Block, for shared access to files, printers, serial ports, and miscellaneous communications between nodes on a network).

The URN is part of a larger Internet information architecture and unambiguously identifies a resource. Contrary to the URL, it does not imply that the identified resource is available.

The URL is similar to a person's address, as it defines something's location, while the URN is similar to the ISBN of a book, as it unambiguously defines something's identity. In other words: the URL answers the question of where something is, while the URN answers the question of who something is.
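
The difference can be made visible with Python's standard urllib.parse module. This is only a small sketch: the URL is the RFC link from the sources below, and the ISBN in the URN is just an illustrative value.

```python
from urllib.parse import urlsplit

# A URL: the scheme names the retrieval mechanism, the rest says where.
url = urlsplit("http://www.ietf.org/rfc/rfc3986.txt")
print(url.scheme)   # http
print(url.netloc)   # www.ietf.org
print(url.path)     # /rfc/rfc3986.txt

# A URN: only a name, with no host to contact (illustrative ISBN).
urn = urlsplit("urn:isbn:0451450523")
print(urn.scheme)   # urn
print(repr(urn.netloc))  # '' (no location part at all)
print(urn.path)     # isbn:0451450523
```

The URL splits into a mechanism ("http") and a location, while the URN has no location part at all.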

Source(s) and for more information:
RFC 3986: http://www.ietf.org/rfc/rfc3986.txt
http://en.wikipedia.org/wiki/File_Transfer_Protocol
http://en.wikipedia.org/wiki/Server_Message_Block

Sunday, 15 February 2015

Post 31: Safe and Un-Safe HTTP methods

When the HTTP specification talks about “safe” HTTP methods, it means methods that should not change or destroy any resource on the server. The GET method is a safe HTTP method because it doesn't change the resource. It only retrieves it, and that's it! A POST method, on the other hand, is an unsafe HTTP method because it changes a resource on the server, e.g. it updates an account, submits an order, etc.

The browser behaves differently depending on whether it executes a GET or a POST method. For example, you can easily refresh a webpage that was retrieved via a GET method, because the browser simply sends the same request again and renders the response it gets from the server, just as before. However, if you want to refresh a webpage that was retrieved via a POST method, the browser shows a warning, because refreshing would send the POST request a second time.

Therefore web applications try to show the user only webpages that were retrieved via GET, by following the so-called POST/Redirect/GET pattern:

If the user clicks a button to POST a request (e.g. submitting an order), this request is sent to the server. The server then replies with an HTTP redirect, meaning the server tells the browser to GET another resource at a specified location. The browser follows this instruction and GETs that resource. The server then replies with, for example, a “thank you for the order” resource. The user can now safely refresh the webpage, since it was retrieved via GET instead of POST.
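
The flow can be sketched in a few lines of Python. This is a toy model without any real networking: the handle() function, the routes “/orders” and “/thanks”, and the choice of status code 303 (See Other) for the redirect are assumptions made for illustration only.

```python
# Toy in-memory "server": no real networking, just the decision logic.
orders = []

def handle(method, path, body=None):
    """Return (status, headers, body) for a request; routes are hypothetical."""
    if method == "POST" and path == "/orders":
        orders.append(body)
        # Redirect: tell the browser to GET the confirmation page instead.
        return 303, {"Location": "/thanks"}, ""
    if method == "GET" and path == "/thanks":
        return 200, {"Content-Type": "text/html"}, "<p>Thank you for the order</p>"
    return 404, {}, "Not found"

def browser_submit(body):
    """Mimic a browser: POST the form, then follow the redirect with a GET."""
    status, headers, page = handle("POST", "/orders", body)
    if status == 303:
        status, headers, page = handle("GET", headers["Location"])
    return status, page

status, page = browser_submit("1 x book")
# A refresh now repeats only the last request, which was the GET.
handle("GET", "/thanks")
print(status, page, len(orders))
```

After the refresh there is still only one order stored, because the page the user is looking at came from a GET.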

Source(s):
HTTP Succinctly by Scott Allen Syncfusion
Wikipedia

Post 30: HTTP Request Methods

An HTTP request is made by a client in order to retrieve a resource from a server. Every HTTP request must contain exactly one HTTP method (sometimes also called a “verb”). In a previous post you already got to know one HTTP request method: the GET method. It gets, fetches, or retrieves a resource that the server has stored.

If you want to retrieve an image you could type in
GET /image.png

If you want to retrieve a PDF file you type in

GET /documents/report.pdf

and so on. Of course there is more than just the GET method, but the vast majority of HTTP requests use it. The most common HTTP request methods are shown below:


Method: Description
GET: Retrieve a resource (and should have no other effect).
HEAD: Retrieve the headers for a resource. It basically does exactly what the GET method does, but without retrieving the body content. You'd use a HEAD request if you only want to retrieve the metadata of a resource.
POST: Send data to the server to create or update a resource, e.g. an item added to a database, a new message on a bulletin board, an annotation to an existing resource.
PUT: Store a resource on the server at the supplied URI. If the URI already identifies an existing resource, then that resource is modified. If the URI doesn't point to an existing resource, then the server creates a new resource associated with that URI.
DELETE: Remove a resource.
TRACE: The server returns the text of the HTTP request message back to the client, so the client can see what changes intermediate servers have made to the HTTP request.
OPTIONS: Return the HTTP methods that the server supports for the specified URL. This can be used to check what the web server is able to do.
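
As a small illustration, Python's standard urllib.request module lets you choose the method when you build a request. This is only a sketch: the URL is a placeholder, and nothing is sent over the network because the requests are only constructed, not opened.

```python
from urllib.request import Request

url = "http://example.com/documents/report.pdf"  # placeholder URL

# Building a Request object does not contact the server yet.
get_req = Request(url)                                      # no body: GET
head_req = Request(url, method="HEAD")                      # headers only
put_req = Request(url, data=b"new content", method="PUT")   # store at this URI
delete_req = Request(url, method="DELETE")                  # remove the resource

for req in (get_req, head_req, put_req, delete_req):
    print(req.get_method(), req.full_url)
```

If a body is supplied without an explicit method, urllib falls back to POST; without a body it falls back to GET.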


Source(s): 

HTTP Succinctly by Scott Allen Syncfusion
Wikipedia

Post 29: HTTP request response protocol – The very basics

This article was also published on my blog here:
https://pandaquests.medium.com/how-does-http-work-5ed39f5fd18f

HTTP is a request-response protocol, meaning the client sends a request and the server replies with a response. Request and response are both carefully formatted messages that the other side can understand. They are two different message types, and they are exchanged in a single HTTP transaction. These messages are ASCII text formatted according to the HTTP standard, so client and server know how to interpret the content correctly.

Any application that is able to open a network connection to a server machine and send data over the network can make an HTTP request. You can even try this yourself: just type in the HTTP request manually by using Telnet from the command line. Heads up: a normal Telnet session connects over port 23. As mentioned before, in order to connect to a server via HTTP we have to use port 80.

In the following example we will use Telnet in order to
- connect to a server
- make an HTTP request
- receive an HTTP response

telnet www.google.com 80

This command tells the computer to connect to a server with the host name “www.google.com” on port 80. After the connection is established you can write the HTTP request message:

GET / HTTP/1.1

The “GET” part tells the server we want to retrieve a resource.
The “/” tells the server that the resource we want to retrieve is the root resource, i.e. the home page.
“HTTP/1.1” tells the server that we are using the HTTP 1.1 protocol to speak to it.

Next you type this line:

host: www.google.com

That line tells the server which host name the request is meant for. It is needed because one server could host multiple websites.

After you have typed this in and pressed ENTER twice, you should see the HTTP response: a few lines of response headers followed by plain and simple HTML code. If the response were sent to a browser, the browser would take the HTML code and render it as a webpage.
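
The same exchange can be sketched in Python with a plain socket instead of Telnet. The function names build_request and fetch are made up for this example, and the extra “Connection: close” header is an assumption added so that the server closes the connection after replying and the read loop ends.

```python
import socket

def build_request(host, path="/"):
    """Build the same text that was typed into Telnet, as bytes."""
    lines = [
        f"GET {path} HTTP/1.1",
        f"Host: {host}",
        "Connection: close",  # assumption: ask the server to close when done
        "",                   # the empty line is the second ENTER
        "",
    ]
    return "\r\n".join(lines).encode("ascii")

def fetch(host, path="/", port=80):
    """Send the request over TCP and return the raw response bytes."""
    with socket.create_connection((host, port), timeout=10) as conn:
        conn.sendall(build_request(host, path))
        chunks = []
        while True:
            data = conn.recv(4096)
            if not data:  # the server closed the connection
                break
            chunks.append(data)
    return b"".join(chunks)

# Show the request text only; nothing is sent at this point.
print(build_request("www.google.com").decode("ascii"))
```

Calling fetch("www.google.com") needs network access and returns the status line, the response headers, and the HTML in one block of bytes.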

Source(s):
HTTP Succinctly by Scott Allen Syncfusion
Wikipedia

Post 28: Content Type Negotiation on the Internet

HTTP (Hypertext Transfer Protocol) is a flexible, generic protocol for moving high-fidelity information across the Internet. In order to do this, all participants (clients/browsers and servers) have to know how to interpret the information correctly. The media type that is passed around is not fixed but can be negotiated by the participants. This means a resource identified by one and the same URL can have multiple representations.

For example, one webpage can be displayed in different languages depending on the country the user is surfing from. Or the same content can be delivered as PDF, HTML, or plain text, depending on the media types the browser is willing to accept or prefers.

When the browser sends an HTTP request for a URL to a host, it specifies what media types it is willing to accept. It's up to the server to satisfy the browser's request. For example, if the browser asks for a PDF file but the server only has a plain text file, then the server will send the plain text file even though the browser asked for another file type, simply because that is the only file type the server has. However, if the server has a PDF file and a plain text file and the browser happens to prefer PDF files above all other files, then the server will send the PDF file instead of the plain text file.
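
The server's decision can be sketched roughly as follows. This is a simplified illustration, not how any particular web server implements it: the function names are made up, wildcards such as “*/*” are ignored, and the fallback to the first available representation mirrors the plain text example above.

```python
def parse_accept(header):
    """Return the media types of an Accept header, most preferred first."""
    prefs = []
    for item in header.split(","):
        parts = [p.strip() for p in item.split(";")]
        media_type = parts[0]
        if not media_type:
            continue
        quality = 1.0  # default preference when no q value is given
        for param in parts[1:]:
            name, _, value = param.partition("=")
            if name.strip() == "q":
                try:
                    quality = float(value)
                except ValueError:
                    quality = 0.0  # unreadable q value: least preferred
        prefs.append((quality, media_type))
    # Stable sort: equal q values keep the order used in the header.
    prefs.sort(key=lambda pair: pair[0], reverse=True)
    return [media_type for _, media_type in prefs]

def choose(available, accept_header):
    """Pick the representation the server sends back."""
    for media_type in parse_accept(accept_header):
        if media_type in available:
            return media_type
    return available[0]  # nothing matches: send what the server has

print(choose(["text/plain"], "application/pdf"))
print(choose(["text/plain", "application/pdf"],
             "text/plain;q=0.5, application/pdf"))
```

The first call yields “text/plain” because it is all the server has; the second yields “application/pdf” because the browser prefers it.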

Source(s):
HTTP Succinctly by Scott Allen Syncfusion
Wikipedia

Saturday, 14 February 2015

Post 27: Resources, Media Types, and Content Labeling

On the Internet you'll encounter lots of different resources, e.g. text, images, audio, video, etc. In order for the server to host the resource and the client to display it correctly, both have to be specific about the type of the resource. For the browser to display the content correctly to the user, the resource has to be labeled accordingly.

When a host responds to an HTTP request, it returns the resource and specifies its content type (also called media type). For content type specification HTTP uses the MIME (Multipurpose Internet Mail Extensions) standard. The specification is done by labeling the resource so the client knows what kind of content the resource holds. (As a side note: you may wonder why MIME stands for “Multipurpose Internet Mail Extensions”. The reason is that MIME was originally designed for email communication, but it turned out to be very useful for labeling content types on the web as well.)

Here are a few examples of such labels:
- webpages are labeled with “text/html”; “text” is the primary media type and “html” is the media subtype.
- JPEG images are labeled with “image/jpeg”;
- PNG images are labeled with “image/png”;
- etc.

These content types are standard MIME types and will appear in the HTTP responses.

When the browser requests a resource, it must know what type of resource it is going to display. To get this information the browser looks in the following locations one after the other. If it doesn't find the type in one location, it searches the next one:
1st: content type specified by the host in the HTTP response message
2nd: scan the first 200 bytes of the HTTP response message and try to “guess” the content type
3rd: read the file extension
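
These three steps can be sketched in Python as follows. This is a simplified illustration and not what a real browser does internally: the function name and the short signature list are assumptions, and real content sniffing knows far more patterns.

```python
import mimetypes

# A few well-known file signatures ("magic numbers") used for step 2.
SIGNATURES = [
    (b"\x89PNG\r\n\x1a\n", "image/png"),
    (b"\xff\xd8\xff", "image/jpeg"),
    (b"%PDF-", "application/pdf"),
]

def detect_content_type(header_value, body, filename):
    # 1st: trust the Content-Type header if the host sent one.
    if header_value:
        return header_value.split(";")[0].strip().lower()
    # 2nd: look at the first bytes of the body and try to guess.
    head = body[:200]
    for signature, media_type in SIGNATURES:
        if head.startswith(signature):
            return media_type
    # 3rd: fall back to the file extension (None if it is unknown).
    guessed, _encoding = mimetypes.guess_type(filename)
    return guessed

print(detect_content_type("text/html; charset=utf-8", b"", "page"))
print(detect_content_type(None, b"\x89PNG\r\n\x1a\n" + b"\x00" * 16, "download"))
print(detect_content_type(None, b"<html></html>", "index.html"))
```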

Here I explained how the content (media) type is specified so that the client can correctly represent or display the resource. In the next post I'll explain that the client also has a say in what kind of resources the server sends to it.

Source(s):
HTTP Succinctly by Scott Allen Syncfusion
Wikipedia

Thursday, 12 February 2015

Post 26: URL encoding

URLs were designed to be as usable and interoperable as possible. Therefore the Internet standard defines so-called “unsafe characters” that must not appear in a URL as they are.

Examples of unsafe characters are:
The space “ ”, because spaces seem to disappear when printed, or you can't tell how many space characters there are.
The pound/sharp character “#”, because it is reserved for the fragment (we covered what a "fragment" is here already).
The caret “^”, because not all network devices transmit this character correctly.

What is considered a safe and what an unsafe character is defined in RFC 3986. RFC stands for Request for Comments. It's a recommendation made by the IETF (Internet Engineering Task Force). Even though it is officially only a recommendation, it is considered a de facto standard.

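RFC 3986 defines the safe characters as the alphanumeric characters of US-ASCII plus a few special characters such as the hyphen “-”, the period “.”, the underscore “_”, and the tilde “~”. Characters like the colon “:” and the slash mark “/” may also appear unencoded, but only in their reserved role as delimiters between the parts of a URL.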

If you want to transmit one of these unsafe characters, you have to “percent-encode” (also called “URL encode”) it. For example, if you want to address the file “^hello world.txt” on the server foo.com, the valid URL would look like: “http://foo.com/%5Ehello%20world.txt”

As you can see, the caret “^” and the space “ ” have been replaced with “%5E” and “%20” respectively. The characters after the percent character “%” are the hexadecimal number of the character in the US-ASCII character table, i.e. “5E” and “20” stand for “^” and “ ” respectively in the US-ASCII table.
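
You can reproduce this encoding with Python's standard urllib.parse module. This is just a quick sketch using the file name and the host foo.com from the example above.

```python
from urllib.parse import quote, unquote

name = "^hello world.txt"
encoded = quote(name)  # percent-encodes every unsafe character

print(encoded)                      # %5Ehello%20world.txt
print("http://foo.com/" + encoded)  # http://foo.com/%5Ehello%20world.txt
print(unquote(encoded))             # ^hello world.txt

# The two hex digits are simply the US-ASCII code of the character.
print(format(ord("^"), "X"), format(ord(" "), "X"))  # 5E 20
```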

The full US-ASCII table can be found here.

Source(s):
HTTP Succinctly by Scott Allen Syncfusion
Wikipedia
