Saturday 21 February 2015

Post 32: Difference between URI, URL, and URN

A URI (Uniform Resource Identifier) identifies a resource on the Internet either by location, or by name, or both. There are two subsets of URIs: the URL (Uniform Resource Locator) and the URN (Uniform Resource Name).

The URL specifies where an identified resource is located and the mechanism for retrieving it. In the case of an HTTP URL, that mechanism is the HTTP protocol. But the mechanism for retrieving the resource doesn't have to be HTTP, i.e. “http://”; a URL can also start with “ftp://” (File Transfer Protocol, for transferring computer files) or “smb://” (Server Message Block, for shared access to files, printers, serial ports, and miscellaneous communications between nodes on a network).

The URN is part of a larger Internet information architecture and unambiguously identifies a resource. Unlike the URL, it does not imply that the identified resource is available.

The URL is similar to a person's address, as it defines something's location, while the URN is similar to the ISBN of a book, as it unambiguously defines something's identity. In other words: the URL answers the question of where something is, while the URN answers the question of what something is.
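
To make the difference a bit more tangible, here is a small sketch using Python's standard urllib.parse module. The ISBN URN is only an illustrative value and is not taken from the sources below. The URL carries a host to contact, while the URN carries just a name.

```python
from urllib.parse import urlsplit

# A URL: says where the resource lives and how to fetch it.
url = urlsplit("http://www.ietf.org/rfc/rfc3986.txt")

# A URN: names the resource without saying where it is (illustrative ISBN).
urn = urlsplit("urn:isbn:0451450523")

print(url.scheme, url.netloc, url.path)   # scheme, host, and path are all present
print(urn.scheme, urn.netloc, urn.path)   # no host at all, only a name
```

The empty host in the second result reflects the point made above: a URN alone does not tell you where to retrieve anything.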

Source(s) and for more information:
RFC 3986: http://www.ietf.org/rfc/rfc3986.txt
http://en.wikipedia.org/wiki/File_Transfer_Protocol
http://en.wikipedia.org/wiki/Server_Message_Block

Sunday 15 February 2015

Post 31: Safe and Unsafe HTTP methods

When the HTTP specification talks about “safe” HTTP methods, it means methods that should not change or destroy any resource on the server. The GET method is a safe HTTP method because it doesn't change the resource. It only retrieves it, and that's it! A POST method, on the other hand, is an unsafe HTTP method because it changes a resource on the server, e. g. updates an account, submits an order, etc.

The browser behaves differently when executing GET or POST methods. For example, you can easily refresh a webpage that was retrieved via a GET method, because the browser simply sends the same request again and renders the response it gets from the server, just as before. However, if you want to refresh a webpage that was retrieved via a POST method, the browser shows a warning, because repeating the POST could repeat the change on the server (for example, submit the order a second time).

Therefore web applications try to show the user only webpages that were retrieved via GET, by following the so-called POST/Redirect/GET pattern:

If the user clicks a button to POST a request (e. g. submitting an order), then this request will be sent to the server. The server will then reply with an HTTP redirect, meaning the server tells the browser to GET another resource at a specified location. The browser will follow this instruction and GET the resource. The server will then reply with, for example, a “thank you for the order” resource. The user can now safely refresh the webpage, since it is a resource that was retrieved via GET instead of POST.
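
The pattern can be sketched in a few lines of Python. This is only an in-memory imitation of a server and a browser, not real networking. The paths /order and /thanks are made up for the example, and status code 303 is one common choice for the redirect.

```python
# In-memory stand-in for a web server; the paths /order and /thanks are hypothetical.
orders = []

def handle(method, path, body=""):
    """Return (status, headers, body) for a request."""
    if method == "POST" and path == "/order":
        orders.append(body)                      # unsafe: changes server state
        return 303, {"Location": "/thanks"}, ""  # redirect instead of rendering a page
    if method == "GET" and path == "/thanks":
        return 200, {}, "thank you for the order"
    return 404, {}, "not found"

def browser_submit(body):
    """Mimic a browser: POST, then follow the redirect with a GET."""
    status, headers, page = handle("POST", "/order", body)
    last_request = ("POST", "/order")
    if status == 303:
        last_request = ("GET", headers["Location"])
        status, headers, page = handle(*last_request)
    return last_request, page

last_request, page = browser_submit("1 x book")

# Refreshing repeats the last request, which is now a harmless GET.
handle(*last_request)
print(last_request, page, len(orders))
```

Even after the simulated refresh there is still only one stored order, because the request that gets repeated is the GET and not the POST.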

Source(s):
HTTP Succinctly by Scott Allen Syncfusion
Wikipedia

Post 30: HTTP Request Methods

An HTTP request is made by a client in order to retrieve a resource from a server. Every HTTP request must contain exactly one HTTP method (sometimes also called a “verb”). In a previous post you already got to know one HTTP request method: the GET method. It gets, fetches, or retrieves a resource that the server has stored.

If you want to retrieve an image you could type in

GET /image.png

If you want to retrieve a PDF file you type in

GET /documents/report.pdf

and so on. Of course there is more than just the GET method, but the vast majority of HTTP requests use it. The most common HTTP request methods are shown below:


Method   Description
GET      Retrieve a resource (and should have no other effect).
HEAD     Retrieve the headers for a resource. It does exactly what the GET method does, but without retrieving the body content. You'd use a HEAD request if you only want to retrieve metadata about a resource.
POST     Submit data to the server to create or update a resource, e. g. an item added to a database, a new message on a bulletin board, an annotation to an existing resource.
PUT      Store a resource on the server at the supplied URI. If the URI already identifies an existing resource, then that resource is modified. If the URI doesn't point to an existing resource, then the server creates a new resource associated with that URI.
DELETE   Remove a resource.
TRACE    The server returns the HTTP request message text back to the client, so the client can see what changes have been made to the HTTP request by intermediate servers.
OPTIONS  Returns the HTTP methods that the server supports for the specified URL. This can be used to check the web server's functionality.
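
To see the difference between GET and HEAD in practice, here is a small sketch that starts a throwaway server on the local machine with Python's standard library and sends both requests to it. The handler and its five-byte body are made up for the demo. A real server would of course serve real resources.

```python
import threading
from http.client import HTTPConnection
from http.server import BaseHTTPRequestHandler, HTTPServer

BODY = b"hello"  # hypothetical resource content

class Handler(BaseHTTPRequestHandler):
    def _send_headers(self):
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(BODY)))
        self.end_headers()

    def do_GET(self):
        self._send_headers()
        self.wfile.write(BODY)   # headers plus body

    def do_HEAD(self):
        self._send_headers()     # same headers, no body

    def log_message(self, *args):
        pass  # keep the demo output quiet

def request(method, port):
    """Send one request and return (status, Content-Length header, body)."""
    conn = HTTPConnection("127.0.0.1", port)
    conn.request(method, "/")
    resp = conn.getresponse()
    result = (resp.status, resp.getheader("Content-Length"), resp.read())
    conn.close()
    return result

server = HTTPServer(("127.0.0.1", 0), Handler)  # port 0: let the OS pick a free port
threading.Thread(target=server.serve_forever, daemon=True).start()
port = server.server_address[1]

get_result = request("GET", port)
head_result = request("HEAD", port)
server.shutdown()
server.server_close()

print(get_result)
print(head_result)
```

Both responses carry the same status and the same Content-Length header, but only the GET response comes with the body.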


Source(s): 

HTTP Succinctly by Scott Allen Syncfusion
Wikipedia

Post 29: HTTP request response protocol – The very basics

This article was also published on my blog here:
https://pandaquests.medium.com/how-does-http-work-5ed39f5fd18f

HTTP is a request-response protocol, meaning the client sends a request and the server replies with a response. Request and response are both carefully formatted messages that the other side can understand. They are two different message types, and they are exchanged in a single HTTP transaction. These messages are in ASCII text and formatted according to the HTTP standard so client and server know how to interpret the content correctly.

Any application that is able to open a network connection to a server machine and is able to send data over the network can make an HTTP request. You can even try this yourself: just type in the HTTP request manually using Telnet from the command line. Heads up: a normal Telnet session connects over port 23. As mentioned before, in order to connect to a server via HTTP we have to use port 80.

In the following example we will use Telnet in order to
- connect to a server
- make an HTTP request
- receive an HTTP response

telnet www.google.com 80

This command tells the computer to connect to a server with the host name “www.google.com” on port 80. After the connection is established you can write the HTTP request message:

GET / HTTP/1.1

The “GET” part tells the server we want to retrieve a resource.
The “/” tells the server that the resource we want to retrieve is the root resource, i.e. the home page.
“HTTP/1.1” tells the server we are using the HTTP 1.1 protocol to speak to it.

Next you type this line:

host: www.google.com

That line specifies which host the request is meant for, because one server could host multiple websites.

After you have typed this in and pressed ENTER twice, you should see the HTTP response: a status line and some headers, followed by plain and simple HTML code. If that code were sent to a browser, the browser would take it and render it into a webpage.
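
The same exchange can be sketched in Python with a plain TCP socket instead of Telnet. The function names are my own, and the extra “Connection: close” header is an addition that the Telnet session above does not use. It only asks the server to close the connection when it is done, so the script knows when to stop reading.

```python
import socket

def build_get_request(host, path="/"):
    """Build the same text you would type into Telnet, as bytes."""
    lines = [
        f"GET {path} HTTP/1.1",
        f"Host: {host}",
        "Connection: close",   # assumption: ask the server to close when finished
        "",                    # the empty line (second ENTER) ends the request
        "",
    ]
    return "\r\n".join(lines).encode("ascii")

def fetch(host, port=80, path="/"):
    """Send the request over a plain TCP connection and return the raw response."""
    with socket.create_connection((host, port), timeout=10) as sock:
        sock.sendall(build_get_request(host, path))
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:       # server closed the connection
                break
            chunks.append(data)
    return b"".join(chunks)

print(build_get_request("www.google.com").decode("ascii"))
# raw = fetch("www.google.com")   # uncomment to try it; needs network access
```

The raw bytes returned by fetch() would contain the status line, the headers, and the HTML, exactly as they appear in the Telnet window.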

Source(s):
HTTP Succinctly by Scott Allen Syncfusion
Wikipedia

Post 28: Content Type Negotiation on the Internet

HTTP (Hyper Text Transfer Protocol) describes a flexible, generic protocol for moving high-fidelity information on the internet. In order to do this, all participants (clients/browsers and servers) have to know how to interpret the information correctly. The media type that is passed around is not fixed but can be negotiated by the participants, meaning a resource identified by one and the same URL can have multiple representations.

For example: one webpage can be displayed in different languages depending on which country the user is surfing from. Or the same content can be delivered as PDF, HTML, or plain text, depending on the media types the browser is willing to accept or prefers.

When the browser sends an HTTP request for a URL to a host, it specifies (in the Accept header) which media types it is willing to accept. It's up to the server to satisfy the browser's request. For example, if the browser asks for a PDF file but the server only has a plain text file, then the server will send the plain text file, even though the browser asked for another file type, simply because that is the only representation it has. However, if the server has both a PDF file and a plain text file and the browser happens to prefer PDF files above all other files, then the server will send the PDF file instead of the plain text file.
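
The decision the server makes can be sketched roughly like this. It is a simplified illustration and not how any particular web server implements it: the function names are my own, wildcards such as “*/*” are ignored, and the fallback to the first available type simply mirrors the behaviour described above.

```python
def parse_accept(header):
    """Turn 'application/pdf, text/plain;q=0.5' into [(type, q), ...], best first."""
    prefs = []
    for part in header.split(","):
        fields = [f.strip() for f in part.split(";")]
        media_type, q = fields[0], 1.0           # no q value means full preference
        for field in fields[1:]:
            if field.startswith("q="):
                q = float(field[2:])
        prefs.append((media_type, q))
    return sorted(prefs, key=lambda p: p[1], reverse=True)

def choose(accept_header, available):
    """Pick the representation the client prefers; fall back to what the server has."""
    for media_type, q in parse_accept(accept_header):
        if q > 0 and media_type in available:
            return media_type
    return available[0]   # send what exists even if it was not asked for

# Server has both: the preferred PDF wins.
print(choose("application/pdf, text/plain;q=0.5", ["text/plain", "application/pdf"]))
# Server only has plain text: it sends that anyway.
print(choose("application/pdf", ["text/plain"]))
```

The q values are the browser's way of ranking media types: a higher value means a stronger preference.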

Source(s):
HTTP Succinctly by Scott Allen Syncfusion
Wikipedia

Saturday 14 February 2015

Post 27: Resources, Media Types, and Content Labeling

On the Internet you'll encounter lots of different resources, e. g. text, image, audio, video, etc. In order for the server to host and the client to display a resource correctly, both have to be specific about the type of the resource. For the browser to display the content correctly to the user, the resource has to be labeled accordingly.

When a host responds to an HTTP request it returns the resource and specifies its content type (also called media type). For content type specification HTTP uses the MIME (Multipurpose Internet Mail Extensions) standard. The specification is done by labeling the resource so the client knows what kind of content it is. (As a side note: you may wonder why MIME stands for “Multipurpose Internet Mail Extensions”. The reason is that MIME was originally designed for email communication, but it turned out to be very useful for labeling content types on the web as well.)

Here are a few examples of content type labels:
- webpages are labeled with “text/html”; “text” is the primary media type and “html” is the media subtype.
- jpeg images are labeled with “image/jpeg”;
- png images are labeled with “image/png”;
- etc.

These content types are standard MIME types and will appear in the HTTP responses.

When the browser requests a resource, it must know what type of resource to display. To get this information the browser will look in the following places, one after the other. If it doesn't find the information in one place, it will look in the next one:
1st: content type specified by the host in the HTTP response message
2nd: scan the first 200 bytes of the HTTP response message and try to “guess” the content type
3rd: read the file extension

In this post I explained how the content media type is specified so that the client can correctly represent or display the resource. In the next post I'll explain that the client also has a say in what kind of resources the server sends to it.
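
The three lookup steps above can be sketched as a small Python function. This is only a rough illustration: the function name and the tiny signature table are my own assumptions, and real browsers use far more elaborate sniffing rules.

```python
import mimetypes

# A few well-known file signatures (assumption: just enough for the demo).
SIGNATURES = [
    (b"\x89PNG\r\n\x1a\n", "image/png"),
    (b"\xff\xd8\xff", "image/jpeg"),
    (b"%PDF-", "application/pdf"),
]

def detect_content_type(header_value, body, path):
    # 1st: trust the Content-Type header if the server sent one
    if header_value:
        return header_value.split(";")[0].strip()
    # 2nd: look at the first 200 bytes of the body and try to guess
    head = body[:200]
    for signature, media_type in SIGNATURES:
        if head.startswith(signature):
            return media_type
    # 3rd: fall back to the file extension
    guessed, _encoding = mimetypes.guess_type(path)
    return guessed

print(detect_content_type("text/html; charset=utf-8", b"", "/index"))
print(detect_content_type(None, b"%PDF-1.4 ...", "/download"))
print(detect_content_type(None, b"", "/photo.jpg"))
```

Each call falls through to a later step only when the earlier one gives no answer, which matches the order of the list.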

Source(s):
HTTP Succinctly by Scott Allen Syncfusion
Wikipedia

Thursday 12 February 2015

Post 26: URL encoding

URLs were designed to be as usable and interoperable as possible. Therefore the internet standard defines so-called “unsafe characters”, which should not appear literally in a URL.

Examples of unsafe characters are:
The space “ ”, because spaces seem to disappear when printed, or you can't tell how many space characters there are.
The pound/sharp character “#”, because it is reserved for the fragment (we already covered what a “fragment” is in the post on URL components).
The caret “^”, because not all network devices transmit this character correctly.

What is considered a safe and what an unsafe character is defined in RFC 3986. RFC stands for Request for Comments. It's a recommendation made by the IETF (Internet Engineering Task Force). Even though it is officially only a recommendation, it is considered a de facto standard.

RFC 3986 defines safe characters as the alphanumeric characters in US-ASCII and a few special characters like the colon “:” and the slash “/”.

If you want to transmit one of these unsafe characters, you have to “percent-encode” (also called “URL encode”) it. For example, if the server foo.com stores the file “^hello world.txt”, then the valid URL would look like: “http://foo.com/%5Ehello%20world.txt”

As you can see, the caret “^” and the space “ ” have been replaced with “%5E” and “%20” respectively. The characters after the percent character “%” are the corresponding hexadecimal number in the US-ASCII character table, i.e. “5E” and “20” stand for “^” and “ ” in the US-ASCII table.
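
If you want to try this yourself, Python's standard library can do the encoding for you. The following sketch reproduces the example from above with urllib.parse.quote and also shows where the hexadecimal numbers come from.

```python
from urllib.parse import quote, unquote

filename = "^hello world.txt"
encoded = quote(filename)            # percent-encode the unsafe characters
url = "http://foo.com/" + encoded

print(url)
print(unquote(encoded))              # decoding gives back the original name

# The two hex digits are simply the character's US-ASCII code in base 16.
print(format(ord("^"), "X"), format(ord(" "), "X"))
```

Letters, digits, and the dot are left alone because they are safe. Only the caret and the space are replaced.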

The full US-ASCII table can be found here.

Source(s):
HTTP Succinctly by Scott Allen Syncfusion
Wikipedia

Thursday 5 February 2015

Post 25: URL components and their meaning


This post was also published here on my blog:
https://pandaquests.medium.com/url-components-and-its-meaning-ececbaf018ba

When browsing the web you inevitably come across the address bar in your browser and the address in it. An example is this:

http://foo.com/some-link/


The address that you saw above is called a URL (Uniform Resource Locator). With the URL you can access certain resources on the internet. Its generalized structure looks like this:

<url-scheme>://<host-name>:<port-number>/<url-path>?<query-string>#<fragment>

In this post I'll explain each component and its task.

First you have the URL scheme (in our example “http”). It describes how to access the resource at hand. Here it tells the browser to use the hyper text transfer protocol. Other schemes are, e.g., ftp (file transfer protocol), mailto (for email addresses), etc. Everything after the scheme is specific to that scheme. In this case we will concentrate on http.

After that comes the host name (in our example “foo.com”). The host is the server where the resource is saved. The browser will use the DNS (Domain Name System) to “translate” the host name into an address that the network can work with: the network address. Knowing the network address, the browser is able to send the request for the resource.

In rare cases you can see the port number after the host name. For HTTP the default port number is 80. Usually the default port number is omitted from the URL. It is only needed if the server is listening on a port other than the default one. You usually only need to specify a non-default port number when testing, debugging, or working in a development environment.

Next comes the URL path (in our example: “/some-link/”). It is the address within the host that directs to a specific resource on the host.

The resources that the URL path points to can be static, e. g. the resource can be a file (e. g. “/document.pdf”), a picture (e. g. “/photo.jpg”), a music file (e. g. “/music.mp3”), etc. However, resources can also be dynamic. A URL path such as /another-resource probably does not refer to a real file on the host server foo.com. You can usually tell because it has no file extension as in the static examples earlier. In this case an application is running on the host server that takes the request and dynamically builds the resource, for example using content from a database. (That application is written in a web technology that is able to respond to incoming requests by creating HTML for the browser to display. That web technology can be ASP.NET, PHP, Perl, Ruby on Rails, etc.)

The section that comes after the “?” is called query (or “query string”). It contains information for the requested website to use or interpret. How this query is specified is totally up to the application. But usually the string that is passed is a name-value pair, e. g.:

http://foo.com?name1=value1&name2=value2

or

http://searchengine.com?q=what+i+am+searching

In this case the search engine looks for the name q and uses its value as the search query.
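
On the receiving side, an application has to take the query string apart again. As a small sketch of that step, Python's urllib.parse module can split the name-value pairs of the two example URLs above and also build such a string.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

# Take the query string out of the URL and split it into name-value pairs.
query = urlsplit("http://searchengine.com?q=what+i+am+searching").query
params = parse_qs(query)
print(params)        # the "+" signs are decoded back into spaces

# The other direction: build a query string from name-value pairs.
rebuilt = urlencode({"name1": "value1", "name2": "value2"})
print(rebuilt)
```

Note that each name maps to a list of values, because the same name may appear more than once in a query string.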

Everything that comes after the “#” is called a fragment. Unlike the URL scheme, host name, port, URL path, or query string, the fragment is not sent to the server. Instead it is only used by the client to identify a certain section within the resource. Typically web browsers will render the webpage so that the user sees the top element of the page at the top of the screen. With fragments you can specify that a certain section will be displayed at the top of the screen instead.
This is used to draw attention to a specific section of the page. You often see it when people post links to Wikipedia articles that point to a specific part of the article only. The client (browser) will make sure that the section identified by the fragment is displayed at the top of the screen.
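
To recap the components, here is a short sketch that splits a complete URL with Python's urllib.parse module. The URL is made up for the demo: port 8080 and the fragment “details” are not from the examples above.

```python
from urllib.parse import urlsplit

# Hypothetical URL: port 8080 and the fragment "details" are made up for the demo.
parts = urlsplit("http://foo.com:8080/some-link/?name1=value1#details")

print(parts.scheme)     # URL scheme
print(parts.hostname)   # host name
print(parts.port)       # port number
print(parts.path)       # URL path
print(parts.query)      # query string
print(parts.fragment)   # fragment

# What the browser puts on the request line: path and query, never the fragment.
request_target = parts.path + ("?" + parts.query if parts.query else "")
print(request_target)
```

The last line shows the part of the URL that actually travels to the server in the request. The fragment stays with the browser.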

Source:
HTTP Succinctly by Scott Allen Syncfusion
RFC3986 http://www.ietf.org/rfc/rfc3986.txt