Thursday, 5 February 2015

Post 25: URL components and their meaning


This post was also published here on my blog:
https://pandaquests.medium.com/url-components-and-its-meaning-ececbaf018ba

When browsing the web you inevitably come across the address bar in your browser and the address in it. An example is this:

http://foo.com/some-link/


The address that you saw above is called a URL (Uniform Resource Locator). With a URL you can access a specific resource on the internet. Its general structure looks like this:

<url-scheme>://<host-name>:<port-number>/<url-path>?<query-string>#<fragment>
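As a rough illustration of this structure, here is a minimal sketch that splits a URL into its components with Python's standard urllib.parse module. The URL itself is made up for this example (port 8080, the query and the fragment "section-2" are assumptions, chosen only so that every component is present).

```python
from urllib.parse import urlsplit

# Hypothetical URL that uses every component of the general structure.
url = "http://foo.com:8080/some-link/?name1=value1#section-2"

parts = urlsplit(url)

print(parts.scheme)    # http
print(parts.hostname)  # foo.com
print(parts.port)      # 8080
print(parts.path)      # /some-link/
print(parts.query)     # name1=value1
print(parts.fragment)  # section-2
```

Each attribute corresponds to one placeholder in the structure shown above.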

In this post I'll explain each component and its task.

First you have the URL scheme (in our example “http”). It describes how to access the resource at hand. Here it tells the browser to use the Hypertext Transfer Protocol (HTTP). Other schemes exist, e.g. ftp (File Transfer Protocol) or mailto (for email addresses), and everything that follows the scheme is specific to it. In this post we will concentrate on http.

After that comes the host name (in our example “foo.com”). The host is the server where the resource is stored. The browser uses DNS (Domain Name System) to “translate” the host name into an address that computers can work with: the network (IP) address. Knowing the network address, the browser is able to send the request for the resource.
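The lookup described above can be sketched with Python's standard socket module. This is only an approximation of what a browser does: the helper name resolve is made up for this example, and the result depends on the resolver configuration of the machine it runs on.

```python
import socket

def resolve(host_name: str, port: int = 80) -> list[str]:
    """Return the distinct IP addresses the system resolver reports for host_name."""
    infos = socket.getaddrinfo(host_name, port, type=socket.SOCK_STREAM)
    addresses = []
    for info in infos:
        # info[4] is the socket address; its first element is the IP address.
        address = info[4][0]
        if address not in addresses:
            addresses.append(address)
    return addresses
```

Calling resolve("foo.com") on a machine with internet access would return whatever addresses DNS currently reports for that host, so no fixed output is shown here.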

In rare cases you can see the port number after the host name. The default port number for HTTP is 80. Usually the default port number is omitted from the URL. It is only needed if the server is listening on a port other than the default one. You usually only need to specify a non-default port number when testing, debugging, or working in a development environment.
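To make the "omitted means default" rule concrete, here is a small sketch. The helper effective_port and the DEFAULT_PORTS table are assumptions made for this example (443 for https is the well-known default, added for completeness); the localhost URL stands in for a typical development setup.

```python
from urllib.parse import urlsplit

# Well-known default ports for the two schemes a browser uses most.
DEFAULT_PORTS = {"http": 80, "https": 443}

def effective_port(url: str):
    """Return the explicit port if the URL has one, else the scheme's default."""
    parts = urlsplit(url)
    if parts.port is not None:
        return parts.port
    return DEFAULT_PORTS.get(parts.scheme)

print(effective_port("http://foo.com/some-link/"))         # 80
print(effective_port("http://localhost:8080/some-link/"))  # 8080
```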

Next comes the URL path (in our example: “/some-link/”). It is the address within the host that points to a specific resource on that host.

The resources that the URL path points to can be static, e.g. a file (“/document.pdf”), a picture (“/photo.jpg”), a music file (“/music.mp3”), etc. However, resources can also be dynamic. A URL path such as /another-resource usually does not refer to a real file on the host server foo.com. The missing file extension is a hint for that, although it is no guarantee, because the server alone decides how a path is handled. In this case an application running on the host server takes the request and builds the resource dynamically, for example using content from a database. (That application is written in a web technology that is able to respond to incoming requests by creating HTML for the browser to display. That web technology can be ASP.NET, PHP, Perl, Ruby on Rails, etc.)
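The file-extension hint can be sketched with Python's standard mimetypes module. The helper looks_static is made up for this example, and it is only a guess from the client's point of view: the server is free to generate “/document.pdf” dynamically or to serve “/another-resource” from a file.

```python
import mimetypes

def looks_static(url_path: str) -> bool:
    """Guess from the file extension alone whether a path names a static file."""
    media_type, _encoding = mimetypes.guess_type(url_path)
    return media_type is not None

for path in ["/document.pdf", "/photo.jpg", "/music.mp3", "/another-resource"]:
    print(path, looks_static(path))
```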

The section that comes after the “?” is called the query (or “query string”). It contains information for the requested website to use or interpret. How this query is structured is entirely up to the application, but usually the string consists of name-value pairs, e.g.:

http://foo.com?name1=value1&name2=value2

or

http://searchengine.com?q=what+i+am+searching

In this case the search engine looks for the name q and uses its value as the search query.
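On the receiving side, reading the name-value pairs could look like the following sketch, which uses parse_qs from Python's standard library. It assumes the common name=value&name=value convention described above; an application is free to interpret its query string differently.

```python
from urllib.parse import urlsplit, parse_qs

url = "http://searchengine.com?q=what+i+am+searching"

query = urlsplit(url).query   # q=what+i+am+searching
params = parse_qs(query)      # "+" is decoded as a space

print(params)          # {'q': ['what i am searching']}
print(params["q"][0])  # what i am searching

# The first example from above yields one entry per name.
print(parse_qs("name1=value1&name2=value2"))
# {'name1': ['value1'], 'name2': ['value2']}
```

Every value is a list because a name may appear more than once in a query string.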

Everything that comes after the “#” is called a fragment. Unlike the URL scheme, host name, port, URL path, or query string, the fragment is not sent to the server. Instead it is only used by the client to identify a certain section within the resource. Typically web browsers render a website so that the user sees the top element of the webpage at the top of the screen. With a fragment you can specify that a certain section is displayed at the top of the screen instead. This is used to draw attention to a specific section of the page. You can see it often when people post links to Wikipedia articles that point to a specific part of the article only. The client (browser) will make sure that the section identified by the fragment is displayed at the top of the screen.
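The split between what goes to the server and what stays in the client can be sketched with urldefrag from Python's standard library. The URL and the fragment name "details" are made up for this example.

```python
from urllib.parse import urldefrag

# Hypothetical link into a section of a page.
url = "http://foo.com/some-link/#details"

url_for_server, fragment = urldefrag(url)

print(url_for_server)  # http://foo.com/some-link/
print(fragment)        # details
```

Only the first part is used for the request; the fragment stays with the client, which uses it to scroll to the matching section.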

Source:
HTTP Succinctly by Scott Allen (Syncfusion)
RFC 3986: http://www.ietf.org/rfc/rfc3986.txt
