This post was also published here on my blog:
https://pandaquests.medium.com/url-components-and-its-meaning-ececbaf018ba
When browsing
through the web you inevitably come across the address bar in your
browser and the address in it. An example is this:
The address that you
saw above is called a URL (Uniform Resource Locator). With a URL you
can access a specific resource on the internet. Its general
structure looks like this:
<url-scheme>://<host-name>:<port-number>/<url-path>?<query-string>#<fragment>
In this post I'll
explain each component and its task.
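To make the general structure concrete, here is a minimal sketch that splits a URL into these components with Python's standard urllib.parse module. The URL is made up for illustration: foo.com and /some-link/ are the host name and path used in this post, while the port 8080, the query string, and the fragment are assumptions added so that every component shows up.

```python
from urllib.parse import urlsplit

# Hypothetical URL: port, query string, and fragment are made up for illustration.
url = "http://foo.com:8080/some-link/?name1=value1&name2=value2#section-2"

parts = urlsplit(url)

# Map the placeholder names from the general structure to the parsed values.
components = {
    "url-scheme": parts.scheme,
    "host-name": parts.hostname,
    "port-number": parts.port,
    "url-path": parts.path,
    "query-string": parts.query,
    "fragment": parts.fragment,
}

for name, value in components.items():
    print(f"{name:>12}: {value}")
```

Any component that is missing from the URL comes back as an empty string (or None for the port), which is handy when you only care about one part.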
First you have the
URL scheme (in our example “http”). It describes how
to access the resource at hand. Here it tells the browser to use the
Hypertext Transfer Protocol. Everything after the scheme is specific
to that scheme; other schemes include ftp (File Transfer Protocol)
and mailto (for email addresses). In this post we will concentrate on
http.
After that comes the
host name (in our example “foo.com”). The host is the
server where the resource is stored. The browser uses DNS
(the Domain Name System) to “translate” the host name into an
address that computers on the network can use: the network address
(IP address). Knowing the network address, the browser is able to
send the request for the resource.
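The lookup step can be sketched with Python's standard socket module. This is only an approximation of what a browser does (browsers add their own caching and other logic), and the helper name resolve is made up for this example.

```python
import socket


def resolve(host_name: str, port: int = 80) -> list:
    """Return the network (IP) addresses the system resolver finds for a host name.

    Hypothetical helper for illustration; not what any particular browser uses.
    """
    infos = socket.getaddrinfo(host_name, port, type=socket.SOCK_STREAM)
    addresses = []
    for _family, _type, _proto, _canonname, sockaddr in infos:
        address = sockaddr[0]  # first element is the IP address string
        if address not in addresses:
            addresses.append(address)
    return addresses
```

Calling resolve("foo.com") on a machine with internet access returns whatever addresses DNS currently reports for that name. If you pass something that is already a network address, such as "127.0.0.1", it is returned unchanged because no translation is needed.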
In rare cases you
can see the port number after the host name. For http the
default port number is 80 (for https it is 443). The default port
number is usually omitted from the URL. It is only needed if the
server is listening on a port other than the default one, which
usually happens only when testing, debugging, or working in a
development environment.
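As a small sketch of this rule, the function below returns the port a client would connect to: the one written in the URL if there is one, otherwise the default for the scheme. The function name effective_port and the example URLs are assumptions made for illustration, and only http and https are covered.

```python
from urllib.parse import urlsplit

# Well-known default ports for the two web schemes.
DEFAULT_PORTS = {"http": 80, "https": 443}


def effective_port(url: str) -> int:
    """Return the explicit port from the URL, or the scheme's default port."""
    parts = urlsplit(url)
    if parts.port is not None:
        return parts.port  # a port was written in the URL
    return DEFAULT_PORTS[parts.scheme]  # raises KeyError for other schemes


print(effective_port("http://foo.com/some-link/"))       # 80
print(effective_port("http://foo.com:8080/some-link/"))  # 8080
```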
Next comes the URL
path (in our example: “/some-link/”). It is the address
within the host that directs to a specific resource on the host.
The resources that
the URL path points to can be static, e.g. the resource can be a
file (e.g. “/document.pdf”), a picture (e.g. “/photo.jpg”),
a music file (e.g. “/music.mp3”), etc. However, resources can also
be dynamic. A URL path such as /another-resource most likely does not
refer to a real file on the host server foo.com. The missing file
extension is a hint, though not proof, since the server is free to
map any path to any resource. In
this case an application running on the host server takes the
request and dynamically builds the resource, for example from content
in a database. (That application is written in a web technology that
is able to respond to incoming requests by creating HTML for the
browser to display, such as ASP.NET, PHP, Perl, or Ruby on
Rails.)
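The difference between a static and a dynamic resource can be sketched in a few lines of Python. Everything here is hypothetical: the two dictionaries stand in for files on disk and for a database, and respond is a made-up function, not the API of any real web framework.

```python
# Stand-in for files stored on the server (path -> file contents).
STATIC_FILES = {"/document.pdf": "contents of document.pdf"}

# Stand-in for a database table that the application reads from.
DATABASE = {
    "another-resource": {"title": "Another resource", "body": "Built on request."},
}


def respond(path: str) -> str:
    """Return the response body for a URL path."""
    # Static: the path maps directly to stored content.
    if path in STATIC_FILES:
        return STATIC_FILES[path]

    # Dynamic: no such file exists, so build HTML from database content.
    record = DATABASE.get(path.strip("/"))
    if record is None:
        return "404 Not Found"
    return f"<html><h1>{record['title']}</h1><p>{record['body']}</p></html>"


print(respond("/document.pdf"))
print(respond("/another-resource"))
```

From the outside both paths look the same to the browser; only the server knows whether it read a file or generated the response.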
The section that
comes after the “?” is called the query (or “query
string”). It contains information for the requested
website to use or interpret. How the query is structured is entirely
up to the application, but usually it is a series of
name-value pairs separated by “&”, e.g.:
http://foo.com?name1=value1&name2=value2
or
http://searchengine.com?q=what+i+am+searching
In the second example the
search engine looks up the name q and uses its value as the
search query.
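Here is a short sketch of how an application might read those name-value pairs, using Python's standard urllib.parse module on the two example URLs above. How a real search engine handles its query string is up to that application; this only shows the common convention.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

url = "http://searchengine.com?q=what+i+am+searching"

# Everything after the "?" (and before any "#") is the query string.
query_string = urlsplit(url).query

# parse_qs maps each name to a list of values, because a name may repeat.
params = parse_qs(query_string)
search_terms = params["q"][0]
print(search_terms)  # what i am searching

# The other direction: build a query string from name-value pairs.
rebuilt = urlencode({"q": search_terms})
print(rebuilt)  # q=what+i+am+searching

# The first example contains two pairs separated by "&".
pairs = parse_qs(urlsplit("http://foo.com?name1=value1&name2=value2").query)
```

Note that the “+” signs in the query string are decoded to spaces, which is why the search terms come out as ordinary text.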
Everything that
comes after the “#” is called a fragment. Unlike the URL
scheme, host name, port, URL path, or query string, the fragment is
not sent to the server. Instead it is only used by the client
to identify a certain section within the resource. Typically web
browsers render a page so that the user sees the top
of the page at the top of the screen. With a fragment you can
specify that a certain section is displayed at the top of the
screen instead.
This is used to draw
attention to a specific section of the resource. You often see it
when people post links to Wikipedia articles that point to a
specific part of the article only.
Source:
HTTP Succinctly by Scott Allen, Syncfusion
RFC3986 http://www.ietf.org/rfc/rfc3986.txt