How URLs Work on the Web
This tutorial explains how URLs work from a technical SEO perspective.
What Is a URL?
A Uniform Resource Locator, or URL, is the address of a resource on the Web.
An example URL for the home page of Google’s search engine might look like this:
https://www.google.com/index.html
It says to use the HTTPS protocol to go to the www subdomain of the domain google.com and fetch the index.html file at the root directory (/) of the server.
Most servers are configured to automatically load index.html files when the directory that contains the file is requested, so requesting this:
https://www.google.com/
should produce the same result as requesting this:
https://www.google.com/index.html
Directories in a path are separated by forward slashes, and a single forward slash on its own refers to the root directory, so the index.html file at the root of the server is located at the path /index.html.
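If you know a little JavaScript, the equivalence between the two forms can be sketched as a small helper that rewrites a URL ending in /index.html to its directory form. This is only an illustration: stripIndexFile is a hypothetical name, and it assumes the server really is configured to serve index.html for directory requests, which you would need to confirm for your own site.

```javascript
// Hypothetical helper: treat ".../index.html" and ".../" as the same page.
// Assumes the server serves index.html when the directory is requested.
function stripIndexFile(urlString) {
  const parsed = new URL(urlString);
  // Replace a trailing "/index.html" in the path with a bare "/".
  parsed.pathname = parsed.pathname.replace(/\/index\.html$/, "/");
  return parsed.href;
}

console.log(stripIndexFile("https://www.google.com/index.html")); // "https://www.google.com/"
console.log(stripIndexFile("https://www.google.com/"));           // "https://www.google.com/"
```

Both calls print the same address, which is the point: two URLs that look different can refer to one page.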
Let’s take a closer look at the structure of URLs.
Anatomy of a URL
In its most basic form, a URL on the Web has three parts:
- a protocol: this is almost always http:// or https://. HTTPS is the encrypted form of HTTP.
- a host (or domain). Examples: example.com, www.google.com, webmail.example.com
- a path: this is the part after the domain. Examples: / (home page), /about, /widgets/green/
Here are some example URLs that use those three components:
- http://example.com/
- https://www.google.com/about
- https://mail.google.com/
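For readers who know some JavaScript, the three parts can be pulled out of a URL with the built-in URL class, which is available in browsers and in Node.js. This is a minimal sketch; the variable names aboutUrl and basicParts are made up for the example.

```javascript
// Split one of the example URLs into the three basic parts.
const aboutUrl = new URL("https://www.google.com/about");

// The URL class reports the protocol as "https:" (with the colon, without the slashes).
const basicParts = {
  protocol: aboutUrl.protocol, // "https:"
  host: aboutUrl.hostname,     // "www.google.com"
  path: aboutUrl.pathname,     // "/about"
};

console.log(basicParts);
```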
Port Numbers
Every URL on the Web also connects to a port number, which can be written after the domain name. It’s normally left out because an HTTP site runs on port 80 by default and an HTTPS site runs on port 443 by default.
Those three URLs from above would look like this if you included the (unnecessary) port numbers:
- http://example.com:80/ (80 because it’s HTTP)
- https://www.google.com:443/about (443 because it’s HTTPS)
- https://mail.google.com:443/ (443 because it’s HTTPS)
When we get into Web development, we’ll use various port numbers to run Web servers on our local computers.
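If you want to see the default-port behavior for yourself, JavaScript’s built-in URL class drops a default port when it parses a URL and keeps any other port. The sketch below assumes a local server on port 3000, which is just a made-up example of a development address.

```javascript
// A default port is dropped when the URL is parsed.
const withDefaultPort = new URL("https://www.google.com:443/about");
console.log(withDefaultPort.port); // "" (empty, because 443 is the default for HTTPS)
console.log(withDefaultPort.href); // "https://www.google.com/about"

// A non-default port is kept. localhost:3000 is a hypothetical local server.
const localDevUrl = new URL("http://localhost:3000/");
console.log(localDevUrl.port); // "3000"
console.log(localDevUrl.host); // "localhost:3000"
```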
Parameters
URL parameters are key-value pairs of data that can appear in URLs.
The parameters are separated from the beginning of the URL by a question mark. Each parameter is separated from other parameters by an ampersand.
Here’s a typical URL that contains several URL parameters:
https://example.com/?utm_source=newsletter&utm_medium=email&utm_campaign=spring_sale&utm_id=123
If we remove the separator characters (? and &), it’s easier to see the key-value pairs:
utm_source = newsletter
utm_medium = email
utm_campaign = spring_sale
utm_id = 123
The utm prefix stands for Urchin Tracking Module; Urchin was the analytics product that Google acquired and turned into Google Analytics. The keys are easier to read with the utm_ prefix removed:
source = newsletter
medium = email
campaign = spring_sale
id = 123
There are multiple reasons for using URL parameters that we’ll get into later. For now, you should know how to recognize them and how to separate them into their key-value pairs.
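If you know some JavaScript, the same separation into key-value pairs can be done with the built-in URL class, whose searchParams property holds the parsed parameters. This is a sketch rather than a recommended way to process campaign data; the variable names are made up for the example.

```javascript
// Parse the example URL and read its parameters as key-value pairs.
const campaignUrl = new URL(
  "https://example.com/?utm_source=newsletter&utm_medium=email&utm_campaign=spring_sale&utm_id=123"
);

// searchParams can be iterated as [key, value] pairs. Values are always strings.
const utmParams = Object.fromEntries(campaignUrl.searchParams);
console.log(utmParams);

// Drop the "utm_" prefix from each key to get the shorter names.
const shortParams = {};
for (const [key, value] of campaignUrl.searchParams) {
  shortParams[key.replace(/^utm_/, "")] = value;
}
console.log(shortParams);
```

Note that 123 comes back as the string "123", not a number, because everything in a URL is text.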
URL Fragments
Another thing you’ll see in URLs is a URL fragment.
A URL fragment is the part that comes after a hash sign (#).
Fragments typically have two uses:
- You can link to a specific part of the page by using URL fragments. If you’ve ever clicked on a link and had the page scroll to a different part of the same page, it’s likely that a URL fragment was involved. If the page content doesn’t change when the hash changes, you’re probably dealing with this case. For example, a link to this section of the page would have #url-fragments on the end of its URL, and following it would scroll this section to the top of your browser window.
- In some JavaScript frameworks, the hash sign is used to navigate between pages. It might be a single hash sign like /#/ or it might have an exclamation point like /#!/. If the page content changes when the part after the hash changes, then you’re dealing with this case. Page navigation with URL fragments isn’t in fashion any more, because it isn’t good for SEO, but it’s still possible to find cases of it in the wild.
URL fragments are handled by the browser and are not sent to the server in the HTTP request, so the server can’t see them.
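A short JavaScript sketch can make the split between the fragment and the rest of the URL more concrete. The URLs and variable names here are invented for illustration, and requestTarget is only an approximation of what a browser puts in its request: the path and parameters, without the fragment.

```javascript
// Case 1: a fragment that points at a section of the page.
const fragmentUrl = new URL("https://example.com/guide?page=2#url-fragments");
console.log(fragmentUrl.hash); // "#url-fragments"

// Roughly what the server is asked for: path and parameters, no fragment.
const requestTarget = fragmentUrl.pathname + fragmentUrl.search;
console.log(requestTarget); // "/guide?page=2"

// Case 2: hash-based navigation. The whole "page" lives in the fragment,
// so the server only ever sees a request for "/".
const hashRouteUrl = new URL("https://example.com/#!/products/42");
console.log(hashRouteUrl.pathname); // "/"
console.log(hashRouteUrl.hash);     // "#!/products/42"
```

The second case hints at why hash-based navigation is awkward for SEO: every page of the site shares the same server-visible path.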
How to Parse a URL with JavaScript
If you know a little JavaScript and want to experiment with URLs, you can inspect them right in the browser.
First open your browser console. On most computers you can press F12. Alternatively, right-click on a Web page and choose “Inspect”. Then, in the tool that pops up, go to the tab that says “Console”.
You can type JavaScript code in the console and the browser will run it.
Paste this code into the console:
var u = new URL(
"https://www.example.com/?utm_source=newsletter&utm_medium=email&utm_campaign=spring_sale&utm_id=123#abc"
);
You can then access various parts of the URL on the variable named u.
Try this one:
u.origin
It should return the base of the URL:
https://www.example.com
Here are some other fields to try:
- hash: #abc
- host: www.example.com
- hostname: www.example.com
- href: https://www.example.com/?utm_source=newsletter&utm_medium=email&utm_campaign=spring_sale&utm_id=123#abc
- origin: https://www.example.com
- pathname: /
- port: blank, because we’re connecting to one of the default ports (80 for HTTP or 443 for HTTPS)
- search: ?utm_source=newsletter&utm_medium=email&utm_campaign=spring_sale&utm_id=123
- searchParams: this will contain a special kind of object called URLSearchParams that contains the query string data. You can access it by doing something like u.searchParams.get("utm_medium").
URLs can contain usernames and passwords, but it isn’t common on the open Web, because you generally don’t want anyone to see your password.
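To see where a username and password would sit, you can parse a URL that contains them. The credentials and address below are invented for the example; the username and password come before the host, separated from it by an @ sign.

```javascript
// A made-up URL with credentials in front of the host.
const urlWithLogin = new URL("https://alice:s3cret@example.com/account");

console.log(urlWithLogin.username); // "alice"
console.log(urlWithLogin.password); // "s3cret"
console.log(urlWithLogin.hostname); // "example.com"
console.log(urlWithLogin.pathname); // "/account"
```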
Takeaways
Here are some things you should remember from this section:
- A URL is an identifier for a thing or resource on the Web.
- URLs have different parts: protocol, domain, port number, path, parameters, and fragments.
- All URLs connect to a port number, but the default ports of 80 (HTTP) and 443 (HTTPS) aren’t written in URLs in most cases, because they are the default for Web content.
- URL parameters can store extra data in the URLs.
- URL fragments have two purposes: scrolling to a location in a page or changing the page itself.