The Semantic Web can be confusing at first. It's a term that's been used here and there for years, especially since the formalization of the HTML 5.1 specification. Most webmasters are only concerned with the semantic web insofar as it affects their search-engine performance (i.e. their conversion / click-through rate and their website's visibility), but semantic tags have farther-reaching implications than simply boosting your site's SEO.
Consider HTML: it is a markup language used on webpages to describe an object on the Web. These webpages are retrievable at a specific location on the web, and the information retrieved by requesting that location should reflect the object unambiguously identified by the Uniform Resource Identifier (URI). This expectation was widely appreciated in the early days of the web, but recently it seems to have fallen out of fashion. However, I still support the idea that the content at a URI should always yield relevant and semantically meaningful data about the object. All of that semantic meaning for webpages is conveyed in the HTML located at that node.
Without looking at the source-code (i.e. when viewing the web with a User-Agent, or browser) the difference between
<span style="font-size:28.8px; font-weight:400">Heading!</span>
and
<h1>Heading!</h1>
... can seem insignificant because both of these HTML snippets produce roughly the same result: large text, like something that would introduce a few paragraphs of text or an article. But imagine that, instead, someone has printed the HTML out for you on a sheet of paper... which snippet means more to you, the reader? The second snippet clearly conveys more semantic meaning; the <h1> tag suggests that the text "Heading!" is actually a heading and is an important piece of information. It would be trivial to pick out all the important headings: simply read through the HTML (i.e. walk the HTML DOM tree) and make note of all <h1> and <h2> tags.
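To make that "read through the HTML and note the headings" idea concrete, here is a minimal sketch using Python's standard `html.parser` module. The `HeadingCollector` class, the `collect_headings` helper, and the `SAMPLE` page are all names I made up for illustration; real crawlers are far more involved than this.

```python
from html.parser import HTMLParser


class HeadingCollector(HTMLParser):
    """Collects the text of every <h1> and <h2> element, in document order."""

    WANTED = {"h1", "h2"}

    def __init__(self):
        super().__init__()
        self.headings = []    # (tag, text) pairs found so far
        self._current = None  # tag name while we are inside a wanted heading
        self._buffer = []     # text fragments of the heading being read

    def handle_starttag(self, tag, attrs):
        if tag in self.WANTED:
            self._current = tag
            self._buffer = []

    def handle_data(self, data):
        if self._current is not None:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == self._current:
            self.headings.append((tag, "".join(self._buffer).strip()))
            self._current = None


def collect_headings(html):
    """Return a list of (tag, text) pairs for the <h1>/<h2> elements in html."""
    parser = HeadingCollector()
    parser.feed(html)
    parser.close()
    return parser.headings


# Hypothetical sample page containing both snippets from above.
SAMPLE = """
<span style="font-size:28.8px; font-weight:400">Heading!</span>
<h1>Heading!</h1>
<p>Some text.</p>
<h2>Details</h2>
"""

for tag, text in collect_headings(SAMPLE):
    print(tag, text)
# prints:
# h1 Heading!
# h2 Details
```

Notice that the styled <span> never shows up in the result: without a rendering engine, nothing in the markup tells the parser that it was meant to be a heading.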
This is how most simple robots, web-crawlers, and other Internet-androids function. They sit next to their printer, fetching content from sites using a few GET requests and printing the results to their printers: computers don't have eyes after all, so they cannot look at how the markup would be rendered (although some advanced crawlers like Googlebot use rendering engines while crawling).
The same thing applies not just to web-crawlers but also to screen-readers, other assistive technology, and web-scraping tools, all of which rely heavily on semantic HTML.
HTML5 has introduced a large number of semantic tags; my favorites are:
<article> <aside> <footer> <header> <main> <nav> <section>
Using these tags (appropriately!) can give your articles and web-pages more semantic meaning in the eyes of web-crawlers (increasing the visibility of your website to search-engines!) while generally giving your page richer meaning; a site full of <div> elements doesn't mean much to someone reading the source-code for your site, but an <article> element followed by an <h2> tag conveys much more meaning.
When writing HTML from now on, consider using these elements where appropriate to give your page richer meaning. Don't just toss them around without concern, though: consider wrapping your main page content inside a <main> tag, using <article> and <section> tags to identify cohesive and coherent veins of information on your page, and using <nav> to identify navigational elements like clickable arrows or page numbers.
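As a rough illustration of what that structure looks like to a program, the sketch below parses a sample page and prints the nesting of its semantic elements. The `PAGE` markup, the `LandmarkOutline` class, and the `outline_of` helper are hypothetical, and the sketch assumes well-nested markup.

```python
from html.parser import HTMLParser

# The semantic elements listed earlier in this post.
SEMANTIC = {"article", "aside", "footer", "header", "main", "nav", "section"}


class LandmarkOutline(HTMLParser):
    """Records each semantic element together with its nesting depth."""

    def __init__(self):
        super().__init__()
        self.outline = []  # (depth, tag) pairs in document order
        self._stack = []   # semantic elements that are currently open

    def handle_starttag(self, tag, attrs):
        if tag in SEMANTIC:
            self.outline.append((len(self._stack), tag))
            self._stack.append(tag)

    def handle_endtag(self, tag):
        # Assumes well-nested markup: only pop when the tag matches the top.
        if tag in SEMANTIC and self._stack and self._stack[-1] == tag:
            self._stack.pop()


def outline_of(html):
    """Return the (depth, tag) outline of the semantic elements in html."""
    parser = LandmarkOutline()
    parser.feed(html)
    parser.close()
    return parser.outline


# Hypothetical page laid out along the lines suggested above.
PAGE = """
<body>
  <header><h1>My Blog</h1></header>
  <main>
    <article>
      <h2>Semantic HTML</h2>
      <section><h3>Why it matters</h3><p>Some text.</p></section>
    </article>
    <nav><a href="?page=1">1</a> <a href="?page=2">2</a></nav>
  </main>
  <footer>Contact</footer>
</body>
"""

for depth, tag in outline_of(PAGE):
    print("  " * depth + tag)
```

Run the same function over a page built only from <div> elements and the outline comes back empty, which is roughly how little such a page tells a crawler or a screen-reader about its structure.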
Rules
There are some rules for writing valid HTML using the semantic elements. You're perfectly fine ignoring these suggestions, but following them indicates that you're employing the semantic tags to their fullest! Tools like the W3C Validator can be used to check your site for markup validity and may offer suggestions which can improve your SEO. Some common rules / suggestions I know off the top of my head are:
- <article> and <section> elements should be identified by a child heading (i.e. <h1> or similar) element
- Headings should be in cascading order of importance; you should never have an <h1> element followed directly by an <h4> element
- Avoid using more than one <h1> element per page
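These three suggestions are simple enough to check mechanically. The sketch below is my own approximation, not how the W3C Validator works; `HeadingRules`, `check_headings`, and both sample strings are hypothetical. It also treats any heading nested inside an <article> or <section> as identifying the innermost one, which is slightly looser than a strict "child" check.

```python
from html.parser import HTMLParser

HEADINGS = {"h1": 1, "h2": 2, "h3": 3, "h4": 4, "h5": 5, "h6": 6}
SECTIONING = {"article", "section"}


class HeadingRules(HTMLParser):
    """Collects warnings for the three heading suggestions listed above."""

    def __init__(self):
        super().__init__()
        self.warnings = []
        self._h1_count = 0
        self._last_level = 0  # level of the previous heading, 0 if none yet
        self._open = []       # [tag, has_heading] for open <article>/<section>

    def handle_starttag(self, tag, attrs):
        if tag in SECTIONING:
            self._open.append([tag, False])
        elif tag in HEADINGS:
            level = HEADINGS[tag]
            if self._open:
                self._open[-1][1] = True  # innermost section has a heading
            if level == 1:
                self._h1_count += 1
                if self._h1_count == 2:
                    self.warnings.append("more than one <h1> element on the page")
            if self._last_level and level > self._last_level + 1:
                self.warnings.append(
                    f"<{tag}> follows <h{self._last_level}>, skipping a level"
                )
            self._last_level = level

    def handle_endtag(self, tag):
        # Assumes well-nested markup: only pop when the tag matches the top.
        if tag in SECTIONING and self._open and self._open[-1][0] == tag:
            _, has_heading = self._open.pop()
            if not has_heading:
                self.warnings.append(f"<{tag}> has no heading")


def check_headings(html):
    """Return a list of warning strings for the given markup."""
    parser = HeadingRules()
    parser.feed(html)
    parser.close()
    return parser.warnings


# Hypothetical samples: GOOD follows all three suggestions, BAD breaks each one.
GOOD = "<h1>Title</h1><article><h2>Post</h2><section><h3>Part</h3></section></article>"
BAD = ("<h1>Title</h1><article><h4>Oops</h4></article>"
       "<section><p>No heading</p></section><h1>Again</h1>")

for warning in check_headings(BAD):
    print(warning)
```

For the `BAD` sample this reports one warning per rule, while `GOOD` produces none.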
Of course there are a lot more rules and suggestions; you can learn more by flipping through the HTML 5.1 spec published by the W3C. Some benefits of using valid HTML are:
- Your site can rank higher in search engines like Google, Yahoo, Yandex, etc.
- Improved accessibility for users with screen-readers
- Simpler markup
- Fewer CSS rules
Because a lot of browsers style certain elements out-of-the-box (e.g. <h1> is typically fairly large), you can avoid writing complex and cryptic CSS rules, instead allowing the browser to style according to its defaults (and according to user preferences). Using semantic elements on your site makes everyone happier!