Search engine optimization (SEO) is a highly technical field, but at its core it’s about making it as easy as possible for search engines to understand what a website is about, so that those search engines can show the right websites to the right users.

Now, take a look at this list of reputable SEO firms from Clutch. All of them can tell you exactly what they’re trying to achieve with all their SEO wizardry. But have you ever thought about it from the perspective of the search engines themselves?

What’s in a Website?

From a search engine’s point of view, a website is an online document with many moving parts – from the domain URL to the servers actually hosting the content to the user-facing webpages that make up the site.

A search engine’s concern is to crawl the website, index its most useful webpages so they can be served quickly to users with a relevant search query, and analyze that index to come up with matches whenever a user performs a search. Of course, the search engine may also create a cached version of the webpages it deems important within the site. From the perspective of the search engine:

  1. Along with other backend factors, the website’s server uptime and reliability are minor signals that determine how often the search engine will risk showing this particular domain as a result for a search query, assuming it appears relevant.
  2. With the website’s reliability and relevance in mind, the search engine indexes the webpages for quicker retrieval in the event of a relevant query.
  3. A search engine doesn’t truly understand what the website is about, but based on the technical data it collects about the content of the site’s individual webpages (more on this later), it can algorithmically score the website – or rather, each individual webpage – on how relevant it is to a specific search query.

Now, keep in mind that search engine spiders like Googlebot “crawl” and “score” individual webpages. Just a few years ago, search engine spiders could only crawl content and HTML code, but they have since shown that they can process JavaScript, HTML5, and schema markup – so much so that using contextual HTML5 tags and schema.org markup is now recommended to benefit your SEO.

Where media such as images and video are concerned, spiders can only glean information about them from HTML metadata such as alt attributes, captions, and descriptions. It’s best practice to add as much of this backend, spider-only metadata to un-crawlable elements as possible to provide more context to the bots doing the indexing.
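To make that “spider’s-eye view” concrete, here is a minimal sketch of how a crawler-like script might pull context out of a page – the contextual HTML5 tags tell it which block is the main content, and the alt text is all it can “see” of the image. The library choice (BeautifulSoup) and the sample HTML are illustrative assumptions, not something the article prescribes.

```python
# Toy sketch: extracting the machine-readable context from a page.
from bs4 import BeautifulSoup

html = """
<article>
  <h1>Choosing a Puppy</h1>
  <p>Puppies make great pets for active families.</p>
  <img src="puppy.jpg" alt="Golden retriever puppy playing in the grass">
</article>
"""

soup = BeautifulSoup(html, "html.parser")

# Contextual HTML5 tags tell the bot which block is the main content.
article = soup.find("article")
heading = article.find("h1").get_text(strip=True)
body_text = " ".join(p.get_text(strip=True) for p in article.find_all("p"))

# The image itself is opaque to the bot; the alt text is the context it can read.
image_context = [img.get("alt", "") for img in article.find_all("img")]

print(heading)        # Choosing a Puppy
print(body_text)      # Puppies make great pets for active families.
print(image_context)  # ['Golden retriever puppy playing in the grass']
```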

Each webpage’s content and links are crawled, and we’ll break them down further since they are the most important ranking factors Google uses in its algorithm.

Matching Content to Search Keywords

The core of how Google (and search engines in general) processes content has not radically changed since Sergey Brin and Larry Page outlined the process in their paper The Anatomy of a Large-Scale Hypertextual Web Search Engine during their days at Stanford University. It has, however, become more complex, evolving to handle exceptions, incorporate more sophisticated signals, and account for black hat SEO practices that can potentially rig the outcome.

Search engines analyze the content of a webpage to see if it’s relevant to search queries. If we boil down the process to its simplest parts, the search query evaluation process goes as follows (a rough code sketch follows the list):

  1. Parse search keywords.
  2. Convert keywords into IDs.
  3. Start searching for IDs in Google index.
  4. Scan through index until a match is found.
  5. Compute how relevant matches are.
  6. Sort and rank relevant matches.
  7. Return the resulting ranked list.
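To make the steps above a little more tangible, here is a rough sketch of the same flow in code – parse the keywords, convert them to IDs, scan a toy inverted index, score the matches, and return a ranked list. The data, names, and one-line scoring rule are illustrative assumptions, not how Google’s index actually works.

```python
# Minimal sketch of the query flow: parse -> IDs -> index lookup -> score -> rank.
from collections import defaultdict

# Word -> ID lexicon and an inverted index: word ID -> {page: hit count}
lexicon = {"puppy": 1, "dog": 2, "food": 3}
index = {
    1: {"pets.example/puppies": 4, "store.example/dog-food": 1},
    2: {"store.example/dog-food": 6, "pets.example/puppies": 2},
    3: {"store.example/dog-food": 5},
}

def search(query):
    # 1-2. Parse the query and convert keywords to IDs.
    word_ids = [lexicon[w] for w in query.lower().split() if w in lexicon]

    # 3-4. Scan the index for pages containing any of those IDs.
    scores = defaultdict(int)
    for wid in word_ids:
        for page, hits in index.get(wid, {}).items():
            # 5. Compute a (very naive) relevance score from hit counts.
            scores[page] += hits

    # 6-7. Sort the matches and return the ranked list.
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)

print(search("dog food"))
# [('store.example/dog-food', 11), ('pets.example/puppies', 2)]
```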

In terms of ranking matches, the process more or less goes like this (see the sketch after the list):

  • Keywords that match words in the content are called hits
  • Each hit is analyzed for factors such as capitalization, formatting, font, and position in the content, as well as whether it is anchor text (a link)
  • No single factor can exert too much influence on the ranking
  • Each factor has its own significance or type-weight
  • The number of hits has a separate significance or count-weight
  • Count-weight increases only up to a cap, beyond which additional hits will not help the ranking
  • The type-weight and count-weight are then combined with PageRank – the rank of a webpage dictated by the number and quality of its inbound links – to determine its relevance to the search
  • For multiple keywords, another factor comes into play: proximity, or how close the keywords appear to one another in the content (represented by a type-prox-weight, a combined type and proximity value)
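Here is a hedged sketch of how those pieces might fit together: each hit carries a type-weight, repeated hits add a capped count-weight, and the on-page score is blended with PageRank. The specific weights, the cap, and the blend ratio are made-up values for illustration, not the real algorithm.

```python
# Illustrative scoring sketch: type-weight + capped count-weight, blended with PageRank.

# Type-weights: how much a hit counts based on where/how the keyword appears (assumed values).
TYPE_WEIGHTS = {"title": 5.0, "anchor": 4.0, "heading": 3.0, "body": 1.0}

COUNT_CAP = 10  # beyond this many hits, extra occurrences stop helping

def content_score(hits):
    """hits: list of hit types for one keyword on one page, e.g. ['title', 'body', 'body']."""
    # Only the first COUNT_CAP hits contribute, so keyword stuffing plateaus.
    type_score = sum(TYPE_WEIGHTS.get(h, 1.0) for h in hits[:COUNT_CAP])
    count_weight = min(len(hits), COUNT_CAP)
    return type_score + count_weight

def relevance(hits, pagerank):
    # Blend the on-page score with the page's link-based PageRank.
    return 0.7 * content_score(hits) + 0.3 * pagerank

print(relevance(["title", "body", "body"], pagerank=6.2))  # roughly 8.86 with these toy numbers
```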

To counter the effects of black hat SEO tactics such as keyword stuffing, imagine that some of the steps in this (simplified) process are under the watchful eye of Google’s Penguin and Panda updates.

Now What about the Links?

A link on a webpage communicates a specific message to search engines: this webpage is linking to another one on the internet, which means the webpage being linked to is somehow relevant to this one, or at least to the specific part of the content where the anchor text appears.

Also, the link equity of the webpage – if no “nofollow” attribute is added to the link – “passes on” to the webpage being linked to. That is to say, part of the authority and “relevance score” (e.g. PageRank) of the webpage receiving the link is derived from the authority and “relevance score” of the webpage linking to it. This also allows search engines to pass context from one webpage to another, for example:

  • The search engine crawls a webpage talking about puppies as pets
  • The search engine “understands” that the webpage is about puppies, pets, and related subject matter based on the parsing of the content
  • The webpage links to a page about dog food with the anchor text being “dog food companies”
  • The search engine contextually relates the two webpages together, and though it does not truly understand the relation, it can correctly analyze that they are relevant to each other’s subject matter, especially after crawling the “dog food companies” webpage and getting the same or semantically similar keyword hits
  • In the process, the authority of the “dog food companies” webpage is incremented by the link equity passed to it by the webpage about puppies
  • It goes full circle and the search engine’s contextual understanding of the webpage about puppies is reinforced

Take note that this is an extremely simplified version, but you get the picture.
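For a feel of how link equity “passes” from page to page, here is a toy version of the classic PageRank iteration on a three-page graph. The link graph, damping factor, and iteration count are assumptions chosen purely for illustration.

```python
# Toy PageRank: each inbound link passes a share of the linking page's rank.

links = {
    "puppies": ["dog-food"],          # puppies page links to dog-food
    "dog-food": ["puppies", "vets"],  # dog-food links back and to vets
    "vets": ["puppies"],
}

DAMPING = 0.85
pages = list(links)
rank = {page: 1.0 / len(pages) for page in pages}

for _ in range(20):
    new_rank = {}
    for page in pages:
        # Sum the equity passed by every page that links here.
        inbound = sum(
            rank[src] / len(outs)
            for src, outs in links.items()
            if page in outs
        )
        new_rank[page] = (1 - DAMPING) / len(pages) + DAMPING * inbound
    rank = new_rank

print(rank)  # "puppies" ends up highest: it receives the most link equity
```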

Understand the Process, Better Appreciate the SEO

Now, with that rudimentary understanding of a search engine’s perspective, it should be easy to see why SEO experts implement the campaigns that they do. It becomes plain why they perform keyword research, plan out content the way they do, and recommend specific strategies based on your business objectives.

In fact, if you’re looking for an SEO partner, it’s best if you at least understand the process that search engines go through, so when your SEO agency or consultant suggests a course of action, you can figure out why and how it would affect your website’s search-friendliness. This is essentially the process that SEO professionals are trying to leverage for online marketing – this is the very playing field where the optimization game is played.

 


Tim Clarke
Author Bio: Tim Clarke is the Research Manager at Clutch. Clutch identifies leading software and professional services firms that deliver results for their clients. Tim heads the SEO and PPC research at Clutch. You can follow Clutch at @clutch_co.