
The purpose of an SEO site audit is to analyze a site for compliance and technical issues that can positively or negatively influence search performance. A number of factors go into a good, thorough audit, and we’ll cover them all in this series.

Today’s kick-off post will focus on site infrastructure.

In SEO terms, this refers to a website’s “back end,” which includes the site platform, database design, server-side redirects, robots.txt, and other structural elements. Infrastructure is generally the responsibility of the Information Technology (IT) department. Issues with infrastructure affect a site’s ability to be crawled and indexed by the engines.

Let’s break down the elements and what to look for.

Page Inclusion

Page inclusion refers to the number of web pages being crawled and included in each of the search engines’ indexes. The number of pages included in the index is a strong indicator of search-friendliness issues and, in the case of a low number, may be a sign of a problem preventing web pages from being crawled and scored.

Level of Impact:  Very high

What you might find (example): The following numbers of pages are indexed in the major engines:

Domain    Google    Yahoo    Bing

0000    0000    0000*    0000    0000




Robots.txt

The robots.txt file is a means to keep search engines out of specific sections of a site or out of the entire site. It is useful for sections that should not be indexed (though password protection is much more reliable). Often, while a site is being redesigned, developers will implement a robots.txt file to keep engines out during development. Problems arise when they forget to remove or update the file when the site goes live.
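
To make the risk concrete, here is a minimal sketch using Python’s standard urllib.robotparser module. The two rule sets and the example.com URLs are made up for illustration; they are assumptions, not rules from any audited site.

```python
from urllib.robotparser import RobotFileParser


def is_blocked(robots_lines, url, user_agent="*"):
    """Return True if the given robots.txt rules disallow `url` for `user_agent`."""
    parser = RobotFileParser()
    parser.parse(robots_lines)  # parse rules from a list of lines, no network access
    return not parser.can_fetch(user_agent, url)


# Hypothetical robots.txt left over from development: blocks the entire site.
dev_rules = ["User-agent: *", "Disallow: /"]

# Hypothetical production rules: only one private section is blocked.
live_rules = ["User-agent: *", "Disallow: /admin/"]

print(is_blocked(dev_rules, "http://www.example.com/products/"))     # True
print(is_blocked(live_rules, "http://www.example.com/products/"))    # False
print(is_blocked(live_rules, "http://www.example.com/admin/login"))  # True
```

If the development rules are still in place after launch, every page on the site is reported as blocked, which is the situation described above.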

Level of Impact:  Very high

What you might find (example): No evidence of a robots.txt file is present. The file should reside at



Redirects

Correct server responses are critical for search engine spiders to index web sites and their content properly. Webmasters commonly use two HTTP status codes when redesigning a site: 301 and 302 redirects.

A 301 is a “permanent redirect,” while a 302 is a “temporary redirect.” A 302 redirect tells the engine to come back later because this move is only temporary. We recommend using a 301 redirect because the engines would rather crawl permanent content and because a 301 redirect moves the associated link authority to the new site (while a 302 redirect does not). A 302 redirect can even result in a penalty at Google because of past unethical use (a 302 can be used to hijack a site’s ranking).

Also, because the engines see the 302 redirect as temporary, they may leave the last indexed content from that URL in the index, resulting in duplicate content if the new page is an identical copy that has been moved to a new location.
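
As a rough illustration of how an audit might tell the two apart, the sketch below reads the status line and Location header from a raw response head. The function name, the sample response, and the example.com URL are hypothetical, and only the two codes discussed here are classified.

```python
def classify_redirect(raw_head):
    """Classify a raw HTTP response head as a permanent or temporary redirect."""
    lines = [line for line in raw_head.strip().splitlines() if line.strip()]
    # The status line looks like "HTTP/1.1 301 Moved Permanently".
    status = int(lines[0].split()[1])
    # Collect the remaining header lines into a dict keyed by lower-cased name.
    headers = {}
    for line in lines[1:]:
        name, sep, value = line.partition(":")
        if sep:
            headers[name.strip().lower()] = value.strip()
    if status == 301:
        kind = "permanent"
    elif status == 302:
        kind = "temporary"
    else:
        kind = "other"
    return status, kind, headers.get("location")


# Hypothetical response head for a page that was moved with a 302.
sample = """HTTP/1.1 302 Found
Server: Apache
Location: http://www.example.com/new-page
Content-Type: text/html; charset=iso-8859-1
"""

print(classify_redirect(sample))
# (302, 'temporary', 'http://www.example.com/new-page')
```

Any URL that comes back as “temporary” after a permanent move is a candidate for switching to a 301.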

In addition to 301 and 302 server-side redirects, web developers often use browser-side redirects such as JavaScript redirects or “zero second meta refresh” redirects. Neither technique is SEO-friendly: search engines often interpret these redirects as a spam technique, and they are not recommended.
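
A quick way to spot meta refresh redirects during an audit is to scan the page source for them. The sketch below uses Python’s standard html.parser module; the class name, function name, and sample markup are assumptions for illustration, and it does not attempt to detect JavaScript redirects.

```python
from html.parser import HTMLParser


class MetaRefreshFinder(HTMLParser):
    """Collect the content attribute of every <meta http-equiv="refresh"> tag."""

    def __init__(self):
        super().__init__()
        self.refreshes = []

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        # Attribute names arrive lower-cased; values may be None if omitted.
        attr_map = {name: (value or "") for name, value in attrs}
        if attr_map.get("http-equiv", "").lower() == "refresh":
            self.refreshes.append(attr_map.get("content", ""))


def find_meta_refresh(html):
    finder = MetaRefreshFinder()
    finder.feed(html)
    return finder.refreshes


# Hypothetical page that redirects immediately with a zero second meta refresh.
page = """<html><head>
<meta http-equiv="refresh" content="0; url=http://www.example.com/new-page">
</head><body>Moved.</body></html>"""

print(find_meta_refresh(page))
# ['0; url=http://www.example.com/new-page']
```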

A 301 redirect is the recommended option to optimize search engine status and crawling opportunities. It is the most efficient and search engine–friendly method for web page redirection.

Level of Impact:  High

What you might find (example): The following redirects were found:



Redirect
HTTP/1.1 301 Moved Permanently
Set-Cookie: ARPT=QPVXMZS10.140.230.7CKMQI;path=/
Date: Wed, 25 Jan 2012 21:49:57 GMT
Server: Apache
Location:
Content-Length: 304
Keep-Alive: timeout=15, max=100
Connection: Keep-Alive
Content-Type: text/html; charset=iso-8859-1



HTML Sitemap

Most search engines prefer to let their spiders find your pages rather than having them submitted. Spiders often request a page named “sitemap.html” early in a visit because it offers a simple way to locate the main segments of your site. By placing a link to your sitemap on your home page and on every internal page, you ensure that the search engines are only two clicks away from every important page on your site.

The HTML sitemap also provides another tool for web visitors to use to find content, especially when the internal search does not return relevant results.

A well-planned sitemap allows search engines to reach all of your content, no matter how deep. The sitemap can reinforce contextual relevance of your content by acting as a thematic hub for all related pages. The sitemap also provides increased accuracy to the internal search functionality by providing a guide for crawling internally referenced pages.
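
The “two clicks” claim can be checked with a simple breadth-first search over a site’s internal links. The sketch below is only an illustration: the link graph is a hypothetical five-page site, and a real audit would build the graph from crawl data.

```python
from collections import deque


def click_depths(links, start):
    """Breadth-first search: minimum number of clicks from `start` to each page."""
    depths = {start: 0}
    queue = deque([start])
    while queue:
        page = queue.popleft()
        for target in links.get(page, []):
            if target not in depths:
                depths[target] = depths[page] + 1
                queue.append(target)
    return depths


# Hypothetical site where each page links only one level deeper.
site = {
    "/": ["/products/"],
    "/products/": ["/products/tv/"],
    "/products/tv/": ["/products/tv/lcd/"],
    "/products/tv/lcd/": ["/products/tv/lcd/model-1/"],
}

# The same site with a sitemap linked from the home page that lists every page.
site_with_sitemap = dict(site)
site_with_sitemap["/"] = site["/"] + ["/sitemap.html"]
site_with_sitemap["/sitemap.html"] = list(site) + ["/products/tv/lcd/model-1/"]

print(max(click_depths(site, "/").values()))               # 4
print(max(click_depths(site_with_sitemap, "/").values()))  # 2
```

In this made-up example the deepest page drops from four clicks to two once the sitemap is linked from the home page.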

Level of Impact:  High


Home Page Text Links

The home page is often weighted as the most important page on the site and is therefore the page most often reached from search engines. Keyword-rich text links on the home page lead both engines and searchers quickly to the content they seek.

Level of Impact:  High


Required JavaScript or Cookies

Required JavaScript or cookies may stop search engines from indexing a web site. People can complete forms and accept cookies, but search engines cannot. If a search engine visits a web page that insists on cookies before displaying the page, the search engine may index the wrong text (the “cookies required” error page, for example) and abandon further indexing.

Level of Impact:  High

What you might find (example): With JavaScript disabled, the homepage and many of the pages were viewed. However, content on the FAQ pages would not display without the JavaScript capability turned on. In addition, with cookies turned off, none of the pages loaded properly. The search engines will not be able to crawl the site and effectively index the pages.


Multiple and Sub-Domains

Sites may have multiple domains for load-balancing purposes, but multiple domains can lead to duplicate content and, ultimately, a scoring penalty.

Sub-domains are often used to segment content (instead of a directory structure), but the engines treat each sub-domain as a new and unique site. Sub-domains should only be deployed if the content within them is sufficiently different from the primary domain.

Level of Impact: Moderate


URL Structure

Search engines have difficulty following certain types of URLs, specifically those that are overly long or contain session IDs. Google will not crawl any URL that is longer than 255 characters and may truncate even shorter URLs if they are deemed excessive. URLs that contain special characters, such as ampersands and question marks, may cause difficulty for search engines.

Session IDs, in particular, cause problems. Many spiders will not crawl pages that contain anything that looks like session IDs in the URL because spiders believe session IDs create an infinite number of pages. Under these circumstances, each requested page may contain links to other pages, and the link URLs would contain the current session ID. Theoretically, this makes them different URLs—distinct from the previous time the page was requested. Search engine spiders are likely to skip these pages. If such pages are indexed, there may be multiple copies of pages and each page will take a share of the algorithm score. The result is diluted rankings.
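
A minimal sketch of how these URL checks could be automated is shown below. The list of session parameter names is an assumption (common defaults on several platforms, not an exhaustive list), the 255-character threshold is simply the figure cited above, and the example.com URLs are hypothetical.

```python
from urllib.parse import urlsplit, parse_qsl

# Assumed list of common session parameter names; adjust for the platform audited.
SESSION_PARAMS = {"sid", "sessionid", "session_id", "phpsessid", "jsessionid"}
MAX_URL_LENGTH = 255  # threshold cited in the text above


def audit_url(url):
    """Return a list of crawl-related issues found in a single URL."""
    issues = []
    if len(url) > MAX_URL_LENGTH:
        issues.append("longer than 255 characters")
    parts = urlsplit(url)
    params = parse_qsl(parts.query, keep_blank_values=True)
    if any(name.lower() in SESSION_PARAMS for name, _ in params):
        issues.append("session ID in query string")
    # Some platforms append the session to the path as ";jsessionid=...".
    if ";jsessionid=" in parts.path.lower():
        issues.append("session ID in path")
    return issues


print(audit_url("http://www.example.com/products/lg-133"))
# []
print(audit_url("http://www.example.com/product.php?id=42&PHPSESSID=abc123"))
# ['session ID in query string']
print(audit_url("http://www.example.com/cart;jsessionid=ABC123"))
# ['session ID in path']
```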

Level of Impact:  High


Directory Structure

How you build a site can have an impact on the performance of the site and its ability to be crawled. The “flatter” the site structure (fewer directories), the easier and more quickly the site can be crawled. Large sites with deep content and multiple sub-sections can present a challenge for the search engine crawlers as it can take significant time to locate all of the pages. This issue can be mitigated with the effective use of sitemaps by directing the crawlers to specific areas of the site.
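
One rough way to measure how “flat” a site is: count the directories in each URL path. The sketch below is a heuristic under stated assumptions. It treats a final path segment containing a dot as a file rather than a directory, and the example.com URLs are hypothetical.

```python
from urllib.parse import urlsplit


def directory_depth(url):
    """Count the directories between the site root and the requested page."""
    path = urlsplit(url).path
    segments = [segment for segment in path.split("/") if segment]
    # Heuristic: a last segment with a dot (e.g. "page.html") is a file, not a directory.
    if segments and not path.endswith("/") and "." in segments[-1]:
        segments.pop()
    return len(segments)


print(directory_depth("http://www.example.com/lg-133.html"))                     # 0
print(directory_depth("http://www.example.com/products/"))                       # 1
print(directory_depth("http://www.example.com/products/tv/lcd/lg/lg-133.html"))  # 4
```

Sorting a crawl export by this number is a quick way to find the sections that sit deepest in the structure.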

Level of Impact: Moderate


Naming Conventions

When coding a site, webmasters should use naming conventions that include important keywords. Avoid overloading the URL with too many words or long phrases.

Level of Impact:  Low

What you might find (example): The website’s URLs include a number of undesirable items such as numerical codes and product IDs. The URL structure would be enhanced for SEO by incorporating targeted keywords.

For example, the page would perform better for terms like ‘LG 133’ if that term appeared in the URL rather than ‘A133CH’, which has almost no search demand by comparison.
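
As a small sketch of this naming convention, the function below turns a keyword phrase into a short URL segment. The function name, the hyphen separator, and the five-word cap are assumptions chosen for illustration, not a rule from the engines.

```python
import re


def keyword_slug(phrase, max_words=5):
    """Build a short, lower-case URL segment from a keyword phrase."""
    # Keep only letters and digits, splitting on everything else.
    words = re.findall(r"[a-z0-9]+", phrase.lower())
    # Cap the number of words so the URL is not overloaded with long phrases.
    return "-".join(words[:max_words])


print(keyword_slug("LG 133"))
# lg-133
print(keyword_slug("LG 133 Product Page With Many Extra Words"))
# lg-133-product-page-with
```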

Searchers are more likely to click on listings where they see the exact searched phrases. The engines reinforce this practice by bolding those terms in the results, as shown in the example below.

[Image: ad keywords]

Stay tuned for the rest of the SEO Audit series, which will address design and coding, web templates, link analysis, and spam/black-hat techniques.