How do you get sensitive content OFF of the internet?
Published: March 31, 2016
Author: Simon Heseltine
“Hey, why aren’t we showing up in Google? I mean, some of our pages are, but not all of them. I searched on these seven obscure terms that are mentioned once on pages buried deep in the site, and we weren’t ranking for any of them. Obviously there’s an issue. We should be showing up. You need to fix that right away.”
Whether you’re in-house, with an agency, or an independent consultant, the chances are strong that you’ve heard a variation of that statement at some point in your career. You’ve examined the site to find out why pages aren’t indexed, you’ve perhaps had to rejig the site architecture to surface buried content, you’ve fixed the sitemaps, or you’ve found one of the many spider blockers that could have been the issue.
But what if you have the opposite problem? What if content you want to stay hidden is already out there? What if sensitive, proprietary information is available for your competitors and customers to see, to act upon, to negatively impact your business?
The other day, I was chatting with SEO consultant Alan Bleiweiss. He’d been playing around on Google and had typed in “Do not publish” to see what came back. What he saw was a wide selection of pages carrying that text, none of which were supposed to be published.
Similar searches that I tried out showed that there’s plenty of information out on the web that is not intended to be available for public consumption.
Simply stating that a page is for internal use only, or is confidential, or should not be distributed, does not signal to a search engine that it should keep the page out of its index. If you think it does, you’re absolutely, utterly wrong.
If you’ve put this content out on your site, where the search engines can find it and index it, you’ve really only got yourself to blame.
What you need to do if you have content that you don’t want search engines to find is:
- DON’T PUT IT ON YOUR WEBSITE
If it’s not on the website, it can’t be found, it can’t be hacked, it can’t be copied… basically, it can’t be seen by those who shouldn’t see it.
- Put it on your intranet
If it absolutely, positively, has to be online, then putting it on your intranet, or behind a password-secured area of your site, prevents any search engines/outsiders from seeing it.
- Block it in robots.txt
Simply place the content in a common folder (e.g. /keepaway/) and add this line to your robots.txt, under the relevant User-agent line:
Disallow: /keepaway/
Or, if it’s just one file, add a Disallow line with that file’s full path instead.
But again, why are you allowing this content to be out there? Anyone who stumbles across it can share it on social/screenshot it/copy it/distribute it.
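If you do go the robots.txt route, it’s worth checking that the rules block what you think they block. Here’s a minimal sketch using Python’s standard urllib.robotparser module. The /keepaway/ folder comes from the example above; the single-file path and the ExampleBot crawler name are made up for illustration.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt. The folder name comes from the article;
# the single-file path is invented purely for illustration.
ROBOTS_TXT = """\
User-agent: *
Disallow: /keepaway/
Disallow: /reports/internal-draft.html
"""


def is_blocked(path: str, user_agent: str = "ExampleBot") -> bool:
    """Return True if a crawler that honors robots.txt may not fetch the path."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return not parser.can_fetch(user_agent, path)


if __name__ == "__main__":
    for path in ("/keepaway/plan.html",
                 "/reports/internal-draft.html",
                 "/index.html"):
        print(path, "blocked" if is_blocked(path) else "allowed")
```

Bear in mind this only tells you what a well-behaved crawler would do. Nothing in robots.txt stops a person, or a crawler that ignores the file, from requesting those URLs.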
- Noindex the page
In the head of the page, simply place the following construct:
<META NAME="ROBOTS" CONTENT="NOINDEX">
If the other links on the page are all to places that shouldn’t be indexed either, then place this there instead:
<META NAME="ROBOTS" CONTENT="NOINDEX, NOFOLLOW">
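To confirm that a page really carries the tag, you could run something like the following rough sketch, built on Python’s standard html.parser module. The class name, function name, and sample page are assumptions for illustration, not part of any particular SEO tool.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects the directives found in <meta name="robots"> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        # HTMLParser already lowercases tag and attribute names.
        if tag != "meta":
            return
        attr = {name: (value or "") for name, value in attrs}
        if attr.get("name", "").lower() != "robots":
            return
        for part in attr.get("content", "").split(","):
            part = part.strip().lower()
            if part:
                self.directives.add(part)


def robots_directives(html: str) -> set:
    """Return the set of robots meta directives present in the HTML."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return parser.directives


if __name__ == "__main__":
    # Hypothetical page used only to exercise the function.
    page = '<html><head><META NAME="ROBOTS" CONTENT="NOINDEX, NOFOLLOW"></head></html>'
    found = robots_directives(page)
    print("noindex present:", "noindex" in found)
    print("nofollow present:", "nofollow" in found)
```

Note that the quotes around the attribute values need to be plain straight quotes. If your editor or CMS swaps in curly quotes, the tag may not be read as a robots directive at all.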
- DID I MENTION NOT PUTTING IT ON THE INTERNET?
Ah right, I did, that was #1.
The long and the short of it is that if you have content that you don’t want anyone to find, make sure it can’t be found. If it absolutely has to be accessible online, be explicit and tell the search engines to keep their grubby spiders well away from it, but be aware that it can still be found by anyone who stumbles across it, and by any crawler that ignores the directives in #3 & #4 above.