Continuing in my short series of five big SEO myths, this one is perhaps the most controversial of the concepts I’m going to tackle.
In the first post in the series, I laid into the discredited but still apparently widespread practice of stuffing keywords into the meta tags of a web page. My research into how keywords are used by search engines also led me to take a long, hard look at the notion of Keyword Density and the idea that there is some magic optimum number that will make all the difference between search engine success and failure.
For those of you who already know what Keyword Density is and why it’s deemed so important, I might as well get this out of the way right up front: frankly, I’m just not buying it.
i. As with all the posts in this series, I’m writing from the perspective of a Public Relations bloke. My observations relate to how news releases and editorial copy perform in search engine terms; the same thoughts are not necessarily going to hold water when looked at from a broader web content perspective.
ii. I still have a lot to learn about all this stuff. If I get things wrong (as I inevitably will) I will add updated and corrected info in future posts.
OK. Onwards. If you want the really short version:
Rather than worry about achieving an optimum density percentage, people would do a lot better to focus on writing good, interesting copy.
[Note: I'm drawing heavily on the fact that I spent many years working in the knowledge management software business before moving into PR. I would never have considered myself a true KM expert, and I'm certainly not an expert in SEO - I'm a mere flack, after all - but I think I learned enough about keyword-based indexing and search techniques to be mildly dangerous. I've also dredged up from memory some of the old examples and thought models we used to use back in my KM days. Grateful credit to a number of my old KM buddies for seeding the dark and dusty corners of my mind with some of these still useful examples.]
Keyword Density is, according to Wikipedia’s simple definition:
…the percentage of times a keyword or phrase appears on a web page compared to the total number of words on the page.
Let’s say you’re searching for the keyword “bogus” and you come across a 100-word document that happens to include that keyword six times. That document has a density of six out of 100 for the keyword “bogus”, or 6%.
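If you'd rather see that sum as code, here's a minimal sketch in Python. The function name `keyword_density` and the crude word-splitting rule are my own assumptions for illustration; real density calculators may count words and phrases differently.

```python
import re


def keyword_density(text: str, keyword: str) -> float:
    """Return the percentage of words in `text` that exactly match `keyword`."""
    # Crude tokenizer (an assumption): lowercase runs of letters, digits
    # and straight apostrophes count as words.
    words = re.findall(r"[a-z0-9']+", text.lower())
    if not words:
        return 0.0
    hits = sum(1 for word in words if word == keyword.lower())
    return 100.0 * hits / len(words)


# A synthetic 100-word document: six uses of "bogus" and 94 filler words.
document = " ".join(["bogus"] * 6 + ["filler"] * 94)
print(keyword_density(document, "bogus"))  # 6.0
```

Note that this only handles single-word keywords; counting phrases would need a slightly different approach.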
Think of the way a search engine functions. A potential customer sitting in front of the search engine is trying to find information that is important to them. As a search engine developer, you want to offer up useful and meaningful results when they search. Using only the simple keywords the user provides, somehow you have to try to figure out what information would matter most to that individual right now.
This is a massively hard thing for any computer system to do. Most of us aren’t really terribly good at searching — it’s hard for us to translate the concepts and ideas we’re looking for into simple keywords.
At the other end of the search pipe, it’s almost indescribably challenging to build a computer system that can understand what all the stuff out there on the Web is about. And “aboutness” is really, really important. To a computer, the words and phrases in a document are just bits: ones and zeroes. They have no meaning; the computer doesn’t know what the document is about.
People know that a certain arrangement of words on a page, with spaces and punctuation just so, will turn a set of otherwise random characters into something that has meaning; that has aboutness.
Think of it this way: say you’ve forgotten both the name and the author of an old poem you remember learning as a child. You recall the sense of the thing, but you can’t remember how it went.
So you wander into a favourite second-hand bookstore to see if you can find a copy. Without even the poet’s name, though, you’re going to be kind of hosed.
Luckily, the ancient shopkeeper (let’s call him Mr. Ptolemy) is both exceptionally well-read and has a prodigious memory.
Trying to describe the poem to our friendly bookstore owner, you mention that it’s about the choices we all have to make in life, and the consequences we will inevitably face from those choices as we grow older.
Somehow, splendid chap that he is, Mr. Ptolemy is able to discern that you’re talking about Robert Frost’s “The Road Not Taken”.
He understood precisely what you meant and, as he recites a couple of favourite lines (“…Two roads diverged in a wood, and I– I took the one less traveled by, And that has made all the difference”), it all snaps into place. Yes! That’s exactly the poem I’m looking for!
Now try to imagine sitting in front of the Web version of Google and achieving the same result. What keywords would you have used? “Life” and “Choices” perhaps? Neither of those words appears anywhere in the poem. So where are you going to start?*
You have the sum of all human knowledge at your fingertips, but all you can do is describe what the document you want is broadly about. And all the computer can do is a kind of textual number-crunching based on word frequency, link relationships and keyword concepts.
Do you see how hard this stuff is for the people who build search engines?
Without getting deep into the kind of incredibly clever semantic search stuff my friends at TextWise do (disclosure: they’re a client), it’s really quite amazingly hard for most software systems to understand in any real way what even a simple document is about. So search engines were built around certain compromises.
Typically, in documents, web pages and things like that, there is going to be some kind of discernible relationship between the words they contain and what the document is actually about (unless, it seems, we’re looking at poetic metaphors). A document that uses the word “astrophysics” several times is likely (but far from certain) to have something to do with the general topic of astrophysics.
From this, we can infer that a whole bunch of documents and web pages with many similar words (astrophysics, astrophysicists, cosmologists, cosmology, etc.) are more likely to be about the same thing than documents with no similar words. This is useful, because it means we can start grouping stuff together into clusters of inferred aboutness.
(Homonyms tend to bugger this all up, I’m afraid. Our astrophysicist would mean something quite specific if she searched for “stars”. To a teenage celebrity gossip junkie, the same keyword means something entirely different. And a poor chap who just had difficulty spelling the word “asterisk” would be even more confused. But let’s not get too far down that path – semantic disambiguation blows my mind.)
By now, you should have already figured out how some of the earliest search engines worked.
- Build a really, really big index of words and pointers to where they appear in lots and lots of documents.
- Use the frequency of word-use as a guide to which documents are most likely to be about the topics your searcher is interested in.
- Layer on some synonym cleverness and you’ve got the start of a workable way to navigate through an ever-expanding online corpus of knowledge.
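For the code-curious, those three steps can be sketched in a few lines of Python. This is a toy under stated assumptions, not a reconstruction of any real engine: the function names (`build_index`, `search`), the three sample documents and the one-entry synonym table are all made up for illustration.

```python
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Split text into lowercase words (a deliberately crude rule)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(documents):
    """Step 1: map every word to the documents it appears in, with counts."""
    index = defaultdict(Counter)
    for doc_id, text in documents.items():
        for word in tokenize(text):
            index[word][doc_id] += 1
    return index


def search(index, query, synonyms=None):
    """Steps 2 and 3: rank documents by how often they use the query
    words, plus any synonyms we have been told about."""
    synonyms = synonyms or {}
    scores = Counter()
    for word in tokenize(query):
        for term in {word, *synonyms.get(word, ())}:
            for doc_id, count in index.get(term, {}).items():
                scores[doc_id] += count
    return [doc_id for doc_id, _ in scores.most_common()]


# Hypothetical sample data, invented for this sketch.
documents = {
    "lecture": "Astrophysics notes: astrophysics explains how stars form; "
               "astrophysics also covers galaxies.",
    "gossip": "Which stars wore what at the awards last night?",
    "journal": "Cosmology and astrophysics overlap in the early universe.",
}
synonyms = {"astrophysics": ["cosmology"]}

index = build_index(documents)
print(search(index, "astrophysics", synonyms))  # ['lecture', 'journal']
```

The "lecture" page wins simply because it says the word most often, which is exactly the weakness the rest of this post is about.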
It’s from this approach that the notion of keyword density rose to prominence in the SEO world.
Unscrupulous marketers in the early days of the web figured out that early, dumb search engines could be fooled. A document that included the word “astrophysics” in every second sentence might, the theory went, end up being ranked as the single most relevant and useful document about astrophysics in the entire universe. (It wasn’t really quite this unsubtle, but you get my drift.)
Having worked out the importance of density, web marketing monkeys started stuffing their pages with hidden keywords. Remember that old practice of embedding white text on the white background of a page? That was a density game.
The search companies quickly caught on though, as the Wikipedia entry notes:
In the late 1990s, which was the early days of search engines, keyword density was an important factor in how a page was ranked. However, as webmasters discovered this and the implementation of optimum keyword density became widespread, it became a minor factor in the rankings. Search engines began giving priority to other factors that are beyond the direct control of webmasters. Today, the overuse of keywords, a practice called keyword stuffing, will cause a web page to be penalized.
If you do any research into this stuff at all, you’ll soon see that there’s something of a balancing act going on. On the one hand, you don’t want to get downranked as a spammer for having too many keywords stuffed into your web pages. On the other, you don’t want to run the risk of ranking too low by not including enough keywords.
There’s a two-step consulting process taking place out there:
- Help the client figure out the most important keywords that will attract the right audience to their web pages (e.g. people who want to buy a couch in Canada are probably searching for “chesterfield”, not “settee”);
- Optimize all web content to hit the right proportion of keywords-to-text throughout.
The general consensus right now seems to be that maintaining a keyword density of between 2% and 3% in your web content is optimal.
Any higher than 3% and you might get marked as spam; any lower than 2% and you’re not even on the radar. These numbers vary widely, mind: I’ve seen optimal density recommendations as high as 8% – which seems insane to me.
Think about this in PR terms for a second: to achieve the recommended 2-3% density in a short, 400-word news release, you’d need to repeat the chosen keyword 8-12 times. We’ve all read news releases like that – the ones that sound like they were written by robots.
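The sums behind that claim are simple enough to check. A quick sketch, assuming a hypothetical helper called `repetitions_needed` that rounds up to the next whole word:

```python
import math


def repetitions_needed(word_count: int, density_percent: float) -> int:
    """How many times a keyword must appear to reach a target density."""
    return math.ceil(word_count * density_percent / 100)


# A 400-word news release at the 2%, 3% and 8% targets mentioned above.
for target in (2, 3, 8):
    print(f"{target}% of 400 words = {repetitions_needed(400, target)} uses")
```

At the 8% end of the range, that same release would need the keyword 32 times, or roughly once every dozen words.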
Here’s the thing, though: other than a relatively small group of real experts (the people who actually build the search engine algorithms at Google and elsewhere) no one really seems to know whether keyword density has any impact on search engine results.
In fact, I’ve been unable to find a single shred of evidence that any major search engine in use today gives preference to a particular ratio of keywords in web pages.
There are a lot of conflicting opinions out there, and I could be 100% wrong about this, but stick with me…
In all of the reading I’ve been doing on this topic, it was one particular comment from Eric Brantner at the site Reve News (geddit?) that really sparked my skepticism. In a piece titled “Keyword Density: The SEO Myth that Never Dies”, Eric writes:
The simple truth is search engines are far too advanced to be tricked by something as basic as an optimal keyword density
…and that makes a great deal of sense to me.
As an aside, I think one part of the problem is that people often completely misinterpret the idea behind those optimal density numbers. It’s easy to assume “recommended density” should be taken as a guide to add more keywords into a web page until you hit the magic ratio, and there are scores of online keyword density calculators that promise to help you figure out your sweet spot.
In fact, if keyword density measures are important at all, they’re primarily useful in helping to manage keyword overload — to ensure your content doesn’t get discounted as spam.
Optimal density is something you’re encouraged to work down to, not up towards. There’s a good article on this topic at the delightfully snarky SEOElite blog and another useful analysis on the well-known SEO Tools site.
Getting back to the main point, though, I’ve come across a number of sources making the (entirely believable) assertion that keyword density on a single document doesn’t actually matter much at all. And here’s why: keyword density is an internal measure. It ignores the fact that no web page is an island.
In other words: assessing keyword density can only tell you something about the individual web page (and its numeric placement in a simple ranking table) – it’s a way of analyzing word frequency in a document in relation only to the document itself.
Think of a great long list of documents, arranged in order of percentage density for the keyword “street”.
- At the top of the list is a document that has a very high density, as it contains the keyword many thousands of times in a 2,000-page file (let’s say it has a density of around 8%).
- Way further down the list is a web site that mentions the word fifty times out of 35,000 words (0.14% density).
- Somewhere in the middle is a Wikipedia entry with 133 uses of the keyword out of 2,700 words (5% density).
So which of these is actually the most relevant document? The answer, of course, all depends on what you’re looking for.
That first document in the list includes the word “street” thousands of times because it’s the Yellow Pages. Probably not what you had in mind.
The web site with a keyword density of less than 1% is the hip young online magazine you’re looking for – the one that just happens to be about all things “Street”, but is way too fearsomely cool to use the word more than a handful of times in its masthead and elsewhere.
At this point, the logic of my analogy crumbles and leaks rather, but you get the point. Just because a document uses the same word lots of times (or even just enough times) does not mean it’s the most relevant and useful document for every search.
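Leaky analogy or not, the ranking itself is easy to reproduce. In this sketch the magazine and Wikipedia figures come straight from the list above; the Yellow Pages counts are invented (I only said "many thousands" and "around 8%"), and the `Page` class is just a convenient container I made up.

```python
from dataclasses import dataclass


@dataclass
class Page:
    name: str
    keyword_uses: int
    total_words: int

    @property
    def density(self) -> float:
        """Keyword density as a percentage of all words on the page."""
        return 100.0 * self.keyword_uses / self.total_words


pages = [
    Page("hip online magazine", 50, 35_000),
    Page("Wikipedia entry", 133, 2_700),
    # Assumed figures: chosen only so the density lands on 8%.
    Page("Yellow Pages", 80_000, 1_000_000),
]

# Rank purely by density, the way a naive engine might.
ranked = sorted(pages, key=lambda page: page.density, reverse=True)
for page in ranked:
    print(f"{page.name}: {page.density:.2f}%")
```

Sorting on density alone puts the Yellow Pages first and the magazine you actually wanted dead last.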
It’s like: if I stood in front of an audience for an hour and dropped the word “astrophysics” into every fifth sentence, a completely unsophisticated listener might assume that I know something about astrophysics just because I used the word a lot.
But linguistics research suggests that frequency alone is a poor guide to relevance – and it doesn’t take any kind of research to prove that I know the square root of bugger all about astrophysics (nor about SEO, for that matter).
The best and most advanced search engine algorithms (such as those in place at Google, for example) are designed to index and “understand” words in a document in the context of the index in which that document appears. The ultimate search engine, perhaps, would be one that (amongst its weaponry) had the ability to understand the true relevance of any single document when compared with every single other document in the known dataverse.
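To make "in the context of the index" a bit more concrete, here's a sketch of one textbook way to weigh a word against a whole collection rather than a single page: TF-IDF (term frequency times inverse document frequency). I'm swapping it in purely as an illustration; I have no idea what Google or anyone else actually runs, and the three tiny documents are invented.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Split text into lowercase words (a deliberately crude rule)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def tf_idf(term, doc_id, documents):
    """Score a term for one document: its share of that document's words,
    scaled down the more documents in the collection also use it."""
    words = tokenize(documents[doc_id])
    tf = Counter(words)[term] / len(words)
    docs_with_term = sum(
        1 for text in documents.values() if term in tokenize(text)
    )
    if docs_with_term == 0:
        return 0.0
    idf = math.log(len(documents) / docs_with_term)
    return tf * idf


# Hypothetical three-document "web", invented for this sketch.
documents = {
    "directory": "12 high street 14 high street 16 mill street",
    "encyclopedia": "a street is a public road in a town",
    "magazine": "street style skate culture and music",
}

for doc_id in documents:
    print(doc_id, tf_idf("street", doc_id, documents))
print("magazine", tf_idf("skate", "magazine", documents))
```

Because every document in this little collection uses "street", the word scores zero everywhere, however densely the directory repeats it. "Skate", which only the magazine uses, is the word that actually sets that page apart.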
Again: the fact that a particular document happens to use a certain keyword a dozen times does not necessarily mean it is an authoritative source of info related to that keyword. Good search engines know this and have largely devalued keyword density as a ranking parameter. It’s still used, but it is not nearly as important a measure as it was way back at the dawn of the Web.
In short: frequency is not the same as relevance.
SEO efforts that focus too slavishly on achieving the optimum keyword density run the risk of creating dry, robotic copy that’s a nightmare for human visitors to read, and may even be down-ranked by sophisticated search engines.
Perhaps I’m being naive here, but I can’t help thinking that the goal of the search engines is to work the way our Mr. Ptolemy does in the bookstore example above. The search engine tries to understand what it is you’re really interested in, and offer that stuff up to you through the browser.
Google uses more than 200 different signals to try to determine the best information to offer up for any search, and they change their algorithms (by some accounts) several times a week. In the midst of all this high-power computing, what they’re trying to do is mimic a really good human guide. They do this by looking for the cues to what other people deem to be the most valuable, relevant, useful and interesting content on any topic – using all kinds of different “signals”.
With all that sophistication going on, I can’t help but think that such a simplistic notion as “keyword density” is a real red herring. Good content, well written, is as important today as it has always been. Write something useful, meaningful, intelligent, newsworthy or just genuinely interesting (or all of these), and the search engines will find you.
Before I shut up about this, a final thought on keywords. I’ve laid into them pretty hard in the first couple of posts here, and I don’t want anyone getting the wrong idea. While I’m just not ready to go along with the magic “optimal keyword density” malarkey, I’m still a firm believer in the importance and value of using the right keywords for the audience you hope to attract.
Keywords are, after all, the simple inputs we use to search – so it’s important to research and understand the words, phrases, synonyms and circuitous routes that bring people to your site. Studying your site analytics can be great for this.
In the last 24 hours, I know that people have come to my blog through searching for me by name (with all kinds of creative misspellings) or by searching for such diverse things as:
- social media experts
- future of branding
- the machine stops
- i hate vista
(I’m still the #1 ranked site in Canada for this last example, btw – and do you think Microsoft has ever reached out to me in any way?)
Studying the keywords people use to find you can teach you a lot. They’re still the key drivers of search and any professional communicator will want to be sure they’re using the same kind of vocabulary as the potential audience they’re seeking to engage. Again, there are a lot of online tools you can use to experiment with keywords. Go Google.
Just don’t get too hung up on any spurious notions of optimal keyword density, OK?
*[In case you're wondering, if you Google "poem about life choices", without the quotation marks, one of the top five results just happens to be a link to Robert Frost's poem. Darn it. This doesn't mean that any part of my argument is necessarily invalid, though. It simply proves that I'm not very good at coming up with illustrative examples for some of my points.]
Back to Myth #1: The Importance of Keyword Meta Tags
Next up – Myth #3: On-page optimization is the thing