|
In the last article in this series, we took a closer look at search engines. We saw how they find sites to spider, saw how our web site looks to a search engine spider, and talked about how search engines classify pages using keywords and keyphrases. This installment will talk about how search engines decide which pages should be listed first in their results, why others get listed later, and why some get banned or blocked altogether.
How do search engines rank their results?
If you haven't done it in a while, go to one of the search engines, like Google, and search for something very specific. For example, when I searched for “2006 Ford Shelby Cobra specifications” I got results that looked something like this:
Notice that the left side of that window shows me the results of my search. The right side contains “Sponsored Links”, which are paid advertisements that Google has decided somehow relate to my search and therefore might interest me. But how did Google decide which of those pages is first, second, third, etc., in the list? It must have a method, right? Sure enough, it does. While the exact methodology is a trade secret known only to Google's inner circle of programmers and management, I've read their patent application and inferred from it the methods they appear to use to rank the pages in their results list. The following items are all factors in that determination:
Length of domain name registration: If your domain name is only registered a year at a time, and it's relatively new, Google probably assumes that you aren't too serious (at least not yet) about publishing information on the web. Many search engine spammers, for example, register a domain for a year, spam search results like crazy, then get out the following year because the domain is blacklisted with search engines. Registering your domain name for more than a year should help push you up a little in the search results.
Inbound links to your site from other sites: If your site has any valuable information on it, it's logical to assume that others will link to your site from theirs. Part of Google's ranking system therefore asks the question "how many other sites link to this one?" and raises your ranking as more sites link to yours. Participating in link exchanges can help, though I suspect Google's algorithms are smart enough to distinguish a "pure" link exchange from a link exchange among related web sites.
Clicks to your site from the search results: If users see your site in the search results and they tend to click on it, Google can infer from this that your content may indeed be relevant to the search criteria. As a result, they'll move you up in the ranks. But they also track how long people visit your page before returning to the search results. If people tend to go to your site first, but then immediately return to the search results page, Google can deduce from this that your page doesn't contain information relevant to the search and will push you down the list in the future.
Keyword density of your page: As we discussed in the last article, search engines break your page content down into keywords and keyphrases. The more often the searched keywords and keyphrases appear on your page relative to the other pages in the list of results, the higher they will generally rank you. However, they do have algorithms in place to detect “gibberish” paragraphs that are included in your page only to bring up its ranking in the search results. Getting caught doing this will get you pushed down in the results or banned completely.
Frequency and extent of updates: If your site appears to have been stuck on the web and left unchanged for a few years, Google will assume that it's probably getting obsolete and it will fall in the rankings. Similarly, if you make relatively minor changes just to make the page appear to be current, they can detect that, too, and your ranking will fall. The lesson here is that your pages must be kept current, occasionally updated significantly if appropriate, but not constantly changed just for the sake of changing them.
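To make the idea of keyword density concrete, here is a minimal Python sketch. It is only an illustration of the general concept: the function names, the word-splitting rule, and the density formula are my own assumptions, and nothing here reflects how Google or any other search engine actually computes relevance.

```python
import re
from collections import Counter


def _words(text):
    # Assumed tokenizer: lowercase runs of letters, digits, and apostrophes.
    return re.findall(r"[a-z0-9']+", text.lower())


def keyphrase_density(text, phrase):
    """Return the fraction of the words in `text` taken up by `phrase`."""
    words = _words(text)
    target = _words(phrase)
    if not words or not target:
        return 0.0
    n = len(target)
    # Count every position where the phrase's words appear in order.
    hits = sum(1 for i in range(len(words) - n + 1) if words[i:i + n] == target)
    return hits * n / len(words)


def top_keywords(text, count=5, min_length=4):
    """Return the most frequent words of at least `min_length` characters."""
    frequent = Counter(w for w in _words(text) if len(w) >= min_length)
    return frequent.most_common(count)


sample = "The rise and fall of civilizations. Civilizations rise and fall."
print(f"{keyphrase_density(sample, 'rise and fall'):.0%}")  # prints 60%
print(top_keywords(sample, count=3))
```

A density as high as the one in this toy sample would read like exactly the kind of “gibberish” the search engines watch for, so treat a number like this as a rough check on your own pages rather than a target to maximize.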
You may hear people refer to something called “search engine spam”. While this term is applied to any number of things, one of the best definitions I've seen is this one from searchenginewatch.com: Search engine spam is “pages created deliberately to trick the search engine into offering inappropriate, redundant, or poor-quality search results.” One example of this would be a page advertising lug nuts that contained dozens of spurious references to pop icon Britney Spears just for the sake of having the page pop up on searches for articles about Britney. That would differ from a non-spammer doing the same thing IF Britney were endorsing that product, sharing a testimonial about it, etc. While you should feel free to do things that improve your site's ranking in search engine results where it's RELEVANT, you should be very careful to avoid doing anything that meets the definition of search engine spamming. The experts working at the search engine companies know the difference, and their software knows it, too. If they think you're spamming, you'll find your site blacklisted quicker than Britney can hold a press conference.
How Do I Improve My Pages’ Ranking in Search Engines?
The short answer to this question is that, in the final analysis, you really can’t choose where your page comes up in the search results. But there are lots of things you can do that will make a big difference in whether you come up third or 300th in the list. These techniques will have different levels of success with different search engines, because each one uses a proprietary (even “patented”) method for deciding what pages go in the index and which ones get listed in which order in search results. Most of the techniques below are developed from my analysis of Google’s patent documentation for its search engine, but some come from experience, observation, and other reading. I’ll cover the “less-obvious” techniques later in this article to help you get through them the first time.
Your placement in search engine results can improve if you can find a way to accomplish as many of the following as possible:
Get people to click on your site in the search results. Google in particular pays attention to how many people click your site (versus the others) in search results and how long they stay on your page. The more people who click your link in the search results and the longer they stay on your page, the better you’ll look to Google and the higher you’ll go in the search results.
Get other sites relevant to your content to link to your page. Google’s spider pays attention to how often other sites link to yours, and whether those links are on pages that seem to relate to your content. Thus, it’s important to get links from sites that are “relevant” to yours. A million links to your web candy store from pages about cars, boats, politics, and the like aren’t going to count nearly as much as a few dozen links from candy fanatics’ web sites.
Make sure your pages are appropriately “dense” with keywords and keyphrases. If you have a web page about the “rise and fall of civilizations” your page probably ought to have quite a few references to the phrase “rise and fall” and use the word “civilization” often. Most search engines determine the relevance of a page by noting how often the keyword or keyphrase appears on that page. If yours only says “rise and fall” and “civilizations” once, while someone else’s mentions those phrases 20 or 30 times, their page will likely outrank yours (all other things being equal).
Register your domain name for at least two years. There are sites which try to “spam” search engines by creating pages filled with garbage text that contains words that search engine users are regularly looking for. When a user clicks on one of these page links, they’re treated to an ad for a typical spam email type of product or service instead of the useful information they expected. When the search engines identify a spammer’s Internet domain name, they may block it from their search results or at least drastically reduce the “relevance” of the page in search results. One way that search engines can identify spammers is by the fact that they generally only register domain names for a single year at a time, since they know the search engines will catch on to them and block that domain, making it useless after a few months. If your site is registered for more than one year at a time, this shows an “intent” to provide meaningful content for a long time. It will help improve your ranking in results, all other things being equal.
Provide useful content. This sounds like a “no brainer”, but realize that people who come to your site are usually there for a reason. They want the answer to a question, like “What colors does your product come in?” or “Do you offer overnight shipping?” or even more detailed information like “Does your warranty cover accidental chipping of the paint?” If your product or service’s web page merely says “We make toasters” without giving the details of your product line, visitors are going to find your site useless. If they were looking for a blue two-slice toaster capable of handling bagels and bread, your “we make toasters” page is not going to provide the answers they need. When they see your page in search results, they may visit it, but they won’t stick around. The search engine will interpret this as “this page wasn’t very relevant”. You’d be amazed how many sites out there think that just showing a picture of a product and a price is enough to generate lots of orders. Sometimes it is. Often it isn’t. Remember, the more good information you provide visitors, the more likely your keyword density will increase, the longer people will visit your page, and the more likely they’ll be to link to it from their own sites. All of these things will improve your placement in search results.
Make sure there’s something for the search engine to “see” on your site. If you want to know what your site looks like to a typical search engine spider, here’s a simple test. Bring up your home page. Right-click it. Choose “View Source” (or the equivalent function; see your browser’s help system). If you see a bunch of HTML code, you are looking at what the spiders are going to be seeing. That means if your home page is a very clever Macromedia Flash splash screen that leads users into the “real” content, all the spider is going to see is a very small HTML page, perhaps with no links on it at all. It will think that this one HTML page is your entire web site. This is one reason that “splash screens” going into a web site are a bad idea, especially if there aren’t some links there a spider can follow into your “real” content.
If you use a content management system, find out what it’s putting into the “META” tags in the finished HTML. To do this, bring up a content page on your site, right-click, and choose “View Source” (or the equivalent function). Look for the “<HEAD>” tag. Inside there should be one or more “<META>” tags that look something like this:
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1" />
<meta name="title" content="How Mac OS X on x86 is Great for Linux" />
<meta name="author" content="Michael Salsbury" />
<meta name="description" content="Michael Salsbury's Web Site, The author discusses why Mac OS X is a good thing for Linux, and how it may become even better when OS X runs on Intel." />
<meta name="keywords" content="Michael, Mike, Salsbury, Mike Salsbury, Blog, Web Site, mikesalsbury, Linux, OS X, Intel, switch, Linux vs. OS X" />
<meta name="Generator" content="Mambo - Site content (c) 2005 by Michael Salsbury. All rights reserved. Mambo (c) 2000-2005 by Miro International Pty. Ltd. All Rights Reserved." />
<meta name="robots" content="index, follow" />
These tags tell the spider the author’s title for the page, the author’s name, a description of the page and its content, the keywords the author thinks are relevant for this page, the name of the content management system that generated the site, and instructions that the “robots” (spiders) should index this page and follow the links it contains.
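If you would rather not read through the page source by hand, the same check can be roughly automated. The sketch below uses Python's standard `html.parser` module to pull the named META tags and the link targets out of a page's HTML, which approximates what a simple spider has to work with. The `SpiderView` class, the `spider_view` helper, and the sample page are all hypothetical names and content made up for this illustration; real spiders are far more sophisticated.

```python
from html.parser import HTMLParser


class SpiderView(HTMLParser):
    """Collect named <meta> tags and <a href> targets from raw HTML."""

    def __init__(self):
        super().__init__()
        self.meta = {}    # lowercased meta name -> content string
        self.links = []   # href values in document order

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names for us.
        attributes = dict(attrs)
        if tag == "meta" and attributes.get("name"):
            self.meta[attributes["name"].lower()] = attributes.get("content") or ""
        elif tag == "a" and attributes.get("href"):
            self.links.append(attributes["href"])


def spider_view(html):
    """Parse an HTML string and return the populated SpiderView."""
    parser = SpiderView()
    parser.feed(html)
    parser.close()
    return parser


# Hypothetical page source, standing in for what "View Source" shows you.
page = """<html><head>
<meta name="description" content="Custom furniture built to order." />
<meta name="keywords" content="furniture, custom, oak" />
</head><body><a href="/catalog.html">Catalog</a></body></html>"""

view = spider_view(page)
for name in ("title", "description", "keywords"):
    print(name, "->", view.meta.get(name, "(missing)"))
print("links a spider could follow:", view.links)
```

If the important tags come back as “(missing)”, or the list of links is empty, that is a hint that the page gives a spider very little to work with.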
Your content management system may not be providing any META tags, or it may not be providing all of them. The important ones are the “title”, “description”, and “keywords” tags, as many search engines use precisely this text to display your page in the search results. The information in the keywords tag should be “minimally redundant” (meaning that the amount of repetition of a given word or phrase should be kept to an absolute minimum).
Keep your content current. A search engine’s spider can tell (thanks to information your web server provides automatically) when you last updated your pages. The more often you update, and the more updates you make, the more “current” and “relevant” your site will appear to the search engine and the higher it will rank in the search results. The “sitemap” file can also help you to tell the search engines how often you expect particular content to be updated, and how important (to you) it is that the content be indexed (note that neither of these alone will improve your ranking at all).
If possible, make sure your domain name reflects the site content. For example, a site about custom furniture will often appear higher in search results for “furniture” if it is named “smithfurniture.com” than if it is named simply “smith.com”.
Avoid common “spamming” techniques. Spammers often put lots of “hidden text” on their pages to increase their keyword density, so that spiders will flag the pages as being especially dense with keywords. They will also cram lots of irrelevant words into the “META” tags (discussed above) thinking that this will improve their ranking in search results (e.g., putting “Britney Spears” on a page about chocolate, even though Ms. Spears isn’t involved in its promotion or advertising, just because lots of people are searching for information about her). Another common spamming technique is to generate lots of “fake” links to your site by creating a number of other sites that serve no purpose other than to link to yours, or by spamming message boards on unrelated sites with links to yours. If your site employs tactics that look like search engine spam, search engines will drop you from the database or push your results to the very bottom. We’ll cover this in more detail later. See also: http://searchenginewatch.com/searchday/article.php/3483601
Avoid the use of frames. Search engine spiders often have problems traversing frames to find the content. If you can avoid the use of frames, more search engines will pick up your site easily.
Build a sitemap for spiders to follow. Google provides information about this: https://www.google.com/webmasters/sitemaps/docs/en/protocol.html (We'll discuss this a little more in the next part of this series.)
Build an index for spiders to follow. Since not all search engines pay attention to sitemaps, building an index file can help. To do this, visit (at least) the relevant pages in your site and build a “plain vanilla” page that does nothing more than link to all the important content on your site. Save this file as plain HTML and make sure it’s accessible to spiders. They’ll grab the page, and from it find links to everything on your site that you care about. Keep this page current and you should find your content picked up by the search engines pretty quickly and regularly. (This will also be discussed in the next part of this series.)
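As a rough illustration of the last two items, the Python sketch below builds both a sitemap file and a “plain vanilla” index page from one list of pages. Several things here are assumptions on my part: the `build_sitemap` and `build_index_page` helpers are hypothetical, the example.com addresses and dates are placeholders, and the XML namespace is the one from the sitemaps.org protocol, so check the search engine's own documentation for the exact format it expects.

```python
from html import escape
import xml.etree.ElementTree as ET

# Namespace from the sitemaps.org protocol (assumed here, see note above).
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(pages):
    """Return sitemap XML for dicts holding 'loc' plus optional
    'lastmod', 'changefreq', and 'priority' values."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for page in pages:
        url = ET.SubElement(urlset, "url")
        for field in ("loc", "lastmod", "changefreq", "priority"):
            if field in page:
                ET.SubElement(url, field).text = str(page[field])
    body = ET.tostring(urlset, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body


def build_index_page(pages, title="Site index"):
    """Return a bare HTML page that does nothing but link to each page."""
    items = "\n".join(
        '<li><a href="{0}">{1}</a></li>'.format(
            escape(page["loc"], quote=True),
            escape(page.get("title", page["loc"])),
        )
        for page in pages
    )
    return (
        "<html><head><title>{0}</title></head><body>\n"
        "<ul>\n{1}\n</ul>\n</body></html>"
    ).format(escape(title), items)


# Placeholder pages; replace with the content you care about.
pages = [
    {"loc": "http://www.example.com/", "title": "Home",
     "lastmod": "2006-01-15", "changefreq": "weekly", "priority": 1.0},
    {"loc": "http://www.example.com/toasters.html", "title": "Toasters",
     "changefreq": "monthly", "priority": 0.8},
]
print(build_sitemap(pages))
print(build_index_page(pages))
```

Generating both files from the same list keeps them in step, so keeping that one list current takes care of the sitemap and the index page together.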
And that's really about it. If you follow these techniques, your pages will move up as far in the results as the users of the search engine think they should move. If you are delivering some really unique content, you will find yourself moving up fairly quickly. If you aren't, don't expect to be in the top 10.
That's all for now... In the next installment of this series, we'll look at how content management systems can affect your position in search results, or even whether you appear in the results at all.
|