In the first article in this series, we saw how search engines
work. They have “spiders” which crawl the web looking for pages
to store in a database, a “search page” that people use to find
web pages of interest, and a “search engine” that helps translate
what people ask for into a list of pages that seem to be most
relevant. The process looks like this:

But there is a lot more to the “search engine” block in this
diagram than there is to any other part of the overall search engine
system. As a webmaster, you should be asking yourself several
questions about the search engine at this point:
- Once the search engine has found a site, how does it know what
  the pages on that site are about? (“How can I help the search
  engine figure out what my site is all about?”)
- How does the search engine determine which pages to place at
  the top of the list, and why? (“How can I move my site farther
  up the list?”)
- What is “Search Engine Spam”? (“How do I avoid looking like
  Search Engine Spam to a spider?”)
We'll be discussing the answers to some of these questions in this
article. The rest will be covered in future articles in this series.
How do I get my site listed with the search engines?
If your site has been around for a
while, the major search engines may already have found it. But if
it's a new site, or one that isn't widely publicized, the search
engines may not know you exist. The first thing you need to do is
make sure they can find your site. If they can't find your site,
neither can your intended audience.
According to SearchEngineWatch.com,
the following search sites accounted for the vast majority of all
search traffic on the Internet during July 2005:
Add those together and you'll find
that they represent 99.4% of the search engine traffic you're likely
to receive at your site. If you want to get listed on each of those
sites, click their names to go (as directly as I can get you) to
their submission pages. As you read the information on the other
search sites, you may find that they draw from Google in one way or
another. AOL's search, for example, is provided by Google.
InfoSpace draws information from Google, Yahoo, and others, but
charges to submit your site to its engine. Since it provides less
than 1% of the traffic we're likely to get, I don't think that
paying for its service is a smart idea (unless, perhaps, you've
surveyed your potential audience and found that a lot of them use
it).
Once you've submitted your site to one
of these search engines, it will be added to the spider's list of
sites to crawl. A well-behaved spider will first look for a file
called “robots.txt” at the top level of your site, which tells it
which parts of your site you DON'T want included in the database.
It will then request your home page and scan that page for links to
other pages on your site. It will then visit those pages and scan
them for links, and so on. It may index all of your site in one
visit, or it may take several visits over a period of months to pick
everything up. Rest assured that eventually it will find you and
include you in its system.
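To make the robots.txt step concrete, here is a minimal Python sketch of the check a polite spider might run before requesting a page. It uses the standard library's urllib.robotparser module. The robots.txt contents, the www.example.com URLs, the allowed_urls helper, and the “ExampleBot” user agent are all hypothetical, and a real spider would download the file from your site rather than read it from a string.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt; a real spider would download it from the top
# level of the site (e.g. https://www.example.com/robots.txt) first.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
Disallow: /cgi-bin/
"""

def allowed_urls(robots_txt, user_agent, urls):
    """Return the subset of urls that robots_txt lets user_agent fetch."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [url for url in urls if parser.can_fetch(user_agent, url)]

# Hypothetical pages the spider has discovered so far.
candidates = [
    "https://www.example.com/",
    "https://www.example.com/articles/model-t.html",
    "https://www.example.com/private/drafts.html",
]
print(allowed_urls(ROBOTS_TXT, "ExampleBot", candidates))
```

Only the first two URLs come back; the page under /private/ is skipped, which is exactly the “DON'T include this” instruction described above.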
What does my site look like to a spider?
If you thought a spider
sees your web page the same way you do in Firefox, Internet Explorer,
or Netscape, think again. If you want to see what your site looks
like to a search engine spider, here's a simple test. Launch your
web browser and go to your site's home page. Right-click the page
and select the “View Source” or “View Page Source” option
from the pop-up menu. What you see in the resulting window is just
what the spider will see. Got a fancy Macromedia Flash menu?
Surprise! The spider can't see that. Got lots of complicated frames
and graphics content? The spider can't see that either. All it sees
is the raw HTML output of your server. No graphics, no Macromedia
content, no sounds, nothing but HTML.
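The “View Source” test can also be approximated in code. This sketch, assuming Python's standard html.parser module and a made-up home page, pulls out only the href targets of ordinary <a> tags, which is roughly the set of links a simple spider would follow. The LinkCollector class and extract_links helper are hypothetical names for illustration.

```python
from html.parser import HTMLParser

class LinkCollector(HTMLParser):
    """Records the href of every <a> tag: roughly the links a spider follows."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs with lowercased names.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

def extract_links(html):
    collector = LinkCollector()
    collector.feed(html)
    collector.close()
    return collector.links

# Hypothetical home page: a Flash menu, a banner image, and one HTML link.
home_page = """
<html><body>
  <object data="menu.swf" type="application/x-shockwave-flash"></object>
  <img src="banner.gif" alt="Welcome">
  <a href="/articles/index.html">All articles</a>
</body></html>
"""
print(extract_links(home_page))  # ['/articles/index.html']
```

The Flash menu and the banner image contribute nothing; the one plain HTML link is the only way deeper into this site.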
Put yourself in the shoes
of the spider. Are there links here that a spider can follow to your
content? If not, you've got a real problem. There should be at
least one HTML link here that the spider can take to delve further
into your site. From there, it should be able to find more... and
more... until it finds everything you'd consider “relevant” on
your site. The deeper the spider has to dig, the longer it takes and
the less likely it is that the content at that level will ever be
fully indexed. This is because most spiders are coded with “limits”
that keep them from spending too much crawl time at any one site. If
the spider is allotted 1 minute per site, for example, and it can
only get about 3 levels deep into your 7-level site in that time, the
content you've got at levels 4 through 7 will probably never show up
in a search engine.
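Here is a rough model of that limit in Python. It is a simplification: it caps crawl depth directly instead of crawl time, and the seven-level site (each level linking only to the next) and the crawl helper are hypothetical. The crawl is breadth-first, so every page at one level is visited before any page at the next.

```python
from collections import deque

def crawl(links, home, max_depth):
    """Breadth-first crawl of an in-memory link graph.

    links maps each page to the pages it links to. The home page counts as
    level 1; pages deeper than max_depth are never reached. Returns a dict
    of page -> level for every page the crawl found.
    """
    level = {home: 1}
    queue = deque([home])
    while queue:
        page = queue.popleft()
        if level[page] >= max_depth:
            continue  # limit reached: don't follow this page's links
        for target in links.get(page, []):
            if target not in level:  # skip pages already found
                level[target] = level[page] + 1
                queue.append(target)
    return level

# Hypothetical 7-level site: each level links only to the next one down.
site = {f"level{i}.html": [f"level{i + 1}.html"] for i in range(1, 7)}

found = crawl(site, "level1.html", max_depth=3)
print(sorted(found))  # ['level1.html', 'level2.html', 'level3.html']
```

Levels 4 through 7 never appear in the result, which is the situation described above.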
One way to help the search
engine spiders help you is to provide them with an index to your
site. If you can do it, I recommend creating a single page, very
close to the top level of your site's hierarchy, that contains a
link to every important piece of content you have. This way, the
spider will find that content early in its crawl, and there is a
better chance it will get everything you have to offer.
With respect to Google,
there is a feature called “Sitemaps” that they're working on.
You can help them work on it by submitting your site to them and
including a “sitemap.xml” file for their spider to pick up. This
file (like the index we discussed above) gives them a list of every
relevant page on your site, your own ranking of how important it is
to you that each page be indexed, and an indicator of how often you
expect each page to change. With this file, Google's spider is
better able to judge what content is available on your site and in
what order it should crawl through it. There is a good chance that
this will improve your overall coverage in that search engine.
Looking long-term, I expect the other search engines to pick up on
this file eventually and incorporate it into their processes, too,
so it may help you with sites other than Google.
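A sitemap file can be generated in a similar way. The sketch below uses Python's xml.etree.ElementTree and the element names of the Sitemaps protocol (urlset, url, loc, changefreq, priority). It assumes the sitemaps.org 0.9 namespace used by the current version of the protocol; the beta described above may have used a different one, so check Google's current documentation before relying on it. The build_sitemap helper, the URLs, and the values are hypothetical.

```python
import xml.etree.ElementTree as ET

# Namespace of the current Sitemaps protocol (an assumption; see above).
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"

def build_sitemap(pages):
    """Build a <urlset> element from (url, changefreq, priority) tuples.

    changefreq is one of: always, hourly, daily, weekly, monthly, yearly, never.
    priority runs from 0.0 (least important) to 1.0 (most important).
    """
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for url, changefreq, priority in pages:
        entry = ET.SubElement(urlset, "url")
        ET.SubElement(entry, "loc").text = url
        ET.SubElement(entry, "changefreq").text = changefreq
        ET.SubElement(entry, "priority").text = f"{priority:.1f}"
    return urlset

# Hypothetical pages on a hypothetical site.
pages = [
    ("https://www.example.com/", "weekly", 1.0),
    ("https://www.example.com/articles/model-t.html", "monthly", 0.8),
]

sitemap = build_sitemap(pages)
print(ET.tostring(sitemap, encoding="unicode"))
# To save it at the top level of your site:
# ET.ElementTree(sitemap).write("sitemap.xml", encoding="UTF-8", xml_declaration=True)
```

The priority value is your own ranking of each page, and changefreq is your estimate of how often it changes, matching the two pieces of information described above.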
How does the search engine classify the pages the spiders find?
When a spider crawls
through a page on your site, it will place a copy of that page in a
“cache” in the search engine's database. The search engine will
scan through that cached page to determine what words and phrases it
finds there. It will then “link” your page to those words and
phrases, known as “keywords” and “keyphrases”. The more
times it finds a particular keyword or keyphrase mentioned on your
page, the more it thinks your page is relevant to searches for that
particular keyword or keyphrase. Thus, it's important to make sure
that the keywords people are likely to use to find your site appear
on it as often as possible without making it sound ridiculous.
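Counting keyword occurrences is easy to sketch. The count_phrase function below is a hypothetical helper, not how any particular search engine works: it counts case-insensitive, whole-word matches of a keyphrase, so “model t ford” counts but “Model T Fords” does not.

```python
import re

def count_phrase(text, phrase):
    """Count case-insensitive, whole-word occurrences of phrase in text."""
    # Escape each word, allow any run of whitespace between words, and
    # require word boundaries at both ends.
    words = [re.escape(word) for word in phrase.split()]
    pattern = r"\b" + r"\s+".join(words) + r"\b"
    return len(re.findall(pattern, text, flags=re.IGNORECASE))

page = (
    "The Model T Ford went on sale in 1908. Collectors still restore "
    "the model t ford today, and parts for Model T Fords are traded at shows."
)
print(count_phrase(page, "Model T Ford"))  # 2
```

Real engines may also treat plurals and other word forms as matches; this sketch deliberately does not.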
For instance, if I create a
page about the Model T Ford automobile, it might be the best and most
useful page about that car on the entire Internet. But if I only
mention the phrase “Model T Ford” 2-3 times in that page, while
someone else mentions it 20 or 30 times in the same size page, the
search engines (generally speaking) are going to think that other
page is “more relevant” than mine and place it higher up in the
list of results. Chances are that people will visit the pages higher
in the list before getting to mine, and they might find their answers
before they get to my site. In this example, that may be
disappointing to me (if I'm a Model T collector, for instance, and I
want to share what I know) or it could be catastrophic (if I'm
selling Model T accessories and no one ever comes to my site to buy
them).
This can be carried too
far. If you simply filled the page with the phrase “Model T Ford”
over and over again, you could theoretically go way up in the search
engine rankings. However, your page would read like gibberish to any
human being who visited it, and they'd make a mental note never to
visit your ridiculous site again. Similarly, if you mentioned that
phrase in every sentence, your readers would get very annoyed
with you. So the key is to use the keywords and phrases as often as
you can without making the page appear silly or garbled. Besides,
people called “search engine spammers” have used this tactic
(filling a page up with commonly-searched words and phrases) in the
past, and most of the search engines contain safeguards that prevent
such a page from ever getting very high in the results. In fact, if
they suspect you of doing it, they may very well delete all the pages
on your site from their database and never visit your site again.
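Search engines don't publish their spam safeguards, so the following is purely illustrative: a keyword density measure (the share of a page's words taken up by a keyphrase) compared against an arbitrary, hypothetical cutoff. The keyword_density helper and the STUFFING_THRESHOLD value are made up for this example. It shows why a page that repeats “Model T Ford” over and over stands out from one that simply mentions it.

```python
import re

# Hypothetical cutoff chosen for illustration only; real search engines
# don't publish the thresholds (if any) they use.
STUFFING_THRESHOLD = 0.25

def keyword_density(text, phrase):
    """Fraction of the page's words that belong to occurrences of phrase."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    target = phrase.lower().split()
    n = len(target)
    hits = sum(
        1 for i in range(len(words) - n + 1) if words[i:i + n] == target
    )
    return hits * n / len(words) if words else 0.0

natural = (
    "Henry Ford introduced the Model T Ford in 1908, and it quickly became "
    "the car that put ordinary families on the road. Collectors today still "
    "restore these cars and trade parts for them at shows across the country."
)
stuffed = "Model T Ford " * 20

for name, text in [("natural", natural), ("stuffed", stuffed)]:
    density = keyword_density(text, "Model T Ford")
    flag = "looks stuffed" if density > STUFFING_THRESHOLD else "ok"
    print(f"{name}: {density:.2f} ({flag})")
# natural: 0.08 (ok)
# stuffed: 1.00 (looks stuffed)
```

A single number like this is easy to game, which is one reason real safeguards are likely to be more involved than any fixed cutoff.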
That's all for now...
In the next installment of this
series, we'll take a closer look at how search engines rank the pages
they display in their search results to users. In doing so, we'll
learn ways we can improve our site's ranking in the results and thus
get more traffic to our site.