This article is the first in a series on this web site about search engines, optimizing your site for them, improving your site’s ranking, and related topics. It explains at a high level what a search engine is, what it does, and how it does it, and it provides the basis for future discussions.
The World Wide Web contains many
millions of web sites, and each of those sites can contain tens or hundreds of
individual pages of content. If there were no such
thing as a search engine, finding the page or pages you want among
the millions available would be difficult at best
and impossible at worst. This is the void
that search engines fill on the Internet. You can tell a search
engine that you’re interested in “Ancient burial masks” and it
will try to find pages from the millions available that talk about
that subject. It will then give you a list of what it found and let
you decide which ones might be helpful to you.
How does it do that?
At a high level, the way a search engine works is quite simple, but it quickly becomes complicated once you
start asking questions like “How does the search engine know to put
Ford Motor Company’s web site first when I ask about the 2006
Mustang?” Let’s start with the simple answer, then dig more
deeply as we go. There are three basic parts to a search
engine: the “spider”, the search engine itself with its “database”, and the “search
page”.

If you’ve ever done a search on Google, MSN Search, Yahoo, etc.,
you’ve seen the “search page”, also called the “search engine user interface”. That’s the “public face”
the search engine displays to the world. The information that the
search page gives you comes from something you can’t really see:
the “database” used by the search engine. The database is something like the
index at the back of a printed book. It contains lists of words and
phrases (known as “keywords” and “keyphrases”) and pointers
to web pages that contain those words and phrases. It also keeps
track of how “relevant” a particular page is to a particular
keyword/keyphrase and much more (some of which we’ll discuss later
on). When you type in a request like “Ford Mustang 2006
specifications”, the search engine (more or less) looks in its
index/database and (hypothetically) finds the keyphrase “Ford
Mustang” associated with 10 million web pages, sees that “2006”
appears on only 200 of those pages, then finds that “specifications”
appears on only 20 of those 200. It then displays the final 20 pages
for you, in the order that it thinks makes the most sense based on
what its database tells it about those 20 pages.
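The lookup described above can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions, not how any real search engine is built: the page addresses and text are made up, the function names `build_index` and `search` are hypothetical, the index stores single words rather than keyphrases, and the relevance ordering mentioned above is left out entirely.

```python
from collections import defaultdict

def build_index(pages):
    """Map each lowercase word to the set of page addresses containing it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in text.lower().split():
            index[word].add(url)
    return index

def search(index, query):
    """Return the pages that contain every word in the query."""
    words = query.lower().split()
    if not words:
        return set()
    # Start with the pages for the first word, then narrow word by word,
    # much like going from 10 million pages to 200 to 20 in the text.
    results = set(index.get(words[0], set()))
    for word in words[1:]:
        results &= index.get(word, set())
    return results

# Made-up pages standing in for the web (hypothetical addresses and text).
pages = {
    "example.com/a": "Ford Mustang 2006 specifications and pricing",
    "example.com/b": "Ford Mustang history",
    "example.com/c": "2006 Ford Mustang photo gallery",
}
index = build_index(pages)
matches = search(index, "Ford Mustang 2006 specifications")
print(matches)
```

Each extra word in the request can only shrink the list of matching pages, which is why a more specific request usually returns fewer results.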
If all you do is use a search engine to
find web pages that interest you, the above is probably all you will
ever need to know (i.e., I type something in, it finds pages that are
probably relevant, and shows me a list of them). But if you actually
manage a web site and create web content, you should want
to know a lot more. The first question you should be asking is
“Where does the search engine’s database get the information that
it has about web pages?” This is where the “spider” comes in.
Introducing the Search Engine Spider
The search engine’s “spider”
isn’t an 8-legged arachnid, but a sophisticated software program
written by the search engine’s programming staff. Like its
creepy namesake, a “spider” spends its time “crawling” the
World Wide Web looking for web pages that aren’t in its database,
and updating the information it has about pages that already exist
there (such as “this page isn’t there anymore” or “this page
has been updated”). In reality, a search engine probably has
hundreds or thousands of spiders crawling the web at any given moment and
updating the information in its database. Those
spiders and the database run on a large number of computers in the
search engine’s data center, all connected to the Internet.
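To make the idea of “crawling” more concrete, here is a minimal Python sketch of a single spider. It is an assumption-laden toy, not a real spider: instead of fetching pages over the Internet it walks a small made-up dictionary called `TOY_WEB`, and the function name `crawl` and all of the page addresses are hypothetical. It starts at one page, follows links to pages it has not seen yet, and records a missing page as gone.

```python
from collections import deque

# Hypothetical miniature "web": each address maps to (page text, links on the page).
TOY_WEB = {
    "site.example/home": ("Welcome", ["site.example/about", "site.example/news"]),
    "site.example/about": ("About us", ["site.example/home"]),
    "site.example/news": ("Latest news", ["site.example/about", "site.example/gone"]),
}

def crawl(start_url, web):
    """Visit every page reachable from start_url and record what was found."""
    database = {}
    queue = deque([start_url])   # pages waiting to be visited
    seen = {start_url}           # pages already queued, so none is visited twice
    while queue:
        url = queue.popleft()
        page = web.get(url)
        if page is None:
            # The link points to a page that no longer exists.
            database[url] = None
            continue
        text, links = page
        database[url] = text
        for link in links:
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return database

database = crawl("site.example/home", TOY_WEB)
print(database)
```

A real spider would download each page over the network and hand its text to the indexing step, but the loop of "visit a page, note its links, visit the ones not yet seen" is the same basic idea.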
But wait, there's more...
This, at the highest level, is what a
search engine is and what it does. It sends “spiders” out to
crawl over the World Wide Web, collecting information about the web
pages it encounters there. It stores that information in a giant
“database” which helps the search engine figure out how to
respond to requests people enter on the “search page”. But there's more to it than that, which we'll discuss in the next article in this series.