Introduction to Search Engines
Written by Michael Salsbury   
Tuesday, 06 September 2005

This article is the first in a series on this web site about search engines, optimizing your site for them, and improving your site's ranking. It explains at a high level what a search engine is, what it does, and how it does it, which provides the basis for future discussions.

The World Wide Web contains many millions of web sites. Each of those sites may contain tens or hundreds of individual pages of content (information). If there were no such thing as a search engine, finding the page or pages you want out of the millions available would be difficult at best and impossible at worst. This is the void that search engines fill on the Internet. You can tell a search engine that you’re interested in “Ancient burial masks” and it will try to find pages from the millions available that talk about that subject. It will then give you a list of what it found and let you decide which ones might be helpful to you.

How does it do that?


At a high level, the way a search engine works is quite simple, but it quickly becomes complicated if you start asking questions like “How does the search engine know to put Ford Motor Company’s web site first when I ask about the 2006 Mustang?” Let’s start with the simple answer, then dig more deeply into it as we go. There are three basic parts to a search engine: the “spider”, the search engine with its “database”, and the “search page”.




If you’ve ever done a search on Google, MSN Search, Yahoo, etc., you’ve seen the “search page” or “search engine user interface”. That’s the “public face” the search engine displays to the world. The information that the search page gives you comes from something you can’t really see, which is the “database” used by the “search engine”. The database is something like the index at the back of a printed book. It contains lists of words and phrases (known as “keywords” and “keyphrases”) and pointers to web pages that contain those words and phrases. It also keeps track of how “relevant” a particular page is to a particular keyword/keyphrase and much more (some of which we’ll discuss later on). When you type in a request like “Ford Mustang 2006 specifications”, the search engine (more or less) looks in its index/database and (hypothetically) finds the keyphrase “Ford Mustang” associated with 10 million web pages, sees that “2006” appears on only 200 of those pages, then finds that “specifications” appears on only 20 of those 200. It then displays the final 20 pages for you, in the order that it thinks makes the most sense based on what its database tells it about those 20 pages.
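
To make the book-index comparison concrete, here is a minimal sketch of that kind of lookup in Python. It is only an illustration, not how any real search engine is built: the page addresses, the sample text, and the function names `build_index` and `search` are all made up for this example, and it leaves out the “relevance” ordering entirely.

```python
from collections import defaultdict

# Hypothetical pages standing in for the web (address -> page text).
PAGES = {
    "example.com/mustang-2006-specs": "Ford Mustang 2006 specifications and engine options",
    "example.com/mustang-history": "Ford Mustang history from 1964 to 2006",
    "example.com/burial-masks": "Ancient burial masks of Egypt",
}

def build_index(pages):
    """Map each lowercase word to the set of addresses whose text contains it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in text.lower().split():
            index[word].add(url)
    return index

def search(index, query):
    """Return the addresses containing every query word, narrowing word by word."""
    words = query.lower().split()
    if not words:
        return set()
    results = set(index.get(words[0], set()))
    for word in words[1:]:
        # Keep only the pages that also contain this word.
        results &= index.get(word, set())
    return results

index = build_index(PAGES)
print(sorted(search(index, "Ford Mustang 2006 specifications")))
```

Each extra word in the request shrinks the set of matching pages, just as in the 10 million, 200, 20 example above. A real search engine would then sort whatever is left by relevance before showing it to you.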

If all you do is use a search engine to find web pages that interest you, the above is probably all you will ever need to know (that is, you type something in, it finds pages that are probably relevant, and it shows you a list of them). But if you actually manage a web site and create web content, you will want to know a lot more. The first question you should be asking is “Where does the search engine’s database get the information that it has about web pages?” This is where the “spider” comes in.

Introducing the Search Engine Spider

The search engine’s “spider” isn’t an 8-legged arachnid, but a sophisticated software program written by the search engine’s programming staff. Like its creepy namesake, a “spider” spends its time “crawling” the World Wide Web, looking for web pages that aren’t in its database and updating the information it has about pages that are already there (such as “this page isn’t there anymore” or “this page has been updated”). In reality, the search engine probably has hundreds or thousands of spiders constantly crawling the web and updating the information in the search engine database at any given moment. Those spiders and the database run on a large number of computers in the search engine’s data center, all hooked up to the Internet.
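
The crawling idea can be sketched in a few lines of Python. This is a toy under loud assumptions: `FAKE_WEB` is a made-up, in-memory stand-in for the web, the addresses are invented, and `crawl` is a hypothetical name. A real spider would fetch pages over the network, pull the links out of the HTML, and do a great deal more.

```python
from collections import deque

# Hypothetical stand-in for the web: address -> (page text, outgoing links).
FAKE_WEB = {
    "a.example/": ("Welcome page", ["a.example/cars", "b.example/"]),
    "a.example/cars": ("Ford Mustang 2006", ["a.example/", "a.example/gone"]),
    "b.example/": ("Ancient burial masks", []),
}

def crawl(seed, web, database):
    """Follow links outward from seed, adding or refreshing pages in the database."""
    queue = deque([seed])
    seen = {seed}
    while queue:
        url = queue.popleft()
        page = web.get(url)
        if page is None:
            # "This page isn't there anymore": drop any copy we were holding.
            database.pop(url, None)
            continue
        text, links = page
        database[url] = text  # a new page, or a fresh copy of a known one
        for link in links:
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return database

# The database starts out holding a stale copy of a page that no longer exists.
database = {"a.example/gone": "stale copy of a deleted page"}
crawl("a.example/", FAKE_WEB, database)
print(sorted(database))
```

Starting from a single page, the spider discovers the other two by following links, and it removes the entry for the page that has disappeared.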

But wait, there's more...

This, at the highest level, is what a search engine is and what it does. It sends “spiders” out to crawl over the World Wide Web, collecting information about the web pages they encounter there. It stores that information in a giant “database”, which helps the search engine figure out how to respond to requests people enter on the “search page”. But there's more to it than that, which we'll discuss in the next article in this series.


Last Updated ( Thursday, 30 March 2006 )