What is a Search Engine

Posted by Shane Quigley on August 8th, 2006

Digital Marketing, SEO

90% of all websites visited on the Internet are found through search engines. They are essentially a mechanism for consumers to find the product, service and information they require, from a staggeringly vast amount of information.

Things are by no means as simple as they may appear where search engines are involved. This guide is a framework for designing and optimising websites to perform well in any search engine but predominantly Google, since it accounts for, in my experience, up to 85% of all traffic generated through search engines.

How do they work?
Search engines do not simply search every website on the Internet and then present you with a list of the most relevant sites; it is much more complicated than that. If search engines were to check every site in “real time”, it would take months to return any results. You know how long it takes to search something as simple as your emails for a word or phrase; the Internet is vastly larger and increasingly complex. So the search engines had to find a more manageable solution to the problem. This is where software programs called robots and spiders come into the frame.

These programs move around the Internet, extracting content and other items from the websites they find and storing these details in the search engine’s own database; this is called crawling or indexing. Once this information has been collected, the search engines analyse it against a wide range of factors and attach scores to the site for attributes such as keyword occurrence, number of links and amount of content.
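
As a rough illustration of the link-following idea, here is a minimal Python sketch. It assumes a toy "web" held in a dictionary rather than real sites; the URLs, the `crawl` function and the stored fields are all made up for the example and are not how any particular search engine does it.

```python
from collections import deque

# A toy "web": each hypothetical URL maps to its page text and outgoing links.
TOY_WEB = {
    "a.example/": ("welcome to our shop", ["a.example/shoes", "b.example/"]),
    "a.example/shoes": ("red shoes and blue shoes", ["a.example/"]),
    "b.example/": ("a blog about shoes", []),
    "c.example/": ("nobody links here", []),
}


def crawl(start_url, web):
    """Follow links breadth-first from start_url and store each page found."""
    database = {}
    queue = deque([start_url])
    while queue:
        url = queue.popleft()
        if url in database or url not in web:
            continue  # already stored, or the link leads nowhere
        text, links = web[url]
        # Store the content plus two simple attributes for later scoring.
        database[url] = {
            "text": text,
            "word_count": len(text.split()),
            "link_count": len(links),
        }
        queue.extend(links)
    return database


database = crawl("a.example/", TOY_WEB)
print(sorted(database))
```

Note that `c.example/` never reaches the database: nothing links to it and it was not the starting point, so the robot has no way of finding it.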

So essentially, when you use a search engine like Google, you are searching its database, not the Internet directly. That is why it is vitally important to make sure your site has been submitted to the main feeder engines, which means you will at least have a chance of being found.
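
To make the "searching a database, not the Internet" point concrete, here is a small Python sketch of a word index built from pages that have already been collected. The page contents and the function names `build_index` and `search` are assumptions for illustration only.

```python
from collections import defaultdict

# Hypothetical pages already collected by a robot (URL -> extracted text).
stored_pages = {
    "a.example/": "welcome to our shoe shop",
    "a.example/boots": "leather boots and walking boots",
    "b.example/": "a blog about walking holidays",
}


def build_index(pages):
    """Map every word to the set of URLs whose text contains it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in text.lower().split():
            index[word].add(url)
    return index


def search(index, query):
    """Return URLs containing every word of the query, using only the index."""
    words = query.lower().split()
    if not words:
        return []
    matches = set.intersection(*(index.get(word, set()) for word in words))
    return sorted(matches)


index = build_index(stored_pages)
print(search(index, "walking"))
print(search(index, "walking boots"))
```

The search never touches the pages themselves, only the index, which is why a site that has not been collected cannot appear in the results however relevant it is.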

Understanding the calculations the different search engines use to appraise sites is what is so important in achieving high rankings. If we know how to optimise sites for search engines, we can begin to see results and this means more traffic, which could equal more business.

More about search engine robots:
Search engine robots, as mentioned earlier, are the tools search engines use to gather content and information regarding the different sites available on the Internet. They are sometimes called “spiders”, “bots” or “crawlers”, due to the way they move from link to link, through a site.

The robots seek out websites, checking for new sites, new pages and any changes to existing content. Once they have gathered this information, they pass it into the database for “indexing”, where it is evaluated.

There are 3 possible reasons a robot will visit your site:

  1. You submitted the URL to the search engine through its submission pages.
  2. The robot has found your site from another website linking to you, known as an external link.
  3. The robot knows you exist and is checking to see if your content has changed or been updated.

Robots are the first key to search engine optimisation. If you do not understand how they move around a site and what kind of navigation system they can follow, then your content is irrelevant because the robot won’t be able to find it.

Robots, at the moment, are relatively simple pieces of programming that have evolved from the simple text browsers used in the early days of web browsing. They are able to read most text and code but struggle with the following:

‒ Frames and framesets, which is why designers have largely abandoned them.
‒ Flash animations and navigation systems.
‒ Invalid code or coding practices.
‒ Text contained in images.
‒ Dynamically created URLs.
‒ Some JavaScript and JavaScript navigation systems.
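
A couple of the limitations above can be seen with a short Python sketch. It uses the standard library's `html.parser` as a stand-in for a simple text-reading robot; the `SimpleRobot` class and the sample page are made up for the example, and real robots are more capable than this.

```python
from html.parser import HTMLParser


class SimpleRobot(HTMLParser):
    """Collects only what a basic text-reading robot can see on a page."""

    def __init__(self):
        super().__init__()
        self.links = []
        self.words = []
        self._skip = 0  # depth inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1
        elif tag == "a":
            href = dict(attrs).get("href")
            # Only plain links count; javascript: pseudo-links are ignored.
            if href and not href.startswith("javascript:"):
                self.links.append(href)

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if not self._skip:
            self.words.extend(data.split())


# A hypothetical page mixing a plain link, an image and script navigation.
PAGE = """
<html><body>
  <h1>Garden Tools</h1>
  <a href="/spades.html">Spades</a>
  <img src="/banner.gif">
  <span onclick="window.location='/rakes.html'">Rakes</span>
  <script>goTo('/hoes.html');</script>
</body></html>
"""

robot = SimpleRobot()
robot.feed(PAGE)
robot.close()
print(robot.links)
print(robot.words)
```

Only the plain link to `/spades.html` is discovered. Any wording inside the banner image is invisible, and the rakes and hoes pages are never found because the only routes to them are through JavaScript.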

Assuming the robots can find your site, the first thing they will check is your “robots.txt” file. Not all sites have one; if you do, it will be in what’s called your root directory, the base folder your site is stored in. This file informs the robot whether it is allowed to index your site and directs it where it can and cannot go.
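
For anyone curious how such a file is read, here is a short Python sketch using the standard library's `urllib.robotparser`. The file contents and the `www.example.com` address are invented for the example; your own robots.txt will differ.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical contents of a robots.txt file in a site's root directory.
ROBOTS_TXT = """\
User-agent: *
Disallow: /admin/
Disallow: /cgi-bin/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# Ask whether a robot calling itself "Googlebot" may fetch each page.
for path in ("/index.html", "/admin/login.html"):
    url = "http://www.example.com" + path
    allowed = parser.can_fetch("Googlebot", url)
    print(path, "allowed" if allowed else "blocked")
```

The `User-agent: *` line applies the rules to every robot, so anything under `/admin/` or `/cgi-bin/` is off limits while the rest of the site can be indexed.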

Your statistics package for your website should tell you which robots have been visiting you, how often they are coming and how many pages they are indexing per visit. The main three robots are Yahoo’s, Google’s and Microsoft’s, identified respectively as Inktomi Slurp, Googlebot and MSNBot.
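
If your statistics package does not break this down, a rough count can be taken from the raw server log. The Python sketch below assumes a "combined" style log where the user agent is the last quoted field; the log lines are made up for illustration, and the tokens it looks for (`Googlebot`, `Slurp`, `msnbot`) are simply fragments of the robot names.

```python
from collections import Counter

# User-agent fragments for the three main robots, mapped to their owners.
ROBOT_TOKENS = {
    "Googlebot": "Google",
    "Slurp": "Yahoo",
    "msnbot": "Microsoft",
}

# Made-up log lines, for illustration only.
SAMPLE_LOG = [
    '192.0.2.1 - - [08/Aug/2006:10:00:00 +0000] "GET / HTTP/1.1" 200 512 "-" "Googlebot/2.1 (+http://www.google.com/bot.html)"',
    '192.0.2.1 - - [08/Aug/2006:10:00:05 +0000] "GET /about.html HTTP/1.1" 200 734 "-" "Googlebot/2.1 (+http://www.google.com/bot.html)"',
    '192.0.2.7 - - [08/Aug/2006:11:30:00 +0000] "GET / HTTP/1.0" 200 512 "-" "Mozilla/5.0 (compatible; Yahoo! Slurp; http://help.yahoo.com/help/us/ysearch/slurp)"',
    '192.0.2.9 - - [08/Aug/2006:12:15:00 +0000] "GET / HTTP/1.0" 200 512 "-" "msnbot/1.0 (+http://search.msn.com/msnbot.htm)"',
    '192.0.2.50 - - [08/Aug/2006:12:20:00 +0000] "GET / HTTP/1.1" 200 512 "-" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)"',
]


def count_robot_hits(lines):
    """Count page requests per search engine, based on the user-agent field."""
    hits = Counter()
    for line in lines:
        parts = line.rsplit('"', 2)  # the user agent is the last quoted field
        if len(parts) < 3:
            continue  # not in the expected format
        user_agent = parts[1].lower()
        for token, owner in ROBOT_TOKENS.items():
            if token.lower() in user_agent:
                hits[owner] += 1
                break
    return hits


hits = count_robot_hits(SAMPLE_LOG)
for owner, count in hits.most_common():
    print(owner, count)
```

The ordinary browser visit on the last line is ignored, so what remains is a picture of how many pages each robot requested.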

In order to do well in search engines you want them to visit you every day and do comprehensive, deep level indexing. So, what happens once all your information has been gathered? The search engines index the content!

Search engine indexing:
Once your content has been extracted, it is indexed. Each search engine has its own unique system for evaluating this information, called an algorithm. This is essentially a complex mathematical equation that weighs up all the factors discussed in this document, compares them to every other site and allocates a point score to your site.
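
Nobody outside the search companies knows the real equations, but the general idea of weighing factors into a point score can be sketched in Python. The weights, factor names and site figures below are entirely hypothetical.

```python
# Hypothetical weights; real search engines do not publish theirs.
WEIGHTS = {
    "keyword_occurrences": 3.0,
    "inbound_links": 2.0,
    "word_count": 0.005,
}

# Made-up measurements for three imaginary sites.
SITES = {
    "a.example": {"keyword_occurrences": 4, "inbound_links": 10, "word_count": 800},
    "b.example": {"keyword_occurrences": 9, "inbound_links": 2, "word_count": 300},
    "c.example": {"keyword_occurrences": 1, "inbound_links": 1, "word_count": 5000},
}


def score(factors, weights=WEIGHTS):
    """Weighted sum of a site's factors; missing factors count as zero."""
    return sum(weight * factors.get(name, 0) for name, weight in weights.items())


def rank(sites):
    """Return site names ordered from highest score to lowest."""
    return sorted(sites, key=lambda name: score(sites[name]), reverse=True)


for name in rank(SITES):
    print(name, round(score(SITES[name]), 1))
```

With these particular weights the well-linked site comes out on top. Change the weights and the order changes, which is one way of seeing why the same site can rank differently in different engines.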

Because of the unique nature of these algorithms, it does not necessarily follow that a high ranking in Google will give the same result in Yahoo. The search companies are constantly tweaking and changing their algorithms so that their engine is the one that offers the most relevant results for your keyword search and you will therefore use their service again. They also constantly refine their weighting techniques to make sure they are avoiding the dirty tricks used by unethical optimisers to give artificially high rankings to irrelevant pages. It is for this reason that employing the services of a search engine optimisation firm that lives and breathes these rules, on a daily basis, is imperative. The layman cannot hope to keep up with these changes.

As I write this, the search companies are working on their next generation of robots, with much higher levels of intelligence, that will change the face of optimisation, at least for a while anyway. In my opinion, Google is by far the fairest, most impartial and most intelligent search engine around. It does not allow paid listings to affect its main results and it pushes over 85% of traffic through the sites I monitor.
