How Search Engines Work
The first basic truth you need to learn about SEO is that search
engines are not humans. While this might be obvious to everybody, the
differences between how humans and search engines view web pages
are not. Unlike humans, search engines are text-driven. Although
technology advances rapidly, search engines are far from intelligent
creatures that can feel the beauty of a cool design or enjoy the sounds
and movement in movies. Instead, search engines crawl the Web, looking
at particular site items (mainly text) to get an idea of what a site is
about. This brief explanation is not the most precise because, as we
will see next, search engines perform several activities in order to
deliver search results – crawling, indexing, processing, calculating
relevancy, and retrieving.
First, search engines *crawl* the Web to see what is there. This
task is performed by a piece of software called a crawler or a spider
(or Googlebot, as is the case with Google). Spiders follow links from
one page to another and index everything they find on their way. Given
the number of pages on the Web (over 20 billion), it is
impossible for a spider to visit a site daily just to see if a new page
has appeared or if an existing page has been modified. Sometimes
crawlers will not visit your site for a month or two, so during this
time your SEO efforts will not be rewarded. There is nothing you
can do about it, so just be patient.
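To make the link-following idea concrete, here is a minimal sketch of a crawler. It assumes a tiny in-memory "web" (the `TOY_WEB` dictionary and its URLs are made up for illustration) instead of real HTTP requests, and it is not how any particular search engine implements its spider.

```python
from collections import deque

# Hypothetical miniature "web": each URL maps to the links found on that page.
TOY_WEB = {
    "example.com/": ["example.com/about", "example.com/blog"],
    "example.com/about": ["example.com/"],
    "example.com/blog": ["example.com/blog/post-1", "example.com/about"],
    "example.com/blog/post-1": [],
    "example.com/orphan": [],  # no page links here, so the spider never finds it
}

def crawl(start, web):
    """Visit pages breadth-first by following links; return the visit order."""
    seen = {start}
    queue = deque([start])
    order = []
    while queue:
        url = queue.popleft()
        order.append(url)
        for link in web.get(url, []):
            if link not in seen:  # never queue the same page twice
                seen.add(link)
                queue.append(link)
    return order

visited = crawl("example.com/", TOY_WEB)
print(visited)
```

Note that `example.com/orphan` never shows up in the result: a page that nothing links to cannot be reached by following links.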
What you can do is check what a crawler sees on your site. As
already mentioned, crawlers are not humans, and they do not see images,
Flash movies, JavaScript, frames, or password-protected pages and
directories, so if you have tons of these on your site, you had better
run the *Spider Simulator* below to see if these
goodies are viewable by the spider. If they are not viewable, they will
not be spidered, indexed, or processed – in a word, they will
be non-existent for search engines.
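A rough idea of what a text-driven spider sees can be sketched with Python's standard `html.parser` module. This is only an illustration under simple assumptions (the `TextOnlySpider` class, `simulate_spider` function, and sample page are hypothetical, and this is not the Spider Simulator tool mentioned above): it keeps text and links, and ignores script and style content.

```python
from html.parser import HTMLParser

class TextOnlySpider(HTMLParser):
    """Collects visible text and link targets; skips script/style content."""
    SKIPPED = {"script", "style"}

    def __init__(self):
        super().__init__()
        self.text_parts = []
        self.links = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED:
            self._skip_depth += 1
        elif tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)

    def handle_endtag(self, tag):
        if tag in self.SKIPPED and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep text only when we are outside script/style blocks.
        if self._skip_depth == 0 and data.strip():
            self.text_parts.append(data.strip())

def simulate_spider(html):
    """Return (text, links) as a simple text-only spider would see them."""
    spider = TextOnlySpider()
    spider.feed(html)
    spider.close()
    return " ".join(spider.text_parts), spider.links

SAMPLE_PAGE = """
<html><head><title>Blue Widgets</title>
<script>document.write("Hidden from this spider");</script></head>
<body><h1>Cheap blue widgets</h1>
<img src="widget.png">
<p>Read our <a href="/guide">buying guide</a>.</p>
</body></html>
"""

text, links = simulate_spider(SAMPLE_PAGE)
print(text)
print(links)
```

In this sketch the image contributes nothing to the text and the script content is dropped, while the heading, the paragraph text, and the link target survive.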
After a page is crawled, the next step is to *index*
its content. The indexed page is stored in a giant database, from where
it can later be retrieved. Essentially, the process of indexing is
identifying the words and expressions that best describe the page and
assigning the page to particular keywords. A human could not
process such amounts of information, but search
engines generally deal just fine with this task. Sometimes they might not get the
meaning of a page right, but if you help them by optimizing it, it will
be easier for them to classify your pages correctly and for you to
get higher rankings.
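One common way to picture "assigning the page to particular keywords" is an inverted index, which maps each word to the pages that contain it. The sketch below assumes a very naive tokenizer and three made-up pages; real search engine indexes are far more elaborate.

```python
import re
from collections import defaultdict

def tokenize(text):
    """Lowercase the text and split it into simple alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())

def build_index(pages):
    """Map each word to the set of page names that contain it."""
    index = defaultdict(set)
    for page_id, text in pages.items():
        for word in tokenize(text):
            index[word].add(page_id)
    return index

# Hypothetical crawled pages: name -> extracted text.
PAGES = {
    "widgets.html": "Cheap blue widgets and widget accessories",
    "gadgets.html": "Blue gadgets for every budget",
    "about.html": "About our widget company",
}

index = build_index(PAGES)
print(sorted(index["blue"]))
print(sorted(index["widget"]))
```

The index is built once, after crawling, so that a later search does not have to re-read every page.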
When a search request comes in, the search engine *processes*
it – i.e. it compares the search string in the search request with the
indexed pages in the database. Since it is likely that more than one
page (in practice, millions of pages) contains the search string,
the search engine starts *calculating the relevancy* of each of the pages in its index to
the search string.
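As a simplified picture of this processing step, the sketch below looks up each word of the search string in a small hand-written index and keeps only the pages that contain all of them. The `INDEX` contents and the "every word must match" rule are assumptions made for the example, not a description of a real engine.

```python
# Hypothetical index: word -> set of pages containing that word.
INDEX = {
    "blue": {"widgets.html", "gadgets.html"},
    "widgets": {"widgets.html"},
    "gadgets": {"gadgets.html"},
    "cheap": {"widgets.html"},
}

def match_pages(query, index):
    """Return the pages that contain every word of the query."""
    terms = query.lower().split()
    if not terms:
        return set()
    # Start from the pages for the first word, then narrow down word by word.
    candidates = set(index.get(terms[0], set()))
    for term in terms[1:]:
        candidates &= index.get(term, set())
    return candidates

print(sorted(match_pages("blue", INDEX)))
print(sorted(match_pages("blue widgets", INDEX)))
print(sorted(match_pages("red widgets", INDEX)))
```

Matching only tells the engine which pages are candidates; deciding their order is the job of the relevancy calculation.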
There are various algorithms to calculate relevancy. Each of these
algorithms has different relative weights for common factors like
keyword density, links, or meta tags. That is why different search
engines give different search results pages for the same search string.
What is more, all major search engines, like
Yahoo!, Google, MSN, etc., periodically change their algorithms, and if
you want to stay at the top, you also need to adapt your pages to the
latest changes. This is one reason (the other is your competitors) to
devote ongoing effort to SEO if you would like to be at the top.
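The effect of different relative weights can be shown with a toy scoring function. Everything here is an assumption made for illustration: the two pages, the three factors as modeled, the weight values, and the names `ENGINE_X` and `ENGINE_Y`. Real ranking algorithms are not public and are much more complex.

```python
from dataclasses import dataclass

@dataclass
class Page:
    url: str
    words: list
    inbound_links: int
    meta_keywords: list

def keyword_density(page, term):
    """Share of the page's words that are exactly the search term."""
    if not page.words:
        return 0.0
    return page.words.count(term) / len(page.words)

def score(page, term, weights, max_links):
    """Weighted sum of three factors, each scaled to the range 0..1."""
    density = keyword_density(page, term)
    links = page.inbound_links / max_links if max_links else 0.0
    meta = 1.0 if term in page.meta_keywords else 0.0
    return (weights["density"] * density
            + weights["links"] * links
            + weights["meta"] * meta)

def rank(pages, term, weights):
    """Return page URLs ordered from highest to lowest score."""
    max_links = max((p.inbound_links for p in pages), default=0)
    ordered = sorted(pages, key=lambda p: score(p, term, weights, max_links),
                     reverse=True)
    return [p.url for p in ordered]

# Two hypothetical pages: one keyword-heavy, one with many inbound links.
PAGES = [
    Page("a.example/widgets", "widgets widgets widgets sale".split(), 2, ["widgets"]),
    Page("b.example/shop",
         "our company sells widgets and gadgets worldwide today".split(), 100, []),
]

# Two hypothetical engines that weigh the same factors differently.
ENGINE_X = {"density": 0.7, "links": 0.2, "meta": 0.1}
ENGINE_Y = {"density": 0.1, "links": 0.8, "meta": 0.1}

print(rank(PAGES, "widgets", ENGINE_X))
print(rank(PAGES, "widgets", ENGINE_Y))
```

With these made-up numbers, the keyword-heavy page comes first under `ENGINE_X` and the heavily linked page comes first under `ENGINE_Y`, which mirrors why the same search string can produce different results on different engines.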
The last step in search engines’ activity is *retrieving*
the results. Basically, it is nothing more than simply displaying them
in the browser – i.e. the endless pages of search results that are
sorted from the most relevant to the least relevant sites.
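Retrieval can be sketched as sorting the scored pages and cutting the sorted list into result pages. The scores, file names, and the `results_page` helper below are hypothetical and only illustrate the ordering described above.

```python
def results_page(scored, page_number, per_page=10):
    """Return one page of URLs, most relevant first (page_number starts at 1)."""
    ordered = sorted(scored.items(), key=lambda item: item[1], reverse=True)
    start = (page_number - 1) * per_page
    return [url for url, _ in ordered[start:start + per_page]]

# Hypothetical relevancy scores for five pages.
SCORES = {"a.html": 0.42, "b.html": 0.91, "c.html": 0.10,
          "d.html": 0.77, "e.html": 0.55}

print(results_page(SCORES, 1, per_page=2))
print(results_page(SCORES, 2, per_page=2))
print(results_page(SCORES, 3, per_page=2))
```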