SEO:History of SEO
SEO began in the mid-1990s, as the first search engines were cataloging the early Web. Initially, all a webmaster needed to do was submit a site to the various engines, which would run spiders (programs that "crawl" a site) and store the collected data. The search engines then sorted the information by topic and served results based on the pages they had spidered. As the number of documents online kept growing, and more webmasters realized the value of organic search listings, it became imperative for search engines to sort the vast collection of pages they had spidered and display the most relevant pages first. This was the start of a struggle between search engines and SEOs that continues to this day.
Initially, search engines were guided by the webmasters themselves. Early versions of search algorithms relied on webmaster-provided information like meta tags. Meta tags provided a guide to each page's content and relevant keywords. Soon some webmasters began to abuse meta tags, causing their pages to rank for irrelevant searches. In response, search engines developed more complex algorithms, taking into account a wider range of factors, but they still relied largely on what are today known as "on-site" factors. Examples of on-site factors include:
Keywords in the domain name
Keywords in the site's directory and file names
Page titles and tags: for example, a phrase marked up as an H1 (heading) element was considered to contain keywords relevant to the page
Ratio of the keyword(s) to other words on the page, also known as "keyword density"
Proximity of individual keywords that appear in the search phrase to each other in the text, also known as "keyword proximity"
Content of alternate text provided in the form of Alt attributes for images, noframes text for browsers not able to display framed pages, etc.
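Two of the factors listed above, keyword density and keyword proximity, are simple enough to illustrate in code. The following is a minimal sketch, not a description of how any search engine actually computed them. It assumes density is the keyword's share of all words on the page and proximity is the smallest distance, in words, between two keywords. The function names and the sample text are hypothetical.

```python
import re


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


def keyword_density(text, keyword):
    """Share of all words on the page that match the keyword (assumed definition)."""
    words = tokenize(text)
    if not words:
        return 0.0
    return words.count(keyword.lower()) / len(words)


def keyword_proximity(text, first, second):
    """Smallest distance in words between the two keywords, or None if either is absent."""
    words = tokenize(text)
    pos_first = [i for i, w in enumerate(words) if w == first.lower()]
    pos_second = [i for i, w in enumerate(words) if w == second.lower()]
    if not pos_first or not pos_second:
        return None
    return min(abs(a - b) for a in pos_first for b in pos_second)


# Hypothetical page text used only for illustration.
page = "Cheap flights to Rome. Book cheap hotels and flights today."
density = keyword_density(page, "cheap")
distance = keyword_proximity(page, "rome", "today")
```

Because both numbers depend only on text the webmaster writes, a page author can raise them at will, which is the weakness of on-site factors.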
The inherent flaw in relying so extensively on those factors was that webmasters and SEOs had full control over them and could "optimize" their pages for better rankings. Search engines had to adapt again to ensure their SERPs showed the most relevant pages rather than the best optimized ones.
A new search engine emerged with a new kind of thinking. Google was started by two PhD students at Stanford University, Sergey Brin and Larry Page, and brought a new concept to ranking web pages. This concept, called PageRank, was for many years the mainstay of the Google algorithm [1]. PageRank relied heavily on incoming links and used the logic that each link to a page is a vote for that page's value. The more incoming links a page had, the more "worthy" it was. The value of each incoming link varied directly with the PageRank of the page it came from and inversely with the number of outgoing links on that page. PageRank proved to be very good at serving relevant results, and Google became the most popular and successful search engine. Because PageRank measured an off-site factor, it was more difficult to manipulate, at first.
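The voting logic described above can be sketched as a short iterative calculation. This is a simplified illustration, not Google's production algorithm. The damping factor of 0.85, the fixed number of iterations, the even redistribution of rank from pages without outgoing links, and the four-page sample graph are all assumptions made for the example.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Estimate PageRank for a link graph.

    links maps each page to the list of pages it links to.
    damping and iterations are illustrative assumptions.
    """
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    n = len(pages)

    # Start with rank spread evenly over all pages.
    rank = {page: 1.0 / n for page in pages}

    for _ in range(iterations):
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # Each link passes on an equal share of the linking page's
                # rank: directly proportional to that rank, inversely
                # proportional to the number of outgoing links.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page with no outgoing links spreads its rank evenly
                # (a common convention, assumed here).
                share = damping * rank[page] / n
                for other in pages:
                    new_rank[other] += share
        rank = new_rank
    return rank


# Hypothetical four-page web: "c" receives the most links, "d" receives none.
graph = {"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": ["c"]}
ranks = pagerank(graph)
```

In this sample graph, page "c" ends up with the highest score because three pages link to it, while "d", which nothing links to, keeps only the baseline score.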
But manipulated it was. Given time, and the realization that PageRank was the new game in town, webmasters focused on exchanging, buying, and selling links on a massive scale. PageRank's reliance on the link as a vote of confidence in a page's value was undermined as many webmasters sought to garner links purely to influence Google into sending them more traffic, irrespective of whether the link was useful to human site visitors.
It was time for Google, and other search engines, to look at a wider range of off-site factors. There were other reasons to develop more intelligent algorithms. The Internet was reaching a vast population of non-technical users who were often unable to use advanced querying techniques to reach the information they were seeking. In addition, the sheer volume and complexity of the indexed data was vastly different from that of the early days. Search engines had to develop predictive, semantic, linguistic and heuristic algorithms.
The PageRank metric itself is still displayed in the Google Toolbar, but it is only one of several factors that Google considers in ranking pages.
Today, most search engines keep their methods and ranking algorithms secret. A search engine may use hundreds of factors in ranking the listings on its SERPs; the factors themselves and the weight each carries may change continually.
Much current SEO thinking on what works and what doesn't is largely speculation and informed guesses. Some SEOs have carried out controlled experiments to gauge the effects of different approaches to search optimization.
The following, though, are some of the considerations search engines could be building into their algorithms, and the list of Google patents [2] may give some indication as to what is in the pipeline:
Age of site
Length of time domain has been registered
Age of content
Regularity with which new content is added
Age of link and reputation of linking site
Standard on-site factors
Negative scoring for on-site factors (for example, a dampening for sites with extensive keyword meta tags indicative of having been SEO-ed)
Uniqueness of content
Related terms used in content (the terms the search engine associates as being related to the main content of the page)
External links, the anchor text in those external links and in the sites/pages containing those links
Citations and research sources (indicating the content is of research quality)
Stem-related terms in the search engine's database (finance/financing)
Incoming backlinks and anchor text of incoming backlinks
Negative scoring for some incoming backlinks (perhaps those coming from low value pages, reciprocated backlinks, etc.)
Rate of acquisition of backlinks: too many too fast could indicate "unnatural" link buying activity
Text surrounding outward links and incoming backlinks. A link following the words "Sponsored Links" could be ignored
Use of "rel=nofollow" to suggest that the search engine should ignore the link
Depth of document in site
Metrics collected from other sources, such as monitoring how frequently users hit the back button when SERPs send them to a particular page
Metrics collected from sources like the Google Toolbar, Google AdWords/AdSense programs, etc.
Metrics collected in data-sharing arrangements with third parties (like providers of statistical programs used to monitor site traffic)
Rate of removal of incoming links to the site
Use of sub-domains, use of keywords in sub-domains and volume of content on sub-domains… and negative scoring for such activity
Semantic connections of hosted documents
Rate of document addition or change
IP of hosting service and the number/quality of other sites hosted on that IP
Other affiliations of linking site with the linked site (do they share an IP? have a common postal address on the "contact us" page?)
Technical matters like use of 301 to redirect moved pages, showing a 404 server header rather than a 200 server header for pages that don't exist, proper use of robots.txt
Hosting uptime
Whether the site serves different content to different categories of users (cloaking)
Broken outgoing links not rectified promptly
Unsafe or illegal content
Quality of HTML coding, presence of coding errors
Actual click through rates observed by the search engines for listings displayed on their SERPs
Hand ranking by humans of the most frequently accessed SERPs
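One item in the list above, the use of "rel=nofollow", is easy to observe directly in a page's markup. The following is a minimal sketch of how a crawler might separate links that carry the hint from links that do not, using Python's standard html.parser module. The class name, function name and sample markup are hypothetical, and how any search engine actually treats such links is not known from this article.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect anchor hrefs, split by whether rel contains "nofollow"."""

    def __init__(self):
        super().__init__()
        self.followed = []
        self.nofollow = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attr_map = dict(attrs)
        href = attr_map.get("href")
        if href is None:
            return
        # rel may hold several space-separated tokens, or be missing.
        rel_tokens = (attr_map.get("rel") or "").lower().split()
        if "nofollow" in rel_tokens:
            self.nofollow.append(href)
        else:
            self.followed.append(href)


def classify_links(html):
    """Return (followed, nofollow) lists of hrefs found in the HTML."""
    collector = LinkCollector()
    collector.feed(html)
    collector.close()
    return collector.followed, collector.nofollow


# Hypothetical markup used only for illustration.
sample = (
    '<p><a href="https://example.com/a">A</a> '
    '<a rel="nofollow" href="https://example.com/b">B</a></p>'
)
followed, nofollow = classify_links(sample)
```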