Link Spam
Link spam (also called blog spam or comment spam) is a form of spamming or spamdexing that has recently been publicized most often when it targets weblogs (or blogs), guestbooks, and online discussion boards. Any web application that displays hyperlinks submitted by visitors, or the referring URLs of web visitors, may be a target.
Adding links that point to the spammer's web site increases the site's ranking in the Google search engine. A higher ranking means the spammer's commercial site is listed ahead of other sites for certain Google searches, increasing the number of potential visitors and paying customers.
History
Link spamming originally appeared in internet guestbooks, where spammers repeatedly filled a guestbook with links to their own site, and no relevant comment, to increase search engine rankings. If an actual comment is given, it is often just "cool page", "nice website", or keywords of the spammed link.
In 2003, spammers began to take advantage of the open nature of comments in blogging software such as Movable Type by repeatedly posting comments on various blog posts that provided nothing more than a link to the spammer's commercial web site. Jay Allen created a free plugin for the Movable Type weblog tool, called MT-BlackList, that attempts to alleviate this problem. Many current blogging packages now include methods of preventing or reducing the effect of blog spam.
Link Spam Solutions
Instead of displaying a direct hyperlink submitted by a visitor, a web application could display a link to a script on its own website that redirects to the correct URL. This will not prevent all spam, since spammers do not always check for link redirection, but it has proven very effective. Redirecting links prevents Google from factoring the link into its PageRank algorithm for the target site, making the spam ineffective. An added benefit is that the redirection script can count how many people visit external URLs, although it will increase the load on the site.
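The redirect-script idea above can be sketched as two small functions. This is only an illustration under stated assumptions: the path /redirect, the query parameter name url, and both function names are hypothetical, and the sketch shows the mechanism without making any claim about how a particular search engine treats the resulting links.

```javascript
// Hypothetical path of the site's own redirect script (an assumption).
const REDIRECT_PATH = "/redirect";

// Build the link that is displayed in place of the visitor-submitted URL.
function toRedirectLink(submittedUrl) {
  return REDIRECT_PATH + "?url=" + encodeURIComponent(submittedUrl);
}

// What the redirect script might do with an incoming request path:
// extract the target, accept only http(s) URLs, and return the
// status code and Location header for the response.
function resolveRedirect(requestPath) {
  const query = requestPath.split("?")[1] || "";
  const target = new URLSearchParams(query).get("url");
  let parsed;
  try {
    parsed = new URL(target);
  } catch (err) {
    return { status: 400, location: null }; // missing or malformed URL
  }
  if (parsed.protocol !== "http:" && parsed.protocol !== "https:") {
    return { status: 400, location: null }; // e.g. javascript: URLs
  }
  return { status: 302, location: parsed.href };
}
```

A page would then show toRedirectLink("http://example.com/") instead of the submitted address, and the script behind /redirect would answer with the result of resolveRedirect. Counting visits to external URLs, as mentioned above, could be added at the point where the 302 response is produced.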
Another option is for the redirection to be performed by client-side JavaScript. A link whose destination is set by a script, rather than by a plain href attribute, would work as a link for visitors but not be picked up by Google. Moreover, the JavaScript could be made more complicated so that the link would never be picked up, because the URL is encoded. For example, the link could pass an encoded URL such as 'hfksksgjlsll' to a JavaScript function, redirectFunction, which decodes it and which would presumably be defined in the HEAD element of the page. A downside is that visitors who have disabled JavaScript in their browser would be unable to follow the links.
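A minimal sketch of the encoded-link approach follows. The name redirectFunction comes from the text above; the hex encoding, the helper names encodeTarget and decodeTarget, and the markup in the final comment are assumptions chosen for illustration, not the scheme behind the 'hfksksgjlsll' placeholder.

```javascript
// Encode a URL as four hex digits per UTF-16 code unit, so the plain
// address never appears in the page source. This is obfuscation only.
function encodeTarget(url) {
  return Array.from({ length: url.length }, (_, i) =>
    url.charCodeAt(i).toString(16).padStart(4, "0")
  ).join("");
}

// Reverse of encodeTarget.
function decodeTarget(encoded) {
  const units = encoded.match(/.{4}/g) || [];
  return units.map((hex) => String.fromCharCode(parseInt(hex, 16))).join("");
}

// Called from the link's click handler. In a browser it navigates to
// the decoded URL; elsewhere it only returns the decoded URL.
function redirectFunction(encoded) {
  const url = decodeTarget(encoded);
  if (typeof window !== "undefined") {
    window.location.href = url;
  }
  return url;
}

// Hypothetical markup for a visitor-submitted link, where ENCODED is
// the output of encodeTarget for the submitted address:
// <a href="#" onclick="redirectFunction('ENCODED'); return false;">visitor's site</a>
```

The encoding would be done on the server when the comment is rendered, so that only the encoded string reaches the page.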
This kind of redirection can also be done via the .htaccess file in Apache, thus saving the load of a script.
Another way of preventing PageRank leakage without using client-side JavaScript or an .htaccess file is to use a public redirection service such as TinyURL or My-Own.Net. The displayed link then points to the service, using an alias of the target address (written here as 'alias_of_target') rather than the target address itself.
No follow
In early 2005 Google introduced an HTML attribute value, rel="nofollow", that disables the assignment of ranking credit for a particular link. This is a much easier solution that makes the improvised techniques above irrelevant. Most weblog software now enables it by default (with no option to disable it without modifying the code), adding the nofollow attribute to reader-submitted links.
However, some weblog authors object to using the attribute, citing concerns over the motives for its introduction (the large amount of inter-linking between blogs makes search engine algorithms less accurate) and over its effectiveness, since a spambot does not know whether its target is using 'nofollow' or not.
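As an illustration of how software might add the attribute to reader-submitted links, the sketch below rewrites the anchor tags in an HTML fragment. The function name addNofollow is hypothetical, and the regular-expression approach is an assumption made for brevity; production software would more likely use a real HTML parser.

```javascript
// Add rel="nofollow" to every <a ...> tag in an HTML fragment.
// Any existing rel attribute on the tag is removed and replaced.
function addNofollow(html) {
  return html.replace(/<a\b([^>]*)>/gi, (match, attrs) => {
    const withoutRel = attrs.replace(
      /\srel\s*=\s*("[^"]*"|'[^']*'|[^\s>]+)/gi,
      ""
    );
    return "<a" + withoutRel + ' rel="nofollow">';
  });
}
```

For a comment containing `<a href="http://example.com">my site</a>`, the function returns `<a href="http://example.com" rel="nofollow">my site</a>`. Tags other than anchors are left unchanged.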
Turing tests
Various methods that force spam to be submitted by hand, by a human, have been attempted. A variety of CAPTCHA gateways have been implemented in an effort to prevent bots from submitting entries. Drawbacks are the annoyance they pose for regular users, the lack of any alternative for visually impaired users, and the ability of some advanced bots to fool simple CAPTCHAs most of the time.
Specific anti-spam methods
Particularly popular software products such as Movable Type and MediaWiki have developed their own custom anti-spam measures, as spammers focus more attention on targeting those platforms. Whitelists and blacklists that prevent certain IP addresses from posting, or that prevent people from posting content that matches certain filters, are common defenses. More advanced access control lists require various forms of validation before users can contribute anything that resembles link spam.
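A blacklist check of the kind described above could look like the following sketch. The IP address, the patterns, and the function name isAllowed are all made up for illustration and are not taken from Movable Type, MediaWiki, or MT-BlackList.

```javascript
// Hypothetical blacklist data; real deployments maintain much larger lists.
const blockedIps = new Set(["203.0.113.7"]);
const blockedPatterns = [/cheap-pills\.example/i, /casino/i];

// Return true if a comment from the given IP address may be posted.
function isAllowed(ip, commentText) {
  if (blockedIps.has(ip)) {
    return false; // poster is on the IP blacklist
  }
  // Reject the comment if any content filter matches its text.
  return !blockedPatterns.some((pattern) => pattern.test(commentText));
}
```

A whitelist could be layered on top by returning true early for trusted posters before either blacklist is consulted.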
The goal in every case is to allow good users to continue to add links to their comments, as that is considered by some to be a valuable aspect of any comments section.