Saturday, 13 June 2009

Defeating the spammers

This month we've received an influx of backlink spammers setting up accounts and posting nothing but links to sites they've set up. Apparently some site out there that does this for a living found our site, thanks to its high page rank, and recommended us in its backlink spamming newsletter. As a result we've been spending most waking moments checking who has signed up and deleting accounts that only set up 'spam gardens'.

We've also changed the site so that every link posted on it gets a nofollow attribute. That way, even when a 'spam garden' is set up and we don't notice it, its links are pretty much useless in the eyes of search engines. Hopefully this will deter this type of spammer over time.

Because we use RedCloth for all our text formatting on the site, we can manipulate the text before rendering it. This code sits in the initializers folder of our Rails application as redcloth_extenstions.rb:

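# Reopen RedCloth's HTML formatter so every generated link carries rel="nofollow".
module RedCloth::Formatters::HTML
  include RedCloth::Formatters::Base

  # Overrides the formatter used for Textile links ("name":url).
  def link(opts)
    "<a href=\"#{escape_attribute opts[:href]}\"#{pba(opts)} rel=\"nofollow\">#{opts[:name]}</a>"
  end

  # Called for raw HTML that users type inline.
  def inline_html(opts)
    no_follow(opts[:text])
  end

  private

  # Walks the HTML token by token and sets rel="nofollow" on every <a> tag.
  def no_follow(text)
    tokenizer = HTML::Tokenizer.new(text)
    out = ''
    while token = tokenizer.next
      node = HTML::Node.parse(nil, 0, 0, token, false)
      if node.tag? && node.name.downcase == 'a'
        # Some nodes (such as closing tags) carry no attributes hash, so skip them.
        node.attributes['rel'] = 'nofollow' unless node.attributes.nil?
      end
      out << node.to_s
    end
    out
  end
end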


The link method overrides RedCloth's built-in link generator for Textile, so anyone who creates a link using the Textile format will have the nofollow attribute added to their link.
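
To make the resulting markup concrete, here is a minimal standalone sketch of the string the link override builds. It is not the RedCloth code itself: nofollow_link is a hypothetical helper, CGI.escapeHTML stands in for RedCloth's escape_attribute, and the class/id/style attributes that pba would add are left out.

```ruby
require 'cgi'

# Hypothetical stand-in for the overridden formatter method: builds the
# anchor markup for a link and always appends rel="nofollow".
# CGI.escapeHTML is used here in place of RedCloth's escape_attribute.
def nofollow_link(href, name)
  %(<a href="#{CGI.escapeHTML(href)}" rel="nofollow">#{name}</a>)
end

puts nofollow_link('http://example.com/?a=1&b=2', 'Example')
# prints: <a href="http://example.com/?a=1&amp;b=2" rel="nofollow">Example</a>
```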

The inline_html method is called for any HTML the user types in directly (we allow basic HTML). It scans that HTML for link tags and adds the nofollow attribute to each one automatically.
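
The same idea can be sketched in plain Ruby for anyone without the Rails tokenizer to hand. This is only an illustration, not what we run: add_nofollow is a hypothetical helper, and it uses a regular expression rather than a real HTML parser, so it assumes reasonably well-formed anchor tags and will miss odd cases such as self-closing or malformed tags.

```ruby
# Hypothetical helper: adds rel="nofollow" to every opening <a> tag in a
# fragment of user-supplied HTML, replacing any rel attribute already there.
# Regex-based, so it is a sketch for simple markup, not a full HTML parser.
def add_nofollow(html)
  html.gsub(/<a\b([^>]*)>/i) do
    attrs = Regexp.last_match(1)
    # Drop an existing rel attribute so the tag ends up with exactly one.
    attrs = attrs.gsub(/\s+rel\s*=\s*("[^"]*"|'[^']*'|[^\s>]+)/i, '')
    %(<a#{attrs} rel="nofollow">)
  end
end

puts add_nofollow('<p>See <a href="http://example.com">this</a></p>')
# prints: <p>See <a href="http://example.com" rel="nofollow">this</a></p>
```

Like the tokenizer version, this overwrites whatever rel value the user supplied and leaves closing tags and other elements untouched.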

If we weren't using RedCloth as our markup generator, I'm not sure how we would have gone about adding these attributes.