Spam filtering with Defender
2009-11-11 21:19:00 UTC
For a long time I've loved Akismet for filtering spam in blog comments, but lately it has failed to catch most spam. So now I've moved over to Defensio, which is a good, adaptive spam filter. By adaptive I mean that it's constantly learning what is spam and what is not. If it erroneously marks a ham comment as spam or the other way 'round, you can correct it and it learns. The other day I wrote an API wrapper for Defensio called Defender. There already is one, RDefensio, but that's more of a "raw" wrapper. With Defender you will hopefully get a more Ruby-ish feel.
In this post I will tell you how to install Defender and make a very simple blog to demonstrate it. To try this, you need to get an API key from Defensio. Where it says to enter a URL, you should enter the one where you're going to host your real blog. It doesn't care if the request comes from your server, localhost or even Google, as long as you set the same URL in Defender. So let's start setting up a very basic blog.
$ rails blog
$ cd blog/
First you'll want to install Defender. If you haven't already, you should add gemcutter to your sources list by doing gem install gemcutter and probably gem tumble, but the latter isn't required. After you've done that, simply install Defender.
blog$ sudo gem install defender
Please note that this will install the latest version of Defender. As I write this post that is 0.1.1, so if you are installing a later one, the method signatures may have changed. To load Defender, just add it to your config/environment.rb file.
# config/environment.rb
Rails::Initializer.run do |config|
  # ...
  config.gem "defender"
  # ...
end
I'm going to use Ryan Bates' nifty-generators for the file generation here. If you don't have it installed, simply install the gem nifty-generators. The generated files use some helpers that we want to add, and nifty-generators includes a generator to do just that.
./script/generate nifty_layout
Incidentally, this also adds a nice little design to your site which you should recognize if you've watched railscasts.
Nifty-generators also include a generator to make an application config file. This is a nice way to centralize all of the config (including the Defender config) in one place. Just generate it by running
./script/generate nifty_config
And then add your Defensio credentials to the config.
development:
  defender:
    api_key: "key1234"
    # Note that the URL below needs to include "http://"
    # or "https://"
    owner_url: "https://binaryhex.com"
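If you're wondering what that config looks like once it's loaded, here's a minimal plain-Ruby sketch. It assumes the file is parsed as ordinary YAML and that the nested keys come back as strings, which is presumably why the model further down calls symbolize_keys on them. The variable names are made up for illustration.

```ruby
require "yaml"

# The same structure as the config above, inlined so the sketch is self-contained.
raw = <<~CONFIG
  development:
    defender:
      api_key: "key1234"
      owner_url: "https://binaryhex.com"
CONFIG

config = YAML.safe_load(raw)

# YAML hands back string keys for the nested hash.
defender_options = config["development"]["defender"]

# Plain-Ruby stand-in for ActiveSupport's symbolize_keys.
symbolized = defender_options.each_with_object({}) do |(key, value), result|
  result[key.to_sym] = value
end

puts symbolized.inspect
```

In the Rails app you wouldn't write this yourself; it's only here to show the shape of the options hash that ends up being passed along.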
Let's do some scaffolding and generate the models.
./script/generate nifty_scaffold post title:string content:text
./script/generate nifty_scaffold comment post_id:integer author:string website:string email:string ip:string defensio_sig:string spam:boolean spaminess:float content:text create
While the attributes for the post model should be fairly self-explanatory, a couple of the comment attributes may need some explanation:
- defensio_sig: This is used to report false positives or negatives so Defensio will learn. While it's not needed in this tutorial, it's nice to store it so you have it when you want to implement the functionality.
- spaminess: Defensio assigns each comment a "spaminess" value of between 0 and 1, where 0 is least spammy, and 1 is most spammy. You could use this to sort the spam comments in an admin panel, so the least spammy would come on top, and the obvious ones further down. Or maybe you want color coding? Or both?
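Here's a rough sketch of the sorting and color coding idea in plain Ruby. The FlaggedComment struct, the CSS class names and the 0.5 and 0.9 thresholds are all made up for this example; pick whatever cut-offs suit your blog.

```ruby
# Hypothetical stand-in for a comment record; only the fields the sketch needs.
FlaggedComment = Struct.new(:author, :spaminess)

# Map a spaminess value (0 to 1) to a CSS class. Thresholds are arbitrary.
def spam_css_class(spaminess)
  if spaminess < 0.5
    "spam-doubtful"
  elsif spaminess < 0.9
    "spam-likely"
  else
    "spam-obvious"
  end
end

# Least spammy first, so likely false positives end up on top.
def review_queue(comments)
  comments.sort_by(&:spaminess)
end

queue = review_queue([
  FlaggedComment.new("Mallory", 0.99),
  FlaggedComment.new("Alice", 0.55),
  FlaggedComment.new("Bob", 0.72)
])

queue.each do |comment|
  puts "#{comment.author}: #{comment.spaminess} (#{spam_css_class(comment.spaminess)})"
end
```

In the real app you would probably do the sorting in the database with an :order option on the spam named_scope instead of sorting in Ruby.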
We only need the create action for this tutorial, so that's why there's a create at the end of the comment scaffolding line. If you want to, you can add all of the actions, but you probably want to make them accessible only by an admin. That way you can easily see comments marked as spam, and un-mark them, and vice versa.
We need to edit the model files a bit, especially the comment model to make it check for spaminess.
# app/models/post.rb
class Post < ActiveRecord::Base
  attr_accessible :title, :content
  has_many :comments
end

# app/models/comment.rb
class Comment < ActiveRecord::Base
  attr_accessible :author, :website, :email, :content
  belongs_to :post

  # With these named_scopes we can easily
  # filter out the (non)spammy comments
  named_scope :not_spam, :conditions => { :spam => false }
  named_scope :spam, :conditions => { :spam => true }

  before_save :check_spam

  def check_spam
    defender = Defender.new(APP_CONFIG[:defender].symbolize_keys)
    response = defender.audit_comment(
      :user_ip => self.ip,
      :article_date => post.created_at,
      :comment_author => self.author,
      :comment_type => "comment",
      :comment_content => self.content,
      :comment_author_email => self.email,
      :comment_author_url => self.website)
    self.spam = response.spam?
    self.spaminess = response.spaminess
    self.defensio_sig = response.signature
    true
  end
end
Note that I removed some of the fields from attr_accessible in the comment model, and you should too. attr_accessible sets which fields can be set from a form, and you don't want someone to just tweak the form so nothing would be marked as spam, would you?
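To show what that whitelist protects against, here's a small plain-Ruby imitation of the idea. This is not how Rails implements it; the ACCESSIBLE constant and the filter_mass_assignment helper are invented for the example.

```ruby
# The same four fields the comment model whitelists.
ACCESSIBLE = %w[author website email content].freeze

# Keep only whitelisted keys, the way mass assignment ignores protected ones.
def filter_mass_assignment(params)
  params.select { |key, _value| ACCESSIBLE.include?(key.to_s) }
end

# A tampered form submission that tries to pre-clear the spam fields.
tampered = {
  "author"    => "Eve",
  "content"   => "Buy cheap watches",
  "spam"      => "0",
  "spaminess" => "0.0"
}

safe = filter_mass_assignment(tampered)
puts safe.inspect
```

The spam and spaminess keys never reach the record, so only the spam check gets to set them.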
The before_save callback makes sure that check_spam is called each time before a comment is saved.
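If you'd like to poke at that flow without Rails or a Defensio key, here's a sketch that fakes both. FakeDefender, FakeResponse and SketchComment are made-up stand-ins; they only mimic the calls used in the model above (audit_comment, spam?, spaminess, signature) and say nothing about how Defender works internally. It also shows why check_spam ends with true: in this generation of Rails, a before callback that returns false cancels the save.

```ruby
# Stand-in for the response object, with the three readers the model uses.
FakeResponse = Struct.new(:spam, :spaminess, :signature) do
  def spam?
    spam
  end
end

# Stand-in for the API client. The keyword check is a toy rule, not Defensio's.
class FakeDefender
  def audit_comment(fields)
    spammy = fields[:comment_content].to_s.include?("viagra")
    FakeResponse.new(spammy, spammy ? 0.95 : 0.05, "sig-#{fields[:comment_author]}")
  end
end

# Plain-Ruby imitation of a model with a before-save hook.
class SketchComment
  attr_accessor :author, :content, :spam, :spaminess, :defensio_sig, :saved

  def initialize(author, content, client)
    @author = author
    @content = content
    @client = client
    @saved = false
  end

  # Run the hook first; a false return from the hook halts the save.
  def save
    return false unless check_spam
    @saved = true
  end

  def check_spam
    response = @client.audit_comment(:comment_author => author,
                                     :comment_content => content)
    self.spam = response.spam?
    self.spaminess = response.spaminess
    self.defensio_sig = response.signature
    true # without this, a falsy last expression would cancel the save
  end
end

ham = SketchComment.new("Alice", "Nice post!", FakeDefender.new)
junk = SketchComment.new("Bob", "cheap viagra here", FakeDefender.new)
[ham, junk].each(&:save)
puts "ham spam? #{ham.spam}, junk spam? #{junk.spam}"
```

Swapping in a fake client like this is also a handy way to test the controller without hitting the network.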
Let's update the post views so we can view and create comments.
# app/views/posts/show.html.erb
<% title h(@post.title) %>

<p><%= @post.content %></p>

<div id="comments">
  <% for comment in @post.comments.not_spam %>
    <p>
      <strong>
        <%= link_to h(comment.author), comment.website %>
      </strong> at
      <strong><%= comment.created_at %></strong>:<br>
      <%=h comment.content %>
    </p>
  <% end %>
</div>

<% form_for [@post, @comment] do |f| %>
  <%= f.error_messages %>
  <p>
    <%= f.label :author, "Name" %><br>
    <%= f.text_field :author %>
  </p>
  <p>
    <%= f.label :email %><br>
    <%= f.text_field :email %>
  </p>
  <p>
    <%= f.label :website %><br>
    <%= f.text_field :website %>
  </p>
  <p>
    <%= f.label :content %><br>
    <%= f.text_area :content %>
  </p>
  <p><%= f.submit "Submit" %></p>
<% end %>

<p>
  <%= link_to "Edit", edit_post_path(@post) %> |
  <%= link_to "Destroy", @post,
              :confirm => 'Are you sure?',
              :method => :delete %> |
  <%= link_to "View All", posts_path %>
</p>
Passing an array to form_for makes the form point to the right URL: /posts/<id>/comments. To make it work, you need to open up the posts controller and add this line to the show method:
@comment = Comment.new
For that URL to exist, we should edit our routes a bit. In config/routes.rb, remove the line with map.resources :comments, and edit the one with :posts to this:
map.resources :posts, :has_many => :comments
This way we can pass the post_id in the URL in a nice way when creating comments.
Let's update the comments controller so it passes the values not sent from the form. We should also add a way to notify the user about the comment being marked as spam, so they know what is happening.
# app/controllers/comments_controller.rb
class CommentsController < ApplicationController
  def create
    @comment = Comment.new(params[:comment])
    # These two are not attr_accessible, so set them by hand
    @comment.ip = request.remote_ip
    @comment.post_id = params[:post_id]
    if @comment.save
      if @comment.spam?
        flash[:error] = "We're sorry, but that comment " \
                        "looked like spam to our spam filter. " \
                        "Please contact the site admin at " \
                        "you@example.com if you feel this was " \
                        "an error."
      else
        flash[:notice] = "Successfully created comment."
      end
    end
    redirect_to @comment.post
  end
end
Migrate your database (rake db:migrate) and start up your server (script/server). Time for some testing! Just create some posts and comments (both spam and non-spam, to test that everything works). If it works, great! If not, submit a comment and I or someone else might look at it (I do not give support via email for this).
Congratulations! You now have spam filtering. If you want some more challenges, you could try to implement the "learning" functions (Defender#report_false_positives and Defender#report_false_negatives), and maybe some statistics (Defender#statistics). You would also want to "announce" your blog posts, as Defensio uses these to learn what isn't spam. Take a look at Defender#announce_article in the docs.
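As a starting point for the learning part, here's a sketch of how you might work out which signatures to report. It deliberately stops short of calling Defender, since you should check the docs for the exact arguments the report methods take. The ReviewedComment struct and its admin_says_spam field are hypothetical; they stand for whatever your admin panel records when you correct a comment.

```ruby
# Hypothetical record: the filter's verdict plus the admin's correction.
ReviewedComment = Struct.new(:defensio_sig, :spam, :admin_says_spam)

# Split reviewed comments into the two kinds of mistakes the filter can make.
def corrections(comments)
  false_positives = comments.select { |c| c.spam && !c.admin_says_spam }
  false_negatives = comments.select { |c| !c.spam && c.admin_says_spam }
  {
    :false_positives => false_positives.map(&:defensio_sig),
    :false_negatives => false_negatives.map(&:defensio_sig)
  }
end

reviewed = [
  ReviewedComment.new("sig-a", true,  false), # ham wrongly marked as spam
  ReviewedComment.new("sig-b", false, true),  # spam that slipped through
  ReviewedComment.new("sig-c", true,  true),  # filter was right
  ReviewedComment.new("sig-d", false, false)  # filter was right
]

to_report = corrections(reviewed)
puts to_report.inspect
```

The two lists of signatures are what you would then hand over to the false positive and false negative reporting, which is why it pays to store defensio_sig on every comment.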
I hope you liked this tutorial and the library. Please give me feedback, I'd be delighted. If you find a bug in the library, please report it in the bug tracker.
Comments
Josh Johnson at 2009-11-11 22:44:49 UTC:
First post!
Henrik Hodne at 2009-11-13 21:44:03 UTC:
Wohoo, I got featured on Ruby5.
https://ruby5.envylabs.com/episodes/28-episode-27-november-13-2009 at 4:00.
Thanks a lot to the Ruby5 guys for featuring this.
Henrik Hodne
henrik.hodne@binaryhex.com