MWX/SpamFerret
Overview
SpamFerret is my attempt at an improvement over the SpamBlacklist extension. It is currently in development.
Purpose
The SpamBlacklist extension has a number of shortcomings:
- Can only handle a limited number of entries before exceeding the maximum string-length it can process, at which point all spam is allowed through
- Does not keep track of which entries are still being "tried" (to allow for periodic "cleaning" of the list)
- Does not keep track of offending IP addresses
- Handles only domains; cannot blacklist by URL path (for partially compromised servers) or "catch phrases" found in spam and nowhere else
- Does not keep a log of failed spam attempts, so there is no way to measure effectiveness
SpamFerret will:
- be database-driven
- keep logs and counts of spam attempts by blacklisting and by IP
- match domains ("http://*.domain"), URLs ("http://*.domain/path") and catch-phrases ("helo please to forgive my posting but my children are hungary")
- also match broader patterns, such as long lists of links in a particular format
SpamFerret may be unsuitable for use on busier wikis, as the checking process (which only runs when an edit is submitted) can take a fair amount of CPU time: it scans the entire page once per blacklisted pattern. This shouldn't be a problem for smaller wikis, which are often monitored less frequently than busier wikis and hence are more vulnerable to spam.
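For illustration, here is a minimal sketch of the lookup that would feed that check, using the `patterns` table defined under Design below: the active entries are fetched and each one is applied to the submitted text with preg_match(). This is an assumption about how the lookup might work, not code from the extension.
<sql>
-- Illustrative sketch only, not SpamFerret's actual query:
-- fetch the entries the submitted page text will be checked against.
-- Inactive entries are skipped; isRegex tells the PHP side whether the
-- string must be escaped before being handed to preg_match().
SELECT ID, Pattern, isRegex
  FROM patterns
 WHERE isActive;
</sql>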
Design
<sql>
CREATE TABLE `patterns` (
  `ID` INT NOT NULL AUTO_INCREMENT,
  `Pattern` varchar(255) COMMENT 'pattern to match (regex)',
  `WhenAdded` DATETIME DEFAULT NULL COMMENT 'when this entry was added',
  `WhenTried` DATETIME DEFAULT NULL COMMENT 'when a spammer last attempted to include this pattern',
  `isActive` BOOL COMMENT 'if FALSE, do not include in checking',
  `isURL` BOOL COMMENT 'TRUE indicates that additional URL-related stats may be collected',
  `isRegex` BOOL COMMENT 'TRUE indicates that the string should not be escaped before feeding to preg_match()',
  PRIMARY KEY(`ID`)
) ENGINE = MYISAM;

CREATE TABLE `clients` (
  `ID` INT NOT NULL AUTO_INCREMENT,
  `Address` varchar(15) COMMENT 'IP address',
  `WhenFirst` DATETIME COMMENT 'when this IP address first submitted a spam',
  `WhenLast` DATETIME COMMENT 'when this IP address last submitted a spam',
  `Count` INT COMMENT 'number of attempts',
  PRIMARY KEY(`ID`)
) ENGINE = MYISAM;

CREATE TABLE `attempts` (
  `ID` INT NOT NULL AUTO_INCREMENT,
  `When` DATETIME COMMENT 'timestamp of attempt',
  `ID_Pattern` INT NOT NULL COMMENT '(patterns.ID) matching pattern found',
  `ID_Client` INT NOT NULL COMMENT '(clients.ID) spamming client',
  PRIMARY KEY(`ID`)
) ENGINE = MYISAM;
</sql>
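To show how the three tables fit together, here is a minimal sketch of the bookkeeping a caught attempt would imply, assuming placeholder IDs for the matched pattern and the offending client; these statements are illustrative, not the extension's actual queries.
<sql>
-- Illustrative sketch only, not SpamFerret's actual code: a blocked edit
-- matched pattern 3 and came from client 7 (placeholder IDs).
INSERT INTO attempts (`When`, ID_Pattern, ID_Client)
VALUES (NOW(), 3, 7);

-- Note that this pattern is still being "tried", so periodic cleaning keeps it.
UPDATE patterns SET WhenTried = NOW() WHERE ID = 3;

-- Update the offending client's statistics.
UPDATE clients SET WhenLast = NOW(), Count = Count + 1 WHERE ID = 7;
</sql>
With the log kept in attempts, per-pattern and per-IP counts can be read back with a simple GROUP BY, which is what makes the effectiveness of each blacklisting measurable.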