During the Fall semester of 2011, I took a Programming Languages class. The class was divided into groups, and each group selected a programming language on which to report. I had always been eager to learn Forth and apply it somehow to my studies, so I convinced my group that this obscure (but interesting) language would serve us well.

One of the many things I did for my team, besides configuring a Linux Forth distribution, was to set up a Subversion server and a wiki to facilitate working together as a group. Eager to use the power of this webserver, I set up the wiki located here, so we could contribute as we learned. I have always felt strongly about wikis as a learning tool. Both of these worked great for collaboration, as members of the group often had conflicting schedules. The wiki, like all wikis, was set to a default-open stance, so everybody (and anybody who discovered it) could contribute ideas about Forth and add resources.

However, after the class was finished, the Forth wiki lay dormant in its default-open stance, in the hope that others would discover it and contribute. I vowed that one day, when I started learning embedded systems, I would reevaluate Forth for that purpose and breathe new life into the wiki with my discoveries. With that, I temporarily forgot it and focused on my finals. I had thus violated one of the unwritten rules of security: the unused will be abused.

Through December, the wiki lay dormant: neglected but innocuous. By January of 2012, however, the bots had found it.

Like something out of the Discovery Channel, the spammers descended upon the wiki like a pack of wild dogs on a wounded gazelle. What I found some weeks later was the rotten carcass of a once-useful wiki: advertisements for Viagra and Gucci handbags littered its once familiar structure. My wiki had been turned into a violated and twisted landscape full of linkfarms to sites about how you could get your boyfriend back, or 'useful' facts on UK payday loans and buying Facebook fans. Some of the advertisements were so grotesque, I couldn't even imagine posting them here. In short, I was horrified.

My reaction to this was to do the following:

  1. Containment - ensure that the hole was patched and the ravenous spamming would stop.
  2. Cleanup - revert the wiki back to its previous, unsullied condition.
  3. Analysis - analyze the spamming patterns and learn to identify such cyberscourges.
  4. Prevention - discover and create tools, methodologies, and policies to prevent or combat this in the future.

Containment

Containment was rather easy. All I had to do was find the proper lockdown procedures for the MediaWiki configuration file (LocalSettings.php). The MediaWiki wiki suggested adding this to LocalSettings.php (found in the wiki installation's root folder) to lock down the server:

# Prevent new user registrations except by sysops
$wgGroupPermissions['*']['createaccount'] = false;

#restrict editing to sysops only
$wgGroupPermissions['*']['edit'] = false;
$wgGroupPermissions['user']['edit'] = false;
$wgGroupPermissions['sysop']['edit'] = true;
 
# Only users with accounts four days old or older can create pages
# Requires MW 1.6 or higher.
$wgGroupPermissions['*'            ]['createpage'] = false;
$wgGroupPermissions['user'         ]['createpage'] = false;
$wgGroupPermissions['autoconfirmed']['createpage'] = true;
$wgAutoConfirmAge = 86400 * 4; # Four days times 86400 seconds/day

In a pinch, this will do (disabling the edit functionality seems to take care of most vandalism/spam). However, this configuration is still problematic, as it is only a temporary fix. As soon as you turn any of these back to true, the ability to spam the living daylights out of your wiki instantly returns. Users can furthermore cause havoc by moving articles - some of the bots I saw moved articles around. My own suggestion is something far more effective: copying the default permissions from DefaultSettings.php and setting them all to false except for those of the sysop (you).

#disable anonymous talk
$wgDisableAnonTalk=true;

# Four days times 86400 seconds/day
$wgAutoConfirmAge = 86400 * 4; 

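# hide the IP address (and its user/talk page links) in the header for anonymous users
# note: PHP variable names are case-sensitive; the setting is spelled $wgShowIPinHeader
$wgShowIPinHeader = false;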

// Implicit group for all visitors
$wgGroupPermissions['*']['createaccount']    = false;
$wgGroupPermissions['*']['read']             = true; //everybody should still be able to read
$wgGroupPermissions['*']['edit']             = false;
$wgGroupPermissions['*']['createpage']       = false;
$wgGroupPermissions['*']['createtalk']       = false;
$wgGroupPermissions['*']['writeapi']         = false;


// Implicit group for all logged-in accounts
$wgGroupPermissions['user']['move']             = false;
$wgGroupPermissions['user']['move-subpages']    = false;
$wgGroupPermissions['user']['move-rootuserpages'] = false; 
$wgGroupPermissions['user']['read']             = true; //users should still be able to read
$wgGroupPermissions['user']['edit']             = false;
$wgGroupPermissions['user']['createpage']       = false;
$wgGroupPermissions['user']['createtalk']       = false;
$wgGroupPermissions['user']['writeapi']         = false;
$wgGroupPermissions['user']['upload']           = false;
$wgGroupPermissions['user']['reupload']         = false;
$wgGroupPermissions['user']['reupload-shared']  = false;
$wgGroupPermissions['user']['minoredit']        = false;
$wgGroupPermissions['user']['purge']            = false; 
$wgGroupPermissions['user']['sendemail']        = false;

// Implicit group for accounts that pass $wgAutoConfirmAge
$wgGroupPermissions['autoconfirmed']['autoconfirmed'] = false;

// Users with bot privilege can have their edits hidden
// from various log pages by default
$wgGroupPermissions['bot']['bot']              = false;
$wgGroupPermissions['bot']['autoconfirmed']    = false;
$wgGroupPermissions['bot']['nominornewtalk']   = false;
$wgGroupPermissions['bot']['autopatrol']       = false;
$wgGroupPermissions['bot']['suppressredirect'] = false;
$wgGroupPermissions['bot']['apihighlimits']    = false;
$wgGroupPermissions['bot']['writeapi']         = false;

// Most extra permission abilities go to this group
$wgGroupPermissions['sysop']['block']            = true;
$wgGroupPermissions['sysop']['createaccount']    = true;
$wgGroupPermissions['sysop']['delete']           = true;
$wgGroupPermissions['sysop']['bigdelete']        = true; 
$wgGroupPermissions['sysop']['deletedhistory']   = true; 
$wgGroupPermissions['sysop']['deletedtext']      = true; 
$wgGroupPermissions['sysop']['undelete']         = true;
$wgGroupPermissions['sysop']['editinterface']    = true;
$wgGroupPermissions['sysop']['editusercss']      = true;
$wgGroupPermissions['sysop']['edituserjs']       = true;
$wgGroupPermissions['sysop']['import']           = true;
$wgGroupPermissions['sysop']['importupload']     = true;
$wgGroupPermissions['sysop']['move']             = true;
$wgGroupPermissions['sysop']['move-subpages']    = true;
$wgGroupPermissions['sysop']['move-rootuserpages'] = true;
$wgGroupPermissions['sysop']['patrol']           = true;
$wgGroupPermissions['sysop']['autopatrol']       = true;
$wgGroupPermissions['sysop']['protect']          = true;
$wgGroupPermissions['sysop']['proxyunbannable']  = true;
$wgGroupPermissions['sysop']['rollback']         = true;
$wgGroupPermissions['sysop']['trackback']        = true;
$wgGroupPermissions['sysop']['upload']           = true;
$wgGroupPermissions['sysop']['reupload']         = true;
$wgGroupPermissions['sysop']['reupload-shared']  = true;
$wgGroupPermissions['sysop']['unwatchedpages']   = true;
$wgGroupPermissions['sysop']['autoconfirmed']    = true;
$wgGroupPermissions['sysop']['upload_by_url']    = true;
$wgGroupPermissions['sysop']['ipblock-exempt']   = true;
$wgGroupPermissions['sysop']['blockemail']       = true;
$wgGroupPermissions['sysop']['markbotedits']     = true;
$wgGroupPermissions['sysop']['apihighlimits']    = true;
$wgGroupPermissions['sysop']['browsearchive']    = true;
$wgGroupPermissions['sysop']['noratelimit']      = true;
$wgGroupPermissions['sysop']['versiondetail']    = true;
$wgGroupPermissions['sysop']['movefile']         = true;

I made sure I hit every permission instead of just the general ones. Since I had no idea how badly the system was compromised, I locked everything else as well. For example, if I had reason to believe a sysop account was compromised, I could set those permissions to false as well. When I say lockdown, I mean lockdown.

This method works better because of the way permissions work in MediaWiki. Permissions are stored in a 2-dimensional array. Each group has a series of true-false abilities, many of which are set to true by default. To add an ability to a group, you place the group name in the first index of the array and the ability in the second index.

There are groups that users belong to automatically, which I call metagroups (called ImplicitGroups in the documentation). These include the metagroup '*' (which stands for ALL users, including anonymous ones) and 'user', which incorporates all registered users. You can block every user with '*', but the other groups supersede this.

For example, even though you might set $wgGroupPermissions['*']['edit'] = false, registered users would still be able to edit, so you have to create a separate rule for the 'user' group.
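The sketch below illustrates that behavior. It assumes the usual MediaWiki model in which a user's effective rights are the union of whatever each of their groups grants, so a false entry only means "this group does not grant it" rather than "this right is taken away". It is a LocalSettings.php fragment for illustration, not a complete configuration.

```php
# LocalSettings.php fragment (illustrative sketch only).

# Anonymous visitors belong only to '*', so this stops them from editing.
$wgGroupPermissions['*']['edit'] = false;

# Registered accounts belong to both '*' and 'user'. As long as 'user'
# grants edit (the default), a registered spambot can still edit despite
# the line above, so the grant has to be switched off here as well.
$wgGroupPermissions['user']['edit'] = false;

# Sysops belong to '*', 'user', and 'sysop'. One group granting the right
# is enough, so this single line gives editing back to them.
$wgGroupPermissions['sysop']['edit'] = true;

# The trap in the other direction: setting only
#   $wgGroupPermissions['user']['edit'] = false;
# while '*' still grants edit would change nothing, because registered
# users would keep the right through their membership in '*'.
```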

This gets somewhat complicated when we introduce the 'autoconfirmed' group. The autoconfirmed group automatically includes any user that meets the $wgAutoConfirmCount and $wgAutoConfirmAge thresholds (both of which are set to a default of 0!). Thus, by default, ANY user that registers with MediaWiki is autoconfirmed. What does this mean?

Thankfully, autoconfirmed does not mean much beyond another implicit group - autoconfirmed has no privileges beyond user. However, if you used 'confirmed' users as a guideline of who not to delete while performing a spam recovery, you should beware how easy it is to become confirmed. Many wikis give additional privileges to confirmed users, so this implicit group should not be ignored.
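As a hedged illustration, raising both thresholds makes the autoconfirmed group mean something again. The numbers here are assumptions chosen for the example, not recommendations, and the fragment is only a sketch of the two settings discussed above.

```php
# LocalSettings.php fragment; the threshold values are illustrative only.

# The account must be at least four days old...
$wgAutoConfirmAge = 86400 * 4;   # four days times 86400 seconds/day

# ...and must have made at least ten edits.
$wgAutoConfirmCount = 10;
```

Keep in mind that on a wiki where ordinary users cannot edit at all, nobody can reach the edit count, so a nonzero $wgAutoConfirmCount only makes sense once editing is open again.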

Placing the DefaultSettings.php permissions in LocalSettings.php gives us finer-grained control over what is happening with our wiki server. Thus, in the future, when you feel prepared enough to give anonymous users the ability to edit again, you can make sure that they cannot use the writeapi, which most spambots use to automate their work. $wgGroupPermissions['*']['writeapi'] is set to true by default, meaning it's trivial for bots to scan your wiki for vulnerabilities as long as read permissions are set to true.
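A minimal sketch of what such a cautious re-opening could look like, assuming the explicit permission list above is already in LocalSettings.php. Which rights to reopen is a judgment call; this example only flips anonymous editing back on and leaves everything else locked. Check that the writeapi right exists in your MediaWiki version before relying on it.

```php
# LocalSettings.php fragment (sketch): reopen anonymous editing only.

# Anonymous visitors may edit existing pages again...
$wgGroupPermissions['*']['edit'] = true;

# ...but still may not create pages or talk pages,
$wgGroupPermissions['*']['createpage'] = false;
$wgGroupPermissions['*']['createtalk'] = false;

# and the write API stays off for both implicit groups.
$wgGroupPermissions['*']['writeapi']    = false;
$wgGroupPermissions['user']['writeapi'] = false;
```

Bots that drive the normal edit form instead of the API are not affected by the writeapi setting, so this is one layer of protection rather than a complete fix.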


Cleanup

Cleanup, however, was far more difficult than containment. I had violated yet another law of security: always have a backup.

I did not have a backup.

What I ended up doing was writing SQL queries for the better part of an afternoon and carefully deleting spam from pages. It was an interesting and dare I say... fun exercise in finding clever ways to delete information from a database.

Make sure to use mysqldump to back up your data before playing hack and slash with MySQL! One little mistake can leave your database in a state of utter corruption.

One of the most helpful tools I used during cleanup was phpMyAdmin. phpMyAdmin is easy to install (especially on Debian/RedHat systems, and generally on other systems as well). It is also fantastic for other database chores, and I really haven't found anything at which it isn't competent. My rule is to nearly always install phpMyAdmin alongside MySQL.

Eventually, I did realize that it would be far better if I could just write an extension to MediaWiki to revert changes chronologically, so that's what I did.

The extension (which I call 'TARDIS') is in a very early alpha stage, so I am not releasing it yet. There are a lot of bugs to squash and a lot of sanity checking to perform before I let the world see it :P


Analysis

Because of security issues (password hashes, etc.), I cannot give you the entire database to analyze for yourself. However, I can give you my own analysis of what I found.

An OVERWHELMING majority of the spam came from right here in the United States. Yes, in the United States! My initial analysis indicated that most of it was foreign (Russia, UK, and China), but after using this GeoIP database, I found that over 41% of the spam originated from US IP addresses! No other country even came close to that number.

Take a look at this graph: [image: "America yeah!!!"]

This says to me that the US has some of the worst spammers in the world! That, or at least we have so many compromised machines that spammers all over the world take advantage of this massive computing resource.

Even more fascinating was the breakdown of how spammers from each country operated. The ones from the US were by far the most clever. Most spammers from the US used a two-step technique: first they registered a user on the wiki with a believable first and last name followed by a 3-5 digit number, then they used these accounts to post a misleadingly informative article (somewhere between 2 and 5 paragraphs) that contained at most 5 links to external sites. Each of these fake accounts was associated with a fake email, usually purporting to be an aol.com, gmail.com, live.com, verizon.com, or yahoo.com address (they were fakes - the few I tested were not real accounts). The text of these articles was far above the normal Markov-chain nonsense spam: most of them would probably pass a cursory inspection if their subject matter were related to the wiki!

For your fake email browsing pleasure, here are a few samples from the US spammers:

LindyMainard10941@yahoo.com
ThomBaber16749@gmail.com
FanetteEnnis1646@yahoo.com
CaylinKerr19624@yahoo.com
ErleenGilkinson15975@aol.com
5175687e@gmail.com
FarquharLitchfield17599@live.com
ArdwyadBonner11772@live.com
DaivyaFollet16858@gmail.com
RebaHacker7338@aol.com

Spammers from other countries were far less sneaky and used a "post as many links in as short a time as possible" approach, which made them very easy to detect. The usage of non-Latin alphabet characters was also a dead giveaway.

For those of you who are interested in detailed information about the spam (including IPs, so you can ban them), here you are:

Description                            Name               Size     MD5 Sum
Long list of fake emails I harvested   emails.txt         165 KB   63a05815c7ef5868847eb35cc69d383a
External spam links                    externallinks.txt  1717 KB  622fbc83cbd7f0f61f1a327f6d1d3e88
Spammer IPs - PERFECT for banning      ipaddrs.txt        60 KB    e2dae16c54d73a557d11b3baf9d793a1
Country by country IP statistics       ip_stats.txt       3 KB     cbbf48f2357c0ea0af96752fd3ee0c87


Prevention

Prevention of abuse seems to be a major issue in the MediaWiki community. Just take a look at their website.

Essentially, though, all prevention recommendations break down into one of the following categories:

As you have noticed, all of these strategies are reactive, not proactive. Fortunately, there are strategies you can use and extensions you can install that will perform some decent spam protection.

The ConfirmEdit extension. I initially chose ConfirmEdit with Asirra, which pulls kitty and doggy photos from a pet adoption site and asks potential users and editors to identify the animal. This is especially useful for dyslexics, because traditional captchas are VERY hard to read with a learning disability.

However, the Asirra extension is not quite stable yet (I will work on this, I think), so I opted for the FancyCaptcha version instead.
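For reference, a ConfirmEdit setup with FancyCaptcha might look roughly like the fragment below. This is a sketch under assumptions, not my exact configuration: it assumes the extension is unpacked under extensions/ConfirmEdit and a MediaWiki release that loads extensions with require_once, and the directory path and secret are placeholders you must replace. FancyCaptcha also needs its images generated beforehand with the same secret, so check the extension's own documentation for your version.

```php
# LocalSettings.php fragment (sketch; path and secret are placeholders).

# Load ConfirmEdit and its FancyCaptcha module (older-style extension loading).
require_once( "$IP/extensions/ConfirmEdit/ConfirmEdit.php" );
require_once( "$IP/extensions/ConfirmEdit/FancyCaptcha.php" );
$wgCaptchaClass = 'FancyCaptcha';

# Where the pre-generated captcha images live, and the secret used to make them.
$wgCaptchaDirectory = "/path/to/captcha-images";       # hypothetical path
$wgCaptchaSecret    = "replace-with-your-own-secret";  # placeholder value

# Actions that should trigger a captcha.
$wgCaptchaTriggers['edit']          = true;  # every edit
$wgCaptchaTriggers['create']        = true;  # new page creation
$wgCaptchaTriggers['addurl']        = true;  # edits that add external links
$wgCaptchaTriggers['createaccount'] = true;  # account registration
$wgCaptchaTriggers['badlogin']      = true;  # after a failed login

# Let sysops skip the captcha.
$wgGroupPermissions['sysop']['skipcaptcha'] = true;
```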

Make sure that when you are installing, debugging, or testing extensions, you use:

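# Flip to true while installing, debugging, or testing extensions;
# set it back to false before the wiki goes public again.
$debug = false;
if ($debug) {
    # Show full exception details in the browser instead of a generic error
    $wgShowExceptionDetails = true;
}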

This makes troubleshooting much easier.


Conclusion

My experiences with the wikihijacking were painful but important lessons in security. Namely: