Tuesday, March 30, 2010

An Idea for a Rails QA App forms...

I've collected all the data up to the end of February 2010 and had some fun running a few reports. Now what?

First there's the QA process. Do I really have all the data, or do I just think I do? How do I exclude what I don't want, i.e., spam?

How do I separate the HA.org heroes from the trolls and the spammers? That's the most important question, because this project is about troll behavior and having fun dissecting it.

Collection was done month by month. The HA.org front page contains a comprehensive list of the archives by month, with the number of articles for each month. So right now I'm thinking the QA work will have to be divided the same way.

So first I picture a selection toolbar at the top of the browser window, with a year/month selector and backward/forward arrows. At the bottom of the screen, right of center, a radio button group with hero, troll and spammer choices and a submit button. The rest of the screen will show a sampling of comments from the handle's owner.

So at a glance I can decide: is this a hero, a troll or a spammer? Once the decision is made, the handle and its comments can be marked and excluded from further year/month examinations.

And of course there are all the handles whose status I already know up front. No need to look at those. Roger Rabbit's stuff alone accounts for 11 percent!

So that's good enough for pass 1.
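The "skip what I already know" part of pass 1 could be as simple as a filter. Here's a minimal sketch, assuming (hypothetically) that the collected comments are flattened into a tab-separated file with one `YYYY-MM<TAB>handle` line per comment, and that decisions are kept in a second file of `handle<TAB>status` lines. The file layouts and the `review_queue` function are made up for illustration; they aren't the actual app.

```bash
#!/usr/bin/env bash
# Sketch only: file layouts and names here are hypothetical.
#   comments file:   YYYY-MM<TAB>handle   (one line per comment)
#   classified file: handle<TAB>status    (hero, troll or spammer)

# review_queue MONTH COMMENTS CLASSIFIED
# Prints "count<TAB>handle" for handles that commented in MONTH and
# have not been classified yet, busiest handles first.
review_queue() {
  local month="$1" comments="$2" classified="$3"
  awk -F '\t' -v m="$month" '
    FILENAME == ARGV[1] { known[$1] = 1; next }   # first file: classified handles
    $1 == m && !($2 in known) { n[$2]++ }         # second file: count the rest
    END { for (h in n) printf "%d\t%s\n", n[h], h }
  ' "$classified" "$comments" | sort -t $'\t' -k1,1nr -k2,2
}

# Tiny made-up sample data, not real counts.
comments=$(mktemp); classified=$(mktemp)
trap 'rm -f "$comments" "$classified"' EXIT
printf '2010-02\tRoger Rabbit\n2010-02\tRoger Rabbit\n2010-02\tSpamBot99\n2010-02\tJane Doe\n2010-02\tJane Doe\n2010-01\tJane Doe\n' > "$comments"
printf 'Roger Rabbit\thero\n' > "$classified"

review_queue 2010-02 "$comments" "$classified"
```

Each decision made in the review screen would then just append a line to the classified file, so the queue for every other month shrinks automatically.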

Pass 2 starts with submitting the suspected spammers' comments to Akismet for further examination. Why? A hero's or a troll's handle may have been hijacked by a spammer for a time. Akismet can easily mark what is spam and what is not.

Next is the hardest part. A handle like Jane Doe may at one time be a progressive from Tacoma and at another a troll from Bumfuck, WA. Sorting that out is going to take a good amount of work, but if we're smart, the work can be minimized.

15,078 handles... Don't remind me about the comments. Someone's got to do it.

Monday, March 29, 2010

The HNMT revealed...

Here we see a screenshot of a pivot table I cooked up to summarize comments by handle in 2009. I looked for handles containing either "Roger" or "Rabbit", i.e., the pattern Roger|Rabbit, because I wasn't confident the bash command line would tolerate the space between the first and last names in one shot.

Well, lookee there in the leftmost column... A gaggle of putrid handles by that most resentment-driven and odious of trolls: the hateful name-morphing troll, the HNMT.

In contrast, take a look at Roger Rabbit's line. He has misnamed himself: he's not a rabbit, he's a tortoise. And the numbers peppered around the rabbit's line? Not a rabbit either, but certainly a most bothersome pest.
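For the record, bash handles the space fine as long as the pattern is quoted: the quoted string reaches the program as a single argument. A minimal sketch of the Roger|Rabbit search, assuming a hypothetical tab-separated file with one `YYYY-MM<TAB>handle` line per comment; `count_matching_handles` and the "Rabbit Hunter" handle in the sample are made up for illustration.

```bash
#!/usr/bin/env bash
# Sketch only: assumes a hypothetical tab-separated file with one
# "YYYY-MM<TAB>handle" line per comment.

# count_matching_handles REGEX FILE
# Prints "count<TAB>handle" for every handle matching the extended regex,
# most prolific first.
count_matching_handles() {
  local pattern="$1" file="$2"
  # Quoting "$pattern" keeps 'Roger Rabbit' together as one argument;
  # awk's ~ operator treats it as an extended regex, so 'Roger|Rabbit' works too.
  awk -F '\t' -v pat="$pattern" '$2 ~ pat { n[$2]++ }
    END { for (h in n) printf "%d\t%s\n", n[h], h }' "$file" |
    sort -t $'\t' -k1,1nr -k2,2
}

# Made-up sample lines, not real data.
sample=$(mktemp)
trap 'rm -f "$sample"' EXIT
printf '2009-01\tRoger Rabbit\n2009-02\tRoger Rabbit\n2009-02\tRabbit Hunter\n2009-03\tGBS\n2009-03\tRoger Rabbit\n' > "$sample"

count_matching_handles 'Roger|Rabbit' "$sample"   # first-name or last-name hits
count_matching_handles 'Roger Rabbit' "$sample"   # the full name, space and all
```

Because the match is unanchored, the full-name search also catches any handle that merely contains "Roger Rabbit", which is handy when the impostors keep the name and bolt something onto it.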

I noticed something else... In 2008...

Again we see Roger's solid row, this time in the bottom third of the report, and again we see the obnoxious pest buzzing about... Activity seems to build to a frantic pitch in May of that year, and then... there's a very curious shutdown in pest vigor. What happened? Illness? Personal crisis? Boredom? Resignation at soon-to-be President-Elect Obama's sweeping victory in November? Or perhaps some other unhealthy fixation?

Another mystery to solve...
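One way to pin down when the pest went quiet is to tally a single handle's comments month by month. A minimal sketch, assuming a hypothetical tab-separated file with one `YYYY-MM<TAB>handle` line per comment; `monthly_activity` and the `SomePest` handle are made up for illustration.

```bash
#!/usr/bin/env bash
# Sketch only: hypothetical "YYYY-MM<TAB>handle" file, one line per comment.

# monthly_activity HANDLE FILE
# Prints "YYYY-MM<TAB>count" for an exact handle match, in month order,
# so a sudden drop-off stands out.
monthly_activity() {
  local handle="$1" file="$2"
  awk -F '\t' -v h="$handle" '$2 == h { n[$1]++ }
    END { for (m in n) printf "%s\t%d\n", m, n[m] }' "$file" | sort
}

# Made-up sample lines, not real data.
sample=$(mktemp)
trap 'rm -f "$sample"' EXIT
printf '2008-04\tSomePest\n2008-05\tSomePest\n2008-05\tSomePest\n2008-05\tRoger Rabbit\n2008-05\tSomePest\n2008-06\tSomePest\n' > "$sample"

monthly_activity SomePest "$sample"
```

Months with no comments at all simply don't appear in the output, so a gap in the month column is itself the signal worth checking.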

It is accomplished...

As of the end of February 2010.

7,680 articles over the 70 months HA.org had existed up to that point.

414,965 comments. (Haven't filtered out the spam yet.)

Whew! A mother lode of right-wing inanity and foolishness, with some intelligent and fun liberal-leaning comments thrown in.

and last but not least:

Out of 15,078 unique handle names!

Top 10 handles by number of comments:

10 GBS @ 4,952
9 rhp6033 @ 5,016
8 Steve @ 5,582
7 ArtFart @ 5,839
6 Marvin Stamm @ 6,774
5 Daddy Love @ 8,246
4 YLB @ 8,678
3 Mr. Cynical @ 9,314
2 Puddybud @ 11,368

and number one?

Our beloved Roger Rabbit at 56,751.
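A leaderboard like this one boils down to counting comments per handle and sorting. A minimal sketch, assuming a hypothetical tab-separated file with one `YYYY-MM<TAB>handle` line per comment; `top_handles`, `unique_handles` and the `HandleA`/`HandleB`/`HandleC` sample handles are made up for illustration.

```bash
#!/usr/bin/env bash
# Sketch only: hypothetical "YYYY-MM<TAB>handle" file, one line per comment.

# top_handles N FILE: the N handles with the most comments, "count<TAB>handle".
top_handles() {
  local n="$1" file="$2"
  awk -F '\t' '{ c[$2]++ } END { for (h in c) printf "%d\t%s\n", c[h], h }' "$file" |
    sort -t $'\t' -k1,1nr -k2,2 | head -n "$n"
}

# unique_handles FILE: how many distinct handle strings appear.
unique_handles() {
  awk -F '\t' '!seen[$2]++ { u++ } END { print u + 0 }' "$1"
}

# Made-up sample lines, not real counts.
sample=$(mktemp)
trap 'rm -f "$sample"' EXIT
printf '2010-01\tHandleA\n2010-01\tHandleB\n2010-02\tHandleA\n2010-02\tHandleC\n2010-02\tHandleA\n2010-02\tHandleB\n' > "$sample"

top_handles 2 "$sample"
unique_handles "$sample"
```

Note that this counts raw handle strings: aliases and morphed names aren't merged, so one commenter can show up as several handles and several commenters can share one.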

Commenters' various aliases are not yet factored in. Steve might be several different "Steves". (And the HNMT is a particularly nasty case.) It's going to require quite a bit of study, and first I have to do some QA on the collection process as a whole. So this post will be updated as the picture gets clearer.

But so far, so good! Fun, fun, fun in the run-up to November!

Special note to that most moronic of trolls (#2? It fits!):

$ du -h .ha

201M    .ha/data