Monday, July 26, 2010

Yet More Tagging Progress!

The 319 handles in the "between 100 and 999 comments" bracket have been tagged. Learned many interesting things.

About a third of the handles were pretty obvious names that were familiar to me. They took about as much time to tag as the over-1000 group.

However, I had mostly forgotten the rest, so I had to flip between my tag file and a little query script that scans their comments to answer the question: troll or non-troll?

It was a longer slog than I cared for, but... it was a pretty interesting bunch of comments.

Some comments came from single-issue type people who impressed me with the depth of their knowledge on their pet issues - everything from the ferries to Sound Transit to whatever was the issue of the day at HA.

Some were from lefties that had little regard for the Dems and predicted (fairly accurately) that people on the left were getting their hopes up way too high for the Dems to deliver any appreciable change.

Some comments came from righties (a few, mind you) who believed more or less the same thing and were pretty disgusted with how the Republicans had screwed things up. I could not in good conscience call these righties trolls.

There was even one right winger who impressed me as being genuinely interested in putting forth an intellectually sound and nuanced argument. One. Just one! This winger was sounding out a lefty about his personal opposition to abortion. I made a note to myself to read all of this right winger's comments.

See, trolls? You can get some positive attention from us if you quit the name calling, turn off the right wing hate radio and other degenerate propaganda, and think your positions on the issues through deeply.

All in all it was somewhat tedious but rewarding work and at this point I'm 84.5 percent of the way through tagging the entire comment database.

The next bracket, the "between 10 and 99 comments" bracket, has a touch over 1400 handles in it. I've already started with the ones that are most familiar. We'll see how it goes. Once that's done, I'll have 94 percent of the comments tagged.

Beyond that? Well, "the swamp" (almost 14,000 handles!) has some interesting critters in it indeed. I won't ignore it. Trolls, be advised: you can run (or swim) through "the swamp", but you can't hide.
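For readers curious how the brackets and the running percentage fit together, here is a minimal sketch, assuming each handle's total comment count is already known. It is not my actual tagging script: the function names, the bracket labels, the "swamp means fewer than 10 comments" cutoff and the sample handles are all hypothetical. It also assumes "percent tagged" is measured by comments rather than by handles, which would explain how a few hundred prolific handles can cover most of the database.

```python
def bracket(count: int) -> str:
    """Assign a handle to a bracket by how many comments it has posted (hypothetical cutoffs)."""
    if count >= 1000:
        return "1000+"
    if count >= 100:
        return "100-999"
    if count >= 10:
        return "10-99"
    return "swamp"  # assumed: fewer than 10 comments


def group_by_bracket(comment_counts: dict[str, int]) -> dict[str, list[str]]:
    """Collect handles into their brackets, preserving input order."""
    groups: dict[str, list[str]] = {}
    for handle, n in comment_counts.items():
        groups.setdefault(bracket(n), []).append(handle)
    return groups


def percent_tagged(comment_counts: dict[str, int], tagged: set[str]) -> float:
    """Share of all comments (not handles) whose author handle has been tagged."""
    total = sum(comment_counts.values())
    done = sum(n for handle, n in comment_counts.items() if handle in tagged)
    return 100.0 * done / total if total else 0.0


# Made-up sample data, purely for illustration.
counts = {"handle_a": 1200, "handle_b": 500, "handle_c": 250, "handle_d": 50}
print(group_by_bracket(counts))
print(percent_tagged(counts, {"handle_a", "handle_b", "handle_c"}))
```

Weighting by comments means each finished bracket moves the needle less than the one before it, even as the number of handles balloons.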

Lastly, there's the job of sifting through handles that have switched between troll and non-troll identities: separating the troll comments from the non-troll comments, and even from the spam comments, all wrapped up under one handle. I've started sketching out a user interface for a web app that will help with that, and with the whole tagging and typing chore to boot.

I've pretty much settled on the Sinatra framework for the web app. Gonna be a whole lotta fun!

2 comments:

demo kid said...

Are you planning to use some type of latent semantic analysis to show the differences in the way trolls use language? That'd be pretty cool... à la wordle.net?

YLB said...

Oh yeah absolutely. The words and phrases the right wing trolls employ over time are classic.

I want to draw pictures of them just like you see at wordle.net.