Showing posts with label News. Show all posts

Sunday, January 9, 2011

Bullshit Code!

During the back and forth in the comment threads over the tragic shooting of Congresswoman Gabrielle Giffords (among others), Puddybud went off on yours truly:
You wasted all those unemployed months telling the unemployment office you were looking for a yob while writing BULLSHIT code!
Bullshit code??? Hey those are fighting words!


Can Puddybud or anyone else tell "the world" what this does?

This will be the first of another occasional series. Not once a week like the "Bird's Eye View" but...  you get the idea.

Update: no takers yet.. (Puddybud dumped on it, of course.) In case you didn't notice, that if statement above is dangling. So I'll complete it and we'll see what happens:


Hint: I actually demoed what this code does some time back - in the comment threads.

Yet another update: Wow! I'm glad I blogged this code. Kind of forced me to look at it more closely and I've already re-factored it some to yield just a bit more functionality. Details to come in the next BS Code post.

Tuesday, August 3, 2010

DB Snapshot Demo

Enjoy.. Blow it up to full screen to see it better.



Update: small bug found in the snapshot code. As of my last run, the db has 238 comment records over 8 articles in August. Before it was just importing the same 168 records.

Monday, July 26, 2010

Yet More Tagging Progress!

The 319 handles in the "between 100 and 999 comments" bracket have been tagged. Learned many interesting things.

About a third of the handles were pretty obvious names that were familiar to me. They took about as much time to tag as the over 1000 group.

However, I had mostly forgotten the remainder, so I had to flip between my tag file and a little query script to scan their comments and answer the question - troll or non-troll?

It was a longer slog than I cared for but... It was a pretty interesting bunch of comments.

Some comments came from single-issue type people who impressed me with the depth of their knowledge on their pet issues - everything from the ferries to Sound Transit to whatever was the issue of the day at HA.

Some were from lefties that had little regard for the Dems and predicted (fairly accurately) that people on the left were getting their hopes up way too high for the Dems to deliver any appreciable change.

Some comments came from righties (a few, mind you) that believed more or less the same thing and were pretty disgusted with how the Republicans had screwed up things. I could not in good conscience call these righties trolls.

There was even one right winger who impressed me as being genuinely interested in putting forth an intellectually sound and nuanced argument. One. Just one! This winger was sounding out a lefty for his personal opposition to abortion. I made a note to myself to read the entirety of this right winger's comments.

See trolls? You can get some positive attention from us if you quit the name calling, turn off the right wing hate radio and other degenerate propaganda and think deeply through your positions on the issues.

All in all it was somewhat tedious but rewarding work and at this point I'm 84.5 percent of the way through tagging the entire comment database.

The next bracket, the "between 10 and 99 comments" bracket has a touch over 1400 handles in it. Already started with the ones that are most familiar. We'll see how it goes. Again, after that, I'll have the comments 94 percent tagged.

Beyond that? Well "the swamp" (almost 14,000 handles!) has some interesting critters in it indeed. I won't ignore it. Trolls, be advised: you can run (or swim) through "the swamp", but you can't hide.

Lastly, there's the job of sifting through handles that have switched between troll and non-troll identities - separating the troll comments from the non-troll comments, and even from the spam, all wrapped under one handle. I started sketching out a user interface for a web app that will help with that and with the whole tagging and typing chore to boot.

I've pretty much settled on the Sinatra framework for the web app. Gonna be a whole lotta fun!

Monday, July 12, 2010

Tagging Progress

Over the weekend I took the most comment-prolific 81 handles, grouped the contained aliases under a single tag (e.g. all of Puddybud's various handles under the tag "junkshot") and categorized the tag as troll or non-troll.

Remember that those top 81 handles have contributed 1,000 or more comments each and in total account for over 62 percent of all comments.

Conclusion: 30 percent of those comments are troll comments. That number will probably hold pretty steady as I tag the next 1700 or so most comment-prolific handles. And after I'm done with that?

Just a hair over 94 percent of all comments will be judged troll or non-troll.
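For the curious, here is a rough sketch of how a brackets report with those coverage percentages could be computed. The bracket boundaries follow the ones named in these posts; the method names and the handle-to-count hash are made up for illustration and are not the project's actual code.

```ruby
# Bracket labels follow the posts; everything else here is hypothetical.
def bracket_for(comment_count)
  if comment_count >= 1000
    "1000+"
  elsif comment_count >= 100
    "100-999"
  elsif comment_count >= 10
    "10-99"
  else
    "under 10" # "the swamp"
  end
end

# handle_counts: { handle => number of comments } (assumed input shape).
# Returns, per bracket, how many handles it holds, how many comments they
# wrote, and what percentage of all comments that is.
def bracket_report(handle_counts)
  total = handle_counts.values.sum.to_f
  handle_counts.group_by { |_handle, count| bracket_for(count) }
               .transform_values do |pairs|
    comments = pairs.sum { |_handle, count| count }
    { handles: pairs.size,
      comments: comments,
      share: (100 * comments / total).round(1) }
  end
end
```

Adding up the share column from the top bracket down gives the "percent of all comments tagged so far" figure.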

Of course the work is never done. Some handles like "John" or "Bill" or "Steve" have exhibited either troll or non-troll character throughout their lifetimes. The troll comments contained under those handles will have to be laboriously sifted out and assigned their own special tags.

Thursday, June 17, 2010

Update

Been a while since the last post. Still tagging handles, still removing warts from my code. The latest code is almost where I want it. I could do a bit more factoring to remove some repetition, but it would be more for aesthetic purposes. The code does what I want with satisfactory performance. Test coverage, as is usually the case, could be much better.

On a more relevant note, April and May 2010 are now in the database so I thought I'd update the big picture that I first sketched here.

As of the end of May 2010:

8,082 articles over the 73 months of HA.org's existence.

430,912 comments. (Haven't filtered out the spam yet.)

and last but not least:

Out of 15,728 unique handles!

Top 10 # of comments by handle:

10 GBS @ 5,317
9 rhp6033 @ 5,478
8 ArtFart @ 6,004
7 Steve @ 6,167
6 Marvin Stamn @ 6,774
5 Daddy Love @ 8,580
4 YLB @ 9,323
3 Mr. Cynical @ 9,827
2 Puddybud @ 11,369

and number one?

Our beloved Roger Rabbit at 58,310.


So Steve moves ahead of Art, gaining on Stamn. Puddybud barely budges due to begging everybody else to come to me to justify his drivel. (Useless, he'll ALWAYS be #2.) All the usual disclaimers on that top 10 list still apply.

An updated brackets report follows:


One handle has joined the 1000+ comments club. Right now I don't know who you are but congrats!

Update to "Update": The newest member of the 1000+ comments club is "John" which is a handle that's been used by many, many people, hero and troll alike, over the years. Again, congratulations "John"!

Friday, April 30, 2010

Entering the Twitterverse

Tagging handles in earnest indeed. In the meantime, I may tweet a few eureka moments about code or trolls.

While I was at it, I added an rss feed.

Ciao..

Monday, April 26, 2010

Still Distracted but Turning a Big Corner

In the last post I was all jazzed about an opportunity to make the data collection from my "select sources" run much faster.

Well, I'm pretty much at the other end of it and I'm quite happy about it. I did a major rework of the code. It's much better than before and my only regret is that I wish I'd thought of this speedup sooner.

The old code base accomplished the collecting (almost 8,000 articles and 415,000 comments) but it did the job kind of slowly and that was even with some good speedups I had spotted early. But if I had had what I've got now, it would have gone much faster. Also I would have totally minimized data collection from HA itself. Oh well. Better late than never.

In the last post I mentioned that I used code that automates a web browser to navigate the web and collect HA articles from select sources. I also mentioned that bringing up one browser instance and feeding all the requests to it is much faster than opening a new browser for each request.

The obvious speedup of course is to distribute requests across more browser instances - make it scale. But now we're talking about forking off objects from classes that encapsulate the browser-driving code. And once you create sub-processes you need a means to communicate requests to and retrieve results from those forked instances.

The tried and true method to accomplish this communication is to use a message queuing daemon. In a previous Ruby project I'd used a pretty slick one with a handy Ruby support library. No problem mon -  it's just a matter of finding the right places to insert this code. It was a lot of work but it made me take a hard look at the entire code base and throw away a lot of stuff that just wouldn't be needed anymore. And as I separated out code into Ruby modules by function, I was forced to come to a better understanding of instance variables. In my Repo class I totally overused class methods and was passing way too many parameters from method to method. Why? Way back in the beginning I thought it would be easier to test if I did it that way. Totally wrong. Thanks to this rework, the code in this project and my other projects will be so much better.
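To give a feel for the shape of this, here is a minimal sketch of fanning requests out to several workers and gathering the results. Note the substitution: it uses threads and Ruby's built-in in-process Queue as stand-ins for the forked browser-driving processes and the message queuing daemon described above. The method name, the worker_count parameter and the block are all hypothetical.

```ruby
# Fan requests out to several workers and gather the results.
# Threads and Queue stand in for forked processes and a queuing daemon.
def collect_in_parallel(requests, worker_count: 3, &work)
  jobs = Queue.new
  results = Queue.new

  requests.each { |request| jobs << request }
  worker_count.times { jobs << :done } # one stop marker per worker

  workers = Array.new(worker_count) do
    Thread.new do
      # Each worker pulls requests until it sees its stop marker.
      while (request = jobs.pop) != :done
        results << [request, work.call(request)]
      end
    end
  end
  workers.each(&:join)

  # Drain the results queue into a { request => result } hash.
  Array.new(results.size) { results.pop }.to_h
end
```

A call might look like `collect_in_parallel(article_numbers, worker_count: 4) { |n| fetch(n) }`, where fetch is whatever does the real work. Results come back keyed by request because workers finish in no particular order.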

One of my select sources in the past really made me tear my hair out. I had only collected 4 or so articles from it because of all the trouble it gave me but now that's all changed. I've gotten much more than a handful from that source for March 2010 alone and I expect it to only get better.  And there's yet another select source out there that I didn't even bother to try because it was so weird and different from the others. I'm feeling pretty confident now. I will give it a shot but I have to do a couple things first.

I'm going to start tagging handles in earnest. This is really critical to separating the trolls from the heroes and then being able to make big picture analyses of troll behavior over the months and the election cycles.

Then I've got to scratch an itch about current month activity. So far I'm caught up to the end of March 2010. Normally I wouldn't add a month to my collection until 10 days into the next month. Why? Goldy doesn't shut down comments to an article until 8 days or so after the article is published. So if I'm curious about some interesting activity in the current month, I have to develop a way to create a snapshot database with an eye toward merging various snapshots into the real thing to save time when a month finally closes.
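One possible way to merge snapshots, sketched under loud assumptions: each snapshot is an array of comment hashes, every comment carries a unique :id, and when the same comment shows up in more than one snapshot the newest copy wins. None of that is settled design, and merge_snapshots is a made-up name.

```ruby
# Merge snapshots given oldest first. For a repeated comment id the copy
# from the newest snapshot wins. Keying on :id is an assumption.
def merge_snapshots(*snapshots)
  merged = snapshots.flatten.each_with_object({}) do |comment, by_id|
    by_id[comment[:id]] = comment
  end
  merged.values
end
```

Because comments stay open for a while, later snapshots mostly add new records, so a merge like this would only have to touch the tail end of the month.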

Wednesday, April 7, 2010

Opportunities, Distractions

One goal of this project was to minimize data collection directly from HA.org itself as much as possible. I would have been a lot farther along by now if I didn't care about this. Why is this important?

I not only want to slice and dice troll behavior but I also want to learn all these awesome cool ruby-based software libraries. They've been a total joy to learn about and work with.

The data collection problem and the composition of its solution into classes have been pretty simple so far. There's a Session class whose job is to collect one month of HA.org's existence. The most important instance method of this class is add, which at its heart is a for loop that starts at the first article of the month and ends at the last. Each article of course contains the comments, which are the gold I'm after. But it's within the loop that the fun really begins.

I said I wanted to minimize data collection from the well, HA.org. To do this I constructed a least cost route of sources from which to draw articles with HA.org being the highest cost, the last resort. So I have a Repo class (short for repository) that's blessed from a Session instance. I feed it an article number and a source and it tries to get the article. If it comes up short then the Session instance tries the next source in the route.
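Here is a stripped-down sketch of that idea, not the real code. The Session and Repo class names and the add method come from the description above; the constructor arguments, the fetcher lambdas and the articles hash are hypothetical stand-ins.

```ruby
# Hypothetical Repo: wraps one source and tries to get an article from it.
class Repo
  attr_reader :source

  def initialize(source, fetcher)
    @source = source
    @fetcher = fetcher # any callable: article number -> String or nil
  end

  def get(article_number)
    @fetcher.call(article_number)
  end
end

# Hypothetical Session: walks a range of articles, cheapest source first.
class Session
  attr_reader :articles

  def initialize(route)
    @route = route # Repos ordered from lowest cost to highest cost
    @articles = {}
  end

  def add(first_article, last_article)
    (first_article..last_article).each do |number|
      @route.each do |repo|
        body = repo.get(number)
        next if body.nil? # this source came up short, try the next one
        @articles[number] = { source: repo.source, body: body }
        break
      end
    end
    self
  end
end

# Made-up sources: a cache that only has even-numbered articles,
# and the origin site as the last resort.
cache = Repo.new(:cache, ->(n) { n.even? ? "cached article #{n}" : nil })
origin = Repo.new(:origin, ->(n) { "origin article #{n}" })
session = Session.new([cache, origin]).add(1, 4)
```

In this toy run the odd-numbered articles fall through to the last-resort source and the even-numbered ones never touch it, which is the whole point of ordering the route by cost.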

So far pretty simple. Not rocket science. Fun? Only for a geek, right? Well it gets even better. Some of my sources only require a simple HTTP GET, which is available from lots of Ruby HTTP libraries. This is how I get stuff, again as a last resort, from HA.org. Butt simple and too boring. But a few sources, I call them select sources, require something a little more sophisticated. They require driving a web browser to go and get the article. Without getting into too much detail:


Making this go was just too much freaking fun for words.

So, after fist pumping and watching my code drive a web browser to navigate the web and get the stuff I wanted from it I quickly realized I'd done something a little dumb:

My code was opening and closing the web browser for each request to the web.. Ugh.. It's a lot faster, Sherlock, to just leave one browser instance up, feed it all the requests you want and then close it when the session is concluded. Another hour or two of searching for the right spot in the code to insert the needed changes and I had a big increase in performance.
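The difference between the two patterns, in sketch form. FakeBrowser is a stand-in that only counts how many times a browser gets launched; it is not a real browser-driving library, and both method names are made up.

```ruby
# Stand-in for a real browser driver. It only counts launches.
class FakeBrowser
  @launches = 0
  class << self
    attr_accessor :launches
  end

  def initialize
    self.class.launches += 1
  end

  def fetch(url)
    "page for #{url}"
  end

  def close; end
end

# The dumb way: open and close a browser for every single request.
def fetch_each_with_new_browser(urls)
  urls.map do |url|
    browser = FakeBrowser.new
    begin
      browser.fetch(url)
    ensure
      browser.close
    end
  end
end

# The better way: one browser for the whole session, closed at the end.
def fetch_all_with_one_browser(urls)
  browser = FakeBrowser.new
  urls.map { |url| browser.fetch(url) }
ensure
  browser&.close
end
```

With a real browser each launch costs seconds, so the launch count is a decent proxy for where the time goes.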

Which is all a long winded introduction to the point of this post - there's yet ANOTHER opportunity (and yep, distraction from my initial goal) to make this puppy run even faster. Next post. I'm gonna code.

Tuesday, April 6, 2010

QA winds down. HA's most commented posts.

To kind of, sort of check on the quality of the data collection process, I whipped up a report of the most commented articles of each month of HA's existence. Then here and there I went to the well to see if what I have and what HA has matches up..

I've found tiny variances in, strangely enough, both directions.. I'll definitely have to look more thoroughly at those months where I seem to have MORE than what Goldy has. But so far the variances seem to be made mostly of spam.

Once I had that report working, I tweaked it to select one article at random from each month. Again, things seem to match up just fine. Even better than the most commented articles sample.
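Both versions of that report boil down to "group by month, then pick one article per group". A minimal sketch, assuming the article rows are plain hashes; the sample rows and method names below are invented for illustration.

```ruby
# Hypothetical rows: one hash per article, with made-up numbers.
ARTICLES = [
  { month: "2005-03", article: 101, comments: 40 },
  { month: "2005-03", article: 102, comments: 75 },
  { month: "2005-04", article: 110, comments: 12 }
].freeze

# For each month, the article with the most comments.
def most_commented_by_month(articles)
  articles.group_by { |a| a[:month] }
          .transform_values { |rows| rows.max_by { |a| a[:comments] } }
end

# For each month, one article picked at random.
# Pass a seeded Random to make the pick repeatable.
def random_article_by_month(articles, random: Random.new)
  articles.group_by { |a| a[:month] }
          .transform_values { |rows| rows.sample(random: random) }
end
```

Either result can then be spot-checked by hand against what the site itself shows for the same article.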

So confidence is high.

Anyhow the most commented posts merit further mention. Here's a pretty picture:


The first column is the year/month, the second is the article number and the third is the total number of comments for the thread. The most commented HA blog post, in March of 2005, was a knock-down, drag-out thread about Terri Schiavo - a major body-blow for the extreme right wing. Between trying to weaken Social Security, using Terri Schiavo's vegetative body as a prop, letting New Orleans drown, corruption/sex scandals, torture, dollar black hole wars of choice and the final, ugly meltdown of the economy - let us never forget what it means to have the right wing in control of this country.

Saturday, April 3, 2010

Tagging Handles

Trolls and Heroes adopt many aliases over the course of their on-line existence. To separate them from one another and make searching and analysis easier I'm going to stamp each comment with a tag. No not all umpteen thousands of them - just associate the 15,078 handles (and growing) with a more coherent tag e.g. puddybud for all the versions of puddybud's handle and that will in turn stamp his legacy of over 26 thousand comments.

The tags will live in a separate file, a hash whose key is the tag and whose value is a category:

h - hero
t - troll
ul - a lefty who hurts the cause by getting too much like a righty.
rr - a right winger whose tone is reasonable most of the time. (Very rare at HA.org.)
s - spammer
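A rough sketch of how that could look in Ruby. The tag-to-category hash is as described above; the second hash mapping raw handles to tags, the alias spellings, the "example_hero" entries and the classify method are all made up for illustration.

```ruby
# tag => category code (h, t, ul, rr, s). Sample entries only.
TAG_CATEGORIES = {
  "puddybud" => "t",
  "example_hero" => "h"
}.freeze

# Hypothetical alias map: raw handle as posted => tag.
HANDLE_TAGS = {
  "Puddybud" => "puddybud",
  "Puddybud, a made-up alias" => "puddybud",
  "Example Hero" => "example_hero"
}.freeze

# Return [tag, category] for a raw handle, or [nil, nil] if it's untagged.
def classify(handle)
  tag = HANDLE_TAGS[handle]
  [tag, TAG_CATEGORIES[tag]]
end
```

Stamping a comment is then just a lookup on its handle, so the comments themselves never have to be edited.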

In other news I'm torn a bit about using Sinatra over Rails for the QA portion of this project. Rails 3 is worth a look because it's new and much more flexible. I'll probably use both for portions of the remainder of this project.

After all most of the motivation and benefit of this project has been from learning and applying all these wonderful ruby-based software development frameworks.

Late update: Do you believe in synchronicity? I just found this out minutes before I published this post. The author of the command/tasking framework that I use to develop my reports on this project is heavily motivated by the concept of machine tagging which is used by Flickr and other projects. For now I'm sticking with the simple tagging implementation I've sketched here but I may have to revisit this more thoroughly later.

Monday, March 29, 2010

It is accomplished...


As of the end of February 2010.

7,680 articles over the 70 months HA.org had been to that point in existence.

414,965 comments. (Haven't filtered out the spam yet.)

Whew! A mother lode of right wing inanity and foolishness. With some intelligent and fun liberal-leaning comment thrown in..

and last but not least:

Out of 15,078 unique handle names!

Top 10 # of comments by handle:

10 GBS @ 4,952
9 rhp6033 @ 5,016
8 Steve @ 5,582
7 ArtFart @ 5,839
6 Marvin Stamn @ 6,774
5 Daddy Love @ 8,246
4 YLB @ 8,678
3 Mr. Cynical @ 9,314
2 Puddybud @ 11,368

and number one?

Our beloved Roger Rabbit at 56,751.

Commenters' various aliases are not yet factored in. Steve might be several different "Steves". (And the HNMT is a particularly nasty case.) It's going to require quite a bit of study. And I first have to do some QA on the collection process as a whole. So this post will be updated as the picture gets clearer..

But so far, so good! Fun, fun, fun on the runup to November!

Special note to that most moronic of trolls (#2? It fits!):

$ du -h .ha

201M    .ha/data
...

Saturday, July 28, 2007

Google API, FoxAttacks

Today I'm going to study the Google API and try out a few experiments.

In the meantime, Doofus shat out a little nugget last night in response to the gathering momentum against the Faux News Channel:
Just more proof that liberals can’t handle the truth.
Sorry doofus, the truth is that Faux News broadcasts lies, smears and RNC propaganda. Faux News is anti-democracy and cheerleads for a one-party ruled, authoritarian state. If, like doofus, you want to live under an oligarchic, autocratic regime like Putin's Russia then by all means cheer for Faux News. However, if anyone out there like me loves Democracy and Freedom and wants to do what they can to preserve it, sign up for FoxAttacks.

Any business I patronize that advertises on Faux News I'm not going to patronize again until they yank their ads.

Thursday, July 26, 2007

Troll Dossiers

I'm going to compile dossiers on each of the trolls. There'll be links on the sidebar to each dossier to make it easy for you to contribute items on each troll that I may have overlooked. You do this in the comments sections of the post.

The dossiers will have the following sections.

Quick facts - the quick and dirty back story on the troll

Aliases - This is important. Trolls often post with many aliases. The more of these I know, the better I can data mine. You can really help out here with aliases that I'm not familiar with.

Misc - fun facts that come to the surface about our wingnut friends.

Links - to a greatest hits lists of troll comments.

These files will be a quick, easy way to collect info on each troll for both entertainment and to aid data collection and analysis.

My next post will be an example of a dossier.

Update: include your ha.org handle with your information and I'll hat tip you in the dossier.

Update II: I've added a status section to the dossiers. Active, Retired, Semi-Retired, Occasional, etc.

Wednesday, July 25, 2007

Tuesday July 24, 2007 - A great day

Not only was it my birthday, it was the day Goldy deleted the content of two off-topic comments from two of HA's worst trolls: DOOFUS and MTR the braindead bet welsher. The right-wing BS of these trolls will be analyzed in future posts.

Yes indeed, an excellent birthday present.

Thank you Goldy! I love this new policy!

I'm going to expose your asses!

Greetings fellow HA readers, progressives and yes, you trolls. Who amongst you loves to have fun with trolls? Raise your hand! All of you, good!

Well, why am I doing this? Maybe out of some respect for Goldy's new commenting policy. To turn off my caps lock key. But also to brush up on my Perl and Ruby skills and learn the Google API. Why, you say?

The HA.org comment threads are a priceless treasure of right-wing lunacy just ripe for some data mining and analysis. As I promised Puddybud one fine day, "I swear, I'm going to spider this site and expose your ass!".

So that's what I'm going to do. I'm going to try not to spider HA.org directly. That'd run up Goldy's bandwidth bill. My goal is to use the google cache only.

I'm also going to be looking at the other website. You know the one run by some kept fella on Greenlake? But only to add some spice to the fun.

What is spidering? Books have been written on it. It's not that hard to do.

So folks, look forward to facts and figures and greatest hits lists on your favorite trolls. Of course I'll be welcoming your input and insights on the subject.

We're going to have fun with this!