You are reading a MIX Online Opinion, in which we speak our minds.



Creating A Tree Map Data Visualization Out of 200,000 Tweets About Iran

Jun 24, 2009 By Karsten Januszewski

I let The Archivist (a Twitter data mining application from the MIX Online experiments) loose on the term Iran, and it has run for the last five days. At this point, I have an archive of about 200,000 tweets about Iran. So the question becomes: what do we do with all that data? After all, data is only interesting if you can do something with it.

Tim came up with an idea: what if we created a tree map showing the 100 people who tweeted most about this topic? That way, you could quickly see who was most active on it. So, with some guidance from Hans (who built a tree map control for the Descry project) and an assist on my LINQ query* from Joshua, I quickly put together the following infographic:


So, of the roughly 200,000 tweets about Iran, the person in the upper left tweeted most about Iran, and the rest of the data distributes itself from there, with each box sized by how many times that person tweeted. From here, you can imagine a lot of possibilities: different pivots on the data, drilling down, and so on.

We just put this together this morning, so we're not sure where it is going, but we're curious to hear whether people find it interesting. Let us know, and be sure to follow us on Twitter for the latest updates from MIX Online.

*Here’s the LINQ query, by the way. The crux is the sub-select that pulls the avatar out of the grouping:

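var items = (from tweet in tweets
             group tweet by tweet.Username into grp
             orderby grp.Count() descending
             select new TreeMapItem
             {
                 // Box size is the number of tweets this user posted
                 Size = grp.Count(),
                 Label = grp.Key,
                 PanelItemType = typeof(CustomTreeMapItem),
                 // Sub-select: grab the avatar from the user's first tweet in the group
                 Tag = (from subtweet in grp
                        select subtweet).First().Image
             }).Take(100); // keep only the 100 most active users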
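If you want to try the grouping without the tree map control, here's a minimal, self-contained sketch. It assumes a hypothetical tweet shape of just (Username, Image) and uses placeholder avatar URLs; since TreeMapItem and CustomTreeMapItem come from Hans's control, it projects into an anonymous type instead. The group, count, order, top-N, and "first tweet's avatar" steps mirror the query above.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical sample data: a (Username, Image) tuple stands in for the
// real tweet type, and the avatar URLs are placeholders.
var tweets = new List<(string Username, string Image)>
{
    ("alice", "http://example.com/alice.png"),
    ("bob",   "http://example.com/bob.png"),
    ("alice", "http://example.com/alice.png"),
    ("carol", "http://example.com/carol.png"),
    ("alice", "http://example.com/alice.png"),
    ("bob",   "http://example.com/bob.png"),
};

const int topN = 100; // the tree map in the post shows the top 100 users

var items = (from tweet in tweets
             group tweet by tweet.Username into grp
             orderby grp.Count() descending
             select new
             {
                 Size = grp.Count(),          // number of tweets by this user
                 Label = grp.Key,             // the username
                 Tag = grp.First().Image      // avatar taken from the user's first tweet
             }).Take(topN).ToList();

foreach (var item in items)
    Console.WriteLine($"{item.Label}: {item.Size} tweets, avatar {item.Tag}");
```

With this sample data, alice comes first with 3 tweets, then bob with 2 and carol with 1, which is the order the tree map would lay out from the upper left.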


Follow the Conversation


Lisa said on Jun 24, 2009

That is definitely more interesting than Twitter's search box! You're probably aware of the accounts that tweet every trending topic (annoying). Is there any way to rule those tweets out of the query, since those accounts don't really discuss the topic?

Travis said on Jun 24, 2009

While I like the visualization, it ironically seems like the perfect tool for the Iranian Security Forces who are attempting to identify the most active people.

Karsten Januszewski said on Jun 24, 2009

I see what you mean -- basically, you want the ability to pass a list of usernames to the query and have any tweet from that user excluded from the results. Definitely doable -- this would be a feature that we could add to The Archivist. Thanks for the feedback!
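Here's a minimal sketch of that exclusion idea, assuming a hypothetical list of usernames to skip (the "trendbot" account below is made up) and a hypothetical (Username, Text) tweet shape. The filter runs before grouping, so an excluded account never gets a box in the tree map. This sketch also assumes the username match should ignore case.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical exclusion list, e.g. accounts that tweet every trending topic.
var excluded = new HashSet<string>(StringComparer.OrdinalIgnoreCase) { "trendbot" };

// Hypothetical sample tweets.
var tweets = new List<(string Username, string Text)>
{
    ("alice",    "News from Tehran"),
    ("TrendBot", "Trending: #iranelection #music #movies"),
    ("bob",      "Protests continue in Iran"),
    ("alice",    "More updates on Iran"),
};

// Drop every tweet from an excluded user, then group and count as before.
var counts = (from tweet in tweets
              where !excluded.Contains(tweet.Username)
              group tweet by tweet.Username into grp
              orderby grp.Count() descending
              select (User: grp.Key, Tweets: grp.Count())).ToList();

foreach (var (user, tweetCount) in counts)
    Console.WriteLine($"{user}: {tweetCount}");
```

Using a HashSet keeps each lookup cheap, which matters when the archive holds hundreds of thousands of tweets.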

Karsten Januszewski said on Jun 24, 2009

@Travis -- Interesting. We chose this dataset because it was topical. The same visualization could be applied to anything, of course. I suppose we could have picked "coffee," but this seemed more useful. What someone does with the information they learn from a visualization is out of anyone's hands.

Alex Nagler said on Oct 15, 2009

I'm actually engaged in a research project on Iran and Twitter, and I find this fascinating. Is there any way I could get your data for the database I'm compiling?


Alex Nagler
Stony Brook University '10