You are reading a MIX Online Opinion. In which we speak our minds.

Lab Notes

Excel, PowerPivot and Data Mining Twitter

Nov 17, 2010 By Karsten Januszewski

Out of the box, The Archivist (our Twitter archival and analysis service) provides six visualizations of sliced Twitter data. However, there are some visualizations that The Archivist doesn’t provide.

For example, consider an archive on the term soundcloud with 388,000+ tweets. The Archivist will show you the top 25 users who tweeted about soundcloud. I can learn from The Archivist that the user who tweeted the most about soundcloud was top100djsgirls. But what if I wanted to look at just the tweets from top100djsgirls? The Archivist can’t do that.

Similarly, The Archivist provides a tweets-by-volume chart. I can see that the peak day over the last four months for the term soundcloud was in mid-August 2010. But what if I’d like to look at just the tweets from that high-volume day?

Or what if I’d like to see a pie chart showing the distribution of tweets by language? How about looking at the distribution of tweets over time, filtered on a given user?

Pulling an archive into Excel and using PowerPivot to slice and dice the data makes it possible to answer these kinds of questions (and more). PowerPivot is designed for working with large datasets (100,000+ rows) inside Excel, so it’s a good fit for mining large tweet archives.

In the screencast below, I’ll walk you through exactly how to do this:

[Embedded screencast: Silverlight video player]
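
For readers who would rather see the idea in code than in a pivot table, here is a minimal Python sketch of the same kinds of slices: tweets from one user, tweets from the peak day, distribution by language, and one user's volume over time. It is not what the screencast does and it does not use PowerPivot. The column names (`user`, `created`, `lang`, `text`) and the sample rows are assumptions made up for illustration, not The Archivist's actual export format.

```python
import csv
import io
from collections import Counter

# Hypothetical sample standing in for an exported archive. The column names
# and rows are invented for illustration; a real export may differ.
SAMPLE = """user,created,lang,text
top100djsgirls,2010-08-14,en,new mix on soundcloud
top100djsgirls,2010-08-14,en,another soundcloud set
dj_example,2010-08-14,de,soundcloud ist super
dj_example,2010-08-15,en,listening on soundcloud
top100djsgirls,2010-08-16,en,soundcloud upload
"""


def load_tweets(source):
    """Read CSV rows from a file-like object into a list of dicts."""
    return list(csv.DictReader(source))


def tweets_by_user(tweets, user):
    """Keep only the tweets written by one user."""
    return [t for t in tweets if t["user"] == user]


def peak_day(tweets):
    """Return (day, count) for the day with the most tweets."""
    return Counter(t["created"] for t in tweets).most_common(1)[0]


def tweets_on_day(tweets, day):
    """Keep only the tweets created on the given day."""
    return [t for t in tweets if t["created"] == day]


def language_distribution(tweets):
    """Count tweets per language code (the data behind a pie chart)."""
    return Counter(t["lang"] for t in tweets)


def daily_volume_for_user(tweets, user):
    """Tweets per day for one user, ordered by date."""
    counts = Counter(t["created"] for t in tweets_by_user(tweets, user))
    return dict(sorted(counts.items()))


tweets = load_tweets(io.StringIO(SAMPLE))
day, count = peak_day(tweets)
print("Tweets from top100djsgirls:", len(tweets_by_user(tweets, "top100djsgirls")))
print("Peak day:", day, "with", count, "tweets")
print("By language:", dict(language_distribution(tweets)))
print("Daily volume:", daily_volume_for_user(tweets, "top100djsgirls"))
```

To try this on a real export, replace `io.StringIO(SAMPLE)` with an opened CSV file and change the column names to match whatever the export actually contains.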

Follow the Conversation


curious said on Nov 23, 2010

I can't view your screencast.

Karsten Januszewski said on Nov 23, 2010

Hmm, there's something goofy with the screencast. It seems that if you hit refresh, it will appear. Not sure what's going on; will investigate. In the meantime, hitting refresh should get you to the screencast.

Randy Picker said on Dec 10, 2010

This looks extremely useful, but it looks to me as if you have turned off the ability to export tweets to Excel. Is that right?

Randy Picker said on Dec 12, 2010

I am not quite sure how I missed the desktop version of the software, but I now see that, which does contain an export to Excel button.

But I can't verify that that is working (save as well). It certainly doesn't present a window to allow me to name the file and that is true of save as well.

Where should these files be located?

The window also references "remaining Twitter hits." How does that limit operate?

Thanks.

Bob Boynton said on Sep 14, 2011

Karsten:

Is this still available? The video seems to be gone, and I do not see any way to get it. I hope it is available as I would very much like to use it.

admin said on Sep 15, 2011

Hmm, something's screwed up with the embedded video. Here's a link to the wmv: http://ecn.channel9.msdn.com/o9/ch9/e6aa/129e4c5c-fc9c-4b37-8cc7-9e2f0184e6aa/archivistdatamining_2MB_ch9.wmv

Jennifer said on Sep 22, 2011

What does "remaining Twitter hits" mean?

Karsten Januszewski said on Jan 3, 2012

@Jennifer - that refers to Twitter's rate limiting and how you can be throttled by Twitter if the subnet you are on is hammering them too hard.
