The Beginning of The Archivist
Jul 2, 2010 In Design By Tim Aidlin
A couple of years ago, my friend irhetoric and I were chatting about Twitter as we finished building Flotzam, a social networking aggregator that collects and visualizes the ephemeral information floating around in the social sphere. In some ways, this ephemerality is beautiful: all this information swirls around us like leaves that appear and decay over time. On the other hand, if this information is valuable, capturing and keeping it could be useful.
The most striking, and most easily accessible, of these potentially useful data streams is Twitter. Tweets, while often inane or funny or what-have-you, can impart important information, whether by themselves or in aggregate. Take, for instance, Twitter’s role in the Iranian elections of 2009, in which Iranian protesters used the hashtag #iranelection to broadcast information that would otherwise have been suppressed. The election also prompted people from all nations to “show solidarity with the protesters” by changing their Twitter avatar to green. What good this did is unclear, but it’s an interesting way of disseminating information and sentiment.
The problem, though, is that you can only view tweets from as far back as two weeks (or less for very active terms such as #iranelection). Fortunately, by 2009 Karsten and I had designed and developed The Archivist Desktop for MIX Online, and were able to capture and save this data. We ended up with 22,000+ individual tweets over a two-week span.

The Archivist Desktop
The Archivist Desktop was originally released as a “mini-lab” and has been installed by users from such varied fields as marketing, software development, social sciences and research. It was a ‘small project’, but Karsten and I thought it was worthwhile for a few reasons. With The Archivist, users are now able to:
-
Save tweets
Users can save the tweets they’re interested in.
-
Export the data to analyze
The Archivist provides an easy way to export to Excel for visualizations and data-mining.
-
Visualizations to interpret
There are two useful visualizations shipped with The Archivist, which can provide quick and easy understanding of the data the user has collected.
-
Distributed network
The Archivist is an installable program, so the work is distributed across each user’s own machine, which reduces the likelihood of one user being affected by another. This is especially important when dealing with Twitter rate-limiting.
-
Private, secure data
Sometimes users are sensitive about the data they’re collecting. With an installable program, the data is saved to the user’s hard disk and is accessible only to people who have direct access to that computer and file.
Challenges with The Archivist Desktop
While The Archivist Desktop is great in many ways, we needed to address a few problems we saw as the project matured. Over time, we were able to connect with users around the globe and encountered a few recurring requests:
-
“Always-on and connected?”
Because The Archivist Desktop is an application, it needs two things to work: it must be running and have an internet connection. If your computer goes to sleep, for instance, the application stops running. Or, if you’re running it on a laptop and you travel, you get no connectivity, no archives.
-
“Not Just Windows, please.”
Many of our users, including some of the MIX Online team at times, are Mac users. By making this project Windows-only and relying on various other technology dependencies, we were shutting out a large and potentially enthusiastic part of the community.
-
“I’d rather not install a program.”
We found that a large population of users were hesitant to install a program they were unsure of. The Archivist is a small file and, for Internet Explorer users, a one-click process. To many users, however, installing a program is a hurdle that’s very difficult to cross.
-
“I wish I could go further back in time.”
Suppose User A started an archive on the term “Microsoft.” If four months later User B started an archive on the same “Microsoft” keyword, all of the data User A collected would not be available to User B. That seems a waste. Data should be ultimately shareable. Free the data!
-
“More eye-candy, please.”
Users of The Archivist told us they found great value in the data visualizations it provided, but wanted more. The visualizations people asked for were often the same: top keywords, top users, top links, and volume over time. There were also many other great suggestions on how we could parse the data from Twitter. Users had very specific requests that coincided with their goals, whether they were focused on brands, politics or pop culture.
-
“I’d rather share a link or image than a bunch of data.”
While it’s easy to attach an Excel spreadsheet or .xml file to an email, that isn’t the best way to share data, or an understanding of that data. How could we help people take full advantage of this new data and information?
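The most-requested visualizations listed above (top keywords, top users, top links, and volume over time) all reduce to simple counts over saved tweets. Here is a minimal sketch in Python of what those counts might look like. It is not The Archivist’s actual code: the tweet fields (`user`, `text`, `created_at`), the function names and the sample tweets are all made up for illustration.

```python
import re
from collections import Counter
from datetime import datetime

# Made-up sample tweets; the field names are assumptions, not Twitter's schema.
TWEETS = [
    {"user": "alice", "text": "Voting today #iranelection http://example.com/a",
     "created_at": "2009-06-12T09:00:00"},
    {"user": "bob", "text": "Crowds gathering #iranelection",
     "created_at": "2009-06-12T18:30:00"},
    {"user": "alice", "text": "Results disputed #iranelection http://example.com/a",
     "created_at": "2009-06-13T07:15:00"},
]

URL_RE = re.compile(r"https?://\S+")
WORD_RE = re.compile(r"#?\w+")  # plain words and hashtags


def top_users(tweets, n=10):
    """Users ranked by how many of the saved tweets they wrote."""
    return Counter(t["user"] for t in tweets).most_common(n)


def top_links(tweets, n=10):
    """Links ranked by how often they appear in the saved tweets."""
    links = (url for t in tweets for url in URL_RE.findall(t["text"]))
    return Counter(links).most_common(n)


def top_keywords(tweets, n=10):
    """Lowercased words and hashtags ranked by frequency, ignoring links."""
    counts = Counter()
    for t in tweets:
        text = URL_RE.sub(" ", t["text"].lower())
        counts.update(WORD_RE.findall(text))
    return counts.most_common(n)


def volume_by_day(tweets):
    """(day, tweet count) pairs in chronological order."""
    days = Counter(
        datetime.fromisoformat(t["created_at"]).date().isoformat() for t in tweets
    )
    return sorted(days.items())
```

Each function returns plain lists of pairs, so the results could feed a chart or a spreadsheet export equally well. A real keyword count would probably also want to drop common stop words, which this sketch does not do.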
The Archivist Web Alpha
We decided to move The Archivist to “the cloud” and create a web-based tool powered by Windows Azure. The new version has numerous benefits:
-
Always-on
By creating a web service, we were able to take advantage of the always-on nature of the internet and could poll Twitter for data whether or not the end user was connected. Users can set up an archive and need only check in on it once every 30 days to keep it active. We no longer need to require users to actually run the program because, essentially, we’re running it for them.
-
Cross-platform
The Archivist Web runs on multiple browsers and platforms. This opens up The Archivist to essentially all internet-connected users. Some sort of authentication is required, however, to save archives. To make this as seamless as possible, we tapped into the Twitter authentication system and now have people simply use their Twitter credentials to authenticate themselves.
-
Data-sharing
By making The Archivist a cloud-based application and storing collected information in a shared database, we can make archives that have already been created available to new users. For instance, if User A creates a “Microsoft” archive on January 2 and User B starts a “Microsoft” archive on March 2, User B’s archive should reach back to January 2. This data is unavailable through a normal Twitter search. This feature, in our opinion, is super valuable.
An additional type of sharing is the ability to tweet The Archivist Web itself, your archive set (which will only display those archives you have set to “public”), individual archives, and individual visualizations within an archive. It’s as easy as sharing the URL that appears in the address bar at the top of any browser.
-
No installation required
Because no installation is required, people can use the service without having to trust the source, wait for a download, check that they meet the system requirements, or clear the other barriers to entry.
-
More visualizations
Our users had asked for a lot of new visualizations. Of course, we had ideas about what *we* would like to see and would find useful, but talking to our current and potential future users helped us determine which visualizations were the most useful to our audience.
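The data-sharing and always-on points above can be made a little more concrete with a sketch. This is only an illustration under stated assumptions, not how The Archivist Web is built: the `SharedArchive` and `ArchiveStore` names, the in-memory dictionary standing in for the shared database, the lowercasing of search terms, and the year on the dates are all hypothetical. The one number taken from the text is the 30-day check-in window.

```python
from dataclasses import dataclass, field
from datetime import date, timedelta

KEEP_ALIVE = timedelta(days=30)  # check-in window mentioned in the post


@dataclass
class SharedArchive:
    term: str
    started: date      # when the first user created the archive
    last_visit: date   # most recent check-in by any subscriber
    subscribers: set = field(default_factory=set)

    def is_active(self, today):
        """Assumed rule: active if someone checked in within the window."""
        return today - self.last_visit <= KEEP_ALIVE


class ArchiveStore:
    """Hypothetical stand-in for the shared database of archives."""

    def __init__(self):
        self._archives = {}

    def start_archive(self, user, term, today):
        key = term.strip().lower()  # assumed: terms match case-insensitively
        archive = self._archives.get(key)
        if archive is None:
            # First request for this term: create the one shared archive.
            archive = SharedArchive(term=key, started=today, last_visit=today)
            self._archives[key] = archive
        # Later requests join the existing archive and inherit its history.
        archive.subscribers.add(user)
        archive.last_visit = today
        return archive


store = ArchiveStore()
store.start_archive("user_a", "Microsoft", date(2010, 1, 2))
archive_b = store.start_archive("user_b", "microsoft", date(2010, 3, 2))
print(archive_b.started)  # the archive's start date, shared by both users
```

The design choice being illustrated is that an archive belongs to the search term, not to the user who first asked for it, so User B joins the collection User A started and sees everything gathered since January 2.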
In my next Opinion, I’ll be writing about the process of scoping the new project to our resources, time constraints and technical limitations. From there I’ll try to explain the process of releasing a project and making it successful.



