5 Tips For Building Effective Infographics

Jan 30, 2009 In Design By Joshua Allen

Information visualization is a hot topic. It seems like a new data visualization library or artistic visualization is released every week. As Jon Udell's article shows, we are just beginning to see how software can help us find meaning in data. So, when we set out to create this issue of MIX Online, we decided to focus specifically on interactive infographics, a fascinating application of information visualization that hasn't received much coverage.

Infographics (short for Information Graphics) are part design, part data visualization. Like generic data visualization tools such as charts and graphs, infographics typically represent some data in a way that a viewer can quickly understand. But infographics are customized specifically to the data, topic, and audience – each infographic is essentially a creative work that gets a point across.

By pairing your infographic with software, you can allow people to interact with the visualization and explore the concepts, giving a more powerful and immediate ability to understand the point you are making.

Inspired by the visualizations that we regularly see in Good Magazine, Wired, and the New York Times, we created some sample interactive infographics. Take a look at obesity trends in America, or explore the job roles that go into creating a web site.

Creating a slick infographic takes a bit more effort than firing up a generic charting or graphing tool. But the extra work can really pay off, making your visualization stand out from the crowd.

And to help you out, we have compiled this list of handy tips for creating effective infographics:

5. Data is King

Once you’ve decided what point you want to communicate with your visualization, you need to collect the necessary data. More often than not, the data isn’t readily available via a convenient JSON API, and it is almost never available in a convenient database format. Even if you don’t need to go to the library and manually copy data from some data bureau’s print publications, you may end up downloading delimited text files, screen-scraping HTML tables, or worse. Luckily, this job can be largely automated with tools like Excel (for transforming delimited CSV and TSV), Dapr, and Many Eyes.

A good example is the data from the Centers for Disease Control (CDC) health risk factors survey. The CDC collects a huge amount of data on a yearly basis, including trends on smoking, drinking, hypertension, and obesity. All of the data that the CDC collects is available to the public as a free download: more than 100,000 individual survey results that can be mined for trends.

When we began creating our obesity visualization, we quickly noticed some challenges with the CDC data. The first thing we noticed is that the CDC’s map sometimes did not match the data in the PDF report that the CDC publishes. And we noticed that the data listed in the PDF report sometimes didn’t match the CSV file of data that can be exported from the CDC database. The discrepancies were rare enough and small enough that they could easily be considered transcription errors, but to prove this out we wanted to get as close to the source data as possible.
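A check like this can be partly automated. The following is only a minimal sketch in R, assuming both versions of the data have already been loaded into data frames. The column names (`State`, `PctObese`), the helper `find_discrepancies`, and all of the figures are made up for illustration and are not CDC data.

```r
# Hypothetical figures for illustration only; these are not CDC data.
report <- data.frame(State = c("AL", "AK", "AZ"),
                     PctObese = c(30.1, 27.5, 25.4),
                     stringsAsFactors = FALSE)
export <- data.frame(State = c("AK", "AL", "AZ"),
                     PctObese = c(27.5, 30.3, 25.4),
                     stringsAsFactors = FALSE)

# Returns the rows where the two sources disagree by more than `tol`.
find_discrepancies <- function(a, b, tol = 0.05) {
  # merge() lines the rows up by state, so row order does not matter
  both <- merge(a, b, by = "State", suffixes = c(".report", ".export"))
  both$Diff <- both$PctObese.report - both$PctObese.export
  both[abs(both$Diff) > tol, ]
}

mismatches <- find_discrepancies(report, export)
print(mismatches)
```

With these made-up numbers, only the AL row is reported, since it is the one value that differs between the two sources.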

The raw data for the CDC is in the individual surveys turned in for thousands of people each year. This information is in SAS XPORT format, the transport file format of SAS, a commercial software package used by statisticians. Since we wanted our results to be reproducible by anyone, and since you might be interested in doing your own analysis of the 100 or so other data points collected by the CDC, we needed to process the data using a freely available tool. The ‘R’ statistical software was up to the task. Here is the ‘R’ script that we used to aggregate all of the data on the ‘BMI’ (body mass index) field for the years 1987-2007:

library(foreign)  # provides read.xport() for SAS XPORT files

# turns BMI into either normal/overweight/obese
obesity <- function(val) {
  if (val <= 249) {
    return("normal")
  } else if (val <= 299) {
    return("overweight")
  } else if (val < 999) {
    return("obese")
  } else {
    return("unknown")
  }
}

obvect <- Vectorize(obesity)

# lookup table of the state codes used by CDC
# (forward slashes in paths: "\d" is not a valid escape in an R string)
state <- read.table("d:/data/cdc/obesity/states.csv",
                    sep=",", header=TRUE)

# given state code used by CDC, returns state abbreviation
lookstate <- function(scode) {
  s <- subset(state, Code == scode, select=Abbrev)
  return(s$Abbrev)
}

lsvect <- Vectorize(lookstate)

# aggregates all BMI survey results for a single year, from
# a file named YYYY.XPT
# exports the aggregated data to a file named YYYY.csv
importyear <- function(year) {
  w <- read.xport(paste('d:/data/cdc/obesity/', year, '.XPT', sep=""))
  # only take survey results with legitimate BMI
  wfilt <- subset(w, X.BMI < 999, select=c(X.STATE, X.BMI))
  # convert BMI to obese/overweight/normal ranges
  wfilt$X.OBES <- obvect(wfilt$X.BMI)
  # get state abbreviations
  wfilt$State <- lsvect(wfilt$X.STATE)
  # aggregate: one row per state, one column per BMI category
  result <- table(wfilt$State, wfilt$X.OBES)
  write.table(result, paste('d:/data/cdc/obesity/', year, '.csv', sep=""),
              sep=",")
  return(TRUE)
}

iyvect <- Vectorize(importyear)

years <- 1987:2007

iyvect(years)
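
As a side note, the BMI classification can also be written with base R's `cut()`, which already works on whole vectors, so no `Vectorize` wrapper is needed. This is only a sketch of an alternative, not the script we ran. The name `classify_bmi` is hypothetical, and the breakpoints assume whole-number BMI codes, mirroring the thresholds in the script above.

```r
# Hypothetical vectorized alternative to obesity()/obvect().
# Breaks mirror the script's thresholds for whole-number codes:
# <= 249 normal, 250-299 overweight, 300-998 obese, 999 and up unknown.
classify_bmi <- function(bmi) {
  category <- cut(bmi,
                  breaks = c(-Inf, 249, 299, 998, Inf),
                  labels = c("normal", "overweight", "obese", "unknown"),
                  right = TRUE)  # intervals are (lower, upper]
  as.character(category)
}

bmi_labels <- classify_bmi(c(220, 249, 250, 299, 300, 998, 999))
print(bmi_labels)
```

The boundary values in the last call are there to confirm that each threshold falls into the same category as it does in `obesity()`.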

4. Resist Scope Creep

Once you have some good data and get the first builds of your visualization working, it can be very tempting to add features. People see the visualization, and say "wouldn’t it be cool if…?" Often, the new functionality can be added quickly, so it’s hard to resist the temptation. But you need to remember that your goal is to answer one question and communicate crisply, not to be all things to all people. Adding too many bells and whistles will dilute your message and distract you during development. We struggled with this on a couple of our visualizations, and in a couple of cases ended up having to back out functionality to keep the designs clean and simple.

3. Don’t Over-Architect

Normally when you write code, you want to make your code as generic, maintainable, and reusable as possible. But infographics are often one-off projects that you create for a specific purpose. You don’t typically plan on a v2 and v3 of an infographic. There is no need to abandon good coding practices, but don’t overly componentize or modularize your code. As an example, there is probably not a large market for data-bound controls which render T-shirts on a grid, so we didn’t bother to design that piece of the visualization as a standalone control. On the other hand, it is conceivable that someone would want to use a treemap control for some scenario outside of inauguration speeches, so we *did* componentize that piece of the visualization.

2. Stay Agile

Saying that your infographic code is one-off doesn’t mean that anything goes. You can be relatively more careless with modularization and componentization, but you should be relatively *more* careful with your separation between data and visualization. You want to be as nimble as possible when it comes time to make tweaks to the design. The best approach is to bring in a little bit more data than you need, and write your code to be schema-agnostic. We found this to be a joy when using LINQ. We designed our visualizations to be loaded from a sequence of objects (e.g. FriendFeed statuses, state obesity records, TreeMapItems, etc.). Then we encapsulated separately the LINQ code to read information from JSON or XML and project into the desired type. Even if you don’t have LINQ, it’s smart to keep as clean a separation as possible between your data parsing and your rendering.
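
The same separation can be sketched without LINQ. The R example below is only an illustration of the idea, not our Silverlight code: every function name, column name, and value in it is hypothetical. Two loaders project different sources into one common shape (a data frame with `label` and `value` columns), and the renderer knows nothing about where the data came from.

```r
# --- Data parsing: each loader projects its source into the same shape ---
# (a data frame with columns `label` and `value`; all names are hypothetical)
load_from_csv <- function(text) {
  raw <- read.csv(text = text, stringsAsFactors = FALSE)
  data.frame(label = raw$State, value = raw$PctObese,
             stringsAsFactors = FALSE)
}

load_from_records <- function(records) {
  # `records` stands in for whatever a JSON or XML parser would hand back
  data.frame(label = vapply(records, function(r) r$name, character(1)),
             value = vapply(records, function(r) r$rate, numeric(1)),
             stringsAsFactors = FALSE)
}

# --- Rendering: depends only on `label` and `value` ---
render_bars <- function(items, width = 20) {
  n <- round(items$value / max(items$value) * width)
  paste0(format(items$label), " ", strrep("#", n))
}

csv_items <- load_from_csv("State,PctObese\nAL,30\nCO,15\n")
rec_items <- load_from_records(list(list(name = "AL", rate = 30),
                                    list(name = "CO", rate = 15)))
cat(render_bars(csv_items), sep = "\n")
```

Because both loaders produce the same shape, swapping the data source or adding a field to it never touches `render_bars`, which is what keeps late design tweaks cheap.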

1. Answer One Question

The most important step in creating your infographic is deciding precisely what question you want to answer. The more time you spend here, the more effective your visualization will be. Clearly settling on one question to answer can be deceptively difficult. Of course, it’s easy to say what you want to communicate in general terms: "We want to show the impact of AIDS in Africa", or "We want to show what a serious problem obesity in America is". But to maximize your message’s impact, you have to get specific. Specific questions are "Show me how the proportion of obese people in the country has grown year-over-year". Or "show me a comparison of how much the various presidents talked about a topic relative to one another". You are telling a story, but you don’t use words to tell the story. The question you choose will determine what data you present, and how you will visualize it.

Follow the Conversation

Rick Barraza said on Jan 30, 2009

I like the visualizations with the exception of the obesity epidemic visualization. What you're trying to map is a data value across geographic regions, yet a) you are using area in a vague way (comparing circles v. pie segments) and b) you're displaying geographic data alphabetically v. spatially. How exactly are the values, say 21.2 to 28.1, being converted to the size of the shirt in any helpful way? Is it the area of the shape, or simply the maximum width of the diameter? These are drastically different. The graphic resolution of detail, therefore, is far less accurate and possibly misleading. Also, how many people think of states as an asc. / desc. alphabetical list? Wouldn't a color coded map with shaded states have been better? Perhaps using a depth / z-index approach for filtering (raising / dropping subsets along z axis)? This one could have used a little more Edward Tufte and a little less "chart junk". Really dug the other ones, though.
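
(Editorial aside: the area-versus-diameter point can be made concrete with a little arithmetic. The R sketch below uses the two values quoted in the comment and assumes a circular shape, which is a simplification of the T-shirt graphic.)

```r
low  <- 21.2
high <- 28.1

# If the value drives the diameter, the drawn area grows with the square.
value_ratio <- high / low
area_if_diameter_scaled <- value_ratio^2

# If the value should drive the area, the diameter grows with the square root.
diameter_if_area_scaled <- sqrt(value_ratio)

print(round(c(value_ratio = value_ratio,
              area_when_scaling_diameter = area_if_diameter_scaled,
              diameter_when_scaling_area = diameter_if_area_scaled), 2))
```

Scaling the diameter makes a value that is about a third larger look about three quarters larger in area, so the two encodings give visibly different pictures of the same data.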

Joshua Allen said on Jan 30, 2009

Hi Rick,

Good comments. I especially like the ideas about incorporating some z-axis for filtering and subsetting. And interesting points about tee-shirt sizes.

We explicitly stayed away from the geographic approach, since we felt that it gives the impression that geography is a major determining factor in obesity -- or at least, it would predispose people to look for geographic patterns. Having people walk away saying, "Folks in the south are GINORMOUS!" is not the message we wanted to convey.

We struggled a bit with -- do we use alphabetical ordering or order by obesity by default? The advantage of an alphabetic ordering is that it remains stable over time, but I sort of like the graphic sorted by obesity rate, since it makes the point of aggregate girth more directly.

Rick Barraza said on Jan 30, 2009

Great points, Joshua. You're right, as soon as I responded, I realized that straight geography without population density per state region also being factored in somehow could be misleading as well. As a general personal guideline, though, if I find I'm using two values (size and color) to both indicate one dimension of data, I try and think of an alternative. I think it's fantastic, though, that we're having these great types of conversations now with RIA, Silverlight and the MSFT stack. Can't wait for the SL3 and hearing more from you guys! Keep up the great work.

Jeremy Handcock said on Jan 30, 2009

Nice work! The ideas in your Social Timeline share a lot in common with some recent work of mine (http://aperte.org/2009/01/30/mix-social-timeline), so it was really cool to see this.

Keith J. Farmer said on Feb 3, 2009

The T-shirt size I think was a good visualization, but as you might point out -- what was your question?

Data, even visualized, is of limited value on its own. You are generally, even subconsciously, trying to find relationships. In the obesity graphic, you've shown that some states are more obese than others, but you've halted the inference at that point.

I realize you want to avoid inferring weight vs region, but I think you need another, more interesting axis than state name. You've got trivia, but no knowledge, and in the case of obesity there is something to be said about culture-driven dietary (or sedentary) habits.

Perhaps a map with typical t-shirts displaying typical levels of physical activity? Make it animated. :)

Nishant Kothary said on Feb 3, 2009

@Keith - Thanks for your comment. I'm totally on board with your idea about another axis (something other than state name) and we actually struggled with that for a while. Our primary question was, "What is the ongoing national obesity trend?" We locked on this pretty early, and decided that we would answer it with The Obesity Epidemic (http://www.visitmix.com/labs/descry/theobesityepidemic/). The thing we found is that most people don't really know the extent of the obesity epidemic in this country; they've heard that we're getting fatter as a people, but that's about it. We wanted to drive that point home and provide the data to help someone drill into the national trend if they wanted to. I think the infographic does that fairly well.

A side note is that we've received feedback on both sides of the coin - some folks have said, "Man, I didn't know it was that bad." And others like you have said, "OK, this is cool, but I want to know more." That's always the challenge with answering one question well. Some folks already know the answer to that question (or want a more elaborate answer). Others don't. Either way, the purpose of the infographic at a very basic level is to arouse curiosity and inspire conversation, and hopefully we were able to do that.

Personally, I'd have loved to have drawn a correlation between t-shirt sizes and physical activities (that's a great idea). I'm pretty sure the team felt the same way; we had several wouldn't-it-be-awesome-if-we-could-add-this conversations through the life of the project. But, to Joshua's 4th point, we all resisted scope creep.

If you're interested, we will be sharing the code really soon. Maybe you can download and extend the viz with some of your ideas. :)

visualthinkmap said on Feb 8, 2009

i really loved the alternate arrangement function of by obesity rate, as it gives a much greater overview for me of the overall obesity epidemic through the lovely graduated scale of light to dark of red for the density of obesity in each particular state.

size is a nice idea, but the small would be small and probably reduce legibility. the colour graduation works.

i agree with nishant that some people want more, answers elaborated, but an infographic's purpose, its minimal level of success, is to arouse curiosity about the subject matter at hand.

i assume this extra axis is a way of adding isometric/3d like i saw recently at infoaesthtics where there was a 3d treemap. this might work.

as someone said, the geographic depiction for states is tempting and far easier but would elicit focus on the extremes of the data such as 'the south are...'.

i would say as a person with a pathetic knowledge of states in america, the abbreviation to MO for MISSOURI is a good space saver, but the key needs to be easier to detect, as i only just noticed it; the faint white contrasted to grey for the identification is difficult to detect.
i like the soft grey, white and pastels but maybe a stronger white/black might be easier.

maybe the small geographic map could be used as a key for states, not to show the density of obesity for each state, but with a basic mouse over (since it's interactive) that highlights the state in the key with a single light grey just to make it easier to recognise the state,
rather than just the letter links. But this might still take it to the 'south are...', which you wanted to negate.

overall i liked it and it definitely works to initiate curiosity and is soft on the eye with alternate ways of presenting the data (ascend/descend) & (alphabet/rate), which i think is extremely useful where possible.

very good... havent looked at the others yet.

love the project though. will have to remember to keep checking on your latest and liked the article about focus on the question and could take from it even though it (skim read btw) seemed coding orientated.

The Cydonian said on Feb 16, 2009

This may or may not be a good place to point this out, but the Their First Words download at Codeplex doesn't work. It appears that the code looks for an InfoPath file called inagural.xml which doesn't seem to be included with the download.

The Cydonian said on Feb 16, 2009

Actually, my mistake entirely; the InfoPath file exists and the app works fine. :-) Was looking for it in the wrong places.

Neat demo on LINQ as well, compliments to everyone who worked on it and for sharing it under an MS-PL license. :-)

Kyralessa said on Mar 1, 2009

The "Their First Words" is very interesting...but did you notice that the search is case-sensitive? Can that be fixed?

Joshua Allen said on Mar 1, 2009

@Kyralessa: Yep, we decided to make it case-sensitive so that we could suss out religious usages of words like "One" and "He", which have a different meaning in inaugural speeches than their non-capitalized counterparts. However, since these are simply regular expressions, you can easily perform case-insensitive searches. If you wanted to search for all uses of "him", no matter how capitalized, just start your query with (?i). For example:

(?i)him

Would get you all capitalizations of the word "him". Also, if you are customizing this code for your own use, and you don't want case-sensitivity, you can change the call to the RegexOptions to set the "IgnoreCase" option.
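
(Editorial aside: the inline `(?i)` flag is not specific to the .NET regular expressions the app uses. As a rough analogue, here is a sketch in R, where the same flag works with Perl-compatible matching. The word list is made up for illustration.)

```r
words <- c("him", "Him", "HIM", "whim", "her")

# Default matching is case-sensitive.
sensitive <- grepl("him", words)

# The inline (?i) flag switches the pattern to case-insensitive matching.
inline_flag <- grepl("(?i)him", words, perl = TRUE)

# The same effect, set as an option on the call instead of in the pattern.
call_option <- grepl("him", words, ignore.case = TRUE)

print(rbind(sensitive, inline_flag, call_option))
```

Note that "whim" matches in every case because the pattern is not anchored to word boundaries. The last call is the R counterpart of setting an ignore-case option in code rather than in the query.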

Phil Cockfield said on Mar 16, 2009

Beautiful work all round!

I was wondering. The sliding (gravity laden) panel in the Social Timeline is certainly smooth. Did you guys do a lot of custom work to make that happen? Would you care to make a few comments on your general approach?

Thanks!

Hans Hugli said on Mar 17, 2009

@Phil: Certainly! We did a couple of things to ensure good performance of this application.

We kept the sheer number of Silverlight elements down by employing control virtualization. I’ve put together a blog post that talks about basic control virtualization here: http://tinyurl.com/aag2t9. Essentially it shows how we keep elements that are not visible on screen out of memory (or out of the VisualTree.)

Then we took advantage of the CompositionTarget.Rendering event. This event gets fired once per frame, right before rendering, and it fires about 60 frames a second. The animation speed of the panels is based on the last known mouse velocity which is calculated in the MouseMove/MouseUp events. When the CompositionTarget.Rendering event fires, it calculates the new position of a panel by adding the velocity to the x-coordinate of the panel. Friction is also applied to the velocity so that as the Render event is called successively, the panel motion eventually slows down and comes to rest. Take a look at the CompositionTarget_Rendering method in: http://tinyurl.com/dxrnkn
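
(Editorial aside: the per-frame update described above can be sketched in a few lines. The R simulation below is not the Silverlight code. The function name, the friction factor of 0.9, and the stopping threshold are all assumptions chosen for illustration.)

```r
# Hypothetical constants; the values used in the real app are not given here.
simulate_panel <- function(x, velocity, friction = 0.9, min_speed = 0.01) {
  positions <- x
  # Each loop iteration stands in for one rendering callback (one frame).
  while (abs(velocity) > min_speed) {
    x <- x + velocity                # move by the current velocity
    velocity <- velocity * friction  # friction bleeds off speed every frame
    positions <- c(positions, x)
  }
  positions
}

path <- simulate_panel(x = 0, velocity = 10)
print(length(path) - 1)  # number of frames until the panel comes to rest
print(tail(path, 1))     # final resting position
```

Because the velocity shrinks by a constant factor each frame, every step is smaller than the one before it and the panel glides to a stop instead of halting abruptly.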

Here's a little more about CompositionTarget Rendering performance considerations: http://work.j832.com/2007/06/re-compositiontargetrendering.html

Lastly we used Scobleizer as our FriendFeed user for testing. That helped us to find our app performance bottlenecks and make it perform well with large numbers. ;)

I hope this helps!

Angharad LLewelyn said on Mar 20, 2009

Looking at your concepts for the first time, I think they are really great. My initial thought is that this could perhaps be used on a child's computer to instruct and educate, and/or by dyslexic computer users to accelerate learning of many subjects otherwise unavailable. I would love to explore the possibilities of educational uses within the classroom. Will enjoy keeping up to date with your site. Very interesting.

Liam McGee said on Nov 26, 2009

Joshua, I love the 'A website named desire' infographic.

I am an accessibility test engineer. Pleased to meet you.

Joshua Allen said on Dec 17, 2009

Thanks, @Liam! Stay tuned over the next couple of months, and we'll be providing an easy way to get your own wall-sized poster. We will announce on our RSS feed and on our Twitter account (@mixonline).

tec-goblin said on Mar 15, 2010

The obesity visualisation doesn't work on my browser. I think some kind of floating point number is passed without respect to the locale, so commas and points get mixed...

Web page error details

User agent: Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 5.1; Trident/4.0; .NET CLR 1.1.4322; .NET CLR 2.0.50727; InfoPath.2; .NET CLR 3.0.4506.2152; .NET CLR 3.5.30729; OfficeLiveConnector.1.3; OfficeLivePatch.0.0; Zune 4.0; MS-RTC LM 8; .NET4.0C; .NET4.0E)
Timestamp: Mon, 15 Mar 2010 14:20:16 UTC

Message: Unhandled Error in Silverlight 2 Application Input string was not in a correct format. at System.Number.ParseDouble(String value, NumberStyles options, NumberFormatInfo numfmt)...
System.Net.WebClient.DownloadStringOperationCompleted(Object arg)
URI: http://www.visitmix.com/labs/descry/theobesityepidemic/getsilverlight.js
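
(Editorial aside: a decimal-separator mismatch of the kind suspected here is easy to reproduce. The R sketch below shows the general failure and one locale-independent way around it. It is not the Silverlight code, the helper `parse_decimal` is hypothetical, and the sample strings are made up.)

```r
# Decimal commas, as a system using French conventions might write them.
raw <- c("21,2", "28,1")

# as.numeric() expects a decimal point, so these strings do not parse.
naive <- suppressWarnings(as.numeric(raw))

# Normalising the separator explicitly makes the parse independent of locale.
parse_decimal <- function(x, dec = ",") {
  as.numeric(sub(dec, ".", x, fixed = TRUE))
}
parsed <- parse_decimal(raw)

print(naive)
print(parsed)
```

The general remedy is the same in any language: decide which separator the data file uses and state it explicitly when parsing, rather than relying on the machine's regional settings.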

Hans Hugli said on Mar 15, 2010

Hi @tec-goblin, I may need more info. It looks as though you are running Silverlight 2.0; is this correct? Do the other visualizations work for you? Contact me directly at mixon@microsoft.com so that I can help you get it working.

Port Vila said on Aug 17, 2010

Can anyone advise on good software programs for infographics, something other than Photoshop?
