A year ago, I wrote a post about reshaping data from a wide format to a long format. Given how much time has passed, I thought it would be good to revisit R’s built-in reshape functions. For these examples, I’ve copied the Stata examples from the UCLA Academic Technology Services “Reshape data wide to long” page. Since the data is provided in Stata dta files, you first need to load the “foreign” package to be able to read the data in R.
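As a minimal sketch of that loading step: the snippet below writes a small, made-up data frame to a temporary dta file and reads it back with the “foreign” package, so it runs without downloading anything. The column names and values are placeholders of my own, not the UCLA data; with a real file you would just point read.dta() at its path. Note that read.dta() only handles older Stata file versions.

```r
library(foreign)  # a "recommended" package that ships with standard R installs

# Hypothetical stand-in for a downloaded Stata file: write a tiny data frame
# to a temporary .dta file so the example is self-contained.
dta_path <- tempfile(fileext = ".dta")
write.dta(
  data.frame(famid = 1:3, faminc96 = c(40000, 45000, 75000)),
  dta_path
)

# Read the Stata file back into an ordinary R data frame.
kids <- read.dta(dta_path)
str(kids)
```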
A little more than a week ago, I wrote about creating pivot tables in Microsoft Excel and OpenOffice.org. I also mentioned that I would explain how to do similar calculations by using R. This post will explain how to achieve similar results in R by using the reshape package.
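To give a rough idea of what a pivot-table style summary looks like with the reshape package, here is a small sketch. It assumes the package is already installed (install.packages("reshape")), and the sales data frame is entirely made up for illustration. The idea is to “melt” the data into one row per measurement and then “cast” it into the layout you want, applying a summary function along the way.

```r
library(reshape)  # assumes install.packages("reshape") has already been run

# Hypothetical data: units sold by region and product.
sales <- data.frame(
  region  = c("North", "North", "South", "South"),
  product = c("A", "B", "A", "B"),
  units   = c(10, 5, 7, 3)
)

# Melt: keep region and product as identifiers, stack the measured column.
molten <- melt(sales, id.vars = c("region", "product"), measure.vars = "units")

# Cast: regions down the rows, products across the columns, summing the values.
pivot <- cast(molten, region ~ product, sum)
print(pivot)
```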
I first started experimenting with the reshape package several months ago, when I was trying to figure out how to reshape data from wide to long format. However, once I started working with it, I realized I had misunderstood what the reshape package was designed to do. Now that I finally have a grasp of what can be done using the package, I thought I would share what I’ve found using a few examples.
Students at the Academy often enter data in a “wide” format, since it is a very natural way to enter data in a spreadsheet. Let’s say, for example, that they were collecting data on households and, for each person, recording information on three variables. Assume also that they were only collecting information about five household members. They might end up with a first row of column names something like “HouseholdID” | “member.01” | “member.02” | “member.03” | “member.04” | “member.05” | “variable1.01” | “variable1.02” | “variable1.03” | “variable1.04” | “variable1.05” | “variable2.01” | “variable2.02” … and so on. Sometimes, however, we may find it more useful to have our data in a “long” format, with one row per household member. This post tells you how to quickly do that using R.
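As a rough sketch of the idea, the snippet below builds a tiny made-up household table (two households, two members each, with column names following the pattern above) and converts it to long format with base R’s reshape() function. The values and the “memberNo” column name are placeholders of my own. Because every varying column is named as a variable name, a dot, and a member number, reshape() can work out the grouping from the names once it is told that the separator is a dot.

```r
# Hypothetical wide data: one row per household, one column per
# member/variable combination.
wide <- data.frame(
  HouseholdID  = c(1, 2),
  member.01    = c("Ann", "Carl"),
  member.02    = c("Bob", "Dee"),
  variable1.01 = c(34, 51),
  variable1.02 = c(36, 49),
  variable2.01 = c(1, 0),
  variable2.02 = c(0, 1),
  stringsAsFactors = FALSE
)

long <- reshape(
  wide,
  direction = "long",
  idvar     = "HouseholdID",
  varying   = names(wide)[-1],  # every column except HouseholdID
  sep       = ".",              # names are split into "<variable>.<member number>"
  timevar   = "memberNo"        # new column holding the member number
)

# Sort so that each household's members sit together.
long <- long[order(long$HouseholdID, long$memberNo), ]
print(long)
```

The result has one row per household member (four rows here), with the columns HouseholdID, memberNo, member, variable1 and variable2.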