The 'koboloadeR' package for R

The “koboloadeR” R package is designed to make it easy to retrieve data collected using KoBo Toolbox or other services that use the same API (for example, KoBo Humanitarian Response and Ona). Of these, KoBo Toolbox is quite generous, with a limit of 1,500 submissions per user per month (and, I believe, something like 5GB per month for data submissions such as photos, videos, or audio recordings that can be linked to a survey).

The package is available on GitHub. Get it using:


(Note: install_github via @jtilly)

I won’t go into details here about KoBo. It’s an awesome tool for quick mobile data collection. Developing the survey tools is very easy, and you get a good range of question types to work with. You can collect data while offline and sync it with the server when you have a data connection available. And you can export the data in different formats for analysis later on.

Read on for more details!

splitstackshape V1.4.0 for R

More than a year after splitstackshape V1.2.0, I’ve finally gotten around to making some major updates and submitting the package to CRAN.

So, if you have messed-up datasets filled with concatenated cells of data, and you need to split that data up and reorganize it for later analysis, install and load the latest version (V1.4.0) of splitstackshape with:

## [1] '1.4.0'

Read on for details!

An R function like "order" from Stata

The "splitstackshape" package for R

A while ago, a friend of ours presented me with a data problem. Her questionnaire had some questions where the respondent could provide multiple responses. You know, the "Check as many as apply" type of questions. One common way to store this data is to put the responses as comma-separated values into a single cell in a spreadsheet. In fact, if you use something like Google Forms to collect your data and have questions that use check-boxes, that's how your data will end up being stored in a Google Spreadsheet.
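To make the problem concrete, here is a minimal base R sketch of splitting such a column. It is not the splitstackshape implementation; the `responses` data and its column names are invented for illustration.

```r
# Hypothetical survey data: one "check as many as apply" column stored as
# comma-separated text, the way a spreadsheet export might hold it.
responses <- data.frame(
  id = 1:3,
  fruits = c("apple, banana", "banana", "apple, cherry, banana"),
  stringsAsFactors = FALSE
)

# Split each cell on commas, tolerating optional spaces around them.
pieces <- strsplit(responses$fruits, "\\s*,\\s*")

# Stack the pieces into a "long" data frame: one row per id/choice pair.
long <- data.frame(
  id = rep(responses$id, lengths(pieces)),
  fruit = unlist(pieces),
  stringsAsFactors = FALSE
)

# Or spread them into a "wide" indicator table: one column per choice,
# with a count of 1 where the respondent ticked that box and 0 otherwise.
wide <- table(long$id, long$fruit)
```

The long form is usually easier to summarize, while the wide indicator table is closer to what most statistical routines expect.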

What exactly is "elegant" code in R?

In celebration of my achieving 10,000 “reputation” on Stack Overflow, I’m re-posting one of my questions from there that was (as I had expected) deleted after being live for about 5 hours. In that time, I never really got a satisfactory answer, so if anyone wants to offer one in the comments, that would be great!

Stratified random sampling in R from a data frame

Important update: The original function that was present at this post has been deleted. Instead, I’ve posted a much improved version for the sake of others visiting this page. The function is presently defined as:

Arguments

df: The input data.frame
group: The grouping column(s). Can be a character vector or the numeric positions of the columns.
size: The desired sample size. Can be a decimal (proportionate by group) or an integer (same number of samples per group).
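For readers who want a feel for how those three arguments could fit together, here is a rough sketch. It is not the function from the post: the name `stratified_sketch` is hypothetical, and treating any `size` below 1 as a proportion is an assumption made for this example.

```r
# Hypothetical helper; NOT the function referred to in the post.
stratified_sketch <- function(df, group, size) {
  # Split the row indices by the grouping column(s), given by name or position.
  idx <- split(seq_len(nrow(df)), df[group], drop = TRUE)
  picked <- lapply(idx, function(rows) {
    # Assumption: size < 1 is a proportion of each group, otherwise a count.
    n <- if (size < 1) round(length(rows) * size) else size
    # Never ask for more rows than the group actually has.
    n <- min(n, length(rows))
    # Sample positions within the group so one-row groups behave correctly.
    rows[sample.int(length(rows), n)]
  })
  # Return the sampled rows in their original order.
  df[sort(unlist(picked, use.names = FALSE)), , drop = FALSE]
}

set.seed(1)
dat <- data.frame(id = 1:20, grp = rep(c("a", "b"), times = c(12, 8)))
by_count <- stratified_sketch(dat, "grp", 3)    # 3 rows from each group
by_prop <- stratified_sketch(dat, "grp", 0.25)  # 25% of each group
```

Sampling positions with `sample.int()` rather than calling `sample()` on the row numbers avoids the well-known surprise where `sample(5, 1)` draws from `1:5` instead of returning 5.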

Reshaping data in R revisited

A year ago, I wrote a post about reshaping data from a wide format to a long format. Considering how much time had passed, I thought it would be good to revisit R’s built-in reshape functions. For these examples, I’ve copied the Stata examples from the UCLA Academic Technology Services “Reshape data wide to long” page. Since the data is provided in Stata dta files, you first need to load the “foreign” package to be able to read the data in R.
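As a small illustration of the wide-to-long idea with base R's `reshape()`, the sketch below builds a data frame by hand instead of reading a dta file. The column names and values are made up for this example and are not the UCLA data.

```r
# Made-up wide data: one row per family, one income column per year.
wide <- data.frame(
  famid = 1:3,
  faminc96 = c(40000, 45000, 75000),
  faminc97 = c(40500, 45400, 76000),
  faminc98 = c(41000, 45800, 77000)
)

# Stack the three income columns into one, recording the year of each value.
long <- reshape(
  wide,
  direction = "long",
  idvar = "famid",
  varying = c("faminc96", "faminc97", "faminc98"),
  v.names = "faminc",
  timevar = "year",
  times = 96:98
)

# Sort so that each family's years sit together.
long <- long[order(long$famid, long$year), ]
```

Spelling out `varying`, `v.names`, and `times` is more verbose than letting `reshape()` guess from the column names, but it makes the mapping from old columns to new ones explicit.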

Regular expressions in R

In my last post, I showed a few things I had figured out recently related to regular expressions. By now, you have also figured out that I like figuring things out in R, and application of regular expressions is one of these things.
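As a taste of what regular expressions look like in R, here is a short sketch using the base functions `grepl()`, `sub()`, and `gsub()`. The sample strings are invented for illustration.

```r
x <- c("id_001", "id_27", "notes: n/a", "ID_9")

# grepl(): which elements match the whole pattern, ignoring case?
has_id <- grepl("^id_[0-9]+$", x, ignore.case = TRUE)

# sub(): keep only the digits via a capture group, where the pattern matched.
digits <- ifelse(has_id,
                 sub("^id_([0-9]+)$", "\\1", x, ignore.case = TRUE),
                 NA)
id_num <- as.integer(digits)

# gsub(): replace every run of non-alphanumeric characters with one underscore.
clean <- gsub("[^[:alnum:]]+", "_", x)
```

Note the doubled backslash in `"\\1"`: R's string parser consumes one backslash before the regular expression engine ever sees the pattern.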

Sounds interesting. Is that a regular expression?

I’ve been meaning to learn how to use regular expressions for quite some time now, but just never seemed to get around to doing so. The other night, I decided to take a stab at them though, and over the past few days, I’ve sort of managed to learn a few tricks. Some of these might seem unnecessary, particularly since the examples comprise relatively small chunks of text. But hopefully you can also see how the same techniques apply to larger text files. In some of the examples, I’ve also included how it might help with preparing your data for use with a program like R. For all of these examples, I’ve used Geany as my text editor. I suggest you use a good text editor like Geany or Notepad++ too.

The new sample size calculator for R (already)

aka “Maybe I shouldn’t post so quickly”

Just hours ago, I posted my first set of functions for R to determine the sample size for a known population. Then, I had to update that post to reflect my newfound knowledge, and now, I thought I would update again, so that the best functions I came up with would all be in one place. There are two functions, sample.size.table() and sample.size(). Here’s some more information about each.
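For context, one common way to compute a sample size for a known population is Cochran's formula with a finite population correction. The sketch below assumes that approach. The name `sample_size_sketch` and its arguments are hypothetical, and the post's own sample.size() may be defined differently.

```r
# Hypothetical helper; the post's own sample.size() may differ.
sample_size_sketch <- function(population, conf_level = 0.95,
                               margin = 0.05, p = 0.5) {
  # z-score for a two-sided confidence level.
  z <- qnorm(1 - (1 - conf_level) / 2)
  # Sample size for an effectively infinite population.
  n0 <- z^2 * p * (1 - p) / margin^2
  # Finite population correction shrinks it for small populations.
  n <- n0 / (1 + (n0 - 1) / population)
  # Round up, since a fraction of a respondent cannot be sampled.
  ceiling(n)
}

# Required sample sizes for a few population sizes at the default settings.
sizes <- sapply(c(500, 1000, 10000), sample_size_sketch)
```

Using `p = 0.5` is the conservative default, because it maximizes `p * (1 - p)` and therefore the required sample size.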