Stratified random sampling in R from a data frame

Important update: The original function that was posted here has been deleted. In its place, I’ve posted a much improved version for the sake of others visiting this page. The function is presently defined as:

Arguments

  df: The input data.frame.
  group: The grouping column(s). Can be a character vector or the numeric positions of the columns.
  size: The desired sample size. Can be a decimal (proportionate by group) or an integer (same number of samples per group).
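The function definition itself is not reproduced in this excerpt, so here is only a rough base-R sketch of how a function with those three arguments might behave. The name stratified_sketch and everything inside it are assumptions made for illustration, not the code from the post; in particular, treating any size below 1 as a proportion is just one plausible reading of the description.

```r
# Hypothetical sketch only; not the function from the original post.
stratified_sketch <- function(df, group, size) {
  # Split the row numbers by the (combination of) grouping column(s).
  # `group` may be column names or column positions.
  idx_by_group <- split(seq_len(nrow(df)), df[group], drop = TRUE)

  picked <- lapply(idx_by_group, function(idx) {
    # Assumption: size < 1 means a proportion of each group;
    # otherwise it is a fixed count, capped at the group size.
    n <- if (size < 1) round(length(idx) * size) else min(size, length(idx))
    # sample.int() avoids the surprise of sample(x) when x has length 1.
    idx[sample.int(length(idx), n)]
  })

  # Return the sampled rows in their original order.
  df[sort(unlist(picked, use.names = FALSE)), , drop = FALSE]
}

# Made-up example data: four groups of unequal size.
set.seed(1)  # arbitrary seed, only for reproducibility
dat <- data.frame(id  = 1:100,
                  grp = rep(c("a", "b", "c", "d"), times = c(10, 20, 30, 40)))

s1 <- stratified_sketch(dat, "grp", 0.1)  # 10% of each group
s2 <- stratified_sketch(dat, "grp", 5)    # 5 rows from each group
table(s1$grp)
table(s2$grp)
```

With a decimal size the sample keeps the proportions of the groups, while an integer size gives every group the same number of rows regardless of how large it is.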

The new sample size calculator for R (already)

aka “Maybe I shouldn’t post so quickly”

Just hours ago, I posted my first set of functions for R to determine the sample size for a known population. Then, I had to update that post to reflect my newfound knowledge, and now, I thought I would update again, so that the best functions I came up with would all be in one place. There are two functions, sample.size.table() and sample.size(). Here’s some more information about each.

A sample size calculator function for R

IMPORTANT: This is here mostly to remind me of how I solved my problem. You should read “The new sample size calculator for R (already)” if you really want to use this function.

In the research class at the Tata-Dhan Academy, students are currently getting into sampling, so I thought I would introduce them to R. However, try as I might, I couldn’t find how to do a simple sample size calculation in R if I knew, for instance, the size of the population I wanted to sample from, the confidence level desired, and the confidence interval desired.
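For readers who want to see the kind of calculation being described, here is a minimal sketch. It assumes the common approach of Cochran's formula for a proportion with a finite population correction, and it treats the "confidence interval" as the margin of error, as most online calculators do. The function name sample_size_sketch, its argument names, and the default p = 0.5 are assumptions for illustration; this is not necessarily how the functions in these posts work.

```r
# Hypothetical sketch: sample size for a known population, using
# Cochran's formula with a finite population correction.
sample_size_sketch <- function(population, conf_level = 0.95,
                               margin = 0.05, p = 0.5) {
  # Two-sided z-score for the requested confidence level
  z <- qnorm(1 - (1 - conf_level) / 2)
  # Sample size for an effectively infinite population
  n0 <- z^2 * p * (1 - p) / margin^2
  # Adjust for the known (finite) population size and round up
  ceiling(n0 / (1 + (n0 - 1) / population))
}

sample_size_sketch(1000)                      # 278
sample_size_sketch(10000, conf_level = 0.99)  # stricter confidence level
```

Using p = 0.5 gives the most conservative (largest) sample size, which is why it is a common default when the true proportion is unknown.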

Now, I know that there are literally hundreds of such calculators online, but I thought it would be a good excuse for me to learn how to write a function. Here are my first four functions, which demonstrate some of the features available for writing functions in R. These are relatively basic, and there might be better ways to do this (if there are, please share!), but it was still a fun experiment for me.

Sampling with replacement in R

In my last post about sampling, “Simple sampling with R”, we did simple sampling without replacement; that is, each item could be selected only once. However, there are times when you want to simulate sampling with replacement. For example, if you wanted to simulate rolling a die 50 times, each outcome could be a 1, 2, 3, 4, 5 or 6, but 50 is more than 6, so you need to let the software “replace” each value after it is drawn so that it can be drawn again.

This post explains how to do this with R.
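As a quick illustration of the die example, base R's sample() takes a replace argument. The seed and the variable name rolls below are arbitrary choices for this sketch.

```r
# Simulate rolling a six-sided die 50 times.
set.seed(42)  # arbitrary seed so the result can be reproduced
rolls <- sample(1:6, size = 50, replace = TRUE)

# Count how often each face came up.
table(rolls)

# Without replace = TRUE, asking for 50 draws from 6 values is an error,
# because each value could be used only once.
no_replace <- tryCatch(sample(1:6, size = 50), error = function(e) "error")
no_replace
```

The default for replace is FALSE, which is why the earlier examples of simple sampling never returned the same item twice.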

Simple sampling with R

I mentioned in an earlier post (“Am I inconsistent?”) that I got interested in R because Amy had asked me to help her with some sampling at one point. Since that was my starting point, I thought I would share some of my experiments with you. In this post:

  1. Simple random sampling
  2. Simple random sampling with a seed
  3. Sorting your sample
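The three items above can be sketched in a few lines of base R. The population of 100 numbered items, the sample size of 10, and the seed value are all made up for this illustration.

```r
# A made-up population: items numbered 1 to 100.
population <- 1:100

# 1. Simple random sampling: 10 items, each selected at most once.
s <- sample(population, 10)

# 2. Simple random sampling with a seed: the same seed gives the same sample.
set.seed(123)  # arbitrary seed
a <- sample(population, 10)
set.seed(123)
b <- sample(population, 10)
identical(a, b)  # TRUE

# 3. Sorting your sample.
sorted_a <- sort(a)
sorted_a
```

Setting a seed is useful when someone else needs to reproduce exactly the same sample, for example when checking your work.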