Sampling with replacement in R

In my last post about sampling (Simple sampling with R), we were doing simple sampling without replacement; that is, each item could only be selected once. However, there are times when you want to simulate sampling with replacement. For example, if you wanted to simulate rolling a die 50 times, each roll could come up 1, 2, 3, 4, 5 or 6, and the same number can come up again and again. Since 50 draws is more than the 6 available values, you need to let the software “replace” each value after it is drawn so that it can be selected again on the next draw.

This post explains how to do this with R.
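
As a quick warm-up, here is a minimal sketch of the die example. The variable name rolls and the seed value are my own choices for illustration; the seed is only there so that the same rolls come back each time you run it.

```r
# Simulate 50 rolls of a fair six-sided die.
set.seed(1)  # assumption: any fixed seed works; it just makes the run repeatable

# replace = TRUE lets each face be drawn again on later rolls.
rolls <- sample(1:6, size = 50, replace = TRUE)

rolls         # the 50 outcomes, in the order they were "rolled"
table(rolls)  # how many times each face that came up was seen
```

Leave out the set.seed() line if you want a fresh set of rolls every time.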

Let’s imagine that we want to take a sample from things that are not numbers. For example, suppose we have a jar of blue, green, and red M&Ms and we want to simulate randomly taking M&Ms out of it. Here’s what we do:

> candy = c("blue","green","red")
> sample(candy, 20, replace=T)
 [1] "red"   "red"   "red"   "red"   "blue"  "green" "red"
 [8] "blue"  "blue"  "blue"  "blue"  "green" "green" "green"
[15] "blue"  "blue"  "blue"  "green" "blue"  "red"

In the above example, “replace=T” (T is R’s shorthand for TRUE) is required because we are asking for 20 draws from a vector that holds only three items; without it, R stops with an error. It means that we can sample the same item more than once, sort of like taking a piece of candy out, recording what color it is, putting it back in the jar, mixing the candy up again, and then taking another sample.
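
If you are curious what happens without replacement, the sketch below tries it both ways. The names shuffled and msg are just placeholders I picked, and tryCatch() is only used so that the error is captured as text instead of stopping the script.

```r
candy <- c("blue", "green", "red")

# Without replacement, three draws are fine: we simply get the three
# colors back in a random order.
shuffled <- sample(candy, 3)
shuffled

# Asking for 20 draws without replacement cannot work, because the jar
# only holds three items. Capture the error message rather than halting.
msg <- tryCatch(
  sample(candy, 20),
  error = function(e) conditionMessage(e)
)
msg
```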

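Now, imagine that the M&M jar has more of one color than another. In this case, we also need to tell R how likely each color is, using the “prob” argument. The probabilities are matched to the items by position, so here blue gets 0.7, green 0.2, and red 0.1.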

> sample(candy, 20, replace=T, prob=c(0.7, 0.2, 0.1))
 [1] "blue"  "blue"  "green" "blue"  "blue"  "blue"  "blue"
 [8] "green" "green" "blue"  "blue"  "blue"  "blue"  "blue"
[15] "green" "blue"  "green" "green" "green" "blue"
> # ========================================================
> # A second time
> # ========================================================
> sample(candy, 20, replace=T, prob=c(0.7, 0.2, 0.1))
 [1] "green" "blue"  "green" "blue"  "blue"  "blue"  "blue"
 [8] "blue"  "blue"  "blue"  "blue"  "blue"  "blue"  "green"
[15] "blue"  "blue"  "red"   "green" "green" "blue"

Notice how, in general, we get mostly blues, a fair number of greens, and almost no reds. You can sort these results too, but then you lose the sequence in which the items were selected, which might also be interesting. Here are three more samples, sorted so you can more easily see the effect that the probabilities have on the outcome.

> sort(sample(candy, 20, replace=T, prob=c(0.7, 0.2, 0.1)))
 [1] "blue"  "blue"  "blue"  "blue"  "blue"  "blue"  "blue"
 [8] "blue"  "blue"  "blue"  "blue"  "blue"  "blue"  "blue"
[15] "blue"  "blue"  "green" "green" "red"   "red"
> sort(sample(candy, 20, replace=T, prob=c(0.7, 0.2, 0.1)))
 [1] "blue"  "blue"  "blue"  "blue"  "blue"  "blue"  "blue"
 [8] "blue"  "blue"  "blue"  "blue"  "blue"  "blue"  "blue"
[15] "blue"  "green" "red"   "red"   "red"   "red"
> sort(sample(candy, 20, replace=T, prob=c(0.7, 0.2, 0.1)))
 [1] "blue"  "blue"  "blue"  "blue"  "blue"  "blue"  "blue"
 [8] "blue"  "blue"  "blue"  "blue"  "blue"  "blue"  "green"
[15] "green" "green" "green" "green" "green" "red"
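
Sorting works for eyeballing 20 items, but counting is easier once the sample gets bigger. Here is a rough sketch that uses table() and prop.table() to tally a much larger sample; the sample size of 10000, the seed, and the names draws, counts, and props are all my own choices for illustration.

```r
candy <- c("blue", "green", "red")

set.seed(42)  # assumption: fixed seed so the tallies are repeatable
draws <- sample(candy, 10000, replace = TRUE, prob = c(0.7, 0.2, 0.1))

# Turn the draws into a factor with all three colors as levels, so a
# color that never comes up would still appear with a count of 0.
counts <- table(factor(draws, levels = candy))
props  <- prop.table(counts)  # counts divided by the total number of draws

counts
round(props, 2)
```

With this many draws, the proportions should land close to the 0.7, 0.2, and 0.1 we asked for, which is much harder to see in a sample of 20.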

You can also assign a name to the probability set before you start your sample instead of typing out the probabilities each time you take a sample.

> ProbCandy = c(0.7, 0.2, 0.1)
> sort(sample(candy, 20, replace=T, prob=ProbCandy))
 [1] "blue"  "blue"  "blue"  "blue"  "blue"  "blue"  "blue"
 [8] "blue"  "blue"  "blue"  "blue"  "blue"  "blue"  "blue"
[15] "blue"  "blue"  "blue"  "blue"  "blue"  "green"
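
One variation you might find handy is to keep the colors and their probabilities together in a single named vector, so you can see at a glance which probability belongs to which color. This is only a sketch: prob_candy is a name I made up, and sample() itself still matches probabilities to items by position, not by name. Drawing from names(prob_candy) is what keeps the two lined up.

```r
# Colors and their probabilities stored together in one named vector.
prob_candy <- c(blue = 0.7, green = 0.2, red = 0.1)

# Sanity check that the probabilities add up to 1 (allowing for
# floating-point rounding).
stopifnot(isTRUE(all.equal(sum(prob_candy), 1)))

# names(prob_candy) is c("blue", "green", "red"), in the same order as
# the probabilities, so position-based matching does the right thing.
draws <- sample(names(prob_candy), 20, replace = TRUE, prob = prob_candy)
sort(draws)
```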
