Sampling with replacement in R

In my last post about sampling (Simple sampling with R), we were doing simple sampling without replacement; that is, each item could only be selected once. However, there are times when you want to simulate sampling with replacement. For example, if you wanted to simulate rolling a die 50 times, each outcome could be a 1, 2, 3, 4, 5 or 6, but 50 is more than 6, so you need to let the software “replace” each value after it is drawn so that it can be drawn again.

This post explains how to do this with R.
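
As a quick sketch of the die example (the variable name rolls is just for illustration), sampling with replacement from 1:6 might look like this:

```r
# Simulate 50 rolls of a fair six-sided die.
# replace = TRUE lets each face be drawn more than once.
rolls <- sample(1:6, size = 50, replace = TRUE)

length(rolls)  # number of rolls
table(rolls)   # how often each face came up (varies from run to run)
```

The counts will differ every time you run it, since no seed is set here.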

Let’s imagine that we want to take a sample from things that are not numbers. For example, picture a jar that has blue, green, and red M&Ms, and suppose we want to simulate randomly taking M&Ms out of it. Here’s what we do:

> candy = c("blue","green","red")
> sample(candy, 20, replace=T)
 [1] "red"   "red"   "red"   "red"   "blue"  "green" "red"
 [8] "blue"  "blue"  "blue"  "blue"  "green" "green" "green"
[15] "blue"  "blue"  "blue"  "green" "blue"  "red"

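In the above example, “replace=T” (short for replace=TRUE) is required because we are drawing 20 items from a list that has only three. Without it, sample() stops with an error, since it cannot take a sample larger than the population. Sampling with replacement means that we can select the same item more than once, sort of like taking a piece of candy out, recording what color it is, putting it back in the jar, mixing the candy up again, and then taking another sample.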
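
To see why the argument is needed, here is a small sketch that tries the same draw without replacement and catches the resulting error (the name result is only illustrative):

```r
candy <- c("blue", "green", "red")

# Drawing 20 items from 3 without replacement is impossible,
# so sample() raises an error; tryCatch() captures its message.
result <- tryCatch(
  sample(candy, 20),
  error = function(e) conditionMessage(e)
)

result  # the error message as a single string
```

The exact wording of the message may depend on your R version and locale.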

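Now, imagine that the M&M jar has more of a certain color of candy. In this case, we’d also be concerned with probability. The prob argument gives the probability of drawing each item, in the same order as the items in candy.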

> sample(candy, 20, replace=T, prob=c(0.7, 0.2, 0.1))
 [1] "blue"  "blue"  "green" "blue"  "blue"  "blue"  "blue"
 [8] "green" "green" "blue"  "blue"  "blue"  "blue"  "blue"
[15] "green" "blue"  "green" "green" "green" "blue"
> # A second time
> sample(candy, 20, replace=T, prob=c(0.7, 0.2, 0.1))
 [1] "green" "blue"  "green" "blue"  "blue"  "blue"  "blue"
 [8] "blue"  "blue"  "blue"  "blue"  "blue"  "blue"  "green"
[15] "blue"  "blue"  "red"   "green" "green" "blue"

Notice how, in general, we have mostly blues, some greens, and almost no reds. You can sort these results too, but then you don’t see the sequence in which the items were selected, which might also be interesting. Here are three more samples, sorted (alphabetically, since these are character values) so you can more easily see the effect that setting the probability has on the outcome.

> sort(sample(candy, 20, replace=T, prob=c(0.7, 0.2, 0.1)))
 [1] "blue"  "blue"  "blue"  "blue"  "blue"  "blue"  "blue"
 [8] "blue"  "blue"  "blue"  "blue"  "blue"  "blue"  "blue"
[15] "blue"  "blue"  "green" "green" "red"   "red"
> sort(sample(candy, 20, replace=T, prob=c(0.7, 0.2, 0.1)))
 [1] "blue"  "blue"  "blue"  "blue"  "blue"  "blue"  "blue"
 [8] "blue"  "blue"  "blue"  "blue"  "blue"  "blue"  "blue"
[15] "blue"  "green" "red"   "red"   "red"   "red"
> sort(sample(candy, 20, replace=T, prob=c(0.7, 0.2, 0.1)))
 [1] "blue"  "blue"  "blue"  "blue"  "blue"  "blue"  "blue"
 [8] "blue"  "blue"  "blue"  "blue"  "blue"  "blue"  "green"
[15] "green" "green" "green" "green" "green" "red"
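
Another way to see the effect of the probabilities is to tabulate a much larger sample instead of reading sorted output. This is only a sketch; the sample size of 10000 and the names draws, counts, and props are arbitrary choices for illustration:

```r
candy <- c("blue", "green", "red")

# Draw a large sample so the observed shares settle near the
# requested probabilities.
draws <- sample(candy, 10000, replace = TRUE, prob = c(0.7, 0.2, 0.1))

counts <- table(draws)       # number of times each color was drawn
props <- prop.table(counts)  # the same counts as proportions

round(props, 2)
```

With this many draws, the proportions should land close to 0.7, 0.2, and 0.1, though not exactly on them.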

You can also assign a name to the probability set before you start your sample instead of typing out the probabilities each time you take a sample.

> ProbCandy = c(0.7, 0.2, 0.1)
> sort(sample(candy, 20, replace=T, prob=ProbCandy))
 [1] "blue"  "blue"  "blue"  "blue"  "blue"  "blue"  "blue"
 [8] "blue"  "blue"  "blue"  "blue"  "blue"  "blue"  "blue"
[15] "blue"  "blue"  "blue"  "blue"  "blue"  "green"
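
According to the documentation for sample(), the values passed to prob are treated as weights and need not sum to one. So, as a sketch, you could describe the jar by how many candies of each color it holds. The counts below are made up for illustration, as is the name jar:

```r
# Hypothetical jar contents: 70 blue, 20 green, 10 red.
jar <- c(blue = 70, green = 20, red = 10)

# Use the names as the items and the counts as weights.
draws <- sample(names(jar), 20, replace = TRUE, prob = jar)

sort(draws)
```

Keeping the colors and their weights in one named vector also makes it harder to get the two out of order.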

Of course, you can also use set.seed() to make your sampling replicable.
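
A minimal sketch of that idea follows. The seed value 42 is arbitrary, and first and second are just illustrative names:

```r
candy <- c("blue", "green", "red")

# Setting the same seed before each call resets the random number
# generator, so both calls produce the same sequence of draws.
set.seed(42)
first <- sample(candy, 20, replace = TRUE)

set.seed(42)
second <- sample(candy, 20, replace = TRUE)

identical(first, second)  # TRUE
```

Anyone who runs the code with the same seed (and the same R version and random number generator settings) should get the same sample.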

