Sampling with replacement in R

In my last post about sampling (Simple sampling with R) we did simple sampling without replacement; that is, each item could be selected only once. However, there are times when you want to simulate sampling with replacement. For example, if you wanted to simulate rolling a die 50 times, each roll could come up 1, 2, 3, 4, 5 or 6. Because 50 draws is more than the 6 possible outcomes, you need to let the software “replace” each outcome after it is drawn so that it can be drawn again.

This post explains how to do this with R.
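
As a quick illustration of the die example above, here is a minimal sketch. The seed value and the names `rolls` and `result` are my own choices for illustration, and `tryCatch` is used only so the script keeps running when the draw without replacement fails.

```r
# Simulate 50 rolls of a fair six-sided die.
set.seed(42)  # arbitrary seed (an assumption), only so the run is repeatable
rolls <- sample(1:6, 50, replace = TRUE)
length(rolls)  # number of rolls drawn
table(rolls)   # how often each face came up

# Without replacement, R cannot draw 50 items from only 6,
# so sample() raises an error. We capture its message here.
result <- tryCatch(
  sample(1:6, 50, replace = FALSE),
  error = function(e) conditionMessage(e)
)
result
```

With `replace = TRUE` every face stays available for every roll, which is what rolling a real die looks like.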

> # ========================================================
> # Can we sample from things that are not numbers? For
> # example, pretend we are taking M&Ms out of a jar that
> # has blue, green, and red M&Ms.
> # ========================================================
> candy = c("blue","green","red")
> sample(candy, 20, replace=T)
 [1] "red"   "red"   "red"   "red"   "blue"  "green" "red"
 [8] "blue"  "blue"  "blue"  "blue"  "green" "green" "green"
[15] "blue"  "blue"  "blue"  "green" "blue"  "red"
> # ========================================================
> # "replace=T" (short for replace=TRUE) is required because
> # we are drawing 20 items from a vector of only three. It
> # means that the same item can be sampled more than once.
> # ========================================================
> # ========================================================
> # What about probability?
> # ========================================================
> sample(candy, 20, replace=T, prob=c(0.7, 0.2, 0.1))
 [1] "blue"  "blue"  "green" "blue"  "blue"  "blue"  "blue"
 [8] "green" "green" "blue"  "blue"  "blue"  "blue"  "blue"
[15] "green" "blue"  "green" "green" "green" "blue"
> # ========================================================
> # A second time
> # ========================================================
> sample(candy, 20, replace=T, prob=c(0.7, 0.2, 0.1))
 [1] "green" "blue"  "green" "blue"  "blue"  "blue"  "blue"
 [8] "blue"  "blue"  "blue"  "blue"  "blue"  "blue"  "green"
[15] "blue"  "blue"  "red"   "green" "green" "blue"
> # ========================================================
> # Notice how, in general, we get mostly blues, some
> # greens, and almost no reds. The probabilities are
> # matched to the items in order: blue 0.7, green 0.2,
> # red 0.1. You can sort these results too, but then you
> # don't see the sequence in which the items were
> # selected, which might also be interesting. Here are
> # three more samples, sorted so you can see the effect
> # that setting the probability has on the outcome.
> # ========================================================
> sort(sample(candy, 20, replace=T, prob=c(0.7, 0.2, 0.1)))
 [1] "blue"  "blue"  "blue"  "blue"  "blue"  "blue"  "blue"
 [8] "blue"  "blue"  "blue"  "blue"  "blue"  "blue"  "blue"
[15] "blue"  "blue"  "green" "green" "red"   "red"
> sort(sample(candy, 20, replace=T, prob=c(0.7, 0.2, 0.1)))
 [1] "blue"  "blue"  "blue"  "blue"  "blue"  "blue"  "blue"
 [8] "blue"  "blue"  "blue"  "blue"  "blue"  "blue"  "blue"
[15] "blue"  "green" "red"   "red"   "red"   "red"
> sort(sample(candy, 20, replace=T, prob=c(0.7, 0.2, 0.1)))
 [1] "blue"  "blue"  "blue"  "blue"  "blue"  "blue"  "blue"
 [8] "blue"  "blue"  "blue"  "blue"  "blue"  "blue"  "green"
[15] "green" "green" "green" "green" "green" "red"
> # ========================================================
> # You can also assign a name to the probability set before
> # you start your sample instead of typing out the
> # probabilities each time you take a sample.
> # ========================================================
> ProbCandy = c(0.7, 0.2, 0.1)
> sort(sample(candy, 20, replace=T, prob=ProbCandy))
 [1] "blue"  "blue"  "blue"  "blue"  "blue"  "blue"  "blue"
 [8] "blue"  "blue"  "blue"  "blue"  "blue"  "blue"  "blue"
[15] "blue"  "blue"  "blue"  "blue"  "blue"  "green"
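
Sorting shows the effect of the probabilities by eye, but counting is easier to read. The sketch below is one possible way to do that, not the only one: it draws a larger sample and compares the observed proportions with the requested ones. The sample size of 1000, the seed, and the names `prob_candy`, `draws`, `counts`, and `props` are assumptions made for illustration.

```r
set.seed(123)  # arbitrary seed (an assumption) so the run is repeatable

candy <- c("blue", "green", "red")
prob_candy <- c(0.7, 0.2, 0.1)  # matched to candy in order

# Draw a larger sample so the proportions settle down.
draws <- sample(candy, 1000, replace = TRUE, prob = prob_candy)

# Count each color; factor() keeps all three levels even if one never appears.
counts <- table(factor(draws, levels = candy))

# Convert counts to proportions and compare with prob_candy.
props <- prop.table(counts)
round(props, 2)
```

The proportions should land close to 0.7, 0.2, and 0.1, though not exactly on them, since each run is still random.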