The original function in this post has been removed; in its place, I've posted a much improved version for the benefit of others visiting this page. The function currently takes the following arguments:
df: The input data.frame.
group: The grouping column(s), given either as a character vector of column names or as numeric column positions.
size: The desired sample size. A decimal (for example, 0.1) takes a proportionate sample from each group; an integer takes the same number of samples from each group.
select: A named list with optional subsetting statements.
replace: Logical. Should sampling be done with replacement?
bothSets: Logical. Should a list containing both sets be returned? Useful when splitting data into "training" and "testing" sets.
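Since the function's source is not reproduced here, the following is a minimal base R sketch of the same idea, not the function described above. `stratified_sketch` is a hypothetical name, the `select` argument is omitted, and converting a proportion to a count with `round()` is an assumption. It shows how `group`, `size`, `replace`, and `bothSets` can interact.

```r
# Hypothetical sketch of stratified sampling in base R; not the author's function.
# The `select` argument is deliberately omitted.
stratified_sketch <- function(df, group, size, replace = FALSE, bothSets = FALSE) {
  # Accept grouping columns by name or by numeric position
  if (is.numeric(group)) group <- names(df)[group]

  # One stratum per observed combination of the grouping columns
  strata <- interaction(df[group], drop = TRUE)
  rows_by_stratum <- split(seq_len(nrow(df)), strata)

  picked <- unlist(lapply(rows_by_stratum, function(rows) {
    n <- length(rows)
    # size < 1: proportion of the stratum (rounding is an assumption);
    # size >= 1: the same number of rows from every stratum
    k <- if (size < 1) round(n * size) else size
    # Without replacement, a stratum cannot supply more rows than it has
    if (!replace) k <- min(k, n)
    rows[sample.int(n, k, replace = replace)]
  }), use.names = FALSE)

  sampled <- df[picked, , drop = FALSE]
  if (!bothSets) return(sampled)

  # Second set: every row that was never drawn (e.g. a "testing" set)
  rest <- df[setdiff(seq_len(nrow(df)), picked), , drop = FALSE]
  list(sample = sampled, rest = rest)
}
```

For example, `stratified_sketch(iris, "Species", 0.1)` should return 5 rows from each species, because each species has 50 rows. With `bothSets = TRUE`, the sketch returns a list whose (hypothetical) elements `sample` and `rest` together cover every row exactly once when sampling without replacement.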
And here are some examples of the function in action:
There is also a data.table version that is much more efficient but has the same functionality.
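For readers who already use data.table, a minimal sketch of per-group sampling with that package might look like the following. It is not the author's data.table version: `stratified_dt_sketch` and `grp_cols` are hypothetical names, and it covers only `size` and `replace`. Sampling the row numbers in `.I` for each group and subsetting once, rather than copying each group's `.SD`, is likely one reason a data.table approach can be faster.

```r
library(data.table)

# Hypothetical data.table sketch; not the author's data.table version.
stratified_dt_sketch <- function(df, grp_cols, size, replace = FALSE) {
  DT <- as.data.table(df)
  # Accept grouping columns by name or by numeric position
  if (is.numeric(grp_cols)) grp_cols <- names(DT)[grp_cols]

  # .I holds the row numbers of the current group, so we sample row
  # numbers per stratum instead of copying each group's .SD
  idx <- DT[, {
    k <- if (size < 1) round(.N * size) else size
    # Without replacement, cap the count at the group size
    if (!replace) k <- min(k, .N)
    .I[sample.int(.N, k, replace = replace)]
  }, by = grp_cols]$V1

  # Subset the full table once, using the collected row numbers
  DT[idx]
}
```

The result is a data.table; wrap it in `as.data.frame()` if a plain data.frame is needed downstream.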