# sample() Function in R

In R, we can draw random samples from a vector or a list using the sample() function. It lets us randomly select a subset of data, which is useful in many statistical applications. If the input to sample() is a list, the output is also a list that contains the selected elements; when no size is given, the output has the same number of elements as the input, in shuffled order. This article demonstrates the sample() function through examples that set its various arguments.

## Example 1: Using the sample() Function with the Data Argument

The sample() function must be given the data to sample from. This data is the only required argument of sample(), as the following code shows:

dataX <- c(10, 20, 30, 40, 50, 60, 70, 80, 90, 100)

sample(dataX, 3)

sample(dataX, 3)

Here, we first create a numeric vector and store it in the “dataX” variable. Next, we call the sample() function twice and pass it the “dataX” vector each time. The first call, sample(dataX, 3), takes a random sample of three elements from “dataX”. Because sampling is done without replacement by default, the result is three distinct elements in random order. The second call, sample(dataX, 3), takes another independent random sample of three elements from the same vector. This time, the outcome usually differs from the first one, although the two samples can share elements.
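Since every call draws a new sample, results change from run to run. When repeatable results are needed, the random number generator can be seeded first. The following is a minimal sketch that assumes base R's set.seed() function; the seed value 42 and the variable names first_draw and second_draw are arbitrary choices for illustration:

```r
# Same vector as in Example 1
dataX <- c(10, 20, 30, 40, 50, 60, 70, 80, 90, 100)

# Fix the state of the random number generator (42 is an arbitrary seed)
set.seed(42)
first_draw <- sample(dataX, 3)

# Resetting the same seed replays the same draw
set.seed(42)
second_draw <- sample(dataX, 3)

print(first_draw)
print(second_draw)
identical(first_draw, second_draw)  # TRUE
```

Without the calls to set.seed(), the two draws would be independent, as in the example above.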

The output of the two sample(dataX, 3) calls shows different elements. Note that each call draws a new random sample, so the elements obtained from the vector generally change from one call to the next.

## Example 2: Using the sample() Function with the Replace Argument

The sample() function also has a “replace” argument which takes a logical value. When it is set to TRUE, the elements are sampled with replacement, so the same element can be selected more than once. When it is set to FALSE, which is the default, each element can be selected only once, so the elements are sampled without replacement.

random_numbers=c(11,25,12,89,45,16,67,38,96,55,73)

sample(random_numbers,4,replace=TRUE)

sample(random_numbers,5,replace=TRUE)

Here, we first define a vector of numeric values in the “random_numbers” variable. After that, we invoke the sample() function and pass “random_numbers” as its first argument. The value “4” tells sample() to select four random values from the “random_numbers” vector.

Next, replace=TRUE in the sample() call specifies that each value can be selected more than once. Then, we call sample() again, this time selecting “5” random values from the vector. As before, we set the “replace” argument to TRUE so that each value can be selected multiple times.
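The practical difference between the two settings shows up when the requested sample is larger than the vector. The sketch below reuses the “random_numbers” vector; the names big_sample and result, and the sample size of 20, are illustrative assumptions. It relies on sample() raising an error when more elements are requested without replacement than the vector contains:

```r
random_numbers <- c(11, 25, 12, 89, 45, 16, 67, 38, 96, 55, 73)

# With replacement, the sample may be larger than the vector itself
big_sample <- sample(random_numbers, 20, replace = TRUE)
length(big_sample)             # 20
anyDuplicated(big_sample) > 0  # TRUE: 20 draws from 11 values must repeat

# Without replacement, asking for more than 11 values is an error.
# tryCatch() captures the error message instead of stopping the script.
result <- tryCatch(
  sample(random_numbers, 20, replace = FALSE),
  error = function(e) conditionMessage(e)
)
print(result)
```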

The first output is a vector of 4 randomly selected elements from the “random_numbers” vector. The second output is a vector of 5 randomly selected elements. Because sampling is done with replacement, either output may contain repeated values.

## Example 3: Using the sample() Function with the Size Argument

The next argument that the sample() function accepts is “size”. The “size” argument is an optional parameter that gives the number of elements to draw. The code of the sample() function with the “size” parameter is given in the following:

vectors <- 1:10

sample(vectors, size = 5)

Here, a numeric vector is defined as the sequence of integers from 1 to 10 and stored in the “vectors” variable. The sample() function is then used to select random elements from this vector. In this call, sample() takes two arguments. The first argument is the vector that we draw the sample from. The second argument is “size”, which is set to “5” and indicates that five elements are selected from the vector.
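Because “size” is optional, it helps to see what happens when it is left out. The following sketch, with the illustrative names shuffled and single, relies on two documented behaviors of base R: “size” defaults to the length of the input, and a single number n of at least 1 is treated as the sequence 1:n:

```r
vectors <- 1:10

# Without size, sample() returns every element once, in random order
shuffled <- sample(vectors)
length(shuffled)             # 10
setequal(shuffled, vectors)  # TRUE

# Caution: a single number is interpreted as 1:n, so this shuffles 1 to 5
single <- sample(5)
print(sort(single))          # 1 2 3 4 5
```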

The five selected elements are returned in random order as a new vector.

## Example 4: Using the sample() Function for the R List

The sample() function can also be used with a list in R. This example draws random elements from a list.

R_list <- list(1:4, 913, c("X", "YYY", "GOOD"), "ZZZ", 5)

result <- R_list[sample(1:length(R_list), size = 4)]

result

Here, the “R_list” list is defined with elements of different types: an integer vector, a single number, a character vector, a string, and another number. After that, we create a “result” variable that stores the elements chosen with the sample() function.

Inside the sample() function, the “1:length(R_list)” expression gives the vector of indices to sample from. Next, the “size” argument specifies the number of indices to draw, which is “4”. The sampled indices are then used to subset “R_list”, so “result” holds four randomly selected elements of the list. Since the elements of “R_list” are of different types, the elements in “result” can also be of different types.
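The index-based approach can be checked with a short sketch. It uses seq_along(), a base R function that produces the same indices as 1:length(R_list) for a non-empty list, and it also passes the list directly to sample(), which returns a list as well. The names idx and direct are illustrative assumptions:

```r
R_list <- list(1:4, 913, c("X", "YYY", "GOOD"), "ZZZ", 5)

# seq_along(R_list) yields the indices 1 to 5
idx <- sample(seq_along(R_list), size = 4)
result <- R_list[idx]

is.list(result)  # TRUE
length(result)   # 4

# sample() also accepts the list itself and returns a list
direct <- sample(R_list, size = 4)
is.list(direct)  # TRUE
length(direct)   # 4
```

Sampling indices first is handy when the same positions are needed again, for example to pick matching elements from a second list.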

The output represents the new list which contains a random subset of the original list.

## Example 5: Using the sample() Function with the Prob Argument

Additionally, the sample() function has a “prob” parameter. The “prob” argument is a vector of probability weights, one for each element of the vector being sampled. Note that all elements are assumed to have equal probability when the “prob” argument is not used.

my_data=c(31,99,5,24,72)

sample(my_data, size = 10, replace = TRUE, prob = c(0.5, rep(0.1, 4)))

Here, a numeric vector is stored in “my_data”. In the next step, we call the sample() function and pass “my_data” to it as the data to sample from. Then, the “size” argument specifies that 10 values are selected at random. After that, we assign TRUE to the “replace” argument, which means that each selected element is put back into the vector before the next one is selected. This is required here because 10 values are drawn from a vector of only five elements. The fourth argument of the sample() call is “prob”, which sets the weight of each element in the “my_data” vector. The weight of the first element is “0.5”. For the remaining four elements, the weight is “0.1”. These weights sum to 0.9 rather than 1, which is allowed: the weights do not need to sum to one, so the first element is selected with a probability of about 0.56 and each of the others with about 0.11.
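The effect of the weights is easier to see with a larger sample than 10. The sketch below is an illustration under stated assumptions: the names weights, probs, draws, and observed, the seed 1, and the sample size of 10000 are arbitrary choices. It rescales the weights by hand for comparison and then tabulates how often each value was drawn:

```r
my_data <- c(31, 99, 5, 24, 72)
weights <- c(0.5, rep(0.1, 4))  # sums to 0.9, not 1

# Rescaling the weights by hand gives the effective probabilities
probs <- weights / sum(weights)
round(probs, 3)                 # 0.556 0.111 0.111 0.111 0.111

set.seed(1)  # arbitrary seed, only for repeatable output
draws <- sample(my_data, size = 10000, replace = TRUE, prob = weights)

# Share of draws per value, in the same order as my_data
observed <- prop.table(table(factor(draws, levels = my_data)))
round(observed, 3)
```

The observed shares should land close to the rescaled probabilities, with small deviations due to random variation.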

As expected, the first element, which has the highest weight, tends to appear most often in the output.

## Example 6: Using the sample() Function to Render the Barplot

Lastly, the output of the sample() function can be passed to barplot() to visualize the distribution of a categorical variable that is drawn with a given probability distribution.

sample_data= c(1, 2, 3)

barplot(table(sample(sample_data, size=500, replace=TRUE, prob=c(.30,.60,.10))))

Here, after defining “sample_data” as a vector of the integers 1 to 3, we generate the barplot with the help of the sample() function. First, we call barplot() on the result of the table() function, which creates a frequency table of the sample. Inside table(), the sample() function draws a random sample of size 500 with replacement from the vector of integers 1 to 3. The “prob” argument specifies the probability of selecting each integer: 0.30 for 1, 0.60 for 2, and 0.10 for 3.
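The one-line call can also be split into steps so that the frequency table can be inspected before plotting. This is a sketch rather than the only way to do it; the names probs, draws, and counts, the seed 123, and the axis labels are illustrative assumptions:

```r
sample_data <- c(1, 2, 3)
probs <- c(0.30, 0.60, 0.10)

set.seed(123)  # arbitrary seed, only for repeatable output
draws <- sample(sample_data, size = 500, replace = TRUE, prob = probs)

# factor() with explicit levels keeps a bar for every value,
# even one that was never drawn
counts <- table(factor(draws, levels = sample_data))
print(counts)
print(round(counts / sum(counts), 2))  # observed shares, to compare with probs

barplot(counts, xlab = "Value", ylab = "Frequency",
        main = "Sample of 500 draws")
```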

The barplot is rendered with three bars, one for each integer. The height of each bar is the number of times that integer occurs in the sample, so the bar for 2 is expected to be the tallest and the bar for 3 the shortest.

## Conclusion

We have seen how the sample() function works through various examples. The data to sample from is the only required argument; the “size”, “replace”, and “prob” arguments are optional and are set when a specific case calls for them. The sample() function is useful in statistical analysis and when working with large datasets.