RStudio is an integrated development environment (IDE) for R, a free programming language distributed under the GNU General Public License. RStudio is well suited to producing detailed statistical visualizations and, as such, is used by statisticians all over the world.
RStudio is available as a desktop application for Windows, macOS, and a variety of Linux distributions, and also as a server application.
Download the R programming language (prerequisite)
The RStudio desktop application needs the R programming language to work on Linux distros. You must install an R version that is compatible with your Linux operating system. You can get it from a software repository.
1- Downloading R with the web browser
If you’re unable to get R from the software center, the repository probably needs to be updated first. Alternatively, you can skip that step and download R from the web by entering the link to the R project’s download page into the address bar of your web browser. The homepage should resemble the screenshot below:
2- Downloading R from Linux terminal
Fire up the CLI terminal, type in the command below, and hit enter:
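Then refresh the package lists with the command below (on Ubuntu this is typically sudo apt update):
This command fetches the latest package information, including any updates for R, from the main Ubuntu repository.
Then issue the following command to install R (on Ubuntu the package is typically named r-base, installed with sudo apt install r-base):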
The command above goes through the package listing, revealing how much disk space it will fill up, then asks for confirmation. Hit the ‘Y’ key on your keyboard to continue with the installation.
The output will most likely confirm the installation.
To confirm that R is installed, look it up in the application search box, as illustrated below:
Installing RStudio on Ubuntu 20.04 from the command terminal
With the host programming language installed, we can now proceed to install RStudio. To demonstrate the installation, we will use the command-line terminal.
Fire up the terminal and issue the following:
You’ll be prompted to enter your sudo password. Once you enter it, the package installation will commence.
The RStudio package has now been located online and is being downloaded to your hard drive.
You’ll be asked to enter the password again. Enter it to have the package list read and loaded.
The installer will ask for permission to continue; press the y key on your keyboard.
The output will verify the install, as shown below.
Getting started with RStudio:
To launch RStudio, head over to the search box and look up RStudio. You’ll see it among the results, as shown below:
Click on the RStudio icon to launch it.
Investigating datasets with RStudio
With RStudio, you can visualize data in the form of graphs, tables, and charts.
To understand how data is represented visually in RStudio, let’s take the 2010 census population for every zip code as a sample dataset.
The process of data analysis can be roughly reduced to the following four steps:
1- Importing raw data
You can import the raw data directly from the web into RStudio by running the command below in the console window:
Once the command has run, RStudio will have fetched the data as a CSV file from the web and assigned its contents to the cpd variable.
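As a minimal sketch of this import step, the example below uses read.csv(), which accepts a file path, a URL, or inline text. Because the original download link is not reproduced here, a few made-up rows stand in for the real file; the column names are assumptions based on the columns mentioned later in this tutorial, and the figures are not real census data.

```r
# Hypothetical stand-in for the downloaded CSV file: made-up rows with
# column names assumed from the text (these are NOT real census figures).
csv_text <- "Zip Code,Total Population,Total Males,Total Females
90001,100,52,48
90002,80,38,42
90003,120,61,59"

# read.csv() can read from a path, a URL, or inline text. By default it
# turns spaces in column names into dots (Total Population becomes
# Total.Population).
cpd <- read.csv(text = csv_text)

names(cpd)  # column names after conversion
str(cpd)    # structure: column types and the first few values
```

To read from the web instead, you would pass the dataset's URL as the first argument in place of the text argument.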
Another way to import data into RStudio is to download the dataset manually to your hard drive and then open it with RStudio’s import data feature.
Head over to the Import Dataset option in the Environment tab and select the dataset file to upload. Click OK, and a dialog about the dataset will appear. This is where you specify the import parameters, such as the names and the decimal mark. When you’re done, click Import; the dataset will be added to RStudio and assigned to a variable named after it.
To inspect a dataset that is in use, issue the command below with the variable assigned to that dataset:
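The exact command is not reproduced here, so the following is only a sketch of a few base R functions commonly used to inspect what has been loaded. The small cpd data frame is a made-up stand-in for the imported census data, with assumed column names.

```r
# Made-up stand-in for the imported dataset (assumed column names).
cpd <- data.frame(
  Zip.Code         = c(90001, 90002, 90003),
  Total.Population = c(100, 80, 120),
  Total.Males      = c(52, 38, 61),
  Total.Females    = c(48, 42, 59)
)

ls()        # names of the objects currently in the workspace
head(cpd)   # first rows of the dataset
dim(cpd)    # number of rows and columns

# Inside RStudio, View(cpd) opens the dataset in a spreadsheet-style tab.
```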
2- Manipulating the data
Now that you’ve imported the dataset, there’s a whole lot you can do to transform the data. Suppose you want to access a particular column within the dataset. To get the Total Population column of our dataset, we’d enter the command below:
The data is also retrievable in the form of a vector:
The subset function in R allows us to query the dataset. Let’s say we need to pick out the rows where males outnumber females, that is, where the male-to-female ratio is greater than one. To select those rows, you’d issue the following command:
In the command above, the first parameter is the variable holding the dataset to which the function is applied. The second parameter is a boolean condition, which is evaluated for every row and decides whether or not that row becomes part of the output.
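The three operations described above might look like the sketch below. It assumes the dot-separated column names that read.csv() produces by default, uses made-up numbers rather than the census data, and assumes that the intended condition is "more males than females".

```r
# Made-up stand-in for the imported dataset (assumed column names).
cpd <- data.frame(
  Zip.Code         = c(90001, 90002, 90003),
  Total.Population = c(100, 80, 120),
  Total.Males      = c(52, 38, 61),
  Total.Females    = c(48, 42, 59)
)

# Single brackets with a column name return a one-column data frame.
pop_col <- cpd["Total.Population"]

# The $ operator returns the same column as a plain vector.
pop_vec <- cpd$Total.Population

# subset(): first argument is the dataset, second is a condition that is
# evaluated for every row; only rows where it is TRUE are kept.
s <- subset(cpd, Total.Males > Total.Females)
print(s)
```

Column names can be used directly inside the condition because subset() evaluates it within the data frame.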
3- Using the average functions on the dataset
R has specific functions to work out averages and other summary statistics on the dataset. Note that read.csv replaces spaces in column names with dots by default, so the Total Females column is referred to as Total.Females:
$ median(cpd$Total.Females) – gives the median for a column
$ quantile(cpd$Total.Population) – gives the quantiles for a column
$ var(cpd$Total.Males) – works out the variance for a column
$ sd(cpd$Total.Females) – gives the standard deviation for a column
To get a summarized report on the whole dataset, run the summary function on it, for example summary(cpd).
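Here is a small, self-contained sketch of these functions applied to made-up data. The column names are assumptions, and the numbers are illustrative only.

```r
# Made-up stand-in for the imported dataset (assumed column names).
cpd <- data.frame(
  Zip.Code         = c(90001, 90002, 90003),
  Total.Population = c(100, 80, 120),
  Total.Males      = c(52, 38, 61),
  Total.Females    = c(48, 42, 59)
)

med   <- median(cpd$Total.Females)       # middle value of the column
q     <- quantile(cpd$Total.Population)  # 0%, 25%, 50%, 75%, 100% quantiles
v     <- var(cpd$Total.Males)            # sample variance
s_dev <- sd(cpd$Total.Females)           # sample standard deviation

# summary() works on the whole data frame and reports the minimum,
# quartiles, mean, and maximum of every numeric column.
summary(cpd)
```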
4- Creating a graph for the dataset
If you’re going to work with RStudio often, you’ll find its visualization tools very useful. You can create a graph out of any imported dataset with plot and the other visualization functions in R.
To generate a scatterplot for the dataset, you’d issue the following command:
Now, let’s discuss the parameters involved here. In each parameter, s refers to the subset of the original dataset, and passing “p” as the plot type indicates that you want the data drawn as points.
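A scatterplot call along these lines could be written as follows. This is a sketch under assumptions: the choice of axes, the title, and the labels are illustrative, the data is made up, and s is built with the same subset condition assumed earlier.

```r
# Made-up stand-in for the imported dataset (assumed column names).
cpd <- data.frame(
  Zip.Code         = c(90001, 90002, 90003),
  Total.Population = c(100, 80, 120),
  Total.Males      = c(52, 38, 61),
  Total.Females    = c(48, 42, 59)
)

# s: the rows where males outnumber females.
s <- subset(cpd, Total.Males > Total.Females)

# type = "p" asks plot() to draw the data as points (a scatterplot).
plot(
  x    = s$Total.Population,
  y    = s$Total.Females,
  type = "p",
  main = "Females vs. total population",
  xlab = "Total population",
  ylab = "Total females"
)
```

Other values of type draw the same data differently, for example "l" for lines or "b" for both points and lines.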
You can also represent your dataset in the form of a histogram:
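As a hedged illustration, a histogram of one column can be produced with the base R hist() function. The data, title, and axis label below are made up for the example.

```r
# Made-up stand-in for the imported dataset (assumed column names).
cpd <- data.frame(
  Total.Population = c(100, 80, 120, 95, 110),
  Total.Females    = c(48, 42, 59, 50, 54)
)

# hist() bins the values of a numeric vector and draws the bin counts.
# It also returns the binning invisibly, which is captured here in h.
h <- hist(
  cpd$Total.Population,
  main = "Total Population Distribution",
  xlab = "Total population"
)

h$counts  # number of observations that fell into each bin
```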
Similarly, to obtain a bar chart of the imported dataset:
$ counts <- table(cpd$Total.Population)  # assumed definition of counts, which is not shown in the original
$ barplot(counts, main="Total Population Distribution", xlab="Total Population")
Managing data in unevenly spaced time series
To manage data in unevenly spaced time series, you can use the zoo package in RStudio. To get the zoo package, open the Packages pane in the lower-right corner of the RStudio window. The zoo package converts irregular time series data into zoo objects. The arguments used to create a zoo object are the data, which comes first, followed by the value to order by.
Zoo objects are easy to work with. All you have to do is type “plot”, and you’ll be shown the plot methods you can use with the zoo package.
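The sketch below shows the data-first, order-by-second pattern described above. It assumes that the zoo package is already installed (for example with install.packages("zoo")); the dates and values are invented for illustration.

```r
library(zoo)  # assumes the zoo package has been installed beforehand

# Made-up observations taken at irregular intervals.
dates  <- as.Date(c("2020-01-03", "2020-01-10", "2020-02-01", "2020-02-02"))
values <- c(10.5, 11.2, 9.8, 10.1)

# zoo(): the data comes first, followed by the index to order by.
z <- zoo(values, order.by = dates)
print(z)

# plot() dispatches to the plot method for zoo objects and uses the
# dates on the x-axis, preserving the uneven spacing.
plot(z, type = "b", main = "Irregular time series", xlab = "Date", ylab = "Value")
```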
If you’re unsure what a certain R function does, type “?” followed by that function’s name to open its page in the help pane. Also, pressing Ctrl+Space after typing part of a function name brings up the auto-completion window.
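For instance, the help lookups below use only base R; median and subset are chosen simply because they appear earlier in this tutorial.

```r
# Both forms open the help page for median(); note that the question
# mark goes before the function name.
?median
help("median")

# args() prints a function's argument list without opening the help page.
args(subset)
```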
Wrapping up
This tutorial has illustrated how you can set up RStudio on Ubuntu 20.04 and covered the basics of statistical representation and manipulation with RStudio. If you wish to make better use of RStudio, familiarizing yourself with the basics of R programming is a good first step. RStudio is a powerful tool with applications in many industries across the globe: artificial intelligence and data mining, to name a few.
Getting to know the nitty-gritty of R programming involves a bit of a learning curve, but it’s worth the effort.