NOTE: This post assumes you have read my previous post on data visualisation in Python, which covers the basics of data visualisation and introduces the Iris dataset used as the example in this tutorial. You can read that post by clicking the link given below. It is also recommended that you use an R IDE such as RStudio, which keeps the script, R console, terminal and plots at your fingertips. Links for downloading R and RStudio are given at the end.
What is R? Why learn R?
R is a programming language designed for data analysis and statistical computing. Unlike Python, it ships with many built-in datasets and functions for plotting graphs and manipulating data. As a result, common tasks need shorter, less demanding code, without first installing and importing large libraries. These are a few of the reasons why R is gaining popularity in the world of data science. In my opinion, data visualisation in R is far easier to learn than data visualisation in Python.
Plotting Points in R
In R, we plot points using the plot(x, y, ...) function, whose main parameters are vectors of the x and y coordinates of the points. In addition to these, we can pass arguments to change the character/symbol used for the points (pch), set the fill colour of the points (bg, which only applies to the fillable symbols, pch 21 to 25), set the x- and y-axis labels (xlab and ylab), and set the heading of the graph (main), among others. Try typing the following code in the R console, or save it as a .R file and run it:
x <- c(1, 2, 3)
y <- c(1, 3, 5)
plot(x = x, y = y, main = "Simple Plot", pch = 21, bg = "yellow")
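The call above leaves the axis labels at their defaults. As a small sketch of the other arguments mentioned earlier, the following version also sets xlab and ylab. The label text and the col value are illustrative choices of mine, not part of the original example.

```r
# Same three points as before
x <- c(1, 2, 3)
y <- c(1, 3, 5)

# pch = 21 draws a filled circle: col sets the border colour, bg the fill
plot(x = x, y = y,
     main = "Simple Plot with Axis Labels",
     xlab = "x values",   # illustrative axis label
     ylab = "y values",   # illustrative axis label
     pch = 21, col = "black", bg = "yellow")
```

If bg appears to have no effect in your own plots, check that pch is one of the fillable symbols (21 to 25); for the other symbols, col is the argument that controls the colour.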
Viewing the Iris Dataset in R
Iris is one of the many built-in datasets in R, and it is stored as a data frame. Therefore, all we need to do to print the dataset is type iris in the console, or in a .R file, and run it. To print only the first few observations (six by default), we can type head(iris) instead.
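Beyond printing the rows, a few base R functions give a quick overview of the data frame. This is just a sketch of how one might inspect iris before plotting it; none of these calls is required for the plots that follow.

```r
# Number of rows (observations) and columns (variables): 150 and 5
dim(iris)

# Column names and the type of each column
str(iris)

# First three rows instead of the default six
head(iris, n = 3)

# Number of observations per species: 50 each
table(iris$Species)
```

The first four columns are the numeric measurements and the fifth, Species, is a factor. That layout is what the column indexing in the next section relies on.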
Plotting Scatterplots of the Iris Dataset in R
To plot scatterplots showing the relationship between sepal length and width, and between petal length and width, we use the plot() function and jazz it up just a little bit. Instead of passing the x and y arguments, we can select the columns we need by position: iris[1:2] picks the first two columns (sepal length and width) and iris[3:4] picks the next two (petal length and width). We can also add our own legend to match the colours to their species.
# plotting scatterplot between sepal length and width
plot(iris[1:2], pch = 21, main = "Sepal Length and Width",
     bg = c("red", "green", "blue")[unclass(iris$Species)])
legend("topleft", legend = c("setosa", "versicolor", "virginica"),
       cex = .8, col = c("red", "green", "blue"),
       fill = c("red", "green", "blue"))

# plotting scatterplot between petal length and width
plot(iris[3:4], pch = 21, main = "Petal Length and Width",
     bg = c("red", "green", "blue")[unclass(iris$Species)])
legend("topleft", legend = c("setosa", "versicolor", "virginica"),
       cex = .8, col = c("red", "green", "blue"),
       fill = c("red", "green", "blue"))
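The bg expression in these calls is the least obvious part, so here it is broken into steps. The variable names species_colours, codes and point_colours are mine, introduced only for illustration.

```r
species_colours <- c("red", "green", "blue")

# unclass() strips the factor class and leaves the underlying integer codes:
# 1 = setosa, 2 = versicolor, 3 = virginica
codes <- unclass(iris$Species)

# Indexing the colour vector by those codes gives one colour per row
point_colours <- species_colours[codes]

# Cross-tabulate to confirm each species maps to exactly one colour
table(iris$Species, point_colours)
```

Because the codes follow the order of the factor levels, the colours line up with the species names listed in the legend. as.integer(iris$Species) would give the same codes and may read more clearly.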
From the above examples, we can clearly see the difference in code length and simplicity between Python and R.
Scatterplot Matrices in R
The iris dataset has 4 features, and plotting scatterplots to compare all features to each other, two at a time, is a tedious task. R provides a built-in solution: scatterplot matrices. A scatterplot matrix places the feature names along the diagonal and fills the rest of the grid with scatterplots, letting one compare all features to each other at a glance. We use the pairs() function to plot a scatterplot matrix. Its arguments are more or less similar to those of the plot() function. The code below, which can be typed directly in the console, demonstrates the plotting of a scatterplot matrix:
pairs(iris[1:4], main = "Iris Dataset", pch = 21,
      bg = c("red", "green", "blue")[unclass(iris$Species)])
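To see how much work pairs() saves, the sketch below lists every pair of features that would otherwise need its own plot() call. The names features and feature_pairs are mine, used only for this illustration.

```r
# Names of the four numeric feature columns
features <- names(iris)[1:4]

# Every unordered pair of the four features; each column is one pair
feature_pairs <- combn(features, 2)
ncol(feature_pairs)   # 6 distinct pairs

# pairs() draws each of them twice (once above and once below the diagonal)
pairs(iris[1:4], main = "Iris Dataset", pch = 21,
      bg = c("red", "green", "blue")[unclass(iris$Species)])
```

Each pair appears once above the diagonal and once below it with the axes swapped, which gives the twelve panels in the grid.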
To download R, click on the following link and follow the instructions given:
To download RStudio, click on the following link and follow the instructions given: