Today we’ll start by familiarizing ourselves with the RStudio Graphical User Interface (GUI) and basic R commands and operators. We’ll also examine Git version control and R Markdown for making “pretty” documents.
I am a strong believer in open code, open access, open resources, and not reinventing the wheel. That being said, while I attribute sources within this document, I’d like to acknowledge upfront those whose hard work I have shamelessly drawn from or used as reference.
These tutorials are intended to provide you with some background information on topics and outline some tasks to be undertaken in R. These are a supplement to, not a replacement for, the reading.
While I will not collect R scripts or other documents from you demonstrating that you have completed the tutorials, I do expect you to work through the tutorials in R.
Why do I expect this? Because no one learned how to write code or use a piece of software by just reading about it. This is one of those things you must do to learn.
I also expect you to work through the tutorials before the next class period. Even if you are just copying worked-through examples, you are bound to run into problems at some point. It’s better that we identify those problems and fix them before we start adding complexity.
Each tutorial will have several tabs. You should work through the tabs going from left to right.
Let’s start off by opening up RStudio and setting up the Graphical User Interface (GUI). (A written guide to much of this information can be found in The R Book: Chapter 1. Note that some of the information in this source is outdated.)
If you would like to change the display settings go to Tools –> Global Options –> Appearance and choose a new “Editor Theme”. (My personal preference is “cobalt”.)
Now let’s check out the version and citation information for R.
Type the following into the command console window (lower left) to the right of the >
Hit enter after each command.
R.Version()
citation()
Note: In R, parentheses `( )` immediately following a word indicate that we are working with a function. The parentheses hold the arguments that must be supplied in order for the function to run. For example, if we wanted to calculate the mean value of a vector of values we would need to provide, at the very least, that vector of values to the function that calculates the mean. In some cases, such as those run above, no arguments need to be supplied. In these cases, R will refer to some pre-established default. In these tutorials I will use `this font` to denote functions, operators, and code within the text.
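To make the idea of arguments concrete, here is a minimal sketch. The vector name `values` is a made-up example and is not part of the tutorial data.

```r
# A hypothetical vector of values to summarize
values <- c(2, 4, 6, 8)

# mean() needs at least one argument: the vector to average
mean(values)

# Some functions run with no arguments at all and fall back on defaults
Sys.Date()
```

The first call returns 5, while the second simply reports today's date without being told anything.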
If you have played around with R a little you may be wondering what packages are and why there are so many of them.
Simply put, packages are user-developed collections of tools and functions that work in the R environment. These packages are what make R so powerful, but at times so frustrating. As tools developed by R users for R users, many of the packages focus on simplifying the use of R for specific applied science and data analysis tasks. (REALLY HELPFUL!) This means that your average R user, like you and me, can focus on the science instead of worrying about the nitty gritty details of the mathematics and programming behind the methods.
However, this also means that each package has its own unique syntax, function names and argument calls, user support documentation, update timing (and sometimes no updates), and occasionally even data classes. (REALLY FRUSTRATING!) And because there are so many packages it is unrealistic to expect to become an expert in them all. Most R users frequently reference the R help and other online forums when they can’t remember the name of a specific function, or how to structure an expression, etc….
Don’t worry. Once you learn how to: find packages that can help you solve your problems, read the documentation, navigate the forums and search for help you’ll find that there are very few data problems that can’t be tackled in R.
To install a package simply type `install.packages("name of package")` in the command line.
To load the package type `library(package name)` in the command line.
Simply type `update.packages()` in the command line to update all packages.
Try the following:
install.packages("sf") #sf is a fantastic package for handling spatial data that we will use often later in the semester
library(sf)
update.packages()
# P.S. Text following the "#" sign is designated as a comment and is not treated or run like code.
# Note that in the cobalt user interface comments are also a different color and appear in italics.
# Comments are this researcher's best friend as they allow me to understand what I did when I come back to a script after a few days or weeks or ...
# They are also helpful for explaining your logic/process for others that you share your code with.
# Main point: comment often, comment well
NOTE: If you want to run another `install.packages()` command for a different package (e.g. ggplot2) you can simply hit the up arrow to pull up a previous command on the current command line. You can then edit the name of the package and hit enter to run the command as normal. Because of this rather convenient feature you should NOT attempt to use the arrow keys to navigate through the command console output (say you had R return a long list and you want to go back and take a look at it). Instead use the scroll bar.
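Reinstalling a package every time you run a script gets slow, so a common pattern is to install only when the package is missing. The sketch below is one way to do that. The helper name `ensure_package` is hypothetical, and I demonstrate it on "stats" (which ships with R) so that nothing is downloaded; swap in "sf" or another package name as needed.

```r
# Hypothetical helper: install a package only if it is missing, then load it
ensure_package <- function(pkg) {
  if (!requireNamespace(pkg, quietly = TRUE)) {
    install.packages(pkg)
  }
  library(pkg, character.only = TRUE)
}

# "stats" is installed with R, so this call never triggers a download
ensure_package("stats")
```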
Now let’s check out a few ways to access R help. While it is normal to conduct web searches to assist with writing code to solve complex problems, in many cases (e.g. wrong syntax used, can’t remember specific function arguments that need to be referenced) a simple review of the official R Documentation can help you pinpoint and correct the issue.
To access the R Documentation on how to use a function we can use the following:
`?` followed by the function name
`??` followed by the topic, or `help.search("topic")`
Try the following one at a time.
?read.csv
?hist
??csv
??histogram
Notice the small window in the lower right-hand corner of your screen? Here we can see the Help documentation related to these functions/topics.
Remember the function name but can’t remember what the arguments to that function are or how they are structured/ordered? In the command console simply start typing in the function name and hover your mouse over the little pop-up window for the function you want, or put in the parentheses after the function name and check out the little yellow pop-up window before you continue typing any arguments.
I personally find the auto-suggestion, auto-fill, and function argument pop-up window to be the most useful parts of the R Studio interface and use them every time I work in R.
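If you would rather check a function's arguments from the console than from the pop-up window, base R can list them for you. This is a small sketch using `read.csv` as the example; the object name `arg_names` is my own.

```r
# Print the full argument list (with defaults) of read.csv
args(read.csv)

# Or pull out just the argument names as a character vector
arg_names <- names(formals(read.csv))
arg_names
```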
Of course there are numerous online sources of R help as well. Let’s start with the more official sources of online help.
Note: I’m cheating a bit here by saying this is online help. While you will typically access this info online, for packages you have installed most of this information should actually exist in your R library on your computer.
You should now see a page that provides: the name of the package along with a brief description, information on the latest version and when it was published, the authors of the package, the website, a link to the reference manual, and links to a set of vignettes among other things.
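Much of this package information can also be looked up from the console for packages that are already installed. The sketch below uses "stats" only because it comes with every copy of R; the object name `pkg_info` is my own.

```r
# Read the description file of an installed package
pkg_info <- packageDescription("stats")
pkg_info$Title
pkg_info$Version

# Just the version number
packageVersion("stats")

# Open the help index for the package (uncomment to try it in RStudio)
# help(package = "stats")
```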
If you are looking for a package that does a specific thing (e.g. multilevel modeling), an easy way to find one is to search CRAN (using the search menu accessible from the CRAN homepage) by that topic.
There are also numerous google groups dedicated to specific R packages and google searches are generally fruitful provided you specify that you are searching for an answer to an R problem, provide plenty of detail, and try out different search terms.
Thus far we have used only the R command console to do calculations and assign data. However, if we want to be able to go back and reproduce a set of calculations or want to share our code with others we will want a way to save and store and modify the R commands and operations we run. The simplest way to do this is using an R script. An R script is simply a text document (with a .R file ending) in which we write out all the R commands we want to run and descriptive comments about these commands. In R Studio we can then select and run commands from the script.
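To show that a script really is just a text file of commands, here is a small sketch that writes a three-line script to a temporary folder and then runs the whole thing with `source()`. The file name and the `radius` and `area` variables are invented for illustration. In practice you would create the script through the RStudio menus and run lines from the editor.

```r
# Write a tiny script to a temporary location
script_path <- file.path(tempdir(), "my_first_script.R")
writeLines(c(
  "# Area of a circle with radius 2",
  "radius <- 2",
  "area <- pi * radius^2"
), script_path)

# Run every command in the script, top to bottom
source(script_path)

# Objects created by the script are now available
area
```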
Easy peasy, right?
OK, so what is an R Project?
An R Project is essentially a special R folder that holds your R materials related to a specific topic. Technically, it is a working directory (a file path on your computer that maps where information should be coming from and going to) that is designated with a .Rproj file. In R you work out of a single active working directory at a time. So if we want to pull data files from other folders on our computer we have to specify the full path (e.g. “C:/Users/jane/Documents/GEOG 728/mydata.csv”) instead of just writing out the file name “mydata.csv”. I’ll show you how to check and set your working directory next week, but for now simply know that using a project simplifies things a bit by streamlining and automating the working directory references in R.
Your R Projects can hold R Scripts, R Markdown files, data, text files, images, etc… and you can easily organize these items in an R Project by creating sub folders.
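As a small preview of working directories, the sketch below shows how a bare file name relates to a full path. It reuses the example file name "mydata.csv" from above, which probably does not exist on your machine, so `file.exists()` will most likely return FALSE.

```r
# Where is R currently working from?
getwd()

# A bare file name is looked up inside the working directory ...
file.exists("mydata.csv")

# ... which is the same as spelling out the full path yourself
full_path <- file.path(getwd(), "mydata.csv")
full_path
file.exists(full_path)
```

Inside an R Project the working directory is the project folder, which is why short file names are usually enough.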
Now the final, and perhaps most terrifying, R file type. R Markdown! * Much of this section comes from Jonathan Gilligan’s great R Markdown Intro which can be found here.
R Markdown is a fantastic tool that allows us to combine text and chunks of R code and their output into a single document. These documents are stored as .Rmd files and use a plain text formatting system called Markdown to create pretty documents that can be converted to HTML documents or web pages, PDFs, or even MS Word documents. The big advantage here is that we can produce not just cool looking documents, but we can also integrate instructions or background information with our actual analysis and results, making it easier for others to understand what we did, how we did it, and what we found.
Why did I say it can be terrifying? Am I just being overly dramatic?
Well, yes, but also because R Markdown files can be somewhat intimidating when you first start working with them as you have to handle both Markdown text formatting and R code as well as the various organizational and formatting rules for integrating the two. But don’t let that worry you too much, we’ll start simple and there are plenty of resources out there for those of you who want to get real fancy.
As proof that it’s not so terrifying after all, this tutorial was made using R Markdown. Later on I’ll show you where you can download the .Rmd version to use as a template for your own work if you so desire.
At the top of your new .Rmd you should see something like the following.
---
title: "My First Markdown"
author: "Kate Nelson"
date: "1/9/2020"
output: html_document
---
This is the document header information. All the information here, except for the output setting will be printed at the top of the final document you create from this file. To turn an RMarkdown document into an HTML, PDF, or Microsoft Word document, you just click on the “Knit” button in RStudio. If you click on the word “Knit” on the button, RStudio will turn the RMarkdown document into the default format specified in the header information under output:. To knit the document into a different output format, click on the arrow just to the right of the word “Knit,” and select the output format you want.
Just below the header information you will see an R code chunk called R setup. Code chunks can be distinguished by the ``` at the beginning and end of the chunk. Code chunks also have a slightly different background color than the surrounding text. This code chunk holds information about how we want to run and process R code in the .Rmd. So we could load package libraries here, etc…
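For reference, the setup chunk in RStudio's default template typically contains a single line that sets chunk options for the whole document. The sketch below shows that line. The `requireNamespace()` guard around it is my addition so the code also runs on a machine where knitr is not installed.

```r
# Show the R code of every chunk in the knitted document by default
if (requireNamespace("knitr", quietly = TRUE)) {
  knitr::opts_chunk$set(echo = TRUE)
}
```

Options set here act as document-wide defaults; an individual chunk can still override them in its own header.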
Any line of text that begins with one or more hash symbols (“#”) and is preceded by a blank line is treated as a section header. (In code chunks the “#” character still denotes a comment.) Top-level section headers have a single hash, and subsections, subsubsections, etc. use two, three, etc. hashes.
# This is a top-level section header
## This is a subsection
### This is a subsubsection
# This is another section
Any block of one or more lines of text, with a blank line before and a blank line after is treated as a single paragraph. To separate paragraphs, put a blank line between them:
This is one paragraph.
It stretches over several consecutive lines,
but it will be formatted
as a single paragraph.
This is another paragraph.
The blank line between the two
blocks of text tells Markdown
that they are separate paragraphs
To make italic text and boldface text you surround the text with underscores or asterisks. A single underscore or asterisk means italic, two means boldface, and three means both italic and boldface:
This is _italic text_. This is *also italic text*. __This is boldface__ and
**so is this**. ***This is bold italic***. This is ~~strikethrough~~, perhaps
to indicate an error.
You can make bulleted or numbered lists easily in Markdown. Simply begin a line with an asterisk, hyphen, or plus sign. Numbered lists can be made by using a number followed by a period. To make a sub-list, just indent the lines of the sub-list by four spaces.
* This is a list
* This is the second item of the list.
    * This is a sub-list

    A list item can have several paragraphs. Just indent the continuation
    by four additional spaces and do not begin it with an asterisk.
    If you have multiple lines with no blank line separating them,
    Markdown treats them as a single paragraph.
* This is the main list again.
  Just as with other things, you can break a single list item into several lines,
  and as long as there is no blank line between them, Markdown knows to treat
  them as a single paragraph.
As I said before, one of the best reasons to use R Markdown documents is because we can integrate text with R computation. To enter R expressions in an RMarkdown document, we use “code blocks” or “inline code”. Code blocks are useful if we are doing a calculation, and inline code is useful if you just want to insert a number (maybe one you have calculated in a code block) into the middle of a line of text.
Code blocks begin and end with three consecutive “back-tick” characters:
```{r code_block_name, options}
# code goes here, for example
1+1
```
Each code chunk can be given a name (names are optional, but any name you use must be unique within the document), and we can specify options that will show or hide the code, show or hide the results, evaluate the code or not, etc…
We can assign numbers to variables and do computations in a code chunk much as we would do in an R Script. If we ask the R code to return an object it will be printed immediately below the code chunk as seen below.
sigma <- 5.67E-8
I_solar <- 1350 # watts per square meter
albedo <- 0.3
I_absorbed <- I_solar * (1 - albedo)
T <- (I_absorbed / (4 * sigma))^0.25
T
## [1] 254.0664
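One caution about the chunk above: `T` is also R's built-in shorthand for `TRUE`, so assigning a number to it overwrites that shorthand for the rest of the session. The sketch below repeats the same calculation as a small function and stores the result under a different name. The function name `equilibrium_temperature` and the object `T_eq` are hypothetical choices of mine.

```r
# Hypothetical helper wrapping the same calculation as the chunk above
equilibrium_temperature <- function(I_solar, albedo, sigma = 5.67e-8) {
  I_absorbed <- I_solar * (1 - albedo)  # absorbed sunlight, watts per square meter
  (I_absorbed / (4 * sigma))^0.25       # temperature in kelvin
}

T_eq <- equilibrium_temperature(I_solar = 1350, albedo = 0.3)
T_eq
```

Wrapping the arithmetic in a function also makes it easy to try other albedo values without retyping the formula.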
Code blocks can also be used to include tables and graphs in our document.
Inline code appears between single back-ticks: `r 1+1`. Just as a code chunk starts by specifying that the R language should be applied by writing the character “r” in the chunk header, inline code starts with “r” followed by the expression that R should run.
We can use inline code to print the value of an R expression in the middle of a line of text. For example, using inline code T = `r T` will give T = 254.0663741. Or we could write T = `r (I_absorbed / (4 * sigma))^0.25` to return 254.0663741.
Inline code is very useful because it lets us ask R to automatically insert a number into the text. This means that every time we knit the document, that number is generated by R. If you write a report and then realize that there was a problem with the data or the analysis, you can just fix the problem and re-knit the report using the corrected data and analysis code. You don’t need to manually go through a separate document and edit the numbers to update them with the latest results from your analysis. RMarkdown will do that for you, if you used inline code to insert the numbers into your text.
Thus, a bit of extra work at the beginning to set up the document and analysis using RMarkdown saves lots of time later on by making it trivial to update the document.
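Inline results often carry more digits than you want in a sentence. A small sketch of trimming them, using the temperature value from above stored under the made-up name `T_value`:

```r
T_value <- 254.0663741

# Round to one decimal place
round(T_value, 1)

# Or build a text version that always shows one decimal place
format(round(T_value, 1), nsmall = 1)
```

Inside the text of an .Rmd you could then write something like `r round(T, 1)` to print the rounded number.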
There is MUCH MUCH more that can be done with Markdown formatting including adding images and hyperlinks or even tabs (like in this tutorial) that you can explore on your own using the following references:
You can also pull up the Markdown Quick Reference guide in the help window of the R Studio GUI. Go to Help –> Markdown Quick Reference
For now let’s wrap up by trying to actually convert our markdown file into an HTML file.
An HTML version of your .Rmd should have popped up in a second window. How does it look?
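Knitting can also be started from the console. As far as I know the Knit button calls `rmarkdown::render()` behind the scenes. The sketch below writes a tiny .Rmd to a temporary folder and renders it only if the rmarkdown package and pandoc are available. The file name and the document contents are made up for illustration.

```r
# Write a minimal R Markdown file to a temporary location
rmd_path <- file.path(tempdir(), "my_first_markdown.Rmd")
writeLines(c(
  "---",
  "title: \"My First Markdown\"",
  "output: html_document",
  "---",
  "",
  "One plus one is `r 1 + 1`."
), rmd_path)

# Render to HTML only when the required tools are installed
if (requireNamespace("rmarkdown", quietly = TRUE) &&
    rmarkdown::pandoc_available()) {
  html_path <- rmarkdown::render(rmd_path, quiet = TRUE)
  html_path
}
```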
Full Disclosure *Much of this section comes from Jonathan Gilligan’s introduction to Reproducible Research which can be found here.
As you edit both your text and the R scripts you use for your analysis, it is valuable to be able to keep track of changes. For instance, if your analysis is working well, and then you edit something and it stops working it is useful to be able to go back and look at what changed between the time when it was working and when it stopped working.
Version (or revision) control systems allow you to easily keep track of changes you make to your files, and it is increasingly becoming a standard practice to use version control in data analytics and computationally intensive research.
We will be using a revision control tool called `git`. Not only can we use Git version control in R via our R Projects, but there is a web site called github.com, which allows people to store their version controlled projects in the cloud and serves as a platform for sharing and collaborating on projects. I personally use GitHub to release scripts and models developed in my research, and there is an educational site connected to GitHub, which we will use for some class assignments later in the semester.
For now though, we will start by using local version control through R.
You should now have in the upper-right window a tab that says Git. This window will show you a list of all the files in your R Project that have been modified. If we click on the Commit button we should be able to see what has been changed in each document as well as a history of these changes. Right now you won’t see anything because you haven’t changed any files in your R Project since we enabled version control.
If you examine your markdown file in the commit window you should now see some text in the preview that is highlighted in green. The git version control is telling us that the only change made was to add this text. (Deletions are shown in red.) If we think that this is a good and significant change to keep we should type a commit message like “added some random text” and click the “Commit” button. This will ask git to remember what the document looked like at this exact moment in our revisions. Say I change a code chunk and somehow break everything. Instead of trying to remember exactly what it looked like before I messed it up I could just go back to the History section in the git commit window and restore that earlier version of my document. Of course, this means that I actually need to commit my changes frequently (not just save the document).
Whew, that was a lot of information. Finally, we’re ready to get cooking with some data. But wait! First we need to get some data into R.
x <- 6
y <- 8
n <- 1:10
let <- LETTERS[1:10]
z <- c(1, 1, 2, 3, 5, NA, 13)
let_df <- data.frame(n, let)
# I'm pulling heavily from http://www.cookbook-r.com/Basics/Information_about_variables/ for this
So what did we just do? We just created new numerical data objects called “x” and “y” with values of 6 and 8, a new vector data object called “n” with integer values 1 through 10, a new vector called “let” that includes the first 10 letters of the alphabet, a numeric vector called “z” that includes some numbers and an NA (missing data) value, and a data frame that combines the two vectors “n” and “let”.
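A quick way to check what kind of object each of these is: ask R for its class. The sketch below recreates the objects so it can be run on its own.

```r
x <- 6
n <- 1:10
let <- LETTERS[1:10]
z <- c(1, 1, 2, 3, 5, NA, 13)
let_df <- data.frame(n, let)

class(x)       # "numeric"
class(n)       # "integer"
class(let)     # "character"
class(z)       # "numeric"
class(let_df)  # "data.frame"

# c() builds a vector, not a list
is.list(z)     # FALSE

# str() gives a compact overview of a data frame's columns
str(let_df)
```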
Some Important Notes
`<-` is an operator that assigns a value, vector, data frame, etc… to the thing that is pointed to.
`:` is an operator that identifies a sequence. It can be used to create a new sequence or to reference a sequence of values or columns, etc… within an existing dataset.
`LETTERS` is a set of constants built into base R, namely the 26 letters of the alphabet in upper-case. Check `?Constants` for other constants available.
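Here is a short sketch of the sequence operator used for indexing, along with a few of the other built-in constants.

```r
n <- 1:10

# Use a sequence to pull out the 2nd through 4th values
n[2:4]          # 2 3 4

# The same idea works on the built-in constants
LETTERS[24:26]  # "X" "Y" "Z"
letters[1:3]    # "a" "b" "c"
month.abb[1:3]  # "Jan" "Feb" "Mar"
pi
```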
Let’s check to make sure these variables really are what we think they are by asking R to return their values.
x
## [1] 6
y
## [1] 8
n
## [1] 1 2 3 4 5 6 7 8 9 10
let
## [1] "A" "B" "C" "D" "E" "F" "G" "H" "I" "J"
z
## [1] 1 1 2 3 5 NA 13
let_df
## n let
## 1 1 A
## 2 2 B
## 3 3 C
## 4 4 D
## 5 5 E
## 6 6 F
## 7 7 G
## 8 8 H
## 9 9 I
## 10 10 J
So far so good. Now let’s perform some basic operations on these variables.
x + y
## [1] 14
y > 2
## [1] TRUE
6 + 8
## [1] 14
log(6)
## [1] 1.791759
n*2
## [1] 2 4 6 8 10 12 14 16 18 20
max(let)
## [1] "J"
sum(z)
## [1] NA
sum(z, na.rm = TRUE)
## [1] 25
Great! Notice that missing values (NA) will throw off most calculations (and most real datasets have missing values) so we need to specify what to do with them in the function if we want a value returned. Often we choose to remove or ignore missing values using `na.rm=TRUE`, but this is not always the appropriate choice.
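Before dropping missing values it helps to know how many there are. The sketch below recreates `z` and counts its NA values. It also shows that `NA` (a placeholder for a missing value) is not the same thing as `NULL` (nothing at all).

```r
z <- c(1, 1, 2, 3, 5, NA, 13)

# Which entries are missing, and how many?
is.na(z)
sum(is.na(z))          # 1

# The mean behaves like the sum: NA unless we remove missing values
mean(z)                # NA
mean(z, na.rm = TRUE)  # 4.166667

# NA still occupies a slot; NULL does not
length(NA)             # 1
length(NULL)           # 0
```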
Now on to bigger and better things… let’s try one of the most basic and important tasks in R: importing data. Yay!
R is wonderfully flexible with the types of data you can import and work with. Got a csv or text file? Base R can handle that. What about an Excel file or shapefile?
To read a csv file we can use the `read.csv` function. Download the file iris.csv from Canvas into your R project folder then try loading it into R. Remember that you can use `?read.csv` to get help on how to use this function.
You should now see a table of information from the iris dataset printed out on your console. In this case R has read the data and shown us what it found, but because we have not assigned it to a data object it was not retained in the R working environment. Let’s try again, this time assigning the dataset to a data object using the `<-` operator.
d <- read.csv("iris.csv", stringsAsFactors = FALSE)
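If you cannot get to the Canvas file, one workaround is to write R's built-in `iris` data to a csv yourself and read it back. This sketch assumes the Canvas copy has the same contents as the built-in dataset, which I have not verified. The path `csv_path` and the object `d_factor` are my own names. It also shows what the `stringsAsFactors` argument changes.

```r
# Write the built-in iris data to a temporary csv file
csv_path <- file.path(tempdir(), "iris.csv")
write.csv(iris, csv_path, row.names = FALSE)

# Read it back, keeping text columns as plain character strings
d <- read.csv(csv_path, stringsAsFactors = FALSE)
class(d$Species)         # "character"

# With stringsAsFactors = TRUE the text column becomes a factor instead
d_factor <- read.csv(csv_path, stringsAsFactors = TRUE)
class(d_factor$Species)  # "factor"
```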
OK. Now let’s see what we can find out about this dataset. Let’s take a quick look at the data by simply asking R to return the dataset.
d
## Sepal.Length Sepal.Width Petal.Length Petal.Width Species
## 1 5.1 3.5 1.4 0.2 setosa
## 2 4.9 3.0 1.4 0.2 setosa
## 3 4.7 3.2 1.3 0.2 setosa
## 4 4.6 3.1 1.5 0.2 setosa
## 5 5.0 3.6 1.4 0.2 setosa
## 6 5.4 3.9 1.7 0.4 setosa
## 7 4.6 3.4 1.4 0.3 setosa
## 8 5.0 3.4 1.5 0.2 setosa
## 9 4.4 2.9 1.4 0.2 setosa
## 10 4.9 3.1 1.5 0.1 setosa
## 11 5.4 3.7 1.5 0.2 setosa
## 12 4.8 3.4 1.6 0.2 setosa
## 13 4.8 3.0 1.4 0.1 setosa
## 14 4.3 3.0 1.1 0.1 setosa
## 15 5.8 4.0 1.2 0.2 setosa
## 16 5.7 4.4 1.5 0.4 setosa
## 17 5.4 3.9 1.3 0.4 setosa
## 18 5.1 3.5 1.4 0.3 setosa
## 19 5.7 3.8 1.7 0.3 setosa
## 20 5.1 3.8 1.5 0.3 setosa
## 21 5.4 3.4 1.7 0.2 setosa
## 22 5.1 3.7 1.5 0.4 setosa
## 23 4.6 3.6 1.0 0.2 setosa
## 24 5.1 3.3 1.7 0.5 setosa
## 25 4.8 3.4 1.9 0.2 setosa
## 26 5.0 3.0 1.6 0.2 setosa
## 27 5.0 3.4 1.6 0.4 setosa
## 28 5.2 3.5 1.5 0.2 setosa
## 29 5.2 3.4 1.4 0.2 setosa
## 30 4.7 3.2 1.6 0.2 setosa
## 31 4.8 3.1 1.6 0.2 setosa
## 32 5.4 3.4 1.5 0.4 setosa
## 33 5.2 4.1 1.5 0.1 setosa
## 34 5.5 4.2 1.4 0.2 setosa
## 35 4.9 3.1 1.5 0.2 setosa
## 36 5.0 3.2 1.2 0.2 setosa
## 37 5.5 3.5 1.3 0.2 setosa
## 38 4.9 3.6 1.4 0.1 setosa
## 39 4.4 3.0 1.3 0.2 setosa
## 40 5.1 3.4 1.5 0.2 setosa
## 41 5.0 3.5 1.3 0.3 setosa
## 42 4.5 2.3 1.3 0.3 setosa
## 43 4.4 3.2 1.3 0.2 setosa
## 44 5.0 3.5 1.6 0.6 setosa
## 45 5.1 3.8 1.9 0.4 setosa
## 46 4.8 3.0 1.4 0.3 setosa
## 47 5.1 3.8 1.6 0.2 setosa
## 48 4.6 3.2 1.4 0.2 setosa
## 49 5.3 3.7 1.5 0.2 setosa
## 50 5.0 3.3 1.4 0.2 setosa
## 51 7.0 3.2 4.7 1.4 versicolor
## 52 6.4 3.2 4.5 1.5 versicolor
## 53 6.9 3.1 4.9 1.5 versicolor
## 54 5.5 2.3 4.0 1.3 versicolor
## 55 6.5 2.8 4.6 1.5 versicolor
## 56 5.7 2.8 4.5 1.3 versicolor
## 57 6.3 3.3 4.7 1.6 versicolor
## 58 4.9 2.4 3.3 1.0 versicolor
## 59 6.6 2.9 4.6 1.3 versicolor
## 60 5.2 2.7 3.9 1.4 versicolor
## 61 5.0 2.0 3.5 1.0 versicolor
## 62 5.9 3.0 4.2 1.5 versicolor
## 63 6.0 2.2 4.0 1.0 versicolor
## 64 6.1 2.9 4.7 1.4 versicolor
## 65 5.6 2.9 3.6 1.3 versicolor
## 66 6.7 3.1 4.4 1.4 versicolor
## 67 5.6 3.0 4.5 1.5 versicolor
## 68 5.8 2.7 4.1 1.0 versicolor
## 69 6.2 2.2 4.5 1.5 versicolor
## 70 5.6 2.5 3.9 1.1 versicolor
## 71 5.9 3.2 4.8 1.8 versicolor
## 72 6.1 2.8 4.0 1.3 versicolor
## 73 6.3 2.5 4.9 1.5 versicolor
## 74 6.1 2.8 4.7 1.2 versicolor
## 75 6.4 2.9 4.3 1.3 versicolor
## 76 6.6 3.0 4.4 1.4 versicolor
## 77 6.8 2.8 4.8 1.4 versicolor
## 78 6.7 3.0 5.0 1.7 versicolor
## 79 6.0 2.9 4.5 1.5 versicolor
## 80 5.7 2.6 3.5 1.0 versicolor
## 81 5.5 2.4 3.8 1.1 versicolor
## 82 5.5 2.4 3.7 1.0 versicolor
## 83 5.8 2.7 3.9 1.2 versicolor
## 84 6.0 2.7 5.1 1.6 versicolor
## 85 5.4 3.0 4.5 1.5 versicolor
## 86 6.0 3.4 4.5 1.6 versicolor
## 87 6.7 3.1 4.7 1.5 versicolor
## 88 6.3 2.3 4.4 1.3 versicolor
## 89 5.6 3.0 4.1 1.3 versicolor
## 90 5.5 2.5 4.0 1.3 versicolor
## 91 5.5 2.6 4.4 1.2 versicolor
## 92 6.1 3.0 4.6 1.4 versicolor
## 93 5.8 2.6 4.0 1.2 versicolor
## 94 5.0 2.3 3.3 1.0 versicolor
## 95 5.6 2.7 4.2 1.3 versicolor
## 96 5.7 3.0 4.2 1.2 versicolor
## 97 5.7 2.9 4.2 1.3 versicolor
## 98 6.2 2.9 4.3 1.3 versicolor
## 99 5.1 2.5 3.0 1.1 versicolor
## 100 5.7 2.8 4.1 1.3 versicolor
## 101 6.3 3.3 6.0 2.5 virginica
## 102 5.8 2.7 5.1 1.9 virginica
## 103 7.1 3.0 5.9 2.1 virginica
## 104 6.3 2.9 5.6 1.8 virginica
## 105 6.5 3.0 5.8 2.2 virginica
## 106 7.6 3.0 6.6 2.1 virginica
## 107 4.9 2.5 4.5 1.7 virginica
## 108 7.3 2.9 6.3 1.8 virginica
## 109 6.7 2.5 5.8 1.8 virginica
## 110 7.2 3.6 6.1 2.5 virginica
## 111 6.5 3.2 5.1 2.0 virginica
## 112 6.4 2.7 5.3 1.9 virginica
## 113 6.8 3.0 5.5 2.1 virginica
## 114 5.7 2.5 5.0 2.0 virginica
## 115 5.8 2.8 5.1 2.4 virginica
## 116 6.4 3.2 5.3 2.3 virginica
## 117 6.5 3.0 5.5 1.8 virginica
## 118 7.7 3.8 6.7 2.2 virginica
## 119 7.7 2.6 6.9 2.3 virginica
## 120 6.0 2.2 5.0 1.5 virginica
## 121 6.9 3.2 5.7 2.3 virginica
## 122 5.6 2.8 4.9 2.0 virginica
## 123 7.7 2.8 6.7 2.0 virginica
## 124 6.3 2.7 4.9 1.8 virginica
## 125 6.7 3.3 5.7 2.1 virginica
## 126 7.2 3.2 6.0 1.8 virginica
## 127 6.2 2.8 4.8 1.8 virginica
## 128 6.1 3.0 4.9 1.8 virginica
## 129 6.4 2.8 5.6 2.1 virginica
## 130 7.2 3.0 5.8 1.6 virginica
## 131 7.4 2.8 6.1 1.9 virginica
## 132 7.9 3.8 6.4 2.0 virginica
## 133 6.4 2.8 5.6 2.2 virginica
## 134 6.3 2.8 5.1 1.5 virginica
## 135 6.1 2.6 5.6 1.4 virginica
## 136 7.7 3.0 6.1 2.3 virginica
## 137 6.3 3.4 5.6 2.4 virginica
## 138 6.4 3.1 5.5 1.8 virginica
## 139 6.0 3.0 4.8 1.8 virginica
## 140 6.9 3.1 5.4 2.1 virginica
## 141 6.7 3.1 5.6 2.4 virginica
## 142 6.9 3.1 5.1 2.3 virginica
## 143 5.8 2.7 5.1 1.9 virginica
## 144 6.8 3.2 5.9 2.3 virginica
## 145 6.7 3.3 5.7 2.5 virginica
## 146 6.7 3.0 5.2 2.3 virginica
## 147 6.3 2.5 5.0 1.9 virginica
## 148 6.5 3.0 5.2 2.0 virginica
## 149 6.2 3.4 5.4 2.3 virginica
## 150 5.9 3.0 5.1 1.8 virginica
Wow! That pretty much filled up your console with a bunch of numbers. Let’s try using the `head()` function to get just a summary of the first few records for the dataset.
head(d)
## Sepal.Length Sepal.Width Petal.Length Petal.Width Species
## 1 5.1 3.5 1.4 0.2 setosa
## 2 4.9 3.0 1.4 0.2 setosa
## 3 4.7 3.2 1.3 0.2 setosa
## 4 4.6 3.1 1.5 0.2 setosa
## 5 5.0 3.6 1.4 0.2 setosa
## 6 5.4 3.9 1.7 0.4 setosa
head(d, 10)
## Sepal.Length Sepal.Width Petal.Length Petal.Width Species
## 1 5.1 3.5 1.4 0.2 setosa
## 2 4.9 3.0 1.4 0.2 setosa
## 3 4.7 3.2 1.3 0.2 setosa
## 4 4.6 3.1 1.5 0.2 setosa
## 5 5.0 3.6 1.4 0.2 setosa
## 6 5.4 3.9 1.7 0.4 setosa
## 7 4.6 3.4 1.4 0.3 setosa
## 8 5.0 3.4 1.5 0.2 setosa
## 9 4.4 2.9 1.4 0.2 setosa
## 10 4.9 3.1 1.5 0.1 setosa
What does the `10` in the function `head()` mean?
What if I want to know how many rows or columns (or both) are in a dataset?
Try the `nrow()`, `ncol()`, and `dim()` functions.
nrow(d)
## [1] 150
ncol(d)
## [1] 5
dim(d)
## [1] 150 5
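A few other base R functions are handy for a first look at a dataset. In this sketch I use the built-in `iris` data as a stand-in for `d` so the code runs without the csv file. Note that the built-in version stores Species as a factor rather than as text.

```r
d <- iris  # stand-in for the data read from iris.csv

# The last few records instead of the first few
tail(d, 3)

# Column names, types, and a peek at the values
str(d)

# Basic summary statistics for one column
summary(d$Sepal.Length)
```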
By the way, the `iris` dataset is a pretty famous dataset in R land. It’s actually one of the datasets provided with base R via the package “datasets” to allow for simple illustrations of R functions and is widely used for demonstrations and worked examples for many of the more widely used R packages such as dplyr and ggplot2. Want to know how to access these mysterious, freely provided datasets?
- Just type `data()`
You should now have a second tab open in your main RStudio window called “R data sets”. You can pull any of these datasets into your working environment in R using the command `data(dataset name)`.
- Try assigning one of these datasets to a new data object called ds using the name of the dataset provided in the “R data sets” tab.
- Then take a look at the data, get a summary of the first 5 records, and gather information on its dimensions, number of rows and number of columns.
For example, for the `mtcars` dataset I get:
## mpg cyl disp hp drat wt qsec vs am gear carb
## Mazda RX4 21.0 6 160.0 110 3.90 2.620 16.46 0 1 4 4
## Mazda RX4 Wag 21.0 6 160.0 110 3.90 2.875 17.02 0 1 4 4
## Datsun 710 22.8 4 108.0 93 3.85 2.320 18.61 1 1 4 1
## Hornet 4 Drive 21.4 6 258.0 110 3.08 3.215 19.44 1 0 3 1
## Hornet Sportabout 18.7 8 360.0 175 3.15 3.440 17.02 0 0 3 2
## Valiant 18.1 6 225.0 105 2.76 3.460 20.22 1 0 3 1
## Duster 360 14.3 8 360.0 245 3.21 3.570 15.84 0 0 3 4
## Merc 240D 24.4 4 146.7 62 3.69 3.190 20.00 1 0 4 2
## Merc 230 22.8 4 140.8 95 3.92 3.150 22.90 1 0 4 2
## Merc 280 19.2 6 167.6 123 3.92 3.440 18.30 1 0 4 4
## Merc 280C 17.8 6 167.6 123 3.92 3.440 18.90 1 0 4 4
## Merc 450SE 16.4 8 275.8 180 3.07 4.070 17.40 0 0 3 3
## Merc 450SL 17.3 8 275.8 180 3.07 3.730 17.60 0 0 3 3
## Merc 450SLC 15.2 8 275.8 180 3.07 3.780 18.00 0 0 3 3
## Cadillac Fleetwood 10.4 8 472.0 205 2.93 5.250 17.98 0 0 3 4
## Lincoln Continental 10.4 8 460.0 215 3.00 5.424 17.82 0 0 3 4
## Chrysler Imperial 14.7 8 440.0 230 3.23 5.345 17.42 0 0 3 4
## Fiat 128 32.4 4 78.7 66 4.08 2.200 19.47 1 1 4 1
## Honda Civic 30.4 4 75.7 52 4.93 1.615 18.52 1 1 4 2
## Toyota Corolla 33.9 4 71.1 65 4.22 1.835 19.90 1 1 4 1
## Toyota Corona 21.5 4 120.1 97 3.70 2.465 20.01 1 0 3 1
## Dodge Challenger 15.5 8 318.0 150 2.76 3.520 16.87 0 0 3 2
## AMC Javelin 15.2 8 304.0 150 3.15 3.435 17.30 0 0 3 2
## Camaro Z28 13.3 8 350.0 245 3.73 3.840 15.41 0 0 3 4
## Pontiac Firebird 19.2 8 400.0 175 3.08 3.845 17.05 0 0 3 2
## Fiat X1-9 27.3 4 79.0 66 4.08 1.935 18.90 1 1 4 1
## Porsche 914-2 26.0 4 120.3 91 4.43 2.140 16.70 0 1 5 2
## Lotus Europa 30.4 4 95.1 113 3.77 1.513 16.90 1 1 5 2
## Ford Pantera L 15.8 8 351.0 264 4.22 3.170 14.50 0 1 5 4
## Ferrari Dino 19.7 6 145.0 175 3.62 2.770 15.50 0 1 5 6
## Maserati Bora 15.0 8 301.0 335 3.54 3.570 14.60 0 1 5 8
## Volvo 142E 21.4 4 121.0 109 4.11 2.780 18.60 1 1 4 2
## mpg cyl disp hp drat wt qsec vs am gear carb
## Mazda RX4 21.0 6 160 110 3.90 2.620 16.46 0 1 4 4
## Mazda RX4 Wag 21.0 6 160 110 3.90 2.875 17.02 0 1 4 4
## Datsun 710 22.8 4 108 93 3.85 2.320 18.61 1 1 4 1
## Hornet 4 Drive 21.4 6 258 110 3.08 3.215 19.44 1 0 3 1
## Hornet Sportabout 18.7 8 360 175 3.15 3.440 17.02 0 0 3 2
## [1] 32 11
## [1] 32
## [1] 11
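One way to produce output like this is sketched below. These are not necessarily the exact commands used to generate the output above.

```r
# Load the built-in dataset and assign it to a new object
data(mtcars)
ds <- mtcars

# Look at the whole dataset, then just the first 5 records
ds
head(ds, 5)

# Dimensions, number of rows, number of columns
dim(ds)
nrow(ds)
ncol(ds)
```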