Abstract
Part of the R for Artists and Designers course at the School of Foundation Studies, Srishti Manipal Institute of Art, Design, and Technology, Bangalore.
At the end of this Lab, we will:
This guide will lead you through the steps to install and use R, a free and open-source software environment for statistical computing and graphics.
What is R?
What is RStudio?
Our end goal is to get you looking at a screen like this:
Install R from CRAN, the Comprehensive R Archive Network. Please choose a precompiled binary distribution for your operating system.
Launch R. You should see one console with a command line interpreter (>). Close R.
Install the free, open-source edition of RStudio: http://www.rstudio.com/products/rstudio/download/
RStudio provides a powerful user interface for R, called an integrated development environment. RStudio includes:
Launch RStudio. You should get a window similar to the screenshot you see here, but yours will be empty. Look at the bottom left pane: this is the same console window you saw when you opened R in step 1.15.
Place your cursor in the console at the > prompt and type x <- 2 + 2, hit enter or return, then type x, and hit enter/return again. If [1] 4 prints to the screen, you have successfully installed R and RStudio, and you can move on to installing packages.
The version of R that you just downloaded is considered base R, which provides you with good but basic statistical computing and graphics powers. For analytical and graphical super-powers, you’ll need to install add-on packages, which are user-written, to extend/expand your R capabilities. Packages can live in one of two places:
Packages on CRAN can be installed by typing install.packages("name_of_package", dependencies = TRUE) in your CONSOLE.
Place your cursor in the CONSOLE again (where you last typed x and [1] 4 printed on the screen). You can use the first method to install the following packages directly from CRAN, all of which we will use:
To install a package, you put the name of the package in quotes, as in install.packages("name_of_package"). Mind your use of quotes carefully with packages.
To use an already installed package, you must load it first, as in library(name_of_package), leaving the name of the package bare. You only need to do this once per RStudio session.
You can download all of these at once, too:
install.packages(c("knitr", "dplyr", "ggplot2", "babynames"), dependencies = TRUE)
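If some of these packages are already on your machine, you may prefer to install only the ones that are missing. The following is a minimal sketch, not part of the original lab; the names wanted, already_installed, and to_install are made up for illustration, and the interactive() check is an assumption so that nothing is downloaded when the code runs outside the console.

```r
# Packages this lab asks for
wanted <- c("knitr", "dplyr", "ggplot2", "babynames")

# installed.packages() returns a matrix with one row per installed package;
# its row names are the package names
already_installed <- rownames(installed.packages())

# setdiff() keeps the names in `wanted` that are not yet installed
to_install <- setdiff(wanted, already_installed)
to_install

# Install only what is missing, and only when typing at the console
if (interactive() && length(to_install) > 0) {
  install.packages(to_install, dependencies = TRUE)
}
```

If to_install prints character(0), every package in the list is already installed.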
A brief aside: c() is a function in R that combines things into a vector (one of the ways data is represented in R).
c("hello", "my", "name", "is", "arvind")
## [1] "hello" "my" "name" "is" "arvind"
c(1:3, 20, 50)
## [1] 1 2 3 20 50
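A few more things are worth trying with vectors in the console. This is a small sketch in base R; the object names numbers, mixed, and words are only for illustration.

```r
# c() combines its arguments into a single vector
numbers <- c(1:3, 20, 50)
length(numbers)   # how many elements the vector holds
class(numbers)    # the type shared by all elements

# Mixing numbers and text: every element is converted to text
mixed <- c(1, "two", 3)
class(mixed)

# Square brackets pull out elements by position (R counts from 1)
words <- c("hello", "my", "name", "is", "arvind")
words[1]
words[c(1, 5)]
```

Note that a vector holds one type of data only, which is why the numbers in mixed become text.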
To get help with a package, use help(name_of_package) or ?name_of_package.
To cite a package, use citation("name_of_package").
install.packages("dplyr", dependencies = TRUE)
library(dplyr)
help("dplyr")
citation("ggplot2")
##
## To cite ggplot2 in publications, please use:
##
## H. Wickham. ggplot2: Elegant Graphics for Data Analysis. Springer-Verlag New York, 2016.
##
## A BibTeX entry for LaTeX users is
##
## @Book{,
## author = {Hadley Wickham},
## title = {ggplot2: Elegant Graphics for Data Analysis},
## publisher = {Springer-Verlag New York},
## year = {2016},
## isbn = {978-3-319-24277-4},
## url = {https://ggplot2.tidyverse.org},
## }
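You can also cite R itself. As a minimal sketch using only functions that ship with R (the object name r_citation is just illustrative): calling citation() with no package name returns the reference for R, and toBibtex() formats a citation object as a BibTeX entry.

```r
# With no package name, citation() returns the reference for R itself
r_citation <- citation()
print(r_citation)

# toBibtex() (from the utils package that ships with R) formats a citation as BibTeX
toBibtex(r_citation)
```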
The webpage you are looking at is derived from an R Markdown document that you can download, edit, and compute with. We will meet R Markdown in the next class.
Download this .Rmd file using the Code->Download Rmd button at the top right corner.
Change the author name to your own!
Hit the green “play” button to run this “load_packages” chunk to include in your R session all the installed packages you need:
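The chunk itself is not reproduced on this page. A plausible sketch of it, assuming it loads the three packages that the rest of this lab relies on, might look like this:

```r
# Assumed contents of the "load_packages" chunk (not shown on the page).
# These packages must already be installed with install.packages().
library(dplyr)      # glimpse(), filter(), and the %>% pipe
library(ggplot2)    # ggplot(), aes(), geom_line()
library(babynames)  # the babynames dataset
```

If R reports that there is no package with one of these names, install that package first and run the chunk again.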
Let us greet our data first!
glimpse(babynames) # dplyr
Rows: 1,924,665
Columns: 5
$ year <dbl> 1880, 1880, 1880, 1880, 1880, 1880, 1880, 1880, 1880, 1880, 1880, 1880, 1880, 1880, 1880, 1880, 1880, 1880, 1880, 1…
$ sex <chr> "F", "F", "F", "F", "F", "F", "F", "F", "F", "F", "F", "F", "F", "F", "F", "F", "F", "F", "F", "F", "F", "F", "F", …
$ name <chr> "Mary", "Anna", "Emma", "Elizabeth", "Minnie", "Margaret", "Ida", "Alice", "Bertha", "Sarah", "Annie", "Clara", "El…
$ n <int> 7065, 2604, 2003, 1939, 1746, 1578, 1472, 1414, 1320, 1288, 1258, 1226, 1156, 1063, 1045, 1040, 1012, 995, 982, 949…
$ prop <dbl> 0.07238359, 0.02667896, 0.02052149, 0.01986579, 0.01788843, 0.01616720, 0.01508119, 0.01448696, 0.01352390, 0.01319…
head(babynames) # base R
# A tibble: 6 × 5
   year sex   name          n   prop
  <dbl> <chr> <chr>     <int>  <dbl>
1  1880 F     Mary       7065 0.0724
2  1880 F     Anna       2604 0.0267
3  1880 F     Emma       2003 0.0205
4  1880 F     Elizabeth  1939 0.0199
5  1880 F     Minnie     1746 0.0179
6  1880 F     Margaret   1578 0.0162
tail(babynames) # same
# A tibble: 6 × 5
   year sex   name       n       prop
  <dbl> <chr> <chr>  <int>      <dbl>
1  2017 M     Zyhier     5 0.00000255
2  2017 M     Zykai      5 0.00000255
3  2017 M     Zykeem     5 0.00000255
4  2017 M     Zylin      5 0.00000255
5  2017 M     Zylis      5 0.00000255
6  2017 M     Zyrie      5 0.00000255
names(babynames) # same
[1] "year" "sex" "name" "n" "prop"
If you have done the above and produced sane-looking output, you are ready for the next bit. Use the code below to create a new data frame called my_name_data.
my_name_data <- babynames %>%
  filter(name == "Arvind" | name == "Aravind") %>%
  filter(sex == "M")
The first bit makes a new dataset called my_name_data that starts as a copy of the babynames dataset; the %>% (the pipe) tells you we are going to do some other stuff to it next.
The second bit filters our babynames to keep only rows where the name is either Arvind or Aravind (read | as “or”).
The third bit applies another filter to keep only those rows where sex is male.
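To see what the two filters do without scrolling through nearly two million rows, you can try them on a tiny table. The sketch below uses a made-up data frame called toy (its values are invented, not taken from babynames) and also shows one possible shorthand: %in% in place of a chain of |, and a comma inside filter() to mean “and”.

```r
library(dplyr)

# A tiny made-up data frame standing in for babynames (values are invented)
toy <- data.frame(
  name = c("Arvind", "Aravind", "Arvind", "Mary"),
  sex  = c("M", "M", "F", "F"),
  n    = c(5, 7, 6, 9)
)

# Two filters in a row, as in the lab code
two_steps <- toy %>%
  filter(name == "Arvind" | name == "Aravind") %>%
  filter(sex == "M")

# One filter: %in% replaces the chain of |, and a comma means "and"
one_step <- toy %>%
  filter(name %in% c("Arvind", "Aravind"), sex == "M")

two_steps
one_step
```

Both versions should keep the same rows; which one you use is a matter of taste.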
Let’s check out the data.
my_name_data
# A tibble: 61 × 5
    year sex   name       n       prop
   <dbl> <chr> <chr>  <int>      <dbl>
 1  1970 M     Arvind     5 0.00000262
 2  1972 M     Arvind     8 0.00000478
 3  1975 M     Arvind     7 0.00000431
 4  1976 M     Arvind     5 0.00000306
 5  1977 M     Arvind     9 0.00000526
 6  1978 M     Arvind     6 0.00000351
 7  1979 M     Arvind     7 0.00000391
 8  1980 M     Arvind     6 0.00000323
 9  1981 M     Arvind     8 0.0000043
10  1982 M     Arvind     6 0.00000318
# … with 51 more rows
glimpse(my_name_data)
Rows: 61
Columns: 5
$ year <dbl> 1970, 1972, 1975, 1976, 1977, 1978, 1979, 1980, 1981, 1982, 1983, 1984, 1985, 1986, 1987, 1988, 1989, 1990, 1991, 1…
$ sex <chr> "M", "M", "M", "M", "M", "M", "M", "M", "M", "M", "M", "M", "M", "M", "M", "M", "M", "M", "M", "M", "M", "M", "M", …
$ name <chr> "Arvind", "Arvind", "Arvind", "Arvind", "Arvind", "Arvind", "Arvind", "Arvind", "Arvind", "Arvind", "Arvind", "Arvi…
$ n <int> 5, 8, 7, 5, 9, 6, 7, 6, 8, 6, 7, 7, 7, 13, 8, 11, 6, 8, 12, 10, 17, 6, 14, 21, 21, 6, 20, 5, 24, 10, 25, 8, 26, 15,…
$ prop <dbl> 2.620e-06, 4.780e-06, 4.310e-06, 3.060e-06, 5.260e-06, 3.510e-06, 3.910e-06, 3.230e-06, 4.300e-06, 3.180e-06, 3.760…
Again, if you have sane-looking output here, move along to plotting the data!
plot <- ggplot(my_name_data, aes(x = year,
                                 y = prop,
                                 group = name,
                                 color = name)) +
  geom_line()
Now if you did this right, you will not see your plot! Because we saved the ggplot with a name (plot), R just saved the object for you. But check out the top right pane in RStudio again: under the Environment pane you should see plot, so it is there, you just have to ask for it. Here’s how:
plot
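Because the plot is a saved object, you can keep adding to it. The sketch below is one way to add a title and axis labels with labs(); it uses a small made-up data frame (toy_names, with invented values) in place of my_name_data, and the name name_plot is an illustrative choice that avoids confusion with base R's own plot() function.

```r
library(ggplot2)

# Made-up values standing in for my_name_data
toy_names <- data.frame(
  year = c(2000, 2001, 2002, 2000, 2001, 2002),
  name = c("Arvind", "Arvind", "Arvind", "Aravind", "Aravind", "Aravind"),
  prop = c(1e-6, 2e-6, 3e-6, 2e-6, 2e-6, 1e-6)
)

# Saving the plot to a name draws nothing yet
name_plot <- ggplot(toy_names, aes(x = year,
                                   y = prop,
                                   group = name,
                                   color = name)) +
  geom_line()

class(name_plot)  # inspect what kind of object was saved

# Layers can be added to the saved object; labs() sets the title and axis labels
labelled_plot <- name_plot +
  labs(title = "Popularity over time", x = "Year", y = "Proportion")

print(labelled_plot)  # same effect as typing labelled_plot in the console
```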
Edit my code above to create a new dataset. Pick 2 names to compare how popular they each are (these could be different spellings of your own name, like I did, but you can choose any 2 names that are present in the dataset). Make the new plot, changing the first argument, my_name_data, in ggplot() to the name of your new dataset.