Advanced R for Econometricians

Introduction

Martin C. Arnold, Jens Klenke

Preliminaries

Contact Information:

Slides, exercises and announcements will be provided on:


The password for the course is "Hadley".


Background? R and Rstudio installed? What do you know about R?

Grading

  • There will be two graded assignments.

  • You will have to work on a final group project which focuses on one or more of the topics discussed in the course.

  • The final grade is a weighted average of the grades for the group project (60%), its presentation (20%) and the assignments (20%).

Outline

  1. Introduction
  2. Prerequisites
    • R Markdown
    • Git and GitHub
  3. Data Visualisation with ggplot2
  4. Data Wrangling and Transformation
    • dplyr
    • Databases
  5. Advanced R Programming
    • Advanced Concepts
    • Functional Programming
    • Object Oriented Programming
  6. Performance
    • Profiling and Benchmarking
    • Improving Performance
    • Rcpp
    • RcppArmadillo
  7. Web Scraping

Additional Topics for self-study

  1. data.table
  2. Shiny
  3. Writing Packages

About R

  • Advantages:

    • R is a language and environment for statistical computing and graphics.
    • R is highly extensible. There is a massive set of packages for statistical modelling, machine learning, visualisation, as well as importing and manipulating data. Researchers in statistics and machine learning often publish an R package to accompany their articles.
    • R is free and open source.
    • R provides tools for communicating your results. R packages make it easy to produce html or pdf reports, or create interactive websites.
    • R is designed to connect to high-performance programming languages like C, Fortran, and C++.
  • Disadvantages:

    • It is not very fast.
    • It has an inconsistent syntax.

Sources

Packages

  • In R the fundamental unit of shareable code is the package.
  • Chances are that someone has already solved a problem that you’re working on, and you can benefit from their work by downloading their package.

The main sources for packages are:

  • CRAN (Comprehensive R Archive Network), which currently hosts about 15,000 packages
  • GitHub (where many packages are developed before they are published on CRAN)
  • Bioconductor

Installing Packages

How to install packages from CRAN?

  • Use install.packages("MASS") to install the package MASS from CRAN.
  • Or using RStudio: Tools → Install Packages → MASS

How to install packages from GitHub?

  • Install the package devtools from CRAN.
  • E.g., use devtools::install_github("tidymodels/broom") to install the broom package from GitHub.

  • If you want to find out how a package works, search for

    • a vignette by the package authors
    • a blog post about the package, e.g. on R Bloggers

Example

install.packages("broom")
library(broom)
vignette("broom")
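
In scripts that are shared with others it can be convenient to install a package only when it is missing. The following is a minimal sketch of that idea; the helper name ensure_installed is hypothetical and not part of base R or devtools.

```r
# Hypothetical helper: install a CRAN package only if it is not yet available
ensure_installed <- function(pkg) {
  if (!requireNamespace(pkg, quietly = TRUE)) {
    install.packages(pkg)
  }
  # Report whether the package can be loaded now
  requireNamespace(pkg, quietly = TRUE)
}

# "stats" ships with R, so nothing is downloaded in this call
ensure_installed("stats")
```

Calling ensure_installed("MASS") would work the same way and would only contact CRAN if MASS were not already installed.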

Which functionality does the broom package provide?

Getting Help

If you have specific questions about your code or an error message look on

The chances are high that somebody else has already asked the same question and it has already been solved.

If you don't find an answer, you can ask a question yourself. This sounds easy but is harder than one would think. Most importantly, you have to provide a reproducible example.
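
What a reproducible example could look like is sketched below. The data and the "problem" (an unexpected NA) are made up for illustration; the point is that the snippet runs on any machine without external files.

```r
# A small data set that reproduces the problem; no external files needed
df <- data.frame(x = c(1, 2, NA), g = c("a", "b", "a"))

# dput() prints R code that recreates the object, ready to paste into a question
dput(df)

# The smallest piece of code that shows the unexpected behaviour
result <- mean(df$x)
result

# Version information that people answering the question often ask for
R.version.string
```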

RStudio

  • RStudio is an integrated development environment (IDE) for R.
  • It provides some convenient functionality compared to the IDE that comes with R.

RStudio Projects

  • To work in a specific directory the base R way would be to use setwd(). This is problematic if you want to share your code or work from different machines on the same project.
  • Use projects instead.

Workflow

  • Instead of opening a new R session and calling setwd(), you open the project. The working directory is automatically set to the folder that contains the .Rproj file.
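
Inside a project, files can then be addressed relative to the project folder. A minimal sketch, assuming a hypothetical file data/wages.csv in the project:

```r
# Hypothetical project layout: <project>/data/wages.csv
# The path is relative to the working directory, i.e. the project folder
data_file <- file.path("data", "wages.csv")

if (file.exists(data_file)) {
  wages <- read.csv(data_file)
} else {
  # Show where R was looking, without failing if the file is missing
  message("No file found at ", normalizePath(data_file, mustWork = FALSE))
}
```

Because no absolute path appears in the code, the same script works on every machine on which the project folder is opened.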

Some Shortcuts

Shortcut                  Action
Alt + Shift + k           list all shortcuts
Cmd/Ctrl + Shift + F10    restart R
Cmd/Ctrl + Shift + S      rerun current script
F1                        help page
Ctrl + l                  clears the console

RStudio supports:

  • package development
  • version control via git and svn
  • creation and compilation of dynamic documents
  • connections to external databases
  • ...

We will use some of this later on.

R Basics

  • We start by recapitulating some basics you should be familiar with when working with R.
  • An in-depth treatment of this material can be found in the first chapter of Advanced R by Hadley Wickham.

R Basics: Data Types and Data Structures

We assume that you know the basic data structures in R.

Dimensions   Homogeneous     Heterogeneous
1            Atomic Vector   List
2            Matrix          Data Frame
n            Array

You should also be familiar with the basic data types (character, double, integer, logical).


Task

  1. How is a list different from an atomic vector?
  2. How is a matrix different from a data frame?
  3. How do you find out the type of an object?
  4. How can you check if an object is of type character?

Answers Task

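  1. A list can store elements of different types; an atomic vector stores elements of a single type.
  2. A matrix holds values of a single type; in a data frame, each column can have a different type.
  3. typeof() returns the internal type of an object; class() returns its class.
  4. is.character()

# Compare the memory footprint of an integer and a double vector
object.size(1:1000)
object.size(as.numeric(1:1000))

# typeof() versus class()
char_vector <- c("Hi", "there")
typeof(char_vector)
class(char_vector)
num_matrix <- matrix(rnorm(4), ncol = 2)
typeof(num_matrix)
class(num_matrix)
is.numeric(num_matrix)

# Type checks inside a function
plus <- function(a, b) {
  if (!is.numeric(a)) stop("a is not numeric")
  if (!is.numeric(b)) stop("b is not numeric")
  a + b
}
plus(1, "a")  # stops with "b is not numeric"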

R Basics: Subsetting

You should know how to use the common subsetting operators [, [[ and $.

Have you ever encountered @?


Task
  1. What is the result of subsetting a vector with positive integers, negative integers, a logical vector, or a character vector?
  2. What is the difference between [ , [[ , and $ when applied to a list?
  3. When should you use drop = FALSE ?
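
  1. Positive integers select the elements at the given positions, negative integers drop the elements at those positions, a logical vector keeps the elements where the value is TRUE, and a character vector selects the elements with matching names.

Applied to a list, [ always returns an object of the same class, while [[ extracts a single element from a list or data frame:

list_a <- list(a = 2, b = "hallo")
list_a["a"]    # a list of length one
list_a[["a"]]  # the element itself
list_a$a       # shorthand for list_a[["a"]]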

  2. [ selects sub-lists. It always returns a list; if you use it with a single positive integer, it returns a list of length one. [[ selects an element within a list. $ is a convenient shorthand: x$y is equivalent to x[["y"]].

  3. Use drop = FALSE if you are subsetting a matrix, array, or data frame and you want to preserve the original dimensions. You should almost always use it when subsetting inside a function.

## Drop
m1 <- matrix(1:4, nrow = 2)
m2 <- matrix(1:4, nrow = 2)
matrix_prod <- function(A, B) {
  B %*% t(A)
}
# m1[1, ] is simplified to a vector, so the dimensions no longer conform
matrix_prod(m1[1, ], m2)
# drop = FALSE keeps the 1 x 2 matrix, so the product is defined
matrix_prod(m1[1, , drop = FALSE], m2)

## Example for S4 class (requires the fGarch package)
N <- 200
x.vec <- as.vector(fGarch::garchSim(fGarch::garchSpec(rseed = 1985), n = N)[, 1])
garch11 <- fGarch::garchFit(~ garch(1, 1), data = x.vec, trace = FALSE)
isS4(garch11)
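
The @ operator asked about on the slide accesses slots of S4 objects. The sketch below avoids the fGarch dependency by defining a small S4 class; the class name GarchCoef and its slots are hypothetical and only serve as an illustration.

```r
library(methods)

# Hypothetical S4 class holding the coefficients of a GARCH(1,1) model
setClass(
  "GarchCoef",
  representation(omega = "numeric", alpha = "numeric", beta = "numeric")
)

coefs <- new("GarchCoef", omega = 0.01, alpha = 0.1, beta = 0.8)

isS4(coefs)          # check that this is an S4 object
coefs@alpha          # slot access with @
slot(coefs, "beta")  # equivalent functional form
slotNames(coefs)     # names of all slots
```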

R Basics: Control Flows

The basic control flows are conditionals (choices) and loops.

Task

  1. What is the difference between if and ifelse()?
  2. What will be the value of y in the following code if x is TRUE? What if x is FALSE? What if x is NA?

    y <- if(x) 3
  3. Name three kinds of loops that can be implemented in R. How do they differ?
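
if ("2") 3  # error: the condition cannot be interpreted as logical
if (2) 3    # works: a non-zero number is coerced to TRUE

for (i in 1:3) {
  print(i)
}

i <- 1
while (i < 4) {
  print(i)
  i <- i + 1
}

i <- 1
repeat {
  print(i)
  i <- i + 1
  if (i > 3) break
}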
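
The first two task questions can be explored with a short sketch like the following. The variable names are chosen for illustration only.

```r
# if works on a single condition, ifelse() is vectorised over a whole vector
x <- c(-1, 0, 2)
signs <- ifelse(x > 0, "positive", "non-positive")

# if without else returns NULL (invisibly) when the condition is FALSE
y_true  <- if (TRUE) 3
y_false <- if (FALSE) 3

# An NA condition is an error; tryCatch() captures the message instead of stopping
y_na <- tryCatch(if (NA) 3, error = function(e) conditionMessage(e))

signs
y_true
is.null(y_false)
y_na
```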

R Basics: Functions

Task

  • Write a function that takes a number x and returns the square of x.

Scoping

  • An important concept and also a source of trouble is scoping, the act of finding the value associated with a name.
  • R uses lexical scoping: it looks up the values of names based on how a function is defined, not how it is called.

Example: lexical scoping

f1 <- function() x
x <- "global"
f2 <- function() {
  x <- "local_f2"
  f1()
}
f2()
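
Lexical scoping is also what makes function factories work: the returned function looks up names in the environment in which it was defined. A minimal sketch, with make_power as an illustrative name:

```r
# Function factory: the returned function remembers `exponent`
# from the environment in which it was created
make_power <- function(exponent) {
  force(exponent)  # evaluate the argument now rather than on first use
  function(x) x^exponent
}

square <- make_power(2)

# A global variable with the same name does not affect square()
exponent <- 10
square(3)
```

The call square(3) uses the value 2 stored when square was created, not the global value 10.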

Scoping Rules

R’s lexical scoping follows four primary rules:

  • Name masking
  • Functions versus variables
  • A fresh start
  • Dynamic lookup

Explanation on the following slides.

Scoping Rules: Name Masking

Example: What happens here?

x <- 3
square_1 <- function() x^2
square_1()


Example: And here?

x <- 3
square_2 <- function() {
  x <- 2
  foo <- function() x^2
  foo()
}
square_2()
  • Names defined inside a function mask names defined outside a function.

Example 1 works fine: x is not overwritten, so the result is 9. In example 2, the x defined inside the function masks the global x, so the result is 4.

Scoping Rules: Functions versus Variables

Example: Let's make things more confusing. Can you predict the result?

x <- 3
square_2 <- function() {
  x <- 2
  foo <- function() x^2
  foo()
}
square_3 <- function(x) {
  square_2 <- 5
  square_2()
}
square_3()
  • When a function and a non-function share the same name, R ignores non-function objects in a function call.
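
The same rule can be seen with a base function. In this small sketch a numeric object named sum coexists with the function sum():

```r
# A non-function object that shares its name with a base function
sum <- 5

# In call position R skips the numeric object and finds base::sum();
# as an argument, `sum` refers to the value 5
total <- sum(1, 2, sum)
total

# Remove the masking object again
rm(sum)
```

Reusing names like this is legal but confusing, which is why the style guide later in the slides advises against it.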

Scoping Rules: A Fresh Start

Example: What is the result of the first call to square_4()? What will happen the second time?

x <- 2
square_4 <- function() {
  (x <- x^2)
}
square_4()
square_4()
  • Every time a function is called a new environment is created to host its execution.
  • A function has no way to tell what happened the last time it was run; each invocation is completely independent.
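
The independence of calls can be checked directly. In the following sketch, count_calls and n_calls are illustrative names; the function tries to remember how often it was called and fails to do so.

```r
count_calls <- function() {
  # inherits = FALSE restricts the lookup to the current execution environment
  if (!exists("n_calls", inherits = FALSE)) {
    n_calls <- 1
  } else {
    n_calls <- n_calls + 1
  }
  n_calls
}

first  <- count_calls()
second <- count_calls()
c(first, second)
```

Both calls return 1, because the n_calls created in the first call disappears together with that call's environment.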

Scoping Rules: Dynamic Lookup

Example: Predict the result of the two calls to f1(). Compare it to the main rule of lexical scoping.

x <- 1
f1 <- function() x
f1()
x <- 2
f1()
  • R looks for values when the function is run, not when the function is created.
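
Dynamic lookup means that a function can silently depend on objects defined outside of it. One way to spot such dependencies is findGlobals() from the codetools package, which is distributed with R as a recommended package. A minimal sketch:

```r
x <- 1
f1 <- function() x

# Names that f1() has to look up outside of its own body
codetools::findGlobals(f1)

# Passing the value as an argument removes the hidden dependency
f1_explicit <- function(x) x
codetools::findGlobals(f1_explicit)
f1_explicit(1)
```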

Lazy Evaluation

What could cause an error here?

lazy_function <- function(x) {
  10
}
lazy_function()

Why does this not induce an error?

Function arguments are lazily evaluated; they’re only evaluated if accessed.

lazy_function <- function(x) {
  x
}
lazy_function()
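
Lazy evaluation also applies to arguments that are supplied: the expression passed to a function is only evaluated once the argument is accessed. The sketch below contrasts this with force(), which triggers the evaluation explicitly; eager_function is an illustrative name.

```r
lazy_function <- function(x) {
  10
}

# The argument expression is never evaluated, so stop() never runs
lazy_result <- lazy_function(stop("this is never evaluated"))

eager_function <- function(x) {
  force(x)  # evaluate the argument immediately
  10
}

# Here the error is raised; tryCatch() captures its message
eager_result <- tryCatch(
  eager_function(stop("evaluated by force()")),
  error = function(e) conditionMessage(e)
)

lazy_result
eager_result
```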

Style Guide

Object names

  • Variable and function names should be lowercase.
  • Use an underscore (_) to separate words within a name (as an alternative you can use camel case, but be consistent).
  • Variable names should be nouns.
  • Function names should be verbs.
  • Strive for names that are concise and meaningful (this is not easy!).
  • Where possible, avoid using names of existing functions and variables.

Good

day_one
day_1

Bad

# Bad
first_day_of_the_month
dayone
djm1
T <- FALSE
c <- 10
mean <- function(x) sum(x)

Spacing

  • Place spaces around all infix operators (=, +, -, <-, etc.) and = in a function call.
  • Always put a space after a comma, and never before.

Good

average <- mean(feet / 12 + inches, na.rm = TRUE)

Bad

average<-mean(feet/12+inches,na.rm=TRUE)


  • Exceptions to this rule are :, :: and :::, which are not surrounded by spaces.

Good

x <- 1:10
base::get

Bad

x <- 1 : 10
base :: get

Spacing

  • Extra spacing (i.e. more than one space in a row) is ok if it improves alignment of equal signs or assignments (<-).

Example

list(
  total = a + b + c,
  mean  = (a + b + c) / n
)
  • Do not place spaces around code in parentheses or square brackets (unless there’s a comma, in which case see above).

Good

if (debug) do(x)
diamonds[5, ]

Bad

if ( debug ) do(x) # No spaces around debug
x[1,] # Needs a space after the comma
x[1 ,] # Space goes after comma not before

Spacing

  • An opening curly brace should never go on its own line and should always be followed by a new line. A closing curly brace should always go on its own line, unless it’s followed by else. Always indent the code inside curly braces.

Good

if (y < 0 && debug) {
  message("Y is negative")
}
if (y == 0) {
  log(x)
} else {
  y ^ x
}

Bad

if (y < 0 && debug) {
message("Y is negative") }
if (y == 0) {
log(x)
} else {
y ^ x
}

Spacing

  • It’s okay to leave very short statements on the same line:

Example

if (y < 0 && debug) message("Y is negative")
  • Strive to limit your code to 80 characters per line.
  • If you find yourself running out of room, this is a good indication that you should encapsulate some of the work in a separate function.

Assignment

  • Use <-, not =, for assignment.

Good

x <- 5

Bad

x = 5

Commenting guidelines

  • Comment your code.
  • Each line of a comment should begin with the comment symbol and a single space: #.
  • Comments should explain the why, not the what.
  • Use commented lines of - and = to break up your file into easily readable chunks.

Example

# Load data ---------------------------
# Plot data ---------------------------

Exercises

  1. Combine all elements of df_list into one data frame. The solution should be a single line of code.
df_list <- lapply(0:4, function(j) {
  data.frame(
    c1 = (1:5) + 5 * j,
    c2 = letters[(1:5) + 5 * j]
  )
})
  2. Fix each of the following common data frame subsetting errors:
mtcars[mtcars$cyl = 4, ]
mtcars[-1:4, ]
mtcars[mtcars$cyl <= 5]
mtcars[mtcars$cyl == 4 | 6, ]
  3. What does df[is.na(df)] <- 0 do?
  4. How would you randomly permute the columns of a data frame?

Exercises

  1. Write a while and a repeat loop doing the same as
for (i in LETTERS[1:10]) {
  print(i)
}
  2. Write a for loop doing the same as
count <- 0
repeat {
  x <- sample(1:6, 1)
  count <- count + 1
  if (x == 6) break
}
print(count)

Exercises

  1. What is the problem here?
# Number of apples
i <- 100
for (i in 1:3) {}
paste("The number of apples is", i)
  2. What is the problem here? Can you debug it?
df <- mtcars
lin_mod <- function() {
  lm(y ~ . ,data = df)
}
simple_lin_mod <- function(data, y, x) {
  df <- data[ ,c(y, x)]
  names(df) <- c("y", "x")
  fast_lm()
}
simple_lin_mod(df, "mpg", "cyl")