Advanced R for Econometricians

Data Visualisation with ggplot2

Martin C. Arnold, Jens Klenke

Data Visualisation

R is a free software environment for statistical computing and graphics. (R Project)

There are three major graphical systems:

  • R base graphics
  • lattice
  • ggplot2

In this course, we will focus on ggplot2. If you want to learn about the others, R Graphics is a great source.

ggplot2

What is ggplot2?

The Layered Grammar of Graphics

The grammar consists of

  • data
  • aesthetic mappings (e.g. mapping of data to x and y coordinates, size, color, shape, ...)
  • geometric objects (e.g. points, lines, bars, ...)
  • scales (control the mapping from data to aesthetics, e.g. which colors are used)
  • facets (splitting data to create plots for subgroups)
  • statistical transformations (summarize data before plotting)
  • coordinate systems (e.g. Cartesian, polar, ...)

The data, mappings, statistical transformations and geometric objects form a layer. A plot can have multiple layers.
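
As a rough sketch of how these components form a layer, a single layer can be spelled out with ggplot2's lower-level layer() function. The built-in mtcars data and the object names p_short and p_long are chosen here for illustration only; the position adjustment listed in the call is a further layer component that is covered later in these slides.

```r
library(ggplot2)

# Shorthand: geom_point() fills in the default stat and position.
p_short <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point()

# The same layer with every component named explicitly.
p_long <- ggplot() +
  layer(
    data     = mtcars,                 # data
    mapping  = aes(x = wt, y = mpg),   # aesthetic mappings
    geom     = "point",                # geometric object
    stat     = "identity",             # statistical transformation
    position = "identity"              # position adjustment
  )
```

Both objects should draw the same scatter plot; scales, facets and the coordinate system fall back to their defaults in either case.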

A Basic Example

Example

library(ggplot2)
data("diamonds")
ggplot(data = diamonds) +
  geom_point(mapping = aes(x = carat, y = price))

ggplot()

  • Creates a coordinate system that you can add layers to.
  • The data and mapping arguments passed to ggplot() become the defaults for all layers added later.

geom_point()

  • Adds a layer of points.
  • Each geom_* function takes a mapping argument, which, paired with aes(), defines how variables in your data are mapped to visual properties.
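
A common stumbling block is the difference between mapping a variable inside aes() and setting a constant outside of it. The following is a minimal sketch of that distinction; the object names p_map and p_set are illustrative only.

```r
library(ggplot2)

# Mapping: color is taken from a variable inside aes(), so ggplot2
# builds a color scale and a legend for it.
p_map <- ggplot(diamonds, aes(carat, price)) +
  geom_point(aes(color = cut))

# Setting: a constant outside aes() applies to every point;
# no scale and no legend are created.
p_set <- ggplot(diamonds, aes(carat, price)) +
  geom_point(color = "blue")
```

Writing aes(color = "blue") instead would map the string "blue" as if it were a data value, which is usually not what is intended.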
[Plot: output of the basic example, price against carat; not the whole data set is included.]

Adding Layers

To make the (possibly nonlinear) relationship in the data easier to see, we add a smoother to the plot.

Example

ggplot(data = diamonds) +
  geom_point(mapping = aes(x = carat, y = price)) +
  geom_smooth(mapping = aes(x = carat, y = price), method = 'loess')

To write more compact code we can

  • omit the argument names
  • move the mapping to ggplot().

The same mapping is then used for all layers (but can be overridden if necessary).

Example

ggplot(diamonds, aes(carat, price)) +
  geom_point() +
  geom_smooth(method = 'loess')
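
How a single layer can deviate from the defaults set in ggplot() might look as follows. This is a sketch using the built-in mtcars data; the highlighted subset (cars with eight cylinders) is an arbitrary choice for illustration.

```r
library(ggplot2)

p <- ggplot(mtcars, aes(wt, mpg)) +              # defaults for all layers
  geom_point() +                                 # uses the default data and mapping
  geom_point(data = subset(mtcars, cyl == 8),    # this layer gets its own data
             color = "red", size = 3) +
  geom_point(aes(shape = factor(am)))            # this layer adds an aesthetic
```

The second layer inherits the x and y mapping but draws only the rows of its own data argument.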

Aesthetics

  • So far we have only used the x and y coordinates as aesthetics.
  • ?geom_point lists the further aesthetics that we can map data to.
  • Each geom has its own set of aesthetics.

Example

ggplot(diamonds, aes(x = carat, y = price,
                     color = color,
                     shape = cut)) +
  geom_point()

Exercise

  1. Experiment with geom_point() using the mtcars data set. Try out different aesthetics with different variables. What do you notice?

Hint: compare how factor and numeric variables behave, e.g. when mapped to color.
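
One way to see the difference between numeric and factor variables is to map the same variable both ways. This is only a sketch; the choice of cyl and the object names are illustrative.

```r
library(ggplot2)

# cyl stored as a number: continuous color gradient with a color bar.
p_num <- ggplot(mtcars, aes(wt, mpg, color = cyl)) +
  geom_point()

# cyl converted to a factor: one distinct color per level, discrete legend.
p_fac <- ggplot(mtcars, aes(wt, mpg, color = factor(cyl))) +
  geom_point()

# Some aesthetics only accept discrete variables: mapping the numeric cyl
# to shape fails when the plot is built, factor(cyl) works.
p_bad <- ggplot(mtcars, aes(wt, mpg, shape = cyl)) +
  geom_point()
```

Printing p_num and p_fac side by side shows the two legend types; printing p_bad raises an error.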

Statistical Transformations and Aesthetics

  • If discrete variables are mapped to aesthetics, ggplot2 automatically groups the data.
  • In this case every statistical transformation is performed by group.

Example

ggplot(diamonds, aes(x = carat, y = price,
                     color = color, shape = cut)) +
  geom_point() +
  geom_smooth()

  • If this is not desired, move color and shape out of ggplot() and into the aes() of geom_point().

Example

ggplot(diamonds, aes(x = carat, y = price)) +
  geom_point(aes(color = color, shape = cut)) +
  geom_smooth()

  • Sometimes it is more convenient to leave all aesthetics in ggplot() (e.g. if there are many layers) and to override the grouping in a single layer by setting group to an arbitrary constant.

Example

ggplot(diamonds, aes(x = carat, y = price,
                     color = color, shape = cut)) +
  geom_point() +
  geom_smooth(aes(group = 1))
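
The effect of the grouping can be checked on a smaller data set by counting the groups the smoother was fitted to. The sketch below uses mtcars and a linear fit to keep it fast; the helper n_groups() is hypothetical and not part of ggplot2.

```r
library(ggplot2)

base <- ggplot(mtcars, aes(wt, mpg, color = factor(cyl))) +
  geom_point()

# Grouping inherited from the color mapping: one fit per cylinder class.
p_by_group <- base + geom_smooth(method = "lm", formula = y ~ x)

# Grouping overridden in this layer only: a single fit for all cars.
# Recent ggplot2 versions may warn that color was dropped for this layer.
p_overall <- base + geom_smooth(aes(group = 1), method = "lm", formula = y ~ x)

# Hypothetical helper: number of groups seen by the smoother (layer 2).
n_groups <- function(p) {
  length(unique(suppressWarnings(layer_data(p, 2))$group))
}
n_groups(p_by_group)
n_groups(p_overall)
```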

Geometric Objects and Statistical Transformations

  • Layers may be defined in terms of a geometric object (geom_*) or a statistical transformation (stat_*).
  • Each geometric object is associated with a default statistical transformation and vice versa.

    Examples:

    • geom_point() uses the identity function as its statistical transformation.

    • geom_smooth() fits a smoother (e.g. a regression model) before plotting a line with a confidence band.

  • stat_smooth() is the stat counterpart of geom_smooth() and does essentially the same. With geom_smooth() we can change the statistical transformation; with stat_smooth() we can change the geometric object.

  • Often it is not a good idea to change the default behaviour (e.g. try geom_point(stat = "smooth", method = "lm")), but we will see an example where it can be useful.
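
The two entry points can be compared directly. The sketch below, using mtcars for brevity, draws the fitted values of a linear model as points once via the geom and once via the stat; the object names are illustrative.

```r
library(ggplot2)

base <- ggplot(mtcars, aes(hp, mpg))

# Start from the geom and swap the statistical transformation ...
p_geom <- base + geom_point(stat = "smooth", method = "lm", formula = y ~ x)

# ... or start from the stat and swap the geometric object.
p_stat <- base + stat_smooth(geom = "point", method = "lm", formula = y ~ x)
```

Both layers plot the fitted values on a grid of x values rather than the raw data, which is why changing the default here is rarely useful.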

Exercises

  1. Use the mtcars data set and plot mpg vs. hp. Add a smoothing line to the plot.
  2. Add a smoothing function to the plot for each number of cylinders.
  3. Find out how to remove the confidence interval.
  4. Use a simple linear regression model and a quadratic regression model for smoothing.

Bar Plot

geom_bar() counts the number of observations within each group and produces a bar plot.



Example

ggplot(data = diamonds) +
  geom_bar(mapping = aes(x = cut))
# x should be discrete
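
To make the counting step explicit, the same bar plot can be built from precomputed counts. This is a sketch; cut_counts and the plot names are illustrative.

```r
library(ggplot2)

# geom_bar() uses stat = "count": it tallies the rows per category.
p_count <- ggplot(diamonds) +
  geom_bar(aes(x = cut))

# The same plot from precomputed counts; geom_col() uses stat = "identity".
cut_counts <- as.data.frame(table(cut = diamonds$cut))
p_col <- ggplot(cut_counts) +
  geom_col(aes(x = cut, y = Freq))
```

layer_data(p_count) returns the computed counts, which can be handy for checking what a stat actually did.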

Histogram



Example

ggplot(diamonds, aes(x = depth, fill = cut)) +
  geom_histogram(binwidth = 0.1)

1D Density



Example

ggplot(diamonds, aes(x = depth, fill = cut)) +
  # try color instead of fill
  geom_density()

2D Density



Example

ggplot(diamonds, aes(carat, price)) +
  geom_point() +
  geom_density2d()

2D Density with geom = "polygon"



Example

ggplot(diamonds, aes(carat, price)) +
  geom_point() +
  # after_stat(level) replaces the deprecated ..level.. notation
  stat_density2d(aes(fill = after_stat(level)),
                 geom = "polygon")

Faceting using one variable

  • Faceting generates multiple plots each showing a different subset of the data.



Example

ggplot(diamonds, aes(carat, price)) +
  geom_point() +
  facet_grid( ~ cut)

Faceting using two variables

Example

ggplot(data = diamonds, mapping = aes(carat, price)) +
  geom_point() +
  facet_grid(color ~ cut)
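
The facet formula reads rows ~ columns. The following sketch on mtcars shows the main variants; facet_wrap() is not used on these slides but is ggplot2's alternative for wrapping the panels of a single variable into several rows.

```r
library(ggplot2)

base <- ggplot(mtcars, aes(wt, mpg)) +
  geom_point()

p_cols <- base + facet_grid(. ~ cyl)          # one column per value of cyl
p_rows <- base + facet_grid(cyl ~ .)          # one row per value of cyl
p_grid <- base + facet_grid(am ~ cyl)         # rows by am, columns by cyl
p_wrap <- base + facet_wrap(~ cyl, ncol = 2)  # panels wrapped into two columns

# Number of panels that contain data.
n_panels <- function(p) length(unique(layer_data(p)$PANEL))
```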

Position Adjustments

  • Each geometric object has a parameter for position adjustment.

    ggplot(data = diamonds) +
      geom_bar(mapping = aes(x = cut, fill = clarity), position = "stack") # the default

  • Instead of stacking the bars, we can place them side by side using "dodge".

    geom_bar(..., position = "dodge")

  • With "fill", every bar is scaled to the same height so that relative proportions can be compared.

    geom_bar(..., position = "fill")

Stacked Bar Plot

Example

ggplot(data = diamonds) +
  geom_bar(mapping = aes(x = cut, fill = clarity), position = "stack") # the default


Dodged Bar Plot

Example

ggplot(data = diamonds) +
  geom_bar(mapping = aes(x = cut, fill = clarity), position = "dodge")

Filled Bar Plot

Example

ggplot(data = diamonds) +
  geom_bar(mapping = aes(x = cut, fill = clarity), position = "fill")
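
What the position adjustments do to the bar heights can be checked numerically. The sketch below uses mtcars to keep the numbers small; the helper top() is hypothetical and simply returns the upper edge of the tallest segment at each x position.

```r
library(ggplot2)

base <- ggplot(mtcars, aes(x = factor(cyl), fill = factor(am)))

p_stack <- base + geom_bar(position = "stack")
p_fill  <- base + geom_bar(position = "fill")

# Hypothetical helper: upper edge of the highest bar segment per x position.
top <- function(p) {
  d <- layer_data(p)
  as.vector(tapply(d$ymax, as.numeric(d$x), max))
}
top(p_stack)  # total number of cars per cylinder class
top(p_fill)   # every bar is rescaled to the same total height
```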

Scales

  • Scales determine how the data are mapped to the aesthetics (e.g. which value is shown in which color).
  • Scales are set with functions named scale_<aesthetic>_<type>(), e.g. scale_color_grey().

Example

ggplot(diamonds, aes(carat, price, color = cut)) +
  geom_point() +
  scale_color_grey()

  • You can also define your own scales.

Example

ggplot(diamonds, aes(carat, price, color = cut)) +
  geom_point() +
  scale_color_manual(values = c("#c9792e", "blue", "green", "gray", "thistle2"))
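
When colors are given as an unnamed vector, they are assigned in the order of the factor levels. A named vector ties each color to a level explicitly. The following is a sketch on mtcars with arbitrarily chosen colors.

```r
library(ggplot2)

# Names refer to factor levels, so the order of the vector does not matter.
cyl_colors <- c("8" = "firebrick", "4" = "steelblue", "6" = "orange")

p <- ggplot(mtcars, aes(wt, mpg, color = factor(cyl))) +
  geom_point() +
  scale_color_manual(values = cyl_colors, name = "Cylinders")
```

The name argument sets the legend title directly in the scale.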

Titles and labels

labs() allows you to

  • add a title and a subtitle
  • add a caption
  • add a tag
  • change the axis labels
  • change the legend title.

Example

# color rather than fill: the default point shape has no fill
ggplot(diamonds, aes(x = carat, y = price, color = cut)) +
  geom_point() +
  labs(title = "Diamonds", x = "Carat", y = "Price", color = "Cut")
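
A sketch that uses all of the labs() arguments listed above, with mtcars and freely chosen label texts:

```r
library(ggplot2)

p <- ggplot(mtcars, aes(wt, mpg, color = factor(cyl))) +
  geom_point() +
  labs(
    title    = "Fuel economy and weight",
    subtitle = "32 cars from the mtcars data set",
    caption  = "Source: datasets::mtcars",
    tag      = "A",
    x        = "Weight (1000 lbs)",
    y        = "Miles per gallon",
    color    = "Cylinders"   # legend titles are named after the aesthetic
  )
```

The legend title is set through the name of the aesthetic that created the legend, here color.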

Themes

Themes control the appearance of the plot:

  • font type and font size
  • background
  • tick marks and labels
  • grid lines
  • ...

There are many predefined themes. If you prefer, you can also define every element yourself.

Example

ggplot(diamonds, aes(carat, price, color = cut)) +
  geom_point() +
  theme_bw()
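
Predefined themes and individual settings can be combined: theme() adjusts single elements on top of a complete theme. The particular settings below are arbitrary and only meant as a sketch.

```r
library(ggplot2)

p <- ggplot(mtcars, aes(wt, mpg, color = factor(cyl))) +
  geom_point() +
  theme_bw(base_size = 14) +                 # predefined theme, larger base font
  theme(
    legend.position  = "bottom",             # move the legend below the plot
    panel.grid.minor = element_blank(),      # drop the minor grid lines
    axis.text        = element_text(color = "grey30")
  )
```

Because later theme() calls override earlier ones, theme() has to come after theme_bw().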

Overplotting

  • When many data points with similar values are plotted, individual points overlap and can no longer be told apart.
  • Overplotting can make it hard to see the pattern in the data and may render a plot useless.
  • There are several ways to address overplotting, such as:
    • plotting only subsets of the data (e.g. with facet_grid())
    • making the points transparent

Example

library(cowplot)
over1 <- ggplot(diamonds, aes(carat, price)) +
  geom_point()
trans <- ggplot(diamonds, aes(carat, price)) +
  geom_point(alpha = 0.05)
plot_grid(over1, trans) # arrange both plots side by side

Jittering

  • Even with only a few data points, overplotting can become an issue if there is only a small number of unique values.
  • Adding small random numbers to the coordinates can help to reduce overplotting.

Example

over2 <- ggplot(mtcars, aes(am, cyl)) +
  geom_point()
jitter <- ggplot(mtcars, aes(am, cyl)) +
  geom_jitter(width = 0.03, height = 0.1)
plot_grid(over2, jitter) # from cowplot
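
Because the offsets are random, the jittered plot changes every time it is drawn. A sketch of a reproducible variant uses position_jitter() with a seed; the seed value and the helper n_pos() are illustrative.

```r
library(ggplot2)

base <- ggplot(mtcars, aes(am, cyl))

p_plain <- base + geom_point()

# geom_jitter() is shorthand for geom_point(position = "jitter").
# position_jitter() with a seed makes the random offsets reproducible.
p_jitter <- base +
  geom_point(position = position_jitter(width = 0.03, height = 0.1, seed = 42))

# Hypothetical helper: number of distinct plotting positions.
n_pos <- function(p) nrow(unique(layer_data(p)[c("x", "y")]))
n_pos(p_plain)   # many cars share the same position
n_pos(p_jitter)  # jittering separates them
```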

Composite Plots

  • There are many packages extending ggplot2 such as cowplot and ggExtra.
  • With ggplot2 alone it is quite tricky to create composite plots such as those produced by ggExtra::ggMarginal().

Example

library(ggExtra)
scatter_plot <- ggplot(diamonds, aes(carat, price)) +
  geom_point()
ggMarginal(scatter_plot, # ggExtra
           type = 'density',
           margins = 'both',
           size = 5,
           colour = '#FF0000',
           fill = '#FFA500'
)

Exercises

  1. Download the Titanic data set from Moodle and
     • use a bar plot to show how many people survived the Titanic compared to those who didn't.
     • add a color coding to the previous plot to visualize the differences between the passengers' genders.
     • split the previous plot into three plots based on the passengers' class.
     • compare the age distribution between survivors and non-survivors.
  2. Try to answer the following questions about the mpg data set (comes with ggplot2) using ggplot2.
     • How are engine size and fuel economy related?
     • Do certain manufacturers care more about economy than others?
     • Has fuel economy improved in the last ten years?
  3. Compare the two data sets economics and economics_long (both come with ggplot2) with respect to the ease of use when working with ggplot2.
  4. Reproduce the plot created by the following code with ggplot2.

    plot(mtcars$mpg ~ mtcars$wt, xlab = "wt", ylab = "mpg", pch = 19, ylim = c(5, 35))
    mod <- lm(mpg ~ wt, data = mtcars)
    abline(mod, col = "red")
    wt_new <- seq(min(mtcars$wt), max(mtcars$wt), by = 0.05)
    conf_interval <- predict(mod, newdata = data.frame(wt = wt_new),
                             interval = "confidence", level = 0.95)
    # set up the vertices of the polygon (for shading the CI):
    p <- cbind(c(wt_new, rev(wt_new)), c(conf_interval[, 3], rev(conf_interval[, 2])))
    polygon(p, col = adjustcolor("steelblue", alpha.f = 0.5))
    lines(wt_new, conf_interval[, 2], col = "steelblue", lty = 2)
    lines(wt_new, conf_interval[, 3], col = "steelblue", lty = 2)
