Plotting longitudinal data in R using base graphics and ggplot2

This example highlights some of the differences between the plotting approaches of base graphics and ggplot2 in R.

Imagine we gather a group of subjects of both genders, chosen at random, with ages sampled uniformly over the range 0-30 years. We want to study these individuals longitudinally, so for each subject we record one FA measurement now and a second one at some variable time in the future.
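To make the examples on this page concrete, here is a minimal sketch of simulating a dataset with this design in R. The sample size (n = 40), the 1-5 year follow-up delay, and the FA curve (a saturating exponential in age, plus noise) are hypothetical choices for illustration only, not values from a real study. The data end up in "long" format, one row per subject per time point, which is the layout ggplot2 expects.

```r
# Hypothetical simulation of the imagined study: n, the 1-5 year
# follow-up delay, and the FA curve are illustrative assumptions
set.seed(1)
n <- 40

# Subject-level variables: gender at random, baseline age uniform on 0-30
gender1 <- sample(c("F", "M"), n, replace = TRUE)
age1    <- runif(n, min = 0, max = 30)
delay   <- runif(n, min = 1, max = 5)   # variable time to the second visit

# Long format: one row per subject per time point
long <- data.frame(
  id     = factor(rep(seq_len(n), times = 2)),
  gender = factor(rep(gender1, times = 2)),
  time   = factor(rep(c("1", "2"), each = n)),
  age    = c(age1, age1 + delay)
)

# Assumed FA curve: rises with age and levels off, plus measurement noise
long$fa <- 0.5 - 0.2 * exp(-long$age / 8) + rnorm(nrow(long), sd = 0.02)

head(long)
```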

In our plot, we'd like to:

  • Show the relationship between age and FA
  • Identify gender
  • Identify time point
  • Associate multiple time points for the same individual
  • Add trendlines
    • Quadratic
    • Exponential

Plots

First we display the entire dataset using base graphics and ggplot2:
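A rough sketch of how the two approaches might produce this plot, assuming simulated data (the FA curve, sample size, and follow-up delay are hypothetical). In base graphics, colours, symbols, the subject segments, both trendlines (a quadratic via lm() and an exponential via nls()), and the legend are all built by hand. In ggplot2, the same elements follow from mapping variables to aesthetics. The nls starting values are assumptions chosen to be near the simulated curve.

```r
library(ggplot2)

# Hypothetical simulated data: gender at random, baseline age uniform on
# 0-30 years, follow-up 1-5 years later, FA from an assumed saturating curve
set.seed(1)
n    <- 40
age1 <- runif(n, min = 0, max = 30)
long <- data.frame(
  id     = factor(rep(seq_len(n), times = 2)),
  gender = factor(rep(sample(c("F", "M"), n, replace = TRUE), times = 2)),
  time   = factor(rep(c("1", "2"), each = n)),
  age    = c(age1, age1 + runif(n, min = 1, max = 5))
)
long$fa <- 0.5 - 0.2 * exp(-long$age / 8) + rnorm(2 * n, sd = 0.02)

## Base graphics: every aesthetic, line, and legend entry is set by hand
cols <- c(F = "red", M = "blue")
pchs <- c("1" = 1, "2" = 16)
plot(long$age, long$fa,
     col = cols[as.character(long$gender)],
     pch = pchs[as.character(long$time)],
     xlab = "Age (years)", ylab = "FA")

# Exactly two points per subject, so a single segments() call joins them
t1 <- long[long$time == "1", ]
t2 <- long[long$time == "2", ]
t2 <- t2[match(t1$id, t2$id), ]   # align the second visit to the first
segments(t1$age, t1$fa, t2$age, t2$fa, col = "grey60")

# Trendlines: fit each model, then predict over a grid of ages ourselves
grid     <- data.frame(age = seq(min(long$age), max(long$age), length.out = 100))
fit_quad <- lm(fa ~ poly(age, 2), data = long)
fit_exp  <- nls(fa ~ a - b * exp(-age / k), data = long,
                start = list(a = 0.5, b = 0.2, k = 5))
lines(grid$age, predict(fit_quad, newdata = grid), lwd = 2)
lines(grid$age, predict(fit_exp, newdata = grid), lwd = 2, lty = 2)

# The legend must also be assembled manually
legend("bottomright", bty = "n",
       legend = c("F", "M", "time 1", "time 2", "quadratic", "exponential"),
       col = c(cols, rep("black", 4)),
       pch = c(15, 15, pchs, NA, NA),
       lty = c(NA, NA, NA, NA, 1, 2))

## ggplot2: grouping, trendlines, and the legend follow from the mappings
p <- ggplot(long, aes(x = age, y = fa)) +
  geom_line(aes(group = id), colour = "grey60") +
  geom_point(aes(colour = gender, shape = time)) +
  geom_smooth(method = "lm", formula = y ~ poly(x, 2), se = FALSE,
              colour = "black") +
  geom_smooth(method = "nls", formula = y ~ a - b * exp(-x / k),
              method.args = list(start = list(a = 0.5, b = 0.2, k = 5)),
              se = FALSE, colour = "black", linetype = "dashed") +
  labs(x = "Age (years)", y = "FA")
print(p)
```

Note that the ggplot2 version fits both models inside geom_smooth(); se = FALSE is used for the nls fit because predict() for nls models does not return standard errors.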

Then we show how easy it is with ggplot2 to use different variables to slice the data up into subplots:
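A minimal faceting sketch, again on hypothetical simulated data. Adding facet_wrap() to an existing plot object splits it into one panel per gender, and the quadratic trendline is refitted within each panel. Two facet variables give one panel per combination of levels; in that second plot the subject lines are left out, because each subject's two points fall into different time panels.

```r
library(ggplot2)

# Hypothetical simulated data: gender at random, baseline age uniform on
# 0-30 years, follow-up 1-5 years later, FA from an assumed saturating curve
set.seed(1)
n    <- 40
age1 <- runif(n, min = 0, max = 30)
long <- data.frame(
  id     = factor(rep(seq_len(n), times = 2)),
  gender = factor(rep(sample(c("F", "M"), n, replace = TRUE), times = 2)),
  time   = factor(rep(c("1", "2"), each = n)),
  age    = c(age1, age1 + runif(n, min = 1, max = 5))
)
long$fa <- 0.5 - 0.2 * exp(-long$age / 8) + rnorm(2 * n, sd = 0.02)

p <- ggplot(long, aes(x = age, y = fa)) +
  geom_line(aes(group = id), colour = "grey60") +
  geom_point(aes(colour = gender, shape = time)) +
  geom_smooth(method = "lm", formula = y ~ poly(x, 2), se = FALSE,
              colour = "black") +
  labs(x = "Age (years)", y = "FA")

# One subplot per gender: only the facet specification is added
p_gender <- p + facet_wrap(~ gender)
print(p_gender)

# Two facet variables: one panel per gender x time combination (2 x 2)
p_grid <- ggplot(long, aes(x = age, y = fa, colour = gender)) +
  geom_point(aes(shape = time)) +
  geom_smooth(method = "lm", formula = y ~ poly(x, 2), se = FALSE) +
  facet_grid(gender ~ time)
print(p_grid)
```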

Finally, we show how this idea extends to allow for any arbitrary element to be customized for each subplot:
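One way this kind of per-panel customization can be done, sketched on hypothetical simulated data: compute something separately for each facet level (here, the R² of a quadratic fit per gender, an illustrative choice rather than the annotation used in the original figures) and pass it to geom_text() as a small data frame. Because that data frame contains the facet variable, ggplot2 places each label in its own panel.

```r
library(ggplot2)

# Hypothetical simulated data: gender at random, baseline age uniform on
# 0-30 years, follow-up 1-5 years later, FA from an assumed saturating curve
set.seed(1)
n    <- 40
age1 <- runif(n, min = 0, max = 30)
long <- data.frame(
  id     = factor(rep(seq_len(n), times = 2)),
  gender = factor(rep(sample(c("F", "M"), n, replace = TRUE), times = 2)),
  time   = factor(rep(c("1", "2"), each = n)),
  age    = c(age1, age1 + runif(n, min = 1, max = 5))
)
long$fa <- 0.5 - 0.2 * exp(-long$age / 8) + rnorm(2 * n, sd = 0.02)

# One label per gender; the gender column ties each row to its panel
labels <- do.call(rbind, lapply(split(long, long$gender), function(d) {
  fit <- lm(fa ~ poly(age, 2), data = d)
  data.frame(gender = d$gender[1],
             label  = sprintf("R^2 == %.2f", summary(fit)$r.squared))
}))

p <- ggplot(long, aes(x = age, y = fa)) +
  geom_line(aes(group = id), colour = "grey60") +
  geom_point(aes(colour = gender, shape = time)) +
  geom_smooth(method = "lm", formula = y ~ poly(x, 2), se = FALSE,
              colour = "black") +
  facet_wrap(~ gender) +
  # Top-left corner of each panel; parse = TRUE renders R^2 as plotmath
  geom_text(data = labels, aes(x = -Inf, y = Inf, label = label),
            parse = TRUE, hjust = -0.1, vjust = 1.5)
print(p)
```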

And the same thing showing formulas instead:

Notes

  • Plot symbols: Using base, these and other plotting aesthetics (colour, symbol, size) must be specified directly for each point. Using ggplot2, you simply map a variable to the aesthetic, and it works out sensible defaults and assigns them to each point.
  • Grouping: Using base, it is cumbersome to draw the many line segments that join each subject's points. Normally this would be done with many calls to lines(), slicing the data up by subject with a loop or one of the apply() functions; if there are only 2 points per subject, a single call to segments() will do. Using ggplot2, this is as simple as specifying a grouping variable.
  • Trendlines: ggplot2 can overlay simple trendlines (linear models, etc.) without your having to fit the model and generate its predictions separately, as you would in base (e.g., lm(), then predict() and lines()).
  • Legends: Using base, legends must be constructed manually. Legends are created automatically by ggplot2.
  • Editing: If you need to change some aspect of a base graphics plot, all of the code that generates the plot has to be run again. With ggplot2, you just update the stored plot object and redraw it.
  • Portability: Similarly, it is easy with ggplot2 to swap out variables assigned to different aspects of the plot, or to run the same visual program on an entirely new set of data.
  • Facets: This is the nicest feature of ggplot2, in my opinion. If you want subplots for different groupings in the results (e.g., by gender), all you have to do is specify one or two facet variables. In base, subsetting the data and plotting would have to be done manually for each subplot. Now imagine you had 2 factor variables with 6 levels each: ggplot2 would generate the 36 subplots (plus custom annotations) for all combinations of these levels, all without modifying the script!
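The editing and portability points can be sketched as follows, on hypothetical simulated data. Because a ggplot2 plot is an ordinary R object, later changes are made by adding components to it rather than rerunning the whole plotting script; adding aes() replaces the global mappings. Wrapping the plot in a function is one simple way (an assumption here, not the only one) to run the same visual program on new data; plot_fa() is a hypothetical helper name.

```r
library(ggplot2)

# Hypothetical simulated data: gender at random, baseline age uniform on
# 0-30 years, follow-up 1-5 years later, FA from an assumed saturating curve
set.seed(1)
n    <- 40
age1 <- runif(n, min = 0, max = 30)
long <- data.frame(
  id     = factor(rep(seq_len(n), times = 2)),
  gender = factor(rep(sample(c("F", "M"), n, replace = TRUE), times = 2)),
  time   = factor(rep(c("1", "2"), each = n)),
  age    = c(age1, age1 + runif(n, min = 1, max = 5))
)
long$fa <- 0.5 - 0.2 * exp(-long$age / 8) + rnorm(2 * n, sd = 0.02)

# Global mappings, so that later edits can swap them out
p <- ggplot(long, aes(x = age, y = fa, colour = gender, shape = time)) +
  geom_line(aes(group = id), colour = "grey60") +
  geom_point()

# Editing: add or replace components on the stored object, then redraw
p_edit <- p + labs(x = "Age (years)", y = "FA") + theme_bw()
print(p_edit)

# Portability: swap which variables drive colour and shape, and facet
p_swap <- p + aes(colour = time, shape = gender) + facet_wrap(~ gender)
print(p_swap)

# Same visual program on a different dataset (plot_fa is hypothetical)
plot_fa <- function(d) {
  ggplot(d, aes(x = age, y = fa, colour = gender, shape = time)) +
    geom_line(aes(group = id), colour = "grey60") +
    geom_point()
}
subset_ids <- levels(long$id)[1:20]   # whole subjects, both visits kept
print(plot_fa(long[long$id %in% subset_ids, ]))
```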

Code

Last modified: 2011/06/07 2:36 pm PDT by John Colby
 
Except where otherwise noted, content on this wiki is licensed under the following license: CC Attribution-Noncommercial-Share Alike 3.0 Unported