Data Analysis, Graphics and Visualisation Using R

Graphs bring data and data analyses to life.  They are an important, even essential, tool for extracting meaning from data and from the results of data analyses.  Before, during, and after an analysis, they provide essential checks on the key assumptions that underlie it.  In the recent past, there have been dramatic advances in what is available, with the R system bringing many of the new abilities together within a common framework.

 

R is the leading tool for statistics, data analysis, machine learning and statistical graphics. It is supported by an active community of thousands of developers and contributors, and more than 2 million users. It has become the environment of choice for the implementation of new techniques, with over 6000 modules, and more added every day, covering the methods of every discipline from anthropology to zoology. The powerful and innovative graphics abilities available in R include well-designed, publication-quality plots that can include mathematical symbols and formulae.

 
Level 3 - runs over 5 days
Instructor: 

Following a first in Mathematics at Auckland University and a variety of teaching and lecturing positions, John Maindonald settled down to working with other researchers as a quantitative problem solver. Until his move from New Zealand to Australia in 1996, much of his work was in research on plants, fruit, and insects and other pests, with industrial consulting as a sideline. He took up a position at The Australian National University (ANU) in 1998.  At ANU he has relished the stimulus of working with biologists (including molecular biologists), ecologists, epidemiologists, public health researchers, demographers, computer scientists, numerical analysts, machine learners, an economic historian, forensic linguists, and a lively group of statisticians. He is the author of a book on Statistical Computation.  He is the senior author of "Data Analysis and Graphics Using R". This example-based exposition of practical approaches to data analysis, now in its third edition, has sold more than 10,000 copies.  Now in semi-retirement, he does occasional consulting, and fronts workshops on the use of the open source R system for scientific and statistical applications and for graphics.

Course dates: Monday 8 February 2016 - Friday 12 February 2016
Course status: Course completed (no new applicants)
Week: 
Week 3
About this course: 

This course will emphasise insights that graphs can provide on data analysis and on data analysis results, rather than data analysis as such.

 

It will cover:
1) The use of graphs for exploratory data analysis, as a prelude to more formal analysis, for checking on analysis results, and for presenting results.  Regression calculations will be a particular focus.
2) Dynamic displays of 3-dimensional data.
3) The use of Shiny, with RStudio, to create HTML displays where graphs can be manipulated dynamically.
4) The creation of Hans Rosling style Motion Charts. See https://www.gapminder.org/upload-data/motion-chart/
5) The creation and use of network graphs.
6) The overlaying of plots onto Google maps or Google Earth displays, with the ability to manipulate the resulting display dynamically.

Topics (3), (5) and (6) will be introductory in scope only, designed to indicate what is possible.  References and links will be given that may serve as starting points for further study.

 

Concepts and understanding that are important for the use of R will be introduced in the context of data exploration, regression calculations and graphics.  The introduction that the course provides to the R system will be helpful in the use of R more generally. Intending participants are encouraged to work through the introductory notes on the R system that are linked below (see Recommended Texts). There will be some limited use of the graphical user interface provided by the R Commander package for R. Most use of R will, however, be from the command line, using the highly attractive RStudio "interactive display environment".

This course is intended for all, whether in industry or business or academia, who wish to update their graphics and associated quantitative methods skills.  The course has especial relevance to early career researchers aiming to establish a solid foundation for lifelong skill development.

 

Course syllabus: 

Day 1
Check on R installations; review basics of R and RStudio, including data input and output, data manipulation, and simple graphics.  Show how to create code notebooks. Introduce the use of R Markdown for project documentation and for report generation.

 

Day 2
Continue review of R basics.  Exploratory data analysis, using base R and using Lattice graphics.

 

Day 3
Dynamic displays of 3-dimensional data.  The ggplot2 “grammar of graphics”, and ggmap.  Graphics for regression.

 

Day 4
Hans Rosling style motion charts.  Graphics for regression, continued.  Network graphs.

 

Day 5
Maps, and overlaying onto Google maps.  Review of earlier days.  Some topics that were introduced on earlier days can be taken up in more detail, as time allows and participants request.

Course format: 

Participants must bring their own laptops; assistance with installing the software will be provided in advance of the course. Data will also be provided. Anyone with data that seems suitable for use in the course is invited to contact the lecturer with details.

For information on relevant components of the R system, and on preparation for this course, go to: https://maths-people.anu.edu.au/~johnm/r-courseprep.html

Recommended Background: 

Participants should know the principles of multiple regression at a level comparable to that provided by the Fundamentals of Multiple Regression course, and should have previous experience of data analysis using SPSS, SAS, Stata, R, or another system with comparable abilities. Participants must be comfortable with typing commands at the command line. You should have prior exposure to R, or be willing to gain some prior familiarity with R in the weeks leading up to the course.

Recommended Texts: 

For information on relevant components of the R system, and on preparation for this course, go to: http://maths-people.anu.edu.au/~johnm/r-courseprep.html

The Maindonald and Braun text noted below covers a substantial part of the course content, and will be useful for supplementary reading. Arrangements will be made for course participants to purchase this text at a discounted cost.

  • Maindonald, J.H. and Braun, W.J. Data Analysis and Graphics Using R – An Example-Based Approach. Cambridge University Press 2010. A review written by an enthusiastic reader can be found at http://r4stats.com/articles/book-reviews/

 

On the growing popularity of R, relative to other software, see http://r4stats.com/articles/popularity/

Course fees
Member: 
$1,870
Non Member: 
$3,485
Full time student Member: 
$1,870
FAQ: 

Q: Do I have to have any prerequisites to do this course?

A: Yes, you should know the principles of Multiple Regression, have experience with a data analysis package (SPSS, SAS, Stata or R) and be comfortable typing commands at the command line. You should have prior exposure to R, or be willing to gain some prior familiarity with R in the weeks leading up to the course.

Participant feedback: 

Good mix of practice and discussion and explanation. (Summer 2013)

 

Great to be equipped with better ways to do analysis (Summer 2013)

 

Overview of R package identified gaps in my knowledge of statistical concepts and their implication (specifically reg diagnostics) (Summer 2013)

 

Got what I wanted/expected (Summer 2014)

Program: 
Summer Program 2016
Notes: 

The instructor's bound, book-length course notes will serve as the course text.