Learning R: Open Source (Free) Stats Package

(this course was previously entitled 'Introduction to R, to R Graphics, and to Reproducible Reporting with R')

 

R is the leading tool for statistics, data analysis, machine learning and statistical graphics. It is supported by an active community of thousands of developers and contributors, and more than 2 million users. It has become the environment of choice for the implementation of new techniques, with over 6000 add-on packages, and more added every day, covering the methods of every discipline from anthropology to zoology. Its powerful and innovative graphics abilities are a particular attraction.

 

 

This course will begin by discussing R system setup, the use of the RStudio interface, the R language, and data input and output in R. The discussion will then move on to the use of R for data manipulation, for simple forms of data summary and tabulation, for graphics, and for elementary data analysis. Attention will be given to the use of R for one-sample t-tests, two-sample t-tests, analysis of variance for one-way comparisons, and simple uses of regression methods. The focus will, however, be on the use of graphics for exploratory data analysis, rather than on formal use of statistical tests.

Use of R will, throughout the course, be from the innovative RStudio interface. RStudio has impressive abilities for project management, for maintaining a record of work, and for reproducible reporting. RStudio’s project management abilities make it possible to switch back to an earlier project at the click of a mouse button, with the working environment returned to its state at the time of leaving the project.

This course is intended for data analysts and early career researchers aiming to establish a solid foundation for lifelong skill development. The emphasis of the course is on gaining or extending familiarity with R, on the use of R for simple types of statistical analysis and graphical display, on project management, and on the use of RStudio’s abilities for reproducible recording and reporting of what has been achieved.

 

This course is suitable preparation for the course Data Analysis, Graphics and Visualization Using R.

 

 

 

COURSE SYLLABUS

 

Day 1
Key R language ideas and terminology
Sources of help for use of R
Introduction to RStudio
The R Working Environment
Data input

 

Day 2
Comparison of two groups of data
Data exploration and first steps in regression
Data structures: column objects and data frames
Base vs Lattice graphics; addition of layers to lattice graphs

 

Day 3
ggplot2 graphics: layers are everywhere
Input of data from other packages and from the internet
(Mainly, be aware of these abilities, which are advancing rapidly)
Matrices, data frames and tables
Regression

 

Day 4
Regression, continued
Data summary & manipulation
Google’s Public Data Explorer & the googleVis package
Logistic and Poisson regression (if there is interest)

 

Day 5
Review of topics from earlier days
Topics from earlier days covered in more detail, or special topics (as negotiated with the tutor)

 

 

 

 

 
Level 2 - runs over 5 days
Instructor: 

Following a first in Mathematics at Auckland University and a variety of teaching and lecturing positions, John Maindonald settled down to working with other researchers as a quantitative problem solver. Until his move from New Zealand to Australia in 1996, much of his work was in plant, fruit, insect and other pest research, with industrial consulting as a sideline. He took up a position at The Australian National University (ANU) in 1998. At ANU he has relished the stimulus of working with biologists (including molecular biologists), ecologists, epidemiologists, public health researchers, demographers, computer scientists, numerical analysts, machine learners, an economic historian, forensic linguists, and a lively group of statisticians. He is the author of a book on Statistical Computation. He is the senior author of "Data Analysis and Graphics Using R". This example-based exposition of practical approaches to data analysis, now into its third edition, has sold more than 10,000 copies. Now in semi-retirement, he does occasional consulting, and fronts workshops on the use of the open source R system for scientific and statistical applications and for graphics.

Course dates: Monday 28 September 2015 - Friday 2 October 2015
Course status: Course completed (no new applicants)
Week: 
Week 1
Recommended Background: 

You should have an understanding of elementary statistics equivalent to the syllabus of Fundamentals of Statistics.

You should be comfortable finding your way around the file system on your computer, whether it runs Microsoft Windows, Macintosh OS X, or Linux. You should install R on your system and work through the preparatory exercises that are available from the website noted below.

 

 

Recommended Texts: 

The instructor's bound, book-length course notes will serve as the course text.

You are encouraged to work through the introductory notes on the R system that are linked below. There will be some limited use of the graphical user interface provided by the R Commander package for R. Most use of R will, however, be from the command line, using RStudio. https://maths-people.anu.edu.au/~johnm/rrr-courseprep.html

 

On the growing popularity of R, relative to other software, see http://r4stats.com/articles/popularity/

 

 

The Maindonald and Braun text noted below will be useful for supplementary reading. Arrangements will be made for course participants to purchase this text at a discounted cost.

Maindonald, J.H. and Braun, W.J. Data Analysis and Graphics Using R – An Example-Based Approach. Cambridge University Press 2010. A review written by an enthusiastic reader can be found at http://r4stats.com/articles/book-reviews/

Course fees
Member: 
$1,870
Non Member: 
$3,485
Full time student Member: 
$1,870
Program: 
Spring Program 2015
Notes: 

You must bring your own laptop to this course. Assistance with installing the software will be provided in advance of the course. Data will also be provided.

If you can provide your own data in advance of the course, and it is suitable for the methods covered in the course, you will have the opportunity to analyze it and discuss the output.

For information on relevant components of the R system, on preparation for this course, and on computer setup, go to:

https://maths-people.anu.edu.au/~johnm/rrr-courseprep.html