Learning R: Open Source (Free) Stats Package

(this course was previously entitled 'Introduction to R, to R Graphics, and to Reproducible Reporting with R')

 

Participants must bring their own laptop. Assistance with installing the software will be provided in advance of the course.

 

This course is intended for data analysts and early career researchers aiming to establish a solid foundation for lifelong skill development.  The emphasis of the course is on gaining or extending familiarity with R, on the use of R for simple types of statistical analysis and graphical display, on project management, and on the use of RStudio’s abilities for reproducible recording and reporting of what has been achieved.

 

R is the leading tool for statistics, data analysis, machine learning and statistical graphics. It is supported by an active community of thousands of developers and contributors, and more than 2 million users. It has become the environment of choice for the implementation of new techniques, with over 2000 add-on packages, and more added every day, covering the methods of every discipline from anthropology to zoology. Its powerful and innovative graphics abilities are a particular attraction.

 

Simple forms of data summary and tabulation will be covered. Other topics will include the use of R for one-sample and two-sample t-tests, analysis of variance for one-way comparisons, and simple uses of regression methods. The focus, however, will be on graphics for exploratory data analysis rather than on the formal use of statistical tests.
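As a rough taste of these topics, the sketch below runs each type of analysis on datasets that ship with base R (`mtcars`, `PlantGrowth` and `cars`). The choice of datasets and variable names is an assumption made for illustration. It is not drawn from the course notes.

```r
# Illustrative sketch only: the datasets below ship with base R and are
# assumptions for this example, not the course's own data.

# Data summary and tabulation
mpg_summary <- summary(mtcars$mpg)          # quartiles, mean, and range
group_counts <- table(PlantGrowth$group)    # observations per treatment group

# One-sample t-test: is mean fuel economy different from 20 mpg?
one_sample <- t.test(mtcars$mpg, mu = 20)

# Two-sample t-test (Welch, R's default): automatic vs manual transmission
two_sample <- t.test(mpg ~ am, data = mtcars)

# One-way analysis of variance: plant weight across three groups
fit_aov <- aov(weight ~ group, data = PlantGrowth)
aov_table <- summary(fit_aov)

# Simple linear regression: stopping distance on speed
fit_lm <- lm(dist ~ speed, data = cars)

# Exploratory graphics, which the course emphasises over formal tests
boxplot(weight ~ group, data = PlantGrowth,
        xlab = "Treatment group", ylab = "Dried plant weight")
plot(dist ~ speed, data = cars,
     xlab = "Speed (mph)", ylab = "Stopping distance (ft)")
abline(fit_lm)   # overlay the fitted regression line
```

Typing the name of any stored object at the R prompt (for example `two_sample` or `aov_table`) prints the corresponding test output.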

Use of R will, throughout the course, be from the innovative RStudio interface.  RStudio has impressive abilities for project management, for maintaining a record of work, and for reproducible reporting.  RStudio’s project management abilities make it possible to switch back to an earlier project at the click of a mouse button, with the working environment returned to its state at the time of leaving the project.

A first step in reproducible analysis and reporting is to maintain a reproducible record of all calculations that led to the reported result. The “markup” document approach, as implemented in R and RStudio, goes further. Code, surrounded by markup that controls how it will be used, is embedded into the text of the report. This “markup” document is then processed through R to give a web browser version of the report, complete with tables and graphs and other computer output. A further step yields a Microsoft Word, OpenOffice, or other such document that can be polished to give the final report.

Access to the markup document enables a co-worker, a referee, a reader of the final paper, or the author at a later point in time to reproduce the steps that contributed to the final report or paper. If corrections are made to the data, or analysis details are changed, this is immediately reflected when the markup document is processed into a revised report. Funding agencies are increasingly likely to demand some form of reproducible reporting. It is the wave of the future. Participants in this course will learn to practice reproducible reporting while learning or advancing their knowledge of R.
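One way to picture the markup document approach is the minimal sketch below. It assumes the R Markdown format and the `rmarkdown` package, neither of which is named in this description. The file name and report text are hypothetical. The script writes a small markup document with one embedded code chunk, then renders it to a web page and a Word document if the required tools are installed.

```r
# Illustrative sketch only: the file name, report text, and the use of the
# rmarkdown package are assumptions, not details taken from the course.

fence <- strrep("`", 3)   # code-chunk delimiter used by R Markdown

report_lines <- c(
  "---",
  "title: \"Stopping distance report\"",
  "---",
  "",
  "Stopping distance increases with speed, as the plot shows.",
  "",
  paste0(fence, "{r speed-plot}"),
  "fit <- lm(dist ~ speed, data = cars)",
  "plot(dist ~ speed, data = cars)",
  "abline(fit)",
  fence
)

rmd_path <- file.path(tempdir(), "report.Rmd")
writeLines(report_lines, rmd_path)

# Render only if rmarkdown and pandoc are available on this machine.
can_render <- requireNamespace("rmarkdown", quietly = TRUE) &&
  rmarkdown::pandoc_available()
if (can_render) {
  rmarkdown::render(rmd_path, output_format = "html_document", quiet = TRUE)
  rmarkdown::render(rmd_path, output_format = "word_document", quiet = TRUE)
}
```

Because the plot is produced by code inside the document, changing the data or the model and rendering again regenerates the report with the revised output.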

Intending participants are encouraged to work through the introductory notes on the R system that are linked below. There will be some limited use of the graphical user interface provided by the R Commander package for R. Most use of R will, however, be from the command line, using RStudio.

This course is suitable preparation for the course Data Analysis, Graphics and Visualization Using R.

 

Data will be provided. Participants who can supply their own data in advance of the course will, if the data are suitable for the methods covered, have the opportunity to analyze them and discuss the output.

 

Notes will be provided.   For information on relevant components of the R system, on preparation for this course, and on computer setup, go to:

https://maths-people.anu.edu.au/~johnm/rrr-courseprep.html

 

 
Level 2 - runs over 5 days
Instructor: John Maindonald

Following a first in Mathematics at Auckland University and a variety of teaching and lecturing positions, John Maindonald settled down to working with other researchers as a quantitative problem solver. Until his move from New Zealand to Australia in 1996, much of his work was in plant, fruit, insect and other pest research, with industrial consulting as a sideline. He took up a position at The Australian National University (ANU) in 1998. At ANU he has relished the stimulus of working with biologists (including molecular biologists), ecologists, epidemiologists, public health researchers, demographers, computer scientists, numerical analysts, machine learners, an economic historian, forensic linguists, and a lively group of statisticians. He is the author of a book on Statistical Computation. He is the senior author of "Data Analysis and Graphics Using R". This example-based exposition of practical approaches to data analysis, now in its third edition, has sold more than 10,000 copies. Now in semi-retirement, he does occasional consulting, and fronts workshops on the use of the open source R system for scientific and statistical applications and for graphics.

Course dates: Monday 30 June 2014 - Friday 4 July 2014
Course status: Course completed (no new applicants)
Week: Week 1
Recommended Background: 

Participants should have an understanding of elementary statistics equivalent to the syllabus of ‘Fundamentals of Statistics’.

Participants should be comfortable finding their way around the file system on their computer (Microsoft Windows, Macintosh OS X, or Linux).

Course fees
Member: $1,800
Non Member: $3,230
Full time student Member: $1,800