R Software for Data Science: Online - (3 days)

This masterclass offers a step-by-step, interactive introduction to R and RStudio for participants with no experience with these software packages.

 

This masterclass is part of the ACSPRI suite of courses in social data science and is specially designed for those who want to learn how to use R for data manipulation and statistical analysis.

 

This course will be run over 3 days (2 days in Week 1 and 1 day in Week 2), using the following timetable:

 

Day 1

  • 9.30 am - 10.00 am - Introductions and setup check
  • 10.00 am - 11.30 am - Instructional Zoom Session
  • 12.30 pm - 2.00 pm - Instructional Zoom Session
  • 3.00 pm - 5.00 pm - Instructional Zoom Session and Exercises

 
Days 2 and 3

  • 10.00 am - 11.30 am - Instructional Zoom Session
  • 12.30 pm - 2.00 pm - Instructional Zoom Session
  • 3.00 pm - 5.00 pm - Instructional Zoom Session and Exercises

Master Class - runs over 3 days
Instructor: 

Dr Joanna Dipnall is an applied statistician with interests in advanced statistical methods, including machine learning and deep learning techniques. She completed her Honours in Econometrics at Monash University and her PhD with IMPACT SRC, School of Medicine, Deakin University. Joanna works extensively with registry and linked medical data and collaborates closely with the Faculty of IT at Monash, supervising Masters and PhD students to integrate artificial intelligence within health research. Joanna teaches within the Monash Biostatistics Unit and is the Unit Co-coordinator for the Monash Masters of Health Data Analytics course. Joanna has taught advanced statistical methods for many years at universities and for ACSPRI.

About this course: 

One of the key skills in data science is making effective use of software for manipulating data and generating results. R is an established software environment widely used in data science. In this course, you will be introduced to basic data wrangling, descriptive statistics, visualisation and reporting of results. Key R data science libraries such as dplyr and ggplot2 will be introduced.

Upon completion of this masterclass, you will have the skills required to load different types of data files into R, manage and manipulate your data, build visualisations and produce a basic report. The course is relevant to researchers and data analysts in any area of research who want to use R for their research work. It aims to introduce the foundations of R and build confidence in the use of R.

 

Course syllabus: 

 

Day 1

  • Introduction to R
  • Installing and loading libraries
  • Data structures in R (vectors, matrices, data frames)
  • Descriptive statistics
  • Tabulations
  • Exercises

 

Day 2

  • Introduction to data wrangling
  • Recoding variables
  • Generating new variables
  • Filtering data frames (rows and/or columns)
  • Merging and appending data
  • Exercises and homework

 

Day 3

  • Review of homework and quiz
  • Basic graphs
  • Extending graphs with ggplot
  • Creating your first analysis report using Markdown files (tables, graphs)
  • Exercises

 

 

Course format: 

This course will be run online over 2 weeks, with Days 1 and 2 in the first week and Day 3 the following week. Homework will be provided for participants to complete between Day 2 and Day 3, with a quiz to be completed prior to Day 3.

 

Participants will require their own computers, with R and RStudio already installed. They will also need internet access to download R libraries. This course will be taught in the PC environment, but Mac users are welcome.

 

Please note that due to the short 3-day structure, there will not be any time set aside for analysing participants’ own data.

 

 

 

Recommended Background: 

This course assumes that participants have:

 

  1. A basic understanding of statistical concepts including descriptive statistics (mean, median and interquartile range),
  2. Some familiarity with a PC/Mac environment including keyboard skills,
  3. An understanding of folder and file structures in the PC/Mac environment, and
  4. Some experience in using Microsoft Word and Excel or their equivalent.

Recommended Texts: 

 

Data Analysis and Graphics Using R by John Maindonald and W. John Braun.

 

Discovering Statistics Using R by Andy Field and Jeremy Miles.
