Machine Learning for Data Science: Supervised Learning Techniques: Online (3 days)

This masterclass is an introduction to supervised machine learning techniques for data science. It will provide an interactive step-by-step guide to running some of the standard statistical regression and classification machine learning models that every data scientist should know. This course will use the R software.

This masterclass is part of the ACSPRI suite of courses in social data science and is specially designed for those who want a gentle introduction to supervised machine learning models in data science.

This course runs over 3 days (2 days in Week 1 and 1 day in Week 2) on the following timetable:

Day 1

  • 9.30 am - 10.00 am - Introductions and setup check
  • 10.00 am - 11.30 am - Instructional Zoom Session
  • 12.30 pm - 2.00 pm - Instructional Zoom Session
  • 3.00 pm - 5.00 pm - Instructional Zoom Session and Exercises

Days 2 and 3

  • 10.00 am - 11.30 am - Instructional Zoom Session
  • 12.30 pm - 2.00 pm - Instructional Zoom Session
  • 3.00 pm - 5.00 pm - Instructional Zoom Session and Exercises

Masterclass - runs over 3 days
Instructor: 

Dr Joanna Dipnall is an applied statistician with interests in advanced statistical methods, including machine learning and deep learning techniques. She completed her Honours in Econometrics at Monash University and her PhD with IMPACT SRC, School of Medicine, Deakin University. Joanna works extensively with registry and linked medical data, and collaborates closely with the Faculty of IT at Monash to supervise Masters and PhD students integrating artificial intelligence into health research. Joanna teaches within the Monash Biostatistics Unit and is the Unit Co-coordinator for the Monash Masters of Health Data Analytics course. Joanna has taught advanced statistical methods for many years at universities and for ACSPRI.

About this course: 

Machine Learning techniques are becoming increasingly popular across a broad range of research areas and are a necessary skill for the serious data analyst. This branch of artificial intelligence is concerned with algorithms that learn from data, guided by performance measures. Supervised machine learning algorithms are trained on "labelled" data, in which the output of interest is already known, so that they can learn to predict that output. Once training is complete, the model is applied to a separate test data set, and its predictions are compared with the known outputs to measure the performance of the algorithm. Depending on the type of output, these models are classed as either regression algorithms (numeric outputs) or classification algorithms (categorical outputs). This is an introductory course with a primary focus on applying specific machine learning techniques, rather than on the complex mathematical and statistical theory behind the algorithms.
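
The train-then-test workflow described in the paragraph above can be sketched in a few lines. The course itself works in R; this illustration uses plain Python with only the standard library so that it runs without extra packages. The data are simulated from a hypothetical relationship (y = 2x + 1 plus random noise), and the helper names (train_test_split, fit_linear_regression, rmse) are illustrative only, not taken from any course material. A single-predictor least-squares regression is fitted on the labelled training rows and then judged by its root mean squared error (RMSE) on held-out test rows.

```python
import random


def train_test_split(data, test_fraction=0.25, seed=42):
    """Shuffle labelled (x, y) pairs and split them into training and test sets."""
    rng = random.Random(seed)
    shuffled = data[:]  # copy so the caller's list is not reordered
    rng.shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]  # (train, test)


def fit_linear_regression(train):
    """Ordinary least squares for one predictor: y = intercept + slope * x."""
    n = len(train)
    mean_x = sum(x for x, _ in train) / n
    mean_y = sum(y for _, y in train) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in train)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in train)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def predict(model, x):
    """Predicted output for a single input x."""
    intercept, slope = model
    return intercept + slope * x


def rmse(model, test):
    """Root mean squared error of the model's predictions on held-out data."""
    squared_errors = [(y - predict(model, x)) ** 2 for x, y in test]
    return (sum(squared_errors) / len(squared_errors)) ** 0.5


# Hypothetical labelled data: y = 2x + 1 plus Gaussian noise (sd 0.5)
noise = random.Random(0)
data = [(x, 2 * x + 1 + noise.gauss(0, 0.5)) for x in range(40)]

train, test = train_test_split(data)   # 30 training rows, 10 test rows
model = fit_linear_regression(train)   # learn from labelled training data only
print(f"intercept={model[0]:.2f} slope={model[1]:.2f} test RMSE={rmse(model, test):.2f}")
```

Because the test rows are never used during fitting, the test RMSE estimates how well the model predicts new data. A classification model would follow the same workflow, with a metric such as accuracy in place of RMSE.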

Upon completion of this masterclass, you will have the skills required to confidently run a set of standard supervised machine learning tasks using the R software platform. You will understand when each type of model is appropriate and be able to justify your choice of model using key machine learning performance measures. The workshop is relevant to researchers and data analysts in any research area who want to use machine learning algorithms in their work.

Course syllabus: 

Day 1:

  • Fundamentals of Machine Learning
  • Machine Learning workflow
  • Different Machine Learning algorithms
  • Feature engineering
  • Common Machine Learning Metrics
  • Model tuning and over-fitting

Day 2:

  • Linear regression methods
  • Classification methods
  • Tree-based methods
  • Resampling techniques
  • Ensemble methods
  • Exercises

Day 3:

  • Review of homework and quiz
  • Market basket analysis
  • Neural networks
  • Support vector machines
  • Use and reporting of supervised machine learning models in publications
  • Exercises

Course format: 

The course runs online over 2 weeks, with Days 1 and 2 in the first week and Day 3 in the following week. Participants will receive homework to complete during the week between Day 2 and Day 3, along with a quiz to be completed before Day 3.

Participants will need their own computers with R and RStudio installed. They will also need internet access to download R packages. The course will be taught in a PC environment, but Mac users are welcome.

Please note that, due to the short 3-day structure, there will be no time set aside for analysing participants’ own data.

Recommended Background: 

This course assumes that participants have:

  1. A basic understanding of statistical concepts, including descriptive statistics (mean, median and interquartile range) and regression analysis.
  2. A sound knowledge of using the R and RStudio software.
  3. Some familiarity with a PC/Mac environment including keyboard skills.
  4. An understanding of folder and file structures in the PC/Mac environment, and
  5. Some experience in using Microsoft Word and Excel or their equivalent.

Recommended Texts: 

The Elements of Statistical Learning: Data Mining, Inference, and Prediction, Second Edition, by Trevor Hastie, Robert Tibshirani and Jerome Friedman.

An Introduction to Statistical Learning: with Applications in R, by Gareth James, Daniela Witten, Trevor Hastie and Robert Tibshirani.

Mastering Machine Learning with R: Advanced Prediction, Algorithms, and Learning Methods with R 3.x, Second Edition, by Cory Lesmeister.