This course is designed as an applied introduction to the use of the Stata software for Machine Learning (ML) techniques.
This course will be offered online via Zoom and will run to the following timetable:
9.30am-11.00am: Instructional Zoom session
11.00am-11.30am: Break
11.30am-12.30pm: Instructional Zoom session
12.30pm-1.30pm: Lunch
1.30pm-3.00pm: Instructional Zoom session
3.00pm-3.30pm: Break
3.30pm-4.30pm: Exercises
Please note: Courses will run on Australian Eastern Daylight Time (GMT+11)
(i.e. Melbourne, Sydney and Canberra daylight saving time)
Dr Joanna Dipnall is a biostatistician with the School of Public Health and Preventive Medicine (SPHPM) at Monash University and an Honorary Research Fellow with the School of Medicine at Deakin University. She holds a B.Ec (Honours) from Monash University and a PhD from the School of Medicine at Deakin University. She also lectures and tutors with the Department of Statistics, Data Science and Epidemiology at Swinburne University. Joanna has developed a novel Risk Index for Depression (RID), using SEM and machine learning techniques, that brings together five key determinants of depression. She has taught Stata software for over 15 years, training across Australia and overseas, and was a member of the Scientific Committee for the Oceania Stata Users Group Meeting in 2017.
Machine Learning techniques are becoming increasingly popular across areas of research, from computer science to various disciplines of medicine. This branch of artificial intelligence concerns algorithms that learn from data, guided by specific tasks and performance measures. This introductory applied course uses Stata software to run various ML algorithms. It will use some Stata commands that are built into the base system and others that are user-written, developed in response to the increasing use of ML. Classification, prediction and model selection issues will be discussed. Detailed notes with worked examples and references will be provided as a basis for both the lecture and hands-on computing aspects of the course.
Please note that this course will use Stata V16.
This course primarily focusses on the application of specific ML techniques rather than the complex mathematics behind the ML algorithms, and is broken up into six parts:
Part I: Fundamentals of Machine Learning
Part II: Machine Learning Techniques and Work Flow
Part III: Decision Trees & Random Forests
Part IV: Boosted Regression
Part V: Support Vector Machines
Part VI: Lasso Regression
At the end of each day, participants will be given time to do some ML exercises on their own to practise what they have learned.
This workshop will be delivered online, so you will need your own computer with Stata installed. If you don't have a copy of Stata, please let us know in advance and we will organise a trial version for the course.
This course assumes that participants have (1) familiarity with the Stata command language; (2) sufficient understanding of statistics to comprehend the material covered in the course outline, such as a basic grounding in multiple regression (e.g. linear, logistic, Poisson) and in dimension-reduction and clustering techniques (e.g. principal components analysis, k-means clustering); (3) access to Stata V16; (4) some experience in using Microsoft Word and Excel or their equivalent; and (5) experience using a text editor such as Notepad.
Course notes will be supplied.
No specific references are suggested, but a number will be supplied with the course notes.
Stata is distributed in Australia and New Zealand by Survey Design and Analysis Services.