Data Analysis using Stata

The aim of the course is to provide participants with the understanding and experience needed to undertake a basic research project in the social or health sciences using Stata as the statistical tool. Stata is a comprehensive integrated package for data management, analysis and graphics. Stata version 13 has a comprehensive graphical user interface (GUI). The course will be presented in a way that introduces beginners to survey research and at the same time extends the capabilities of more experienced researchers. Sample datasets will be provided, but participants are encouraged to bring some of their own data, in Excel or ASCII format, for analysis. Teaching and practice will be closely integrated, and individual assistance will be provided as needed.

Preparing Stata datasets. Introduction to the Stata system. Data analysis and session management. Looking at Stata datasets. Sources of help. Basic commands. Modifying data, editing, recoding, checking and tidying. Stata do-files (syntax files). Generating new variables. Inputting data into Stata. Introduction to Stata graphics. Outputting results to Word etc. Handling strings and dates. Handling missing data.
Starting the analysis. Initial univariate analysis: frequency distributions, exploratory data analysis. Initial bivariate analysis: cross-tabulations, correlations. t-tests and analysis of variance. Developing scales and indices: summated scales, factor analysis, alpha coefficient. More graphics, including scatterplots and box plots.

Regression analysis. Introduction to regression analysis: ordinary least squares. Checking assumptions with regression diagnostics. More graphics including regression diagnostics. Basic introduction to logistic regression.
Analysis of survey data. Introduction to sampling for surveys. Weighting observations. Analysis of survey data.
Sundry topics. Advanced dataset management: merge, collapse, reshape. Additional (user-contributed) Stata procedures. Checking and archiving data.

Level 2 - runs over 5 days

Dr Joanna Dipnall is a biostatistician with the School of Public Health and Preventive Medicine (SPHPM) at Monash University and an Honorary Research Fellow with the School of Medicine at Deakin University. She holds a B.Ec(Honours) from Monash University and a PhD from the School of Medicine at Deakin University. She also lectures and tutors with the Department of Statistics, Data Science and Epidemiology at Swinburne University. Joanna has developed a novel Risk Index for Depression (RID), using SEM and machine learning techniques, that brings together five key determinants of depression. She has taught Stata for over 15 years, training across Australia and overseas, and was a member of the Scientific Committee for the Oceania Stata Users Group Meeting in 2017.

Course dates: Monday 30 June 2014 - Friday 4 July 2014
Week 1
Course status: Course completed (no new applicants)
Recommended Background: 

This course assumes that participants have (1) a reasonable understanding of statistics, sufficient to comprehend the material covered in the course outline above (e.g. regression analysis); (2) some familiarity with a PC environment, including keyboard skills and an understanding of folder and file structures; (3) some experience in using Microsoft Word and Excel or their equivalents; and (4) some experience using a text editor such as Notepad or UltraEdit. It does not assume prior experience with Stata, SAS, SPSS or any other specific statistical package, although any such experience would be helpful.

Recommended Texts: 

Course notes will be supplied. No specific references are suggested although participants are encouraged to bring any Stata documentation they may have. For an overview of the Stata package, please visit or


Course notes are provided.

Supported by: 

Stata logo

Stata is a registered trademark of StataCorp LP, College Station, TX, USA, and the Stata logo is used with the permission of StataCorp.