Applied Multivariate Analysis using Stata

Multivariate statistics allow researchers to analyse complex data sets: to plot large sets of data, reduce the number of variables, predict outcomes, identify groups of inter-related variables, and detect natural groups of observations. The aim of the course is to give participants an understanding of multivariate analysis sufficient to determine the appropriate technique for a given problem, format data as required for analysis, run the analysis using the Stata statistical program, and interpret the results.

This course will cover the following multivariate techniques:

1) Multiple Regression: Multiple regression analysis is often used to model the relationship between a single interval-level dependent variable and several independent variables of varying types. This technique is often used in economics for prediction and forecasting (e.g. national economy), and in social research for evaluating what determines an effective program (e.g. the best predictors of success in high-school) or determining which personality variable best predicts a social trait.

2) Logistic Regression: Logistic regression is used when there is a binary dependent variable and several independent variables of varying types. Logit analysis predicts the probability that the event recorded in the dependent variable occurs. It is widely used in health research, where the dependent variable is the outcome of a disease or health condition (e.g. lung cancer), and in social research, where the outcome is a certain event (e.g. employment status).

3) Canonical correlation: Canonical correlation is used to investigate the relationship between two sets of variables. One set contains two or more dependent variables and the other set contains two or more independent variables. For example, it has been used in social research to investigate the relationship between a number of risk factors and a group of symptoms.

4) Discriminant analysis: Discriminant analysis is used to study the differences between two or more groups with respect to several variables simultaneously. It can be used to understand differences between groups so as to predict the likelihood that an individual belongs to a certain group. For example, it can be used to investigate which background variables discriminate between patients likely to recover fully, partially or not at all.

5) Principal components and factor analysis: Principal components analysis is an exploratory technique used to produce a smaller number of artificial variables (called principal components) that account for most of the variance in the original observed variables. It is also often used to uncover unknown trends in data. The principal components may then be used as predictor or criterion variables in subsequent analyses. For example, a large number of highly correlated measures of job satisfaction can be transformed into a smaller set of uncorrelated principal components that are then used for subsequent analysis (e.g. regression analysis).

6) Exploratory factor analysis: Exploratory factor analysis is used to obtain distinct new variables, or factors. Factor analysis looks at the interrelationships among a large number of variables and explains them in terms of their underlying factors or dimensions. This technique is often used in social science to measure a trait that cannot be measured directly (e.g. self-esteem).

7) Cluster analysis: Cluster analysis is an exploratory technique that uses a number of different algorithms and methods to combine observations into previously unknown mutually exclusive natural groups or clusters based on specific similarities. For example, social researchers have used this technique to produce unique groups based on socio-economic profiles.

8) Multidimensional scaling: Multidimensional scaling for two-way data is a dimension-reduction and visualization technique that looks at dissimilarities between observations based on certain characteristics. Measures of similarity and dissimilarity (distances) are used to produce graphs of relative positioning. For example, researchers have examined how close American universities are to each other and how private and public universities differ.

9) Correspondence analysis: Simple correspondence analysis provides graphical representations of two-way frequency tables to improve the researcher’s understanding of any similarities and associations between the variables. Thus, it is especially good for the analysis of large contingency tables. For example, it could be used to investigate various crimes across the different states.

10) Survival analysis: Survival analysis deals with data in which the outcome is the waiting time until the occurrence of a well-defined event. Observations are censored, in the sense that for some units the event of interest has not occurred at the time the data are analysed, and explanatory variables are used to control for their effect on the waiting time. The point of survival analysis is to follow subjects over time and observe at which point in time they experience the event of interest (e.g. cancer). Survival analysis is often referred to as time-to-event analysis and is mainly used in the biomedical sciences, where the interest is in observing time to death. However, over the past few years this analysis has been extended to other areas of research such as the social sciences (e.g. forensic analysis, employment analysis, marriage) and even engineering sciences (e.g. failure time analysis).
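
To make the idea of censoring in survival analysis concrete, here is a minimal sketch of the Kaplan-Meier estimator of the survival curve. It is written in plain Python rather than Stata and is not taken from the course materials. The function name `kaplan_meier` and the six-subject data set are hypothetical, chosen only for illustration. Censored subjects count as "at risk" up to the time they leave the study, but they never trigger a drop in the estimated survival probability.

```python
from collections import Counter


def kaplan_meier(durations, observed):
    """Return [(time, survival_probability)] at each distinct event time.

    durations: follow-up time for each subject
    observed:  True if the event occurred, False if the subject was censored
    """
    # Number of events at each time (censored subjects are excluded here).
    events = Counter(t for t, d in zip(durations, observed) if d)
    # Number of subjects leaving the risk set at each time (events + censored).
    exits = Counter(durations)

    at_risk = len(durations)
    survival = 1.0
    curve = []
    for t in sorted(exits):
        if events[t]:
            # Multiply by the conditional probability of surviving past t.
            survival *= 1 - events[t] / at_risk
            curve.append((t, survival))
        # Everyone who had an event or was censored at t leaves the risk set.
        at_risk -= exits[t]
    return curve


# Hypothetical data: months of follow-up for six subjects.
durations = [2, 3, 3, 5, 8, 8]
observed = [True, True, False, True, False, True]

curve = kaplan_meier(durations, observed)
for t, s in curve:
    print(f"t={t}: S(t)={s:.3f}")
```

In this sketch the subject censored at month 3 is still counted in the risk set at month 3, following the usual convention that events are processed before censorings at tied times.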

Sample datasets will be provided, but participants are encouraged to bring some of their own Stata data for analysis. Teaching and practice will be closely integrated, and individual assistance will be provided as needed.

Level 3 - runs over 5 days

Instructor:

Dr Joanna Dipnall is an applied statistician with particular interests in advanced statistical methods and machine and deep learning techniques. She completed her Honours in Econometrics at Monash University and her PhD with IMPACT SRC, School of Medicine, Deakin University. Joanna works extensively with registry and linked medical data and collaborates with the Faculty of IT at Monash to supervise Masters and PhD students integrating AI within health research. Joanna teaches within the Monash Biostatistics Unit and is the Unit Coordinator for the Monash Masters of Health Data Analytics course. Joanna has taught advanced statistical methods for many years at universities and for ACSPRI.

Course dates: Monday 7 July 2014 - Friday 11 July 2014

Course status: Course completed (no new applicants)

Venue:

University of Queensland, St Lucia Campus

Week:

Week 2

Recommended Background:

Participants should have completed an intermediate statistics course covering at least some of the syllabus of “Data Analysis Using Stata”. Stata will be available, and experience with Stata will be assumed (e.g. use of Stata’s Do files).

Recommended Texts:

Course notes will be supplied

Course fees

Member:

$1,800

Non Member:

$3,230

Full time student Member:

$1,800

Supported by:

Stata is a registered trademark of StataCorp LP, College Station, TX, USA, and the Stata logo is used with the permission of StataCorp
