Advanced Data Analysis: Quantitative and Qualitative Dependent Variables, Clustered and Longitudinal Data

The purpose of the course is to provide students with the ability to use SAS to analyse data from complex surveys, using a range of statistical techniques appropriate to a particular research question and the nature of the response variable (e.g. normal, continuous but not normal, dichotomous, nominal, ordinal and count).

The course will provide students with a range of advanced skills for the analysis of cross-sectional data and extensions to longitudinal and other types of clustered data. The statistical techniques taught include logistic, multinomial and ordinal regression, Poisson regression, fixed and random effects models, repeated measures models (with Generalized Estimating Equations) and event history analysis.

There will be an emphasis on proposing plausible research questions, choosing the appropriate statistical approach, interpreting the coefficients, specifying different models and drawing conclusions.

Students will analyse two major Australian longitudinal studies: the Longitudinal Surveys of Australian Youth (LSAY) and the Household, Income and Labour Dynamics in Australia (HILDA) Survey.

For this course, the software package used will be SAS.

Statistical approaches covered:

1. Basic statistical concepts (revision)

2. Bivariate Ordinary Least Squares (OLS) regression

3. Multivariate OLS regression (for normally distributed interval outcomes)

4. Logistic, Multinomial, Ordinal and Poisson regression for dichotomous, nominal, ordinal and count data.

5. Fixed effects models

6. Random effects models

7. Repeated measures models (Generalized Estimating Equations) for continuous, dichotomous, nominal, ordinal and count data.

8. Event history analysis (Accelerated Failure Time and Proportional Hazards Models)
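
As a rough illustration of how the response type drives the choice of model, the pairings named in this description can be written as a simple lookup. This is a sketch only, written in Python for brevity rather than in SAS; the function name `suggest_model` and the dictionary are hypothetical and not part of the course materials. "Continuous but not normal" responses are left out because the description does not name a specific model for them.

```python
# Illustrative lookup only: pairs each response type named in the course
# description with the regression family listed for it.
MODEL_FOR_RESPONSE = {
    "normal": "OLS regression",
    "dichotomous": "logistic regression",
    "nominal": "multinomial regression",
    "ordinal": "ordinal regression",
    "count": "Poisson regression",
}


def suggest_model(response_type: str) -> str:
    """Return the regression family listed for a response type (hypothetical helper)."""
    key = response_type.strip().lower()
    if key not in MODEL_FOR_RESPONSE:
        raise ValueError(f"No model listed for response type: {response_type!r}")
    return MODEL_FOR_RESPONSE[key]


if __name__ == "__main__":
    for response in ("dichotomous", "count"):
        print(response, "->", suggest_model(response))
```

In practice the choice also depends on the research question and on whether the data are clustered or longitudinal, which is where the fixed effects, random effects and repeated measures models come in.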

Emphases on:

  • Choosing the most appropriate approach given the nature of the data
  • Interpretation of the estimated coefficients
  • Modeling – i.e. building and interpreting a plausible model
  • Drawing conclusions
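
To give a flavour of what interpreting the estimated coefficients involves: in logistic regression a coefficient is on the log-odds scale, and in Poisson regression it is on the log-rate scale, so exponentiating it gives an odds ratio or a rate ratio respectively. The sketch below is in Python (not SAS), and the coefficient value is made up purely for illustration; it is not an estimate from LSAY or HILDA.

```python
import math


def exponentiated_effect(coefficient: float) -> float:
    """Exponentiate a coefficient.

    For a logistic regression coefficient this is the odds ratio;
    for a Poisson regression coefficient it is the rate ratio.
    """
    return math.exp(coefficient)


def percent_change(coefficient: float) -> float:
    """Percentage change in the odds (or rate) for a one-unit increase in the predictor."""
    return (math.exp(coefficient) - 1.0) * 100.0


if __name__ == "__main__":
    b = 0.25  # hypothetical coefficient, for illustration only
    print(f"ratio = {exponentiated_effect(b):.3f}")
    print(f"percent change = {percent_change(b):.1f}%")
```

A coefficient of zero gives a ratio of exactly 1 (no association), positive coefficients give ratios above 1, and negative coefficients give ratios below 1.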

Substantive areas analysed:

  • PISA test scores (normally distributed)
  • University entrance performance
  • Entrance to university
  • Post-School Study
  • Earnings
  • Life Satisfaction
  • Poverty
  • Financial Stress
  • Exiting Unemployment
  • Transition to adulthood (leaving home, marriage)

Students will be given some time to do their own analyses using the techniques learnt on these outcomes (e.g. earnings, poverty) or similar outcomes (e.g. income, financial stress) or other outcomes available in the data. 

 
Level 4 - runs over 5 days
Instructor: 
Course dates: Monday 20 January 2014 - Friday 24 January 2014
Course status: Course completed (no new applicants)
Week: Week 1
Recommended Background: 

It is expected that students have a strong grounding in introductory statistics (for the analysis of survey data) up to and including Ordinary Least Squares (OLS) regression. Familiarity with logistic regression would be helpful. It will be assumed that students are familiar with SAS or sufficiently adept at using other statistical packages, so that writing SAS syntax for data manipulation and statistical analysis will not cause too many problems. Note that all SAS command files will be supplied.

It is not necessary for students to be familiar with the Longitudinal Surveys of Australian Youth (LSAY) or the Household, Income and Labour Dynamics in Australia (HILDA) studies, but it would help to know something about them or about similar youth or household panel studies.

This course is designed for social science researchers who wish to address research questions using appropriate statistical procedures for clustered (including longitudinal) data.

Course fees
Member: $1,710
Non Member: $3,130
Full time student Member: $1,710