Data Analysis in R

This course is intended for applied data analysts, including academics (and postgraduate research students), policy specialists and others. It will examine questions dealt with in public policy, the social sciences (especially politics) and industry, using real data such as voter surveys, economic data and imprisonment rates in different Australian states. The unit will help build participants’ ability to undertake rigorous statistical analysis in R, including means, confidence intervals and linear regression, and to create publication-standard graphs of the results. The result will be research that is more professional and easier to understand.

Level 2 - runs over 5 days
Instructor: 

Dr Shaun Ratcliff is a political scientist, survey researcher and applied data scientist.

He is the Principal at Accent Research, where he works with clients on complex social and political research, studying how the public thinks and behaves, what influences their beliefs and actions, and ways to engage with them.

He was previously Director of Data Science at YouGov and, before that, a Lecturer at the US Studies Centre at the University of Sydney, where he remains an Honorary Associate and continues to teach data science.

He has a PhD in Political Science from Monash University.

Course dates: Monday 3 July 2017 - Friday 7 July 2017
Course status: Course completed (no new applicants)
Week: 
Week 2
About this course: 

R is open source and free. It is flexible, powerful and intuitive, and it is excellent for data visualisation. Because it is open source, R has thousands of developers in leading universities, corporate research labs and other institutions across the world. As a result, its capabilities often exceed those of competing software, with new packages added or updated daily. This is particularly the case for data visualisation, in which R tends to lead the pack. As there is no licence fee, you can take it with you wherever you go: you don't have to change software packages when you change employers. Consequently, R has become increasingly popular for academic research, economic analysis and public policy development, and this trend is likely to continue. Becoming skilled in R will help build your personal capabilities and employment opportunities by making you a more flexible worker, capable of undertaking analysis that many other researchers and analysts cannot.

No prior experience with R or with sophisticated quantitative methods is required for this course. Participants should be computer literate, use data in their occupations (or studies, if they are students) and understand some basic statistics, such as the mean, the median and the standard deviation. Some basic knowledge of regression is helpful, as is the ability to do simple coding.

This is a course for subject matter experts who want to use more quantitative analysis in their work. By the end of the week you will be better able to conduct basic descriptive analysis and regression in R, and to create impressive-looking graphs.

If you are unsure whether this is for you, please contact Shaun for more information. He can talk you through the course and the kinds of things you will cover.

Course syllabus: 

Day 1

Getting started – loading and cleaning your data and making professional graphs.

R is excellent for conducting simple yet effective analyses of data. In particular, it is useful for graphing descriptive data such as trends in unemployment and public opinion. We will look at ways of plotting data that you can use in your work or research.

The course starts with instructions on how to access and re-code data in R and then calculate descriptive statistics. We will then cover graphing means and variance so you can better understand the structure of your data. The first day will also cover how to plot trends in multiple indicators (for instance, unemployment) and public opinion data in a way that looks professional and sophisticated, with just a few lines of code.

Day 2

Understanding your data

We then build on the work of the first day by running more complex descriptive analyses of public opinion on immigration and other policy issues. This includes examining the probability that voters hold particular policy preferences, as well as trends in these attitudes. We also learn how to break public opinion down by population group, such as younger and older voters.

For each of these steps we will graph the results, including overlaying trend lines and confidence intervals on the original data.

Day 3

Getting started with linear regression

Sometimes you need to do more than look at the descriptive data. For instance, there may be confounding factors, such as economic, political and demographic influences on policy outcomes. Or voters' preferences for certain policies may be related to their demographic characteristics. Linear regression lets us control for these factors and learn far more from our data.

We follow up the descriptive analyses from the first two days by learning how to fit linear regression models to these data and how to plot the regression line over the original data to examine model fit. This allows us to examine how different variables relate to the outcomes we are interested in.

Day 4

More on regression

On day four we will fit more complex linear regression models, including interactions, using a variety of datasets. We will graph the regression lines for our interactions, and the model coefficients, to make our results clearly understandable. We will also plot the residuals from the regressions to check model fit.

Many social science outcomes are not continuous, however. Vote choice, for example, is categorical, so we are interested in the probability of each choice rather than in a linear quantity. On the afternoon of day four, we will look at some alternative ways to examine data like these.

We will predict the probabilities of vote choices in Australian federal elections using logistic regression models fit to survey data, and plot the predicted probabilities from these models over the original data. This will help us establish what kinds of voters support different parties, and why.

Day 5

Bringing it all together

On the final day we will explore some slightly more complex regression models and graph estimates and predictions from the model outputs.

Participants will also have the option of bringing their own data, and we can look at the best ways to analyse it and plot the results in R.

In the afternoon, one-on-one consultations will go over any parts of the course on which participants want further advice.

Course format: 

This course will be held in a classroom. Participants will need a laptop with R installed. ACSPRI staff and the course instructor will be able to help with this in the weeks leading up to the course.

Data and course notes will be provided (although there is the option to use your own data on day 5).

Recommended Background: 

No prior experience with R or with sophisticated quantitative methods is required for this course. Participants should be computer literate, use data in their occupations (or studies, if they are students) and understand some basic statistics, such as the mean, the median and the standard deviation. Some basic knowledge of regression is helpful, as is the ability to do simple coding.

Course fees
Member: 
$2,100
Non Member: 
$3,800
Full time student Member: 
$1,980

Program: 
Winter Program 2017
Notes: 

Instructor's bound course notes will be provided.