Data Analysis in R

This course is intended for applied data analysts, including academics (and postgraduate research students), policy specialists and others. It will examine questions dealt with in public policy, the social sciences (especially politics) and industry, using real data. These data include voter surveys, economic data and imprisonment rates in different Australian states. The course will help build participants’ ability to undertake rigorous statistical analysis in R, including means, confidence intervals and linear regression, and to create publication-standard graphs of the results. The end result will be research that is more professional and easier to understand.

Level 2 - runs over 5 days

Dr Shaun Ratcliff is a quantitative political scientist working at the United States Studies Centre, the University of Sydney.

His academic research focuses on the issue preferences and behaviour of political actors in the United States and Australia, including voters, interest groups and elites, and the role of parties as interest aggregators. Recently his focus has been on citizens’ attitudes towards COVID-19 and government responses to the pandemic. At the University of Sydney he has taught data science, public opinion, research methods and the use of data in politics in the US Studies Centre, the Department of Government, and the Faculty of Engineering and IT.

He has a PhD in political science from Monash University, and has previously worked at the University of Melbourne and Monash. He has also held government and media relations roles, and provided statistical and political consulting services for industry associations, trade unions and political campaigns.

For further details visit his website

Course dates: Monday 30 January 2017 - Friday 3 February 2017
Course status: Course completed (no new applicants)
Week 2
About this course: 

R is open source and free. It is flexible, powerful and intuitive, and it is excellent for data visualisation. Because it is open source, R has thousands of developers in leading universities, corporate research labs and other institutions across the world. This means its capabilities tend to exceed those of competing software, with new packages added or updated daily. This is particularly the case for data visualisation, in which R tends to lead the pack. As there is no licence fee, you can take it with you wherever you go: no matter where you work, you don't have to change software packages when you change employers. Consequently, R has become increasingly popular for academic research, economic analysis and public policy development, and this trend is likely to continue. Becoming skilled in R will help build your personal capabilities and employment opportunities by making you a more flexible worker, capable of undertaking analysis that many other researchers and analysts cannot.


No prior experience with R or with sophisticated quantitative methods is required for this course. Participants should use data in their occupations (or in their study, if they are students) and understand some of the basics (for instance, what the mean, the median and the standard deviation are; some basic knowledge of regression is also helpful).


This is a course for subject matter experts who want to use more quantitative analysis in their work. By the end of the week you will be better able to conduct basic descriptive analysis and regression in R, and will be able to create impressive-looking graphs.

Course syllabus: 

Day 1

Getting started – loading and cleaning your data and making professional graphs.
R is excellent for conducting simple yet effective analyses of data. In particular, it is useful for plotting descriptive data such as trends in unemployment, oil prices and crime rates. We will look at plotting data to provide you with methods you can use in your work or your research.
The course starts with instructions on how to access and re-code data in R and calculate descriptive statistics. We will then cover graphing means and variance so you can better understand the structure of your data. The first day will also cover how to plot the trends of multiple indicators (for instance, several economic indicators such as unemployment, inflation, consumer confidence, oil prices, building starts, job vacancies and business investment, as well as social issues such as immigration and imprisonment rates) in a way that looks professional and sophisticated, with just a few lines of code.
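
As a rough illustration of the kind of workflow described for day one (this is not taken from the course materials), the sketch below re-codes a variable, calculates descriptive statistics and plots two trends on one set of axes using base R. The data are simulated, and the names `indicators`, `unemployment`, `inflation` and `period` are hypothetical.

```r
# Simulated monthly indicators, for illustration only (not real data)
set.seed(1)
indicators <- data.frame(
  month        = 1:24,
  unemployment = 5.5 + cumsum(rnorm(24, sd = 0.1)),
  inflation    = 2.0 + cumsum(rnorm(24, sd = 0.1))
)

# Re-code the numeric month into a two-category variable
indicators$period <- ifelse(indicators$month <= 12, "Year 1", "Year 2")

# Descriptive statistics: overall mean and standard deviation,
# then the mean of each indicator within each period
mean_unemp <- mean(indicators$unemployment)
sd_unemp   <- sd(indicators$unemployment)
means_by_period <- aggregate(cbind(unemployment, inflation) ~ period,
                             data = indicators, FUN = mean)

# Plot both trends on one set of axes, one line per indicator
matplot(indicators$month,
        as.matrix(indicators[, c("unemployment", "inflation")]),
        type = "l", lty = 1, lwd = 2, col = c("steelblue", "firebrick"),
        xlab = "Month", ylab = "Per cent")
legend("topleft", legend = c("Unemployment", "Inflation"),
       col = c("steelblue", "firebrick"), lty = 1, lwd = 2, bty = "n")
```

With real data, the simulated data frame would typically be replaced by a call such as `read.csv()` on your own file.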


Day 2

Understanding your data, and getting started with regression
We then build on the work of the first day by running more complex descriptive analyses of public opinion on immigration and other policy issues. This includes examining the probabilities of voters holding certain preferences on policy issues, as well as trends in these preferences. We also look at graphing these results, including overlaying trend lines and confidence intervals on the original data.
However, sometimes you need to do more than look at the descriptive data. For instance, there may be confounding factors, such as the effects of economic, political and demographic factors on policy outcomes. Or there may be certain demographic characteristics of voters related to their preferences for certain policies. We can control for these and learn far more from our data using simple linear regression. We follow up our descriptive examination of voters’ policy attitudes by learning how to fit linear regression models to these data, and by plotting the regression line over the original data to examine model fit. This allows us to examine how attitudes towards immigration (and other policy issues) have changed over time, and how these relate to voters’ socioeconomic backgrounds.
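
The following is a minimal sketch of fitting a linear regression and plotting the fitted line over the original data, assuming a simulated survey rather than the data used in the course. The variables `support` (a 0 to 100 attitude score), `year` and `age` are hypothetical.

```r
# Simulated survey responses, for illustration only (not real data)
set.seed(2)
survey <- data.frame(
  year = rep(2001:2016, each = 20),
  age  = round(runif(320, 18, 80))
)
survey$support <- 40 + 0.8 * (survey$year - 2001) - 0.1 * survey$age +
  rnorm(nrow(survey), sd = 5)

# Simple linear regression: how does support change over time?
fit_simple <- lm(support ~ year, data = survey)

# Adding age as a control variable
fit_control <- lm(support ~ year + age, data = survey)
summary(fit_control)

# 95% confidence intervals for the coefficients
ci <- confint(fit_control)

# Plot the original data and overlay the fitted line from the simple model
plot(survey$year, survey$support, pch = 16, col = "grey60",
     xlab = "Year", ylab = "Support (0 to 100)")
abline(fit_simple, lwd = 2, col = "steelblue")
```

Here `abline()` is used because the simple model has a single predictor; for models with several predictors, `predict()` on a grid of values is the more general way to draw fitted lines.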


Day 3

Learning more about linear regression in R
On day three we will look at fitting more complex linear regression models, including interactions, using a variety of datasets. We will also look at graphing the regression lines from our interactions, and the model coefficients, to make our results clearly understandable. Finally, we will plot the residuals from the regressions to check our model fits.
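
As an illustrative sketch only, the code below fits a model with an interaction, draws a separate fitted line for each group, and plots the residuals against the fitted values. The data are simulated, and `attitude`, `age` and `tertiary` (a 0/1 indicator for tertiary education) are hypothetical variable names.

```r
# Simulated voter data, for illustration only (not real data)
set.seed(3)
n <- 400
voters <- data.frame(
  age      = runif(n, 18, 80),
  tertiary = rbinom(n, 1, 0.4)
)
voters$attitude <- 60 - 0.3 * voters$age + 5 * voters$tertiary +
  0.2 * voters$age * voters$tertiary + rnorm(n, sd = 8)

# age * tertiary expands to both main effects plus their interaction
fit_int <- lm(attitude ~ age * tertiary, data = voters)
coef(fit_int)

# Predicted attitude across the age range, separately for each group
new_age <- seq(18, 80, length.out = 50)
pred_no  <- predict(fit_int, newdata = data.frame(age = new_age, tertiary = 0))
pred_yes <- predict(fit_int, newdata = data.frame(age = new_age, tertiary = 1))

plot(voters$age, voters$attitude, pch = 16, col = "grey70",
     xlab = "Age", ylab = "Attitude score")
lines(new_age, pred_no,  lwd = 2, col = "firebrick")
lines(new_age, pred_yes, lwd = 2, col = "steelblue")
legend("topright", legend = c("No tertiary education", "Tertiary education"),
       col = c("firebrick", "steelblue"), lwd = 2, bty = "n")

# Residuals against fitted values: look for curvature or changing spread
plot(fitted(fit_int), resid(fit_int),
     xlab = "Fitted values", ylab = "Residuals")
abline(h = 0, lty = 2)
```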


Day 4

Logistic regression in R
However, many social science questions do not involve linear outcomes; they instead involve probabilities or other non-linear outcomes. On the fourth day of this course, we will look at some alternative ways to examine your data.
First, we will predict the probabilities of vote choices in Australian federal elections using logistic regressions fit to survey data, and plot the predicted probabilities from these models over the original data. This will help us establish what kinds of voters support different parties, and why. We will then look at modelling count data (in this case, imprisonment rates in different Australian states) using Poisson regression.
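
A minimal sketch of both model types in base R follows, using simulated data rather than the survey and imprisonment data used in the course. All variable names (`vote_party_a`, `income`, `prisoners`, `population`, `state`) are hypothetical, and modelling a rate as a count with a population offset is one common approach assumed here, not necessarily the one taught.

```r
# Simulated data throughout, for illustration only (not real data)
set.seed(4)

# Logistic regression: probability of voting for a hypothetical party
n <- 500
electors <- data.frame(income = runif(n, 20, 150))  # income in $000s
electors$vote_party_a <- rbinom(n, 1, plogis(-2 + 0.03 * electors$income))

fit_logit <- glm(vote_party_a ~ income, data = electors, family = binomial)

# Predicted probabilities over a grid of incomes
grid <- data.frame(income = seq(20, 150, by = 10))
grid$prob <- predict(fit_logit, newdata = grid, type = "response")

# Original 0/1 outcomes with the predicted probability curve overlaid
plot(electors$income, electors$vote_party_a, pch = "|", col = "grey50",
     xlab = "Income ($000s)", ylab = "Probability of voting for Party A")
lines(grid$income, grid$prob, lwd = 2, col = "steelblue")

# Poisson regression: prisoner counts with a population offset,
# so that coefficients describe changes in the imprisonment rate
prisons <- data.frame(
  state      = rep(c("A", "B", "C", "D"), each = 10),
  year       = rep(1:10, times = 4),
  population = rep(c(500, 2000, 4000, 6000), each = 10) * 1000
)
prisons$prisoners <- rpois(nrow(prisons),
                           lambda = prisons$population * 0.0015 *
                             exp(0.02 * prisons$year))

fit_pois <- glm(prisoners ~ year + state + offset(log(population)),
                data = prisons, family = poisson)

# Multiplicative change in the imprisonment rate per year
rate_ratio <- exp(coef(fit_pois)[["year"]])
```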


Day 5

Bringing it all together
On the final day we will explore some slightly more complex regression models and look at graphing estimates and predictions from the model outputs.
There will also be the option for you to provide your own data, so that we can look at the best ways to analyse it using R and plot the results.
One-on-one consultations will be held in the afternoon to go over specific parts of the course on which participants want further advice.

Course format: 

This course will be held in a classroom. Participants will require a laptop with R installed. ACSPRI staff and the course instructor will be able to help with this in the weeks leading up to the course.


Data and course notes will be provided (although there are options to use your own data on day 5).

Recommended Background: 

The ACSPRI course Fundamentals of Statistics, or a similar basic understanding of statistical analysis.

Course fees
Non Member: 
Full time student Member: 
Participant feedback: 

I already had some knowledge of the topic, but there was plenty of content that was new and useful. (Winter 2016)


I can see its capability for more sophisticated data analyses. (Winter 2016)


Instructor's bound course notes will be provided.