# Course Title: Analysis of Large Data Sets

## Part A: Course Overview

Credit Points: 12.00

### Teaching Period(s)

Course Code: MATH1319

Campus: City Campus

School: 145H Mathematical & Geospatial Sciences

Learning Mode: Face-to-Face

Teaching Period(s): Sem 2 2009, Sem 2 2010, Sem 2 2013, Sem 1 2015, Sem 2 2016

Course Coordinator: Dr. Yan Wang

Course Coordinator Phone: +61 3 99252381

Course Coordinator Email: yan.wang@rmit.edu.au

### Pre-requisite Courses and Assumed Knowledge and Capabilities

It is recommended that students be familiar with elementary statistics, including sampling distributions, estimation and hypothesis testing, and with regression modelling, including simple linear regression, multiple linear regression and, preferably, logistic regression. Students should be comfortable using Microsoft Windows and have some exposure to Windows-based statistics packages such as Minitab or SPSS. Previous exposure to a programming language such as R, Matlab, SAS or Python is useful but not required.

### Course Description

This course focuses on analytical tools for predictive modelling. Predictive modelling is a collection of mathematical techniques that share one goal: finding a relationship between a target (response, dependent) variable and a set of predictor (independent) variables, so that future values of the target can be predicted by feeding observed predictors into that relationship.
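
As a minimal sketch of this idea, the Python example below fits a simple linear regression by least squares and then feeds a new predictor value into the fitted relationship. The data (`hours`, `score`) and the function names `fit_simple_linear` and `predict` are hypothetical and chosen only for illustration; the course itself works in R and SAS Enterprise Miner.

```python
from statistics import mean


def fit_simple_linear(xs, ys):
    """Least-squares fit of y = intercept + slope * x (illustrative helper)."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope


def predict(model, x):
    """Feed an observed predictor value into the fitted relationship."""
    intercept, slope = model
    return intercept + slope * x


# Hypothetical training data: predictor (hours) and target (score).
hours = [1.0, 2.0, 3.0, 4.0, 5.0]
score = [52.0, 54.0, 56.0, 58.0, 60.0]

model = fit_simple_linear(hours, score)   # learn the relationship
new_prediction = predict(model, 6.0)      # predict the target for a new case
print(model, new_prediction)
```

The same two steps, fitting on observed data and then predicting for new cases, apply to the other methods covered in the course; only the form of the relationship changes.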

This course covers the methodologies commonly used in predictive modelling, including decision trees, multiple regression, logistic regression and neural networks. It also develops your skills in predictive modelling with R and in assembling analysis flow diagrams using the rich tool set of SAS Enterprise Miner.

### Objectives/Learning Outcomes/Capability Development

On completion of this course you should be able to:

1. Conduct exploratory data analysis using R and SAS Enterprise Miner exploration tools;
2. Build predictive models using R and SAS Enterprise Miner tools, such as Decision Tree, Regression and Neural Network;
3. Select and justify appropriate model assessment criteria and compare performance across different models;
4. Pursue further studies in large data set analysis and related areas.
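
To illustrate the third outcome, the sketch below compares two candidate models on held-out validation data using root mean squared error (RMSE) as the assessment criterion. The data, the choice of RMSE and the helper names are assumptions made for this example and are not prescribed by the course; other criteria may suit other problems better.

```python
from math import sqrt
from statistics import mean


def rmse(actual, predicted):
    """Root mean squared error between observed and predicted values."""
    return sqrt(mean([(a - p) ** 2 for a, p in zip(actual, predicted)]))


def fit_mean_baseline(xs, ys):
    """Baseline model: always predict the training mean of the target."""
    y_bar = mean(ys)
    return lambda x: y_bar


def fit_linear(xs, ys):
    """Simple linear regression fitted by least squares."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return lambda x: intercept + slope * x


# Hypothetical data, split into training and validation parts.
train_x, train_y = [1.0, 2.0, 3.0, 4.0], [3.1, 4.9, 7.2, 8.8]
valid_x, valid_y = [5.0, 6.0], [11.0, 13.1]

candidates = {
    "mean baseline": fit_mean_baseline,
    "linear regression": fit_linear,
}

# Fit each model on the training data and score it on the validation data.
scores = {}
for name, fit in candidates.items():
    model = fit(train_x, train_y)
    scores[name] = rmse(valid_y, [model(x) for x in valid_x])

best = min(scores, key=scores.get)  # lower RMSE is better
print(scores, best)
```

Scoring every candidate on data that was not used for fitting keeps the comparison fair, because a model cannot look better simply by memorising the training set.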

This course contributes to the following Program Learning Outcomes for MC004 Master of Statistics and Operations Research and MC242 Master of Analytics:

Personal and professional awareness

• the ability to reflect on experience and improve your own future practice
• the ability to apply the principles of lifelong learning to any new challenge.

Knowledge and technical competence

• an understanding of appropriate and relevant, fundamental and applied mathematical and statistical knowledge, methodologies and modern computational tools.

Problem-solving

• the ability to bring together and flexibly apply knowledge to characterise, analyse and solve a wide range of problems
• an understanding of the balance between the complexity / accuracy of the mathematical / statistical models used and the timeliness of the delivery of the solution.

Teamwork and project management

• the ability to constructively engage with other team members and resolve conflict.

Communication

• the ability to effectively communicate both technical and non-technical material in a range of forms (written, electronic, graphic, oral) and to tailor the style and means of communication to different audiences. Of particular interest is the ability to explain technical material, without unnecessary jargon, to lay persons such as the general public or line managers.

Information literacy

• the ability to locate and use data and information and evaluate its quality with respect to its authority and relevance.

### Overview of Learning Activities

The course will be delivered through a combination of face-to-face lectures and computer lab practice. While attendance at weekly lectures is beneficial, you are expected to spend more time on this course out of class, in particular developing your skills with R and SAS Enterprise Miner. Assessments will be carried out on a regular basis to check your understanding of concepts and to provide feedback and additional information. The course is supported by the Blackboard learning management system.

You will undertake 3 hours of face-to-face learning every week through lecture/lab sessions. In addition, an average of 3-6 hours per week of independent study is recommended.

### Overview of Learning Resources

A list of recommended textbooks for this course will be provided on Blackboard. All course materials, including lecture notes, lab exercises, practical exercises and assignments, will be posted on Blackboard.

The statistical package SAS Enterprise Miner can be accessed from the School computer labs, as well as through the RMIT MyDesktop system anywhere and at any time.

Library Subject Guide for Mathematics & Statistics: http://rmit.libguides.com/mathstats

### Overview of Assessment

This course has no hurdle requirements.