Course Title: Analysis of Large Data Sets
Part A: Course Overview
Credit Points: 12.00
145H Mathematical & Geospatial Sciences
Sem 2 2016
Course Coordinator: Dr. Yan Wang
Course Coordinator Phone: +61 3 9925 2381
Course Coordinator Email: firstname.lastname@example.org
Course Coordinator Location: 8.9.34
Pre-requisite Courses and Assumed Knowledge and Capabilities
It is recommended that students are familiar with elementary statistics (sampling distributions, estimation and hypothesis testing) and with regression modelling (simple linear regression, multiple linear regression and, preferably, logistic regression). Students should be comfortable using Microsoft Windows and have some exposure to Windows-based statistics packages such as Minitab or SPSS. Previous exposure to a programming language, such as R, Matlab, SAS or Python, is useful but not required.
With the explosion of “Big Data” problems, statistical learning/machine learning has become a very hot field in many scientific areas as well as marketing, finance, and other business disciplines. People with statistical analytics skills are in high demand.
This course focuses on analytical tools for predictive modelling: a collection of mathematical techniques that share the goal of finding a relationship between a target (response or dependent) variable and a set of predictor (independent) variables, so that future values of the target variable can be predicted by feeding observed predictor values into that relationship.
This course covers the methodologies that are commonly used in predictive modelling, including decision trees, logistic regression and neural networks. It also develops skills in predictive modelling with R, and in assembling analysis flow diagrams with the SAS Enterprise Miner tool set.
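As a minimal sketch of the predictive modelling idea described above (estimate a relationship between a predictor and a target, then feed a new predictor value into it), the following fits a simple linear regression by least squares. Python is used here purely for illustration; the course itself works in R and SAS Enterprise Miner. The data values and the function names fit_simple_linear and predict are hypothetical and are not taken from the course materials.

```python
# Illustrative sketch only: the data and function names are hypothetical.

def fit_simple_linear(xs, ys):
    """Estimate intercept and slope of y = intercept + slope * x by least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-products and sum of squares about the means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def predict(model, x_new):
    """Feed a new predictor value into the fitted relationship."""
    intercept, slope = model
    return intercept + slope * x_new


# Toy training data: observed predictor values and the matching target values.
train_x = [1.0, 2.0, 3.0, 4.0]
train_y = [3.0, 5.0, 7.0, 9.0]

model = fit_simple_linear(train_x, train_y)
prediction = predict(model, 5.0)  # predicted target for an unseen predictor value
print(model, prediction)
```

The same two steps, fitting on observed data and then predicting for new predictor values, carry over to decision trees, logistic regression and neural networks; only the form of the fitted relationship changes.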
Please note that if you take this course for a bachelor honours program, your overall mark in this course will be one of the course marks that will be used to calculate the weighted average mark (WAM) that will determine your award level. (This applies to students who commence enrolment in a bachelor honours program from 1 January 2016 onwards. See the WAM information web page for more information.)
Objectives/Learning Outcomes/Capability Development
This course contributes to the Program Learning Outcomes for BP245 Bachelor of Science (Statistics); BP083 Bachelor of Science (Mathematics); and BH119 Bachelor of Analytics (Honours):
Knowledge and technical competence
- an understanding of appropriate and relevant, fundamental and applied mathematical and statistical knowledge, methodologies and modern computational tools.
- the ability to bring together and flexibly apply knowledge to characterise, analyse and solve a wide range of problems
- an understanding of the balance between the complexity / accuracy of the mathematical / statistical models used and the timeliness of the delivery of the solution.
- the ability to effectively communicate both technical and non-technical material in a range of forms (written, electronic, graphic, oral) and to tailor the style and means of communication to different audiences. Of particular interest is the ability to explain technical material, without unnecessary jargon, to lay persons such as the general public or line managers.
- the ability to locate and use data and information and evaluate its quality with respect to its authority and relevance.
- the cognitive skills to critically review, analyse, consolidate and synthesise knowledge in order to identify and provide solutions to complex problems with intellectual independence.
On completion of this course you should be able to:
- Conduct exploratory data analysis using SAS Enterprise Miner exploration tools;
- Build predictive models using R and SAS Enterprise Miner tools, such as Decision Tree, Regression and Neural Network;
- Select and justify appropriate model assessment criteria and compare performance across different models;
- Pursue further studies in large data set analysis and related areas.
Overview of Learning Activities
The course will be delivered through a combination of face-to-face lectures and computer lab practice. While attendance at weekly lectures is beneficial, you are expected to spend more time out of class on this course, in particular practising with R and SAS Enterprise Miner. Assessment will be distributed on a regular basis to check your understanding of concepts and to provide additional information. The course is supported by the Blackboard learning management system.
You will undertake 4 hours per week of face-to-face learning through lecture/lab sessions. In addition, an average of 4-6 hours per week of independent study is recommended.
Overview of Learning Resources
A list of recommended textbooks for this course is provided on Blackboard.
All course materials, including lecture notes, lab exercises, practical exercises and assignments, will be posted on the Blackboard LMS.
The statistical package SAS Enterprise Miner can be accessed from the school computer labs, as well as through the RMIT MyDesktop system from anywhere at any time.
A Library subject guide is available at: http://rmit.libguides.com/mathstats
Overview of Assessment
This course has no hurdle requirements.
Assessment Task 1: Assignments
This assessment task supports CLOs 1, 2, 3 & 4.
Assessment Task 2: Tests
This assessment task supports CLOs 1, 2 & 3.
Assessment Task 3: Examination
This assessment task supports CLOs 1, 2 & 3.