Course Title: Data Preprocessing
Part A: Course Overview
Credit Points: 12.00
171H School of Science
Sem 1 2018
Course Coordinator: Dr. Anil Dolgun
Course Coordinator Phone: +61 3 9925 2526
Course Coordinator Email: email@example.com
Course Coordinator Location: 8.9.23
Course Coordinator Availability: By appointment
Pre-requisite Courses and Assumed Knowledge and Capabilities
A working knowledge of basic mathematics and familiarity with computers.
Real-world data are commonly incomplete, noisy, and inconsistent. This course covers a wide range of topics designed to equip you with the skills needed to prepare all forms of untidy data for statistical analysis. It covers the core concepts of data preprocessing, namely tidy data, data integration, data cleaning, data transformation, data standardisation, data discretisation, and data reduction. You will develop and apply your data preprocessing skills to complex, noisy, and inconsistent real-world data using leading open source software.
Objectives/Learning Outcomes/Capability Development
This course contributes to the following Program Learning Outcomes for MC004 Master of Statistics and Operations Research and MC242 Master of Analytics:
Personal and professional awareness
- the ability to contextualise outputs where data are drawn from diverse and evolving social, political and cultural dimensions
- the ability to reflect on experience and improve your own future practice
- the ability to apply the principles of lifelong learning to any new challenge.
Knowledge and technical competence
- an understanding of appropriate and relevant, fundamental and applied mathematical and statistical knowledge, methodologies and modern computational tools
- the ability to bring together and flexibly apply knowledge to characterise, analyse and solve a wide range of problems
- an understanding of the balance between the complexity/accuracy of the mathematical/statistical models used and the timeliness of the delivery of the solution
- the ability to locate and use data and information and evaluate its quality with respect to its authority and relevance.
On completion of this course you should be able to:
- Critically reflect upon different data sources, types, formats and structures.
- Apply data integration techniques to import and combine different sources of data.
- Apply different data manipulation techniques to recode, filter, select, split, aggregate, and reshape the data into a format suitable for statistical analysis.
- Justify data by detecting and handling missing values, outliers, inconsistencies and errors.
- Demonstrate practical experience by having been exposed to real data problems.
- Effectively use leading open source software for reproducible, automated data preprocessing.
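As an illustration of the kind of task these outcomes describe, the following is a minimal sketch of detecting missing values, flagging outliers, and standardising a variable. It is not taken from the course materials: the choice of Python, the sample readings, the 1.5 x IQR outlier rule, and all variable names are assumptions made for the example, and the software used in the course may differ.

```python
import statistics

# Hypothetical raw readings (not course data); None marks a missing value.
raw = [12.0, 15.0, None, 14.0, 13.0, 95.0, 16.0, None, 14.5]

# Step 1: detect missing values and keep only the observed ones.
missing_count = sum(1 for value in raw if value is None)
observed = [value for value in raw if value is not None]

# Step 2: flag outliers using the 1.5 * IQR rule (one common convention).
q1, _, q3 = statistics.quantiles(observed, n=4)
iqr = q3 - q1
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
outliers = [value for value in observed if value < lower or value > upper]
cleaned = [value for value in observed if lower <= value <= upper]

# Step 3: standardise the cleaned values to z-scores
# (mean 0, sample standard deviation 1).
mean = statistics.mean(cleaned)
sd = statistics.stdev(cleaned)
z_scores = [(value - mean) / sd for value in cleaned]

print(f"Missing values: {missing_count}")
print(f"Outliers flagged: {outliers}")
print(f"Standardised values: {[round(z, 2) for z in z_scores]}")
```

Dropping missing values and removing outliers are only two of several possible handling strategies; imputation or capping may be more appropriate depending on the data and the analysis.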
Overview of Learning Activities
Course learning activities take place both online and face-to-face. Online course notes and materials replace traditional lectures and labs. Face-to-face class time is mainly used for hands-on demonstrations of concepts and software, and for working in groups on module exercises and problems. You will develop your data preprocessing skills by completing regular skill-building exercises and assignments that consolidate your learning and prepare you for the final exam.
Total study hours
You will undertake three hours per week of face-to-face learning in class. In addition to the weekly classes, you are expected to spend approximately another six hours per week on activities related to this course. These activities include reading and practising online course material, completing skill-building exercises and assignments, and preparing for assessments.
Overview of Learning Resources
There are no prescribed texts for this course. All course content, notes, learning materials and data sets will be available through the course website and Canvas LMS. A list of recommended textbooks for this course will also be provided.
You are strongly encouraged to bring a portable computing device to class, preferably a laptop, with Wi-Fi access to the RMIT University network. You will also need to install the open source software used in the course on your personal computing device.
Overview of Assessment
This course has no hurdle requirements.
Assessment Task 1: Assignment 1
Early semester assignment.
This assessment task supports CLOs 1 and 2.
Assessment Task 2: Assignment 2
Mid semester assignment.
This assessment task supports CLOs 1, 2, 3 and 4.
Assessment Task 3: Assignment 3
This assessment task supports CLOs 1, 2, 3, 4, 5 and 6.
Assessment Task 4: Final Examination
A two-hour final examination during the exam period.
This assessment task supports CLOs 1, 2, 3, 4 and 5.