Course Title: Data Preprocessing

Part A: Course Overview

Credit Points: 12.00

Course Coordinator: Dr. Anil Dolgun

Course Coordinator Phone: +61 3 9925 2526

Course Coordinator Email:

Course Coordinator Location: 8.9.23

Course Coordinator Availability: By appointment, by email

Pre-requisite Courses and Assumed Knowledge and Capabilities

A working knowledge of basic mathematics and familiarity with computers.

Course Description

Real-world data are commonly incomplete, noisy, and inconsistent. This course covers a wide range of topics designed to equip you with the skills needed to prepare all forms of untidy data for statistical analysis. It addresses the core concepts of data preprocessing, namely tidy data, data integration, data cleaning, data transformation, data standardisation, data discretisation, and data reduction. You will develop and apply your data preprocessing skills to complex, noisy, and inconsistent real-world data using leading open source software.
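
As a rough illustration of two of the concepts named above, the sketch below shows one common form of data standardisation (z-scores) and one common form of data discretisation (equal-width binning). The course description does not name its software, so the use of Python, the function names `standardise` and `discretise`, and the sample values are all assumptions made for illustration only.

```python
from statistics import mean, pstdev

def standardise(values):
    """Return z-scores: (x - mean) / population standard deviation."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        # All values are identical, so every z-score is zero.
        return [0.0 for _ in values]
    return [(x - mu) / sigma for x in values]

def discretise(values, n_bins):
    """Assign each value to one of n_bins equal-width bins (0-based index)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0 for _ in values]
    width = (hi - lo) / n_bins
    # The maximum value would otherwise fall just past the last bin, so clamp it.
    return [min(int((x - lo) / width), n_bins - 1) for x in values]

# Hypothetical sample data, not taken from the course.
heights_cm = [150.0, 160.0, 170.0, 180.0, 190.0]
z_scores = standardise(heights_cm)
bins = discretise(heights_cm, 2)
```

Other choices, such as min-max scaling or equal-frequency binning, are equally valid; which one is appropriate depends on the analysis that follows.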

This course includes a Work Integrated Learning experience in which your knowledge and skills will be applied and assessed in a real workplace context. Any or all of these aspects of a WIL experience may be simulated.

Objectives/Learning Outcomes/Capability Development

On completion of this course you should be able to:


  1. Critically reflect upon different data sources, types, formats and structures.
  2. Apply data integration techniques to import and combine different sources of data.
  3. Apply different data manipulation techniques to recode, filter, select, split, aggregate, and reshape the data into a format suitable for statistical analysis.
  4. Justify data by detecting and handling missing values, outliers, inconsistencies and errors.
  5. Demonstrate practical experience by having been exposed to real data problems.
  6. Effectively use leading open source software for reproducible, automated data preprocessing.
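
To make outcomes 3 and 4 more concrete, here is a minimal sketch of reshaping a wide table into a tidy long format and then handling missing values by median imputation. It assumes Python with only the standard library; the course does not specify this language, and the data, the helper names `to_long` and `impute_median`, and the choice of median imputation are hypothetical.

```python
from collections import defaultdict
from statistics import median

# Hypothetical "wide" records: one row per station, one column per year.
wide = [
    {"station": "A", "2019": 12.0, "2020": None},
    {"station": "B", "2019": 15.0, "2020": 18.0},
    {"station": "C", "2019": 11.0, "2020": 13.0},
]

def to_long(rows, id_key):
    """Reshape wide rows into long rows: one observation per row."""
    return [
        {id_key: row[id_key], "year": int(col), "value": val}
        for row in rows
        for col, val in row.items()
        if col != id_key
    ]

def impute_median(rows, group_key):
    """Replace missing values (None) with the median of the observed
    values in the same group. Assumes each group has at least one
    observed value; otherwise statistics.median raises an error."""
    observed = defaultdict(list)
    for row in rows:
        if row["value"] is not None:
            observed[row[group_key]].append(row["value"])
    return [
        {**row, "value": median(observed[row[group_key]])}
        if row["value"] is None else row
        for row in rows
    ]

long_rows = to_long(wide, "station")
clean_rows = impute_median(long_rows, "year")
```

Imputation is only one way to handle missing values; dropping incomplete rows or flagging them for review may be more defensible depending on why the values are missing.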

This course contributes to the following Program Learning Outcomes for BP330, Bachelor of Space Science.

Understanding science and engineering

  • You will demonstrate an understanding of the scientific method and engineering fundamentals and an ability to apply them in practice.

Knowledge and technical competence

  • You will have broad knowledge in space science and technology with deep knowledge in its core concepts.
  • You will have knowledge in at least one discipline other than your primary discipline and some understanding of interdisciplinary linkages.

Inquiry and Problem Solving

  • You will be able to choose appropriate tools and methods to solve scientific problems within your area of specialisation.
  • You will demonstrate well-developed problem solving skills, applying your knowledge and using your ability to think analytically and creatively.

Information literacy

  • You will develop a capacity for independent and self-directed work.
  • You will work responsibly, safely, legally and ethically.
  • You will develop an ability to work collaboratively.

Overview of Learning Activities

Course learning activities take place both online and face-to-face. Online course notes and materials replace traditional lectures and labs. Face-to-face class time is mainly used for hands-on demonstrations of concepts and software use, and for working in groups on module exercises and problems. You will develop your data preprocessing skills by completing regular skill-building exercises and assignments that consolidate your learning and prepare you for the final exam.

Total study hours

You will undertake three hours per week of face-to-face learning in class. In addition to the weekly classes, you are expected to spend approximately another six hours per week on activities related to this course. These activities include reading and working through the online course material, completing skill-building exercises and assignments, and preparing for assessments.

Overview of Learning Resources

There are no prescribed texts for this course. All course content, notes, learning materials and data sets will be available through the course website and Canvas LMS. A list of recommended textbooks for this course will also be provided.

You are strongly encouraged to bring a portable computing device to class, preferably a laptop, with Wi-Fi access to the RMIT University network. You will also need the open source software used in the course installed on your personal computing device.

Overview of Assessment

This course has no hurdle requirements.

Assessment tasks

Assessment Task 1: Assignment 1

Early semester assignment.

Weighting 10%

This assessment task supports CLOs 1 and 2.

Assessment Task 2: Assignment 2

Mid-semester assignment.

Weighting 20%

This assessment task supports CLOs 1, 2, 3 and 4.

Assessment Task 3: Assignment 3

Final assignment.

Weighting 30%

This assessment task supports CLOs 1, 2, 3, 4, 5 and 6.

Assessment Task 4: Final Examination

A two-hour final examination during the exam period.

Weighting 40%

This assessment task supports CLOs 1, 2, 3, 4 and 5.