Course Title: Data Preprocessing

Part A: Course Overview

Credit Points: 12.00


Course Code:

Campus: City Campus

School: 171H School of Science

Learning Mode:

Teaching Period(s): Sem 1 2021, Sem 1 2022, Sem 2 2022, Sem 1 2023, Sem 2 2023

Course Coordinator: Dr. Sona Taheri

Course Coordinator Phone: +61 3 9925 2526

Course Coordinator Email:

Course Coordinator Location: 15.04.02

Course Coordinator Availability: by email

Pre-requisite Courses and Assumed Knowledge and Capabilities

Assumed Knowledge

Applied business mathematics

Course Description

Real-world data is commonly incomplete, noisy, and inconsistent. This course equips you with the skills needed to prepare all forms of untidy data for analysis. You will learn the core concepts of data wrangling, namely tidy data, data integration, data cleaning, data transformation, data standardisation, data discretisation, and data reduction. You will develop and apply your data wrangling skills to complex, noisy, and inconsistent real-world data using the leading open-source software, R.


Objectives/Learning Outcomes/Capability Development

This course contributes to the following Program Learning Outcomes for BP330, Bachelor of Space Science.

Understanding science and engineering

  • You will demonstrate an understanding of the scientific method and engineering fundamentals and an ability to apply them in practice.

Knowledge and technical competence

  • You will have broad knowledge in space science and technology with deep knowledge in its core concepts.
  • You will have knowledge in at least one discipline other than your primary discipline and some understanding of interdisciplinary linkages.

Inquiry and Problem Solving

  • You will be able to choose appropriate tools and methods to solve scientific problems within your area of specialisation.
  • You will demonstrate well-developed problem-solving skills, applying your knowledge and using your ability to think analytically and creatively.

Information literacy

  • You will develop a capacity for independent and self-directed work.
  • You will work responsibly, safely, legally, and ethically.
  • You will develop an ability to work collaboratively.

On successful completion of this course, you should be able to:

  1. Utilise leading open-source software, R, to address and resolve data wrangling tasks.
  2. Select, perform, and justify data validation processes for raw datasets to satisfy quality requirements.
  3. Apply and evaluate the best practice standards of Tidy Data Principles.
  4. Critically analyse data integration procedures for combining data with different types and structures into a suitable format.

Overview of Learning Activities

You will be actively engaged in a range of learning activities such as lectorials, tutorials, practicals, laboratories, seminars, project work, class discussion, and individual and group activities. Delivery may be face-to-face, online, or a mix of both.

You are encouraged to be proactive and self-directed in your learning, asking questions of your lecturer and/or peers and seeking out information as required, especially from the numerous sources available through the RMIT library and from the links and material specific to this course that are available through myRMIT Studies Course.

Overview of Learning Resources

RMIT will provide you with resources and tools for learning in this course through myRMIT Studies Course.

There are services available to support your learning through the University Library. The Library provides guides on academic referencing and subject specialist help as well as a range of study support services. For further information, please visit the Library page on the RMIT University website and the myRMIT student portal.

Overview of Assessment

You will be assessed on how well you meet the course learning outcomes and on your development against the program learning outcomes. 

Assessment Tasks

Assessment Task 1: Pre-processing data project
Weighting 20% 
This assessment task supports CLOs 1, 2 & 3

Assessment Task 2: Coding exercises
Weighting 35%
This assessment task supports CLOs 1, 2, 3 & 4

Assessment Task 3: Applied relational data project
Weighting 45%
This assessment task supports CLOs 1, 2, 3 & 4

If you have a long-term medical condition and/or disability, it may be possible to negotiate variations to aspects of the learning or assessment methods. You can contact the program coordinator or Equitable Learning Services if you would like to find out more.