Course Title: Data Preprocessing

Part A: Course Overview

Credit Points: 12.00


Course Code:

Campus: City Campus

School: 171H School of Science

Learning Mode:

Teaching Period(s): Sem 1 2021, Sem 1 2022, Sem 2 2022, Sem 1 2023, Sem 2 2023

Course Coordinator: Dr. Sona Taheri

Course Coordinator Phone: +61 3 9925 2526

Course Coordinator Email:

Course Coordinator Location: 15.04.02

Course Coordinator Availability: by email

Pre-requisite Courses and Assumed Knowledge and Capabilities

Assumed Knowledge

Applied business mathematics

Course Description

Real-world data is commonly incomplete, noisy, and inconsistent. This course equips you with the skills needed to prepare all forms of untidy data for analysis. You will learn the core concepts of data wrangling, namely tidy data, data integration, data cleaning, data transformation, data standardisation, data discretisation, and data reduction. You will develop and apply your data wrangling skills to complex, noisy, and inconsistent real-world data using R, a leading open-source software environment.


Objectives/Learning Outcomes/Capability Development

This course contributes to the following Program Learning Outcomes for BP330, Bachelor of Space Science.

Understanding science and engineering

  • You will demonstrate an understanding of the scientific method and engineering fundamentals and an ability to apply them in practice.

Knowledge and technical competence

  • You will have broad knowledge in space science and technology with deep knowledge in its core concepts.
  • You will have knowledge in at least one discipline other than your primary discipline and some understanding of interdisciplinary linkages.

Inquiry and Problem Solving

  • You will be able to choose appropriate tools and methods to solve scientific problems within your area of specialisation.
  • You will demonstrate well-developed problem-solving skills, applying your knowledge and using your ability to think analytically and creatively.

Information literacy

  • You will develop a capacity for independent and self-directed work.
  • You will work responsibly, safely, legally, and ethically.
  • You will develop an ability to work collaboratively.

On successful completion of this course, you should be able to:

  1. Utilise leading open-source software, R, to address and resolve data wrangling tasks.
  2. Select, perform, and justify data validation processes for raw datasets to satisfy quality requirements.
  3. Apply and evaluate the best practice standards of Tidy Data Principles.
  4. Critically analyse data integration procedures for combining data with different types and structures into a suitable format.

Overview of Learning Activities

This course uses highly structured learning activities to guide your learning and prepare you for your assessments. The activities are a combination of individual, peer-supported and facilitator-guided activities, with opportunities for feedback throughout. 

Authentic and industry-relevant learning is critical to this course; you will therefore be encouraged to critically compare current thinking and practice within this context and industry. You will apply your thinking by producing relevant real-world assessment tasks and engaging with scenarios and case studies.

Social learning is another important aspect of coursework; you are therefore expected to participate in group activities, share drafts of your work and other resources that might be helpful, and give and receive peer feedback. By working efficiently and effectively with others, you will achieve outcomes greater than those that you might have achieved on your own.

Above all, the learning activities are designed to maximise the likelihood that you will not only understand the course learning resources, but also be able to apply those learnings to your own professional practice.

Overview of Learning Resources

RMIT will provide you with resources and tools for learning in this course through myRMIT Studies Course.

There are services available to support your learning through the University Library. The Library provides guides on academic referencing and subject specialist help as well as a range of study support services. For further information, please visit the Library page on the RMIT University website and the myRMIT student portal.

Overview of Assessment

You will be assessed on how well you meet the course learning outcomes (CLOs) and on your development against the program learning outcomes.

Assessment Tasks

Assessment Task 1: Pre-processing data project
Weighting 20% 
This assessment task supports CLOs 1, 2 & 3

Assessment Task 2: Coding exercises
Weighting 35%
This assessment task supports CLOs 1, 2, 3 & 4

Assessment Task 3: Applied relational data project
Weighting 45%
This assessment task supports CLOs 1, 2, 3 & 4

If you have a long-term medical condition and/or disability, it may be possible to negotiate to vary aspects of the learning or assessment methods. You can contact the program coordinator or Equitable Learning Services if you would like to find out more.