Course Title: Data Wrangling

Part A: Course Overview

Credit Points: 12.00

Terms

Course Code: MATH2349

Campus: City Campus

Career: Postgraduate

School: 171H School of Science

Learning Mode: Face-to-Face

Teaching Period(s): Sem 1 2018, Sem 2 2018, Sem 1 2019, Sem 2 2019, Sem 1 2020, Sem 2 2020

Flexible Terms

Course Code: MATH2405

Campus: RMIT Online

Career: Postgraduate

School: 171H School of Science

Learning Mode: Internet

Teaching Period(s): JanJun2020 (TP3)

Course Coordinator: Dr. Sona Taheri

Course Coordinator Phone: -

Course Coordinator Email: sona.taheri@rmit.edu.au

Course Coordinator Location: -

Course Coordinator Availability: By email


Pre-requisite Courses and Assumed Knowledge and Capabilities

A working knowledge of basic mathematics and familiarity with computers.


Course Description

Real-world data are commonly incomplete, noisy, and inconsistent. This course will cover a wide range of topics designed to equip you with the skills needed to prepare all forms of untidy data for analysis. The course will cover the core concepts of data pre-processing, namely tidy data, data integration, data cleaning, data transformation, data standardisation, data discretisation, and data reduction. You will develop and apply your data wrangling skills to complex, noisy, and inconsistent real-world data using leading open source software.


Objectives/Learning Outcomes/Capability Development

This course contributes to the following Program Learning Outcomes for MC004 Master of Statistics and Operations Research and MC242 Master of Analytics:

 

Personal and professional awareness

  • the ability to contextualise outputs where data are drawn from diverse and evolving social, political and cultural dimensions
  • the ability to reflect on experience and improve your own future practice
  • the ability to apply the principles of lifelong learning to any new challenge.

Knowledge and technical competence

  • an understanding of appropriate and relevant, fundamental and applied mathematical and statistical knowledge, methodologies and modern computational tools.

Problem-solving

  • the ability to bring together and flexibly apply knowledge to characterise, analyse and solve a wide range of problems
  • an understanding of the balance between the complexity / accuracy of the mathematical / statistical models used and the timeliness of the delivery of the solution.

Information literacy

  • the ability to locate and use data and information and evaluate its quality with respect to its authority and relevance.

 

 

This course contributes to the following Program Learning Outcomes for GC173KP19 Graduate Certificate of Data Science:

  1. Understand and use appropriate and relevant, fundamental and applied mathematical and statistical knowledge and methodologies and modern processes and analytical tools. 
  2. Bring together and flexibly apply knowledge to characterise, analyse and solve a range of data science problems. 
  3. Select and justify relevant approaches to analyse data problems and/or identify opportunities. 
  4. Use visualisation approaches to interpret and communicate findings to inform decision makers.
  5. Critically reflect on their own practice to support their continual professional development and become lifelong learners.


Course Learning Outcomes

On completion of this course you should be able to:

  1. Accurately, logically and ethically combine data from multiple sources to make it suitable for statistical analysis and to draw valid interpretations.
  2. Articulate how data meet best practice standards (e.g. tidy data principles).
  3. Select, perform and justify data validation processes for raw datasets.
  4. Use leading open source software (e.g. R) for reproducible, automated data processing.


Overview of Learning Activities

This course is offered in two different delivery modes: On Campus and Online.

On Campus

For students enrolled on campus, course learning activities take place both online and face-to-face. Online course notes and materials replace traditional lectures and labs. Face-to-face class time is mainly used for hands-on demonstrations of concepts and software, and for group work on module exercises and problems that consolidate learning and prepare you for the final exam.

You are strongly encouraged to bring a portable computing device to class, preferably a laptop, with Wi-Fi access to the RMIT University network. You will also need the open source software used in the course installed on your personal computing device.

Online 

This course uses highly structured learning activities to guide your learning process and prepare you for your assessments. The activities are a combination of individual, peer-supported and facilitator-guided activities, and where possible project-led, with opportunities for feedback throughout. 

Authentic and industry-relevant learning is critical to this course and you will be encouraged to critically compare and contrast what is happening in your context and in industry, and to use your insights. 

Social learning is another important component and you are expected to participate in class and group activities, share drafts of work and resources and give and receive peer feedback. You will be expected to work efficiently and effectively with others to achieve outcomes greater than those that you might have achieved alone. 

Above all, the learning activities are designed to maximise the likelihood that you will not only understand the course learning resources but also apply that learning to improve your own practice, for example by producing real-world artefacts and engaging in scenarios and case studies.

 

Total study hours

On Campus

You will undertake 3 hours per week of face-to-face learning in class. In addition to the weekly classes, you are expected to spend approximately 6 more hours per week on activities related to this course. These activities include reading and practising with the online course material, completing module exercises and assignments, and preparing for assessments.

Online

During the 6-week teaching period, this course comprises 20 hours of directed and self-directed learning per week, including webinars, practice exercises, assignments and revision.


Overview of Learning Resources

On Campus

There are no prescribed texts for this course. All course content, notes, learning materials and data sets will be available through the course website and Canvas LMS. A list of recommended textbooks for this course will also be provided on Canvas as reference sources.

Online

Each learning activity contains the core resources, such as videos, podcasts, readings, templates, articles, industry tools and/or communities that you need to complete that activity, or links to those resources. 

Additional learning resources designed into the course will be clearly marked as supplemental. If your course teaching team finds further resources during course delivery that may support or interest the class cohort, these will be made available during the teaching period.

In your class environment, besides your learning activities, you will also find:

  • All assessment briefs 
  • A course information page with a study schedule
  • Various communication tools to facilitate collaboration with your peers and facilitators, and to share information 

Learning resources are also available online through RMIT Library databases and other facilities. If you require assistance with the RMIT Library facilities, contact the Business Liaison Librarian for your school. Contact details for Business Liaison Librarians are located on the RMIT Library website.


Overview of Assessment

On Campus:

This course has no hurdle requirements.

 

Assessment Task 1:  Assignments    

Assignments staggered throughout the semester.  

Weighting 30%  

This assessment task supports CLOs 1, 2, 3 and 4

 

Assessment Task 2: Mid-Semester test

Weighting 20%

This assessment task supports CLOs 1, 2 and 3

 

Assessment Task 3: Final Examination 

A two-hour final examination during the exam period

Weighting 50%

This assessment task supports CLOs 1, 2, 3 and 4.

 

Online:

This course has three assessments, all of which must be completed. A total mark of 50% is required for a pass in the course. You do not need to pass each individual assessment component, provided your total mark reaches 50%.

 

Assessment Task 1:  Assignment 1

Weighting 30%  

This assessment task supports CLOs 1, 2, 3 and 4

 

Assessment Task 2: Online Test

Weighting 30%

This assessment task supports CLOs 1, 2, 3 and 4

 

Assessment Task 3: Assignment 2

Weighting 40%

This assessment task supports CLOs 1, 2, 3 and 4
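
The pass rule for the online offering can be illustrated with a small calculation. The sketch below is a minimal example only: it uses the weightings listed above (30%, 30% and 40%), assumes each task is marked out of 100, and reads "a total mark of 50% is required" as "at least 50". The function names and the sample marks are hypothetical and are not taken from any RMIT system.

```python
# Weightings (in percent) for the online offering, as listed above.
WEIGHTS = {"Assignment 1": 30, "Online Test": 30, "Assignment 2": 40}
PASS_MARK = 50  # total mark (%) required for a pass


def total_mark(marks):
    """Return the weighted total (%) from per-task marks out of 100.

    Assumes all three tasks were completed; a missing task raises KeyError.
    """
    return sum(WEIGHTS[task] * marks[task] for task in WEIGHTS) / 100


def passes(marks):
    """A pass needs a total of at least 50%, not a pass in every task."""
    return total_mark(marks) >= PASS_MARK


# Hypothetical marks: the online test is below 50, yet the total is a pass.
example = {"Assignment 1": 70, "Online Test": 40, "Assignment 2": 55}
print(total_mark(example), passes(example))  # 55.0 True
```

In this hypothetical case the weighted total is (30 × 70 + 30 × 40 + 40 × 55) / 100 = 55, so the course is passed even though one component scored below 50.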