Course Title: Big Data Processing

Part A: Course Overview

Credit Points: 12.00

Important Information:

Please note that this course may have compulsory in-person attendance requirements for some teaching activities.

Please check your Canvas course shell closer to when the course starts to see if this course requires mandatory in-person attendance. The delivery method of the course might have to change quickly in response to changes in the local state/national directive regarding in-person course attendance. 


Terms

Course Code | Campus | Career | School | Learning Mode | Teaching Period(s)
COSC2633 | City Campus | Undergraduate | 171H School of Science | Face-to-Face | Sem 2 2019, Sem 2 2020, Sem 2 2021
COSC2633 | City Campus | Undergraduate | 175H Computing Technologies | Face-to-Face | Sem 2 2022

Course Coordinator: Dr Ke Deng

Course Coordinator Phone: +61 3 9925 3202

Course Coordinator Email: ke.deng@rmit.edu.au

Course Coordinator Location: 14.9.12

Course Coordinator Availability: By appointment


Pre-requisite Courses and Assumed Knowledge and Capabilities

Expected prior study:

Databases: this prerequisite knowledge can be attained by completing ISYS1057 Database Concepts.
Extensive programming skills: this prerequisite knowledge can be attained by completing COSC1076 Advanced Programming Techniques.

Note: it is a condition of enrolment at RMIT that you accept responsibility for ensuring that you have completed the prerequisite/s and agree to concurrently enrol in co-requisite courses before enrolling in a course.

For further information, see the RMIT Course Requisites webpage.


Course Description

This course builds on your database and programming skills. It aims to give you an in-depth understanding of a wide range of fundamental algorithms and processing platforms used in big data management.

The course covers Big Data Fundamentals, including the characteristics of Big Data, the sources of Big Data (such as social media, sensor data, and geospatial data), and the challenges it poses for information management, data analytics, platforms and architectures. Emphasis will be given to non-relational databases by examining techniques for storing and processing large volumes of structured, unstructured and streaming data, and for running complex analytics on them. Cloud computing and data centres will also be presented as a solution for handling big data and business intelligence applications.

The course aims to balance algorithmic and system-level issues. The algorithms discussed in this course involve methods of organising big data for efficient complex computation. In addition, we consider Big Data platforms (such as Hadoop) to present practical applications of the algorithms covered in the course.


Objectives/Learning Outcomes/Capability Development

This course is a specialisation course that contributes to the following Program Learning Outcomes (PLOs) for BP340/BP340P23 Bachelor of Data Science:

Enabling Knowledge (PLO1)

You will gain skills as you apply knowledge with creativity and initiative to new situations. In doing so, you will:

  • Demonstrate mastery of a body of knowledge that includes recent developments in computer science, information technology and statistics;
  • Understand and use appropriate and relevant, fundamental and applied mathematical and statistical knowledge, methodologies and modern computational tools;
  • Recognise and use research principles and methods applicable to data science.

Critical Analysis (PLO2)

You will learn to accurately and objectively examine, and critically investigate computer science, information technology (IT) and statistical concepts, evidence, theories or situations, in particular to:

  • Analyse and manage large amounts of data arising from various sources;
  • Evaluate and compare solutions to data analysis problems on the basis of organisational and user requirements;
  • Bring together and flexibly apply knowledge to characterise, analyse and solve a wide range of statistical problems.

Problem Solving (PLO3)

Your capability to analyse complex problems and synthesise suitable solutions will be extended as you learn to:

  • Design and implement data analytic techniques that accommodate specified requirements and constraints, based on analysis or modelling or requirements specification;
  • Apply an understanding of the balance between the complexity / accuracy of the mathematical / statistical models used and the timeliness of the delivery of the solution.

Communication (PLO4)

You will learn to communicate effectively with a variety of audiences through a range of modes and media, in particular to:

  • Interpret abstract theoretical propositions, choose methodologies, justify conclusions and defend professional decisions to both technical and non-technical personnel via technical reports of professional standard and technical presentations.


Upon successful completion of this course, you should understand Big Data concepts, including cloud and big data architectures, Big Data analytics and the implementation of Big Data platforms, and be able to apply these concepts using an industry-standard non-relational database environment.

The key course learning outcomes are:

  • CLO 1: model and implement efficient big data solutions for various application areas using appropriately selected algorithms and data structures.
  • CLO 2: analyse methods and algorithms, compare and evaluate them with respect to time and space requirements, and make appropriate design choices when solving real-world problems.
  • CLO 3: motivate and explain trade-offs in big data processing technique design and analysis in written and oral form.
  • CLO 4: explain the Big Data Fundamentals, including the evolution of Big Data, the characteristics of Big Data and the challenges introduced.
  • CLO 5: apply non-relational databases, the techniques for storing and processing large volumes of structured and unstructured data, as well as streaming data.
  • CLO 6: apply the novel architectures and platforms introduced for Big Data, i.e., Hadoop, MapReduce and Spark.


Overview of Learning Activities

The learning activities included in this course are:

  • Key concepts will be explained in pre-recorded lecture videos and lectorials, face-to-face or online, where syllabus material will be presented, and the subject matter will be illustrated with demonstrations and examples;
  • Lectorials and practical classes with group discussions (including online forums) focused on projects and problem solving will provide practice in the application of theory and procedures, allow exploration of concepts with teaching staff and other students, and give feedback on your progress and understanding;
  • Assignments, as described in Overview of Assessment (below), requiring an integrated understanding of the subject matter;
  • Private study, working through the course as presented in classes and learning materials, and gaining practice at solving conceptual and technical problems.


Overview of Learning Resources

The course is supported by the Canvas learning management system which provides specific learning resources. See also the RMIT Library Guide at http://rmit.libguides.com/compsci
You will make use of computer laboratories and relevant software provided by the School. You will be able to access course information and learning materials through myRMIT and may be provided with copies of additional materials in class or via email. Lists of relevant reference texts, resources in the library and freely accessible Internet sites will be provided.


Overview of Assessment

The assessment for this course comprises four assignments.

Note: This course has no hurdle requirements.


Assessment Tasks

Assessment Task 1: MapReduce Preliminary Program
This assignment helps students build an understanding of fundamental MapReduce programming principles.
Weighting 25%
This assessment task supports CLOs 1-4 and 6.

Assessment Task 2: MapReduce Advanced Program
This assignment features MapReduce program development that involves complex processing skills and performance analysis.
Weighting 25%
This assessment task supports CLOs 1-4 and 6.

Assessment Task 3: MapReduce Problem Solving
This assignment develops students' ability to apply the MapReduce programming skills learned in this course to solve a practical big data processing problem.
Weighting 25%
This assessment task supports CLOs 1-4 and 6.

Assessment Task 4: Spark Program
This assignment gives students the chance to understand Spark programming principles and to develop the programming skills needed for efficient big data processing.
Weighting 25%
This assessment task supports CLOs 1, 2, 4, 5 and 6.