Course Title: Big Data Processing
Part A: Course Overview
Credit Points: 12.00
171H School of Science
Sem 2 2019,
Sem 2 2020,
Sem 2 2021
Course Coordinator: Dr Ke Deng
Course Coordinator Phone: +61 3 9925 3202
Course Coordinator Email: email@example.com
Course Coordinator Location: 14.9.12
Course Coordinator Availability: By appointment, by email
Pre-requisite Courses and Assumed Knowledge and Capabilities
Expected prior study:
Databases: this prerequisite knowledge can be attained by completing ISYS1057 Database Concepts
Extensive programming skills: this prerequisite knowledge can be attained by completing COSC1076 Advanced Programming Techniques.
This course builds on your database and programming skills. It aims to give you an in-depth understanding of a wide range of fundamental algorithms and processing platforms used in big data management.
The course covers Big Data fundamentals, including the characteristics of Big Data, the sources of Big Data (such as social media, sensor data, and geospatial data), and the challenges they pose for information management, data analytics, and privacy and security, as well as platforms and architectures. Emphasis will be given to non-relational databases by examining techniques for storing and processing large volumes of structured and unstructured data and streaming data, and for running complex analytics on them. Data warehouses will also be presented as a solution for handling big data and business intelligence applications.
The course aims to keep a balance between algorithmic and systems issues. The algorithms discussed in this course involve methods of organising big data for efficient complex computation. In addition, we consider Big Data platforms (such as Hadoop) to present practical applications of the algorithms covered in the course.
Objectives/Learning Outcomes/Capability Development
Upon successful completion of this course you should have gained an understanding of Big Data concepts, including cloud and big data architectures, an overview of Big Data analytics, and the implementation of Big Data platforms, and should be able to apply these concepts using an industry-standard non-relational database environment.
The key course learning outcomes are:
- CLO 1: model and implement efficient big data solutions for various application areas using appropriately selected algorithms and data structures.
- CLO 2: analyse methods and algorithms, to compare and evaluate them with respect to time and space requirements and make appropriate design choices when solving real-world problems.
- CLO 3: motivate and explain trade-offs in big data processing technique design and analysis in written and oral form.
- CLO 4: explain Big Data fundamentals, including the evolution of Big Data, the characteristics of Big Data and the challenges introduced.
- CLO 5: apply non-relational databases and the techniques for storing and processing large volumes of structured and unstructured data, as well as streaming data.
- CLO 6: apply the novel architectures and platforms introduced for Big Data, i.e. Hadoop, MapReduce and Spark.
Overview of Learning Activities
The learning activities included in this course are:
- Key concepts will be explained in pre-recorded lecture videos, lectorials, classes or online, where syllabus material will be presented and the subject matter will be illustrated with demonstrations and examples;
- Tutorials and/or labs and/or group discussions (including online forums) focused on projects and problem solving will provide practice in the application of theory and procedures, allow exploration of concepts with teaching staff and other students, and give feedback on your progress and understanding;
- Assignments, as described in Overview of Assessment (below), requiring an integrated understanding of the subject matter; and
- Private study, working through the course as presented in classes and learning materials, and gaining practice at solving conceptual and technical problems.
Total study hours
A total of 144 hours of study is expected during this course, comprising:
Teacher-directed hours (48 hours): lectorials and laboratory sessions. Each week there will be 2 hours of lectorial laboratory work. You are encouraged to participate during lectorials through asking questions, commenting on the pre-recorded lecture videos based on your own experiences, and through presenting solutions to written exercises. The laboratory sessions will introduce you to the tools necessary to undertake the assignment work.
Student-directed hours (96 hours): You are expected to be self-directed and to study independently outside class. Before attending each week's lectorial, you are expected to work through that week's lecture materials by reading the lecture notes and watching the pre-recorded lecture video clips.
Overview of Learning Resources
The course is supported by the Canvas learning management system, which provides specific learning resources. See also the RMIT Library Guide at http://rmit.libguides.com/compsci
You will make use of computer laboratories and relevant software provided by the School. You will be able to access course information and learning materials through myRMIT and may be provided with copies of additional materials in class or via email. Lists of relevant reference texts, resources in the library and freely accessible Internet sites will be provided.
Overview of Assessment
The assessment for this course comprises weekly online tests or tasks, a major assignment and a formal written exam.
Note: This course has no hurdle requirements.
Assessment Task 1: Assignment 1
In this task, you will solve a data analysis problem on a big data processing platform.
This assessment task supports CLOs 1, 2, 4.
Assessment Task 2: Assignment 2
In this task, you will implement, critically analyse, and report on a substantial big data solution based on an existing data mining algorithm, applied to a large data set.
This assessment task supports CLOs 1, 2, 3, 4, 5 and 6.
Assessment Task 3: Assignment 3
In this task, you will design, implement, and report on a data stream analysis project.
This assessment task supports CLOs 1, 2, 3, 4, 5, and 6.