Course Title: Big Data Management

Part A: Course Overview

Credit Points: 12.00

Terms

Course Code | Campus | Career | School | Learning Mode | Teaching Period(s)
COSC2636 | City Campus | Postgraduate | 140H Computer Science & Information Technology | Face-to-Face | Sem 1 2016
COSC2636 | City Campus | Postgraduate | 171H School of Science | Face-to-Face | Sem 1 2017, Sem 1 2018

Course Coordinator: Dr. Zhifeng Bao

Course Coordinator Phone: +61 3 9925 1940

Course Coordinator Email: zhifeng.bao@rmit.edu.au


Pre-requisite Courses and Assumed Knowledge and Capabilities

Required Prior Knowledge:

Pre-requisites:
Database systems: this prerequisite knowledge can be attained by completing ISYS1055 Database Concepts.
Extensive programming skills: this prerequisite knowledge can be attained by completing COSC1295 Advanced Programming.

Co-requisites:
If you have not taken COSC1285/2123 Algorithms and Analysis, it is COMPULSORY that you take it at the same time as Big Data Management. Strong algorithm and complexity analysis skills are needed to perform well in this course.


Course Description

This course builds on skills gained in database management systems and gives students an in-depth understanding of a wide range of fundamental Big Data Management systems. In particular, it focuses on the “variety” of the 3Vs in big data: how to store, index and query various types of data (structured, unstructured, geo-spatial and time series data) in real-world applications. Moreover, the course introduces an end-to-end infrastructure for solving big data management problems, covering data cleaning, data integration, data update, query processing (top-k query, k-nearest neighbour query, range query, point query), data visualization and data crowdsourcing, from front end to back end. Students are expected to develop the skills to extract the core efficiency and scalability challenges from a real-life application scenario, in order to identify and address the bottlenecks of a big data management system.

This course establishes a strong working knowledge of the concepts, techniques and products associated with Big Data. The main focus is on specialized storage models, indexing techniques, and efficient and scalable algorithm designs for query processing over a variety of Big Data.

Students will learn the core functionality of each major Big Data component and how the components integrate to form a coherent solution with business benefit. Hands-on programming and algorithm design exercises aim to provide insight into what the tools do so that their role in Big Data systems can be understood.

The course keeps a good balance between algorithmic and systems issues. The algorithms discussed involve methods of organising big data for efficient complex computation using MapReduce on particular platforms, such as Hadoop, to present practical applications of Big Data.

 


Objectives/Learning Outcomes/Capability Development

This course contributes to the development of the following Program Learning Outcomes:

Problem Solving:

Ability to model and implement efficient big data solutions for various application areas using appropriately selected tools and architectures.


Critical Analysis:

Ability to analyse big data infrastructures and their components, to compare and evaluate them, and make appropriate design choices when solving real-world problems.


Communication:

Ability to motivate and explain trade-offs in big data platform design and analysis in written and oral form.


On completion of this course you should have gained an understanding of Big Data concepts, including cloud and big data architectures and an overview of Big Data tools and platforms, and be able to apply these concepts using industry-standard tools and products. The key learning outcomes are:

  1. Be knowledgeable about Big Data fundamentals, including the evolution of Big Data, its characteristics and the challenges it introduces.
  2. Be proficient in characterizing and formally defining the usability of big data, and in extracting the core technical/research questions from a real-world problem.
  3. Acquire and implement various efficient indexing schemes to manage different types of data (to cater for the “Variety” of data), including but not limited to geo-spatial data, spatial-textual data, multimedia data, time series data, high-dimensional structured data and crowdsourced data.
  4. Design algorithms to achieve efficient query processing over heterogeneous data (on top of the index designed), and conduct theoretical analysis of the space and time complexity of algorithms applied to large-scale heterogeneous data.
  5. Adopt an end-to-end approach to turn theoretical analysis into the physical development of a system prototype that addresses real-life applications.

 


Overview of Learning Activities

Key concepts will be explained in lectures, classes or online, where syllabus material will be presented and the subject matter will be illustrated with demonstrations and examples. Tutorials and/or labs and/or group discussions (including online forums) focused on projects and problem solving will provide practice in the application of theory and procedures, allow exploration of concepts with teaching staff and other students, and give feedback on your progress and understanding. Assignments, as described in Overview of Assessment (below), will require an integrated understanding of the subject matter. Private study, in which you work through the course as presented in classes and learning materials, will give you practice at solving conceptual and technical problems.

 

Total study hours

 

A total of 120 hours of study is expected during this course, comprising:

Teacher-directed hours (48 hours): lectures, tutorials and laboratory sessions. Each week there will be 2 hours of lecture plus 2 hours of practical work in a computer laboratory. You are encouraged to participate through asking questions, commenting on the material based on your own experiences and by formulating solutions to provided exercises. The tutorial/laboratory sessions will introduce you to the tools and techniques necessary to undertake the assignment work.

Student-directed hours (72 hours): You are expected to be self-directed, studying independently outside class.


Overview of Learning Resources

You will make use of computer laboratories and relevant software provided by the School. You will be able to access course information and learning materials through myRMIT and may be provided with copies of additional materials in class or via email. Lists of relevant reference texts, resources in the library and freely accessible Internet sites will be provided.

Use the RMIT Bookshops textbook list search page to find any recommended textbook(s).


Overview of Assessment

Exam: 50%
This assessment supports CLOs 1-4.

Assignments: 50%
There are two assignments. Both consist of studying the algorithm and index design from a research paper on big data management, implementing the algorithm, analysing it theoretically, and finally turning it into an end-to-end system prototype that works on large-scale heterogeneous data. Assistance with some of the core tasks needed to complete the assignments will be given in the practical sessions. This assessment supports CLOs 1-4.

You will build a data exploration system over heterogeneous data, for example spatial-textual data, similar to a mini Google Maps (the original idea will come from 1-3 research papers in the big data management area). You are responsible for developing the whole infrastructure: the front end (user interface and result visualization), the middle tier (algorithm design for efficient and scalable query processing) and the back end (data indexing and storage). You are required to collect, clean and index the data, implement the data exploration algorithms and demonstrate the result. You must give a face-to-face demonstration of the system prototype built for your programming assignment. The demonstration time will be announced later. This assessment supports CLO 5.