Course Title: Managing Semi-structured and Unstructured Data
Part A: Course Overview
Credit Points: 12.00
Terms
Course Code | Campus | Career | School | Learning Mode | Teaching Period(s)
ISYS1078 | City Campus | Postgraduate | 140H Computer Science & Information Technology | Face-to-Face | Sem 2 2006, Sem 2 2007, Sem 2 2008, Sem 2 2009, Sem 2 2010, Sem 2 2011, Sem 2 2012, Sem 2 2013, Sem 2 2014, Sem 2 2015
ISYS1078 | City Campus | Postgraduate | 171H School of Science | Face-to-Face | Sem 1 2017, Sem 1 2018, Sem 2 2019
ISYS1079 | City Campus | Undergraduate | 140H Computer Science & Information Technology | Face-to-Face | Sem 2 2006, Sem 2 2007, Sem 2 2008, Sem 2 2009, Sem 2 2010, Sem 2 2011, Sem 2 2012, Sem 2 2013, Sem 2 2014, Sem 2 2015
ISYS1079 | City Campus | Undergraduate | 171H School of Science | Face-to-Face | Sem 1 2017, Sem 1 2018, Sem 2 2019
Course Coordinator: Dr. Zhuang Li
Course Coordinator Phone: +61 3 9925
Course Coordinator Email: zhuang.li@rmit.edu.au
Course Coordinator Availability: By appointment
Pre-requisite Courses and Assumed Knowledge and Capabilities
Enforced Pre-Requisite Courses
Successful completion of the following course/s:
Recommended Prior Study
You should have satisfactorily completed or received credit for the following course/s before you commence this course:
OR
If you have completed prior studies at RMIT or another institution that developed the skills and knowledge covered in the above course/s you may be eligible to apply for credit transfer.
Alternatively, if you have prior relevant work experience that developed the skills and knowledge covered in the above course/s you may be eligible for recognition of prior learning.
Please follow the link for further information on how to apply for credit for prior study or experience.
Course Description
Large Language Models (LLMs) have transformed the way we access and generate information, but they remain limited by the static nature of their training data. Retrieval-Augmented Generation systems offer a powerful solution by combining language models with search capabilities. These systems dynamically retrieve relevant content from external sources, allowing intelligent agents to provide up-to-date and context-aware responses. As a result, search engines and retrieval modules have become integral to modern AI systems, enabling more accurate and grounded decision-making.
This course begins by introducing foundational concepts in information retrieval. You will explore the structure of documents, queries, and collections, and learn how to evaluate relevance in large-scale systems. Key topics include document indexing, Boolean and ranked retrieval models, query expansion techniques, and evaluation using standard benchmarks. These methods form the basis of effective search infrastructure and are critical to supporting dynamic reasoning in downstream applications.
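As an informal illustration of the indexing, Boolean retrieval, and ranked retrieval topics named above, the following is a minimal sketch rather than course material. The toy collection, the whitespace tokenizer, and the function names (`build_index`, `boolean_and`, `ranked`) are all assumptions made for this example, and the ranking uses a plain TF-IDF sum as one common choice among many.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase and split on whitespace (a deliberately naive tokenizer)."""
    return text.lower().split()


def build_index(docs):
    """Build an inverted index mapping each term to {doc_id: term frequency}."""
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        for term, tf in Counter(tokenize(text)).items():
            index[term][doc_id] = tf
    return index


def boolean_and(index, query):
    """Boolean retrieval: ids of documents that contain every query term."""
    postings = [set(index.get(term, {})) for term in tokenize(query)]
    return set.intersection(*postings) if postings else set()


def ranked(index, query, n_docs):
    """Ranked retrieval: order documents by summed TF-IDF over query terms."""
    scores = defaultdict(float)
    for term in tokenize(query):
        postings = index.get(term, {})
        if not postings:
            continue
        idf = math.log(n_docs / len(postings))  # rarer terms weigh more
        for doc_id, tf in postings.items():
            scores[doc_id] += tf * idf
    # Highest score first; ties broken by document id for determinism.
    return sorted(scores, key=lambda d: (-scores[d], d))


# Hypothetical three-document collection, invented for this sketch.
DOCS = {
    1: "boolean retrieval with inverted index",
    2: "ranked retrieval with tf idf weights",
    3: "query expansion for ranked retrieval",
}
INDEX = build_index(DOCS)
```

With this toy collection, `boolean_and(INDEX, "ranked retrieval")` returns `{2, 3}`, while `ranked(INDEX, "boolean retrieval", len(DOCS))` places document 1 first because "retrieval" occurs in every document and so contributes no weight.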
In the second half of the course, you will design and implement a lightweight multi-agent system that leverages LLMs to interact with structured and unstructured data sources. This system will demonstrate how LLM agents coordinate, share knowledge, and make decisions in real time. You will develop skills in LLM communication protocols, simple agent planning, and data-driven reasoning, preparing you to build intelligent systems capable of integrating retrieval into complex workflows. You will also explore retrieval-augmented generation (RAG), prompt engineering, agent memory, and tool use with LLM APIs.
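To make the retrieval-augmented generation idea concrete, here is a minimal sketch of a retrieve-then-prompt workflow. It is an illustration under stated assumptions, not the system built in the course: the token-overlap retriever, the prompt wording, and `fake_llm` (a placeholder standing in for a real LLM API call) are all hypothetical.

```python
def retrieve(query, passages, k=2):
    """Return the k passages sharing the most lowercase tokens with the query."""
    query_terms = set(query.lower().split())
    # sorted() is stable, so tied passages keep their original order.
    ranked = sorted(
        passages,
        key=lambda p: -len(query_terms & set(p.lower().split())),
    )
    return ranked[:k]


def build_prompt(query, context):
    """Assemble a prompt that asks the model to answer from retrieved context."""
    lines = ["Answer using only the context below.", "Context:"]
    lines += [f"- {passage}" for passage in context]
    lines += [f"Question: {query}", "Answer:"]
    return "\n".join(lines)


def fake_llm(prompt):
    """Hypothetical placeholder; a real system would call an LLM API here."""
    return f"[model response to a {len(prompt)}-character prompt]"


def answer(query, passages, llm=fake_llm):
    """Retrieve supporting passages, build the prompt, and query the model."""
    return llm(build_prompt(query, retrieve(query, passages)))


# Hypothetical passages, invented for this sketch.
PASSAGES = [
    "The index maps terms to documents.",
    "Agents share memory through messages.",
    "Evaluation uses test collections.",
]
```

Passing the model as the `llm` argument keeps retrieval and prompt construction testable without network access; swapping in a real client only requires a function that takes a prompt string and returns text.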
If you are enrolled in this course as a component of your Bachelor Honours Program, your overall mark will contribute to the calculation of the Weighted Average Mark (WAM).
See the WAM information web page for more information.
Objectives/Learning Outcomes/Capability Development
Program Learning Outcomes
This course contributes to the program learning outcomes for the following program(s):
Major: Advanced Computer Science
- BP094P23 - Bachelor of Computer Science
- BP347 - Bachelor of Computer Science (Professional)
Major: Advanced Data Science
- BP340P23 - Bachelor of Data Science
- BP348 - Bachelor of Data Science (Professional)
PLO1: Knowledge - Apply a broad and coherent set of knowledge and skills for developing user-centric computing solutions for contemporary societal challenges.
PLO2: Problem Solving - Apply systematic problem solving and decision-making methodologies to identify, design and implement computing solutions to real world problems, demonstrating the ability to work independently to self-manage processes and projects.
PLO3: Cognitive and Technical Skill - Critically analyse and evaluate user requirements and design systems employing software development tools, techniques, and emerging technologies.
PLO4: Communication - Communicate effectively with diverse audiences, employing a range of communication methods in interactions with both computing and non-computing personnel.
For more information on the program learning outcomes for your program, please see the program guide.
Upon successful completion of this course, you should be able to:
- Apply core principles of information retrieval to identify relevant content in large-scale datasets using both classical methods and modern LLM APIs.
- Design basic indexing strategies and retrieval pipelines for semi-structured data, incorporating hybrid sparse-dense techniques and LLM-based embeddings.
- Evaluate retrieval system performance using standard test collections and established evaluation metrics.
- Develop simple LLM-based agents that can respond to structured queries by using predefined coordination protocols.
- Compose a basic workflow that integrates retrieval methods with prompt-based LLM approaches to process unstructured or noisy data.
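The outcome on evaluating retrieval performance refers to established metrics. As a hedged illustration, the sketch below computes three widely used ones (precision at k, recall, and average precision) for a single query. The function names and the example ranking are assumptions made here, and the metrics used in the course may differ.

```python
def precision_at_k(ranking, relevant, k):
    """Fraction of the top-k retrieved documents that are relevant."""
    return sum(1 for doc in ranking[:k] if doc in relevant) / k


def recall(ranking, relevant):
    """Fraction of all relevant documents that appear in the ranking."""
    if not relevant:
        return 0.0
    return sum(1 for doc in ranking if doc in relevant) / len(relevant)


def average_precision(ranking, relevant):
    """Mean of the precision values at each rank where a relevant doc occurs."""
    if not relevant:
        return 0.0
    hits, total = 0, 0.0
    for rank, doc in enumerate(ranking, start=1):
        if doc in relevant:
            hits += 1
            total += hits / rank
    return total / len(relevant)


# Hypothetical system output and relevance judgements for one query.
RANKING = ["d3", "d1", "d7", "d2"]
RELEVANT = {"d1", "d2"}
```

For this example, relevant documents appear at ranks 2 and 4, so precision at 2 is 0.5, recall is 1.0, and average precision is (1/2 + 2/4) / 2 = 0.5.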
Overview of Learning Activities
The learning activities in this course include:
- Lectorials to explain key IR and multi-agent concepts, supported by live coding or demonstrations.
- Labs and group discussions (including online forums) focused on problem-solving and implementation.
- Peer and teaching staff feedback on system designs and assignments.
- Literature reviews and research discussions to explore trends in IR and LLM agent-based systems.
- Independent study to reinforce theoretical and practical skills.
Overview of Learning Resources
You will make extensive use of computer laboratories and relevant software provided by the School. Course information and learning materials will be available through MyRMIT, and additional materials may be provided in class or via email.
Lists of relevant reference texts, library resources, and freely accessible Internet sites will be provided.
Use the RMIT Bookshop’s textbook list search page to find any recommended textbook(s).
Overview of Assessment
Assessment Tasks
Assessment Task 1: Document Pre-processing and Feature Engineering
Weighting: 20%
Supports CLOs 1 and 2
Assessment Task 2: Information Retrieval System and Evaluation
Weighting: 30%
Supports CLOs 2 and 3
Assessment Task 3: Lightweight Multi-Agent System
Weighting: 50%
Supports CLOs 1, 2, 3, 4 and 5
If you have a long-term medical condition and/or disability it may be possible to negotiate to vary aspects of the learning or assessment methods. You can contact the program coordinator or Equitable Learning Services if you would like to find out more.