Data Analytics and Event Intelligence Application
Introduction
The project is designed for students to create cloud-based applications using microservices, with a focus on serverless architecture for data analytics. Students will choose a data-related challenge, develop microservices to address it, and integrate those into a cohesive application. The project responds to the growing need for advanced data analytics software and provides practical experience in building solutions that data scientists across various industries might employ. It emphasizes the development of microservices with a serverless approach to facilitate scalable and efficient data analysis.
Target Application
Many businesses now rely on sophisticated systems for decision-making, but these systems are limited by how hard it is to find and access good-quality data. In this project, we assume a non-profit organisation that engages students to build an Event Intelligence Application using agile methods combined with cloud and DevOps principles, so that different components can be built progressively over time. Teams will follow agile processes to build the system incrementally and comply with a set of design recommendations.
This Event Intelligence Application aims to help users create “event datasets” and offer them to researchers and small companies so that they can conduct analytics experiments or feed the data into their business intelligence systems. For example, an investment company might track weather events because bad weather can reduce the production of agricultural goods, which in turn affects the stock price of companies that distribute these goods. Another example is tracking political events in the news that might affect oil-producing countries and thereby increase transport costs.
Sometimes companies are interested in understanding chains of events: an event that causes another event, which in turn causes another, and so on. Such chains can be used to make predictions about future events. Examples of events relevant to users include:
Financial Events: Movements and trends in the financial markets, including changes in stock prices, currency exchange rates, and economic indicators that influence financial decisions and market behaviours.
Climate-Related Events: Changes in temperature, humidity, and other climate variables, potentially affecting regions or specific buildings.
News Events: Announcements or developments that impact society, the economy, or specific industries.
Social Media Events: Activities on social media platforms that signify significant events, trends, or public sentiment shifts.
Economic Events: Fluctuations in economic indicators such as interest rates and stock prices, on both macro (days/months) and micro (hours/minutes) scales.
Health-Related Events: Information about health crises, disease outbreaks, or advancements in medical research.
ESG (Environmental, Social, and Governance) Events: Data that reflects how a company's operations relate to environmental stewardship, social responsibility, and governance practices. This includes corporate actions impacting sustainability, ethical impacts of business practices, social justice issues, and governance structures that ensure accountability and transparency. ESG data is critical for investors and stakeholders looking to assess the sustainability and ethical impact of their investments, as well as for companies aiming to improve their societal footprint.
Additional details about these data sources are provided in a separate document.
In summary, the Event Intelligence Application is aimed at users interested in acquiring event datasets for further processing (e.g., investigating a hypothesis, visualising events on a dashboard, or making predictions). As many users have overlapping needs, the organisation gathers as much event data as possible from different sources and has it managed by dedicated microservices. In this way, the overall Event Intelligence Application automates the process of gathering raw data from several data sources and analysing it using data processing pipelines made up of reusable components.
Design Guidelines
Architecture
The software architecture diagram provided outlines a system designed around key principles to ensure scalability and independence. Here is a high-level concept:
Microservices are structured to expose APIs, facilitating internal and external communications.
Data collected, generated, or processed by these microservices is stored in a shared AWS S3 bucket, ensuring durability and availability.
The architecture is serverless, leveraging cloud scalability and cost-efficiency.
Independence among microservices is maintained to avoid tight coupling, allowing each to function and scale independently.
*The dashed arrows indicate an optional flow.
Students are encouraged to creatively design the architecture of their applications, adhering to the fundamental principles provided. This ensures that while students have the freedom to innovate, they remain aligned with the project's core requirements of API exposure, cloud-based data storage, serverless resources, and microservice independence.
The Different Types of Microservices
Data Collection: This is the foundational microservice that acquires data from various external sources and standardizes it to fit a specified data model.
Data Retrieval: This service is tasked with fetching the stored data from cloud infrastructure, ensuring it is readily accessible for subsequent operations.
Data Preprocessing: A crucial step where the retrieved data undergoes cleansing and formatting, setting the stage for accurate analysis.
Analytical Model: Utilizing the preprocessed data, this microservice applies statistical models and algorithms to distill insights and patterns.
Visualization/Reporting: The culminating microservice that presents analytical findings through visual aids or reports, facilitating easier interpretation and decision-making.
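To make the division of responsibilities concrete, the five types can be sketched as plain functions chained into one pipeline. This is only an illustration: in the project each stage would be a separate microservice behind its own API, and the record fields ("day", "temp"), the hard-coded sample data, and the mean statistic are all assumptions made for the example.

```python
from statistics import mean
from typing import Dict, List


def collect() -> List[Dict[str, str]]:
    """Data Collection: return raw records (hard-coded instead of an external source)."""
    return [
        {"day": "2025-01-01", "temp": "21.5"},
        {"day": "2025-01-02", "temp": ""},  # missing reading
        {"day": "2025-01-03", "temp": "23.5"},
    ]


def retrieve(store: List[Dict[str, str]]) -> List[Dict[str, str]]:
    """Data Retrieval: fetch stored records (a list copy stands in for cloud storage)."""
    return list(store)


def preprocess(rows: List[Dict[str, str]]) -> List[Dict[str, object]]:
    """Data Preprocessing: drop rows with missing values and convert strings to floats."""
    return [{"day": r["day"], "temp": float(r["temp"])} for r in rows if r["temp"]]


def analyse(rows: List[Dict[str, object]]) -> Dict[str, float]:
    """Analytical Model: a simple statistic in place of a real model."""
    temps = [r["temp"] for r in rows]
    return {"count": len(temps), "mean_temp": mean(temps)}


def report(summary: Dict[str, float]) -> str:
    """Visualization/Reporting: render the result as a one-line text report."""
    return f"{summary['count']} readings, mean temperature {summary['mean_temp']:.1f}"


def run_pipeline() -> str:
    """Chain the five stages in the order listed above."""
    return report(analyse(preprocess(retrieve(collect()))))


print(run_pipeline())  # prints "2 readings, mean temperature 22.5"
```

Because each stage takes and returns plain data, any one of them can be replaced by another implementation without touching the rest.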
The Microservices and API
For the project, students are required to develop several distinct microservices tailored for data analytics. Each microservice must be capable of communicating through APIs, both for internal functionalities and external interfaces. These APIs will eventually be used and tested by other teams, so each microservice must be built to operate independently. This modular design allows any of the microservices to be substituted with alternative implementations in the future if needed. Students will publish their APIs on a marketplace for other teams to use. For the final application, teams will receive higher marks if more teams use their APIs, or if they meaningfully integrate more of other teams’ APIs into their own application.
One of the project goals is to develop APIs that others can use externally. While it's acceptable to have different internal microservices for certain backend processes, the focus should be on creating accessible and user-friendly APIs for external use.
Important Tips: To attract more users to your APIs, focus on designing your microservices to be both general and flexible. For example, when developing a microservice for calculating moving averages, enable users to customize parameters like the attribute to be analyzed or the moving average window size upon API invocation. Additionally, creating a microservice capable of processing different datasets from S3 buckets or external storage by inputting parameters will enhance the flexibility and usability of your services.
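As a minimal sketch of the moving-average tip, the function below takes the attribute name and the window size as parameters rather than hard-coding them. The request parameter names ("attribute", "window"), their defaults, and the response shape are hypothetical choices for illustration, not a required interface.

```python
from typing import Dict, List, Sequence


def moving_average(records: Sequence[Dict[str, float]], attribute: str, window: int) -> List[float]:
    """Return the simple moving average of `attribute` over a sliding window.

    The result has len(records) - window + 1 values, or is empty when the
    window is longer than the data.
    """
    if window < 1:
        raise ValueError("window must be at least 1")
    values = [float(r[attribute]) for r in records]
    if window > len(values):
        return []

    running = sum(values[:window])
    averages = [running / window]
    for i in range(window, len(values)):
        # Slide the window: add the newest value, drop the oldest one.
        running += values[i] - values[i - window]
        averages.append(running / window)
    return averages


def handle_request(params: Dict[str, str], records: Sequence[Dict[str, float]]) -> Dict[str, object]:
    """Read caller-supplied parameters (as strings, like query parameters)."""
    attribute = params.get("attribute", "value")  # assumed default
    window = int(params.get("window", "3"))       # assumed default
    return {
        "attribute": attribute,
        "window": window,
        "values": moving_average(records, attribute, window),
    }


prices = [{"close": 1.0}, {"close": 2.0}, {"close": 3.0}, {"close": 4.0}]
print(handle_request({"attribute": "close", "window": "2"}, prices)["values"])
# prints [1.5, 2.5, 3.5]
```

The same function then works for any numeric attribute in any dataset, which is what makes the service reusable by other teams.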
For Sprint 1, each group is required to develop at least one set of microservices. By Sprint 3, students are encouraged to create additional APIs to enhance the functionality of their applications.
Set 1: Data Collection + Data Retrieval + Data Preprocessing
Set 2: Data Collection + Data Retrieval + Visualization/Reporting
Set 3: Data Collection + Data Retrieval + Analytical Model
Set 4: Data Preprocessing + Analytical Model
Set 5: Analytical Model + Visualization/Reporting
The Data Model
For the project, students are encouraged to download or access online open-source datasets in the areas of finance, ESG, economy, news and social networks, climate, and public health. Each student is required to select one (or more) of these datasets as the basis for their microservices application.
In adherence to the project guidelines, all event data must be stored in a consistent format. This format is detailed in the data model document, accessible through the provided link. Uniform data formatting enables teams to share data analytics services with one another regardless of the data source, which improves collaboration and interoperability across the project. Details can be found on the following page:
25T1-Data Model Specification - SENG3011 25T1 - UNSW SENG
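The idea of a consistent format can be sketched as a small mapping step applied to every source. The field names used below ("source", "event_type", "timestamp", "attributes") are placeholders invented for this example and are not taken from the Data Model Specification; the actual field names and structure must come from that document.

```python
from typing import Any, Dict, Iterable


def to_common_event(
    raw: Dict[str, Any],
    *,
    source: str,
    event_type: str,
    time_field: str,
    attribute_fields: Iterable[str],
) -> Dict[str, Any]:
    """Map one source-specific record onto a single shared shape.

    All output keys are placeholder names, not the official data model.
    """
    return {
        "source": source,
        "event_type": event_type,
        "timestamp": raw[time_field],
        "attributes": {name: raw[name] for name in attribute_fields},
    }


# Two differently shaped (made-up) source records end up with the same structure.
weather = to_common_event(
    {"date": "2025-01-01", "temp_c": 21.5},
    source="weather_feed",
    event_type="climate",
    time_field="date",
    attribute_fields=["temp_c"],
)
stock = to_common_event(
    {"ts": "2025-01-01T10:00:00Z", "close": 101.2, "volume": 5000},
    source="price_feed",
    event_type="financial",
    time_field="ts",
    attribute_fields=["close", "volume"],
)
```

Once every record shares one shape, a downstream service such as a moving-average API can process either dataset without knowing where it came from.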
The Tech Stack
The recommended technology stack for this project is centered around Amazon Web Services (AWS), and it includes various tools and services that students will utilize:
S3: This service will be employed for data storage and is provided by the teaching team.
Lambda: Used for serverless computing, enabling students to run code without provisioning or managing servers.
API Gateway: Facilitates the creation, publishing, and management of APIs that will be used to communicate with microservices.
ECS (Elastic Container Service): Enables the deployment and management of containers, facilitating the scalability of microservices.
Terraform: Use Terraform to codify and automate the deployment of AWS resources like Lambda, API Gateway and ECS.
GitHub: Employ GitHub to store and version-control your Terraform configurations for AWS. Set up CI/CD pipelines with GitHub Actions for automated deployments, ensuring consistency and collaboration among students.
Fargate: A serverless compute engine for containers that manages the underlying infrastructure.
*DynamoDB: A NoSQL database service that provides high performance at any scale.
*SageMaker: A notebook service that supports building, training, and deploying machine learning models.
*App Runner: A service that simplifies the deployment of containerized applications for ease of management.
Teams are allowed to vary this stack.
Sprints Overview
In the project, students will complete three sprints, each with specific objectives:
Sprint 1 will consist of creating an API microservice and testing it locally. The suggested list of external data sources is available separately, but other data sources can be selected by teams of students in consultation with their mentor. This API should be made accessible to other teams.
Sprint 2 will consist of another team conducting tests for the API developed in Sprint 1 within the shared infrastructure. The purpose is to establish testing and observability/monitoring features and to implement CI/CD functionality for the project.
Sprint 3 will consist of developing an Event Intelligence Application with a GUI (Graphical User Interface) that uses APIs from Sprint 1 and new APIs. Incorporating APIs from other teams will result in additional marks. A further bonus will be awarded for wrapping your microservices within specific Python functions. Include a final report describing your design.
Project Self-Evaluation (individual): a well-documented summary of all the work you have done.
Details about each sprint will be provided separately.
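The Sprint 3 bonus for wrapping microservices within Python functions could look something like the sketch below, which hides the HTTP call to a microservice behind an ordinary function. The base URL, the "moving-average" path, the query parameter names, and the JSON response are all hypothetical; the exact requirements for the wrappers are expected to come with the sprint details.

```python
import json
from typing import Any, Callable, Dict
from urllib.parse import urlencode
from urllib.request import Request, urlopen

BASE_URL = "https://example.invalid/api"  # placeholder, not a real endpoint


def build_url(base_url: str, path: str, params: Dict[str, Any]) -> str:
    """Join base URL, path and query string (parameters sorted for stable URLs)."""
    query = urlencode(sorted(params.items()))
    url = f"{base_url.rstrip('/')}/{path.lstrip('/')}"
    return f"{url}?{query}" if query else url


def get_moving_average(
    dataset: str,
    attribute: str,
    window: int,
    *,
    base_url: str = BASE_URL,
    opener: Callable[..., Any] = urlopen,
    timeout: float = 10.0,
) -> Any:
    """Call a (hypothetical) moving-average microservice and return the parsed JSON.

    `opener` can be replaced in tests so that no network request is made.
    """
    url = build_url(
        base_url,
        "moving-average",
        {"dataset": dataset, "attribute": attribute, "window": window},
    )
    request = Request(url, headers={"Accept": "application/json"})
    with opener(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

A caller would then write get_moving_average("prices", "close", 5) without needing to know the URL structure, which also makes it easier to swap in another team's implementation of the same service.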