Privacy-protected big data sharing environment
Project Description

Elasticsearch and Splunk are currently among the most widely recognised big-data log analysis platforms in use. Each platform stores the logs it collects from various systems in its own storage format. An organisation that runs several log management systems therefore keeps a separate copy of the same logs in each of them, and as the volume of collected logs grows, this duplication places a heavy load on storage.

In addition, it has not been confirmed whether the logs are stored in a privacy-protected format. For this reason, many users of log management systems have expressed concern about exposing their sensitive information to unauthorised persons.

In this project, the project group will have to:
- analyse the log and data formats used in Splunk and Elasticsearch (Logstash), understand how logs are stored in the two systems, and determine how log formats can be shared effectively between them
- analyse the privacy requirements of log data and determine what should NOT be transferred between log/threat analysis platforms
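
As a hedged illustration of the sharing and filtering tasks above, the following Python sketch converts a flat Logstash-style JSON event into a Splunk HTTP Event Collector (HEC) style payload and drops fields on a deny-list before transfer. The helper `logstash_to_hec`, the entries in `SENSITIVE_FIELDS`, and the use of flat dotted keys (rather than nested objects) are hypothetical assumptions made for illustration; deciding what actually belongs on such a list is part of the analysis this project calls for.

```python
import json
from datetime import datetime

# Hypothetical deny-list: fields assumed too sensitive to leave the source platform.
SENSITIVE_FIELDS = {"user.name", "user.email", "password", "session_id"}


def logstash_to_hec(event: dict, sourcetype: str = "_json") -> dict:
    """Convert a flat Logstash-style event into a HEC-style payload,
    dropping every field listed in SENSITIVE_FIELDS (illustrative sketch)."""
    # Logstash keeps the event time as an ISO-8601 string in "@timestamp";
    # HEC takes epoch seconds in "time". "Z" is rewritten for Python < 3.11.
    ts = datetime.fromisoformat(event["@timestamp"].replace("Z", "+00:00"))
    # Keep the remaining fields, skipping "@" metadata, the host (sent
    # separately) and anything on the deny-list.
    body = {
        k: v
        for k, v in event.items()
        if k not in SENSITIVE_FIELDS and k != "host" and not k.startswith("@")
    }
    return {
        "time": ts.timestamp(),
        "host": event.get("host", "unknown"),
        "sourcetype": sourcetype,
        "event": body,
    }


if __name__ == "__main__":
    raw = {
        "@timestamp": "2024-01-01T00:00:00Z",
        "host": "web01",
        "message": "login ok",
        "user.name": "alice",
    }
    print(json.dumps(logstash_to_hec(raw)))
```

A deny-list is shown because it makes "what should NOT be transferred" explicit; an allow-list of permitted fields would be the more conservative alternative if new, unreviewed fields may appear in the logs.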

Supervisor
IEONG Sze Chung Ricci
Co-Supervisor
IEONG Sze Chung Ricci
WANG Tao
Quota
2
Course type
UROP1100
Applicant's Roles

In this project, the applicants have to:
- Learn about and understand two big data platforms
- Analyse the log and data formats used in two security-related big data platforms
- Define a log format in which critical fields cannot be recovered from the logs through de-anonymization
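
To illustrate why the choice of format matters for de-anonymization, here is a minimal Python sketch, assuming that sensitive values such as IP addresses are pseudonymized by hashing before logs are shared. An unkeyed hash over a small input space (an IPv4 subnet here) can be reversed by simply hashing every candidate, whereas a keyed HMAC cannot be matched without the secret key. The function names, the subnet and the assumption that the key never leaves the source platform are hypothetical choices for this example, not a scheme prescribed by the project.

```python
import hashlib
import hmac
import ipaddress
import secrets


def naive_pseudonym(value: str) -> str:
    """Unkeyed SHA-256: deterministic, so anyone can recompute it."""
    return hashlib.sha256(value.encode()).hexdigest()


def keyed_pseudonym(value: str, key: bytes) -> str:
    """HMAC-SHA256: recomputing it requires the secret key."""
    return hmac.new(key, value.encode(), hashlib.sha256).hexdigest()


def brute_force(token: str, candidates, pseudonym_fn):
    """Dictionary attack: pseudonymize every candidate and look for a match."""
    for candidate in candidates:
        if pseudonym_fn(candidate) == token:
            return candidate
    return None


if __name__ == "__main__":
    # Hypothetical subnet the attacker enumerates (256 addresses).
    subnet = [str(ip) for ip in ipaddress.ip_network("192.168.1.0/24")]
    # Assumed to stay on the source platform and never be shared.
    secret_key = secrets.token_bytes(32)
    ip = "192.168.1.42"
    # Unkeyed hash: the original address is recovered.
    print(brute_force(naive_pseudonym(ip), subnet, naive_pseudonym))
    # Keyed hash: without the key the attacker finds no match (None).
    print(brute_force(keyed_pseudonym(ip, secret_key), subnet, naive_pseudonym))
```

The same keyed token is produced for the same input, so shared logs can still be correlated across platforms without revealing the underlying value.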

Applicant's Learning Objectives

- File and storage formats in two of the most commonly used big-data platforms
- Determine how logs can be de-anonymized and how to prevent it
- Define log formats

Complexity of the project
Moderate