SIT 719 SIT719 Security and Privacy Issues in Analytics Credit Task 8.2: k-anonymity for Sensitive...

Question

SIT 719

SIT719 Security and Privacy Issues in Analytics

Credit Task 8.2: k-anonymity for Sensitive Data Privacy
Overview

Data owners want a way to transform a dataset containing highly sensitive information into a
privacy-preserving, low-risk set of records that can be shared with anyone. This task looks at
k-anonymity, a privacy model commonly applied to protect data subjects’ privacy in data-sharing
scenarios, and the guarantees that k-anonymity can provide when used to anonymise data. Various
open-source and commercial tools use this privacy model to protect sensitive data.

Amnesia is a data anonymization tool that allows users to remove identifying information from data.
Amnesia not only removes direct identifiers such as names and SSNs but also transforms secondary
identifiers such as birth date and zip code so that individuals cannot be identified in the data.
Amnesia supports k-anonymity.
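
As an illustration of the kind of transformation described above, the following is a minimal Python sketch that drops a direct identifier and coarsens two secondary identifiers. It does not use Amnesia or reproduce its behaviour; the function name generalize_record, the field names, and the sample records are all assumptions made up for this example.

```python
from datetime import date


def generalize_record(record):
    """Return a copy without the direct identifier and with coarser quasi-identifiers.

    Hypothetical helper for illustration only; not part of Amnesia.
    """
    return {
        # The name (a direct identifier) is simply not copied over.
        "birth_year": record["birth_date"].year,        # full date -> year only
        "zip_prefix": record["zip_code"][:3] + "**",    # 5-digit zip -> 3-digit prefix
        "diagnosis": record["diagnosis"],               # sensitive value kept as-is
    }


# Invented sample data.
records = [
    {"name": "Alice", "birth_date": date(1985, 3, 14), "zip_code": "30210", "diagnosis": "flu"},
    {"name": "Bob", "birth_date": date(1985, 11, 2), "zip_code": "30277", "diagnosis": "asthma"},
]

anonymized = [generalize_record(r) for r in records]
for row in anonymized:
    print(row)
```

After generalization both sample records share the same birth year and zip prefix, so those two attributes no longer tell them apart.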

Please see the task description for the detailed tasks.

This is a Credit task, so please make sure you are already up to date with all Pass tasks before
attempting this task.

Task Description
Instructions:

1. Write a 500-word summary addressing the following:
a) Quasi-identifiers
b) k-anonymity
c) How can k-anonymity help prevent privacy attacks?

2. Do some research to identify some commercial and open-source tools for data
anonymization.

Then, make a list of the tools.

Upload the summary report to the onTrack system.

sit719-82c-4ferr532.pdf

Neha · Accepted Answer

Quasi-Identifier
A quasi-identifier is a piece of information that does not identify a person on its own but that an intruder can use, alone or combined with other attributes, to single out a specific individual from a large group of people (Zhang, X., Liu, C., Nepal, S. and Chen, J). The intruder may already know such information about the target for any of the following reasons:
· The target is well known, and information about them is publicly available.
· The information appears in publicly available registries or in the media.
· The individual has posted the information about themselves on social media.
· The individual has disclosed the information to multiple people.
It is important to note that a quasi-identifier can sometimes be predicted from another variable. In that case both variables must be treated as quasi-identifiers: there is no point in protecting variable A but not variable B if the intruder can easily predict A from B (Koot, M.R., Mandjes, M., van’t Noordende, G. and de Laat, C). It is therefore important to search for correlated variables in a data set. Examples of correlated variables are a baby's date of birth and date of discharge from hospital, date of death and date of autopsy, weight at birth and weight at discharge, and age and date of graduation.
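
One rough way to look for such related variables is to measure how well one column predicts another. The sketch below is only an illustration under stated assumptions: the function predictability, the column names, and the sample rows are invented for this example, and real data would call for a proper statistical measure of association.

```python
from collections import defaultdict


def predictability(rows, source, target):
    """Fraction of rows whose `target` value is the most common one
    among rows sharing the same `source` value.

    A score near 1.0 suggests `target` can be guessed from `source`.
    Hypothetical helper for illustration only.
    """
    if not rows:
        return 0.0
    counts = defaultdict(lambda: defaultdict(int))
    for row in rows:
        counts[row[source]][row[target]] += 1
    # For each source value, guessing the majority target is right this many times.
    correct = sum(max(targets.values()) for targets in counts.values())
    return correct / len(rows)


# Invented sample data: discharge date closely tracks birth date, ward does not.
rows = [
    {"birth_date": "2020-01-01", "discharge_date": "2020-01-03", "ward": "A"},
    {"birth_date": "2020-01-01", "discharge_date": "2020-01-03", "ward": "B"},
    {"birth_date": "2020-01-02", "discharge_date": "2020-01-04", "ward": "A"},
    {"birth_date": "2020-01-02", "discharge_date": "2020-01-04", "ward": "B"},
]

from_discharge = predictability(rows, "discharge_date", "birth_date")
from_ward = predictability(rows, "ward", "birth_date")
print(from_discharge, from_ward)
```

In this toy data the discharge date predicts the birth date for every row, while the ward predicts it for only half of them, so the discharge date would need the same protection as the birth date.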
K-Anonymity
k-anonymity is a privacy model applied to a data set to protect the privacy of data subjects in data-sharing scenarios (LeFevre, K., DeWitt, D.J. and Ramakrishnan, R). A data set is k-anonymous when every record is indistinguishable from at least k − 1 other records with respect to its quasi-identifiers, so an intruder who knows a target's quasi-identifier values cannot narrow the match down to fewer than k records. Many privacy-preserving systems aim to provide k-anonymity for their data subjects.
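
The k-anonymity property is usually stated as: every combination of quasi-identifier values that occurs in the data set is shared by at least k records. The following minimal Python sketch checks that property on a small table. The helper names k_of and is_k_anonymous, the column names, and the sample rows are assumptions for illustration and are not taken from any particular tool.

```python
from collections import Counter


def k_of(rows, quasi_identifiers):
    """Size of the smallest group of rows sharing the same quasi-identifier values."""
    groups = Counter(tuple(row[q] for q in quasi_identifiers) for row in rows)
    return min(groups.values()) if groups else 0


def is_k_anonymous(rows, quasi_identifiers, k):
    """True if every quasi-identifier combination appears in at least k rows."""
    return k_of(rows, quasi_identifiers) >= k


# Invented sample data, already generalized to birth year and zip prefix.
table = [
    {"birth_year": 1985, "zip_prefix": "302**", "diagnosis": "flu"},
    {"birth_year": 1985, "zip_prefix": "302**", "diagnosis": "asthma"},
    {"birth_year": 1990, "zip_prefix": "305**", "diagnosis": "flu"},
    {"birth_year": 1990, "zip_prefix": "305**", "diagnosis": "diabetes"},
    {"birth_year": 1990, "zip_prefix": "305**", "diagnosis": "flu"},
]
qi = ["birth_year", "zip_prefix"]

print(k_of(table, qi))               # smallest group size
print(is_k_anonymous(table, qi, 2))  # holds for k = 2
print(is_k_anonymous(table, qi, 3))  # fails for k = 3
```

Here the smallest group contains two records, so the table is 2-anonymous but not 3-anonymous; reaching a larger k would require further generalization or suppression of the quasi-identifiers.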

