Subspace-Based Aggregation for Enhancing Utility, Information Measures, and Cluster Identification in Privacy Preserved Data Mining on High-Dimensional Continuous Data

Shashidhar Virupaksha; D.Venkatesulu

Please use this identifier to cite or link to this item: http://localhost:8080/xmlui/handle/123456789/1759

Full metadata record

DC Field	Value	Language
dc.contributor.author	Shashidhar Virupaksha
dc.contributor.author	D.Venkatesulu
dc.date.accessioned	2022-05-23T08:44:16Z	-
dc.date.available	2022-05-23T08:44:16Z	-
dc.date.issued	2019
dc.identifier.citation	International Journal of Computers and Applications
dc.identifier.uri	http://localhost:8080/xmlui/handle/123456789/1759	-
dc.description.abstract	Clustering is a data mining technique that has been used effectively for knowledge extraction over the last few decades. Privacy is a major concern when releasing data for clustering, and privacy-preserving data mining (PPDM) algorithms have been developed to address it. Aggregation is a popular PPDM technique. In recent years, however, certain applications have required data mining on high-dimensional data. Existing privacy preservation techniques perform aggregation in a univariate manner along each dimension, which degrades utility measures, information measures, and especially the retention of the original clusters. This paper proposes a new technique called subspace-based aggregation (SBA). SBA categorizes the dimensions into dense and non-dense subspaces based on the density of points, and aggregation is performed separately for dense and non-dense subspaces. This approach helps to maximize utility measures, information measures, and retention of clusters. SBA is run on high-dimensional continuous datasets from the UCI Machine Learning Repository and compared with related methods such as SINGLE, SIMPLE, MDAV, and PPPCA. SBA provides an improvement of 66% in utility, 400% in cluster identification, and 5% in covariance and standard deviation.
dc.language.iso	en
dc.publisher	Published online
dc.title	Subspace-Based Aggregation for Enhancing Utility, Information Measures, and Cluster Identification in Privacy Preserved Data Mining on High-Dimensional Continuous Data
dc.type	Article
Appears in Collections:	Computer Science Engineering Department
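
The abstract describes SBA only at a high level: dimensions are split into dense and non-dense subspaces, and each is aggregated separately. The sketch below is one possible reading of that idea, not the authors' algorithm. The density criterion (share of points in the fullest equal-width bin), the parameters `bins`, `threshold`, and `k`, the ordering of records by the sum of their dense-column values, and all function names are assumptions made purely for illustration.

```python
from statistics import mean


def split_dimensions(rows, bins=4, threshold=0.5):
    """Label each column dense or non-dense (hypothetical criterion).

    A column counts as dense when its fullest equal-width bin holds
    at least `threshold` of all points.
    """
    dense, non_dense = [], []
    for j in range(len(rows[0])):
        col = [r[j] for r in rows]
        lo, hi = min(col), max(col)
        width = (hi - lo) / bins or 1.0  # constant column: avoid zero width
        counts = [0] * bins
        for v in col:
            counts[min(int((v - lo) / width), bins - 1)] += 1
        target = dense if max(counts) / len(col) >= threshold else non_dense
        target.append(j)
    return dense, non_dense


def group_means(rows, out, cols, order, k):
    """Overwrite `cols` in `out` with the mean of each run of k records.

    Records are visited in `order`; the last group may hold fewer than
    k records, which a real microaggregation method would merge.
    """
    for start in range(0, len(order), k):
        group = order[start:start + k]
        for j in cols:
            m = mean(rows[i][j] for i in group)
            for i in group:
                out[i][j] = m


def sba_sketch(rows, k=2, bins=4, threshold=0.5):
    """Aggregate dense columns jointly and non-dense columns one by one."""
    dense, non_dense = split_dimensions(rows, bins, threshold)
    out = [list(r) for r in rows]
    idx = list(range(len(rows)))
    if dense:
        # Assumption: one shared grouping for the whole dense subspace.
        order = sorted(idx, key=lambda i: sum(rows[i][j] for j in dense))
        group_means(rows, out, dense, order, k)
    for j in non_dense:
        # Assumption: plain univariate aggregation per non-dense column.
        order = sorted(idx, key=lambda i: rows[i][j])
        group_means(rows, out, [j], order, k)
    return dense, non_dense, out


sample = [[1.0, 0.0], [1.1, 10.0], [1.2, 20.0], [9.0, 30.0]]
dense_cols, non_dense_cols, masked = sba_sketch(sample, k=2)
```

Because every group is replaced by its own mean, column means are preserved by construction; how well covariance, standard deviation, and cluster structure survive depends on the grouping rule, which is the part the paper addresses and this sketch only approximates.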

Files in This Item:

File	Size	Format
CSE-13.docx	14.45 kB	Microsoft Word XML

Presidency University Library Repository
