A Comparative Study of Codification Techniques for Clustering Heart Disease Database
Modeling and Control in Biomedical Systems, Volume # 7 | Part# 1
Authors
Barceló-Rico, Fátima; Díez, José Luis
Identifier
10.3182/20090812-3-DK-2006.00028
Index Terms
Biomedical signal processing
Abstract
This paper compares different proposals for codifying categorical attributes in a Heart Disease database so that numerical clustering algorithms can be applied to them. The main idea of the new approach is a codification of categorical attributes based on polar coordinates, which is compared with other methods for clustering mixed databases found in the literature. The proposal has several advantages: it is relatively easy to understand and apply, the increase in the length of the input matrix is not excessively large, and the error introduced is kept under control. In this case the proposed codification has been combined with the well-known K-means algorithm and has shown very good performance on a Heart Disease database benchmark.
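The abstract does not give the details of the polar-coordinate codification, so the following is only a minimal sketch of one plausible reading, not the authors' scheme. It assumes that each category of an attribute is placed at an equally spaced angle on a circle and represented by its Cartesian coordinates. The function name `polar_encode`, the `radius` parameter, and the sample attribute values are hypothetical and chosen for illustration.

```python
import math


def polar_encode(values, radius=1.0):
    """Map each distinct category to a point on a circle of the given radius.

    Assumption: categories are spread at equal angles around the circle;
    the angle assignment used in the paper may differ.
    """
    categories = sorted(set(values))
    step = 2.0 * math.pi / len(categories)
    angle = {c: i * step for i, c in enumerate(categories)}
    # Each categorical value becomes two numeric columns (x, y).
    return [(radius * math.cos(angle[v]), radius * math.sin(angle[v]))
            for v in values]


# Hypothetical categorical attribute (not taken from the paper's database).
chest_pain = ["typical", "atypical", "non-anginal", "asymptomatic", "typical"]
encoded = polar_encode(chest_pain)

for label, (x, y) in zip(chest_pain, encoded):
    print(f"{label:>12}: ({x:+.3f}, {y:+.3f})")
```

Under this assumption every categorical attribute contributes two numeric columns regardless of how many categories it has, whereas one-hot coding would add one column per category. The trade-off is that distances between categories on the circle are not all equal, so the coding introduces some distortion that a numerical algorithm such as K-means will see.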
