Research Article
Modified Mahalanobis Taguchi System for
Imbalance Data Classification
Mahmoud El-Banna
Industrial Engineering Department, German Jordanian University, P.O. Box 35247, Amman 11180, Jordan
Correspondence should be addressed to Mahmoud El-Banna; malbanna@gmail.com
Received 7 March 2017; Revised 14 May 2017; Accepted 22 May 2017; Published 24 July 2017
Academic Editor: Massimo Panella
Copyright © Mahmoud El-Banna. This is an open access article distributed under the Creative Commons Attribution License,
which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
The Mahalanobis Taguchi System (MTS) is considered one of the most promising binary classification algorithms for handling
imbalanced data. Unfortunately, MTS lacks a method for determining an efficient threshold for the binary classification. In
this paper, a nonlinear optimization model, named the Modified Mahalanobis Taguchi System (MMTS), is formulated based on
minimizing the distance between the MTS Receiver Operating Characteristics (ROC) curve and the theoretical optimal point. To
validate the MMTS classification efficacy, it has been benchmarked against Support Vector Machines (SVMs), Naive Bayes (NB),
Probabilistic Mahalanobis Taguchi Systems (PTM), Synthetic Minority Oversampling Technique (SMOTE), Adaptive Conformal
Transformation (ACT), Kernel Boundary Alignment (KBA), Hidden Naive Bayes (HNB), and other improved Naive Bayes
algorithms. MMTS outperforms the benchmarked algorithms especially when the imbalance ratio is greater than . A real life
case study in the manufacturing sector is used to demonstrate the applicability of the proposed model and to compare its performance
with the Mahalanobis Genetic Algorithm (MGA).
1. Introduction
Classification is one of the supervised learning approaches
in which a new observation needs to be assigned to one of
the predetermined classes or categories. If the number of the
predetermined classes is more than two, it is a multiclass
classification problem; otherwise, the problem is known as
the binary classification problem. At present, these problems
have found applications in different domains such as product
quality [] and speech recognition [].
The classification accuracy depends on both the classifier
and the data types. Classifier types can be categorized
according to supervised versus unsupervised learning, linear
versus nonlinear hyperplane, and feature selection versus
feature extraction based approach []. On the other hand,
Sun et al. [] reported that the parameters affecting the
classification are the overlap between data (i.e., class
separability), small sample size, within-class concepts (i.e., a
single class may consist of various subclasses, which do not
necessarily have the same size), and the data distribution for
each class. If the data distribution of one class is different
from the distributions of the others, then the data is considered
imbalanced. The border that separates balanced from imbalanced
data is vague; for example, the imbalance ratio, which is the ratio
of majority to minority class observations, is reported
from small values of to to : [].
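As an illustration of the imbalance ratio defined above, the following minimal sketch computes it from a list of class labels. The function name `imbalance_ratio` and the label counts are hypothetical and chosen only for illustration; they are not taken from the datasets used in this paper.

```python
from collections import Counter


def imbalance_ratio(labels):
    """Return the ratio of majority-class to minority-class counts."""
    counts = Counter(labels)
    if len(counts) < 2:
        raise ValueError("need at least two classes to compute a ratio")
    return max(counts.values()) / min(counts.values())


# Hypothetical labels: 990 majority (0) and 10 minority (1) observations.
labels = [0] * 990 + [1] * 10
ratio = imbalance_ratio(labels)
print(f"imbalance ratio = {ratio:.0f}:1")
```

For a multiclass problem, this sketch compares only the largest and smallest classes, which is one possible convention among several.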
The assumption of an equal number of observations
in each class is elementary in using the common classification
methods such as decision tree analysis, Support
Vector Machines, discriminant analysis, and neural networks
[]. Imbalanced data occurs often in real life, such as in text
classification []. Treating applications that
have imbalanced data with the common classifiers leads to bias
in the classification accuracy (i.e., the predictive accuracy for
the minority class will be much less than for the majority
class) and/or to treating the minority observations as noise or
outliers, which results in the classifier ignoring them.
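The bias described above can be made concrete with a small sketch. It assumes a hypothetical, degenerate classifier that always predicts the majority class on made-up data with 990 majority and 10 minority observations. The helper `per_class_accuracy` is illustrative and is not part of the proposed method.

```python
def per_class_accuracy(y_true, y_pred):
    """Return overall accuracy and a dict mapping each class to its accuracy."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    overall = correct / len(y_true)
    per_class = {}
    for cls in set(y_true):
        # Indices of the observations that truly belong to this class.
        idx = [i for i, t in enumerate(y_true) if t == cls]
        per_class[cls] = sum(y_pred[i] == cls for i in idx) / len(idx)
    return overall, per_class


# Hypothetical data: class 0 is the majority, class 1 the minority.
y_true = [0] * 990 + [1] * 10
# A degenerate classifier that labels every observation as the majority class.
y_pred = [0] * 1000

overall, per_class = per_class_accuracy(y_true, y_pred)
print(f"overall accuracy: {overall:.2f}")
print(f"majority-class accuracy: {per_class[0]:.2f}")
print(f"minority-class accuracy: {per_class[1]:.2f}")
```

In this setting the overall accuracy is high even though no minority observation is classified correctly, which is why overall accuracy alone can be misleading on imbalanced data.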
To handle the problem of classifying imbalanced data,
the research community uses data approaches, algorithmic approaches, or both.
For the data approach, the main idea is to
balance the class density randomly or informatively (i.e.,
targeted) by eliminating (downsampling) majority
class observations, replicating (oversampling) minority
class observations, or doing both. For the algorithmic
approach, the main idea is to adapt the classifier algorithms
Hindawi
Computational Intelligence and Neuroscience
Volume 2017, Article ID 5874896, 15 pages
https://doi.org/10.1155/2017/5874896