30 Jan 2009: Seminar A HIGHLY AND EFFICIENT ROBUST OUTLIER LABELING

The Algebra Research Group, FMIPA ITB, will hold a seminar entitled
A HIGHLY AND EFFICIENT ROBUST OUTLIER LABELING, as follows:

Date : Friday, January 30th, 2009
Time : 09.00 – 11.00
Place : Study Hall, Labtek 3 Matematika, ITB
Jl. Ganesa 10, Bandung

The seminar will be presented by Prof. Maman Djauhari.

ABSTRACT 

Minimum covariance determinant (MCD), introduced by Rousseeuw (1985), is a
highly robust method for estimating multivariate location and scatter. It
has the properties usually desired of such an estimator: a high breakdown
point that asymptotically tends to the maximum value, affine equivariance,
and a bounded influence function. Its fast algorithm (Rousseeuw and van
Driessen, 1999), also called Fast MCD, is the most widely used robust
estimation method in the literature and in applications. It has become
increasingly popular since Hubert et al. (2005) introduced an improved
version of Fast MCD that gives a solution as close as possible to the
global minimum.
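
As an illustration of how Fast MCD searches for a low-determinant subset,
the sketch below implements the concentration step (C-step) of Rousseeuw
and van Driessen for bivariate data only, in plain Python. The function
names (mean_cov, c_step, concentrate), the sample data and the subset size
h are assumptions made for this example; the random restarts and
reweighting of the full algorithm are omitted, and this is not the new
method presented in the seminar.

```python
def mean_cov(points):
    """Sample mean and covariance (sxx, sxy, syy) of a list of (x, y) pairs."""
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    sxx = sum((p[0] - mx) ** 2 for p in points) / (n - 1)
    syy = sum((p[1] - my) ** 2 for p in points) / (n - 1)
    sxy = sum((p[0] - mx) * (p[1] - my) for p in points) / (n - 1)
    return (mx, my), (sxx, sxy, syy)


def det(cov):
    """Determinant of the symmetric 2x2 matrix [[sxx, sxy], [sxy, syy]]."""
    sxx, sxy, syy = cov
    return sxx * syy - sxy * sxy


def sq_mahalanobis(p, center, cov):
    """Squared Mahalanobis distance, using the closed-form 2x2 inverse."""
    sxx, sxy, syy = cov
    dx, dy = p[0] - center[0], p[1] - center[1]
    return (syy * dx * dx - 2.0 * sxy * dx * dy + sxx * dy * dy) / det(cov)


def c_step(data, subset, h):
    """One C-step: keep the h points closest to the current subset's fit."""
    center, cov = mean_cov(subset)
    ranked = sorted(data, key=lambda p: sq_mahalanobis(p, center, cov))
    return ranked[:h]


def concentrate(data, subset, h, max_iter=20):
    """Repeat C-steps until the covariance determinant stops decreasing."""
    dets = [det(mean_cov(subset)[1])]
    for _ in range(max_iter):
        subset = c_step(data, subset, h)
        dets.append(det(mean_cov(subset)[1]))
        if dets[-1] >= dets[-2]:
            break
    return subset, dets


# Hypothetical data: eight points near the origin plus two distant points.
data = [(0.0, 0.0), (1.0, 0.5), (0.5, 1.0), (-1.0, 0.2), (-0.5, -1.0),
        (0.3, -0.7), (1.2, -0.3), (-0.8, 0.9), (10.0, 10.0), (11.0, 9.0)]
subset, dets = concentrate(data, data[4:], h=6)
```

Each C-step cannot increase the determinant, so the list dets is
non-increasing. A single start may still end in a local minimum, which is
why Fast MCD runs the procedure from many initial subsets.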

The ultimate goal of Fast MCD is to define a robust Mahalanobis distance
in order to separate all outlier suspects. This step is called outlier
labeling. However, Fast MCD is not suited to large, high-dimensional data
sets. This is a natural consequence of using (1) the covariance
determinant as the measure of multivariate dispersion and (2) the
Mahalanobis distance, which requires inverting the covariance matrix, to
define a center-outward ordering on the set of data vectors.
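
The labeling step described above can be sketched as follows for bivariate
data. The robust center and scatter are taken as given inputs, for example
from an MCD fit. The 0.975 chi-square quantile used as cutoff is a common
convention and an assumption here, since the abstract does not state one,
and the function names are hypothetical.

```python
import math


def sq_mahalanobis(p, center, scatter):
    """Squared Mahalanobis distance of the 2-D point p.

    scatter is (sxx, sxy, syy); the 2x2 matrix is inverted in closed form,
    which is the inversion step that becomes costly in high dimension.
    """
    sxx, sxy, syy = scatter
    det = sxx * syy - sxy * sxy
    if det <= 0.0:
        raise ValueError("scatter matrix must be positive definite")
    dx, dy = p[0] - center[0], p[1] - center[1]
    return (syy * dx * dx - 2.0 * sxy * dx * dy + sxx * dy * dy) / det


def chi2_quantile_2df(prob):
    """Quantile of the chi-square distribution with 2 degrees of freedom.

    For 2 degrees of freedom the CDF is 1 - exp(-x / 2), so the quantile
    has a closed form and no statistics library is needed.
    """
    return -2.0 * math.log(1.0 - prob)


def label_outliers(data, center, scatter, prob=0.975):
    """Flag points whose squared robust distance exceeds the cutoff."""
    cutoff = chi2_quantile_2df(prob)
    return [sq_mahalanobis(p, center, scatter) > cutoff for p in data]


# Hypothetical robust estimates: center at the origin, identity scatter.
flags = label_outliers([(0.5, 0.5), (3.0, 3.0), (-1.0, 1.0)],
                       center=(0.0, 0.0), scatter=(1.0, 0.0, 1.0))
```

Sorting the points by their squared distance instead of thresholding it
gives the center-outward ordering mentioned in the text.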

Overcoming that obstacle is very challenging, yet it is indispensable in,
for example, bioinformatics, computer intrusion detection, data mining,
genetic engineering, the financial industry and many other fields of
application. To contribute to that challenging area, in this work we start
by analyzing the structure of Fast MCD. We show that, mathematically, all
we need is a measure of multivariate dispersion and a depth function. The
latter leads to a new frontier of research where computational geometry,
computer science, linear algebra, and statistics simultaneously play
important roles.

We then introduce a new theorem on depth functions and a new theorem on
measures of multivariate dispersion, and use them to construct a new
robust estimation method of location and scatter that can be used even
when data sets are large and of high dimension. Based on these estimates
we define a robust center-outward ordering and a new outlier labeling. For
data sets of low dimension, the new method is computationally far more
efficient than Fast MCD and, as simulation experiments strongly indicate,
it is as robust as Fast MCD. For high-dimensional data sets, where Fast
MCD does not work, it is able to handle the obstacle mentioned above.

Keywords: affine-equivariant, breakdown point, covariance determinant,
depth function, influence function, multivariate variability measure
