**1st Seminar by CAST (Centre for Applied Statistics and Data Analytics)**

**Recent developments in multivariate methods using multiple scatter matrices**

**Date**: August 10th, 2016 (14h00 - 16h00)

**Venue**: University of Tampere, Finland

**Scientific Organisation**: Klaus Nordhausen: klaus.nordhausen@utu.fi, Paulo Canas Rodrigues: Paulo.Rodrigues@uta.fi

**Local Organisation**: Ansa Lilja: Ansa.Lilja@uta.fi

The first Seminar by the Centre for Applied Statistics and Data Analytics (CAST) will be held at the School of Health Sciences, University of Tampere, Finland, on August 10th, 2016. It will gather researchers, faculty members and students interested in applied statistics and data analytics who work at the University of Tampere or at other universities, research institutes and companies.

The main aims of the CAST seminar events are: (i) to raise awareness of the importance of statistics and data analysis in research; (ii) to create a forum for discussion where researchers present their work and research questions, followed by discussion and feedback from the audience; and (iii) to strengthen the links between schools and research groups, which may lead to future collaborations on research articles and funding applications.

The visiting speakers of the seminar are Aurore Archimbaud (Université Toulouse 1 Capitole), Markus Matilainen (University of Turku), Klaus Nordhausen (University of Turku) and Joni Virta (University of Turku).

Please mark your calendars and share this information with anyone who may be interested.

**Wednesday, August 10th, 2016 (group room A308, Arvo building in Kauppi)**

14h00 – 14h05: Opening (Klaus Nordhausen)

14h05 – 14h30: **Klaus Nordhausen**: *Is it 'plug & play' or 'plug & pray' in robust multivariate statistics?*

14h30 – 15h00: **Aurore Archimbaud**: *Components selection for multivariate outlier detection with ICS*

15h00 – 15h30: **Markus Matilainen**: *Some independent component analysis tools for time series data*

15h30 – 16h00: **Joni Virta**: *Independent component analysis for tensor-valued data*

**ABSTRACTS:**

Klaus Nordhausen

University of Turku, klaus.nordhausen@utu.fi

http://users.utu.fi/klanor/

Title: Is it 'plug & play' or 'plug & pray' in robust multivariate statistics?

Authors: Klaus Nordhausen and David E. Tyler

Abstract: The sample covariance matrix, which is well known to be highly non-robust, plays a central role in many classical multivariate statistical methods. A popular approach to making such methods more robust is simply to replace the sample covariance matrix with a robust scatter matrix. In this talk we demonstrate that multivariate methods often require certain properties of the covariance matrix to hold for the robust scatter matrix as well in order for the corresponding robust "plug-in" method to be valid, and that not all scatter matrices possess these properties. Plug-in versions of three multivariate methods are considered in more detail: independent component analysis, observational regression and graphical modeling. In each case, it is shown that replacing the sample covariance matrix with a symmetrized robust scatter matrix yields a valid robust multivariate procedure.
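A symmetrized scatter matrix is obtained by applying a scatter functional to the pairwise differences x_i - x_j instead of to the centered observations, so no location estimate is needed. The Python sketch below (NumPy assumed; all function names are hypothetical) illustrates the construction. As a check, symmetrizing the plain mean of outer products gives exactly twice the sample covariance matrix. As a robust example it uses Tyler's shape matrix, whose symmetrized version is known as Dümbgen's estimator; this choice is for illustration only and is not necessarily a scatter matrix used in the talk.

```python
import numpy as np


def pairwise_differences(X):
    """All differences x_i - x_j for i < j (rows of X are observations)."""
    i, j = np.triu_indices(X.shape[0], k=1)
    return X[i] - X[j]


def outer_mean(D):
    """Mean of the outer products d d^T, without any centering."""
    return D.T @ D / D.shape[0]


def tyler_shape(D, tol=1e-10, max_iter=1000):
    """Tyler's shape matrix about the origin (fixed-point iteration), trace = p."""
    n, p = D.shape
    V = np.eye(p)
    for _ in range(max_iter):
        # Squared Mahalanobis norm of every row with respect to the current V.
        w = np.einsum("ij,jk,ik->i", D, np.linalg.inv(V), D)
        V_new = (D.T / w) @ D
        V_new *= p / np.trace(V_new)
        if np.max(np.abs(V_new - V)) < tol:
            return V_new
        V = V_new
    return V


def symmetrized(scatter, X):
    """Plug the pairwise differences into a scatter functional."""
    return scatter(pairwise_differences(X))


rng = np.random.default_rng(0)
X = rng.standard_normal((200, 3)) @ np.array([[2.0, 0.0, 0.0],
                                              [0.5, 1.0, 0.0],
                                              [0.0, 0.3, 0.5]])

# Symmetrizing the outer-product mean reproduces twice the sample covariance.
S_sym = symmetrized(outer_mean, X)
print(np.allclose(S_sym, 2 * np.cov(X, rowvar=False)))  # True

# A robust alternative: Tyler's shape matrix of the pairwise differences.
V_sym = symmetrized(tyler_shape, X)
```

Since the number of pairs grows as n(n - 1)/2, forming all differences is the main computational cost of symmetrization for large samples.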

Aurore Archimbaud

TSE-R, University Toulouse 1 Capitole, 21 allée de Brienne, 31000 Toulouse, France

E-mail: aurore.archimbaud@ut-capitole.fr

https://www.tse-fr.eu/people/aurore-archimbaud?lang=en

Title: Components selection for multivariate outlier detection with ICS

Abstract: Detecting a small proportion of multivariate outliers, for example production errors in industrial processes, is an important problem. In this context, the Invariant Coordinate Selection (ICS) method is an efficient identification procedure. The key idea of the method, compared with other multivariate methods such as Principal Component Analysis (PCA) or robust PCA, is to diagonalize two scatter matrices simultaneously. When the percentage of outliers is small, the ICS coordinates are ordered decreasingly according to a generalized kurtosis that depends on the chosen pair of scatter matrices. Observations that lie far from the center of the data on the coordinates with large kurtosis values are declared outliers. A challenging step in the procedure is selecting the components that reveal the outliers. Two approaches are introduced and compared. The first resembles a testing procedure in which the critical value is obtained by simulation. The second incorporates univariate normality tests.
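The outlier-detection procedure can be sketched in NumPy as follows. This is a minimal illustration under stated assumptions: the scatter pair is the sample covariance and the fourth-moment scatter COV4 (a pair for which ICS coincides with FOBI), whereas the talk may use other pairs; the number k of retained components is fixed by hand, because neither of the two selection approaches from the abstract is implemented. The simulated data and function names are hypothetical.

```python
import numpy as np


def inv_sqrt(S):
    """Symmetric inverse square root of a positive definite matrix."""
    vals, vecs = np.linalg.eigh(S)
    return vecs @ np.diag(vals ** -0.5) @ vecs.T


def ics_cov_cov4(X):
    """ICS with the pair (COV, COV4): generalized kurtoses and invariant coordinates."""
    n, p = X.shape
    Xc = X - X.mean(axis=0)
    # Step 1: whiten with the first scatter (sample covariance).
    Y = Xc @ inv_sqrt(np.cov(X, rowvar=False))
    # Step 2: fourth-moment scatter of the whitened data, scaled to be ~I under normality.
    r2 = np.sum(Y ** 2, axis=1)
    cov4 = (Y.T * r2) @ Y / (n * (p + 2))
    # Step 3: its eigenvalues are the generalized kurtoses; sort them decreasingly.
    rho, U = np.linalg.eigh(cov4)
    order = np.argsort(rho)[::-1]
    return rho[order], Y @ U[:, order]


def outlier_scores(Z, k):
    """Squared distance to the center using only the first k invariant coordinates."""
    return np.sum(Z[:, :k] ** 2, axis=1)


rng = np.random.default_rng(1)
n_in, n_out, p = 485, 15, 4
inliers = rng.standard_normal((n_in, p))
outliers = 0.1 * rng.standard_normal((n_out, p))
outliers[:, 0] += 8.0  # a small, tight cluster shifted along the first axis
X = np.vstack([inliers, outliers])

rho, Z = ics_cov_cov4(X)
scores = outlier_scores(Z, k=1)  # k=1 is a hypothetical choice, not a selection rule
flagged = np.argsort(scores)[::-1][:n_out]
```

In this toy setup the first invariant coordinate should have a generalized kurtosis well above the others, and its squared values should single out the planted points.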

Markus Matilainen

University of Turku, markus.matilainen@utu.fi

Title: Some independent component analysis tools for time series data

Abstract: Blind source separation (BSS) models are semiparametric models in which the components of an observed p-variate vector x are assumed to be linear combinations of the components of an unobserved p-variate source vector z. In the time series context, the observations are assumed to come from a p-variate time series. We focus on independent component analysis (ICA), which is a special case of BSS. We introduce extensions of the classic FOBI (Fourth Order Blind Identification) and JADE (Joint Approximate Diagonalization of Eigen-matrices) estimators, and a variant of the SOBI (Second Order Blind Identification) estimator, for multivariate time series, with a special focus on time series with stochastic volatility. At the end of the talk, results from a simulation study are presented.
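To make the second-order idea concrete, here is a sketch of AMUSE, the single-lag relative of SOBI: whiten the series, then rotate it so that one symmetrized lagged autocovariance matrix becomes diagonal (SOBI jointly diagonalizes several lags). This is not one of the estimators presented in the talk, and the simulated AR(1) sources, mixing matrix and function names are hypothetical. It also hints at why stochastic volatility calls for other tools: such series are typically close to serially uncorrelated, so their lagged autocovariances carry little information for separation.

```python
import numpy as np


def inv_sqrt(S):
    """Symmetric inverse square root of a positive definite matrix."""
    vals, vecs = np.linalg.eigh(S)
    return vecs @ np.diag(vals ** -0.5) @ vecs.T


def amuse(X, lag=1):
    """Unmixing matrix W (so that s_t = W x_t) from one lagged autocovariance."""
    n = X.shape[0]
    Xc = X - X.mean(axis=0)
    W0 = inv_sqrt(Xc.T @ Xc / n)           # whitening matrix
    Y = Xc @ W0                            # whitened series (W0 is symmetric)
    R = Y[:-lag].T @ Y[lag:] / (n - lag)   # lag-`lag` autocovariance
    R = (R + R.T) / 2                      # symmetrize it
    _, U = np.linalg.eigh(R)               # rotation that diagonalizes R
    return U.T @ W0


def ar1(phi, n, rng):
    """Unit-variance AR(1) series s_t = phi * s_{t-1} + e_t."""
    e = rng.standard_normal(n) * np.sqrt(1 - phi ** 2)
    s = np.empty(n)
    s[0] = rng.standard_normal()
    for t in range(1, n):
        s[t] = phi * s[t - 1] + e[t]
    return s


rng = np.random.default_rng(2)
n = 20_000
S = np.column_stack([ar1(phi, n, rng) for phi in (0.9, 0.4, -0.6)])
A = np.array([[1.0, 0.5, 0.3],
              [0.2, 1.0, 0.4],
              [0.6, 0.1, 1.0]])
X = S @ A.T                                # x_t = A s_t

W = amuse(X, lag=1)
G = W @ A                                  # near a signed permutation if separation worked
```

Separation relies on the sources having distinct lag-one autocorrelations (here 0.9, 0.4 and -0.6).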

Joni Virta

University of Turku, jomivi@utu.fi

http://users.utu.fi/jomivi/

Title: Independent component analysis for tensor-valued data

Authors: Joni Virta (University of Turku, jomivi@utu.fi), Bing Li, Klaus Nordhausen and Hannu Oja

Abstract: In preprocessing high-dimensional tensor data, e.g. images or videos, a common procedure is to vectorize the observed tensors and subject the resulting vectors to one of the many methods used for independent component analysis (ICA). However, vectorization destroys the structure of the original tensor, along with any meaningful interpretation of its modes. To provide a more suitable alternative, we propose tensor fourth order blind identification (TFOBI), a tensor-valued analogue of the classic fourth order blind identification (FOBI), to be used with the semiparametric tensor independent component model. Instead of vectorizing, TFOBI stays in tensor form and, in a sense, performs FOBI simultaneously on all modes of the observed tensors. Moreover, as an extension of FOBI, TFOBI inherits its computational simplicity. Simulated and real-world examples are used to demonstrate the method's usefulness and its superiority over vectorization followed by FOBI.
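For matrix-valued observations (tensors of order two), the mode-wise idea can be sketched as follows: standardize each mode with its own covariance matrix, then rotate each mode with the eigenvectors of a mode-wise fourth-moment matrix. This is a simplified sketch written from the abstract's description and the FOBI construction, not the authors' implementation; normalizing constants, eigenvalue ordering and the general higher-order case are omitted, and the simulated data and function names are hypothetical. In this toy setup, separation relies on the row-wise and column-wise kurtosis sums of the latent matrices being distinct.

```python
import numpy as np


def inv_sqrt(S):
    """Symmetric inverse square root of a positive definite matrix."""
    vals, vecs = np.linalg.eigh(S)
    return vecs @ np.diag(vals ** -0.5) @ vecs.T


def tfobi_matrix(X):
    """Mode-wise FOBI for a sample X of shape (n, p, q); returns (Gamma1, Gamma2)."""
    n, p, q = X.shape
    Xc = X - X.mean(axis=0)
    # Mode-wise covariance matrices E[X X^T]/q and E[X^T X]/p.
    S1 = np.einsum("nij,nkj->ik", Xc, Xc) / (n * q)
    S2 = np.einsum("nji,njk->ik", Xc, Xc) / (n * p)
    W1, W2 = inv_sqrt(S1), inv_sqrt(S2)
    Z = W1 @ Xc @ W2                       # standardize both modes at once
    # Mode-wise fourth-moment matrices, analogous to the FOBI matrix E[|z|^2 z z^T].
    M1 = Z @ Z.transpose(0, 2, 1)          # shape (n, p, p)
    M2 = Z.transpose(0, 2, 1) @ Z          # shape (n, q, q)
    B1 = np.mean(M1 @ M1, axis=0) / q
    B2 = np.mean(M2 @ M2, axis=0) / p
    _, U1 = np.linalg.eigh(B1)
    _, U2 = np.linalg.eigh(B2)
    return U1.T @ W1, U2.T @ W2


rng = np.random.default_rng(3)
n = 30_000
# Hypothetical latent 3 x 2 matrices with independent, unit-variance entries.
samplers = [
    [lambda m: rng.choice([-1.0, 1.0], m), lambda m: rng.choice([-1.0, 1.0], m)],
    [lambda m: rng.choice([-1.0, 1.0], m), lambda m: rng.standard_normal(m)],
    [lambda m: rng.uniform(-np.sqrt(3), np.sqrt(3), m),
     lambda m: rng.logistic(0.0, np.sqrt(3) / np.pi, m)],
]
Zlat = np.stack([np.stack([f(n) for f in row], axis=1) for row in samplers], axis=1)
A1 = np.array([[1.0, 0.4, 0.2], [0.3, 1.0, 0.5], [0.1, 0.6, 1.0]])
A2 = np.array([[1.0, 0.7], [0.2, 1.0]])
X = A1 @ Zlat @ A2.T                       # X_i = A1 Z_i A2^T

G1, G2 = tfobi_matrix(X)
```

Each mode needs only a covariance matrix, a fourth-moment matrix and two small eigendecompositions, which reflects the computational simplicity inherited from FOBI. `G1 @ X @ G2.T` then gives the estimated latent matrices, up to sign, order and scale in each mode.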

Kalevantie 4, 33014 Tampereen yliopisto

Tel. (03) 355 111