Bruges, Belgium, April 25-27
Content of the proceedings
Theory and practice of adaptive input-driven dynamical systems
Regression
Brain-computer interfaces
Image and time series analysis
Interpretable models in machine learning
Machine ensembles: theory and applications
Bayesian and graphical models, optimization
Unsupervised learning
Statistical methods and kernel-based algorithms
Classification and model selection
Recent developments in clustering algorithms
Feature selection and information-based learning
Nonlinear dimensionality reduction and topological learning
Recurrent neural networks, reinforcement learning, control
Parallel hardware architectures for acceleration of neural network computation
Theory and practice of adaptive input-driven dynamical systems
ES2012-6
Theory of Input Driven Dynamical Systems
Manjunath Gandhi, Tino Peter, Herbert Jaeger
Abstract:
Most dynamic models of interest in machine learning, robotics, AI or cognitive science are nonautonomous and input-driven. In the last few years a number of important innovations have occurred in mathematical research on nonautonomous systems. In understanding the long-term behavior of nonautonomous systems, the notion of an attractor is fundamental. With a time-varying input, it turns out that for a notion of an attractor to be useful, the attractor cannot be a single subset, but must be conceived as a sequence of sets varying with time as well. The aim of this tutorial is to illuminate useful notions of attractors of nonautonomous systems, and also to introduce some newly emerging concepts of dynamical systems theory which are particularly relevant for input-driven systems.
ES2012-142
Simple reservoirs with chain topology based on a single time-delay nonlinear node
José Manuel Gutiérrez, D. San-Martín, Silvia Ortin, Luis Pesquera
Abstract:
A physical scheme based on a single nonlinear dynamical system with delayed feedback has been recently proposed for Reservoir Computing (RC) [1]. In this paper we present a computational implementation of this idea using a simple chain topology with properties derived from its physical counterpart (e.g. the reservoir is defined by two tunable parameters related to feedback- and input-strength terms). An application to time series prediction is described and a comparison with other standard reservoir computing methods is given.
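As a rough illustration of such a two-parameter chain reservoir, a minimal sketch follows. This is a generic toy model, not the authors' exact scheme: the tanh nonlinearity, the fixed random input signs, and the parameter names `r` (feedback strength) and `v` (input strength) are assumptions.

```python
import numpy as np

def chain_reservoir_states(u, n_nodes=50, r=0.9, v=0.5, seed=0):
    """Drive a chain-topology reservoir with a scalar input sequence u.
    Each node feeds the next with strength r (feedback term) and every
    node receives the input scaled by v (input term)."""
    rng = np.random.default_rng(seed)
    signs = rng.choice([-1.0, 1.0], size=n_nodes)  # fixed random input signs
    x = np.zeros(n_nodes)
    states = np.empty((len(u), n_nodes))
    for t, u_t in enumerate(u):
        shifted = np.concatenate(([0.0], x[:-1]))  # chain: node i <- node i-1
        x = np.tanh(r * shifted + v * signs * u_t)
        states[t] = x
    return states
```

The collected state matrix would then feed a linear readout, as is standard in reservoir computing.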
ES2012-175
Balancing of neural contributions for multi-modal hidden state association
Christian Emmerich, R. Felix Reinhart, Jochen J. Steil
Abstract:
We generalize the formulation of associative reservoir computing networks to multiple input modalities and demonstrate applications in image and audio processing scenarios. Robust association with reservoir networks requires coping with the potential error amplification of output feedback dynamics and handling differently sized input and output modalities. We propose a dendritic neuron model in combination with a modified reservoir regularization technique to address both issues.
ES2012-45
Input-Output Hidden Markov Models for trees
Davide Bacciu, Alessio Micheli, Alessandro Sperduti
Abstract:
The paper introduces an input-driven generative model for tree-structured data that extends the bottom-up hidden tree Markov model with non-homogeneous transition and emission probabilities. The advantage of introducing input-driven dynamics in structured-data processing is experimentally investigated. The results of this preliminary analysis suggest that input-driven models can capture more discriminative structural information than non-input-driven approaches.
ES2012-89
Constructive Reservoir Computation with Output Feedbacks for Structured Domains
Claudio Gallicchio, Alessio Micheli, Giulio Visco
Abstract:
We introduce a novel constructive algorithm which progressively builds the architecture of GraphESN, a model that generalizes Reservoir Computing to learning in graph domains. Exploiting output feedback signals in a forward fashion during construction allows us to introduce supervision into the reservoir encoding process. The potential of the proposed approach is experimentally assessed on real-world tasks from toxicology.
ES2012-123
Process Mining in Non-Stationary Environments
Phil Weber, Tino Peter, Behzad Bordbar
Abstract:
Process mining uses event logs to discover and analyse business processes, which are typically assumed to be static. However, as businesses adapt to change, processes can be expected to change as well. Since one application of process mining is ensuring conformance to prescribed processes or rules, timely detection of change is important. We consider process mining in such non-stationary environments and show that, using a probabilistic view of processes, timely and confident detection of change is possible.
ES2012-189
Short Term Memory Quantifications in Input-Driven Linear Dynamical Systems
Tino Peter, Ali Rodan
Abstract:
We investigate the relation between two quantitative measures characterizing short-term memory in input-driven dynamical systems, namely the short-term memory capacity (MC) and the Fisher memory curve (FMC). We show that under some assumptions, the two quantities can be interpreted as squared `Mahalanobis' norms of images of the input vector under the system's dynamics, and that even though MC and FMC map the memory structure of the system from two quite different perspectives, they can be linked by a close relation.
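For reference, the squared Mahalanobis-type norm that both interpretations reduce to can be sketched as follows; this is a generic illustration, where the matrix `C` stands in for whichever covariance the respective measure induces.

```python
import numpy as np

def sq_mahalanobis_norm(x, C):
    """Squared Mahalanobis norm x^T C^{-1} x of a vector x
    with respect to a positive-definite matrix C."""
    return float(x @ np.linalg.solve(C, x))
```

With `C` equal to the identity this reduces to the ordinary squared Euclidean norm.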
Regression
ES2012-51
Supervised learning to tune simulated annealing for in silico protein structure prediction
Alejandro Marcos Alvarez, Francis Maes, Louis Wehenkel
Abstract:
Simulated annealing is a widely used stochastic optimization algorithm whose efficiency essentially depends on the proposal distribution used to generate the next search state at each step. We propose to adapt this distribution to a family of parametric optimization problems by using supervised machine learning on a sample of search states derived from a set of typical runs of the algorithm over this family. We apply this idea in the context of in silico protein structure prediction.
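The algorithm being tuned can be summarized in a few lines. In the paper's setting, the proposal distribution `propose` is the component that supervised learning would adapt; this sketch uses a fixed user-supplied proposal and cooling schedule, both of which are assumptions.

```python
import numpy as np

def simulated_annealing(energy, propose, x0, temps, seed=0):
    """Bare-bones simulated annealing: at each temperature T, accept a
    proposed move with the Metropolis probability exp(-dE / T)."""
    rng = np.random.default_rng(seed)
    x, e = x0, energy(x0)
    for T in temps:
        y = propose(x, rng)
        de = energy(y) - e
        if de <= 0 or rng.random() < np.exp(-de / T):
            x, e = y, e + de  # accept the move
    return x, e
```

Minimizing a 1-D quadratic with Gaussian proposals and geometric cooling illustrates the interface.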
ES2012-61
Structural Risk Minimization and Rademacher Complexity for Regression
Davide Anguita, Alessandro Ghio, Luca Oneto, Sandro Ridella
Abstract:
The Structural Risk Minimization principle allows estimating the generalization ability of a learned hypothesis by measuring the complexity of the entire hypothesis class. Two of the most recent and effective complexity measures are the Rademacher Complexity and the Maximal Discrepancy, which have been applied to the derivation of generalization bounds for kernel classifiers. In this work, we extend their application to the regression framework.
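For intuition, the empirical Rademacher complexity of a hypothesis class can be estimated by Monte Carlo. This is a textbook illustration, unrelated to the paper's analytical bounds; the restriction to a finite class (rows of a prediction matrix) is an assumption made for the sketch.

```python
import numpy as np

def empirical_rademacher(predictions, n_draws=2000, seed=0):
    """Monte-Carlo estimate of the empirical Rademacher complexity
    R_hat = E_sigma[ sup_h (1/n) sum_i sigma_i h(x_i) ] for a finite
    hypothesis class, given each hypothesis' predictions on the sample
    (rows = hypotheses, columns = examples)."""
    rng = np.random.default_rng(seed)
    H = np.asarray(predictions, dtype=float)
    n = H.shape[1]
    total = 0.0
    for _ in range(n_draws):
        sigma = rng.choice([-1.0, 1.0], size=n)  # Rademacher signs
        total += np.max(H @ sigma) / n           # sup over hypotheses
    return total / n_draws
```

A class that always predicts zero has complexity zero, while a class containing a hypothesis and its negation can always correlate positively with the random signs.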
ES2012-159
Quantile regression with multilayer perceptrons
Joseph Rynkiewicz, Solohaja-Faniaha Dimby
Abstract:
We consider nonlinear quantile regression involving multilayer perceptrons (MLPs). In this paper we investigate the asymptotic behavior of quantile regression in a general framework: first by allowing possibly non-identifiable regression models such as MLPs with redundant hidden units, then by relaxing the conditions on the density of the noise. We present a universal bound for the overfitting of such models under weak assumptions. The main application of this bound is to give a hint about determining the true architecture of the MLP quantile regression model. As an illustration, we use this theoretical result to propose and compare effective criteria to find the true architecture of a quantile MLP regression model.
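The quantile regression objective referred to here is the pinball (check) loss, whose expected minimizer is the tau-th conditional quantile; a minimal sketch:

```python
import numpy as np

def pinball_loss(y_true, y_pred, tau):
    """Pinball (quantile) loss: residuals above the prediction are
    weighted by tau, residuals below it by (1 - tau)."""
    r = np.asarray(y_true, dtype=float) - np.asarray(y_pred, dtype=float)
    return float(np.mean(np.where(r >= 0, tau * r, (tau - 1) * r)))
```

With tau = 0.5 this is half the mean absolute error, recovering median regression.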
ES2012-4
Posterior regularization and attribute assessment of under-determined linear mappings
Marc Strickert, Michael Seifert
Abstract:
Linear mappings are omnipresent in data processing and analysis, ranging from regression to distance metric learning. The interpretation of coefficients from under-determined mappings raises an unexpected challenge when the original modeling goal does not impose regularization. Therefore, a general posterior regularization strategy is presented for inducing unique results, and an additional sensitivity analysis enables attribute assessment for facilitating model interpretation. An application to infrared spectra reflects data smoothness and indicates improved generalization.
ES2012-65
Effects of noise-reduction on neural function approximation
Frank-Florian Steege, Volker Stephan, Horst-Michael Groß
Abstract:
Noise disturbance in training data prevents a good approximation of a function by neural networks. To achieve better approximation results we combine neural networks with noise reduction algorithms. We compare different methods to distinguish between samples with high noise level (outliers) in a dataset and samples with low noise level. Drawbacks of common outlier detection approaches are analysed and a new approach is defined which increases the quality of network function approximations. We demonstrate the effects of noise reduction on artificial datasets and on real data from the process control domain.
ES2012-80
Learning geometric combinations of Gaussian kernels with alternating Quasi-Newton algorithm
David Picard, Nicolas Thome, Matthieu Cord, Alain Rakotomamonjy
Abstract:
We propose a novel algorithm for learning a geometric combination of Gaussian kernels jointly with an SVM classifier. This problem is the product counterpart of MKL, restricted to Gaussian kernels. Our algorithm finds a local solution by alternating a quasi-Newton gradient descent over the kernels and a classical SVM solver over the instances. We show promising results on well-known data sets which suggest the soundness of the approach.
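The "product counterpart of MKL" can be illustrated by evaluating a geometric combination of Gaussian kernels directly. The feature groups and the per-group widths `gammas` are placeholders here; learning them jointly with the SVM is what the paper's alternating algorithm does.

```python
import numpy as np

def product_gaussian_kernel(X, Y, gammas, groups):
    """Geometric (product) combination of Gaussian kernels, one per
    feature group: K = prod_m exp(-gamma_m * ||x_g - y_g||^2).
    Equivalent to a single Gaussian kernel with per-group weights."""
    K = np.ones((len(X), len(Y)))
    for gamma, idx in zip(gammas, groups):
        Xg, Yg = X[:, idx], Y[:, idx]
        d2 = ((Xg[:, None, :] - Yg[None, :, :]) ** 2).sum(-1)
        K *= np.exp(-gamma * d2)
    return K
```

The resulting Gram matrix can be passed to any kernel SVM solver as a precomputed kernel.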
ES2012-177
Real time drunkenness analysis in a realistic car simulation
Audrey Robinel, Didier Puzenat
Abstract:
This paper describes a blood alcohol content estimation method for car drivers, based on a behavior analysis performed within a realistic simulation. An artificial neural network learns how to estimate a subject's blood alcohol content. Low-level recordings of the user's actions on the steering wheel and pedals are used to feed a multilayer perceptron, and a breathalyzer is used to build the set of training examples (desired outputs). Results are compared with a successful previous work based on a simple video game and demonstrate the ``complexity scalability'' of the approach.
ES2012-173
Learning visuo-motor coordination for pointing without depth calculation
Ananda Freire, Andre Lemme, Jochen J. Steil, Guilherme Barreto
Abstract:
Pointing refers to orienting a hand, arm, head or body towards an object and is possible without calculating the object's depth and 3D position. We show that pointing can be learned as a holistic direct mapping from an object's pixel coordinates in the visual field to joint angles, which define the pose and orientation of a human or robot. To this aim, we record real-world, noisy training images together with corresponding robot pointing postures for the humanoid robot iCub. We then learn and comparatively evaluate pointing with a multi-layer perceptron, an extreme learning machine and a reservoir network, but also demonstrate that learning fails at reconstructing the depth of trained objects.
Brain-computer interfaces
ES2012-130
BCI Signal Classification using a Riemannian-based kernel
Alexandre Barachant, Stephane Bonnet, Marco Congedo, Christian Jutten
Abstract:
The use of the spatial covariance matrix as a feature is investigated for motor imagery EEG-based classification. A new kernel is derived by establishing a connection with the Riemannian geometry of symmetric positive definite matrices. Different kernels are tested, in combination with support vector machines, on a past BCI competition dataset. We demonstrate that this new approach significantly outperforms state-of-the-art results without the need for spatial filtering.
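A kernel of this family can be sketched with the log-Euclidean metric on symmetric positive definite (SPD) matrices, a common and computationally cheap variant; the paper's exact Riemannian kernel may differ, so treat this as a representative stand-in.

```python
import numpy as np

def spd_log(C):
    """Matrix logarithm of a symmetric positive-definite matrix,
    computed from its eigendecomposition."""
    w, V = np.linalg.eigh(C)
    return (V * np.log(w)) @ V.T

def log_euclidean_kernel(covs, gamma=1.0):
    """Gaussian kernel on SPD matrices under the log-Euclidean metric:
    k(A, B) = exp(-gamma * ||log(A) - log(B)||_F^2)."""
    logs = [spd_log(C) for C in covs]
    n = len(logs)
    K = np.empty((n, n))
    for i in range(n):
        for j in range(n):
            d = np.linalg.norm(logs[i] - logs[j], "fro")
            K[i, j] = np.exp(-gamma * d ** 2)
    return K
```

The Gram matrix over trial covariance matrices can then be fed to a standard SVM with a precomputed kernel.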
ES2012-18
One Class SVM and Canonical Correlation Analysis increase performance in a c-VEP based Brain-Computer Interface (BCI)
Martin Spüler, Wolfgang Rosenstiel, Martin Bogdan
Abstract:
The goal of a Brain-Computer Interface (BCI) is to enable communication by pure brain activity, without the need for muscle control. Recently, BCIs based on code-modulated visual evoked potentials (c-VEPs) have shown great potential to establish high-performance communication. In this paper we present two new methods to improve classification in a c-VEP BCI. Canonical correlation analysis can be used to build an optimal spatial filter for the detection of c-VEPs, while the use of a one-class support vector machine (OCSVM) makes the BCI more robust to artefacts and thus increases performance. We show that both methods increase performance in an offline analysis on data from 8 subjects. As a proof of concept, both methods are tested online with one subject, who achieved an average performance of 133 bit/min, which is higher than any other bitrate reported so far for a non-invasive BCI.
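Bitrates such as the 133 bit/min above are conventionally computed with the Wolpaw information-transfer-rate formula, shown here as a generic reference; the abstract does not specify the paper's exact computation.

```python
import numpy as np

def wolpaw_bitrate(n_classes, accuracy, selections_per_min):
    """Wolpaw information-transfer rate in bits/min for an N-class
    selection task, assuming errors are spread uniformly over the
    N - 1 wrong classes."""
    N, P = n_classes, accuracy
    bits = np.log2(N)
    if 0.0 < P < 1.0:
        bits += P * np.log2(P) + (1.0 - P) * np.log2((1.0 - P) / (N - 1))
    return float(bits * selections_per_min)
```

At chance accuracy the rate is zero, and at perfect accuracy it equals log2(N) bits per selection.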
ES2012-40
Automatic selection of the number of spatial filters for motor-imagery BCI
Yuan Yang, Sylvain Chevallier, Joe Wiart, Isabelle BLOCH
Abstract:
Common spatial pattern (CSP) is widely used for constructing spatial filters to extract features for motor-imagery-based BCI. One main parameter in CSP-based classification is the number of spatial filters used. An automatic method relying on Rayleigh quotient is presented to estimate its optimal value for each subject. Based on an existing dataset, we validate the contribution of the proposed method through a study of the effect of this parameter on the classification performance. The evaluation on testing data shows that the estimated subject-specific optimal values yield better performances than the recommended value in the literature.
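CSP filters, and the Rayleigh quotients they extremize, can be sketched via a whitening-based eigendecomposition of the two class covariance matrices. This is the standard formulation; the paper's criterion for choosing the number of filters is not reproduced here.

```python
import numpy as np

def csp_filters(C1, C2, n_filters):
    """Common spatial patterns: the returned filters w extremize the
    Rayleigh quotient (w' C1 w) / (w' (C1 + C2) w), so filtered variance
    is maximally different between the two classes."""
    w, V = np.linalg.eigh(C1 + C2)
    W = V @ np.diag(1.0 / np.sqrt(w)) @ V.T      # whitening of C1 + C2
    lam, U = np.linalg.eigh(W @ C1 @ W.T)
    order = np.argsort(np.abs(lam - 0.5))[::-1]  # most discriminative first
    return (W.T @ U)[:, order[:n_filters]]
```

Selecting `n_filters` per subject, as the paper does automatically, trades off feature richness against overfitting.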
ES2012-46
The error-related potential and BCIs
Sandra Rousseau, Christian Jutten, Marco Congedo
Abstract:
The error-related potential is an event-related potential triggered by errors. Recently it has received much attention, notably for its possible use in BCI systems. Since it is linked to error occurrence, it could be used in the design of control loops to build more robust systems. In this paper we study the characteristics of the error-related potential and present how it could be used to improve BCI systems.
ES2012-140
Semi-Supervised Neural Gas for Adaptive Brain-Computer Interfaces
Hannes Riechmann, Andrea Finke
Abstract:
Non-stationarity is inherent in EEG data. We propose a concept for an adaptive brain computer interface (BCI) that adapts a classifier to the changes in EEG data. It combines labeled and unlabeled data acquired during normal operation of the system. The classifier is based on Fuzzy Neural Gas (FNG), a prototype-based classifier. Based on four data sets we show that retraining the classifier significantly increases classification accuracy. Our approach smoothly adapts to the session-to-session variations in the data.
Image and time series analysis
ES2012-47
Combined scattering for rotation invariant texture analysis
Laurent Sifre, Stéphane Mallat
Abstract:
This paper introduces a combined scattering representation for texture classification, which is invariant to rotations and stable to deformations. A combined scattering is computed with two nested cascades of wavelet transforms and complex modulus, along spatial and rotation variables. Results are compared with state-of-the-art algorithms, using a K-nearest-neighbor classifier.
ES2012-187
Hidden Markov models for time series of counts with excess zeros
Madalina Olteanu, James Ridgway
Abstract:
Integer-valued time series are often modeled with Markov models or hidden Markov models (HMM). However, when the series represents count data it is often subject to excess zeros. In this case, usual distributions such as binomial or Poisson are unable to estimate the zero mass correctly. In order to overcome this issue, we introduce zero-inflated distributions in the hidden Markov model. The empirical results on simulated and real data show good convergence properties, while excess zeros are better estimated than with classical HMM.
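A zero-inflated emission distribution of the kind plugged into the HMM mixes a point mass at zero with an ordinary count distribution. For Poisson emissions, a representative choice, the probability mass function is:

```python
from math import exp, factorial

def zip_pmf(k, lam, pi):
    """Zero-inflated Poisson pmf: with probability pi emit a structural
    zero, otherwise draw k from Poisson(lam)."""
    poisson = exp(-lam) * lam ** k / factorial(k)
    return pi * (k == 0) + (1.0 - pi) * poisson
```

The extra mass at zero lets the model fit series whose zero frequency exceeds what a plain Poisson can produce.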
ES2012-171
Application of Dynamic Time Warping on Kalman Filtering Framework for Abnormal ECG Filtering
Mohammad Niknazar, Bertrand Rivet, Christian Jutten
Abstract:
Existing nonlinear Bayesian filtering frameworks serve as an effective tool for the model-based filtering of noisy ECG recordings. However, since these methods are based on a linear phase assumption, for some heart defects where abnormal waves appear only in certain cycles of the ECG, they are unable to simultaneously filter the normal and abnormal ECG segments. In this paper, a new method based on Dynamic Time Warping (DTW), which exploits the information of all channels for nonlinear phase state calculation, is presented. Results on real and synthetic data show that the new method can be successfully applied to filtering normal and abnormal ECG segments simultaneously.
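Classic DTW, the building block used here, aligns two sequences by dynamic programming; this is the textbook scalar version (the paper applies DTW across channels inside a Kalman filtering framework, which this sketch does not reproduce).

```python
import numpy as np

def dtw_distance(a, b):
    """Dynamic time warping distance between two 1-D sequences,
    with absolute difference as the local cost."""
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            # extend the cheapest of the three predecessor alignments
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, m]
```

Because warping can repeat samples, a sequence and its time-stretched copy have distance zero.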
ES2012-124
Texture classification based on symbolic data analysis
Carlos de Almeida, Renata Souza, Ana Lucia Candeias
Abstract:
This article presents a hybrid approach for texture-based image classification using gray-level co-occurrence matrices (GLCM) and a new Fuzzy Kohonen Clustering Network for Symbolic Interval Data (IFKCN). The GLCM matrices extracted from an image database are processed to create the training data set for the IFKCN algorithm, which organizes and extracts prototypes from the processed matrices. The experimental results are encouraging, with an average success rate of 97.39%.
ES2012-160
Learning Object-Class Segmentation with Convolutional Neural Networks
Hannes Schulz, Sven Behnke
Abstract:
After successes at image classification, segmentation is the next step towards image understanding for neural networks. We propose a convolutional network architecture that includes innovative elements, such as multiple output maps, suitable loss functions, supervised pretraining, multiscale inputs, reused outputs, and pairwise class location filters. Experiments on three data sets show that our method performs on par with current computer vision methods with regard to accuracy and exceeds them in speed.
ES2012-185
Incremental feature building and classification for image segmentation
Guillaume Bernard, Michel Verleysen, John Lee
Abstract:
Image segmentation problems can be solved with classification algorithms. However, their use is limited to features derived from intensities of pixels or patches. Features such as contiguity of two regions cannot be considered without prior knowledge of one of the two class labels. Instead of stacking various classification algorithms, we describe an incremental scheme with a KNN classifier that works in a space where feature relevance is progressively updated. Feature relevance can smoothly vary from total ignorance to absolute certainty. Experiments on artificial images demonstrate the capabilities of this incremental scheme.
Interpretable models in machine learning
ES2012-7
Making machine learning models interpretable
Alfredo Vellido, José D. Martín-Guerrero, Paulo Lisboa
Abstract:
Data of different levels of complexity and of ever growing diversity of characteristics are the raw materials that machine learning practitioners try to model using their wide palette of methods and tools. The obtained models are meant to be a synthetic representation of the available, observed data that captures some of their intrinsic regularities or patterns. Therefore, the use of machine learning techniques for data analysis can be understood as a problem of pattern recognition or, more informally, of knowledge discovery and data mining. There exists a gap, though, between data modeling and knowledge extraction. Models, depending on the machine learning techniques employed, can be described in diverse ways but, in order to consider that some knowledge has been achieved from their description, we must take into account the human cognitive factor that any knowledge extraction process entails. These models as such can be rendered powerless unless they can be interpreted, and the process of human interpretation follows rules that go well beyond technical prowess. For this reason, interpretability is a paramount quality that machine learning methods should aim to achieve if they are to be applied in practice. This paper is a brief introduction to the special session on interpretable models in machine learning, organized as part of the 20th European Symposium on Artificial Neural Networks, Computational Intelligence and Machine Learning. It includes a discussion of the works accepted for the session, with an overview of the context of wider research on interpretability of machine learning models.
ES2012-36
Interval coded scoring systems for survival analysis
Vanya Van Belle, Sabine Van Huffel, Johan Suykens, Stephen Boyd
Abstract:
Black-box mathematical models are powerful tools in classification and regression problems. Thanks to the use of (unknown) transformations of the inputs, the outcome can be estimated, improving performance in comparison to standard statistical models. A disadvantage of these complex models, however, is their lack of interpretability. This work illustrates how advanced methods can be made interpretable. Using constant B-spline kernel functions and sparsity constraints, interval coded scoring models for survival analysis are presented.
ES2012-99
Visualizing the quality of dimensionality reduction
Bassam Mokbel, Wouter Lueks, Andrej Gisbrecht, Michael Biehl, Barbara Hammer
Abstract:
Many different evaluation measures for dimensionality reduction can be summarized based on the co-ranking framework [Lee and Verleysen, 2009]. Here, we extend this framework in two ways: (i) we show that the current parameterization of the quality shows unpredictable behavior, even in simple settings, and we propose a different parameterization which yields more intuitive results; (ii) we propose how to link the quality to point-wise quality measures which can directly be integrated into the visualization.
ES2012-162
Unmixing Hyperspectral Images with Fuzzy Supervised Self-Organizing Maps
Thomas Villmann, Erzsebet Merenyi, William H. Farrand
Abstract:
We propose a powerful alternative to customary linear spectral unmixing, with a new neural model, which achieves locally linear but globally non-linear unmixing. This enables unmixing with respect to a large number of endmembers, while traditional linear unmixing is limited to a handful of endmembers.
ES2012-169
Constructing similarity networks using the Fisher information metric
Héctor Ruiz, Sandra Ortega, Ian Jarman, José D. Martín-Guerrero, Paulo Lisboa
Abstract:
The Fisher information metric defines a Riemannian space where distances reflect similarity with respect to a given probability distribution. This metric can be used during the process of building a relational network, resulting in a structure that is informed about the similarity criterion. Furthermore, the relational nature of this network allows for an intuitive interpretation of the data through their location within the network and the way it relates to the most representative cases or prototypes.
ES2012-28
Extended visualization method for classification trees
José M. Martínez-Martínez, Pablo Escandell-Montero, Emilio Soria-Olivas, José D. Martín-Guerrero, Juan Gómez-Sanchis, Joan Vila-Francés
Abstract:
Classification tree analysis is one of the main techniques used in Data Mining, yet there is a lack of visualization methods that support this tool. Graphical procedures can therefore be developed to simplify interpretation and obtain a better understanding. This paper proposes a method for representing the input data distribution of each class present in each terminal node. For this purpose, the new visualization method Sectors on Sectors (SonS), proposed in [1], is used. The methodology is tested on two real data sets.
ES2012-29
Cartogram representation of the batch-SOM magnification factor
Alessandra Tosi, Alfredo Vellido
Abstract:
Model interpretability is a problem of knowledge extraction from the patterns found in raw data. One key source of knowledge is information visualization, which can help us to gain insights into a problem through graphical representations and metaphors. Nonlinear dimensionality reduction techniques can provide flexible visual insight, but the locally varying representation distortion they produce makes interpretation far from intuitive. In this paper, we define a cartogram method, based on techniques of geographic representation, that allows reintroducing this distortion, measured as a magnification factor, in the visual maps of the batch-SOM model. It does so while preserving the topological continuity of the representation.
ES2012-57
Integration of Structural Expert Knowledge about Classes for Classification Using the Fuzzy Supervised Neural Gas
Marika Kästner, Wieland Hermann, Thomas Villmann
Abstract:
In this paper we describe a methodology for integrating structural expert knowledge about class relations into classification schemes, for models that judge class dissimilarities using a unary class coding scheme. In particular, we suggest that those models incorporate this information into the class dissimilarity measure.
ES2012-148
Similarity networks for heterogeneous data
Lluís Belanche, Jerónimo Hernández
Abstract:
A two-layer neural network is developed in which the neuron model computes a user-defined similarity function between inputs and weights. The neuron model is formed by the composition of an adapted logistic function with the mean of the partial input-weight similarities. The model is capable of dealing directly with variables of potentially different nature (continuous, ordinal, categorical); there is also provision for missing values. The network is trained using a fast two-stage procedure and involves the setting of only one parameter. In our experiments, the network achieves slightly superior performance on a set of challenging problems with respect to both RBF nets and RBF-kernel SVMs.
ES2012-167
Discriminant functional gene groups identification with machine learning and prior knowledge
Grzegorz Zycinski, Margherita Squillario, Annalisa Barla, Tiziana Sanavia, Alessandro Verri, Barbara Di Camillo
Abstract:
In computational biology, the analysis of high-throughput data poses several issues on the reliability, reproducibility and interpretability of the results. It has been suggested that one reason for these inconsistencies may be that in complex diseases, such as cancer, multiple genes belonging to one or more physiological pathways are associated with the outcomes. Thus, a possible approach to improve list interpretability is to integrate biological information from genomic databases in the learning process. Here we propose SVS, a machine learning based pipeline that incorporates domain biological knowledge a priori to structure the data matrix before the feature selection and classification phases. The pipeline is completed by a final step of semantic clustering and visualization. The clustering phase provides further interpretability of the results, allowing the identification of their biological meaning. To prove the efficacy of this procedure we analyzed a public dataset on prostate cancer.
Machine ensembles: theory and applications
ES2012-9
An Exploration of Research Directions in Machine Ensemble Theory and Applications
Anibal Figueiras-Vidal, Lior Rokach
Abstract:
A concise overview of the fundamentals and the main types of machine ensembles serves to propose a structured perspective for the papers that are included in this special session. The subsequent brief discussion of the works, emphasizing their principal contributions, permits an extraction of a series of suggestions for further research in the fruitful area of ensemble learning.
ES2012-19
On the Independence of the Individual Predictions in Parallel Randomized Ensembles
Daniel Hernández-Lobato, Gonzalo Martínez-Muñoz, Alberto Suárez
Abstract:
In randomized parallel ensembles the class label predictions for a particular instance by different ensemble classifiers are independent random variables. Taking advantage of this independence, we design a statistical test to identify instances near the decision borders, which are difficult to classify. For these instances, the performance of the ensemble is poor and approaches random guessing. The validity of this analysis and the usefulness of the proposed statistical test are illustrated in several real-world classification problems.
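The paper's exact test statistic is not given in the abstract; under the stated independence assumption, one natural sketch of such a test is an exact two-sided binomial test on the ensemble's vote count, where a borderline instance should be consistent with each member voting for either class with probability 1/2 (the function name and parameters are illustrative):

```python
from math import comb

def vote_split_pvalue(k, n, p=0.5):
    """Exact two-sided binomial test on an ensemble vote split: the
    probability, assuming n independent members each voting class 1 with
    probability p, of observing a vote count at least as unlikely as k."""
    pmf = [comb(n, i) * p ** i * (1 - p) ** (n - i) for i in range(n + 1)]
    observed = pmf[k]
    # Sum all outcomes no more probable than the observed one (small
    # tolerance guards against floating-point ties).
    return sum(q for q in pmf if q <= observed + 1e-12)

# A 52/100 split yields a large p-value (consistent with a borderline
# instance), while a 90/100 split yields a tiny one.
```

Instances whose vote split fails to reject p = 1/2 are the ones for which the ensemble behaves close to random guessing.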
ES2012-121
Introducing diversity among the models of multi-label classification ensemble
Lena Chekina, Lior Rokach, Bracha Shapira
Abstract:
A number of ensemble algorithms for solving multi-label classification problems have been proposed in recent years. Diversity among the base learners is known to be important for constructing a good ensemble. In this paper we define a method for introducing diversity among the base learners of one of the previously presented multi-label ensemble classifiers. An empirical comparison on 10 datasets demonstrates that model diversity leads to an improvement in prediction accuracy in 80% of the evaluated cases. Additionally, in most cases the proposed "diverse" ensemble method outperforms other multi-label ensembles as well.
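The abstract does not define how diversity is measured; as context, a simple and widely used way to quantify diversity among base learners is the average pairwise disagreement of their predictions (a sketch of one common measure, not necessarily the paper's):

```python
def pairwise_disagreement(preds):
    """Average pairwise disagreement between base learners:
    the fraction of samples on which two members predict differently,
    averaged over all member pairs. preds[m][i] is member m's label
    for sample i."""
    M, n = len(preds), len(preds[0])
    total, pairs = 0.0, 0
    for a in range(M):
        for b in range(a + 1, M):
            total += sum(preds[a][i] != preds[b][i] for i in range(n)) / n
            pairs += 1
    return total / pairs
```

The measure is 0 for identical members and 1 when every pair disagrees everywhere; ensemble construction methods typically aim for intermediate values, trading individual accuracy against diversity.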
ES2012-141
Distributed learning via Diffusion adaptation with application to ensemble learning
Zaid Towfic, Jianshu Chen, Ali Sayed
Abstract:
We examine the problem of learning a set of parameters from a distributed dataset. We assume the datasets are collected by agents over a distributed ad-hoc network, and that the communication of the actual raw data is prohibitive due to either privacy or communication constraints. We propose a distributed algorithm for online learning that is proved to guarantee a bounded excess risk, and the bound can be made arbitrarily small for sufficiently small step-sizes. We apply our framework to the expert advice problem, where nodes learn the weights for the trained experts in a distributed manner.
ES2012-26
Regularized Committee of Extreme Learning Machine for Regression Problems
Pablo Escandell-Montero, José M. Martínez-Martínez, Emilio Soria-Olivas, Josep Guimerá-Tomás, Marcelino Martínez-Sober, Antonio J. Serrano-López
Abstract:
Extreme learning machine (ELM) is an efficient learning algorithm for single-hidden layer feedforward networks (SLFN). This paper proposes the combination of ELM networks using a regularized committee. Simulations on many real-world regression data sets have demonstrated that this algorithm generally outperforms the original ELM algorithm.
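The abstract does not spell out the committee rule; one common "regularized committee" sketch combines the member networks' outputs by ridge-regression weights, w = (P^T P + lambda*I)^-1 P^T y, as below (a small pure-Python illustration under that assumption, not the paper's exact formulation):

```python
def committee_weights(preds, y, lam=1e-3):
    """Ridge-regularized least-squares weights for combining M committee
    members: solves (P^T P + lam*I) w = P^T y by Gaussian elimination.
    preds[m][i] is member m's prediction on sample i."""
    M, n = len(preds), len(y)
    # Normal-equations matrix and right-hand side.
    A = [[sum(preds[a][i] * preds[b][i] for i in range(n))
          + (lam if a == b else 0.0) for b in range(M)] for a in range(M)]
    rhs = [sum(preds[a][i] * y[i] for i in range(n)) for a in range(M)]
    # Gaussian elimination with partial pivoting (fine for small M).
    for col in range(M):
        piv = max(range(col, M), key=lambda r: abs(A[r][col]))
        A[col], A[piv] = A[piv], A[col]
        rhs[col], rhs[piv] = rhs[piv], rhs[col]
        for r in range(col + 1, M):
            f = A[r][col] / A[col][col]
            for c in range(col, M):
                A[r][c] -= f * A[col][c]
            rhs[r] -= f * rhs[col]
    w = [0.0] * M
    for r in range(M - 1, -1, -1):
        w[r] = (rhs[r] - sum(A[r][c] * w[c] for c in range(r + 1, M))) / A[r][r]
    return w
```

With one accurate member and one noisy one, the fitted weights concentrate on the accurate member, which is the behavior a regularized committee is meant to deliver automatically.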
ES2012-156
Linear kernel combination using boosting
Alexis Lechervy, Philippe-Henri Gosselin, Frédéric Precioso
Abstract:
In this paper, we propose a novel algorithm to design multiclass kernels based on an iterative combination of weak kernels, in a scheme inspired by the boosting framework. Our solution has a linear complexity in the training dataset size. We evaluate our method for classification, first on a toy example, by integrating our multi-class kernel into a kNN classifier and comparing our results with a reference iterative kernel design method; and then for image categorization, by considering a classic image database and comparing our boosted linear kernel combination with the direct linear combination of all features in a linear SVM.
ES2012-158
The stability of feature selection and class prediction from ensemble tree classifiers
Jérôme Paul, Michel Verleysen, Pierre Dupont
Abstract:
The bootstrap aggregating procedure at the core of ensemble tree classifiers reduces, in most cases, the variance of such models while offering good generalization capabilities. The average predictive performance of those ensembles is known to improve up to a certain point while increasing the ensemble size. The present work studies this convergence in contrast to the stability of the class prediction and the variable selection performed while and after growing the ensemble. Experiments on several biomedical datasets, using random forests or bagging of decision trees, show that class prediction and, most notably, variable selection typically require orders of magnitude more trees to get stable.
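Selection stability of the kind studied here is often summarized by the average pairwise Jaccard similarity between the feature subsets selected across runs; a minimal sketch of one such stability index follows (one of several standard indices, not necessarily the paper's choice):

```python
def jaccard_stability(selections):
    """Average pairwise Jaccard similarity between feature subsets selected
    in different runs: 1.0 for identical subsets, 0.0 for disjoint ones.
    selections is a list of iterables of selected feature indices."""
    sims, pairs = 0.0, 0
    for a in range(len(selections)):
        for b in range(a + 1, len(selections)):
            A, B = set(selections[a]), set(selections[b])
            sims += len(A & B) / len(A | B)
            pairs += 1
    return sims / pairs
```

Tracking such an index while growing the ensemble makes it easy to see the effect the abstract describes: the predictive error plateaus long before the selected feature sets stop changing between runs.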
Bayesian and graphical models, optimization
ES2012-183
Sparse Nonparametric Topic Model for Transfer Learning
Ali Faisal, Jussi Gillberg, Jaakko Peltonen, Gayle Leen, Samuel Kaski
Abstract:
Count data arises for example in bioinformatics or analysis of text documents represented as word count vectors. With several data sets available from related sources, like papers in related conference tracks, exploiting their similarities by transfer learning can improve models compared to modeling sources independently. We introduce a Bayesian generative transfer learning model which represents similarity across document collections by sparse sharing of latent topics controlled by an Indian Buffet Process. Unlike Hierarchical Dirichlet Process based multi-task learning, our model decouples topic sharing probability from topic strength, making sharing of low-strength topics easier, and outperforms the HDP approach in experiments.
ES2012-94
Assessment of sequential Boltzmann machines on a lexical processing task
Alberto Testolin, Alessandro Sperduti, Ivilin Stoianov, Marco Zorzi
Abstract:
Recently, a promising probabilistic model based on Boltzmann Machines, the Recurrent Temporal RBM, has been proposed. It is able to learn physical dynamics (e.g. videos of bouncing balls); however, until now it was not clear whether this ability extends to symbolic tasks. Here we assess its capabilities on learning graphotactic rules from a set of English words. It emerged that the model is able to extract local transition rules between items of a sequence, but it does not seem to be suited to encoding a whole word.
ES2012-165
Functional Mixture Discriminant Analysis with hidden process regression for curve classification
Faicel Chamroukhi, Hervé Glotin, Céline Rabouy
Abstract:
We present a new mixture model-based discriminant analysis approach for functional data using a specific hidden process regression model. The approach allows for fitting flexible curve-models to each class of complex-shaped curves presenting regime changes. The model parameters are learned by maximizing the observed-data log-likelihood for each class by using a dedicated expectation-maximization (EM) algorithm. Comparisons on simulated data with alternative approaches show that the proposed approach provides better results.
ES2012-95
An analysis of Gaussian-binary restricted Boltzmann machines for natural images
Nan Wang, Jan Melchior, Laurenz Wiskott
Abstract:
A Gaussian-binary restricted Boltzmann machine is a widely used energy-based model for continuous data distributions, although many authors have reported difficulties in training it on natural images. To clarify the model's capabilities and limitations, we rewrite its probability density function as a linear superposition of Gaussians. Based on this formula we show how Gaussian-binary RBMs learn natural image statistics. However, the resulting probability density function is not a good representation of the data distribution.
ES2012-27
Learning Task Relatedness via Dirichlet Process Priors for Linear Regression Models
Marcel Hermkes, Nicolas Kuehn, Carsten Riggelsen
Abstract:
In this paper we present a hierarchical model of linear regression functions in the context of multi-task learning. The parameters of the linear model are coupled by a Dirichlet Process (DP) prior, which implies a clustering of related functions for different tasks. To make approximate Bayesian inference under this model we apply the Bayesian Hierarchical Clustering (BHC) algorithm. The experiments are conducted on two real world problems: (i) school exam score prediction and (ii) prediction of ground-motion parameters. In comparison to baseline methods with no shared prior the results show an improved prediction performance when using the hierarchical model.
ES2012-33
EMFit-based Ultrasonic Phased Arrays with Evolved Weights for Biomimetic Target Localization
Jan Steckel, Andre Boen, Dieter Vanderest, Herbert Peremans
Abstract:
Bats use the spatial filtering performed by their pinnae in localization tasks. We propose a similar localization scheme based on the spatial filtering of the received echoes by a phased array. By evolving the weights of a linear phased array using a genetic algorithm, a very efficient spatial filter can be implemented. The localization performance of the evolved array in combination with the biomimetic localization algorithm is compared to a standard phased array localization scheme.
Unsupervised learning
ES2012-31
Magnitude Sensitive Competitive Learning
Enrique Pelayo, David Buldain, Carlos Orrite
Abstract:
This paper presents a new algorithm, Magnitude Sensitive Competitive Learning (MSCL), which is able to distribute the unit weights according to any magnitude computed from the unit parameters or from the input data inside the Voronoi region of each unit. This controlled behavior allows it to outperform standard Competitive Learning algorithms, which only tend to concentrate neurons according to the input data density. Application examples with different magnitude functions illustrate the possibilities of MSCL.
ES2012-152
From neuronal cost-based metrics towards sparse coded signals classification
Anthony Mouraud, Quentin Barthélemy, Aurélien Mayoue, Cédric Gouy-Pailler, Anthony Larue, Hélène Paugam-Moisy
Abstract:
Sparse signal decompositions are key to efficient compression, storage and denoising, but appropriate methods to exploit this sparsity for classification purposes are lacking. Sparse coding methods based on dictionary learning may result in spikegrams, a sparse temporal representation of signals as a raster of kernel occurrences through time. This paper proposes a method for coupling spike train cost-based metrics (from neuroscience) with spikegram sparse decompositions for clustering multivariate signals. Experiments on character trajectories, recorded by sensors during natural handwriting, demonstrate the validity of the approach compared with classification performance currently reported in the literature.
ES2012-163
Hybrid hierarchical clustering: cluster assessment via cluster validation indices
Mark Embrechts, Jonathan Linton, Christopher Gatti
Abstract:
This paper introduces a novel method for speeding up hierarchical clustering by seeding it with the clusters obtained from a different clustering method (e.g., K-means). A benchmark study compares the clustering performance of hierarchical clustering with and without cluster seeding, based on several cluster performance indices, using a wide variety of real-world and artificial benchmark data sets. While cluster seeding can significantly speed up agglomerative hierarchical clustering, it also affects the cluster quality, and thus the validation indices as well. Extensive benchmarks show that the impact of cluster seeding is often rather small.
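The seeding idea can be sketched end-to-end on synthetic 2-D data: run a cheap K-means pass, then let agglomerative merging operate on the few seed clusters instead of every point. The toy K-means, the centroid linkage, and the blob data below are illustrative assumptions, not the paper's benchmark setup:

```python
import math
import random

def dist(a, b):
    return math.hypot(a[0] - b[0], a[1] - b[1])

def centroid(pts):
    return (sum(p[0] for p in pts) / len(pts), sum(p[1] for p in pts) / len(pts))

def kmeans(points, k, iters=25):
    # deterministic strided init so every region contributes a seed
    centers = [points[i] for i in range(0, len(points), len(points) // k)][:k]
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            groups[min(range(k), key=lambda i: dist(p, centers[i]))].append(p)
        centers = [centroid(g) if g else c for g, c in zip(groups, centers)]
    return [g for g in groups if g]

def agglomerate(clusters, target):
    # centroid-linkage merging, starting from the K-means seed clusters
    clusters = [list(c) for c in clusters]
    while len(clusters) > target:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = dist(centroid(clusters[i]), centroid(clusters[j]))
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i] += clusters.pop(j)
    return clusters

random.seed(3)
blob = lambda cx, cy: [(cx + random.gauss(0, 0.3), cy + random.gauss(0, 0.3)) for _ in range(60)]
data = blob(0, 0) + blob(5, 5)
seeds = kmeans(data, 8)         # cheap seeding: 8 groups instead of 120 points
final = agglomerate(seeds, 2)   # hierarchical merging only ever compares 8 seeds
print(sorted(len(c) for c in final))
```

The speed-up comes from the quadratic merge loop seeing 8 seed clusters rather than 120 singletons; the quality question the paper studies is whether merging seeds instead of points changes the final partition.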
ES2012-48
Unsupervised learning of motion patterns
Thomas Guthier, Julian Eggert, Volker Willert
Abstract:
Neurophysiological findings suggest that the visual cortex of mammals contains neural populations that are sensitive to specific motion patterns. In this paper, we present a new method to learn such patterns in an unsupervised way. To represent motion, dense optical flow fields of videos containing humans performing several actions like walking and running are estimated. We introduce VNMF, an extension of the translation invariant NMF that works on flow fields, along with a new energy term that enforces parts-basedness. VNMF incorporates three principles found in neural motion processing: Sparsity, non-negativity and direction selectivity. We find that the extracted motion patterns are shaped like body parts, which supports the idea that the representation of biological motion is directly linked to the shape of an object.
ES2012-52
Robust clustering of high-dimensional data
Anastasios Bellas, Charles Bouveyron, Marie Cottrell, Jérôme Lacaille
Abstract:
We address the problem of robust clustering of high-dimensional data, which is recurrent in real-world applications. Existing robust clustering methods are unfortunately sensitive to high dimensionality, while existing approaches for high-dimensional data are in general not robust. We propose a hybrid iterative EM-based algorithm that combines an efficient high-dimensional clustering algorithm with the trimming technique. We test our algorithm on synthetic and real-world data from the domain of aircraft engine health monitoring and show its efficiency on high-dimensional noisy datasets.
ES2012-78
Image reconstruction using an iterative SOM-based algorithm
Manel Jouini, Sylvie Thiria, Michel Crépon
Abstract:
The frequent presence of clouds in optical remotely sensed imagery breaks spatial and temporal continuity and limits its exploitation. The aim of this study is to propose a new statistical processing approach for reconstructing areas covered by clouds in a time sequence of optical satellite images. The approach is an iterative SOM-based algorithm and was applied to reconstruct ocean color images. It uses the information contained in the color images together with a set of satellite-derived dynamic ocean products (sea surface temperature, SST; altimetry, SSH) to reproduce the local spatio-temporal relationships of the cloudy images. The reconstruction method is general and can be applied to fill gaps in multi-dimensional, correlated data.
Statistical methods and kernel-based algorithms
ES2012-10
Deconvolution in nonparametric statistics
Kris De Brabanter, Bart De Moor
Abstract:
In this tutorial paper we give an overview of deconvolution problems in nonparametric statistics. First, we consider the problem of density estimation given a contaminated sample. We illustrate that the classical Rosenblatt-Parzen kernel density estimator is unable to capture the full shape of the density while the presented method experiences almost no problems. Second, we use the previous estimator in a nonparametric regression framework with errors-in-variables.
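Why a naive estimator fails on a contaminated sample can be shown with a one-line variance argument: the observed variable is the true one convolved with the noise, so its variance is the sum of both, and a classical estimator recovers the smoothed density rather than the true one. The distributions below are invented for illustration; with known noise variance, the simplest deconvolution is a subtraction:

```python
import random
import statistics

random.seed(4)
n = 5000
signal = [random.gauss(0.0, 1.0) for _ in range(n)]   # true variable, sd = 1
noise = [random.gauss(0.0, 0.8) for _ in range(n)]    # known measurement error
observed = [s + e for s, e in zip(signal, noise)]     # contaminated sample

# a naive (Rosenblatt-Parzen style) estimator sees the convolved density,
# whose variance is sigma_signal^2 + sigma_noise^2 -- hence the over-smoothing
var_obs = statistics.pvariance(observed)
var_corrected = var_obs - 0.8 ** 2   # deconvolution in its simplest form
print(round(var_corrected, 2))
```

Deconvolution kernel estimators generalize this correction from the variance to the full density, dividing out the noise characteristic function in the Fourier domain.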
ES2012-170
Weighted/Structured Total Least Squares problems and polynomial system solving
Philippe Dreesen, Kim Batselier, Bart De Moor
Abstract:
Weighted and Structured Total Least Squares (W/STLS) problems are generalizations of Total Least Squares with additional weighting and/or structure constraints. W/STLS are found at the heart of several mathematical engineering techniques, such as statistics and systems theory, and are typically solved by local optimization methods, having the drawback that one cannot guarantee global optimality of the retrieved solution. This paper employs the Riemannian SVD formulation to write the W/STLS problem as a system of polynomial equations. Using a novel matrix technique for solving systems of polynomial equations, the globally optimal solution of the W/STLS problem is retrieved.
ES2012-149
Joint Regression and Linear Combination of Time Series for Optimal Prediction
Dries Geebelen, Kim Batselier, Philippe Dreesen, Signoretto Marco, Johan Suykens, Bart De Moor, Joos Vandewalle
Abstract:
In most machine learning applications the time series to predict is fixed, and one has to learn a prediction model that yields the smallest error. In this paper, choosing the time series to predict is part of the optimization problem: this time series has to be a linear combination of a priori given time series. The resulting optimization problem can be formulated as choosing the linear combination of a priori known matrices such that the smallest singular value is minimized. This problem has many local minima and can be formulated as a polynomial system, which we solve using a polynomial system solver. The proposed prediction algorithm has applications in algorithmic trading, in which a linear combination of stocks is bought.
ES2012-60
Averaging of kernel functions
Lluís Belanche, Alessandra Tosi
Abstract:
In kernel-based machines, the integration of several kernels to build more flexible learning methods is a promising avenue for research. In particular, in Multiple Kernel Learning a compound kernel is built by learning a kernel that is the weighted mean of several sources. We show in this paper that the only feasible average for kernel learning is precisely the arithmetic average. We also show that three familiar means (the geometric, inverse root mean square and harmonic means) of positive real values actually generate valid kernels.
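The central object here, a weighted arithmetic mean of kernels, can be checked numerically: a convex combination of positive semi-definite Gram matrices is again positive semi-definite. The kernel widths and weights below are chosen arbitrarily for the sketch:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(20, 3))

def rbf_gram(X, gamma):
    # Gram matrix of the Gaussian (RBF) kernel, which is positive semi-definite
    sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-gamma * sq)

K1 = rbf_gram(X, 0.5)
K2 = rbf_gram(X, 2.0)
K_avg = 0.7 * K1 + 0.3 * K2   # weighted arithmetic mean, as in MKL

# a convex combination of PSD matrices is PSD: no (truly) negative eigenvalues
min_eig = np.linalg.eigvalsh(K_avg).min()
print(min_eig >= -1e-10)
```

The same check fails in general for, e.g., an entrywise geometric mean of two arbitrary Gram matrices, which is one way to see why the arithmetic average plays a special role in kernel learning.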
ES2012-37
Maximum Likelihood Estimation and Polynomial System Solving
Kim Batselier, Philippe Dreesen, Bart De Moor
Abstract:
This article presents an alternative method to find the global maximum likelihood estimates of the mixing probabilities of a mixture of multinomial distributions. For these mixture models it is shown that the maximum likelihood estimates of the mixing probabilities correspond to the roots of a multivariate polynomial system. A new algorithm, set in a linear algebra framework, is presented which finds all these roots by solving a generalized eigenvalue problem.
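The paper handles the general case via a generalized eigenvalue problem; the sketch below treats only the simplest instance (two known components, one mixing weight), where the stationarity condition of the log-likelihood already reduces to a univariate polynomial whose roots can be found directly. The component probabilities and counts are invented:

```python
import numpy as np

a = np.array([0.7, 0.2, 0.1])   # first known multinomial component
b = np.array([0.1, 0.3, 0.6])   # second known multinomial component
n = np.array([50, 30, 20])      # observed category counts

# stationarity of the log-likelihood in the mixing weight lam:
#   sum_i n_i (a_i - b_i) / (lam*a_i + (1-lam)*b_i) = 0
# clearing denominators turns this into a polynomial in lam
factors = [np.array([a_i - b_i, b_i]) for a_i, b_i in zip(a, b)]  # lam*a_i + (1-lam)*b_i
poly = np.zeros(1)
for i in range(3):
    term = np.array([n[i] * (a[i] - b[i])])
    for j in range(3):
        if j != i:
            term = np.polymul(term, factors[j])
    poly = np.polyadd(poly, term)

roots = np.roots(poly)
mle = [r.real for r in roots if abs(r.imag) < 1e-9 and 0.0 <= r.real <= 1.0]
print(len(mle))  # the log-likelihood is concave in lam, so one interior root
```

With more components and several mixing weights, the stationarity conditions form a multivariate polynomial system, which is where the paper's eigenvalue-based solver comes in.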
Classification and model selection
ES2012-43
L1-based compression of random forest models
Arnaud Joly, François Schnitzler, Pierre Geurts, Louis Wehenkel
Abstract:
Random forests are effective supervised learning methods applicable to large-scale datasets. However, the space complexity of tree ensembles, in terms of their total number of nodes, is often prohibitive, especially for problems with very high-dimensional input spaces. We propose to study their compressibility by applying an L1-based regularization to the set of indicator functions defined by all their nodes. We show experimentally that preserving or even improving the model accuracy while significantly reducing its space complexity is indeed possible.
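The compression mechanism can be sketched in miniature. The paper operates on the indicator functions of actual forest nodes; the sketch below substitutes a bank of stump indicators and a plain ISTA solver for the L1 problem, all invented for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(-1.0, 1.0, size=400)
y = np.sin(3.0 * x)                       # regression target

# a crude stand-in for forest nodes: 200 stump indicator functions 1[x > t]
thresholds = rng.uniform(-1.0, 1.0, size=200)
Phi = (x[:, None] > thresholds[None, :]).astype(float)

# L1-regularised least squares via ISTA: the soft-threshold step drives many
# indicator weights exactly to zero, i.e. those "nodes" can be pruned
w = np.zeros(200)
step = 1.0 / np.linalg.norm(Phi, 2) ** 2  # 1 / Lipschitz constant of the gradient
lam = 20.0
for _ in range(3000):
    w = w - step * (Phi.T @ (Phi @ w - y))
    w = np.sign(w) * np.maximum(np.abs(w) - step * lam, 0.0)  # soft threshold

kept = int((np.abs(w) > 1e-8).sum())
print(0 < kept < 200)
```

The compression ratio is the fraction of indicators with exactly zero weight; the paper's finding is that this pruning can be aggressive without hurting, and sometimes while improving, accuracy.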
ES2012-88
RNN Based Batch Mode Active Learning Framework
Gaurav Maheshwari, Vikram Pudi
Abstract:
Active learning has been applied in many real-world classification tasks to reduce the amount of labeled data required for training a classifier. However, most existing active learning strategies select only a single sample for labeling by the oracle in every iteration. This results in retraining the classifier after each sample is added, which is computationally expensive. Moreover, many existing sample selection strategies are not suitable for multi-class classification tasks. To overcome these issues, we propose an efficient batch-mode framework for active learning using the notion of influence sets based on the Reverse Nearest Neighbor, which is applicable to multi-class classification as well. To demonstrate the effectiveness of our technique, we compare its performance against existing active learning techniques on real-life datasets. Experimental results show that our technique significantly outperforms existing active learning methods, especially on multi-class datasets.
ES2012-85
Adaptive learning for complex-valued data
Kerstin Bunte, Frank-Michael Schleif, Michael Biehl
Abstract:
In this paper we propose a variant of Generalized Matrix Learning Vector Quantization (GMLVQ) for dissimilarity learning on complex-valued data. Complex features can be encountered in various data domains, e.g. stemming from Fourier transform ion cyclotron resonance mass spectrometry and image analysis. Current approaches deal with complex inputs by ignoring the imaginary parts or by concatenating real and imaginary parts into a longer real-valued vector. In this contribution we propose a prototype-based classification method which deals with complex-valued data in its natural form. The algorithm is demonstrated on a benchmark data set and on leaf recognition using Zernike moments. We observe that the complex version converges much faster than the original GMLVQ evaluated on the real parts only. The complex version also has fewer free parameters than the concatenated-vector approach and is thus computationally more efficient than the original GMLVQ.
ES2012-111
Automatic Group-Outlier Detection
Amine Chaibi, Hanane Azzag, Mustapha Lebbah
Abstract:
We propose in this paper a new measure, the Group Outlier Factor (GOF), to detect group outliers. To validate this measure we integrate it into a clustering process using a Self-Organizing Map. The proposed approach is based on the relative density of each group of data and simultaneously provides a partitioning of the data and a quantitative indicator (GOF). The results obtained are encouraging and motivate further work in this direction.
ES2012-176
A CUSUM approach for online change-point detection on curve sequences
Nicolas Cheifetz, Allou Samé, Patrice Aknin, Emmanuel de Verdalle
Abstract:
Anomaly detection on sequential data is common in many domains such as fraud detection for credit cards, intrusion detection for cyber-security, or military surveillance. This paper presents a new CUSUM-like method for change-point detection on curve sequences in the context of preventive maintenance of transit bus door systems. The proposed approach is derived from a specific generative model of the curves; the system is considered out of control when the parameters of the curve density change. Experimental studies performed on real-world data demonstrate the promising behavior of the proposed method.
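The classical scalar CUSUM that the paper generalizes to curve sequences fits in a few lines. This is the textbook mean-shift version on synthetic data, not the paper's curve-density statistic; the drift and threshold values are illustrative:

```python
import random

random.seed(0)
# healthy regime: mean 0; after t = 200 a fault shifts the mean to 1.5
data = [random.gauss(0.0, 1.0) for _ in range(200)] + \
       [random.gauss(1.5, 1.0) for _ in range(100)]

def cusum(xs, target=0.0, drift=0.5, threshold=8.0):
    """One-sided CUSUM: accumulate deviations above target + drift and
    raise an alarm once the cumulative statistic crosses the threshold."""
    s = 0.0
    for t, x in enumerate(xs):
        s = max(0.0, s + (x - target - drift))
        if s > threshold:
            return t   # first alarm time
    return None

alarm = cusum(data)
print(alarm)  # typically fires shortly after the change at t = 200
```

The paper's variant replaces the scalar deviation with a statistic on the likelihood of each curve under a generative model, so an alarm fires when the curve-density parameters drift out of control.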
ES2012-13
One-class classifier based on extreme value statistics
David Martínez-Rego, Evan Kriminger, Jose C. Principe, Oscar Fontenla-Romero, Amparo Alonso-Betanzos
Abstract:
Interest in One-Class Classification methods has soared in recent years due to their wide applicability to many practical problems where classification in the absence of counterexamples is needed. In this paper, a new one-class classification rule based on order statistics is presented. It relies only on the embedding of the classification problem into a metric space, so it is suitable for Euclidean or other structured mappings. The suitability of the proposed method is assessed through a comparison on both artificial and real-life data sets. The good results obtained pave the road to its application to practical novelty detection problems.
ES2012-139
Classifying Scotch Whisky from near-infrared Raman spectra with a Radial Basis Function Network with Relevance Learning
Andreas Backhaus, Praveen Cheriyan Ashok, Bavishna Balagopal Praveen, Kishan Dholakia, Udo Seiffert
Abstract:
The instantaneous assessment of high-priced liquor products with minimal sample volume and no special preparation is an important task for quality monitoring and fraud detection. In this contribution, the automated classification of Raman spectra acquired with a special optofluidic chip is performed with a number of Artificial Neural Networks. A standard Radial Basis Function Network was adapted to incorporate relevance learning and showed robust classification performance across classification tasks. The acquired relevance weighting per feature dimension can be used to reduce the number of features while retaining a high level of accuracy.
ES2012-81
Supervised and unsupervised classification approaches for human activity recognition using body-mounted sensors
Dorra Trabelsi, Samer Mohammed, Faicel Chamroukhi, Latifa Oukhellou, Yacine Amirat
Abstract:
In this paper, the activity recognition problem from 3-D acceleration data measured with body-worn accelerometers is formulated as a problem of multidimensional time series segmentation and classification. More specifically, the proposed approach uses a statistical model based on Multiple Hidden Markov Model Regression (MHMMR) to automatically analyze human activity. The method takes into account the sequential appearance and temporal evolution of the data to easily detect activities and transitions. Classification results obtained by comparing the proposed approach to standard supervised classification approaches, as well as to the standard hidden Markov model, show that the proposed approach is promising.
ES2012-86
Matrix relevance LVQ in steroid metabolomics based classification of adrenal tumors
Michael Biehl, Petra Schneider, David Smith, Han Stiekema, Angela Taylor, Beverly Hughes, Cedric Shackleton, Paul Stewart, Wiebke Arlt
Abstract:
We present a machine learning system for the differential diagnosis of benign adrenocortical adenoma (ACA) vs. malignant adrenocortical carcinoma (ACC). The data employed for the classification are urinary excretion values of 32 steroid metabolites. We apply prototype-based classification techniques to discriminate the classes, in particular, we use modifications of Generalized Learning Vector Quantization including matrix relevance learning. The obtained system achieves high sensitivity and specificity and outperforms previously used approaches for the detection of adrenal malignancy. Moreover, the method identifies a subset of most discriminative markers which facilitates its future use as a non-invasive high-throughput diagnostic tool.
ES2012-154
Recognition of HIV-1 subtypes and antiretroviral drug resistance using weightless neural networks
Caio Souza, Flavio Nobre, Priscila Lima, Robson Silva, Rodrigo Brindeiro, Felipe França
Abstract:
This work presents an application of an improved version of the WiSARD weightless neural network to the recognition of different mutation types of HIV-1 and to the determination of antiretroviral drug resistance. The data set used consists of 1205 gene sequences of the HIV-1 protease of subtypes B, C and F from patients under treatment failure. Experiments performed with the bleaching technique over the WiSARD model under different data representation strategies have shown promising results, both in terms of accuracy and standard deviation.
ES2012-32
Adaptive Optimization for Cross Validation
Rudi Alessandro, Chiusano Gabriele, Alessandro Verri
Abstract:
The process of model selection and assessment aims at finding a subset of parameters that minimizes the expected test error of a model related to a learning algorithm. Given a subset of tuning parameters, an exhaustive grid search is typically performed. In this paper, an automatic algorithm for model selection and assessment is proposed. It adaptively learns the error function in the parameter space, making use of Scale Space theory and Statistical Learning theory in order to estimate a reduced number of models and, at the same time, to make them more reliable. Extensive experiments are performed on the MNIST dataset.
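For contrast, the exhaustive grid search that such adaptive methods aim to replace fits in a few lines; `param_grid` and `eval_fn` are hypothetical names standing in for the tuning-parameter ranges and whatever validation error the practitioner computes.

```python
from itertools import product

def grid_search(param_grid, eval_fn):
    """Exhaustive grid search: evaluate every parameter combination, keep the best."""
    names = sorted(param_grid)
    best_params, best_score = None, float("inf")
    for combo in product(*(param_grid[n] for n in names)):
        params = dict(zip(names, combo))
        score = eval_fn(params)  # e.g. a cross-validated test error
        if score < best_score:
            best_params, best_score = params, score
    return best_params, best_score
```

The cost grows multiplicatively with each added parameter axis, which is exactly what motivates adaptive alternatives.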
ES2012-62
The `K' in K-fold Cross Validation
Davide Anguita, Luca Ghelardoni, Alessandro Ghio, Luca Oneto, Sandro Ridella
Abstract:
The K-fold Cross Validation (KCV) technique is one of the approaches most used by practitioners for model selection and error estimation of classifiers. KCV consists of splitting a dataset into k subsets; then, iteratively, some of them are used to learn the model, while the others are exploited to assess its performance. However, in spite of KCV's success, only practical rule-of-thumb methods exist to choose the number and the cardinality of the subsets. We propose here an approach which allows tuning the number of subsets of the KCV in a data-dependent way, so as to obtain a reliable, tight and rigorous estimation of the probability of misclassification of the chosen model.
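The plain K-fold estimate, before any data-dependent choice of k, can be sketched as follows; `train_fn` is a hypothetical stand-in for any learning algorithm that returns a predictor.

```python
def kfold_indices(n, k):
    """Split indices 0..n-1 into k contiguous folds of near-equal size."""
    sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    folds, start = [], 0
    for s in sizes:
        folds.append(list(range(start, start + s)))
        start += s
    return folds

def kfold_error(data, labels, train_fn, k):
    """Average misclassification rate over k train/test splits."""
    folds = kfold_indices(len(data), k)
    errors = []
    for test_idx in folds:
        held_out = set(test_idx)
        train_idx = [i for i in range(len(data)) if i not in held_out]
        model = train_fn([data[i] for i in train_idx],
                         [labels[i] for i in train_idx])
        wrong = sum(model(data[i]) != labels[i] for i in test_idx)
        errors.append(wrong / len(test_idx))
    return sum(errors) / k
```

Real implementations would shuffle before splitting; the contiguous folds here keep the sketch deterministic.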
Recent developments in clustering algorithms
ES2012-5
Recent developments in clustering algorithms
Charles Bouveyron, Barbara Hammer, Thomas Villmann
ES2012-30
Curves clustering with approximation of the density of functional random variables
Julien Jacques, Cristian Preda
Abstract:
Model-based clustering for functional data is considered. An alternative to model-based clustering using the functional principal components is proposed by approximating the density of functional random variables. The EM algorithm is used for parameter estimation and the maximum a posteriori rule provides the clusters. A real data application illustrates the interest of the proposed methodology.
ES2012-22
Modified Conn-Index for the evaluation of fuzzy clusterings
Tina Geweniger, Marika Kästner, Mandy Lange, Thomas Villmann
Abstract:
We propose an extension of the Conn-Index to evaluate fuzzy cluster solutions obtained from fuzzy prototype vector quantization, whereas the original Conn-Index was designed for crisp vector quantization models. The fuzzy index explicitly takes the fuzzy assignments resulting from fuzzy vector quantization into account. This avoids the information loss which would occur if the original crisp index is applied to fuzzy solutions.
ES2012-107
Modularity-based clustering for network-constrained trajectories
Mohamed Khalil El Mahrsi, Fabrice Rossi
Abstract:
We present a novel clustering approach for moving object trajectories that are constrained by an underlying road network. The approach builds a similarity graph based on these trajectories, then uses modularity-optimization hierarchical graph clustering to regroup trajectories with similar profiles. Our experimental study shows the superiority of the proposed approach over classic hierarchical clustering and gives a brief insight into the visualization of the clustering results.
ES2012-127
A Discussion on Parallelization Schemes for Stochastic Vector Quantization Algorithms
Matthieu Durut, Benoit Patra, Fabrice Rossi
Abstract:
This paper studies parallelization schemes for stochastic Vector Quantization algorithms in order to obtain speed-ups using distributed resources. We show that the most intuitive parallelization scheme does not lead to better performance than the sequential algorithm. Another distributed scheme is therefore introduced which obtains the expected speed-ups. It is then improved to fit implementation on distributed architectures where communications are slow and inter-machine synchronization is too costly. The schemes are tested on simulated distributed architectures and, for the last one, on the Microsoft Windows Azure platform, obtaining speed-ups with up to 32 VMs.
ES2012-132
Dissimilarity Clustering by Hierarchical Multi-Level Refinement
Brieuc Conan-Guez, Fabrice Rossi
Abstract:
We introduce in this paper a new way of optimizing the natural extension, to dissimilarity data, of the quantization error used in k-means clustering. The proposed method is based on hierarchical clustering analysis combined with multi-level heuristic refinement. The method is computationally efficient and achieves better quantization errors than the relational k-means.
ES2012-188
Relevance learning for time series inspection
Andrej Gisbrecht, Dusan Sovilj, Barbara Hammer, Amaury Lendasse
Abstract:
By means of local neighborhood regression and time windows, the generative topographic mapping (GTM) allows one to predict and visually inspect time series data. GTM itself, however, is fully unsupervised. In this contribution, we propose an extension of relevance learning to time series regression with GTM. This way, the metric automatically adapts according to the relevant time lags, resulting in a sparser representation, improved accuracy, and smoother visualization of the data.
Feature selection and information-based learning
ES2012-12
How regular is neuronal activity?
Lubomir Kostal, Petr Lansky, Ondrej Pokora
Abstract:
We propose and investigate two information-based measures of statistical dispersion of neuronal firing: the entropy-based dispersion and the Fisher information-based dispersion. The measures are compared with the standard deviation. Although the standard deviation is used routinely, we show that it is not well suited to quantify some aspects of dispersion that are often expected intuitively, such as the degree of randomness. The proposed dispersion measures are not entirely independent, although each describes the firing regularity from a different point of view.
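The entropy-based dispersion builds on Shannon entropy, which is maximal for a uniform distribution and zero for a deterministic one; that is the intuition behind using it as a randomness measure. A minimal sketch for discrete probabilities (not the continuous spike-train setting of the paper):

```python
from math import log2

def entropy(probs):
    """Shannon entropy in bits of a discrete probability vector."""
    return -sum(p * log2(p) for p in probs if p > 0)
```

Two distributions can share a standard deviation yet differ sharply in entropy, which is the kind of mismatch the abstract alludes to.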
ES2012-120
On the Potential Inadequacy of Mutual Information for Feature Selection
Benoît Frénay, Gauthier Doquire, Michel Verleysen
Abstract:
Despite its popularity as a relevance criterion for feature selection, the mutual information can sometimes be inadequate for this task. Indeed, it is commonly accepted that a set of features maximising the mutual information with the target vector leads to a lower probability of misclassification. However, this assumption is in general not true. Justifications and illustrations of this fact are given in this paper.
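For discrete variables, the mutual information in question can be estimated directly from co-occurrence counts. A plug-in sketch (the paper's argument concerns what maximizing this quantity does and does not guarantee, not how it is computed):

```python
from math import log2
from collections import Counter

def mutual_information(xs, ys):
    """Plug-in estimate of I(X;Y) in bits for two discrete sequences."""
    n = len(xs)
    pxy = Counter(zip(xs, ys))          # joint counts
    px, py = Counter(xs), Counter(ys)   # marginal counts
    mi = 0.0
    for (x, y), c in pxy.items():
        pj = c / n
        mi += pj * log2(pj / ((px[x] / n) * (py[y] / n)))
    return mi
```

Perfectly dependent sequences give 1 bit here, independent ones give 0.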
ES2012-68
Cluster homogeneity as a semi-supervised principle for feature selection using mutual information
Frederico Coelho, Antonio Padua Braga, Michel Verleysen
Abstract:
In this work, the principle of homogeneity between labels and data clusters is exploited in order to develop a semi-supervised feature selection method. This principle permits the use of cluster information to improve the estimation of feature relevance in order to increase selection performance. Mutual Information is used in a forward-backward search process, in this filter approach, to evaluate the relevance of each feature to the data distribution and the existing labels, in a context of few labeled and many unlabeled instances.
ES2012-133
Enhanced emotion recognition by feature selection to animate a talking head
Hela Daassi-Gnaba, Yacine OUSSAR
Abstract:
It is known that deaf and hard of hearing people can substantially improve their lip-reading skills if they have access to speaker emotion. Moreover, it has been shown that animating an artificial talking head can provide this modality. In this paper, we assume that emotion recognition to animate such a talking head can be performed using a small set of relevant features extracted from the speech signal. More precisely, we show that the implementation of linear classifiers using Support Vector Machines (SVM) together with a feature selection method leads to promising performance, which confirms our assumption.
ES2012-97
Range-based non-orthogonal ICA using cross-entropy method
Easter Selvan Suviseshamuthu, Amit Chattopadhyay, Umberto Amato, Pierre-Antoine Absil
Abstract:
A derivative-free framework for optimizing a non-smooth range-based contrast function in order to estimate independent components is presented. The proposed algorithm employs the von Mises-Fisher (vMF) distribution to draw random samples in the cross-entropy (CE) method, thereby intrinsically maintaining the unit-norm constraint that removes the scaling indeterminacy in the independent component analysis (ICA) problem. Empirical studies involving natural images show how this approach outperforms popular schemes [1] in terms of separation performance.
Nonlinear dimensionality reduction and topological learning
ES2012-3
Type 1 and 2 symmetric divergences for stochastic neighbor embedding
John Lee
Abstract:
Stochastic neighbor embedding (SNE) is a method of dimensionality reduction (DR) that involves softmax similarities measured between all pairs of data points. In order to build a low-dimensional embedding, SNE tries to reproduce the similarities observed in the high-dimensional data space. The capability of softmax similarities to fight the phenomenon of norm concentration has been studied in previous work. This paper investigates a complementary aspect, namely, the cost function that quantifies the mismatch between the high- and low-dimensional similarities. We show experimentally that switching from a single Kullback-Leibler divergence to symmetric mixtures of divergences increases the quality of DR. This modification brings SNE to the performance level of its Student t-distributed variant, without the need to resort to non-identical similarity definitions in the high- and low-dimensional spaces. These results allow us to conclude that future improvements in similarity-based DR will likely emerge from better definitions of the cost function.
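The divergences involved can be made concrete for discrete distributions. Below is one common symmetrization, the average of the two directed Kullback-Leibler divergences; whether this matches the paper's exact "type 1" definition is an assumption of this sketch, and SNE applies such costs to softmax similarity matrices rather than plain vectors.

```python
from math import log

def kl(p, q):
    """Kullback-Leibler divergence D(P||Q) for discrete distributions."""
    return sum(pi * log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def symmetric_kl(p, q):
    """Average of the two directed divergences, a symmetric mismatch cost."""
    return 0.5 * (kl(p, q) + kl(q, p))
```

Unlike the directed divergence, the symmetrized cost penalizes mismatches in both directions equally, which is the property exploited in the paper.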
ES2012-25
Out-of-sample kernel extensions for nonparametric dimensionality reduction
Andrej Gisbrecht, Wouter Lueks, Bassam Mokbel, Barbara Hammer
Abstract:
Nonparametric dimensionality reduction (DR) techniques such as locally linear embedding or t-distributed stochastic neighbor embedding (t-SNE) constitute standard tools to visualize high dimensional and complex data in the Euclidean plane. With increasing data volumes and streaming applications, it is often no longer possible to project all data points at once. Rather, out-of-sample extensions (OOS) derived from a small subset of all data points are used. In this contribution, we propose a kernel mapping for OOS in contrast to direct techniques based on the DR method. This can be trained based on a given example set, or it can be trained indirectly based on the cost function of the DR technique. Considering t-SNE as an example and several benchmarks, we show that a kernel mapping outperforms direct OOS as provided by t-SNE.
ES2012-157
A generative model that learns Betti numbers from a data set
Maxime Maillot, Michaël Aupetit, Gérard Govaert
Abstract:
Analysis of multidimensional data is challenging. Topological invariants can be used to summarize essential features of such data sets. In this work, we propose to compute the Betti numbers from a generative model based on a simplicial complex learnt from the data. We compare it to the Witness Complex, a geometric technique based on nearest neighbors. Our results on different data distributions with known topology show that Betti numbers are well recovered with our method.
Recurrent and neural networks, reinforcement learning, control
ES2012-138
Highly efficient localisation utilising weightless neural systems
Ben McElroy, Gillham Michael, Gareth Howells, Sarah Spurgeon, Kelly Michael, John Batchelor, Pepper Matthew
Abstract:
Efficient localisation is a highly desirable property for an autonomous navigation system. Weightless neural networks offer a real-time approach to robotics applications by reducing hardware and software requirements for pattern recognition techniques. Such networks offer the potential for objects, structures, routes and locations to be easily identified, and for maps to be constructed from fused, limited sensor data as information becomes available. We show that, in the absence of concise and complete information, localisation can be obtained from data with inherent uncertainties using simple algorithms, by combining Genetic Algorithm techniques with a Weightless Neural Architecture.
ES2012-172
The Exploration vs Exploitation Trade-Off in Bandit Problems: An Empirical Study
Bernard Manderick, Saba Yahyaa
Abstract:
We compare well-known action selection policies used in reinforcement learning, such as epsilon-greedy and softmax, with the lesser known Gittins index and knowledge gradient on bandit problems. The latter two perform very well in the comparison. Moreover, the knowledge gradient can be generalized to other problems.
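As a rough illustration of the two classical policies named in the abstract (not the authors' experimental setup), the following sketch plays a Bernoulli bandit with epsilon-greedy and softmax selection; the arm means, epsilon and temperature values are illustrative assumptions:

```python
import random
import math

def epsilon_greedy(estimates, epsilon=0.1):
    """With probability epsilon explore a uniformly random arm,
    otherwise exploit the arm with the highest reward estimate."""
    if random.random() < epsilon:
        return random.randrange(len(estimates))
    return max(range(len(estimates)), key=lambda a: estimates[a])

def softmax(estimates, tau=0.1):
    """Sample an arm with probability proportional to exp(estimate / tau);
    smaller tau makes the policy greedier."""
    m = max(estimates)  # subtract the max for numerical stability
    weights = [math.exp((q - m) / tau) for q in estimates]
    return random.choices(range(len(estimates)), weights=weights)[0]

def run_bandit(true_means, policy, steps=5000):
    """Play a Bernoulli bandit, keeping incremental mean reward estimates.
    Returns the average reward obtained over all steps."""
    counts = [0] * len(true_means)
    estimates = [0.0] * len(true_means)
    total = 0.0
    for _ in range(steps):
        a = policy(estimates)
        reward = 1.0 if random.random() < true_means[a] else 0.0
        counts[a] += 1
        estimates[a] += (reward - estimates[a]) / counts[a]  # running mean
        total += reward
    return total / steps
```

Both policies trade exploration against exploitation through a single parameter; the Gittins index and knowledge gradient discussed in the paper replace this fixed parameter with a value computed from the current state of knowledge.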
ES2012-15
Intrinsic plasticity via natural gradient descent
Klaus Neumann, Jochen J. Steil
Abstract:
This paper introduces the natural gradient for intrinsic plasticity, which tunes a neuron’s activation function such that its output distribution becomes exponentially distributed. The information-geometric properties of the intrinsic plasticity potential are analyzed and the improved learning dynamics when using the natural gradient are evaluated for a variety of input distributions. The applied measure for evaluation is the relative geodesic length of the respective path in parameter space.
ES2012-24
Complex Valued Artificial Recurrent Neural Network as a Novel Approach to Model the Perceptual Binding Problem
Alexey Minin, Alois Knoll, Hans-Georg Zimmermann
Abstract:
The brain is constantly faced with the task of grouping together features of objects that it perceives, in order to arrive at a coherent representation of these objects. Such features are, for example, shape, motion, color, depth, but also other aspects of perception. There is experimental evidence and a large body of theoretical work that supports the hypothesis that brains solve this so-called "binding" problem by synchronizing the temporal firing patterns in neuronal assemblies, with neurons that are sensitive to different features. According to this hypothesis, temporal correlations between neuronal impulses represent the fact that different perceived features have to be associated with one and the same object. In this paper we suggest a new model for solving the binding problem by introducing complex-valued recurrent networks. These networks can represent sinusoidal oscillations and their phase, i.e., they can model the binding problem of neuronal assemblies by adjusting the relative phase of the oscillations of different feature detectors. As feature examples, we use color and shape, but the network would also function with any combination of other features. The suggested network architecture performs image generalization but can also be used as an image memory. The information about object color is represented in the phase of the network weights, while the spatial distribution of the neurons encodes the object's shape. We show that the architecture can generalize object shapes and recognize object color with very low computational overhead.
ES2012-90
A discrete/rhythmic pattern generating RNN
Tim Waegeman, Francis Wyffels, Benjamin Schrauwen
Abstract:
Biological research supports the concept that advanced motion emerges from modular building blocks, which generate both rhythmical and discrete patterns. Inspired by these ideas, roboticists try to implement such building blocks using different techniques. In this paper, we show how to build such a module by using a recurrent neural network (RNN) to encapsulate both discrete and rhythmical motion patterns in a single network. We evaluate the proposed system on a planar robotic manipulator. For training, we record several handwriting motions by back-driving the robot manipulator. Finally, we demonstrate the ability to learn multiple motions, both discrete and rhythmic, and evaluate the robustness of the pattern generation in the presence of perturbations.
ES2012-102
Fast calibration of hand movements-based interface for arm exoskeleton control
Hugo Martin, Sylvain Chevallier, Eric Monacelli
Abstract:
Several muscular degenerative diseases alter the motor abilities of large muscles but spare smaller ones, e.g. keeping hand motor skills relatively unaffected while those of the upper limbs are altered. Thus, hand movements could be used to control an arm exoskeleton for rehabilitation and assistive purposes. Using an infrared (IR) sensor-based interface for exoskeleton control, this paper describes the learning part of the system, endowing it with fast online calibration and adaptation abilities. This learning component shows good results and has been successfully implemented on the real system.
ES2012-179
Manifold-based non-parametric learning of action-value functions
Hunor Jakab, Lehel Csato
Abstract:
Finding good approximations to state-action value functions is a central problem in model-free on-line reinforcement learning. The use of non-parametric function approximators enables us to simultaneously represent model and confidence. Q functions are often discontinuous, and we present a novel Gaussian process (GP) kernel function to cope with this problem. We use a manifold-based distance measure in our kernels, the manifold being induced by the graph structure extracted from the data. With on-line learning, graph formation proceeds in parallel with the main algorithm. This results in a compact and efficient graph structure, eliminates the need for predefined basis functions and improves the accuracy of the estimated value functions, as tested on simulated robotic control tasks.
ES2012-174
Recurrent Neural State Estimation in Domains with Long-Term Dependencies
Siegmund Duell, Lina Weichbrodt, Alexander Hans, Steffen Udluft
Abstract:
This paper presents a state estimation approach for reinforcement learning (RL) of a partially observable Markov decision process. It is based on a special recurrent neural network architecture, the Markov decision process extraction network with shortcuts (MPEN-S). In contrast to previous work on this topic, we address the problem of long-term dependencies, which cause major problems in many real-world applications. The architecture is designed to model the reward-relevant dynamics of an environment and is capable of condensing large sets of continuous observables to a compact Markovian state representation. The resulting estimate can be used as input for RL methods that assume the underlying system to be a Markov decision process. Although the approach was developed with RL in mind, it is also useful for general prediction tasks.
ES2012-82
Using event-based metric for event-based neural network weight adjustment
Thierry Vieville, Rodrigo Salas, Bruno Cessac
Abstract:
The problem of adjusting the parameters of an event-based network model is addressed here at the programmatic level. Considering temporal processing, the goal is to adjust the weights of the network units so that the output events correspond to what is desired. The present work proposes a way to adapt, in the deterministic and discrete case, the usual alignment metrics in order to derive suitable adjustment rules. At the numerical level, the stability and unbiasedness of the method are verified.
Parallel hardware architectures for acceleration of neural network computation
ES2012-8
Parallel neural hardware: the time is right
Ulrich Rückert, Erzsebet Merenyi
Abstract:
It seems obvious that the massively parallel computations inherent in artificial neural networks (ANNs) can only be realized by massively parallel hardware. However, the vast majority of the many ANN applications simulate their ANNs on sequential computers which, in turn, are not resource-efficient. The increasing availability of parallel standard hardware such as FPGAs, graphics processors, and multi-core processors offers new scope and challenges with respect to resource efficiency and real-time applications of ANNs. In this paper we discuss some key issues for parallel ANN implementation on these standard devices compared to special-purpose ANN implementations.
ES2012-44
Towards biologically realistic multi-compartment neuron model emulation in analog VLSI
Sebastian Millner, Andreas Hartel, Johannes Schemmel, Karlheinz Meier
Abstract:
We present a new concept for multi-compartment emulation on neuromorphic hardware based on the BrainScaleS wafer-scale system. The implementation features complex dendrite routing capabilities, realistic scaling of compartmental parameters and active spike propagation. Simulations prove the circuit's capability of reproducing the passive dendritic properties of a model from the literature.
ES2012-35
A GPU-accelerated algorithm for self-organizing maps in a distributed environment
Peter Wittek, Sándor Darányi
Abstract:
In this paper we introduce a MapReduce-based implementation of self-organizing maps that performs compute-bound operations on distributed GPUs. The kernels are optimized to ensure coalesced memory access and effective use of shared memory. We have performed extensive tests of our algorithms on a cluster of eight nodes, each with two NVIDIA Tesla M2050 cards attached, and we achieve a 10x speedup for self-organizing maps over a distributed CPU algorithm.
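For readers unfamiliar with the sequential algorithm being parallelized, a minimal sketch of one self-organizing map update step follows; this is plain illustrative Python, not the paper's MapReduce or GPU kernels, and the grid layout and parameters are assumptions:

```python
import math

def som_step(weights, coords, x, lr=0.1, sigma=1.0):
    """One sequential SOM update on a list of weight vectors.

    coords[i] gives neuron i's position on the map grid; each weight
    vector is pulled toward input x, scaled by a Gaussian neighborhood
    centred on the best-matching unit (BMU). Returns the BMU index."""
    dist2 = lambda u, v: sum((a - b) ** 2 for a, b in zip(u, v))
    # best-matching unit = neuron whose weights are closest to x
    bmu = min(range(len(weights)), key=lambda i: dist2(weights[i], x))
    for i, w in enumerate(weights):
        h = math.exp(-dist2(coords[i], coords[bmu]) / (2 * sigma ** 2))
        weights[i] = [wj + lr * h * (xj - wj) for wj, xj in zip(w, x)]
    return bmu
```

The BMU search and the per-neuron weight updates are independent across neurons, which is what makes the algorithm amenable to the GPU parallelization the paper describes.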
ES2012-54
Low-Power Manhattan Distance Calculation Circuit for Self-Organizing Neural Networks Implemented in the CMOS Technology
Rafal Dlugosz, Tomasz Talaska, Witold Pedrycz, Pierre-Andre Farine
Abstract:
The paper presents an analog, current-mode circuit that calculates the distance between the neuron weight vectors W and the input learning patterns X. The circuit can be used as a component of different self-organizing neural networks (NNs) implemented at the transistor level in CMOS technology. In Self-Organizing Maps (SOMs) as well as in NNs using the Neural Gas or Winner Takes All (WTA) learning algorithms, the same circuit can be used to calculate the distance between the X and W vectors, which makes the proposed circuit a universal solution. Earlier detailed simulations carried out by means of the software model of the WTA NN and the Kohonen SOM showed that using either the Euclidean (L2) or the Manhattan (L1) distance measure leads to similar learning results. For this reason, the L1 measure has been implemented, as in this case the circuit is much simpler than in the L2 case, resulting in very low chip area and low power dissipation. This enables including even large NNs in miniaturized portable devices, such as sensors in Wireless Sensor Networks (WSN) or Wireless Body Area Networks (WBAN).
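The trade-off the abstract describes is visible in software form: the L1 distance needs only subtractions and additions, while the squared L2 distance needs multiplications, which are the expensive part in silicon. A minimal sketch (illustrative values, not the paper's circuit):

```python
def manhattan(w, x):
    """L1 distance: only subtractions, absolute values and additions."""
    return sum(abs(wi - xi) for wi, xi in zip(w, x))

def euclidean2(w, x):
    """Squared L2 distance: requires one multiplication per dimension."""
    return sum((wi - xi) ** 2 for wi, xi in zip(w, x))

def winner(weights, x, dist):
    """Index of the neuron whose weight vector is closest to input x."""
    return min(range(len(weights)), key=lambda i: dist(weights[i], x))
```

On well-separated prototypes the two measures typically select the same winning neuron, which is the observation that justifies implementing the cheaper L1 circuit.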
ES2012-71
Parallelization of Deep Networks
Michele De Filippo De Grazia, Ivilin Stoianov, Marco Zorzi
Abstract:
Learning multiple levels of feature detectors in Deep Belief Networks is a promising approach both for neuro-cognitive modeling and for practical applications, but it comes at the cost of high computational requirements. Here we propose a method for the parallelization of unsupervised generative learning in deep networks based on distributing training data among multiple computational nodes in a cluster. We show that this approach almost linearly reduces the training time with very limited cost on performance.
ES2012-110
Hardware accelerated real time classification of hyperspectral imaging data for coffee sorting
Andreas Backhaus, Jan Lachmair, Ulrich Rückert, Udo Seiffert
Abstract:
Hyperspectral imaging has been proven to be a viable tool for automated food inspection that is non-invasive and on-line capable. In this contribution a hardware-implemented Self-Organizing Feature Map with Conscience (C-SOM) is presented that is capable of on-line adaptation and recall in order to learn to classify green coffee varieties as well as coffee at different roast stages. The C-SOM showed favourable results on some datasets compared to a number of classical supervised neural network classifiers. The massively parallel neural hardware architecture allows for constant processing times at different map sizes.
ES2012-137
Implementation Issues of Kohonen Self-Organizing Map Realized on FPGA
Rafal Dlugosz, Marta Kolasa, Michal Szulc, Witold Pedrycz, Pierre-Andre Farine
Abstract:
Presented are investigations showing the impact of the bit length of data signals in hardware-implemented Kohonen Self-Organizing Maps (SOMs) on the quality of the learning process. The aim of this work was to determine the allowable reduction of the number of bits in particular signals that does not deteriorate the network behavior. The efficiency of the learning process has been quantified using the quantization error. The results obtained for the SOM realized on a Field Programmable Gate Array (FPGA), as well as by means of a software model of the SOM, show that the smallest allowable resolution (expressed in bits) of the weight signals equals seven, while the minimal bit length of the neighborhood signal ranges from 3 to 6 (depending on the map topology). For these values, and properly selected values of other parameters, the learning process remains undisturbed. Reducing the number of bits influences the number of neurons that can be synthesized on a single FPGA device.
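The core quantity studied, the error introduced by representing weights with a reduced number of bits, can be sketched in a few lines; the uniform [0, 1] weight range and the error measure here are illustrative assumptions, not the paper's exact methodology:

```python
def quantize(value, bits, lo=0.0, hi=1.0):
    """Round value to the nearest of 2**bits uniformly spaced levels
    in [lo, hi], mimicking a fixed-point hardware representation."""
    levels = (1 << bits) - 1
    step = (hi - lo) / levels
    return lo + round((value - lo) / step) * step

def weight_quantization_error(weights, bits):
    """Mean absolute error from storing all weights with `bits` bits."""
    flat = [w for vec in weights for w in vec]
    return sum(abs(w - quantize(w, bits)) for w in flat) / len(flat)
```

The worst-case per-weight error is half a quantization step, i.e. (hi - lo) / (2 * (2**bits - 1)), so each extra bit roughly halves the error; the paper's finding is that below seven bits for the weight signals this error starts to disturb the learning process.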
ES2012-146
A hybrid CMOS/memristive nanoelectronic circuit for programming synaptic weights
Arne Heittmann, Tobias G. Noll
Abstract:
In this paper a hybrid circuit is presented which comprises nanoelectronic resistive switches based on the electrochemical memory effect (ECM) as well as devices from a standard 40nm-CMOS process. A closed ECM device model, which is based on device physics, was used for simulations allowing for a precise prediction of the expected I-V characteristics. The device is used as a non-volatile and/or programmable synapse in a neuromorphic architecture. Expected performance figures are derived such as write time as well as robustness with regard to variations of supply voltage and timing errors. The results show that ECM cells are prospective devices for hybrid neuromorphic systems.
ES2012-161
gNBXe -- a Reconfigurable Neuroprocessor for Various Types of Self-Organizing Maps
Jan Lachmair, Erzsebet Merenyi, Mario Porrmann, Ulrich Rückert
Abstract:
In this paper we present the FPGA-based hardware accelerator gNBXe for emulation of classical Self-Organizing Maps (SOMs) and Conscience SOM (CSOM) in a multi-FPGA environment. After discussing how the CSOM is mapped to a resource-efficient digital hardware implementation, we present how the modular system architecture can be flexibly adapted to various application datasets. The hardware costs and scalability of a multi-FPGA based accelerator using Xilinx Virtex2 and Virtex4 FPGAs are discussed. Compared to a state-of-the-art multi-core PC, a speedup of 9.1 is achieved for a CSOM with 4,840 neurons and 196 synaptic weights.