Bruges, Belgium, April 27-29, 2011
Content of the proceedings
Information theory related learning
Self-organizing maps and recurrent networks
Semi-supervised learning
Computational Intelligence in Life Sciences
Learning I
Seeing is believing: The importance of visualization in real-world machine learning applications
Learning theory
Feature selection and dimensionality reduction
Learning of causal relations
Learning II
Sequence and time processing
Optimization and learning
Deep Learning
Information theory related learning
ES2011-7
Information theory related learning
Thomas Villmann, Jose C. Principe, Andrej Cichocki
Abstract:
This is the introductory paper to a special session held at the ESANN 2011 conference. It reviews and highlights recent developments and new directions in information theory related learning, which is a rapidly developing research area. These algorithms are based on the fundamental principles of information theory and relate them implicitly or explicitly to learning algorithms and strategies.
ES2011-100
Optimization of Parametrized Divergences in Fuzzy c-Means
Tina Geweniger, Marika Kaestner, Thomas Villmann
Abstract:
We propose the use of divergences as dissimilarity measures in the Fuzzy c-Means algorithm for the clustering of functional data. Furthermore, we adapt the relevance parameters to improve the data representation and thereby obtain more accurate clusterings in terms of separation and compactness. We show for two example applications that this method leads to improved performance.
ES2011-66
Multivariate class labeling in Robust Soft LVQ
Petra Schneider, Tina Geweniger, Frank-Michael Schleif, Michael Biehl, Thomas Villmann
Abstract:
We introduce a generalization of Robust Soft Learning Vector Quantization (RSLVQ). This algorithm for nearest prototype classification is derived from an explicit cost function and follows the dynamics of a stochastic gradient ascent. We generalize the RSLVQ cost function with respect to vectorial class labels: Probabilistic LVQ (PLVQ) realizes multivariate class memberships for prototypes and training samples, and the prototype labels can be learned from the data during training. We present experiments to demonstrate the new algorithm in practice.
ES2011-11
Statistical dependence measure for feature selection in microarray datasets
Veronica Bolon-Canedo, Sohan Seth, Noelia Sánchez-Maroño, Amparo Alonso-Betanzos, Jose C. Principe
Abstract:
Feature selection is the domain of machine learning that studies data-driven methods to select, among a set of input variables, the ones that will lead to the most accurate predictive model. In this paper, a statistical dependence measure is presented for variable selection in the context of classification. Its performance is tested on DNA microarray data, a challenging dataset for machine learning researchers due to the high number of genes and relatively small number of measurements. This measure is compared against the so-called mRMR approach and is shown to obtain better or equal performance on the binary datasets.
ES2011-72
Mathematical Foundations of the Self Organized Neighbor Embedding (SONE) for Dimension Reduction and Visualization
Kerstin Bunte, Frank-Michael Schleif, Sven Haase, Thomas Villmann
Abstract:
In this paper we propose a generalization of the recently introduced Neighbor Embedding Exploratory Observation Machine (NE-XOM) for dimension reduction and visualization. We provide a general mathematical framework called Self Organized Neighbor Embedding (SONE). It treats components such as data similarity measures and neighborhood functions independently, making them easily interchangeable, and it enables the use of different divergences, based on the theory of Fréchet derivatives. In this way we propose a new dimension reduction and visualization algorithm that can easily be adapted to user-specific requirements and the problem at hand.
Self-organizing maps and recurrent networks
ES2011-71
Sparsity Issues in Self-Organizing-Maps for Structures
Markus Hagenbuchner, Giovanni Da San Martino, Ah Chung Tsoi, Alessandro Sperduti
Abstract:
Recent developments with Self-Organizing Maps (SOMs) have produced methods capable of clustering graph structured data onto a fixed dimensional display space. These methods have been applied successfully to a number of benchmark problems and have produced state-of-the-art results. This paper discusses a limitation of the most powerful version of these SOMs, known as probability measure graph SOMs (PMGraphSOMs), viz., the sparsity induced by processing a large number of small graphs, which prevents a successful application of PMGraphSOM to such problems. An approach that compactifies the generated state space to address this sparsity problem is proposed. An application to an established benchmark problem, viz., the Mutag dataset in toxicology, shows that the proposed method is effective when dealing with a large number of small graphs. Hence, this work fills a gap between the processing of a number of small graphs and the processing of densely connected graphs using PMGraphSOMs.
ES2011-26
Multi-Goal Path Planning Using Self-Organizing Map with Navigation Functions
Jan Faigl, Jan Mačák
Abstract:
In this paper, we present a combination of the Self-Organizing Map (SOM) approach and navigation functions in the Traveling Salesman Problem (TSP) with segment goals. The problem arises in inspection planning, where a path is sought from which all points of a given polygonal environment have to be ``seen''. This type of problem belongs to the TSP with neighborhoods family. Moreover, the problem also has to deal with paths among obstacles, so it is not the Euclidean TSP. The proposed approach demonstrates the applicability of SOM principles to problems in which SOM has not yet been applied.
ES2011-40
Statistical properties of the `Hopfield estimator' of dynamical systems
Miguel Atencia, Gonzalo Joya
Abstract:
This paper analyses the statistical properties of a method for estimating the parameters of systems defined by ordinary differential equations. In previous work, this estimation method was defined as an adapted version of Hopfield neural networks, and its convergence and robustness with respect to signal disturbances were proved, even when parameters are time-varying. This contribution aims at establishing the behaviour of the estimation error by performing a set of simulations in which random noise with a known probability distribution is added to the signals. It is shown that, asymptotically, the estimator is unbiased and its variance vanishes. Further theoretical work is being undertaken in order to rigorously support these empirical findings.
ES2011-31
Negatively Correlated Echo State Networks
Ali Rodan, Peter Tino
Abstract:
Echo State Network (ESN) is a special type of recurrent neural network with a fixed random recurrent part (reservoir) and a trainable reservoir-to-output readout mapping (typically obtained by linear regression). In this work we utilise an ensemble of ESNs with diverse reservoirs whose collective readout is obtained through Negative Correlation Learning (NCL) of an ensemble of Multi-Layer Perceptrons (MLPs), where each individual MLP realises the readout from a single ESN. Experimental results on three data sets confirm that, compared with both single ESNs and flat ensembles of ESNs, NCL-based ESN ensembles achieve better generalisation performance.
ES2011-48
Reservoir regularization stabilizes learning of Echo State Networks with output feedback
R. Felix Reinhart, Jochen J. Steil
Abstract:
Output feedback is crucial for autonomous and parameterized pattern generation with reservoir networks. Read-out learning can lead to error amplification in these settings and therefore regularization is important for both generalization and reduction of error amplification. We show that regularization of the inner reservoir network mitigates parameter dependencies and boosts the task-specific performance.
Semi-supervised learning
ES2011-57
A Multi-kernel Framework for Inductive Semi-supervised Learning
Xilan TIAN, Gilles Gasso, Stéphane Canu
Abstract:
We investigate the benefit of combining both the cluster assumption and the manifold assumption underlying most semi-supervised algorithms, using the flexibility and efficiency of multi-kernel learning. A multiple kernel version of the Transductive SVM (a cluster assumption based approach) is proposed and solved via DC (Difference of Convex functions) programming. Experiments on benchmark data sets indicate the promising results of the proposed approach.
ES2011-80
Training of multiple classifier systems utilizing partially labeled sequential data sets
Martin Schels, Patrick Schillinger, Friedhelm Schwenker
Abstract:
Making use of unlabeled data samples for training a classifier is a desirable aim for many real-world applications in pattern recognition. In this study, a multiple classifier system is utilized to investigate this matter. Further, cluster analysis is used to group the available data while neglecting the actual labels. Then, by implementing an information fusion architecture based on these clusters, a classification architecture is constructed. This architecture is investigated by means of a facial expression data collection, focusing on one-against-one class decisions to produce locally “unlabeled” data, i.e. data not assigned to one of the considered classes.
Computational Intelligence in Life Sciences
ES2011-5
Recent trends in computational intelligence in life sciences
Udo Seiffert, Frank-Michael Schleif, Dietlind Zühlke
ES2011-36
Adaptive Kernel Smoothing Regression for Spatio-Temporal Environmental Datasets
Federico Montesino Pouzols, Amaury Lendasse
Abstract:
This paper describes a method for performing kernel smoothing regression in an incremental, adaptive manner. A simple and fast combination of incremental vector quantization with kernel smoothing regression using adaptive bandwidth is shown to be effective for online regression modeling of environmental datasets. The method is illustrated on openly available datasets corresponding to the Tropical Atmosphere Ocean array and the Helsinki Commission hydrographical database for the Baltic Sea.
ES2011-18
Generalized functional relevance learning vector quantization
Marika Kaestner, Barbara Hammer, Michael Biehl, Thomas Villmann
Abstract:
Generalized relevance learning vector quantization (GRLVQ) is a prototype based classification algorithm with metric adaptation, weighting each data dimension according to its relevance for the classification task. We present in this paper an extension for functional data, which are usually very high dimensional. This approach assumes that the data vectors are functional representations. Taking this information into account, the so-called relevance profiles are modeled as superpositions of simple basis functions depending on only a few parameters. As a consequence, the resulting functional GRLVQ has a drastically reduced number of parameters to be adapted for relevance learning. We demonstrate the ability of the new algorithm on standard functional data sets using different basis functions, namely Gaussians and Lorentzians.
ES2011-89
Patch Affinity Propagation
Xibin Zhu, Barbara Hammer
Abstract:
Affinity propagation constitutes an exemplar based clustering technique which reliably optimizes the quantization error, given a matrix of pairwise data dissimilarities, by means of the max-sum algorithm for factor graphs. Albeit very efficient for sparse matrices, it displays squared complexity in the worst case; hence it is not suited as a high-throughput method due to time and memory constraints. We propose an extension of affinity propagation to patch clustering such that data are treated in chunks of fixed size with limited memory requirements and linear time. We test the suitability of the approach for two biomedical applications.
ES2011-20
Multispectral image characterization by partial generalized covariance
Marc Strickert, Björn Labitzke, Andreas Kolb, Thomas Villmann
Abstract:
A general method is presented for the assessment of data attribute variability, which plays an important role in the initial screening of multi- and high-dimensional data sets. Instead of the commonly used second centralized moment, known as variance, the proposed method allows a mathematically rigorous characterization of attribute sensitivity given not only Euclidean distances but also partial data comparisons by general similarity measures. Depending on the choice of measure, different spectral features are highlighted by attribute assessment, creating new image segmentation aspects, as shown in a comparison of Euclidean distance, Pearson correlation and gamma divergence applied to multi-spectral images.
Learning I
ES2011-14
Increased robustness and intermittent dynamics in structured Reservoir Networks with feedback
Sarah Jarvis, Stefan Rotter, Ulrich Egert
Abstract:
Recent studies using feedforward Echo State Networks (ESNs) demonstrate that reservoir stability can be strongly affected by reservoir substructures, such as clusters. Here, we evaluate the impact of including feedback on clustered ESNs and assert that certain cluster configurations extend the permissible range of spectral radius values. We also report a new class of reservoir activity: intermittent dynamics, characterized by variable periods of chaotic activity before returning to quiescent behaviour. Using a non-linear benchmark data set, we establish that clustered ESNs have performance comparable to conventional ESNs but, importantly, display increased tolerance and robustness to spectral radius and input choice.
ES2011-69
Anticipating Rewards in Continuous Time and Space with Echo State Networks and Actor-Critic Design
Mohamed Oubbati
Abstract:
In this paper we implement echo state networks (ESNs) within the concept of actor-critic design to obtain an optimal control policy for a mobile robot. The robot is asked to anticipate future rewards/punishments and react accordingly. In this design, the ESN critic has to consider a continuous state/action space, deal with uncertainties, and be computationally cheap in order to cope with real-world constraints. Benefits and problems involved in an obstacle avoidance scenario are discussed, supported by real-world experiments.
ES2011-60
Application of stochastic recurrent reinforcement learning to index trading
Denise Gorse
Abstract:
A novel stochastic adaptation of the recurrent reinforcement learning (RRL) methodology is applied to daily, weekly, and monthly stock index data, and compared to results obtained elsewhere using genetic programming (GP). The data sets used have been considered a challenging test for algorithmic trading. It is demonstrated that RRL can reliably outperform buy-and-hold for the higher frequency data, in contrast to GP, which performed best for monthly data.
ES2011-110
A brief tutorial on reinforcement learning: The game of Chung Toi
Christopher Gatti, Jonathan Linton, Mark Embrechts
Abstract:
This work presents a simple implementation of reinforcement learning, using the temporal difference algorithm and a neural network, applied to the board game of Chung Toi, which is a challenging variation of Tic-Tac-Toe. The implementation of this learning algorithm is fully described and includes all parameter settings and various techniques to improve the ability of the network to learn the board game. With relatively little training, the network was able to win nearly 90% of games played against a 'smart' random opponent.
ES2011-62
D-VisionDraughts: a draughts player neural network that learns by reinforcement in a high performance environment
Araujo Barcelos Ayres Roberto, Silva Julia Rita Maria, Matias Junior Rivalino
Abstract:
This paper describes D-VisionDraughts, a distributed player agent for draughts based on Neural Networks trained by Temporal Differences. D-VisionDraughts is trained in a high performance environment and achieves a high level of play without expert game analysis and with minimal human intervention. D-VisionDraughts corresponds to a distributed version of the efficient agent player VisionDraughts. The main contributions of this paper consist in substituting the distributed Young Brothers Wait Concept (YBWC) algorithm for the serial alpha-beta search algorithm used in VisionDraughts, and in measuring the impact of a high performance environment on the unsupervised learning abilities of the player. Evaluative tests showed that even a modest distributed version running on just 10 processors is able to reduce the search runtime by 68% and to increase its winning rate by 15%.
ES2011-58
SO-VAT: Self-Organizing Visual Assessment of cluster Tendency for large data sets
Enrique Pelayo, Carlos Orrite, David Buldain
Abstract:
A new method, Self-Organizing Visual Assessment of cluster Tendency (SO-VAT), is given for visually assessing the cluster tendency in large data sets. It is based on training a SOM with the input samples and then calculating the VAT image from a selected group of the generated neurons, where the selection is made according to a certain activation density. Tests with synthetic and real examples demonstrate that the new SO-VAT algorithm results in clearer images and shorter computing time than applying the VAT procedure directly to the whole input data set.
ES2011-104
New conditioning model for robots
Jean Marc Salotti
Abstract:
We present a neural network for the prediction of rewards in a conditioning model. It is based on two noisy-OR nodes and one noisy-AND node, with update rules inspired by the BANNER technique. Interesting results are presented, with an application to robotics.
ES2011-94
Stability of Neural Network Control for Uncertain Sampled-Data Systems
Pornchai Khlaeo-om, Sasikanchana Yenaeng, Sunya Pasuk, Supachai Aroonpun, Sompun Aumpawan
Abstract:
This paper derives robust stability conditions for neural network control of sampled-data systems whose parameters are uncertain. The controllers are nonlinear, full state regulators implemented as single hidden layer, feedforward neural networks. The controlled systems must be locally controllable and full-state accessible. The robust stability is confirmed by the existence of a Lyapunov function of the closed loop systems. A modified backpropagation algorithm with a model reference technique is employed to determine the weights of the controllers. Simulation results on the classical motor-driven inverted pendulum model are presented to demonstrate the applications of these conditions.
ES2011-55
Thresholds tuning of a neuro-symbolic net controlling a behavior-based robotic system
Mariacarla Staffa, Silvia Rossi, Massimo De Gregorio, Ernesto Burattini
Abstract:
In this paper we present the results obtained by adopting an evolutionary approach to tune some critical neuron thresholds of a neuro-symbolic net that regulates the overall emergent behavior of a behavior-based robotic system.
ES2011-101
Ensemble Usage for More Reliable Policy Identification in Reinforcement Learning
Alexander Hans, Steffen Udluft
Abstract:
Reinforcement learning (RL) methods employing powerful function approximators like neural networks have become an interesting approach for optimal control. Since they learn a policy from observations, they are also applicable when no analytical description of the system is available. Although impressive results have been reported, their handling in practice is still hard, as they can fail at reliably determining a good policy. In previous work, we used ensembles of policies from independent runs of neural fitted Q-iteration (NFQ) to produce successful policies more reliably. In this paper we evaluate the approach on more problems and propose forming ensembles from successive iterations of a single NFQ run as a computationally cheap alternative to completely independent runs.
ES2011-16
Fisherman learning algorithm of the SOM realized in the CMOS technology
Rafal Dlugosz, Marta Kolasa, Witold Pedrycz
Abstract:
This study presents an idea for a transistor-level realization of the fisherman learning algorithm of Self-Organizing Maps (SOMs). The realization of this algorithm in hardware calls for a solution of several specific problems not present in software implementations. The main problem is related to the iterative nature of the adaptation process of the neighboring neurons positioned in particular rings surrounding the winning neuron, which makes the circuit structure of the SOM very complex. To come up with a feasible realization, we introduce some modifications to the original fisherman algorithm. Detailed simulations of the software model of the SOM show that these modifications do not have a negative impact on the learning process and help bring a significant reduction of the circuit complexity. In consequence, a fully parallel adaptation of all neurons is possible, which makes the SOM very fast.
ES2011-12
Abstract category learning
Atsushi Hashimoto, Haruo Hosoya
Abstract:
Motivated by a neurophysiological experiment on prefrontal cortex, we study a scheme for learning abstract categories. An abstract category represents a set of vectors that are identical to each other modulo substitution, e.g., 'ABAB', 'BABA', 'ACAC', etc. We employ a clustering-based unsupervised learning method for such abstract categories, in which the recognition step is reduced to the problem of maximal perfect weight matching. Our simulations using artificial inputs confirm that the scheme learns abstract categories robustly even with a certain level of noise in the inputs.
ES2011-97
Symbolic computing of LS-SVM based models
Siamak Mehrkanoon, Li Jiang, Carlos Alzate, J.A.K. Suykens
Abstract:
This paper introduces a software tool, SYM-LS-SVM-SOLVER, written in Maple to symbolically derive the dual system and the dual model representation of LS-SVM based models. SYM-LS-SVM-SOLVER constructs the Lagrangian from the given objective function and list of constraints. Afterwards it obtains the KKT (Karush-Kuhn-Tucker) optimality conditions and finally formulates a linear system in terms of the dual variables. The effectiveness of the developed solver is illustrated by applying it to a variety of problems involving LS-SVM based models.
ES2011-43
Sparse LS-SVMs with L0–norm minimization
J. López, K. De Brabanter, J.R. Dorronsoro, J.A.K. Suykens
Abstract:
Least-Squares Support Vector Machines (LS-SVMs) have been successfully applied in many classification and regression tasks. Their main drawback is the lack of sparseness of the final models. Thus, a procedure to sparsify LS-SVMs is a frequent desideratum. In this paper, we adapt to the LS-SVM case a recent work for sparsifying classical SVM classifiers, which is based on an iterative approximation to the L0-norm. Experiments on real-world classification and regression datasets illustrate that this adaptation achieves very sparse models, without significant loss of accuracy compared to standard LS-SVMs or SVMs.
ES2011-29
A post-processing strategy for SVM learning from unbalanced data
Haydemar Núñez, Luis Gonzalez-Abril, Cecilio Angulo
Abstract:
Standard learning algorithms may perform poorly when learning from unbalanced datasets. Based on Fisher's discriminant analysis, a post-processing strategy is introduced to deal with datasets showing significant imbalance in the data distribution. Empirical results from experiments with a learned SVM model on twelve UCI datasets indicate that the proposed solution improves the original SVM, and also improves on the results reported when using a z-SVM, in terms of g-mean and sensitivity.
ES2011-87
Clustering data streams with weightless neural networks
Douglas de O. Cardoso, Priscila M. V. Lima, Massimo De Gregorio, João Gama, Felipe M. G. França
Abstract:
Producing good quality clustering of data streams in real time is a difficult problem, since it is necessary to analyze data points arriving in a continuous stream with the support of quite limited computational resources. The incremental and evolving nature of the resulting clustering structures must reflect the dynamics of the target data stream. The WiSARD weightless perceptron, and its associated DRASiW extension, are intrinsically capable of, respectively, performing one-shot learning and producing prototypes of the learnt categories. This work introduces a simple generalization of RAM-based neurons in order to explore both weightless neural models in the data stream clustering problem.
ES2011-49
Locating Anomalies Using Bayesian Factorizations and Masks
Li Yao, Amaury Lendasse, Francesco Corona
Abstract:
A plethora of methods have been developed to handle anomaly detection in various application domains. This work focuses on locating anomalies inside a categorical data set without assuming any specific domain knowledge. By exploiting the conditional dependence and independence relationships among data attributes, data analysts can not only recognize an anomaly but also locate the potentially anomalous attributes inside an anomalous instance by following its masks. Masks are geometrically generated based on the factorization of the joint probability from a Bayesian network automatically learnt from the given data set.
ES2011-42
Comparison of the Complex Valued and Real Valued Neural Networks Trained with Gradient Descent and Random Search Algorithms
Hans-Georg Zimmermann, Alexey Minin, Viktoria Kusherbaeva
Abstract:
Complex-valued neural networks are one of the open topics in the machine learning community. In this paper we address several problems of complex-valued neural networks. The outcome of the current research is a combined global-local algorithm for training a complex-valued feedforward neural network, which is appropriate for the chaotic problem considered.
Seeing is believing: The importance of visualization in real-world machine learning applications
ES2011-3
Seeing is believing: The importance of visualization in real-world machine learning applications
Alfredo Vellido, José D Martín, Fabrice Rossi, Paulo Lisboa
Abstract:
The increasing availability of data sets with a huge amount of information, coded in many different features, justifies research on new methods of knowledge extraction: the great challenge is the translation of raw data into useful information that can be used to improve decision-making processes, detect relevant profiles, find out relationships among features, etc. It is undoubtedly true that a picture is worth a thousand words, which makes visualization methods arguably the most appealing and one of the most relevant kinds of knowledge extraction methods. At ESANN 2011, the special session ``Seeing is believing: The importance of visualization in real-world machine learning applications'' reflects some of the main emerging topics in the field. This tutorial prefaces the session, summarizing some of its contributions, while also providing some clues to the current state and the near future of visualization methods within the framework of Machine Learning.
ES2011-24
Hierarchical clustering for graph visualization
Stéphan Clémençon, Hector De Arazoza, Fabrice Rossi, Viet Chi Tran
Abstract:
This paper describes a graph visualization methodology based on hierarchical maximal modularity clustering, with interactive and significant coarsening and refining possibilities. An application of this method to HIV epidemic analysis in Cuba is outlined.
ES2011-44
A probabilistic approach to the visual exploration of G Protein-Coupled Receptor sequences
Alfredo Vellido, Martha Ivón Cárdenas, Iván Olier, Xavier Rovira, Jesús Giraldo
Abstract:
The study of G protein-coupled receptors (GPCRs) is of great interest in pharmaceutical research, but only a few of their 3D structures are known at present. In contrast, their amino acid sequences are known and accessible. Sequence analysis can provide new insight into GPCR function. Here, we use a kernel-based statistical machine learning model for the visual exploration of GPCR functional groups from their sequences. This is based on the rich information provided by the model regarding the probability of each sequence belonging to a certain receptor group.
ES2011-74
Growing Hierarchical Sectors on Sectors
José M Martínez, Pablo Escandell, Emilio Soria, José D Martín, Juan Gómez, Joan Vila
Abstract:
Self-organizing maps are widely used in visual data mining. This paper proposes a new visualization approach for the GHSOM algorithm, a hierarchical variant of the SOM. The method is based on pie charts, which improve the visualization of hierarchical data structures, making it possible to extract all the existing relationships among the attributes of the neurons at any hierarchy level. The methodology is tested on one synthetic data set and one real data set. The achieved results show the suitability and usefulness of the proposed approach.
ES2011-56
Analysis of a Reinforcement Learning algorithm using Self-Organizing Maps
Vicente Buendia-Ramon, Emilio Soria, José D Martín, Pablo Escandell, José M Martínez
Abstract:
The scenario of this work is defined by the need of many Machine Learning algorithms to tune a number of parameters that define their behavior; the resulting performance can be evaluated with different indices. The relationship between parameters and performance may be neither linear nor straightforward. This work proposes a qualitative approach to the aforementioned relationship by using Self-Organizing Maps, due to their visual information processing. The approach is evaluated in the framework of Reinforcement Learning algorithms.
Learning theory
ES2011-15
General bound of overfitting for MLP regression models
Joseph Rynkiewicz
Abstract:
Multilayer perceptrons (MLPs) with one hidden layer have long been used to deal with non-linear regression. However, in some tasks, MLPs are too powerful models, and a small mean square error (MSE) may be due more to overfitting than to actual modelling. If the noise of the regression model is Gaussian, the overfitting of the model is totally determined by the behavior of the likelihood ratio test statistic (LRTS); however, in numerous cases the assumption of normality of the noise is arbitrary if not false. In this paper, we present a universal bound for the overfitting of such models under smoothness assumptions; this bound is valid without Gaussian or identifiability assumptions. The main application of this bound is to give a hint about determining the true architecture of the MLP model as the number of data points goes to infinity.
ES2011-78
Maximal Discrepancy vs. Rademacher Complexity for error estimation
Davide Anguita, Alessandro Ghio, Luca Oneto, Sandro Ridella
Abstract:
The Maximal Discrepancy and the Rademacher Complexity are powerful statistical tools that can be exploited to obtain reliable, albeit not tight, upper bounds on the generalization error of a classifier. We study the different behavior of the two methods when applied to linear classifiers and suggest a practical procedure to tighten the bounds. The resulting generalization estimation can be successfully used for classifier model selection.
Feature selection and dimensionality reduction
ES2011-8
Class-Specific Feature Selection for One-Against-All Multiclass SVMs
Gaël de Lannoy, Damien Francois, Michel Verleysen
Abstract:
This paper proposes a method to perform class-specific feature selection in multiclass problems solved with the one-against-all strategy. The main issue arises at the final step of the classification process, where binary classifier outputs must be compared against one another to elect the winning class. This comparison may be biased towards one specific class when the binary classifiers are built on distinct feature subsets. This paper proposes a normalization of the binary classifier outputs that allows fair comparisons in such cases.
ES2011-50
Mutual information for feature selection with missing data
Gauthier Doquire, Michel Verleysen
Abstract:
Feature selection is an important task for many machine learning applications; moreover, missing data are encountered very often in practice. This paper proposes to adapt a nearest-neighbors-based mutual information estimator to handle missing data and to use it for feature selection. Results on artificial and real-world datasets show that the method is able to select important features without the need for any imputation algorithm. Moreover, experiments also indicate that selecting the features before imputing the data generally increases the precision of the prediction models.
ES2011-19
Probabilistic Fisher discriminant analysis
Charles Bouveyron, Camille Brunet
Abstract:
Fisher Discriminant Analysis (FDA) is a popular method for dimensionality reduction and classification, but it performs poorly in the cases of label noise and sparse labeled data. To overcome these limitations, we propose a probabilistic framework for FDA and extend it to the semi-supervised case. Experiments on real-world datasets show that the proposed approach works as well as FDA in standard situations and outperforms it in the label noise and sparse label cases.
ES2011-73
Supervised dimension reduction mappings
Kerstin Bunte, Michael Biehl, Barbara Hammer
Abstract:
We propose a general principle to extend dimension reduction tools to explicit dimension reduction mappings and we show that this can serve as an interface to incorporate prior knowledge in the form of class labels. We explicitly demonstrate this technique by combining locally linear mappings which result from matrix learning vector quantization schemes with the t-distributed stochastic neighbor embedding cost function. The technique is tested on several benchmark data sets.
Learning of causal relations
ES2011-2
Learning of causal relations
John A. Quinn, Joris Mooij, Tom Heskes, Michael Biehl
Abstract:
To learn about causal relations between variables just by observing samples from them, particular assumptions must be made about those variables' distributions. This article gives a practical description of how such a learning task can be undertaken based on different possible assumptions. Two categories of assumptions lead to different methods, constraint-based and Bayesian learning, and in each case we review both the basic ideas and some recent extensions and alternatives to them.
ES2011-93
Inferring the causal decomposition under the presence of deterministic relations
Jan Lemeire, Stijn Meganck, Francesco Cartella, Tingting Liu, Alexander Statnikov
Abstract:
The presence of deterministic relations poses problems for current algorithms that learn the causal structure of a system based on the observed conditional independencies. Deterministic variables lead to information equivalences: two sets of variables have the same information about a third variable. Based on information content, one cannot decide on the direct causes; several edges model the dependencies equally well. We call them equivalent edges. We propose to select among the equivalent edges the one with the simplest descriptive complexity. This approach assumes that the descriptive complexity increases along a causal path. As confirmed by our experimental results, the accuracy of the method depends on the chance of accidental matches of complexities.
ES2011-106
A unified approach to estimation and control of the False Discovery Rate in Bayesian network skeleton identification
Angelos P. Armen, Ioannis Tsamardinos
Abstract:
Constraint-based Bayesian network (BN) structure learning algorithms typically control the False Positive Rate (FPR) of their skeleton identification phase. The False Discovery Rate (FDR), however, may be of greater interest, and methods for its utilization by these algorithms have recently been devised. We present a unified approach to FDR estimation and control in BN skeleton identification and experimentally evaluate the performance of FDR estimators in both tasks over several networks. We demonstrate that estimation is too conservative for most networks and that strong control at common FDR thresholds is not achieved with some networks; finally, we identify the possible causes of this situation.
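FDR control in this line of work typically builds on the Benjamini-Hochberg step-up procedure applied to the p-values of the conditional independence tests. A minimal sketch of that procedure (not the paper's unified estimator):

```python
import numpy as np

def benjamini_hochberg(pvals, alpha=0.05):
    """Benjamini-Hochberg step-up procedure: returns a boolean mask of
    discoveries with the False Discovery Rate controlled at level alpha."""
    p = np.asarray(pvals, dtype=float)
    m = len(p)
    order = np.argsort(p)
    thresh = alpha * np.arange(1, m + 1) / m   # i-th smallest p compared to alpha*i/m
    below = p[order] <= thresh
    keep = np.zeros(m, dtype=bool)
    if below.any():
        k = np.max(np.nonzero(below)[0])       # largest i passing its threshold
        keep[order[:k + 1]] = True             # reject all hypotheses up to rank k
    return keep
```

In a skeleton-identification setting, the p-values would come from the independence tests attached to candidate edges, and surviving edges form the estimated skeleton.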
ES2011-82
A structure independent algorithm for causal discovery
Tom Claassen, Tom Heskes
Abstract:
We present two inference rules, based on so-called minimal conditional independencies, that are sufficient to find all invariant arrowheads in a single causal DAG, even when selection bias may be present. It turns out that the set of seven graphical orientation rules that are usually employed to identify these arrowheads are, in fact, just different instances of these two rules. The resulting algorithm to obtain all definite causal information is elegant and fast, once the (often surprisingly small) set of minimal independencies is found.
ES2011-84
Causal relevance learning for robust classification under interventions
Ernest Mwebaze, John A. Quinn, Michael Biehl
Abstract:
In some classification problems the distribution of the test data is different from that of the training data because of external manipulations to the variables we observe. We propose a classification scheme which is robust to outside interventions by identifying causes in the training data, given that causes of a target variable remain predictive even when the data is manipulated. We do this by extending Relevance Learning Vector Quantization (RLVQ), a classification scheme that learns a relevance profile for the classification task presented. Our proposed algorithm, Causal-RLVQ, learns a relevance profile that weights causally relevant features more strongly. The algorithm can determine a tradeoff between robustness to intervention and accuracy on non-manipulated data, yielding RLVQ as a special case.
ES2011-76
A constraint-based approach to incorporate prior knowledge in causal models
Giorgos Borboudakis, Sofia Triantafillou, Vincenzo Lagani, Ioannis Tsamardinos
Abstract:
In this paper we address the problem of incorporating prior knowledge, in the form of causal relations, in causal models. We use the framework of Maximal Ancestral Graphs and follow a constraint-based approach, allowing the incorporation of prior knowledge in a more flexible manner than previous approaches. Finally, we discuss the limitations of the graphical models used to capture causal knowledge, as well as open questions regarding this problem.
Learning II
ES2011-23
Selecting from an infinite set of features in SVM
Remi Flamary, Florian Yger, Alain Rakotomamonjy
Abstract:
Dealing with the continuous parameters of a feature extraction method has always been a difficult task that is usually solved by cross-validation. In this paper, we propose an active set algorithm for automatically selecting these parameters in an SVM classification context. Our experiments on texture recognition and BCI signal classification show that optimizing the feature parameters in a continuous space while learning the decision function yields better performance than using fixed parameters obtained from a grid sampling.
ES2011-51
Mutual information based feature selection for mixed data
Gauthier Doquire, Michel Verleysen
Abstract:
The problem of feature selection is crucial for many applications and has thus been studied extensively. However, most of the existing methods are designed to handle data consisting only of categorical or only of real-valued features, while a mix of both kinds of features is often encountered in practice. This paper proposes an approach based on mutual information and the maximal Relevance minimal Redundancy principle to handle the case of mixed data. It combines aspects of both wrapper and filter methods and is well suited for regression problems. Experiments on artificial and real-world datasets show the interest of the methodology.
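The maximal Relevance minimal Redundancy (mRMR) principle mentioned here greedily adds the feature whose relevance MI(f; y) most exceeds its average redundancy with the features already chosen. An illustrative numpy sketch using a binned MI estimate (the paper's estimator and its mixed-data handling differ):

```python
import numpy as np

def mi_binned(a, b, bins=8):
    """Binned plug-in estimate of the mutual information between two samples (nats)."""
    h, _, _ = np.histogram2d(a, b, bins=bins)
    p = h / h.sum()
    pa = p.sum(axis=1, keepdims=True)
    pb = p.sum(axis=0, keepdims=True)
    nz = p > 0
    return float((p[nz] * np.log(p[nz] / (pa @ pb)[nz])).sum())

def mrmr_select(X, y, k):
    """Greedy mRMR: at each step add the feature j maximizing
    MI(x_j; y) - mean over already-selected s of MI(x_j; x_s)."""
    relevance = [mi_binned(X[:, j], y) for j in range(X.shape[1])]
    selected = []
    while len(selected) < k:
        best, best_score = None, -np.inf
        for j in range(X.shape[1]):
            if j in selected:
                continue
            redundancy = (np.mean([mi_binned(X[:, j], X[:, s]) for s in selected])
                          if selected else 0.0)
            score = relevance[j] - redundancy
            if score > best_score:
                best, best_score = j, score
        selected.append(best)
    return selected
```

On data where one feature duplicates another, the redundancy term steers the second pick toward a fresh (even weakly relevant) feature rather than the copy.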
ES2011-91
Unsupervised feature selection for sparse data
Artur Ferreira, Mario Figueiredo
Abstract:
Feature selection is a well-known problem in machine learning and pattern recognition. Many high-dimensional datasets are sparse, that is, many features have zero value. In some cases, we do not know the class label for some (or even all) patterns in the dataset, leading us to semi-supervised or unsupervised learning problems. For instance, in text classification with bag-of-words (BoW) representations, there is usually a large number of features, many of which may be irrelevant (or even detrimental) for categorization tasks. In this paper, we propose an efficient unsupervised feature selection technique for sparse data, suitable for both standard floating point and binary features. The experimental results on standard datasets show that the proposed method yields efficient feature selection, reducing the number of features while simultaneously improving the classification accuracy.
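As a toy stand-in for the paper's criterion (which is not spelled out in this abstract), a minimal unsupervised ranking for sparse data can score each feature by its variance — for a binary BoW column this equals p(1-p), so near-constant columns are dropped first:

```python
import numpy as np

def top_k_by_variance(X, k):
    """Unsupervised relevance proxy: keep the k highest-variance features.
    For a binary column with activation rate p, the variance is p*(1-p),
    so columns that are almost always 0 (or always 1) rank last.
    (Illustrative stand-in, not the paper's actual criterion.)"""
    return np.argsort(X.var(axis=0))[::-1][:k]
```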
ES2011-53
Multi-class classification in the presence of labelling errors
Jakramate Bootkrajang, Ata Kaban
Abstract:
Learning a classifier from a training set that contains labelling errors is a difficult, yet not very well studied problem. Here we present a model-based approach that extends multi-class quadratic normal discriminant analysis with a model of the mislabelling process. We demonstrate the benefits of this approach in terms of parameter recovery as well as improved classification performance, on both synthetic and real-world multi-class problems. We also obtain enhanced accuracy in comparison with a previous model-free approach.
ES2011-28
Principal component analysis for unsupervised calibration of bio-inspired airflow array sensors
Michiel Van Dyck, Herbert Peremans
Abstract:
This paper describes the automatic calibration of a set of airflow-sensitive sensors on a robot exposed to unknown random airflow stimuli. This might support the idea that the cricket cercus neural system in the terminal abdominal ganglion evolved through learning. The algorithm makes use of the singular value decomposition (SVD) and the known reduced model dimension of the system for learning the sensor array setup. The absolute orientation of the array can only be determined relative to a reference flow or reference sensor, which must be calibrated manually. When only a change in airflow needs to be measured, the reference sensor can be left uncalibrated.
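The SVD step can be illustrated on a linear model S = G F, where G holds the unknown sensor gains and F the random stimuli of known low dimension d: the top-d left singular vectors of the recordings span the column space of G up to rotation — which is exactly why the absolute orientation still needs an external reference. A sketch under those assumptions (variable names are ours):

```python
import numpy as np

def estimate_response_subspace(S, d):
    """Truncated SVD of recordings S (sensors x samples): under S = G F with
    d-dimensional random stimuli F, the top-d left singular vectors span the
    column space of the unknown gain matrix G, up to an arbitrary rotation."""
    U, _, _ = np.linalg.svd(S, full_matrices=False)
    return U[:, :d]
```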
ES2011-27
Effects of sparseness and randomness of pairwise distance matrix on t-SNE results
Eli Parviainen
Abstract:
We apply ideas from random graph theory to sparse pairwise distance matrices in dimension reduction. We use matrices with some short and some randomly chosen distances, and study the effects of matrix sparseness and randomness on the trustworthiness and continuity of t-SNE visualizations. Existing works have either concentrated on matrices with only short distances, or implemented heuristics with mixed distances without explaining the effects. We find that trustworthiness generally increases with randomness, but not without limit. Continuity is less affected, but drops if matrices become too random. Sparseness has little effect on continuity, but decreases trustworthiness. The decrease in quality appears sublinear, which suggests that sparse t-SNE could be made subquadratic in complexity without too much effect on quality.
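A mixed short/random sparse distance matrix of the kind studied here can be built by keeping, for each point, its nearest neighbours plus a few randomly chosen remote points. An illustrative construction (parameter names are ours; the paper's exact sampling scheme may differ):

```python
import numpy as np

def mixed_distance_mask(X, n_near, n_random, rng):
    """Boolean sparsity mask: per point, keep its n_near nearest neighbours
    (short distances) plus n_random randomly chosen remote points, then
    symmetrize. Only masked entries of the distance matrix would be used."""
    n = len(X)
    D = np.sqrt(((X[:, None, :] - X[None, :, :]) ** 2).sum(-1))
    mask = np.zeros((n, n), dtype=bool)
    for i in range(n):
        order = np.argsort(D[i])                 # order[0] is the point itself
        mask[i, order[1:n_near + 1]] = True      # short distances
        far = order[n_near + 1:]
        mask[i, rng.choice(far, size=n_random, replace=False)] = True  # random ones
    return mask | mask.T
```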
ES2011-38
Nearest neighbors and correlation dimension for dimensionality estimation. Application to factor analysis of real biological time series data.
Jérôme Lapuyade-Lahorgue, Ali Mohammad-Djafari
Abstract:
Determining the number of components in dimensionality reduction techniques is still one of the open problems in data analysis research. These methods are often used for knowledge extraction from high-dimensional multivariate data, but very often the number of components is assumed to be known. One of the classical methods to estimate this dimensionality is based on the Principal Component Analysis (PCA) eigenvalues. However, this method supposes that the model is linear and the signals are Gaussian. To be able to handle non-linear and non-Gaussian cases, we propose in this paper “measure-based methods” such as the nearest neighbors dimension and the correlation dimension. The comparison between the three methods is evaluated both with simulated data and with real biological data, namely gene expression time series. The main goal of this study is to estimate the minimum number of factors.
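The correlation dimension can be estimated with the Grassberger-Procaccia correlation integral: C(r), the fraction of point pairs closer than r, scales as r^D in the small-r regime, so D is the slope of log C(r) against log r. A compact sketch:

```python
import numpy as np

def correlation_dimension(X, r_values):
    """Grassberger-Procaccia estimate: compute the correlation integral
    C(r) = fraction of point pairs with distance < r, then fit the slope
    of log C(r) versus log r, which approximates the intrinsic dimension."""
    n = len(X)
    D = np.sqrt(((X[:, None, :] - X[None, :, :]) ** 2).sum(-1))
    pairwise = D[np.triu_indices(n, k=1)]        # each pair once
    C = np.array([(pairwise < r).mean() for r in r_values])
    slope, _ = np.polyfit(np.log(r_values), np.log(C), 1)
    return slope
```

The r-values must lie in the scaling region: large enough that C(r) > 0, small enough that boundary effects do not flatten the curve.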
ES2011-61
A Similarity Function with Local Feature Weighting for Structured Data
Rubén Suárez, Rocío García-Durán, Fernando Fernández
Abstract:
The application of learning approaches such as kernel or instance-based methods to tree-structured data requires the definition of similarity functions able to deal with such data. A new similarity function for nearest prototype classification in relational data that follows a tree structure is defined in this paper. Its main characteristic is its capability to weight the importance of the different data features in different areas of the feature space. This work is built on two previous ideas: a similarity function for Local Feature Weighting (LFW), and a Relational Nearest Prototype Classification algorithm (RNPC).
ES2011-83
Exploiting vertices states in GraphESN by weighted nearest neighbor
Claudio Gallicchio, Alessio Micheli
Abstract:
Graph Echo State Networks (GraphESN) extend the Reservoir Computing approach to directly process graph structures. The reservoir is applied to every vertex of an input graph, realizing a contractive encoding process and resulting in a structured state isomorphic to the input. Whenever an unstructured output is required, a state mapping function maps the structured state into a fixed-size feature representation that feeds the linear readout. In this paper we propose an alternative approach, based on distance-weighted nearest neighbor, to realize a more flexible readout exploiting the state information computed for every vertex according to its individual relevance.
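The distance-weighted nearest-neighbour idea can be illustrated in its plain vector-space form, where each of the k nearest training points votes with weight 1/distance (the GraphESN readout applies this kind of weighting to per-vertex state information; this sketch is not the paper's readout itself):

```python
import numpy as np

def weighted_nn_predict(X_train, y_train, x, k=3, eps=1e-12):
    """Distance-weighted k-NN vote: each of the k nearest training points
    contributes weight 1/(distance + eps) to its class; highest total wins."""
    d = np.sqrt(((X_train - x) ** 2).sum(axis=1))
    idx = np.argsort(d)[:k]
    w = 1.0 / (d[idx] + eps)
    classes = np.unique(y_train)
    scores = [w[y_train[idx] == c].sum() for c in classes]
    return classes[int(np.argmax(scores))]
```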
ES2011-86
The role of Fisher information in primary data space for neighbourhood mapping
Héctor Ruiz, Ian Jarman, José D Martín, Paulo Lisboa
Abstract:
Clustering methods and nearest neighbour classifiers typically compute distances between data points as a measure of similarity, with nearby pairs of points considered more like each other than remote pairs. The distance measure of choice is often Euclidean, implicitly treating all directions in space as equally relevant. This paper reviews the application of Fisher information to derive a metric in primary data space. The aim is to provide a natural coordinate space to represent pairwise distances with respect to a probability distribution p(c|x), defined by an external label c, and use it to compute more informative distances.
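For a concrete instance, take a binary logistic model p(1|x) = sigmoid(w·x + b): the local Fisher information matrix J(x) = sum_c p(c|x) grad log p(c|x) grad log p(c|x)^T reduces to p(1-p) w w^T, so directions orthogonal to w contribute nothing to distance — exactly the label-informed behaviour described above. A sketch that integrates this metric along a straight path (model choice and names are ours):

```python
import numpy as np

def fisher_metric_logistic(x, w, b):
    """Local Fisher information for binary logistic p(1|x) = sigmoid(w.x + b):
    J(x) = p(1-p) w w^T, a rank-one metric aligned with the decision direction."""
    p = 1.0 / (1.0 + np.exp(-(w @ x + b)))
    return p * (1.0 - p) * np.outer(w, w)

def fisher_distance(x1, x2, w, b, steps=100):
    """Approximate Fisher path length along the straight line from x1 to x2
    by summing sqrt(seg^T J(mid) seg) over small segments."""
    ts = np.linspace(0.0, 1.0, steps + 1)
    pts = x1[None, :] + ts[:, None] * (x2 - x1)[None, :]
    seg = (x2 - x1) / steps
    total = 0.0
    for i in range(steps):
        J = fisher_metric_logistic(0.5 * (pts[i] + pts[i + 1]), w, b)
        total += np.sqrt(seg @ J @ seg)
    return total
```

Moving across the decision boundary accumulates distance; moving parallel to it costs nothing, which is what makes the resulting neighbourhoods informative about the label.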
ES2011-95
Communication Challenges in Cloud K-means
Fabrice Rossi, Matthieu Durut
Abstract:
This paper studies how parallel machine learning algorithms can be implemented on top of the Microsoft Windows Azure cloud computing platform. More specifically, the proposed algorithm focuses on communication mechanisms through the storage.
ES2011-107
A Spectral Based Clustering Algorithm for Categorical Data with Maximum Modularity
Lazhar Labiod, Younès Bennani
Abstract:
In this paper we propose a spectral-based clustering algorithm to maximize an extended modularity measure for categorical data. First, we establish the connection with the Relational Analysis criterion. Second, the maximization of the extended modularity is shown to be a trace maximization problem. A spectral-based algorithm is then presented to search for the partitions maximizing the extended modularity criterion. Experimental results indicate that the new algorithm is efficient and effective at finding a good clustering across a variety of real-world data sets.
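For ordinary graphs, the analogous spectral step is Newman's leading-eigenvector method: form the modularity matrix B = A - k k^T / 2m and split on the sign of its top eigenvector. The paper's extended modularity for categorical data generalizes this idea; the sketch below is the classical graph version, not the authors' algorithm:

```python
import numpy as np

def modularity_bipartition(A):
    """Newman's leading-eigenvector bipartition: build the modularity matrix
    B = A - k k^T / 2m (k = degrees, 2m = total degree) and assign each node
    to a community by the sign of the eigenvector of the largest eigenvalue."""
    k = A.sum(axis=1)
    two_m = k.sum()
    B = A - np.outer(k, k) / two_m
    vals, vecs = np.linalg.eigh(B)
    leading = vecs[:, np.argmax(vals)]
    return leading >= 0
```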
ES2011-41
Single-trial P300 detection with Kalman filtering and SVMs
Lucie Daubigney, Olivier Pietquin
Abstract:
Brain Computer Interfaces (BCI) are systems enabling humans to communicate with machines through signals generated by the brain. Several kinds of signals can be envisioned, as well as means to measure them. In this paper we are particularly interested in event-related brain potentials (ERP) and especially visually evoked potential signals (P300) measured with surface electroencephalograms (EEG). When the human is stimulated with visual inputs, the P300 signal arises about 300 ms after the visual stimulus has been received. Yet, the EEG signal is often very noisy, which makes P300 detection hard. It is customary to use an average of several trials to enhance the P300 signal and reduce the random noise, but this results in a lower bit rate of the interface. In this contribution, we propose a novel approach to P300 detection using Kalman filtering and SVMs. Experiments show that this method is a promising step toward single-trial detection of P300.
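The Kalman filtering stage can be illustrated with a scalar random-walk model, where the filter trades process noise q against measurement noise r to denoise a single trial before classification (the paper's actual state model is not specified in this abstract):

```python
import numpy as np

def kalman_filter_1d(z, q, r):
    """Scalar Kalman filter for the random-walk model x_t = x_{t-1} + w_t,
    z_t = x_t + v_t, with process variance q and measurement variance r.
    Returns the filtered estimate of the underlying signal."""
    xhat = np.zeros_like(z)
    x, P = z[0], r                    # initialize at the first observation
    for t, zt in enumerate(z):
        P = P + q                     # predict: uncertainty grows by q
        K = P / (P + r)               # Kalman gain
        x = x + K * (zt - x)          # update toward the new measurement
        P = (1.0 - K) * P
        xhat[t] = x
    return xhat
```

A small q/r ratio averages over many samples (strong smoothing, more lag); a large ratio tracks the raw signal. The filtered trial would then be fed to the SVM.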
ES2011-35
Classifying mental states with machine learning algorithms using alpha activity decline
Carina Walter, Gabriele Cierniak, Peter Gerjets, Wolfgang Rosenstiel, Martin Bogdan
Abstract:
This publication aims at developing computer-based learning environments that adapt to learners’ individual cognitive condition. The adaptive mechanism, based on Brain-Computer Interface (BCI) methodology, relies on electroencephalogram (EEG) data to diagnose learners’ mental states. A first within-subjects study (10 students) was conducted, aiming at differentiating between states of learning and non-learning by means of EEG data. Support Vector Machines classified characteristics in the EEG signals for these two different stimuli with an average accuracy of 74.55%. For individual students, the percentage of correct classifications reached 92.22%. The results indicate that continuous EEG data combined with BCI methodology is a promising approach to measuring learners’ mental states online.
ES2011-30
Approaches for Automatic Speaker Recognition in a Binaural Humanoid Context
Karim Youssef, Bastien Breteau, Sylvain Argentieri, Jean-Luc Zarader, Zefeng Wang
Abstract:
This paper presents two methods for Automatic Speaker Recognition (ASkR). ASkR has been studied extensively in recent decades, but mostly in mono-microphone or microphone-array contexts. Our systems are placed in a binaural humanoid context, where the signals captured by both ears of a humanoid robot are exploited to perform ASkR. Both methods use Mel-Frequency Cepstral Coding (MFCC), but one performs the classification with Predictive Neural Networks (PNN) and the other with Gaussian Mixture Models (GMM). Tests are performed on a database simulating the functioning of the human ears and study the influence of noise, reverberation and speaker spatial position on the recognition rate.
ES2011-37
Fast Data Mining with Sparse Chemical Graph Fingerprints by Estimating the Probability of Unique Patterns
Georg Hinselmann, Lars Rosenbaum, Andreas Jahn, Andreas Zell
Abstract:
The aim of this work is to introduce a modification of chemical graph fingerprints for data mining. The algorithm reduces the number of features by taking into account the probability of producing a unique feature at a specific search depth. We observed that the probability of generating a non-unique feature depends on a search parameter (which leads to a power-law growth of features) and modeled it by a sigmoid function. This function was integrated into a fingerprinting routine to prune features according to their probability. The predictive performance was convincing, with a considerable speedup in training a linear support vector machine on sparse instances.
ES2011-85
Automatic Enhancement of Correspondence Detection in an Object Tracking System
Denis Schulze, Sven Wachsmuth, Katharina J. Rohlfing
Abstract:
This paper proposes a strategy to automatically detect the correspondence between measurements of different sensors using object tracking. In addition, the strategy includes the ability to learn new features to facilitate the correspondence computation for future measurements. To this end, a correlation between objects of different modalities is first computed using time-synchronous changes of attribute values. Using statistical methods to determine the dependencies between changes of different attributes, it is shown how a multi-layer perceptron (MLP) can be used to enhance the correspondence detection in ambiguous situations.
Sequence and time processing
ES2011-70
Time Experiencing by Robotic Agents
Michail Maniadakis, Marc Wittmann, Panos Trahanias
Abstract:
Biological organisms perceive and act in the world based on spatiotemporal experiences and interpretations. However, artificial agents consider mainly the spatial relationships that exist in the world, typically ignoring its temporal aspects. In an attempt to direct research interest towards the fundamental issue of time experiencing, the current work explores two temporally different versions of a robotic rule switching task. An evolutionary process is employed to design a neural network controller capable of accomplishing both versions of the task. The systematic exploration of neural network dynamics revealed a self-organized time perception capacity in the agent's cognitive system that significantly facilitates the accomplishment of tasks, through the modulation of the supplementary behavioural and cognitive processes.
ES2011-45
Visual place recognition using Bayesian filtering with Markov chains
Mathieu Dubois, Hervé Guillaume, Emmanuelle Frenoux, Philippe Tarroux
Abstract:
We present a novel approach using Bayesian filtering for place recognition. More precisely, our system combines global image characterization, Learning Vector Quantization, Markov chains and Bayesian filtering. The goal is to integrate several images seen by a robot during exploration of the environment, together with the dependencies between them. We present our system and the new Bayesian filtering algorithm. Our system has been evaluated on a standard database and shows promising results.
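As a rough sketch of the filtering idea in this abstract, the following discrete Bayes filter propagates a belief over places through a Markov transition model and reweights it with per-image observation likelihoods. The two-place setup and all matrix values are hypothetical toy numbers, not the authors' system.

```python
# Illustrative discrete Bayes filter for place recognition:
# a belief over places is propagated through a Markov transition
# model (predict) and reweighted by the likelihood of the current
# image under each place hypothesis (update).

def bayes_filter_step(belief, transition, likelihood):
    """One predict/update step of a discrete Bayes filter.

    transition[j][i] is P(place i at time t | place j at time t-1);
    likelihood[i] is P(current observation | place i).
    """
    n = len(belief)
    # Predict: propagate the belief through the place transition model.
    predicted = [sum(transition[j][i] * belief[j] for j in range(n))
                 for i in range(n)]
    # Update: weight by the observation likelihood, then normalize.
    posterior = [likelihood[i] * predicted[i] for i in range(n)]
    z = sum(posterior)
    return [p / z for p in posterior]
```

With a "sticky" transition matrix and repeated observations favoring one place, the belief concentrates on that place over a few steps, which is the effect of integrating several images rather than classifying each independently.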
ES2011-52
Iterative multi-task sequence labeling for predicting structural properties of proteins
Francis Maes, Julien Becker, Louis Wehenkel
Abstract:
Developing computational tools for predicting protein structural information from the amino acid sequence is of primary importance in protein science. Problems such as the prediction of secondary structures, of solvent accessibility, or of disordered regions can be expressed as sequence labeling problems and could be solved independently by existing machine-learning-based sequence labeling approaches. But since these problems are closely related, we propose to approach them jointly in a multi-task setting. To this end, we introduce a new generic framework for iterative multi-task sequence labeling. We apply this conceptually simple but quite effective strategy to jointly solve a set of five protein annotation tasks. Our empirical results on two protein datasets show that the proposed strategy significantly outperforms the single-task approaches.
ES2011-54
Identification of sparse spatio-temporal features in Evoked Response Potentials
Nisrine Jrad, Marco Congedo
Abstract:
Electroencephalographic Evoked Response Potentials (ERPs) exhibit distinct and individualized spatial and temporal characteristics. Identification of spatio-temporal features improves single-trial classification performance and allows a better understanding of the underlying physiology. This paper presents a method for analyzing the spatio-temporal characteristics associated with Error-related Potentials (ErrPs). First, a resampling procedure based on Global Field Power (GFP) extracts temporal features. Second, a spatially weighted SVM (sw-SVM) is proposed to learn a spatial filter optimizing the classification performance for each temporal feature. Third, the resulting ensemble of sw-SVM classifiers is combined using a weighted combination of all sw-SVM outputs. Results indicate that inclusion of temporal features provides useful insight into the spatio-temporal characteristics of error potentials.
ES2011-102
Hybrid HMM and HCRF model for sequence classification
Yann Soullard, Thierry Artières
Abstract:
We propose a hybrid model combining a generative model and a discriminative model for signal labelling and classification tasks, aiming to take the best of both worlds. The idea is to focus the learning of the discriminative model on the most likely state sequences output by the generative model. This takes advantage of the usual increased accuracy of generative models on small training datasets and of discriminative models on large training datasets. We instantiate this framework with Hidden Markov Models and Hidden Conditional Random Fields, and validate our model on financial time series and on handwriting data.
ES2011-105
A Neural Filter for Electrolocation in Weakly Electric Fish
DaeEun Kim
Abstract:
Weakly electric fish have an electric organ that generates an electric field and an electrosensory mechanism that reads changes in this field through their electroreceptors. The electric organ produces a waveform known as the electric organ discharge (EOD). Their active electrolocation system can detect distortion of the self-generated electric field caused by a target object and estimate the object's position. In this paper, we suggest the hypothesis that the periodic EOD signals can be exploited to extract localization features from the noisy electrosensory signals, and we propose a possible neural network that performs noise filtering to obtain accurate information about a target's position. The network has sinusoidal weights to process a time series of sensor readings for each electroreceptor and to recover the clean signal depending purely on the object perturbation.
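One simple reading of a "neural filter with sinusoidal weights" is a unit that correlates a window of sensor readings with fixed sine and cosine weights at the known EOD frequency, recovering the amplitude of the periodic component while averaging out broadband noise. The sketch below illustrates that idea only; it is our assumption, not the paper's exact network.

```python
import math

# Illustrative sketch: estimate the amplitude of a periodic (EOD-like)
# component in noisy sensor readings using fixed sinusoidal weights
# over a time window. Frequency and sampling interval are assumed known.

def sinusoidal_filter(samples, freq, dt):
    """Estimate the amplitude of the freq component via sin/cos weights."""
    n = len(samples)
    s = sum(x * math.sin(2 * math.pi * freq * k * dt)
            for k, x in enumerate(samples))
    c = sum(x * math.cos(2 * math.pi * freq * k * dt)
            for k, x in enumerate(samples))
    # For an integer number of cycles, each weighted sum has gain n/2.
    return 2.0 / n * math.hypot(s, c)
```

For a clean sinusoid spanning an integer number of cycles, the estimate matches the true amplitude; added zero-mean noise perturbs it only by a term that shrinks as the window grows.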
Optimization and learning
ES2011-75
Non-linearly increasing resampling in racing algorithms
Verena Heidrich-Meisner, Christian Igel
Abstract:
Racing algorithms are iterative methods trying to find the best among several options with high probability. The quality of each option is a random variable and is estimated by its empirical mean and concentration bounds obtained from repeated sampling. In each iteration of a standard racing algorithm each promising option is reevaluated once before being statistically compared with its competitors. We argue that Hoeffding and empirical Bernstein races benefit from decoupling the racing iteration from the number of samples per option and illustrate this on an artificial benchmark problem.
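To make the decoupling concrete, here is a minimal Hoeffding race in which the number of samples drawn per iteration comes from a pluggable schedule, so a non-linearly increasing schedule (e.g. doubling) can replace the standard one-sample-per-iteration rule. Function names, the confidence level, and the toy setup are illustrative, not the authors' code.

```python
import math

# Illustrative Hoeffding race: options are eliminated once their upper
# confidence bound falls below the best lower bound among survivors.

def hoeffding_radius(n, delta, value_range=1.0):
    """Hoeffding confidence radius after n i.i.d. samples in [0, value_range]."""
    return value_range * math.sqrt(math.log(2.0 / delta) / (2.0 * n))

def race(sample_option, n_options, schedule, delta=0.05):
    """Run a race; schedule gives the samples drawn per option per iteration."""
    alive = set(range(n_options))
    sums = [0.0] * n_options
    counts = [0] * n_options
    for batch in schedule:                    # e.g. [1, 2, 4, 8, ...]
        for i in alive:
            for _ in range(batch):
                sums[i] += sample_option(i)
                counts[i] += 1
        bounds = {i: hoeffding_radius(counts[i], delta) for i in alive}
        best_lower = max(sums[i] / counts[i] - bounds[i] for i in alive)
        alive = {i for i in alive
                 if sums[i] / counts[i] + bounds[i] >= best_lower}
        if len(alive) == 1:
            break
    return alive
```

With a doubling schedule, the confidence bounds shrink after each iteration at the same asymptotic rate while the per-iteration overhead of the statistical comparison is paid only logarithmically often.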
ES2011-13
A distributed learning algorithm based on two-layer artificial neural networks and genetic algorithms
Diego Peteiro-Barral, Bertha Guijarro-Berdiñas, Beatriz Pérez-Sánchez, Oscar Fontenla-Romero
Abstract:
In many real-world applications of machine learning, the amount of data is now beyond the capability of learning algorithms because they cannot process all available data in a reasonable time. Moreover, most large datasets are naturally distributed or they are being stored in a distributed manner. A promising line of research in order to deal with large and/or distributed data is distributed learning. We propose a new distributed learning algorithm based on two-layer artificial neural networks and genetic algorithms. The results obtained show that our method performs better than other distributed learning algorithms.
Deep Learning
ES2011-4
An Introduction to Deep Learning
Ludovic Arnold, Sébastien Rebecchi, Sylvain Chevallier, Hélène Paugam-Moisy
Abstract:
The deep learning paradigm tackles problems on which shallow architectures (e.g. SVM) are affected by the curse of dimensionality. As part of a two-stage learning scheme involving multiple layers of nonlinear processing, a set of statistically robust features is automatically extracted from the data. The present tutorial, introducing the ESANN deep learning special session, details the state-of-the-art models and summarizes the current understanding of this learning approach, which has become a reference for many difficult classification tasks.
ES2011-10
Using very deep autoencoders for content-based image retrieval
Alex Krizhevsky, Geoffrey Hinton
Abstract:
We show how to learn many layers of features on color images and we use these features to initialize deep autoencoders. We then use the autoencoders to map images to short binary codes. Using semantic hashing, 28-bit codes can be used to retrieve images that are similar to a query image in a time that is independent of the size of the database. This extremely fast retrieval makes it possible to search using multiple different transformations of the query image. 256-bit binary codes allow much more accurate matching and can be used to prune the set of images found using the 28-bit codes.
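The two-stage lookup this abstract describes can be sketched as follows: a hash table keyed by short binary codes gives candidate images in time independent of database size, and longer codes then rerank the candidates by Hamming distance. The integer codes below are toy stand-ins for the learned 28-bit and 256-bit codes.

```python
from collections import defaultdict

# Illustrative semantic-hashing-style retrieval: short codes index a
# hash table for O(1) candidate lookup; long codes rerank candidates.

def build_index(short_codes):
    """Map each short code to the list of image ids that share it."""
    index = defaultdict(list)
    for image_id, code in enumerate(short_codes):
        index[code].append(image_id)
    return index

def retrieve(query_short, query_long, index, long_codes, k=5):
    """Look up candidates by short code, rerank by long-code Hamming distance."""
    candidates = index.get(query_short, [])
    return sorted(candidates,
                  key=lambda i: bin(long_codes[i] ^ query_long).count("1"))[:k]
```

Because the coarse lookup touches only one hash bucket, its cost does not grow with the database; only the reranking scales with the (small) candidate set.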
ES2011-79
Training RBMs based on the signs of the CD approximation of the log-likelihood derivatives
Asja Fischer, Christian Igel
Abstract:
Contrastive Divergence (CD) learning is frequently applied to Restricted Boltzmann Machines (RBMs), the building blocks of deep belief networks. It relies on biased approximations of the log-likelihood gradient. This bias can deteriorate the learning process. It has been claimed that the signs of most components of the CD update equal the corresponding signs of the log-likelihood gradient. This suggests using optimization techniques that depend only on the signs. Resilient backpropagation is such a method, and we combine it with CD learning. However, it does not prevent divergence caused by the approximation bias.
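A sign-only update of the resilient-backpropagation kind mentioned here can be sketched generically: each parameter keeps its own step size, which grows while the gradient sign repeats and shrinks when it flips. Combining it with CD would mean passing the CD gradient estimates in as `grads`. The constants are the common Rprop defaults; this is a sketch under those assumptions, not the authors' implementation.

```python
# Illustrative Rprop-style update that uses only the signs of the
# (e.g. CD-approximated) gradient components. Gradient ascent on the
# log-likelihood is assumed, so parameters move in the sign direction.

def rprop_step(weights, grads, steps, prev_signs,
               eta_plus=1.2, eta_minus=0.5,
               step_min=1e-6, step_max=1.0):
    """Update weights in place using only gradient signs."""
    for i, g in enumerate(grads):
        sign = (g > 0) - (g < 0)
        if sign * prev_signs[i] > 0:        # same sign: accelerate
            steps[i] = min(steps[i] * eta_plus, step_max)
        elif sign * prev_signs[i] < 0:      # sign flipped: slow down
            steps[i] = max(steps[i] * eta_minus, step_min)
        weights[i] += sign * steps[i]
        prev_signs[i] = sign
```

Note that the magnitude of the CD estimate never enters the update, which is exactly why a sign-consistency claim about CD motivates this optimizer; as the abstract reports, however, sign-only updates alone do not remove divergence caused by the bias.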
ES2011-21
A supervised strategy for deep kernel machine
Florian Yger, Maxime Berar, Gilles Gasso, Alain Rakotomamonjy
Abstract:
This paper presents an alternative approach for learning a Multilayer Kernel Machine (MKM) based on kernel partial least squares. After explaining the characteristics and motivation of our approach, we show compelling evidence on real-world applications.