Bruges, Belgium, April 23-25, 2014
Content of the proceedings
Advances in Spiking Neural Information Processing Systems (SNIPS)
Vector quantization- and nearest neighbour-based methods
Byte the bullet: learning on real-world computing architectures
Reinforcement learning and optimization
Nonlinear dimensionality reduction
Signal and temporal processing
Learning of structured and non-standard data
Kernel methods
Learning and Modeling Big Data
Classification
Dynamical systems and online learning
Advances on Weightless Neural Systems
Clustering
Regression, Forecasting and Extreme Learning Machines
Label noise in classification
Advances in Spiking Neural Information Processing Systems (SNIPS)
ES2014-13
Spiking Neural Networks: Principles and Challenges
André Grüning, Sander Bohte
ES2014-50
A new biologically plausible supervised learning method for spiking neurons
Aboozar Taherkhani, Ammar Belatreche, Yuhua Li, Liam Maguire
Abstract:
STDP is believed to play an important role in learning and memory. Additionally, experimental evidence shows that a few strong neural inputs can drive a neuron response and subsequently affect learning of other inputs. Furthermore, recent studies have shown that local dendritic depolarization can impact STDP induction. This paper integrates these three biological concepts to devise a new biologically plausible supervised learning method for spiking neurons. Experimental results show that the proposed method can effectively map a random spatiotemporal input pattern to a random target output spike train with a much faster learning speed than ReSuMe.
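For orientation, the standard pair-based STDP window is given below as general background; it is not necessarily the paper's exact rule, which additionally accounts for strong driving inputs and dendritic depolarization:
$$\Delta w = \begin{cases} A_{+}\, e^{-\Delta t/\tau_{+}}, & \Delta t > 0, \\ -A_{-}\, e^{\Delta t/\tau_{-}}, & \Delta t < 0, \end{cases} \qquad \Delta t = t_{\mathrm{post}} - t_{\mathrm{pre}},$$
where $A_{\pm}$ and $\tau_{\pm}$ set the amplitudes and time constants of potentiation and depression.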
ES2014-112
Spiking AGREL
Davide Zambrano, Jaldert O. Rombouts, Cecilia Laschi, Sander Bohte
Abstract:
Spiking neural networks are characterised by the spiking neuron models they use and by how these spiking neurons process information communicated through spikes - the neural code. We demonstrate a plausible spiking neural network based on Spike Response Models and predictive spike-coding. When combined with a plausible reinforcement learning strategy, Attention Gated REinforcement Learning (AGREL), we show, for the first time, that such plausible spiking neural networks can compute non-linear mappings, including XOR.
ES2014-172
Classifying Patterns in a Spiking Neural Network
Brian Gardner, André Grüning
Abstract:
Learning rules for spiking neural networks have emerged that can classify spatio-temporal spiking patterns as precise target spike trains, although it remains unclear which rule offers the greatest performance. Here, we quantify the performance of a stochastic neuron model in learning to classify input patterns by precise target responses as outputs, and compare its performance against other learning rules. We achieve a level of performance that is comparable with that found previously for alternative neuron models, and demonstrate the advantages of classifying inputs by multiple-spike timings: both increased performance and increased reliability of classifications.
ES2014-186
Toward STDP-based population action in large networks of spiking neurons
Emmanuel Daucé
Abstract:
We present simulation results that clarify the role of Spike-Timing Dependent Plasticity (STDP) in brain processing as a putative mechanism to transfer spatio-temporal regularities, as observed in sensory signals, toward action, expressed as a global increase of the target population activity, followed by a reset. The repetition of this activation-reset mechanism gives rise to a series of synchronous waves of activity when the same stimulus is repeated over and over. Our simulation results are obtained in recurrent networks of conductance-based neurons under realistic coupling constraints.
Vector quantization- and nearest neighbour-based methods
ES2014-91
Supervised Generative Models for Learning Dissimilarity Data
David Nebel, Barbara Hammer, Thomas Villmann
Abstract:
Exemplar-based techniques such as affinity propagation \cite{ap} represent data in terms of typical exemplars. This has two benefits: (i) the resulting models are directly interpretable by humans, since representative exemplars can be inspected in the same way as data points; (ii) the model can be applied to any dissimilarity measure, including non-Euclidean or non-metric settings. Most exemplar-based techniques have been proposed in the unsupervised setting only, so their performance in supervised learning tasks can be weak depending on the given data. Here, we address the problem of learning exemplar-based models for general dissimilarity data in a discriminative framework. For this purpose, we extend a generative model proposed in \cite{srlvq} to an exemplar-based scenario using a generalized EM framework for its optimization. The resulting classifiers represent data in terms of sparse models while keeping high performance on state-of-the-art benchmarks.
ES2014-131
Rejection strategies for learning vector quantization
Lydia Fischer, Barbara Hammer, Heiko Wersing
Abstract:
We present prototype-based classification schemes, e.g. learning vector quantization, with cost-function-based and geometrically motivated reject options. We evaluate the reject schemes in experiments on artificial and benchmark data sets. We demonstrate that reject options improve the accuracy of the models in most cases, and that the performance of the proposed schemes is comparable to the optimal reject option of the Bayes classifier in cases where the latter is available.
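A minimal sketch of a cost-function-based reject option of the kind studied here, using the generalized-LVQ relative distance as the certainty measure (the threshold value and the prototype handling are illustrative assumptions):

    import numpy as np

    def glvq_confidence(x, prototypes, labels, label):
        """Relative distance difference mu(x) in [0, 1] for the predicted class."""
        d = np.linalg.norm(prototypes - x, axis=1) ** 2
        d_plus = d[labels == label].min()    # closest prototype of the predicted class
        d_minus = d[labels != label].min()   # closest prototype of any other class
        return (d_minus - d_plus) / (d_minus + d_plus)

    def classify_with_reject(x, prototypes, labels, theta=0.1):
        d = np.linalg.norm(prototypes - x, axis=1) ** 2
        label = labels[d.argmin()]           # ordinary LVQ decision
        mu = glvq_confidence(x, prototypes, labels, label)
        return label if mu >= theta else None  # None = reject (theta is illustrative)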
ES2014-93
Optimization of General Statistical Accuracy Measures for Classification Based on Learning Vector Quantization
Marika Kaden, Wieland Hermann, Thomas Villmann
Abstract:
We propose a framework for classification learning based on generalized learning vector quantization using statistical quality measures as the cost function. Statistical measures like the $F$-measure or the Matthews correlation coefficient reflect the performance of two-class classification problems better than simple accuracy, in particular if the data classes are imbalanced. For this purpose, we introduce soft approximations of the quantities contained in the confusion matrix, which are the basis for the calculation of the quality measures.
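In terms of the confusion-matrix counts (true/false positives TP/FP and true/false negatives TN/FN), the two measures read
$$F = \frac{2\,TP}{2\,TP + FP + FN}, \qquad MCC = \frac{TP \cdot TN - FP \cdot FN}{\sqrt{(TP+FP)(TP+FN)(TN+FP)(TN+FN)}};$$
the soft approximation replaces these hard counts by sums of differentiable (e.g. sigmoidal) assignment values, so that such measures can serve as a gradient-trainable LVQ cost function.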
ES2014-96
Augmented hashing for semi-supervised scenarios
Zalán-Péter Bodó, Lehel Csato
Abstract:
Hashing methods for fast approximate nearest-neighbor search are attracting more and more attention with the rapid growth of available data. Embedding the points into the Hamming space is a central question of the hashing process. Analogously to machine learning in general, there exist unsupervised, supervised and semi-supervised hashing methods. In this paper we propose a generic procedure to extend unsupervised codeword generators using error-correcting codes and semi-supervised classifiers. To show the effectiveness of the method we combine linear spectral hashing with two semi-supervised algorithms in the experiments.
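The payoff of a Hamming-space embedding is that distances reduce to XOR plus bit counting over compact binary codes. A minimal illustration, where the codes and the query are made-up placeholders:

    def hamming(a: int, b: int) -> int:
        """Hamming distance between two binary codes stored as integers."""
        return bin(a ^ b).count("1")

    codes = [0b10110010, 0b10110011, 0b01001100]   # hypothetical 8-bit codewords
    query = 0b10110000
    nearest = min(range(len(codes)), key=lambda i: hamming(codes[i], query))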
ES2014-57
Improving accuracy by reducing the importance of hubs in nearest-neighbor recommendations
François Fouss, Clémentine Van Parijs
Abstract:
A traditional approach for recommending items to persons consists of including a step of forming neighborhoods of users/items. This work focuses on such nearest-neighbor approaches and, more specifically, on a particular type of neighbors, the ones frequently appearing in the neighborhoods of users/items (i.e., very similar to many other users/items in the data set), referred to as hubs in the literature. The aim of this paper is to explore through experiments how the presence of hubs affects the accuracy of nearest-neighbor recommendations.
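Hubs can be identified by their $k$-occurrence, i.e. how often an item appears in other items' $k$-nearest-neighbor lists. A small sketch of that count, assuming a precomputed similarity matrix:

    import numpy as np

    def k_occurrence(sim: np.ndarray, k: int) -> np.ndarray:
        """sim[i, j] = similarity between items i and j; returns N_k per item."""
        n = sim.shape[0]
        counts = np.zeros(n, dtype=int)
        for i in range(n):
            neighbors = np.argsort(-sim[i])            # most similar first
            neighbors = neighbors[neighbors != i][:k]  # exclude the item itself
            counts[neighbors] += 1
        return counts  # items with unusually large N_k are the hubs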
ES2014-174
A new approach for multiple instance learning based on a homogeneity bag operator
Alexandre Wagner Chagas Faria, David Menotti, André P. Lemos, Antônio P. Braga
Abstract:
Multiple Instance Learning (MIL) proposes a new paradigm for cases where instance labeling in the learning step is impossible or infeasible, by assigning a single label (positive or negative) to a set of instances called a bag. In this paper, an operator based on the homogeneity of positive bags for MIL is introduced. Our method consists in removing instances from the positive bags according to their similarity with the ones from the negative bags. The experimental results show that our operator always increases the accuracy of the Citation kNN algorithm, achieving the best results in 2 out of 4 datasets when compared with other classic methods in the literature.
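A minimal sketch of the homogeneity idea: prune instances from a positive bag that resemble instances from negative bags. The Euclidean metric and the threshold are illustrative assumptions, not the paper's exact operator:

    import numpy as np

    def prune_positive_bag(bag, negative_instances, tau):
        """Keep only instances farther than tau from every negative instance."""
        neg = np.vstack(negative_instances)
        kept = [x for x in bag
                if np.linalg.norm(neg - x, axis=1).min() > tau]
        return np.array(kept) if kept else bag  # never empty a bag entirely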
Byte the bullet: learning on real-world computing architectures
ES2014-12
Byte The Bullet: Learning on Real-World Computing Architectures
Alessandro Ghio, Luca Oneto
Abstract:
Fast, effective, and reliable models: these are the desiderata of every theorist and practitioner. Machine Learning (ML) algorithms proposed in the last decades have proved effective and reliable in solving complex real-world problems, but they are usually designed without taking the underlying computing architecture into account. Conversely, the effort of considering the exploited computing device is often motivated by application-specific and real-world requirements, such as the need to accelerate the learning process with dedicated/distributed hardware, or to meet the energy-saving requirements of applications based on standalone mobile devices. The ESANN 2014 "Byte The Bullet: Learning on Real-World Computing Architectures" special session gathers the most recent proposals in this area, encouraging submissions related to the development and application of fast, effective, reliable techniques that treat the possibilities, potentialities and constraints of real-world computing architectures as basic cornerstones and motivations.
ES2014-27
Learning with few bits on small-scale devices: From regularization to energy efficiency
Davide Anguita, Alessandro Ghio, Luca Oneto, Sandro Ridella
Abstract:
The implementation of Machine Learning (ML) algorithms on stand-alone small-scale devices allows incorporating in these systems new services and advanced functionalities without the need of resorting to remote computing systems. Despite having undeniable advantages with respect to conventional general-purpose devices, e.g. in terms of cost/performance ratios, small-scale systems suffer from issues related to their resource-limited nature, like limited battery capacity and processing power. In order to deal with such limitations, we propose to merge local Rademacher Complexities and bit-based hypothesis spaces to build thrifty models, which can be effectively implemented on small-scale resource-limited devices. Experiments, carried out on a smartphone in a Human Activity Recognition application, show the benefits of the proposed approach in terms of model accuracy and battery duration.
ES2014-171
Speedy greedy feature selection: Better redshift estimation via massive parallelism
Fabian Gieseke, Kai Polsterer, Cosmin Eugen Oancea, Christian Igel
Abstract:
Nearest neighbor models are among the most basic tools in machine learning, and recent work has demonstrated their effectiveness in the field of astronomy. The performance of these models crucially depends on the underlying metric, and in particular on the selection of a meaningful subset of informative features. The feature selection is task-dependent and usually very time-consuming. In this work, we propose an efficient parallel implementation of incremental feature selection for nearest neighbor models utilizing modern graphics processing units. Our framework provides significant computational speed-ups over its sequential single-core competitor of up to two orders of magnitude. We demonstrate the applicability of the overall scheme on one of the most challenging tasks in astronomy: the detection of distant galaxies.
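The sequential scheme being parallelized is classical greedy forward selection: repeatedly add the feature whose inclusion most improves a nearest-neighbor validation score. A serial sketch, with the scoring function left abstract:

    def greedy_forward_selection(features, score, budget):
        """score(subset) -> validation quality of a nearest-neighbor model."""
        selected = []
        remaining = list(features)
        while remaining and len(selected) < budget:
            best = max(remaining, key=lambda f: score(selected + [f]))
            selected.append(best)      # optionally stop early if the score
            remaining.remove(best)     # no longer improves
        return selected

The many independent score evaluations per round are what lend themselves to GPU parallelization in this setting.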
ES2014-179
Context- and cost-aware feature selection in ultra-low-power sensor interfaces
Steven Lauwereins, Komail Badami, Wannes Meert, Marian Verhelst
Abstract:
This paper introduces the use of machine learning to drastically improve the efficiency of ultra-low-power sensor interfaces. Adaptive feature extraction circuits are assisted by hardware-embedded learning to dynamically activate only the most relevant features. This selection is done in a context- and power-cost-aware way, through a modification of the C4.5 algorithm. Furthermore, the context dependence of different feature sets is explained. As proof of principle, a Voice Activity Detector is expanded with the proposed context- and cost-dependent voice/noise classifier, resulting in average circuit power savings of 75%, with negligible accuracy loss.
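A generic way to make decision-tree split selection cost-aware (the paper's exact modification of C4.5 may differ) is to normalize the usual information-based criterion by the feature's acquisition cost, here its power cost, and pick the feature maximizing
$$\frac{\mathrm{Gain}(f)}{\mathrm{Cost}(f)},$$
so that a slightly less informative but much cheaper feature can win a split, allowing expensive analog feature-extraction circuits to stay powered down.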
ES2014-140
Lightning fast asynchronous distributed k-means clustering
Árpád Berta, István Hegedűs, Róbert Ormándi
Abstract:
Clustering is one of the most fundamental data processing approaches. This is even true when the computational environment is radically different from the usual centralized architectures. Here we focus on the problem of designing efficient and fast K-Means approaches which work in fully distributed, asynchronous networks without any central control. We assume that the network has a huge number of computational units (even orders of magnitude more than in a general cloud). Additionally, we allow network failures to occur and computational units to leave and join the network throughout the computation. Our approaches apply online clustering models which take different random walks in the network, updating themselves using the data points stored by the computational units, while various ensemble techniques combine them to obtain faster convergence. We define different instantiations of this general framework, applying various ensemble techniques, and evaluate them empirically against several state-of-the-art distributed baseline algorithms in different computational scenarios. The results of our experiments indicate that our methods are robust against network failure, while they produce accurate clusterings and converge extremely fast.
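A minimal sketch of the node-local step of such a protocol: a model performing a random walk is updated online at each visited node. Neighbor selection, ensemble combination and failure handling are omitted, and the learning-rate schedule is an illustrative assumption:

    import numpy as np

    def local_update(centroids, x, t):
        """Online k-means step on one locally stored data point x at step t."""
        j = np.linalg.norm(centroids - x, axis=1).argmin()  # nearest centroid
        eta = 1.0 / (t + 1)                                 # decaying learning rate
        centroids[j] += eta * (x - centroids[j])
        return centroids

    # The walking model is then forwarded to a randomly chosen neighbor node.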
Reinforcement learning and optimization
ES2014-128
Selective Neural Network Ensembles in Reinforcement Learning
Stefan Faußer, Friedhelm Schwenker
Abstract:
Ensemble models can achieve more accurate predictions than single learners. Selective ensembles further improve the predictions by selecting an informative subset of the full ensemble. We consider reinforcement learning ensembles, where the members are neural networks. In this context we study a new algorithm for ensemble subset selection in reinforcement learning scenarios. The aim of the proposed learning strategy is to minimize the Bellman errors of the collected states. In the empirical evaluation, two benchmark applications with large state spaces have been considered, namely SZ-Tetris and generalized maze. Here, our selective ensemble algorithm significantly outperforms other approaches.
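The selection criterion mentioned above is built on the Bellman error of a value estimate $V$ on a collected transition $(s, r, s')$,
$$\delta(s) = r + \gamma\, V(s') - V(s),$$
with discount factor $\gamma$; the subset of ensemble members is chosen so that the combined value estimate keeps this error small over the collected states.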
ES2014-130
Learning resets of neural working memory
Jaldert O. Rombouts, Pieter Roelfsema, Sander Bohte
Abstract:
Working memory is a key component of intelligence that the brain implements as persistent neural activations. How do persistent neurons learn to store information, and how can they be made to forget this information once it is no longer relevant? When animals learn episodic tasks, neurons in prefrontal cortex learn to represent task ends. We show that a biologically plausible neural network model equipped with persistent memory and a `reset' action can learn to store and forget information at task ends by reinforcement learning. The new model has competitive performance compared to a variety of (biologically implausible) models.
ES2014-136
Naive Augmenting Q-Learning to Process Feature-Based Representations of States
Janis Zuters
Abstract:
Temporal difference algorithms perform well on discrete and small problems. This paper proposes a modification of the Q-learning algorithm towards the natural ability to receive a feature list, instead of an already identified state, as input. Complete observability is still assumed. The algorithm, Naive Augmenting Q-Learning, is designed around a hierarchical structure of input features (a kind of feature-state mapping) to avoid direct state identification, thus potentially optimizing the resources required for storing and processing action values.
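The baseline being modified is standard Q-learning, whose update on a transition $(s, a, r, s')$ is
$$Q(s,a) \leftarrow Q(s,a) + \alpha \left[ r + \gamma \max_{a'} Q(s',a') - Q(s,a) \right],$$
with learning rate $\alpha$ and discount factor $\gamma$; the proposed algorithm replaces the explicit state index $s$ by a hierarchical structure built over the input features.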
ES2014-132
Direct Model-Predictive Control
Jean-Joseph Christophe, Jérémie Decock, Olivier Teytaud
Abstract:
Due to its simplicity, efficiency and convenience, Model Predictive Control, which consists in optimizing future decisions based on a pessimistic deterministic forecast of the random processes, is one of the main tools for stochastic control. Yet, it suffers from a large computation time, unless the tactical horizon (i.e. the number of future time steps included in the optimization) is strongly reduced, and from a lack of real stochasticity handling. We here propose a combination of Model Predictive Control and Direct Policy Search (using Neural Networks), combining the best of both worlds.
ES2014-127
Ensembles of extreme learning machine networks for value prediction
Pablo Escandell-Montero, José María Martínez Martínez, Emilio Soria-Olivas, Joan Vila-Francés, José David Martín-Guerrero
Abstract:
Value prediction is an important subproblem of several reinforcement learning (RL) algorithms. In previous work, it has been shown that the combination of least-squares temporal-difference learning with ELM (extreme learning machine) networks is a powerful method for value prediction in continuous-state problems. This work proposes the use of ensembles to improve the approximation capabilities of ELM networks in the context of RL.
ES2014-108
An application of the temporal difference algorithm to the truck backer-upper problem
Christopher Gatti, Mark Embrechts
Abstract:
We use a reinforcement learning approach to learn a real-world control problem, the truck backer-upper problem. In this problem, a tractor-trailer truck must be backed into a loading dock from an arbitrary location and orientation. Our approach uses the temporal difference algorithm with a neural network as the value function approximator. The novelty of this work is that our implementation is simple, yet it is able to successfully back the truck into the loading dock from random initial locations and orientations.
ES2014-175
Application of Newton's Method to action selection in continuous state- and action-space reinforcement learning
Barry Nichols, Dimitris Dracopoulos
Abstract:
An algorithm based on Newton's Method is proposed for action selection in continuous state- and action-space reinforcement learning, without a policy network or discretization. The proposed method is validated on two benchmark problems, Cart-Pole and double Cart-Pole, on which it achieves comparable or improved performance with fewer parameters to tune and in fewer training episodes than CACLA, which has previously been shown to outperform many other continuous state- and action-space reinforcement learning algorithms.
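The core idea is to treat the learned value function as differentiable in the action and to maximize it with Newton iterations. A one-dimensional sketch using finite differences (the value model and step details are illustrative assumptions):

    def select_action(Q, s, a0, iters=10, h=1e-4):
        """Maximize a -> Q(s, a) by Newton's method from initial guess a0."""
        a = a0
        for _ in range(iters):
            g = (Q(s, a + h) - Q(s, a - h)) / (2 * h)             # dQ/da
            H = (Q(s, a + h) - 2 * Q(s, a) + Q(s, a - h)) / h**2  # d2Q/da2
            if abs(H) < 1e-8:
                break
            a -= g / H  # Newton step; finds a stationary point, so in practice
        return a        # one also checks that it is indeed a maximum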
ES2014-39
Linear Scalarized Knowledge Gradient in the Multi-Objective Multi-Armed Bandits Problem
Saba Yahyaa, Madalina Drugan, Bernard Manderick
Abstract:
The multi-objective multi-armed bandits (MOMABs) problem is a Markov decision process with stochastic rewards. Each arm generates a vector of rewards instead of a single reward, and these multiple rewards might be conflicting. The agent has a set of optimal arms, and its goal is not only to find the optimal arms but also to play them fairly. To find the optimal arm set, the agent uses a linear scalarized (LS) function, which converts the multi-objective arms into one-objective arms. The LS function is simple; however, it cannot find the whole optimal arm set. We therefore extend the knowledge gradient (KG) policy to the LS function, and propose two variants of linear scalarized KG: LS-KG across arms and LS-KG across dimensions. We compare the two variants experimentally: LS-KG across arms finds the optimal arm set, while LS-KG across dimensions plays the optimal arms fairly.
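Linear scalarization converts each arm's estimated mean reward vector $\hat{\mu}_i \in \mathbb{R}^D$ into a scalar through a weight vector $w$,
$$f_{w}(\hat{\mu}_i) = \sum_{d=1}^{D} w_d\, \hat{\mu}_{i,d}, \qquad w_d \ge 0, \quad \sum_{d} w_d = 1,$$
after which a one-objective policy such as knowledge gradient plays $\arg\max_i f_{w}(\hat{\mu}_i)$. The known weakness is that arms lying on non-convex parts of the Pareto front maximize $f_w$ for no choice of $w$, which is why the plain LS function cannot find the whole optimal arm set.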
ES2014-114
Improving the firefly algorithm through the Barnes-Hut tree code
Kiril Ralinovski
Abstract:
The firefly algorithm is a nature-inspired meta-heuristic algorithm that has a variety of applications such as multimodal optimization, clustering and finding good solutions for NP-hard problems. The original algorithm and modifications thereof have so far always calculated interactions between all fireflies individually, which leads to a complexity of O(n^2). In this paper we present a novel approach that reduces the complexity to O(n log(n)) in lower dimensions by using the Barnes-Hut tree code, which is used for n-body simulations in physics. This is possible due to the similar nature of both problems and requires only small modifications.
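The Barnes-Hut idea is to store the fireflies in a quad-/octree and to approximate the aggregate influence of a distant cell by a single summary point: a cell of side length $s$ at distance $d$ from the firefly under consideration is accepted as one body whenever
$$\frac{s}{d} < \theta$$
for a fixed opening parameter $\theta$ (values around 0.5 are common in n-body codes), and is opened recursively otherwise, which yields the overall O(n log(n)) cost.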
ES2014-106
Improved Cat Swarm Optimization approach applied to reliability-redundancy problem
Carlos E. Klein, Leandro Coelho, Ângelo M. O. Sant’Anna, Roberto Z. Freire, Viviana C. Mariani
Abstract:
System reliability-redundancy optimization plays a vital role in real-world applications. Recently, a new swarm-intelligence meta-heuristic called the cat swarm optimization (CSO) algorithm has emerged. CSO is a stochastic optimization paradigm inspired by the natural behavior of cats. To enhance the performance of the CSO algorithm, an improved adaptive CSO (ICSO) algorithm is presented. Both the CSO and ICSO approaches were applied to an overspeed protection system for a gas turbine, a benchmark in the reliability-redundancy mixed-integer optimization field. The better results obtained by ICSO show that the algorithm can be an efficient alternative for solving reliability problems.
Nonlinear dimensionality reduction
ES2014-165
Relevance Learning for Dimensionality Reduction
Alexander Schulz, Andrej Gisbrecht, Barbara Hammer
Abstract:
Nonlinear dimensionality reduction (NDR) techniques offer powerful data visualization schemes capturing nonlinear effects of the data, at the cost of decreased interpretability of the projection: unlike for linear counterparts such as principal component analysis, the relevance of the original feature dimensions for the NDR projection is not clear. In this contribution we propose relevance learning schemes for NDR which make it possible to judge the relevance of a feature dimension for the projection. This technique can be extended to a metric learning scheme, which opens a way to imprint the information provided by a given visualization onto the data representation in the original feature space.
ES2014-76
Dimensionality reduction in decentralized networks by Gossip aggregation of principal components analyzers
Jérôme FELLUS, David Picard, Philippe-Henri Gosselin
Abstract:
This paper considers dimensionality reduction in large decentralized networks with limited node-local computing and memory resources and unreliable point-to-point connectivity (e.g., peer-to-peer, sensor or ad-hoc mobile networks). We propose an asynchronous decentralized algorithm built on a Gossip consensus protocol that performs Principal Component Analysis (PCA) of data spread over such networks. All nodes obtain the same local basis, which spans the global principal subspace. Reported experiments show that the obtained bases both reach a consensus and accurately estimate the global PCA solution.
ES2014-64
Multiscale stochastic neighbor embedding: Towards parameter-free dimensionality reduction
John A. Lee, Diego H. Peluffo-Ordonez, Michel Verleysen
Abstract:
Stochastic neighbor embedding (SNE) is a method of dimensionality reduction that involves softmax similarities measured between all pairs of data points. To build a suitable embedding, SNE tries to reproduce in a low-dimensional space the similarities that are observed in the high-dimensional data space. Previous work has investigated the immunity of such similarities to norm concentration, as well as enhanced cost functions. This paper proposes an additional refinement, in the form of multiscale similarities, namely weighted sums of softmax ratios with decreasing bandwidths. The objective is to maximize the embedding quality at all scales, with a better preservation of both local and global neighborhoods, and also to exempt the user from having to fix a scale arbitrarily. Experiments on several data sets show that this multiscale version of SNE, combined with an appropriate cost function (sum of Jensen-Shannon divergences), outperforms all previous variants of SNE.
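In SNE, the similarity of neighbor $j$ to point $i$ is the softmax ratio
$$p_{j|i} = \frac{\exp\left(-\delta_{ij}^{2} / (2\sigma_i^{2})\right)}{\sum_{k \neq i} \exp\left(-\delta_{ik}^{2} / (2\sigma_i^{2})\right)},$$
with $\delta_{ij}$ the high-dimensional distance and $\sigma_i$ a bandwidth; the multiscale refinement replaces this single-bandwidth similarity by a weighted sum of such ratios computed with decreasing bandwidths, so that no single scale needs to be fixed by the user.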
ES2014-75
Interactive dimensionality reduction for visual analytics
Ignacio Diaz-Blanco, Abel A. Cuadrado-Vega, Daniel Perez-Lopez, Francisco Garcia-Fernandez, Michel Verleysen
Abstract:
In this work, we present a novel approach for data visualization based on interactive dimensionality reduction (iDR). The main idea of the paper is to visualize the intermediate results of non-convex DR algorithms under user-driven changes of the metric of the input data space. With an appropriate visualization interface, our approach allows the user to focus on the relationships among dynamically selected groups of variables, as well as to assess the impact of a single variable or groups of variables on the structure of the data.
ES2014-170
Recent methods for dimensionality reduction: A brief comparative analysis
Diego H. Peluffo-Ordonez, John A. Lee, Michel Verleysen
Abstract:
Dimensionality reduction is a key stage in both the design of a pattern recognition system and data visualization. Recently, there has been an increasing interest in methods aimed at preserving the data topology. Among them, Laplacian eigenmaps (LE) and stochastic neighbour embedding (SNE) are the most representative. In this work, we present a brief comparative analysis of very recent methods that are alternatives to LE and SNE. Comparisons are done mainly on two aspects: algorithm implementation and complexity. Relations between the methods are also depicted. The goal of this work is to provide some discussion and decision criteria for choosing a method according to the user's needs and/or keeping a good trade-off between performance and required processing time.
Signal and temporal processing
ES2014-123
Capturing confounding sources of variation in DNA methylation data by spatiotemporal independent component analysis
Emilie Renard, Andrew Teschendorff, Pierre-Antoine Absil
Abstract:
Confounding sources of variation, which are often either unknown or known with error, are widespread in genomic datasets, and failing to adjust for them may adversely impact statistical inference. In this context, we propose a `spatiotemporal' independent component analysis method that possesses a novel invariance property, and we show that the spatiotemporal aspect may increase the ability of the method to model confounding sources of variation.
ES2014-126
Towards an effective multi-map self organizing recurrent neuronal network
Denis Baheux, Hervé Frezza-Buet, Jeremy Fix
Abstract:
This paper presents a multi-map joint self-organizing architecture able to represent non-Markovian temporal sequences. The proposed architecture is inspired by previous works based on dynamic neural fields, and provides a faster, easier-to-handle architecture, making it easier to scale to higher-dimensional machine learning problems.
ES2014-94
Iterative ARIMA-multiple support vector regression models for long term time series prediction
João Oliveira, Teresa B. Ludermir
Abstract:
Support Vector Regression (SVR) has been widely applied in time series forecasting. For long-term predictions, iterative prediction performs many one-step-ahead predictions until the desired horizon is reached. This process accumulates the error from previous predictions and may affect the quality of the forecasts. In order to improve long-term iterative predictions, a hybrid multiple Autoregressive Integrated Moving Average (ARIMA)-SVR model is applied to perform predictions considering the linear and non-linear components of the time series. The results show that the proposed method produces more accurate predictions in the long-term context.
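The iterative scheme itself is simple: each one-step prediction is fed back as an input for the next step. A sketch for a generic one-step model, where the window length and interfaces are illustrative assumptions; in the hybrid scheme, model would roughly combine an ARIMA prediction of the linear component with SVR predictions of the non-linear component:

    def iterative_forecast(model, history, horizon, window=12):
        """model(inputs) -> one-step-ahead prediction; feeds predictions back."""
        series = list(history)
        for _ in range(horizon):
            y_hat = model(series[-window:])  # one-step-ahead prediction
            series.append(y_hat)             # becomes an input for the next step
        return series[len(history):]         # the horizon-long forecast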
ES2014-69
Region of interest detection using MLP
Tommi Kärkkäinen, Alexandr Maslov, Pekka Wartiainen
Abstract:
A novel technique to detect regions of interest in a time series as deviations from the characteristic behavior is proposed. The deterministic form of a signal is obtained using a reliably trained MLP neural network with detailed complexity management and cross-validation-based generalization assurance. The proposed technique is demonstrated with simulated and real data.
ES2014-142
Analysis of the Weighted Fuzzy C-means in the problem of source location
Everton Z. Nadalin, Rodrigo C. Silva, Romis Attux, João M. T. Romano
Abstract:
This paper proposes the use of the clustering method called Weighted Fuzzy C-means to solve the problem of mixing matrix estimation in underdetermined source separation based on sparse component analysis. The performed comparative analysis shows that the approach has a significant application potential, especially if the distributions of the columns of the mixing matrix have a non-uniform character.
ES2014-125
An Optimized Learning Algorithm Based on Linear Filters Suitable for Hardware implemented Self-Organizing Maps
Marta Kolasa, Rafal Dlugosz, Tomasz Talaska, Witold Pedrycz
Abstract:
In this study, we present a fast and energy-efficient learning algorithm suitable for Self-Organizing Maps (SOMs) realized in hardware. The proposed algorithm is an extension of the classical algorithm used in the Kohonen SOM. It is based on the observation that the quantization error, which is a typical quality measure of the learning process, does not decrease linearly over the course of learning. One can observe phases of increased ‘activity’, during which the quantization error rapidly decreases, followed by ‘stagnation’ phases, during which its values remain almost the same. The activity phases occur just after decreasing the neighborhood radius, R. A set of finite impulse response (FIR) filters is used to detect both phases. This enables automatic switching of the radius R to a smaller value, which shortens a given stagnation phase and starts a new activity phase. Comprehensive investigations carried out by means of a software model of the SOM show that the learning process can be shortened by as much as 80-95%, which allows for a reduction of energy consumption by as much as 70-90%.
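A minimal sketch of the detection idea, assuming two simple moving-average FIR filters: when a short and a long average of the quantization error agree, the error curve is flat (stagnation) and the radius can be decreased. Filter lengths and the tolerance are illustrative, not the paper's values.

```python
import numpy as np

def moving_average(x, n):
    return np.convolve(x, np.ones(n) / n, mode="valid")

def stagnation_detected(qe_history, short=5, long=25, rel_tol=0.01):
    if len(qe_history) < long:
        return False
    s = moving_average(np.asarray(qe_history), short)[-1]
    l = moving_average(np.asarray(qe_history), long)[-1]
    return abs(s - l) / max(l, 1e-12) < rel_tol   # flat QE curve => stagnation

# Pseudo-usage inside a SOM training loop:
# if stagnation_detected(qe_per_epoch) and R > 0:
#     R -= 1    # start a new activity phase with a smaller radius
```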
ES2014-115
Enhanced NMF initialization using a physical model for pollution source apportionment
Marc Plouvin, Abdelhakim Limem, Matthieu Puigt, Gilles Delmaire, Gilles Roussel, Dominique Courcot
Abstract:
In a previous work, we proposed an informed Non-negative Matrix Factorization (NMF) with a specific parametrization which involves constraints about some known components of the factorization. In this paper we extend the above work by adding some information provided by a physical dispersion model. In particular, we derive a special structure of one of the factorizing matrices, which provides a better initialization of the NMF procedure. Experiments on simulated mixtures of particulate matter sources show that our new approach outperforms both our previous one and the state-of-the-art NMF methods.
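As a hedged illustration of informed initialization (not the paper's physically derived structure), scikit-learn's NMF accepts a user-supplied factor pair via init='custom', so prior knowledge can be encoded directly in the starting matrices:

```python
import numpy as np
from sklearn.decomposition import NMF

rng = np.random.default_rng(0)
X = rng.random((50, 10))               # stand-in for samples x species data
k = 3

H0 = rng.random((k, 10))
H0[0, :] = np.linspace(1.0, 0.1, 10)   # hypothetical known source profile
W0 = rng.random((50, k))

model = NMF(n_components=k, init="custom", max_iter=500)
W = model.fit_transform(X, W=W0, H=H0)  # informed init replaces random init
H = model.components_
print(model.reconstruction_err_)
```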
ES2014-14
A New Error-Correcting Syndrome Decoder with Retransmit Signal Implemented with an Hardlimit Neural Network
Jose Fonseca
Abstract:
Counting the errors of a noisy received word is still an open problem in the literature. This means that, when we use an error-correcting code, we cannot tell whether the number of errors in the received noisy word exceeds the error-correction capability of the code of k errors, k=(d-1)/2, where d is the minimum Hamming distance of the code. The main advantage of our proposal results from the introduction of the Retransmit signal, raised when the syndrome decoder detects an ambiguity situation and cannot correct the noisy word. These ambiguity situations occur when one more error occurs than the error-correction capability of the code allows. This property of the error-correcting syndrome scheme allows the error-correction capability of a code to be increased by one error at a small cost in bandwidth or transmission delay. Although there are some proposals in the literature for implementing error-correcting decoders with neural networks, our work differs in three main aspects. First, we propose the implementation of the Retransmit signal based on the detection of ambiguity of the minimum Hamming distance between the received word and the codewords, i.e. when more than one codeword lies at the minimum Hamming distance from the too-noisy received word. Second, we use a constructive approach that does not need training. Finally, we use hardlimit neurons that can be implemented in hardware by a single transistor in a high-gain setup. We begin with two exhaustive simulation experiments in which we introduce all possible occurrences of two errors in all codewords of two codes with minimum Hamming distances 3 and 4, respectively, which only guarantee the correction of all single errors, to show how ambiguities arise in the decoding process. Next, we present the building blocks of the error-correcting decoder based on hardlimit multilayer perceptrons, assemble them, and show an example of an error-correcting decoder for a four-codeword code. Finally, we discuss the advantages of our proposal and the consequences of the introduction of the Retransmit signal, and outline possible directions for the evolution of our work.
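The decoding rule described above can be sketched without any neural implementation: decode to the unique nearest codeword, and raise the Retransmit signal when several codewords tie at the minimum Hamming distance. The toy four-codeword code below is an arbitrary example.

```python
import numpy as np

CODEWORDS = np.array([[0, 0, 0, 0, 0],     # a toy 4-codeword code, d_min = 3
                      [0, 1, 1, 0, 1],
                      [1, 0, 1, 1, 0],
                      [1, 1, 0, 1, 1]])

def decode(word):
    d = (CODEWORDS != np.asarray(word)).sum(axis=1)   # Hamming distances
    nearest = np.flatnonzero(d == d.min())
    if len(nearest) > 1:
        return None, True            # ambiguity -> raise Retransmit
    return CODEWORDS[nearest[0]], False

print(decode([0, 1, 1, 1, 1]))   # one error from 01101 -> corrected
print(decode([1, 1, 1, 0, 0]))   # two codewords tie -> retransmit
```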
Learning of structured and non-standard data
ES2014-11
Recent trends in learning of structured and non-standard data
Frank-Michael Schleif, Peter Tino, Thomas Villmann
Abstract:
In many application domains data are not given in a classical vector space but occur in the form of structural, sequential, or relational characteristics, or other non-standard formats. These data are often represented as graphs or by means of proximity matrices. Often these data sets are also huge and mathematically complicated to treat, calling for new efficient analysis algorithms, which are the focus of this tutorial.
ES2014-58
Support Vector Ordinal Regression using Privileged Information
Fengzhen Tang, Peter Tino, Pedro Antonio Gutiérrez, Huanhuan Chen
Abstract:
We introduce a new methodology, called SVORIM+, for utilizing privileged information of the training examples, unavailable in the test regime, to improve generalization performance in ordinal regression. The privileged information is incorporated during the training by modelling the slacks through correcting functions for each of the parallel hyperplanes separating the ordered classes. The experimental results on several benchmark and time series datasets show that inclusion of the privileged information during training can boost the generalization performance significantly.
ES2014-70
Segmented shape-symbolic time series representation
Herbert Teun Kruitbosch, Ioannis Giotis, Michael Biehl
Abstract:
This work introduces a symbolic time series representation using monotonic sub-sequences and bottom-up segmentation. The representation minimizes the squared error between the segments and their approximation by monotonic functions. The representation can robustly classify the direction of a segment and is scale invariant with respect to the time and value dimensions. This paper describes two experiments. The first shows how accurately the monotonic functions are able to discriminate between different segments. The second tests how well the segmentation technique recognizes segments and classifies them with correct symbols. Finally, this paper illustrates the new representation on real-world data.
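A sketch of the per-segment step, assuming isotonic regression as the monotonic approximator: each segment is scored by the squared error of its best increasing and decreasing fits, and the cheaper fit determines the symbol. The bottom-up merging of segments is omitted.

```python
import numpy as np
from sklearn.isotonic import IsotonicRegression

def monotonic_error(y):
    """Return (error_up, error_down) of the best monotone fits to y."""
    x = np.arange(len(y))
    up = IsotonicRegression(increasing=True).fit_transform(x, y)
    down = IsotonicRegression(increasing=False).fit_transform(x, y)
    return ((y - up) ** 2).sum(), ((y - down) ** 2).sum()

def symbol(y):
    e_up, e_down = monotonic_error(y)
    return "U" if e_up <= e_down else "D"   # 'U'p or 'D'own segment

rng = np.random.default_rng(0)
seg = np.cumsum(rng.random(30))             # noisy increasing segment
print(symbol(seg), monotonic_error(seg))
```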
ES2014-82
Adaptive distance measures for sequential data
Bassam Mokbel, Benjamin Paassen, Barbara Hammer
Abstract:
Recent extensions of learning vector quantization (LVQ) to general (dis-)similarity data have paved the way towards LVQ classifiers for possibly discrete, structured objects such as sequences addressed by classical alignment. In this contribution, we propose a metric learning scheme based on this framework which allows for autonomous learning of the underlying scoring matrix according to a given discriminative task. Besides facilitating the often crucial and problematic choice of the scoring matrix in applications, this extension offers an increased interpretability of the results by pointing out structural invariances for the given task.
ES2014-153
Applications of lp-Norms and their Smooth Approximations for Gradient Based Learning Vector Quantization
Mandy Lange, Dietlind Zühlke, Olaf Holz, Thomas Villmann
Abstract:
Learning vector quantization applying non-standard metrics has become quite popular for improving classification performance compared to standard approaches using the Euclidean distance. Kernel metrics and quadratic forms belong to the most promising approaches. In this paper we consider Minkowski distances ($l_{p}$-norms). In particular, $l_{1}$-norms are known to be robust against noise in data, such that, if this structural knowledge about the data is available in advance, this norm should be utilized. However, applications in gradient-based learning algorithms relying on distance evaluations need to calculate the respective derivatives. Because $l_{p}$-distance formulas contain the absolute value function, which is not differentiable at zero, smooth approximations thereof are required. We consider in this paper several approaches for smooth consistent approximations for numerical evaluations and demonstrate the applicability in exemplary real-world applications.
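Two standard smooth surrogates for the absolute value, and the gradient of the resulting smoothed $l_p$ distance, can be sketched as follows. The surrogates shown are common choices and not necessarily the ones compared in the paper.

```python
import numpy as np

def abs_sqrt(x, eps=1e-6):
    return np.sqrt(x * x + eps)          # |x| ~ sqrt(x^2 + eps)

def abs_tanh(x, beta=50.0):
    return x * np.tanh(beta * x)         # |x| ~ x tanh(beta x)

def lp_dist_smooth(x, y, p=1.5, eps=1e-6):
    return (abs_sqrt(x - y, eps) ** p).sum() ** (1 / p)

def lp_grad(x, y, p=1.5, eps=1e-6):
    """Gradient of the smoothed l_p distance w.r.t. x (chain rule)."""
    d = x - y
    a = abs_sqrt(d, eps)
    s = (a ** p).sum()
    # d/dx of (sum a^p)^(1/p) = (sum a^p)^(1/p - 1) * a^(p-1) * da/dx
    return s ** (1 / p - 1) * a ** (p - 1) * (d / a)

x, y = np.array([1.0, -2.0, 0.0]), np.array([0.5, 0.0, 0.1])
print(lp_dist_smooth(x, y), lp_grad(x, y))
```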
ES2014-155
Utilization of Chemical Structure Information for Analysis of Spectra Composites
Kristin Domaschke, André Roßberg, Thomas Villmann
Abstract:
In this paper, we propose the utilization of structural information of spectral data during preprocessing to extend the ability of subsequent analysis methods. Specifically, a composite dataset of measured spectra is given, containing mixtures of only a few spectral components. Using the mixture information for a small subset and additional chemical knowledge, theoretical component spectra are generated, which can be used as additional data with known mixture information in further data processing. The resulting extended data set is analyzed using a self-organizing map to predict the unknown mixture ratios of the remaining subset by associative learning.
ES2014-190
Weighted tree kernels for sequence analysis
Christopher Bowles, James Hogan
Abstract:
Genomic sequences are fundamentally text documents, admitting a number of representations according to need and tokenization. Control of gene expression depends crucially on the binding of enzymes to the DNA sequence at a number of small, often poorly conserved, binding sites, precluding successful application of standard pattern search tools. However some progress is possible due to the regular syntactic structure of the enzyme's component proteins and the corresponding binding sites, allowing us to view the problem as one of detecting grammatically correct genomic phrases. In this paper we propose new kernels which extend existing tree kernels for application to weighted trees, in order to capture the features which underpin this prediction task. Experimentally, we find that these kernels provide performance comparable with state of the art approaches for this problem, while offering significant computational advantages over earlier methods. The methods proposed may be applied to a broad range of sequence or tree-structured data, including many other word-based classification tasks in molecular biology and other domains.
Kernel methods
ES2014-129
Easy multiple kernel learning
Fabio Aiolli, Michele Donini
Abstract:
The goal of Multiple Kernel Learning (MKL) is to combine kernels derived from multiple sources in a data-driven way with the aim of enhancing the accuracy of a kernel-based machine. In this paper, we propose a time- and space-efficient MKL algorithm that can easily cope with hundreds of thousands of kernels and more. We compared our algorithm with other baselines plus three state-of-the-art MKL methods, showing that our approach is often superior.
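For orientation, a common MKL baseline (not the authors' algorithm) weights each base kernel by its centered alignment with the ideal kernel yy^T and feeds the convex combination to a standard SVM:

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.metrics.pairwise import rbf_kernel

def center(K):
    n = K.shape[0]
    H = np.eye(n) - np.ones((n, n)) / n
    return H @ K @ H

def alignment(K, y):
    Kc, T = center(K), np.outer(y, y)
    return (Kc * T).sum() / (np.linalg.norm(Kc) * np.linalg.norm(T))

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
y = np.sign(X[:, 0] + 0.3 * rng.normal(size=100))

kernels = [rbf_kernel(X, gamma=g) for g in (0.01, 0.1, 1.0, 10.0)]
w = np.maximum([alignment(K, y) for K in kernels], 0)
w = w / w.sum()                                   # convex combination weights
K = sum(wi * Ki for wi, Ki in zip(w, kernels))

clf = SVC(kernel="precomputed").fit(K, y)
print(w, clf.score(K, y))
```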
ES2014-35
Joint SVM for Accurate and Fast Image Tagging
Hanchen Xiong, Sandor Szedmak, Justus Piater
Abstract:
This paper studies how joint training of multiple support vector machines (SVMs) can improve the effectiveness and efficiency of automatic image annotation. We cast image annotation as an output-related multi-task learning framework, with the prediction of each tag’s presence as one individual task. Evidently, these tasks are related via correlations between tags. The proposed joint learning framework, which we call joint SVM, can encode the correlation between tags by defining appropriate kernel functions on the outputs. Another practical merit of the joint SVM is that it shares the same computational complexity as one single conventional SVM, although multiple tasks are solved simultaneously. According to our empirical results on an image-annotation benchmark database, our joint training strategy of SVMs can yield substantial improvements, in terms of both accuracy and efficiency, over training them independently. In particular, it outperforms many other state-of-the-art algorithms.
ES2014-59
Kernel methods for mixed feature selection
Jérôme Paul, Pierre Dupont
Abstract:
This paper introduces two feature selection methods to deal with heterogeneous data that include continuous and categorical variables. We propose to plug a dedicated kernel that handles both kinds of variables into a Recursive Feature Elimination procedure using either a non-linear SVM or Multiple Kernel Learning. These methods are shown to offer significantly better predictive results than state-of-the-art alternatives on a variety of high-dimensional classification tasks.
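A minimal sketch of a kernel over mixed variables, assuming squared Euclidean distance on continuous columns and a simple mismatch count on categorical ones inside a Gaussian; the paper's dedicated kernel may differ:

```python
import numpy as np

def mixed_kernel(X1, X2, cont_idx, cat_idx, gamma=0.5, lam=1.0):
    K = np.empty((len(X1), len(X2)))
    for i, a in enumerate(X1):
        for j, b in enumerate(X2):
            d_cont = np.sum((a[cont_idx] - b[cont_idx]) ** 2)
            d_cat = np.sum(a[cat_idx] != b[cat_idx])   # overlap (dis)similarity
            K[i, j] = np.exp(-gamma * (d_cont + lam * d_cat))
    return K

X = np.array([[0.1, 2.0, 0, 1],
              [0.2, 1.8, 0, 2],
              [0.9, 0.3, 1, 1]], dtype=float)
K = mixed_kernel(X, X, cont_idx=[0, 1], cat_idx=[2, 3])
print(K)
```

This combination stays positive definite because it is the product of an RBF kernel on the continuous part and an exponential of a (negated) Hamming distance on the categorical part.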
ES2014-147
The one-sided mean kernel: a positive definite kernel for time series
Nicolas Chrysanthos, Pierre Beauseroy, Hichem Snoussi, Edith Grall-Maes, Fabrice Ferrand
Abstract:
We propose in this paper a new kernel for time series in the dynamic time warping family. We demonstrate using the theory of infinitely divisible kernels that this kernel is positive definite. We also illustrate many other interesting practical properties: it is a radial basis kernel, has no issues of diagonal dominance, and presents a consistent behavior in the case of time series sub-sampling.
ES2014-150
A robust regularization path for the Doubly Regularized Support Vector Machine
Antoine Lachaud, Stéphane Canu, David Mercier, Frederic Suard
Abstract:
The Doubly Regularized SVM (DrSVM) is an extension of the SVM using a mixture of L2 and L1 norm penalties. This kind of penalty, sometimes referred to as the elastic net, allows variable selection to be performed while taking into account correlations between variables. An efficient algorithm to compute the whole DrSVM solution path was introduced by Wang. Unfortunately, in some cases, this path is discontinuous, and thus not piecewise linear. To solve this problem, we propose here a new subgradient formulation of the DrSVM problem. This leads us to propose an alternative L1 regularization path algorithm. This reformulation efficiently addresses the aforementioned problem and makes the initialization step more generic. The results show the validity of our subgradient formulation and its efficiency compared to the initial formulation.
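As a hedged illustration of the elastic-net penalty for SVMs (a stochastic solver, not the DrSVM path algorithm), scikit-learn's SGDClassifier optimizes the hinge loss with a mixed L1/L2 penalty:

```python
import numpy as np
from sklearn.linear_model import SGDClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 20))
X[:, 1] = X[:, 0] + 0.05 * rng.normal(size=200)   # two correlated features
y = np.sign(X[:, 0] + X[:, 2])

clf = SGDClassifier(loss="hinge", penalty="elasticnet",
                    alpha=0.01, l1_ratio=0.5, max_iter=2000, random_state=0)
clf.fit(X, y)
print("selected features:", np.flatnonzero(np.abs(clf.coef_[0]) > 1e-8))
```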
ES2014-34
Tensor decomposition of dense SIFT descriptors in object recognition
Tan Vo, Dat Tran, Ma Wanli
Abstract:
In machine vision, the Scale-Invariant Feature Transform (SIFT) and its variants have been widely used in image classification tasks. However, the high-dimensional nature of SIFT features, usually in the order of multiple thousands per image, requires careful consideration to achieve accurate and timely categorization of objects within images. This paper explores the possibility of processing SIFT features as tensors and uses tensor decomposition techniques on high-order SIFT tensors for dimensionality reduction. The method focuses on both accuracy and efficiency, and the validation results on the Caltech 101 dataset confirm the improvement by notable margins.
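One simple form of tensor dimensionality reduction is a truncated HOSVD, sketched below in plain NumPy on a hypothetical grid-by-grid-by-descriptor tensor; the paper's exact decomposition and tensor layout may differ.

```python
import numpy as np

def unfold(T, mode):
    return np.moveaxis(T, mode, 0).reshape(T.shape[mode], -1)

def hosvd_project(T, ranks):
    """Project tensor T to a smaller core using leading singular vectors."""
    core = T
    for mode, r in enumerate(ranks):
        U, _, _ = np.linalg.svd(unfold(T, mode), full_matrices=False)
        Ur = U[:, :r]
        core = np.moveaxis(
            np.tensordot(Ur.T, np.moveaxis(core, mode, 0), axes=1), 0, mode)
    return core

T = np.random.default_rng(0).random((32, 16, 128))  # e.g. grid x grid x SIFT dims
core = hosvd_project(T, ranks=(8, 8, 32))
print(core.shape)        # (8, 8, 32): a compact descriptor tensor
```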
ES2014-28
Fine-tuning of support vector machine parameters using racing algorithms
Péricles Miranda, Ricardo Silva, Ricardo Prudêncio
Abstract:
This paper investigates the iterative racing approach, I/F-Race, for selecting parameters of SVMs. As a racing algorithm, I/F-Race eliminates candidate models as soon as there is sufficient statistical evidence of their inferiority relative to other models with respect to the objective. The results revealed that the I/F-Race algorithm was able to achieve better parameter values in comparison to default parameters used in literature and parameters suggested by particle swarm optimization techniques.
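A simplified racing loop conveys the principle (I/F-Race itself uses Friedman-test based elimination and iterated candidate sampling): configurations are evaluated fold by fold and dropped once a paired test shows they are significantly worse than the current leader.

```python
import numpy as np
from scipy.stats import ttest_rel
from sklearn.datasets import make_classification
from sklearn.model_selection import StratifiedKFold
from sklearn.svm import SVC

X, y = make_classification(n_samples=300, random_state=0)
candidates = [{"C": c, "gamma": g} for c in (0.1, 1, 10) for g in (0.01, 0.1, 1)]
scores = {i: [] for i in range(len(candidates))}
alive = set(scores)

for train, test in StratifiedKFold(n_splits=10, shuffle=True,
                                   random_state=0).split(X, y):
    for i in list(alive):
        acc = SVC(**candidates[i]).fit(X[train], y[train]).score(X[test], y[test])
        scores[i].append(acc)
    if len(scores[next(iter(alive))]) >= 4:       # need a few folds of evidence
        best = max(alive, key=lambda i: np.mean(scores[i]))
        for i in list(alive - {best}):
            t, p = ttest_rel(scores[best], scores[i])
            if p < 0.05 and t > 0:                # significantly worse: eliminate
                alive.discard(i)

print("surviving configs:", [candidates[i] for i in alive])
```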
ES2014-4
Reject option paradigm for the reduction of support vectors
Sousa Ricardo, Ajalmar R. da Rocha Neto, Guilherme Barreto, Jaime S. Cardoso, Miguel T. Coimbra
Abstract:
In this paper we introduce a new conceptualization for the reduction of the number of support vectors (SVs) for an efficient design of support vector machines (SVMs). The techniques presented here provide a good balance between SV reduction, performance accuracy and generalization capability. Our proposal explores concepts from classification with reject option mechanisms, which output a third class (the rejected instances) for a binary problem when a prediction cannot be given with sufficient confidence. Rejected instances, along with misclassified ones, are discarded from the original data to give rise to a classification problem that can be linearly solved. Our experimental study on two benchmark datasets shows significant gains in terms of SV reduction with competitive performance.
ES2014-156
The Choquet kernel for monotone data
Ali Fallah Tehrani, Marc Strickert, Eyke Hüllermeier
Abstract:
In this paper, we introduce a kernel for monotone data derived from the Choquet integral with its underlying fuzzy measure. While a naïve computation of this kernel has a complexity that is exponential in the number of data attributes, we propose a more efficient approach with quadratic time complexity. Kernel PCA and SVM classification are employed to illustrate characteristics and benefits of the new Choquet kernel in two experiments related to decision-making and pricing.
Learning and Modeling Big Data
ES2014-8
Learning and modeling big data
Barbara Hammer, Haibo He, Thomas Martinetz
Abstract:
Caused by powerful sensors, advanced digitalisation techniques, and dramatically increased storage capabilities, big data in the sense of large or streaming data sets, very high dimensionality, or complex data formats constitutes one of the major challenges faced by machine learning today. In this realm, several typical assumptions of machine learning can no longer be met, such as the possibility to deal with all data in batch mode or data being identically distributed; this creates the need for novel algorithmic developments and paradigm shifts, or for the adaptation of existing approaches to cope with such situations. The goal of this tutorial is to give an overview of recent machine learning approaches for big data, with a focus on principled algorithmic ideas in the field.
ES2014-61
Agglomerative hierarchical kernel spectral clustering for large scale networks
Mall Raghvendra, Langone Rocco, Suykens Johan
Abstract:
We propose an agglomerative hierarchical kernel spectral clustering (AH-KSC) model for large scale complex networks. The kernel spectral clustering (KSC) method uses a primal-dual framework to build a model on a subgraph of the network. We exploit the structure of the projections in the eigenspace to automatically identify a set of distance thresholds. These thresholds lead to the different levels of hierarchy in the network. We use these distance thresholds on the eigen-projections of the entire network to obtain a hierarchical clustering in an agglomerative fashion. The proposed approach locates several levels of hierarchy which have clusters with high modularity (Q) and high adjusted rand index (ARI) w.r.t. the ground-truth communities. We compare AH-KSC with two state-of-the-art large scale hierarchical community detection techniques.
ES2014-33
Proximity learning for non-standard big data
Frank-Michael Schleif
Abstract:
Complicated data sets, in terms of high volume and variety, are more and more common. In the life science domain, for example, modern measurement systems generate huge amounts of data, which are only partially structured but have to be analyzed in reasonable time to trigger decisions and further analysis steps. Available algorithms to analyze such data often do not scale to larger problems or are inapplicable due to the variety of the data formats. Relational learning methods, as effective computational intelligence algorithms, can be used only for moderately large metric datasets. Here we discuss novel strategies to open relational methods to non-standard data formats at large scale, applied to a very large protein sequence database.
ES2014-36
Predicting Grain Protein Content of Winter Wheat
Mansouri Majdi, Marie-France Destain, Benjamin Dumont
Abstract:
The objective of this paper is to propose a new Improved Particle Filtering (IPF) method, based on minimizing the Kullback-Leibler divergence, for crop model predictions. The performance of the method is compared with that of conventional Particle Filtering (PF) on a complex crop model (AZODYN) to predict an important winter-wheat quality criterion, namely the grain protein content. Furthermore, the effect of measurement noise (e.g., different signal-to-noise ratios) on the performance of PF and IPF is investigated. The results of the comparative studies show that the IPF provides a significant improvement over the PF because, unlike the PF, which depends on the choice of sampling distribution used to estimate the posterior distribution, the IPF yields an optimal choice of the sampling distribution, which also accounts for the observed data. The efficiency of IPF is expressed in terms of estimation accuracy (root mean square error).
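For reference, a minimal bootstrap (SIR) particle filter on a toy one-dimensional state-space model shows the baseline that IPF improves upon; the model and noise levels are illustrative only.

```python
import numpy as np

rng = np.random.default_rng(0)
T, N = 50, 500                       # time steps, particles
f = lambda x: 0.9 * x + 1.0          # state transition (assumed known)
h = lambda x: x                      # observation model
q, r = 0.5, 1.0                      # process / measurement noise std

# Simulate a ground-truth trajectory and observations.
x_true, y = np.zeros(T), np.zeros(T)
for t in range(1, T):
    x_true[t] = f(x_true[t - 1]) + q * rng.standard_normal()
    y[t] = h(x_true[t]) + r * rng.standard_normal()

particles = rng.normal(0, 1, N)
est = np.zeros(T)
for t in range(1, T):
    particles = f(particles) + q * rng.standard_normal(N)   # propagate
    w = np.exp(-0.5 * ((y[t] - h(particles)) / r) ** 2)     # likelihood weights
    w /= w.sum()
    est[t] = np.dot(w, particles)                           # posterior mean
    particles = particles[rng.choice(N, N, p=w)]            # resample (SIR)

print("RMSE:", np.sqrt(np.mean((est - x_true) ** 2)))
```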
Classification
ES2014-44
On the complexity of shallow and deep neural network classifiers
Monica Bianchini, Franco Scarselli
Abstract:
Recently, deep networks were proved to be more effective than shallow architectures to face complex real-world applications. However, theoretical results supporting this claim are still few and incomplete. In this paper, we propose a new topological measure to study how the depth of feedforward networks impacts on their ability of implementing high complexity functions. Upper and lower bounds on network complexity are established, based on the number of hidden units and on their activation functions, showing that deep architectures are able, with the same number of resources, to address more difficult classification problems.
ES2014-188
Evidence build-up facilitates on-line adaptivity in dynamic environments: example of the BCI P300-speller
Emmanuel Daucé, Eoin Thomas
Abstract:
We consider a P300 BCI application where the subjects have to spell-out figures and letters in an unsupervised fashion. We (i) show that a generic speller can attain the state-of-the-art accuracy without any training phase or calibration and (ii) present an adaptive setup that consistently increases the bit rate for most of the subjects.
ES2014-16
Choosing the Metric in High-Dimensional Spaces Based on Hub Analysis
Dominik Schnitzer, Arthur Flexer
Abstract:
To avoid the undesired effects of distance concentration in high dimensional spaces, previous work has already advocated the use of fractional lp norms instead of the ubiquitous Euclidean norm. Closely related to concentration is the emergence of hub and anti-hub objects. Hub objects have a small distance to an exceptionally large number of data points while anti-hubs lie far from all other data points. The contribution of this work is an empirical examination of concentration and hubness, resulting in an unsupervised approach for choosing an lp norm by minimizing hubs while simultaneously maximizing nearest neighbor classification.
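The selection criterion can be sketched directly: for each candidate p, including fractional values, compute the k-occurrence counts (how often each point appears among the other points' k nearest neighbours) and prefer the p with the least skewed distribution. Data, k, and the candidate grid below are arbitrary.

```python
import numpy as np
from scipy.stats import skew

def lp_distances(X, p):
    diff = np.abs(X[:, None, :] - X[None, :, :])
    return (diff ** p).sum(axis=2) ** (1.0 / p)   # works for fractional p too

def k_occurrence_skew(X, p, k=10):
    D = lp_distances(X, p)
    np.fill_diagonal(D, np.inf)                   # exclude self-neighbours
    knn = np.argsort(D, axis=1)[:, :k]
    counts = np.bincount(knn.ravel(), minlength=len(X))
    return skew(counts)                           # high skew = strong hubness

X = np.random.default_rng(0).normal(size=(300, 50))
for p in (0.5, 1.0, 1.5, 2.0):
    print(p, round(float(k_occurrence_skew(X, p)), 3))
```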
ES2014-62
Robust outlier detection with L0-SVDD
Meriem El Azami, Carole Lartizien, Stéphane Canu
Abstract:
The problem of outlier detection consists in finding data that are not representative of the population from which they were ostensibly derived. Recently, to solve this problem, Liu et al. proposed a two-step hypersphere-based approach, taking into account a confidence score pre-calculated for each input data point. However, these scores are defined in a first phase, independently from the second one, making this approach not well suited to large streaming data. To address these difficulties, we propose a global reformulation of the support vector data description (SVDD) problem based on the L0 norm, well suited for outlier detection. We demonstrate that this L0-SVDD problem can be solved using an iterative procedure providing data-specific weighting terms. We show that our approach outperforms state-of-the-art outlier detection techniques on both synthetic and real clinical data.
ES2014-45
Toward parallel feature selection from vertically partitioned data
Veronica Bolon-Canedo, Noelia Sánchez-Maroño, Joana Cervino-Rabunal
Abstract:
Feature selection is often required as a preliminary step for many pattern recognition problems. In recent years, parallel learning has been the focus of much attention due to the advent of high dimensionality. Still, most of the existing algorithms only work in a centralized manner, i.e. using the whole dataset at once. This paper proposes a parallel filter approach for vertically partitioned data. The idea is to split the data by features and then apply a filter at each partition performing several rounds to obtain a stable set of features. Later, a merging procedure is carried out to combine the results into a single subset of relevant features. Experiments on three representative datasets show that the execution time is considerably shortened whereas the performance is maintained or even improved compared to the standard algorithms applied to the non-partitioned datasets. The proposed approach can be used with any filter algorithm, so it could be seen as a general framework for parallel feature selection.
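A minimal sketch of the vertical partitioning scheme, assuming mutual information as the filter: columns are split into partitions, each partition is ranked independently (in practice, in parallel and over several rounds), and the per-partition winners are merged.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.feature_selection import mutual_info_classif

X, y = make_classification(n_samples=500, n_features=60, n_informative=8,
                           random_state=0)
n_parts, top_per_part = 4, 5
parts = np.array_split(np.arange(X.shape[1]), n_parts)   # vertical partitions

selected = []
for cols in parts:                       # in practice these run in parallel
    mi = mutual_info_classif(X[:, cols], y, random_state=0)
    selected.extend(cols[np.argsort(mi)[-top_per_part:]])

print("merged feature subset:", sorted(selected))
```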
ES2014-100
Modeling consumption of contents and advertising in online newspapers
Iago Porto-Díaz, David Martínez-Rego, Oscar Fontenla-Romero, Amparo Alonso-Betanzos
Abstract:
This paper presents the design of a system for the personalization of contents and advertising for readers of online newspapers. This software is conceived to work in a context of high network traffic, with millions of URLs served each day. The model is divided into two subsystems. The first one takes care of the recommendation of news items. Its mathematical model is based on the PageRank algorithm and considers several practical day-to-day scenarios. The second one, the subsystem for the personalization of advertising, uses a multinomial logistic regression model to predict categories of advertising for banners within the news content. The system has obtained satisfactory practical results on real data.
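The recommendation subsystem builds on PageRank; for reference, the classical power iteration can be sketched in a few lines (graph and damping factor are illustrative):

```python
import numpy as np

def pagerank(adj, d=0.85, tol=1e-10):
    """adj[i, j] = 1 if page i links to page j."""
    n = adj.shape[0]
    out = adj.sum(axis=1, keepdims=True)
    P = np.where(out > 0, adj / np.maximum(out, 1), 1.0 / n)  # row-stochastic
    r = np.full(n, 1.0 / n)
    while True:
        r_new = (1 - d) / n + d * (r @ P)                     # damped update
        if np.abs(r_new - r).sum() < tol:
            return r_new
        r = r_new

adj = np.array([[0, 1, 1, 0],
                [0, 0, 1, 0],
                [1, 0, 0, 1],
                [0, 0, 1, 0]], dtype=float)
print(pagerank(adj))
```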
ES2014-99
Reweighted l1 Dual Averaging Approach for Sparse Stochastic Learning
Jumutc Vilen, Suykens Johan
Abstract:
Recent advances in stochastic optimization and regularized dual averaging approaches have revealed substantial interest in a simple and scalable stochastic method that is tailored to more specific needs. Among the latter one can find sparse signal recovery and l0-based sparsity-inducing approaches. These methods in particular can force many components of the solution to shrink to zero, thus clarifying the importance of the features and simplifying the evaluation. In this paper we concentrate on enhancing the sparsity of the recently proposed l1 Regularized Dual Averaging (RDA) method with a simple iterative reweighting procedure which in the limit applies the l0-norm penalty. We present some theoretical justification of a bounded regret for a sequence of convex repeated games where every game stands for a separate reweighted l1-RDA problem. Numerical results show an enhanced sparsity of the proposed approach and some improvements over the l1-RDA method in generalization error.
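For orientation, the closed-form l1-RDA update (following Xiao's RDA method) can be sketched as below; the reweighting studied in the paper would replace the constant penalty by per-coordinate penalties such as lam/(|w_i|+eps). The toy problem and step parameters are illustrative.

```python
import numpy as np

def l1_rda(grad_fn, dim, T=2000, lam=0.05, gamma=5.0):
    w = np.zeros(dim)
    gbar = np.zeros(dim)                     # running average of gradients
    for t in range(1, T + 1):
        g = grad_fn(w)
        gbar += (g - gbar) / t
        # soft-threshold the averaged gradient, then scale: closed-form RDA step
        shrunk = np.sign(gbar) * np.maximum(np.abs(gbar) - lam, 0.0)
        w = -(np.sqrt(t) / gamma) * shrunk
    return w

# Toy least-squares problem with a sparse ground truth.
rng = np.random.default_rng(0)
A = rng.normal(size=(200, 50))
x_true = np.zeros(50); x_true[:3] = (1.0, -2.0, 0.5)
b = A @ x_true + 0.01 * rng.normal(size=200)
grad = lambda w: A.T @ (A @ w - b) / len(b)
w = l1_rda(grad, 50)
print("nonzeros:", np.flatnonzero(np.abs(w) > 1e-6))
```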
ES2014-121
Using Shannon Entropy as EEG Signal Feature for Fast Person Identification
Dinh Phung, Dat Tran, Ma Wanli, Phuoc Nguyen, Pham Tien
Abstract:
Identification accuracy and speed are important factors in automatic person identification systems. In this paper, we propose a feature extraction method to extract brain wave features from different brain rhythms of the electroencephalography (EEG) signal for the purpose of fast, yet accurate person identification. The proposed feature extraction method is based on the fact that the EEG signal is complex, non-stationary, and non-linear, so a non-linear analysis such as entropy is more appropriate. Shannon entropy (SE) based EEG features from the alpha, beta, and gamma wave bands are extracted and evaluated for person identification. Experimental results show that SE features provide high person identification rates with a low feature dimension, and thus better performance.
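A sketch of per-band Shannon-entropy features, assuming a Butterworth band-pass filter and a histogram entropy estimate; band edges follow common alpha/beta/gamma conventions and the bin count is arbitrary.

```python
import numpy as np
from scipy.signal import butter, filtfilt

BANDS = {"alpha": (8, 13), "beta": (13, 30), "gamma": (30, 45)}

def shannon_entropy(x, bins=32):
    p, _ = np.histogram(x, bins=bins)
    p = p[p > 0] / p.sum()
    return -np.sum(p * np.log2(p))

def band_entropy_features(signal, fs=256):
    feats = {}
    for name, (lo, hi) in BANDS.items():
        b, a = butter(4, [lo / (fs / 2), hi / (fs / 2)], btype="band")
        feats[name] = shannon_entropy(filtfilt(b, a, signal))
    return feats

sig = np.random.default_rng(0).standard_normal(2560)   # 10 s of fake EEG
print(band_entropy_features(sig))
```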
ES2014-159
Machine learning techniques to assess the performance of a gait analysis system
Sébastien Piérard, Rémy Phan-Ba, Marc Van Droogenbroeck
Abstract:
This paper presents a methodology based on machine learning techniques to assess the performance of a system measuring the trajectories of the lower limbs extremities for the follow-up of patients with multiple sclerosis. We show how we have established, with the help of machine learning, four important properties about this system: (1) an automated analysis of gait characteristics provides an improved analysis with respect to that of a human expert, (2) after learning, the gait characteristics provided by this system are valuable compared to measures taken by stopwatches, as used in the standardized tests, (3) the motion of the lower limbs extremities contains a lot of useful information about the gait, even if it is only a small part of the body motion, (4) a measurement system combined with a machine learning tool is sensitive to intra-subject modifications of the walking pattern.
ES2014-144
A Random Forest proximity matrix as a new measure for gene annotation
Jose A. Seoane, Ian N.M. Day, Juan P. Casas, Colin Campbell, Tom R. Gaunt
Abstract:
In this paper we present a new score for gene annotation. This new score is based on the proximity matrix obtained from a trained Random Forest (RF) model. As an example application, we built this model using the association p-values of genotype with blood phenotype as input and the association of genotype data with coronary heart disease as output. This new score has been validated by comparing the Gene Ontology (GO) annotation using this score versus the score obtained from the gene annotation “String” tool. Using the new proximity-based measure results in more accurate annotation, especially in the GO categories Molecular Function and Biological Process.
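For readers unfamiliar with RF proximities, the quantity behind the score is standard: proximity(i, j) is the fraction of trees in which samples i and j land in the same leaf. A minimal sketch on toy data (not the authors' genotype/phenotype pipeline):

import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=100, n_features=10, random_state=0)
rf = RandomForestClassifier(n_estimators=500, random_state=0).fit(X, y)

leaves = rf.apply(X)   # shape (n_samples, n_trees): leaf index of each sample per tree
prox = (leaves[:, None, :] == leaves[None, :, :]).mean(axis=2)   # (n, n) proximity matrix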
ES2014-149
Neural network based 2D/3D fusion for robotic object recognition
Louis-Charles Caron, Yang Song, David Filliat, Alexander Gepperth
Abstract:
We present a neural network based fusion approach for robotic object recognition which integrates 2D and 3D descriptors in a flexible way. The presented recognition architecture is coupled to a real-time segmentation step based on 3D data, since a focus of our investigations is real-world operation. In the light of the fact that recognition must operate on less-than-perfect segmentation results (which is a very realistic assumption for real-time robotic implementation), we conduct tests of recognition performance using complex everyday objects in order to quantify the overall gain of performing 2D/3D fusion, and to discover where it is particularly useful. We find that the fusion approach is most powerful when generalization is required, for example to significant viewpoint changes, and that a perfect segmentation is apparently not a necessary prerequisite for successful object discrimination.
ES2014-148
Discrimination of visual pedestrians data by combining projection and prediction learning
Mathieu Lefort, Alexander Gepperth
Abstract:
PROPRE is a generic and semi-supervised neural learning paradigm that extracts meaningful concepts from multimodal data flows based on predictability across modalities. It combines two computational paradigms. First, a topological projection of each data flow on a self-organizing map (SOM) reduces the input dimension. Second, each SOM activity is used to predict the activities in all other SOMs. A predictability measure, which compares predicted and real activities, is used to modulate the SOM learning so as to favor mutually predictable stimuli. In this article, we study PROPRE applied to a classical visual pedestrian data classification task. The SOM learning modulation introduced in PROPRE significantly improves classification performance.
ES2014-138
FINGeR: Framework for interactive neural-based gesture recognition
German Ignacio Parisi, Pablo Barros, Stefan Wermter
Abstract:
For operating in real-world scenarios, the recognition of human gestures must be adaptive, robust and fast. Despite the prominent use of Kinect-like range sensors for demanding visual tasks involving motion, it still remains unclear how to process depth information for efficiently extrapolating the dynamics of hand gestures. We propose a learning framework based on neural evidence for processing visual information. We first segment and extract spatiotemporal hand properties from RGB-D videos. Shape and motion features are then processed by two parallel streams of hierarchical self-organizing maps and subsequently combined for a more robust representation. We provide experimental results to show how multi-cue integration increases recognition rates over a single-cue approach.
ES2014-168
Beyond histograms: why learned structure-preserving descriptors outperform HOG
Thomas Guthier, Volker Willert, Julian Eggert
Abstract:
Statistical image descriptors based on histograms (e.g. SIFT, HOG) are widely used in image processing, because they are fast and simple methods with high classification performance. However, they discard the local spatial topology and thus lose discriminative information contained in the image. We discuss the relations between HOG and VNMF descriptors, i.e. structure-free histograms versus learned structure-preserving patterns. VNMF is a shift-invariant, sparse, non-negative unsupervised learning algorithm that provides a distinct decomposition of the input into its parts. The VNMF descriptor outperforms the statistical HOG descriptor, because it preserves spatial topology, leading to better classification results on real-world human action recognition benchmarks.
ES2014-151
NMF-Density: NMF-Based Breast Density Classifier
Lahouari Ghouti, Abdullah Owaidh
Abstract:
The amount of tissue available in the breast, commonly characterized by the breast density, is highly correlated with breast cancer. In fact, dense breasts have a higher risk of developing breast cancer. On the other hand, breast density influences the mammographic interpretation since it decreases the sensitivity of breast cancer detection. This sensitivity decrease is due to the fact that both cancerous regions and dense tissue appear as white areas in breast mammograms. This paper introduces new features to improve the classification of breast density in digital mammograms according to the commonly used radiological lexicon (BI-RADS). These features are extracted from a non-negative matrix factorization (NMF) of mammograms and classified using machine learning classifiers. Using ground truth mammographic data, the classification performance of the proposed features is assessed. Simulation results show that the proposed features significantly outperform existing density features based on principal component analysis (PCA) by achieving higher classification accuracy.
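A minimal sketch of NMF-based feature extraction followed by a classifier (the factorization rank, preprocessing and choice of classifier below are assumptions; the abstract does not fix them):

import numpy as np
from sklearn.decomposition import NMF
from sklearn.svm import SVC

rng = np.random.default_rng(0)
mammos = rng.random((100, 64 * 64))       # toy stand-in: vectorized non-negative images
labels = rng.integers(0, 4, size=100)     # four BI-RADS density classes

nmf = NMF(n_components=16, init="nndsvd", max_iter=500, random_state=0)
H = nmf.fit_transform(mammos)             # per-image activation coefficients = features
clf = SVC(kernel="rbf").fit(H, labels)    # any standard classifier would do here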
Dynamical systems and online learning
ES2014-49
Implicitly and explicitly constrained optimization problems for training of recurrent neural networks
Carl-Johan Thore
Abstract:
Training of recurrent neural networks is typically formulated as an unconstrained optimization problem. There is, however, an implicit constraint stating that the equations of state must be satisfied at every iteration in the optimization process. Such constraints can make a problem highly non-linear and thus difficult to solve. A potential remedy is to reformulate the problem into one in which the parameters and the state are treated as independent variables and all constraints appear explicitly. In this paper we compare an implicitly and an explicitly constrained formulation of the same problem. Reported numerical results suggest that the latter is in some respects superior.
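Schematically, and as a reading aid rather than the paper's own notation (parameters $\theta$, states $s$, training loss $f$, state equations $g$):
$$\text{implicit:}\;\; \min_{\theta}\, f\bigl(s(\theta), \theta\bigr) \text{ with } s(\theta) \text{ defined by } g(s, \theta) = 0, \qquad \text{explicit:}\;\; \min_{\theta,\, s}\, f(s, \theta) \;\;\text{s.t.}\;\; g(s, \theta) = 0.$$
In the explicit form the solver may violate the state equations at intermediate iterates, which removes the highly non-linear implicit map $\theta \mapsto s(\theta)$ from the objective.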
ES2014-166
A HMM-based pre-training approach for sequential data
Luca Pasa, Alberto Testolin, Alessandro Sperduti
Abstract:
Much recent research has highlighted the critical role of unsupervised pre-training in improving the performance of neural network models. However, extensions of those architectures to the temporal domain introduce additional issues, which often prevent obtaining good performance in a reasonable time. We propose a novel approach to pre-train sequential neural networks in which a simpler, approximate distribution generated by a linear model is first used to drive the weights into a better region of the parameter space. After this smooth distribution has been learned, the network is fine-tuned on the more complex real dataset. The benefits of the proposed method are demonstrated on a prediction task using two datasets of polyphonic music, and the general validity of this strategy is shown by applying it to two different recurrent neural network architectures.
ES2014-162
Exploiting similarity in system identification tasks with recurrent neural networks
Sigurd Spieckermann, Siegmund Düll, Steffen Udluft, Alexander Hentschel, Thomas Runkler
Abstract:
A new dual-task learning approach based on recurrent neural networks with factored tensor components for system identification tasks is presented. The overall goal is to identify the underlying dynamics of a system given few observations which are complemented by auxiliary data from similar systems. The resulting system identification is motivated by various real-world industrial use cases, e.g. gas or wind turbine modeling for optimization and monitoring. The problem is formalized and the effectiveness of the proposed method is assessed on the cart-pole benchmark.
ES2014-60
Comparison of local and global undirected graphical models
Zhemin Zhu, Djoerd Hiemstra, Peter Apers, Andreas Wombacher
Abstract:
CRFs are discriminative undirected models which are globally normalized. Global normalization protects CRFs from the label bias problem which most local models suffer from. Recently proposed co-occurrence rate networks (CRNs) are also discriminative undirected models. In contrast to CRFs, CRNs are locally normalized. It was established that CRNs are immune to the label bias problem even though they are local models. In this paper, we further compare ECRNs (using fully empirical relative frequencies, not support vector regression\footnote{This differs from a later paper of ours, in which we use support vector regression.}) and CRFs. The connection between the Co-occurrence Rate, which is the exponential function of pointwise mutual information, and copulas is established in the continuous case. The two models are also evaluated statistically by experiments.
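The central quantity, stated here only to unpack the abstract's own characterization, is
$$\mathrm{CR}(x, y) \;=\; \frac{p(x, y)}{p(x)\,p(y)} \;=\; \exp\bigl(\mathrm{PMI}(x, y)\bigr), \qquad \mathrm{PMI}(x, y) = \log\frac{p(x, y)}{p(x)\,p(y)},$$
so local factors measure how much more (or less) often two labels co-occur than they would under independence.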
ES2014-139
Meta Online Learning: Experiments on a Unit Commitment Problem
Jialin Liu, Olivier Teytaud
Abstract:
Online learning is machine learning performed in real time from successive data samples. Meta online learning consists in combining several online learning algorithms from a given set (termed portfolio) of algorithms. The goal can be (i) mitigating the effect of a bad choice of online learning algorithm, (ii) parallelization, or (iii) combining the strengths of different algorithms. We propose a methodology, termed lag, adapted to black-box online learning.
ES2014-189
DELA: A Dynamic Online Ensemble Learning Algorithm
Hamid Bouchachia, Emili Balaguer-Ballester
Abstract:
The present paper investigates the problem of prediction in the context of a dynamically changing environment, where data arrive over time. A Dynamic online Ensemble Learning Algorithm (DELA) is introduced. The adaptivity concerns three levels: structural adaptivity, combination adaptivity and model adaptivity. In particular, the structure of the ensemble is made to evolve in order to deal with the problem of data drift. The proposed online ensemble is evaluated on the STAGGER data set to show its predictive power in the presence of data drift.
Advances on Weightless Neural Systems
ES2014-7
Advances on Weightless Neural Systems
Massimo De Gregorio, Felipe M. G. França, Priscila M. V. Lima, Wilson R. de Oliveira
ES2014-51
Learning state prediction using a weightless neural explorer
Igor Aleksander, Helen Morton
Abstract:
A weightless neural state machine acting as an exploratory automaton changes its position in a simulated toy world by its own actions. A popular question is asked: how might the automaton ‘become conscious of’ the effect of its own actions? Here we develop previously defined iconic learning in such weightless machines so that this knowledge can be achieved. Weightlessness and iconic learning are expressed in terms of state equations. Experimental results showing the conditions under which correct predictions can be obtained on a neural simulator are presented. Issues of information integration and memory implication are briefly considered at the end of the paper.
ES2014-160
Can you follow that guy?
Mariacarla Staffa, Massimo De Gregorio, Maurizio Giordano, Silvia Rossi
ES2014-107
Credit analysis with a clustering RAM-based neural classifier
Douglas Cardoso, Danilo Carvalho, Daniel Alves, Diego Souza, Hugo Carneiro, Carlos Pedreira, Priscila M. V. Lima, Felipe M. G. França
Abstract:
Datasets with a large amount of noisy data are quite common in real-world classification problems. Robustness is an important characteristic of state-of-the-art classifiers that use error minimization techniques, which thus require a long time to converge. This paper presents ClusWiSARD, a clustering customization of the WiSARD weightless neural network model, applied to credit analysis, a non-trivial real-world problem. Experimental evidence shows that ClusWiSARD is very competitive with SVM with respect to accuracy. Moreover, ClusWiSARD outperforms SVM in both training time, being two orders of magnitude faster, and test time, being two to three times faster.
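For context, the WiSARD model underneath ClusWiSARD is simple enough to sketch in a few lines; the following is an illustrative plain WiSARD discriminator (not the clustering variant of the paper):

import numpy as np

class Discriminator:
    # One WiSARD discriminator: the binary input is split into random n-tuples,
    # each addressing one RAM node. Training writes the observed addresses;
    # the response is the number of RAMs that recognize a test pattern's address.
    def __init__(self, input_bits, tuple_size, seed=0):
        rng = np.random.default_rng(seed)
        self.mapping = rng.permutation(input_bits).reshape(-1, tuple_size)
        self.rams = [set() for _ in range(len(self.mapping))]

    def _addresses(self, x):
        return [int("".join(str(b) for b in x[idx]), 2) for idx in self.mapping]

    def train(self, x):
        for ram, addr in zip(self.rams, self._addresses(x)):
            ram.add(addr)

    def response(self, x):
        return sum(addr in ram for ram, addr in zip(self.rams, self._addresses(x)))

# One discriminator per class; classify by the highest response.
d = Discriminator(input_bits=16, tuple_size=4)
d.train(np.array([1, 0] * 8))
print(d.response(np.array([1, 0] * 8)))    # prints 4: all RAMs recognize the pattern

Training is a single pass of RAM writes with no iterative error minimization, which is where the training-time advantage over SVM comes from.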
ES2014-182
Training a classical weightless neural network in a quantum computer
Adenilton J. da Silva, Wilson R. de Oliveira, Teresa B. Ludermir
Abstract:
The purpose of this paper is to investigate a new quantum learning algorithm for classical weightless neural networks. The learning algorithm creates a superposition of all possible neural network configurations for a given architecture. The performance of the network over the training set is stored entangled with the neural configuration, and a quantum search is performed to amplify the probability amplitude of the network with the desired performance.
ES2014-167
Extracting rules from DRASiW’s "mental images"
Paulo Coutinho, Hugo Carneiro, Danilo Carvalho, Felipe M. G. França
Abstract:
DRASiW is an extension of the WiSARD weightless neural model that provides the ability to produce examples/prototypes, called "mental images", from learnt categories. This work introduces a novel way of performing rule extraction by applying the WiSARD/DRASiW RAM-based neural model to a well-known machine learning benchmark. A functional exploration is offered in order to demonstrate how the new rule extraction mechanism behaves under different system configurations. Experimental results suggest that the rules' conformance to the data increases proportionally to the corresponding classifier accuracy. Furthermore, a comparison with the C4.5 decision tree algorithm shows that the DRASiW-based technique produces more compact sets of rules.
ES2014-176
Vector space weightless neural networks
Wilson R. de Oliveira, Adenilton J. da Silva, Teresa B. Ludermir
Abstract:
By embedding the boolean space $Z_2$ as an orthonormal basis of a vector space, we can treat the RAM-based neuron as a matrix (operator) acting on that vector space. We show that this model (inspired by our research on quantum neural networks) is of sufficient generality as to have classical weighted (perceptron-like), classical weightless (RAM-based, PLN, etc.), quantum weighted and quantum weightless neural models as particular cases. We also indicate how one could use it to solve 3-SAT and briefly mention how one could train this novel model.
ES2014-89
Online tracking of multiple objects using WiSARD
Rafael Lima de Carvalho, Danilo Carvalho, Priscila M. V. Lima, Félix Mora-Camino, Felipe M. G. França
Abstract:
This paper evaluates the WiSARD weightless model as a classification system on the problem of tracking multiple objects in real time. Exploiting the structure of this model, the proposed solution applies a re-learning stage in order to avoid interference caused by background noise or variations in the target shape. Once the tracker finds a target for the first time, it applies only local searches around the neighbourhood in order to give a fast response. This approach is evaluated through experiments on real-world video data.
ES2014-83
Probabilistic automata simulation with single layer weightless neural networks
Adenilton J. da Silva, Wilson R. de Oliveira, Teresa B. Ludermir
Abstract:
Computability of weightless neural networks is the major topic of this paper. In previous works it has been shown that one can simulate a Turing machine with a weightless neural network (WNN) with an infinite tape, and that one can simulate probabilistic automata with a WNN with two queues. In this paper, we show that it is possible to simulate a probabilistic automaton with a single layer WNN with no auxiliary data structures.
Clustering
ES2014-163
Optimal Data Projection for Kernel Spectral Clustering
Diego H. Peluffo-Ordonez, Carlos Alzate, Johan Suykens, Germán Castellanos-Dominguez
Abstract:
Spectral clustering has taken an important place in the context of pattern recognition, being a good alternative for solving problems with non-linearly separable groups. Because of their unsupervised nature, clustering methods are often parametric, requiring some initial parameters. Thus, clustering performance is greatly dependent on the selection of those initial parameters. Furthermore, tuning such parameters is not an easy task when the initial data representation is not adequate. Here, we propose a new projection of the input data to improve cluster identification within a kernel spectral clustering framework. The proposed projection is derived from a feature extraction formulation, in which a generalized distance involving the kernel matrix is used. The data projection proves useful for improving the performance of kernel spectral clustering.
ES2014-119
Supporting GNG-based clustering with local input space histograms
Jochen Kerdels, Gabriele Peters
Abstract:
This paper presents an extension to the growing neural gas (GNG) algorithm that makes it possible to capture local characteristics of the input space. Using these characteristics, clustering schemes based on the GNG network can be improved by discarding uncertain edges of the network and identifying edges that span discontinuous regions of the input space. We applied the described approach to different two-dimensional data sets found in the literature and obtained comparable results.
ES2014-104
The Sum-over-Forests clustering
Mathieu Senelle, Marco Saerens, François Fouss
Abstract:
This work introduces a novel way to identify dense regions in a graph based on a mode-seeking clustering technique, relying on the Sum-Over-Forests (SoF) density index (which can easily be computed in closed form through a simple matrix inversion) as a local density estimator. We first identify the modes of the SoF density in the graph. Then, the nodes of the graph are assigned to the cluster corresponding to the nearest mode, according to a new kernel, also based on the SoF framework. Experiments on artificial and real datasets show that the proposed index performs well in node clustering.
ES2014-46
Bayesian non-parametric parsimonious clustering
Faicel Chamroukhi, Marius Bartcus, Hervé Glotin
Abstract:
This paper proposes a new Bayesian non-parametric approach for clustering. It relies on an infinite Gaussian mixture model with a Chinese Restaurant Process (CRP) prior, and an eigenvalue decomposition of the covariance matrix of each cluster. The CRP prior makes it possible to control the model complexity in a principled way and to automatically learn the number of clusters. The covariance matrix decomposition makes it possible to fit various parsimonious models, from the simplest spherical ones to the most complex general one. We develop an MCMC Gibbs sampler to learn the models. First results obtained on both simulated and real data highlight the interest of the proposed infinite parsimonious mixture model.
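The CRP prior itself is easy to make concrete; the following sketch samples a partition from it (the full model of the paper additionally resamples assignments in a Gibbs sweep given the Gaussian likelihoods, which is omitted here):

import numpy as np

def crp_partition(n, alpha, seed=0):
    # Item i joins existing cluster k with probability n_k / (i + alpha)
    # and opens a new cluster with probability alpha / (i + alpha).
    rng = np.random.default_rng(seed)
    counts, labels = [], []
    for i in range(n):
        probs = np.array(counts + [alpha], dtype=float) / (i + alpha)
        k = rng.choice(len(probs), p=probs)
        if k == len(counts):
            counts.append(0)     # open a new cluster
        counts[k] += 1
        labels.append(k)
    return labels

print(crp_partition(10, alpha=1.0))    # e.g. [0, 0, 1, 0, 1, 2, ...]

Larger values of alpha favour more clusters, which is the single knob controlling model complexity.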
ES2014-98
Learning predictive partitions for continuous feature spaces
Björn Weghenkel, Laurenz Wiskott
Abstract:
Any non-trivial agent (biological or algorithmic) that interacts with its environment needs some representation of its current state. Such a state should enable it to make informed decisions that lead to some desired outcome in the future. In practice, many learning algorithms assume states to come from a discrete set, while real-world learning problems are often continuous in nature. We propose an unsupervised learning algorithm that finds discrete partitions of a continuous feature space that are predictive with respect to the future. More precisely, the learned partitions induce a Markov chain on the data with high mutual information between the current state and the next state. Such predictive partitions can serve as an alternative to classical discretization algorithms in cases where the predictable time-structure of the data is of importance.
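The objective named in the abstract can be written out: for a partition with induced states $S_t$, one seeks to maximize
$$I(S_t; S_{t+1}) \;=\; \sum_{i,j} p(i, j)\,\log \frac{p(i, j)}{p(i)\,p(j)},$$
where $p(i, j)$ is the empirical probability that consecutive samples fall into cells $i$ and $j$ of the partition.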
ES2014-164
An adjustable p-exponential clustering algorithm
Valmir Macario, Francisco de Assis Tenório De Carvalho
Abstract:
This paper proposes a new exponential clustering algorithm (XPFCM) by reformulating the clustering objective function with an additional parameter p that adjusts the exponential behavior of the membership assignment. Clustering experiments show that the proposed method assigns data to the clusters better than other fuzzy C-means (FCM) variants.
ES2014-116
Self-organizing map for determination of goal candidates in mobile robot exploration
Jan Faigl, Peter Vaněk, Miroslav Kulich
Abstract:
This paper addresses the problem of determining goal candidates in frontier-based mobile robot exploration. The proposed solution is based on a self-organizing map for the traveling salesman problem with neighborhoods, and it allows the exploration to be studied as a problem of repeated coverage of the current frontiers, where the minimal number of goal candidates is determined simultaneously with the expected cost of visiting the candidates. The early results enabled by the proposed self-organizing map-based solution indicate an exploration improvement for the proposed problem formulation. Thus, the presented work demonstrates how a neural network approach can provide interesting insights and ground for studying optimization problems arising in robotics.
Regression, Forecasting and Extreme Learning Machines
ES2014-66
Parameter-free regularization in Extreme Learning Machines with affinity matrices
Leonardo Silvestre, André P. Lemos, João P. Braga, Antônio P. Braga
Abstract:
This paper proposes a novel regularization approach for Extreme Learning Machines. Regularization is performed using a priori spatial information expressed by an affinity matrix. We show that the use of this type of a priori information is similar to performing Tikhonov regularization. Furthermore, if a parameter-free affinity matrix is used, like the cosine similarity matrix, regularization is performed without any need for parameter tuning. Experiments are performed on classification problems to validate the proposed approach.
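For orientation, a minimal ELM with the classical Tikhonov (ridge) form of the output-weight solution is sketched below; the paper's contribution replaces the lam * I term with a matrix built from a priori affinities (e.g. cosine similarities), which is not reproduced here:

import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((200, 5))                    # inputs
T = (X[:, 0] * X[:, 1] > 0).astype(float)[:, None]   # toy binary targets

n_hidden, lam = 50, 1e-2
W = rng.standard_normal((5, n_hidden))               # random, untrained input weights
b = rng.standard_normal(n_hidden)
H = np.tanh(X @ W + b)                               # hidden-layer activations

# Ridge/Tikhonov solution for the output weights (the only trained part of an ELM).
beta = np.linalg.solve(H.T @ H + lam * np.eye(n_hidden), H.T @ T)
pred = np.tanh(X @ W + b) @ beta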
ES2014-90
Feature selection in environmental data mining combining Simulated Annealing and Extreme Learning Machine
Michael Leuenberger, Mikhaïl Kanevski
Abstract:
Due to the large amount and complexity of data available in geosciences, machine learning nowadays plays an important role in environmental data mining. In many real data cases, we face the need to design an input space with the most relevant features. Because the main goal is to understand and find relationships between phenomena and features, feature selection is preferred to feature transformation or extraction. To deal with the high-dimensional space of environmental data, a wrapper method based on the Extreme Learning Machine and a global optimization algorithm (Simulated Annealing) is proposed. This paper investigates the whole methodology and shows promising results for environmental data feature selection and modelling.
ES2014-79
Mobility Prediction Using Fully-Complex Extreme Learning Machines
Lahouari Ghouti
Abstract:
Efficient planning and improved quality of service (QoS) in wireless networks call for the use of mobility prediction schemes. Such schemes ensure accurate mobility prediction of wireless users and units, which plays a major role in optimized planning and management of the available bandwidth and power resources. In this paper, fully-complex extreme learning machines (CELMs), known for their universal approximation capability, are used to model and predict the mobility patterns of arbitrary nodes in a mobile ad hoc network (MANET). Unlike their real-valued counterparts, CELMs properly capture the existing interaction/correlation between the nodes' location coordinates, leading to more realistic and accurate prediction. Simulation results using standard mobility models and real-world mobility data clearly show that the proposed complex-valued prediction algorithm outperforms many existing real-valued learning machines in terms of prediction accuracy.
ES2014-184
An Extreme Learning Approach to Active Learning
Euler Horta, Antônio P. Braga
Abstract:
We propose in this paper a new active learning method that makes no assumptions about the data distribution and does not need any free parameter to be adjusted. The proposed algorithm is based on extreme learning machines (ELM) and a perceptron with analytical calculation of weights. We show that the proposed model achieves good results using a reduced set of patterns.
ES2014-180
A new model selection approach for the ELM network using metaheuristic optimization
Ananda Freire, Guilherme Barreto
Abstract:
We propose a novel approach to architecture selection and hidden-neuron excitability improvement for the Extreme Learning Machine (ELM). Named the Adaptive Number of Hidden Neurons Approach (ANHNA), the proposed approach relies on a new general encoding scheme of the solution vector that automatically estimates the number of hidden neurons and adjusts their activation function parameters (slopes and biases). Due to its general nature, ANHNA's encoding scheme can be used by any metaheuristic algorithm for continuous optimization. Computer experiments were carried out using the Differential Evolution (DE) and Particle Swarm Optimization (PSO) metaheuristics, with promising results being achieved by the proposed method on benchmark regression problems.
ES2014-29
Extreme learning machines for Internet traffic classification
Joseph Ghafari, Emmanuel Herbert, Stephane Senecal, Daniel Migault, Stanislas Francfort, Ting Liu
Abstract:
Network packet transport services (namely the Internet) are subject to significant security issues. This paper applies machine learning methods based on neural networks (Extreme Learning Machines, or ELM) to analyze Internet traffic in order to detect specific malicious activities. This is performed by classifying traffic for a key service run over the Internet: the Domain Name System (DNS). The ELM models and algorithms are run on DNS traffic data extracted from operating networks for botnet detection.
ES2014-23
Electric load forecasting using wavelet transform and extreme learning machine
Song Li, Peng Wang, Lalit Goel
Abstract:
This paper proposes a novel method for load forecasting, which integrates the wavelet transform and the extreme learning machine. In order to capture more internal features, the wavelet transform is used to decompose the load series into a set of sub-components, which are more predictable. All the components are then separately processed by an extreme learning machine. Numerical testing shows that the proposed method is able to improve forecast performance with much less computational cost compared with other benchmark methods.
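The decompose-predict-recombine idea can be sketched as follows; the wavelet family, decomposition level and the stub standing in for the per-component ELM are all illustrative assumptions:

import numpy as np
import pywt

load = np.sin(np.linspace(0, 20, 512)) + 0.1 * np.random.randn(512)   # toy load series
coeffs = pywt.wavedec(load, "db4", level=3)      # [cA3, cD3, cD2, cD1] sub-components

def forecast_component(series):
    # Placeholder for one ELM trained on this sub-component; here it simply
    # returns the series itself, i.e. a perfect in-sample "forecast".
    return series

forecast = pywt.waverec([forecast_component(c) for c in coeffs], "db4")

Each sub-component is smoother and hence easier to predict than the raw series, which is where the accuracy gain is expected to come from.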
ES2014-133
Swim velocity profile identification through a Dynamic Self-adaptive Multiobjective Harmonic Search and RBF neural networks
Helon V. H. Ayala, Luciano F. da Cruz, Leandro Coelho, Roberto Z. Freire
Abstract:
Technology has been successfully applied in sports, where biomechanical analysis is one of the most important areas used to raise the performance of athletes. In this context, this paper focuses on swim velocity profile identification using Radial Basis Function Neural Networks (RBF-NN) trained by Gustafson-Kessel clustering combined with a novel Dynamic Self-adaptive Multiobjective Harmony Search (DS-MOHS). One case study is analyzed, based on real data acquired from an elite female athlete swimming breaststroke. Better results are obtained by DS-MOHS when compared with a standard multiobjective harmony search in terms of accuracy and generalization of the model.
ES2014-173
Dynamic ensemble selection and instantaneous pruning for regression
Kaushala Dias, Terry Windeatt
Abstract:
A novel dynamic method of selecting pruned ensembles of predictors for regression problems is presented. The proposed method, known henceforth as DESIP, enhances the prediction accuracy and generalization ability of pruning methods. Pruning heuristics attempt to combine accurate yet complementary members; DESIP therefore enhances performance by modifying the pruned aggregation, distributing the ensemble member selection over the entire dataset. Four static ensemble pruning approaches used in regression are compared to highlight the performance improvement yielded by the dynamic method. The experimental comparison is made using Multi-Layer Perceptron predictors on benchmark datasets.
ES2014-178
Data normalization and supervised learning to assess the condition of patients with multiple sclerosis based on gait analysis
Samir Azrour, Sébastien Piérard, Pierre Geurts, Marc Van Droogenbroeck
Abstract:
Gait impairment is considered an important feature of disability in multiple sclerosis, but its evaluation in the clinical routine remains limited. In this paper, we assess, by means of supervised learning, the condition of patients with multiple sclerosis based on gait descriptors obtained with a gait analysis system. As the morphological characteristics of individuals influence their gait while being, to a first approximation, independent of the disease level, an original strategy of data normalization with respect to these characteristics is described and applied beforehand in order to obtain more reliable predictions. In addition, we explain how we address the problem of missing data, a common issue in the field of clinical evaluation. Results show that, based on machine learning combined with the proposed data handling techniques, we can predict a score highly correlated with the condition of patients.
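One simple way to realize such a normalization (the paper's exact strategy may differ) is to regress each gait descriptor on the morphological variables and keep only the residual, so that the remaining variation is, ideally, disease-related rather than body-shape-related:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

def normalize(gait, morphology):
    """Return gait descriptors with the morphology-explained part removed."""
    resid = np.empty_like(gait)
    for j in range(gait.shape[1]):
        model = LinearRegression().fit(morphology, gait[:, j])
        resid[:, j] = gait[:, j] - model.predict(morphology)
    return resid

rng = np.random.default_rng(0)
morphology = rng.normal(size=(100, 2))             # e.g. height, leg length
gait = morphology @ rng.normal(size=(2, 4)) + rng.normal(size=(100, 4))
gait_normalized = normalize(gait, morphology)
```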
ES2014-95
Sparse one hidden layer MLPs
Alberto Torres, David Díaz, José R. Dorronsoro
Abstract:
We discuss how to build sparse one-hidden-layer MLPs, replacing the standard $l_2$ weight decay penalty on all weights by an $l_1$ penalty on the linear output weights. We propose an iterative two-step training procedure in which the output weights are found by the FISTA proximal optimization algorithm, solving a Lasso-like problem, and the hidden weights are computed by unconstrained minimization. As we shall discuss, the procedure has a complexity equivalent to that of standard MLP training, yields MLPs with similar performance and, as a by-product, automatically selects the number of hidden units.
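With the hidden layer fixed, the output-weight step reduces to a Lasso-like problem. The sketch below solves it with plain ISTA (soft-thresholding) rather than full FISTA with momentum, and omits the alternating hidden-weight update; hidden units whose output weight is driven exactly to zero are effectively pruned, which is how the number of hidden units gets selected automatically.

```python
import numpy as np

def soft_threshold(v, t):
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def sparse_output_weights(H, y, lam=0.1, n_iter=500):
    """Minimize 0.5*||H beta - y||^2 + lam*||beta||_1 by ISTA."""
    L = np.linalg.norm(H, 2) ** 2                  # Lipschitz constant of the gradient
    beta = np.zeros(H.shape[1])
    for _ in range(n_iter):
        grad = H.T @ (H @ beta - y)
        beta = soft_threshold(beta - grad / L, lam / L)
    return beta

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = np.sin(X[:, 0]) + 0.1 * rng.normal(size=200)
W = rng.normal(size=(5, 50))                       # hidden weights (random for illustration)
H = np.tanh(X @ W)                                 # hidden layer activations
beta = sparse_output_weights(H, y)
print("active hidden units:", np.count_nonzero(beta), "of", H.shape[1])
```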
ES2014-37
Multi-Step Ahead Forecasting of Road Condition Using Least Squares Support Vector Regression
Konsta Sirvio, Jaakko Hollmén
Abstract:
Network-level multi-step road condition forecasting is an important step in accurate road maintenance planning, where the correct maintenance activities are scheduled in place and time across the road network. Forecasting methods have evolved from engineering models to non-linear machine learning methods that make use of the condition and traffic data collected on the road network. In our comparison, Least Squares Support Vector Regression gives significantly the best results.
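A minimal sketch of recursive multi-step forecasting follows. scikit-learn ships no LS-SVR estimator, so KernelRidge, which solves a closely related least-squares kernel problem, stands in for it here; the lag order, horizon, and toy "condition" series are illustrative assumptions.

```python
import numpy as np
from sklearn.kernel_ridge import KernelRidge

def multi_step_forecast(series, horizon=5, n_lags=8):
    """Fit a kernel least-squares model on lags, then recurse for multi-step."""
    X = np.array([series[i:i + n_lags] for i in range(len(series) - n_lags)])
    y = series[n_lags:]
    model = KernelRidge(kernel="rbf", alpha=0.1, gamma=0.5).fit(X, y)
    window = list(series[-n_lags:])
    preds = []
    for _ in range(horizon):                       # feed predictions back as inputs
        yhat = model.predict(np.array(window[-n_lags:]).reshape(1, -1))[0]
        preds.append(yhat)
        window.append(yhat)
    return preds

# toy slowly deteriorating road condition index
condition = 100 - 0.5 * np.arange(60) + np.random.default_rng(0).normal(0, 1, 60)
print(multi_step_forecast(condition))
```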
Label noise in classification
ES2014-10
A comprehensive introduction to label noise
Benoît Frénay, Ata Kaban
Abstract:
In classification, it is often difficult or expensive to obtain completely accurate and reliable labels. Indeed, labels may be polluted by label noise due to, e.g., insufficient information, expert mistakes, and encoding errors. The problem is that errors in training labels that are not properly handled may deteriorate the accuracy of subsequent predictions, among other effects. Many works have been devoted to label noise, and this paper provides a concise and comprehensive introduction to this research topic. In particular, it reviews the types of label noise, their consequences, and a number of state-of-the-art approaches to deal with label noise.
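A small experiment makes the central consequence tangible: flipping a fraction of the training labels degrades test accuracy. The dataset, classifier and noise rates below are arbitrary choices for the demonstration.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=2000, n_features=10, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
rng = np.random.default_rng(0)

for rate in (0.0, 0.1, 0.3):
    noisy = y_tr.copy()
    flip = rng.random(len(noisy)) < rate           # uniform label noise
    noisy[flip] = 1 - noisy[flip]
    acc = LogisticRegression(max_iter=1000).fit(X_tr, noisy).score(X_te, y_te)
    print(f"noise rate {rate:.0%}: test accuracy {acc:.3f}")
```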
ES2014-185
Improving the Robustness of Bagging with Reduced Sampling Size
Maryam Sabzevari, Gonzalo Martínez-Muñoz, Alberto Suárez
Abstract:
Bagging is a simple classification algorithm that is robust in the presence of class label noise. This algorithm builds an ensemble of classifiers from bootstrap samples drawn with replacement, of the same size as the original training set. However, several studies have shown that this choice of sampling size is arbitrary in terms of the generalization performance of the ensemble. In this study we discuss how small sampling ratios can contribute to the robustness of bagging in the presence of class label noise. An empirical analysis on two datasets is carried out using different noise rates and bootstrap sampling sizes. The results show that, for the studied datasets, sampling rates of 20% clearly improve the performance of bagging ensembles in the presence of class label noise.
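The setting is easy to reproduce with off-the-shelf tools: the sketch below injects 20% uniform class label noise and compares bagging with the standard bootstrap size against a 20% sampling ratio (the dataset and ensemble size are illustrative choices).

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=2000, n_features=10, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

rng = np.random.default_rng(0)
flip = rng.random(len(y_tr)) < 0.2                 # 20% class label noise
y_noisy = np.where(flip, 1 - y_tr, y_tr)

for ratio in (1.0, 0.2):                           # standard vs reduced sampling size
    bag = BaggingClassifier(n_estimators=100, max_samples=ratio,
                            bootstrap=True, random_state=0).fit(X_tr, y_noisy)
    print(f"max_samples={ratio}: test accuracy {bag.score(X_te, y_te):.3f}")
```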
ES2014-72
Credal decision trees in noisy domains
Carlos J. Mantas, Joaquín Abellán
Abstract:
Credal Decision Trees (CDTs) are algorithms for designing classifiers based on imprecise probabilities and uncertainty measures. In this paper, the C4.5 and CDT procedures are combined into a new one, which depends on a parameter $s$. Several experiments are carried out with different values of $s$. C4.5 and the new procedure, varying the value of its parameter, are compared on data sets with different noise levels.
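For readers unfamiliar with the parameter $s$: credal decision trees build on the imprecise Dirichlet model, in which $s$ controls how wide the class-probability intervals are (larger $s$ means more cautious estimates). A quick illustration, assuming the standard IDM interval $[n_c/(N+s), (n_c+s)/(N+s)]$:

```python
def idm_interval(n_c, N, s):
    """Imprecise Dirichlet model interval for the probability of a class
    observed n_c times out of N samples, with caution parameter s."""
    return n_c / (N + s), (n_c + s) / (N + s)

for s in (1.0, 2.0):
    low, high = idm_interval(n_c=30, N=100, s=s)
    print(f"s={s}: p(class) in [{low:.3f}, {high:.3f}]")
```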
ES2014-18
Finding Originally Mislabels with MD-ELM
Anton Akusok, David Veganzones, Yoan Miché, Eric Severin, Amaury Lendasse
Abstract:
This paper presents a method which aims at detecting mislabeled samples, with a practical example in the field of bankruptcy prediction. Mislabeled samples are found in many classification problems and can bias the training of the desired classifier. This paper proposes a new method based on the Extreme Learning Machine (ELM) which allows for the identification of the most probable mislabeled samples. Two datasets are used in order to validate and test the proposed methodology: a toy example (the XOR problem) and a real dataset from corporate finance (bankruptcy prediction).
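The following is a generic sketch in the spirit of the method, not its exact statistic: flip each training label in turn, refit an ELM-style least-squares model, and flag samples whose flip makes the fit easier (lower training MSE).

```python
import numpy as np

rng = np.random.default_rng(0)

def elm_mse(X, y, W, b):
    """Training MSE of an ELM with fixed random hidden layer (W, b)."""
    H = np.tanh(X @ W + b)
    beta = np.linalg.pinv(H) @ y
    return np.mean((H @ beta - y) ** 2)

# toy binary problem with a few labels flipped on purpose
X = rng.normal(size=(80, 2))
y = (X[:, 0] + X[:, 1] > 0).astype(float)
y[:5] = 1 - y[:5]                                  # injected mislabels

W, b = rng.normal(size=(2, 30)), rng.normal(size=30)
base = elm_mse(X, y, W, b)
suspects = []
for i in range(len(y)):
    y_try = y.copy()
    y_try[i] = 1 - y_try[i]
    if elm_mse(X, y_try, W, b) < base:             # flipping i makes the fit easier
        suspects.append(i)
print("suspected mislabels:", suspects)
```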
ES2014-118
Misclassification of class C G-protein-coupled receptors as a label noise problem
Caroline König, Alfredo Vellido Alcacena, René Alquézar Mancho, Jesús Giraldo
Abstract:
G-Protein-Coupled Receptors (GPCRs) are cell membrane proteins of relevance to biology and pharmacology. Their supervised classification into subtypes is hampered by label noise, which stems from a combination of expert knowledge limitations and the lack of a clear correspondence between labels and the different representations of the protein primary sequences. In this brief study, we describe a systematic approach to the analysis of GPCR misclassifications using Support Vector Machines and use it to assist the discovery of database labeling quality problems and to investigate the extent to which GPCR sequence physicochemical transformations reflect GPCR subtype labeling. The proposed approach could enable a filtering approach to the label noise problem.
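A minimal sketch of the filtering idea the abstract points to: samples that an SVM misclassifies persistently under repeated cross-validation are flagged as possible label noise. The dataset, the number of repetitions, and the flagging threshold are illustrative assumptions.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import StratifiedKFold, cross_val_predict
from sklearn.svm import SVC

X, y = make_classification(n_samples=500, n_features=20, n_informative=5,
                           n_classes=3, random_state=0)
miscounts = np.zeros(len(y))
for seed in range(10):                             # repeated CV for stability
    cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=seed)
    pred = cross_val_predict(SVC(kernel="rbf"), X, y, cv=cv)
    miscounts += (pred != y)

suspects = np.where(miscounts >= 8)[0]             # misclassified in >= 8 of 10 runs
print(f"{len(suspects)} samples flagged as possible label noise")
```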
ES2014-169
A multi-class extension for multi-labeler support vector machines
Diego H. Peluffo-Ordonez, Santiago Murillo-Rendón, Julián Arias-Londoño, Germán Castellanos-Dominguez
Abstract:
In recent years, there has been an increasing interest in the design of pattern recognition systems able to deal with labels coming from multiple sources. To avoid bias during the learning process, in some applications it is strongly recommended to learn from a set of panelists or experts instead of only one. In particular, two aspects are of interest, namely: discriminating between confident and unconfident labelers, and determining the suitable ground truth. This work presents an extension of a previous work, generalizing the two-class case via a modified one-against-all approach that uses classifiers able to learn from multi-labeler settings. This is done within a soft-margin support vector machine framework. The proposed method provides ranking values for the panelists as well as an estimate of the ground truth.
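Outside the SVM formulation, the two quantities the paper estimates can be illustrated with a much simpler scheme: iterate between a weighted-vote ground-truth estimate and per-labeler reliability scores (agreement with the current estimate). This is only a toy analogue of the proposed method, with simulated labelers of different quality.

```python
import numpy as np

rng = np.random.default_rng(0)
truth = rng.integers(0, 3, size=200)               # hidden true labels, 3 classes

# five simulated labelers with different error rates
noise = [0.05, 0.1, 0.2, 0.4, 0.5]
labels = np.array([np.where(rng.random(200) < p,
                            rng.integers(0, 3, 200), truth) for p in noise])

weights = np.ones(len(labels))
for _ in range(10):
    votes = np.zeros((200, 3))
    for k, lab in enumerate(labels):               # weighted vote per sample
        votes[np.arange(200), lab] += weights[k]
    estimate = votes.argmax(axis=1)
    weights = (labels == estimate).mean(axis=1)    # reliability = agreement rate

print("estimated labeler reliabilities:", np.round(weights, 2))
```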