Bruges, Belgium, April 22-23-24
Content of the proceedings
Prototype-based and weightless models
Emerging techniques and applications in multi-objective reinforcement learning
Sequence learning and time series
Regression and prediction
Feature and kernel learning
Graphs in machine learning
Manifold learning and optimization
Feature and model selection, sparse models
Advances in learning analytics and educational data mining
Classification
Image processing and vision systems
Unsupervised nonlinear dimensionality reduction
Unsupervised learning
Kernel methods
Prototype-based and weightless models
ES2015-68
Median-LVQ for classification of dissimilarity data based on ROC-optimization
David Nebel, Thomas Villmann
Abstract:
In this article we consider a median variant of the learning vector quantization (LVQ) classifier for the classification of dissimilarity data. Beyond the median aspect, we propose to optimize the receiver operating characteristic (ROC) instead of the classification accuracy. In particular, we present a probabilistic LVQ model with an adaptation scheme based on a generalized Expectation-Maximization procedure, which allows a maximization of the area under the ROC curve for such dissimilarity data. The basic idea is the utilization of ordered pairs as a structured input for learning. The new scheme can be seen as a supplement to the recently introduced LVQ scheme for ROC optimization of vector data.
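The "ordered pairs" idea behind ROC optimization can be illustrated independently of the paper's algorithm: the area under the ROC curve equals the probability that a randomly drawn positive sample is scored above a randomly drawn negative one, so it can be estimated directly from ordered (positive, negative) pairs. A minimal sketch (not the authors' method):

```python
def auc_from_pairs(pos_scores, neg_scores):
    """Wilcoxon-Mann-Whitney estimate of the area under the ROC curve.

    Counts, over all ordered (positive, negative) pairs, how often the
    positive sample is scored higher; ties count as half a win.
    """
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))
```

For example, `auc_from_pairs([0.9, 0.3], [0.5, 0.1])` yields 0.75, since three of the four ordered pairs rank the positive sample first.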
ES2015-88
Certainty-based prototype insertion/deletion for classification with metric adaptation
Lydia Fischer, Barbara Hammer, Heiko Wersing
Abstract:
We propose an extension of prototype-based classification models to automatically adjust model complexity, thus offering a powerful technique for online, incremental learning tasks. The incremental technique is based on the notion of the certainty of an observed classification. Unlike previous work, we can incorporate matrix learning into the framework by relying on the cost function of generalised learning vector quantisation (GLVQ) for prototype insertion, deletion, as well as training. In several benchmarks, we demonstrate that the proposed method provides comparable results to offline counterparts and an incremental support vector machine, while enabling a better control of the required memory.
ES2015-35
Learning matrix quantization and variants of relevance learning
Kristin Domaschke, Marika Kaden, Mandy Lange, Thomas Villmann
Abstract:
We propose an extension of the learning vector quantization framework for matrix data. Data in matrix form occur in several areas such as gray-scale images, time-dependent spectra or fMRI data. If the matrix data are vectorized, important spatial information may be lost. Thus, processing matrix data in matrix form seems more appropriate. However, it requires matrix dissimilarities for data comparison. Here Schatten-$p$-norms come into play. We show that they can replace the vector dissimilarities in the learning framework in a natural way. Moreover, we also transfer the concept of vector relevance learning to this new matrix variant. We apply the resulting learning matrix quantization approach to the classification of time-dependent fluorescence spectra as an exemplary real-world application.
ES2015-26
A WiSARD-based multi-term memory framework for online tracking of objects
Daniel Nascimento, Rafael Carvalho, Félix Mora-Camino, Priscila Lima, Felipe França
Abstract:
In this paper, a generic object tracker with real-time performance is proposed. The proposed tracker is inspired by hierarchical short-term and medium-term memories, in which patterns are stored as discriminators of a WiSARD weightless neural network. This approach is evaluated on benchmark video sequences published by Babenko et al. Experiments show that the WiSARD-based approach outperforms most of the previous results in the literature on the same dataset.
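To make the weightless-network terminology concrete, here is a minimal WiSARD-style discriminator sketch: the binary input is split into tuples by a fixed pseudo-random mapping, each tuple addresses a RAM, and recognition is the fraction of RAMs that have seen their address before. Class structure and names are illustrative assumptions, not the paper's tracker:

```python
import random

class Discriminator:
    """Minimal WiSARD-style discriminator for binary input vectors."""

    def __init__(self, input_len, tuple_size, seed=0):
        rng = random.Random(seed)
        bits = list(range(input_len))
        rng.shuffle(bits)  # fixed pseudo-random input-to-RAM mapping
        self.tuples = [bits[i:i + tuple_size]
                       for i in range(0, input_len, tuple_size)]
        self.rams = [set() for _ in self.tuples]

    def _addresses(self, x):
        for ram, idx in zip(self.rams, self.tuples):
            yield ram, tuple(x[i] for i in idx)

    def train(self, x):
        # writing a pattern marks each addressed RAM location as seen
        for ram, addr in self._addresses(x):
            ram.add(addr)

    def score(self, x):
        # fraction of RAMs that recognise their address for input x
        hits = sum(addr in ram for ram, addr in self._addresses(x))
        return hits / len(self.rams)
```

A trained pattern scores 1.0 on itself, while its bitwise complement scores 0.0, since every tuple address differs.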
ES2015-100
Memory Transfer in DRASiW–like Systems
Massimo De Gregorio, Maurizio Giordano
Abstract:
DRASiW is an extension of a weightless NN model, namely WiSARD, with the capability of storing, in an internal data structure called a “mental image” (MI), the frequencies of patterns seen during the training stage. Thanks to this capability, together with the possibility of reversely processing MIs to generate synthetic prototypes of training samples, in this paper we show how, in DRASiW-like systems, it is possible to transfer the memory between different systems while preserving their functionality.
ES2015-73
Combining dissimilarity measures for prototype-based classification
Ernest Mwebaze, Gjalt Bearda, Michael Biehl, Dietlind Zuehlke
Abstract:
Prototype-based classification has been used successfully for classification tasks where interpretability of the output of the system is key. Prototypes are representative of the data and, together with a suitable measure of dissimilarity, parameterize the classifier. In many practical problems, the same object is represented by a collection of qualitatively different subsets of features, each of which might require a different dissimilarity measure. In this paper we present a novel technique for combining different dissimilarity measures into one classification scheme for heterogeneous, mixed data. To illustrate the method we apply a selected class of prototype-based classifiers, LVQ, to the problem of diagnosing viral crop disease in cassava plants. We combine different dissimilarity measures related to features extracted from leaf images, including color histograms (HSV) and shape features (SIFT). Our results show the feasibility of the method, increased performance compared to previous methods, and improved interpretability of the system.
Emerging techniques and applications in multi-objective reinforcement learning
ES2015-15
Multi-objective optimization perspectives on reinforcement learning algorithms using reward vectors
Madalina Drugan
Abstract:
Reinforcement learning is a machine learning area that studies which actions an agent can take in order to optimize a cumulative reward function. Recently, a new class of reinforcement learning algorithms with multiple, possibly conflicting, reward functions was proposed. We call this class of algorithms the multi-objective reinforcement learning (MORL) paradigm. We give an overview on multi-objective optimization techniques imported in MORL and their theoretical simplified variant with a single state, namely the multi-objective multi-armed bandits (MOMAB) paradigm.
ES2015-27
Thompson Sampling for Multi-Objective Multi-Armed Bandits Problem
Saba Yahyaa, Bernard Manderick
Abstract:
The multi-objective multi-armed bandit (MOMAB) problem is a Markov decision process with stochastic rewards. Each arm generates a vector of rewards instead of a single scalar reward, and these multiple rewards might be conflicting. The MOMAB problem has a set of Pareto optimal arms, and an agent's goal is not only to find that set but also to play the arms in that set evenly, or fairly. To find the Pareto optimal arms, either a linear scalarized function or Pareto dominance relations can be used. The linear scalarized function converts the multi-objective optimization problem into a single-objective one and is a very popular approach because of its simplicity. The Pareto dominance relations optimize the multi-objective problem directly. In this paper, we extend the Thompson Sampling policy to the MOMAB problem. We propose Pareto Thompson Sampling and linear scalarized Thompson Sampling approaches and compare them empirically on a test suite of MOMAB problems with Bernoulli distributions. Pareto Thompson Sampling is the approach with the best empirical performance.
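The two ways of identifying good arms mentioned in the abstract can be sketched generically (this is illustrative, not the paper's full algorithm): Pareto dominance keeps every arm whose reward vector is not dominated, while linear scalarization collapses each vector to a weighted sum and picks a single winner.

```python
def dominates(u, v):
    """True if reward vector u Pareto-dominates v
    (>= in every objective, > in at least one)."""
    return all(a >= b for a, b in zip(u, v)) and \
           any(a > b for a, b in zip(u, v))

def pareto_front(vectors):
    """Arms whose mean reward vectors are dominated by no other arm."""
    return [u for u in vectors
            if not any(dominates(v, u) for v in vectors if v is not u)]

def linear_scalarize(vectors, weights):
    """Collapse each reward vector to a scalar; return the best arm's index."""
    scores = [sum(w * r for w, r in zip(weights, u)) for u in vectors]
    return scores.index(max(scores))
```

Note that a single weight vector can only find one point of the Pareto front, which is why scalarized approaches sweep over many weight settings while dominance-based approaches recover the whole front at once.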
ES2015-65
Pareto Local Search for MOMDP Planning
Chiel Kooijman, Maarten De Waard, Maarten Inja, Diederik Roijers, Shimon Whiteson
Abstract:
Standard single-objective methods such as dynamic programming are not applicable to Markov decision processes (MDPs) with multiple objectives because they depend on a maximization function over rewards, which is not defined if the rewards are multi-dimensional. As a result, special multi-objective algorithms are needed to find a set of policies that contains all optimal trade-offs between objectives, i.e. a set of Pareto-optimal policies. In this paper, we propose Pareto Local Policy Search (PLoPS), a new planning method for multi-objective MDPs (MOMDPs) based on Pareto Local Search (PLS). This method produces a good set of policies by iteratively scanning the neighbourhood of locally non-dominated policies for improvements. It is fast because neighbouring policies can be quickly identified as improvements, and their values can be computed incrementally. We test the performance of PLoPS on several MOMDP benchmarks, and compare it to popular decision-theoretic and evolutionary alternatives. The results indicate that PLoPS outperforms the alternatives.
ES2015-33
Bernoulli bandits: an empirical comparison
Nixon Ronoh, Reuben Odoyo, Edna Milgo, Madalina Drugan, Bernard Manderick
Abstract:
We empirically compare a representative sample of action selection policies on a test suite of Bernoulli multi-armed bandit problems. For such problems the rewards are either success or failure, following a Bernoulli distribution with unknown success probability. The number of arms in our test suite ranges from small to large, and for each number of arms we consider several distributions of the success probabilities. Our selection consists of the following action selection policies: ε-greedy, UCB1-Tuned, Thompson sampling, the Gittins index policy, and the knowledge gradient. In this paper, we report the case of ten arms. A forthcoming technical report will include bandits other than Bernoulli bandits and will describe the experimental results for all multi-armed bandit problems under several parameter settings.
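Of the compared policies, Thompson sampling is the easiest to state for Bernoulli arms: keep a Beta(successes + 1, failures + 1) posterior per arm, draw one sample from each posterior, and pull the arm with the largest draw. A minimal sketch (function name and interface are illustrative):

```python
import random

def thompson_run(true_probs, horizon, seed=0):
    """Simulate Thompson sampling on Bernoulli arms; return pull counts."""
    rng = random.Random(seed)
    n_arms = len(true_probs)
    wins = [0] * n_arms    # observed successes per arm
    losses = [0] * n_arms  # observed failures per arm
    pulls = [0] * n_arms
    for _ in range(horizon):
        # sample one success probability per arm from its Beta posterior
        samples = [rng.betavariate(wins[a] + 1, losses[a] + 1)
                   for a in range(n_arms)]
        arm = samples.index(max(samples))
        pulls[arm] += 1
        if rng.random() < true_probs[arm]:
            wins[arm] += 1
        else:
            losses[arm] += 1
    return pulls
```

Run on a two-armed problem with a large gap, e.g. `thompson_run([0.9, 0.1], 500)`, the policy quickly concentrates its pulls on the better arm.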
Sequence learning and time series
ES2015-118
Learning Recurrent Dynamics using Differential Evolution
Sebastian Otte, Fabian Becker, Martin V. Butz, Marcus Liwicki, Andreas Zell
Abstract:
This paper presents an efficient and powerful approach for learning dynamics with Recurrent Neural Networks (RNNs). No specialized or fine-tuned RNNs are used, but rather standard RNNs with one fully connected hidden layer. The training procedure is based on a variant of Differential Evolution (DE) with a novel mutation scheme that allows the population size in our setup to be reduced to five, while still yielding very good results within a few generations. For several common Multiple Superimposed Oscillator (MSO) instances, new state-of-the-art results are presented, which are across the board orders of magnitude better than the results published so far. Furthermore, for new and even more difficult instances, i.e., MSO9-MSO12, our setup achieves lower error rates than previously reported for the best system on MSO8.
ES2015-31
Comparison of Numerical Models and Statistical Learning for Wind Speed Prediction
Nils André Treiber, Stephan Späth, Justin Heinermann, Lueder von Bremen, Oliver Kramer
Abstract:
After decades in which wind forecasting was dominated by numerical weather prediction, statistical models have recently gained attention for the shortest-term forecast horizons. A rigorous experimental comparison between both model types is rare. In this paper, we compare COSMO-DE EPS forecasts from the German Meteorological Service (DWD), post-processed with non-homogeneous Gaussian regression, to a multivariate support vector regression model. Further, a hybrid model is introduced that employs a weighted prediction of both approaches.
ES2015-39
Solar PV Power Forecasting Using Extreme Learning Machine and Information Fusion
Hélène Le Cadre, Ignacio Aravena, Anthony Papavasiliou
Abstract:
We provide a learning algorithm combining distributed Extreme Learning Machine and an information fusion rule based on the aggregation of expert advice, to build day-ahead probabilistic solar PV power production forecasts. These forecasts use, apart from the current day's solar PV power production, local meteorological inputs, the most valuable of which is shown to be precipitation. Experiments are then run in one French region, Provence-Alpes-Côte d'Azur, to evaluate the algorithm's performance.
ES2015-54
Gaussian process modelling of multiple short time series
Hande Topa, Antti Honkela
Abstract:
We study effective Gaussian process (GP) modelling of multiple short time series. These problems are common, for example, when applying GP models independently to each gene in a gene expression time series data set. Such sets typically contain very few time points, and hence naive application of common GP modelling techniques can lead to severe overfitting in a significant fraction of the fitted models, depending on the details of the data set. We propose avoiding overfitting by constraining the GP length-scale to values that are compatible with the spacing of the time points. We demonstrate that this eliminates otherwise serious overfitting in a real experiment using a GP model to rank SNPs based on their likelihood of being under natural selection.
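One possible form of such a constraint (an assumption for illustration; the paper's exact rule may differ) is to bound the length-scale of a squared-exponential covariance from below by the smallest gap between observed time points, so the fitted function cannot oscillate between observations:

```python
import math

def rbf_kernel(t1, t2, lengthscale, variance=1.0):
    """Squared-exponential (RBF) covariance between two time points."""
    return variance * math.exp(-0.5 * ((t1 - t2) / lengthscale) ** 2)

def constrained_lengthscale(time_points, proposed):
    """Clamp a proposed length-scale to at least the smallest spacing
    between consecutive time points (time_points assumed sorted)."""
    min_gap = min(b - a for a, b in zip(time_points, time_points[1:]))
    return max(proposed, min_gap)
```

With time points `[0, 2, 5, 6]`, any optimizer proposal below 1 (the smallest gap) would be clamped to 1, preventing a fit that wiggles between the sparse observations.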
ES2015-56
Long Short Term Memory Networks for Anomaly Detection in Time Series
Pankaj Malhotra, Lovekesh Vig, Gautam Shroff, Puneet Agarwal
Abstract:
Long Short Term Memory (LSTM) networks have been demonstrated to be particularly useful for learning sequences containing longer term patterns of unknown length, due to their ability to maintain long term memory. Stacking recurrent hidden layers in such networks also enables the learning of higher level temporal features, for faster learning with sparser representations. In this paper, we use stacked LSTM networks for anomaly/fault detection in time series. A network is trained on non-anomalous data and used as a predictor over a number of time steps. The resulting prediction errors are modeled as a multivariate Gaussian distribution, which is used to assess the likelihood of anomalous behavior. The efficacy of this approach is demonstrated on four datasets: ECG, space shuttle, power demand, and multi-sensor engine dataset.
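The error-modelling step described above can be sketched in the univariate case (the paper fits a multivariate Gaussian; the threshold value here is a hypothetical placeholder): fit a Gaussian to the prediction errors observed on non-anomalous data, then flag points whose errors are too unlikely under that model.

```python
import math

def fit_gaussian(errors):
    """Maximum-likelihood mean and variance of prediction errors."""
    mu = sum(errors) / len(errors)
    var = sum((e - mu) ** 2 for e in errors) / len(errors)
    return mu, var

def log_likelihood(e, mu, var):
    """Log density of error e under the fitted Gaussian."""
    return -0.5 * (math.log(2 * math.pi * var) + (e - mu) ** 2 / var)

def flag_anomalies(errors, mu, var, threshold=-10.0):
    """Mark errors whose log-likelihood falls below a chosen threshold."""
    return [log_likelihood(e, mu, var) < threshold for e in errors]
```

In practice the threshold would be chosen on a validation set containing both normal and anomalous sequences rather than fixed in advance.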
ES2015-91
Hierarchical, prototype-based clustering of multiple time series with missing values
Pekka Wartiainen, Tommi Kärkkäinen
Abstract:
A novel technique to divide a given set of multiple time series containing missing values into disjoint subsets is proposed. With a hierarchical approach that combines a robust clustering algorithm and multiple cluster indices, we are able to generate a dynamic, decision-tree-like structure that represents the original data in its leaf nodes. The whole algorithm is first described and then tested on one particular data set from the UCI repository, already used in [Kärkkäinen et al., 2014] for a similar exploration. The obtained results are very promising.
Regression and prediction
ES2015-12
Fast greedy insertion and deletion in sparse Gaussian process regression
Jens Schreiter, Duy Nguyen-Tuong, Heiner Markert, Michael Hanselmann, Marc Toussaint
Abstract:
In this paper, we introduce a new and straightforward criterion for successive insertion and deletion of training points in sparse Gaussian process regression. Our novel approach is based on an approximation of the selection technique proposed by Smola and Bartlett. It is shown that the resulting selection strategies are as fast as the purely randomized schemes for insertion and deletion of training points. Experiments on real-world robot data demonstrate that our obtained regression models are competitive with the computationally intensive state-of-the-art methods in terms of generalization accuracy.
ES2015-77
Using self-organizing maps for regression: the importance of the output function
Thomas Hecht, Mathieu Lefort, Alexander Gepperth
Abstract:
The self-organizing map (SOM) is a powerful paradigm that is extensively applied for clustering and visualization purposes. It is also used for regression learning, especially in robotics, thanks to its ability to provide a topological projection of high-dimensional nonlinear data. In this case, the data extracted from the SOM are usually restricted to the best matching unit (BMU), which is the usual way to use a SOM for classification, where class labels are attached to individual neurons. In this article, we investigate the influence of considering more information from the SOM than just the BMU when performing regression. For this purpose, we quantitatively study several output functions for the SOM, using these data as input to a linear regression, and find that using activities in addition to the BMU can strongly improve regression performance. Thus, we propose a unified and generic framework that embraces a large spectrum of models, from the traditional way of using the SOM, with the best matching unit as output, to models related to the radial basis function network paradigm, which use local receptive fields as output.
ES2015-107
Using the Mean Absolute Percentage Error for Regression Models
Arnaud de Myttenaere, Boris Golden, Bénédicte Le Grand, Fabrice Rossi
Abstract:
We study in this paper the consequences of using the Mean Absolute Percentage Error (MAPE) as a measure of quality for regression models. We show that finding the best model under the MAPE is equivalent to doing weighted Mean Absolute Error (MAE) regression. We show that universal consistency of Empirical Risk Minimization remains possible using the MAPE instead of the MAE.
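The stated MAPE/weighted-MAE equivalence can be checked numerically. A minimal sketch, assuming strictly non-zero targets (function names are mine):

```python
import numpy as np

def mape(y_true, y_pred):
    # Mean Absolute Percentage Error
    return np.mean(np.abs((y_true - y_pred) / y_true))

def weighted_mae(y_true, y_pred, w):
    # Weighted Mean Absolute Error
    return np.mean(w * np.abs(y_true - y_pred))

rng = np.random.default_rng(0)
y = rng.uniform(1.0, 10.0, size=100)        # strictly positive targets
y_hat = y + rng.normal(0.0, 1.0, size=100)

# MAPE is exactly a MAE weighted by 1/|y|, so minimizing one minimizes the other
assert np.isclose(mape(y, y_hat), weighted_mae(y, y_hat, 1.0 / np.abs(y)))
```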
ES2015-81
Survival Analysis with Cox Regression and Random Non-linear Projections
Samuel Branders, Benoît Frénay, Pierre Dupont
Abstract:
Proportional Cox hazard models are commonly used in survival analysis, since they define risk scores which can be directly interpreted in terms of hazards. Yet they cannot account for non-linearities in their covariates. This paper shows how to use random non-linear projections to efficiently address this limitation.
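The core idea admits a short sketch: map the covariates through a fixed random non-linear projection, then fit an ordinary linear Cox model on the projected features. Everything below (the tanh non-linearity, the weight distributions, the function name) is an assumption for illustration, not the paper's exact construction; the Cox fit itself is left to a standard survival library.

```python
import numpy as np

def random_nonlinear_projection(X, n_out=200, seed=0):
    """Fixed random non-linear feature map for survival covariates.

    A linear Cox proportional hazards model fitted on the returned
    features can capture non-linear effects of the original X.
    """
    rng = np.random.default_rng(seed)
    W = rng.normal(0.0, 1.0, size=(X.shape[1], n_out))
    b = rng.uniform(-np.pi, np.pi, size=n_out)
    return np.tanh(X @ W + b)

X = np.random.default_rng(1).normal(size=(50, 5))
Z = random_nonlinear_projection(X)
assert Z.shape == (50, 200)
```

Because the projection is fixed and random, the downstream model stays linear (and hence interpretable as a risk score), which is the efficiency argument of the abstract.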
ES2015-135
Ensemble Learning with Dynamic Ordered Pruning for Regression
Kaushala Dias, Terry Windeatt
Abstract:
A novel method of introducing diversity into ensemble learning predictors for regression problems is presented. The proposed method prunes the ensemble while simultaneously training, as part of the same learning process. Here, ensemble members are trained selectively rather than uniformly, resulting in a diverse selection of ensemble members that have strengths in different parts of the training set. As a result, the prediction accuracy and generalization ability of the trained ensemble are enhanced. Pruning heuristics attempt to combine accurate yet complementary members; this method therefore enhances performance by dynamically modifying the pruned aggregation, distributing the ensemble member selection over the entire dataset. A comparison is drawn with Negative Correlation Learning and a static ensemble pruning approach used in regression to highlight the performance improvement yielded by the dynamic method. Experimental comparisons are made using Multi-Layer Perceptron predictors on benchmark datasets.
ES2015-125
Training Multi-Layer Perceptron with Multi-Objective Optimization and Spherical Weights Representation
Honovan Rocha, Marcelo Costa, Antônio Braga
Abstract:
This paper proposes a novel representation of the parameters of neural networks in which the weights are projected into a new space defined by a radius r and a vector of angles theta. This spherical representation simplifies the multi-objective learning problem in which error and norm functions are optimized to generate Pareto sets. Using spherical weights, the error is minimized as a mono-objective problem over the vector of angles while the radius (i.e., the norm) is held fixed. Results indicate that spherical weights generate more reliable and accurate Pareto set estimates compared to the standard multi-objective approach.
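The radius/angle parametrization can be sketched as a round-trippable change of coordinates. This uses the standard n-sphere construction; the paper's exact conventions may differ.

```python
import numpy as np

def to_spherical(w):
    """Weight vector w in R^n -> radius r and n-1 angles theta."""
    r = np.linalg.norm(w)
    theta = np.empty(len(w) - 1)
    for k in range(len(w) - 2):
        # angle between w[k] and the norm of the remaining tail
        theta[k] = np.arctan2(np.linalg.norm(w[k + 1:]), w[k])
    theta[-1] = np.arctan2(w[-1], w[-2])
    return r, theta

def to_cartesian(r, theta):
    """Inverse map: (r, theta) -> weight vector in R^n."""
    n = len(theta) + 1
    w = np.full(n, r)
    for k in range(n - 1):
        w[k] *= np.cos(theta[k])
        w[k + 1:] *= np.sin(theta[k])
    return w

w = np.array([3.0, -1.0, 2.0, 0.5])
r, theta = to_spherical(w)
assert np.allclose(to_cartesian(r, theta), w)   # exact round trip
```

Fixing `r` and optimizing only `theta` is what turns the error/norm trade-off into a family of mono-objective problems, one per radius.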
ES2015-90
Reducing offline evaluation bias of collaborative filtering
Arnaud de Myttenaere, Boris Golden, Bénédicte Le Grand, Fabrice Rossi
Abstract:
Recommendation systems have been integrated into the majority of large online systems to filter and rank information according to user profiles. This process influences the way users interact with the system and, as a consequence, biases the evaluation of a recommendation algorithm computed using historical data (via offline evaluation). This paper presents the state of the art of solutions to reduce this bias, along with a new application to collaborative filtering.
ES2015-23
A new fuzzy neural system with applications
Yuanyuan Chai, Jun Chen, Wei Luo
Abstract:
Through a comprehensive study of existing fuzzy neural systems, this paper presents a Choquet integral-OWA operator based fuzzy neural system, named AggFNS, as a new hybrid method of CI, with advantages in universal fuzzy inference operators and in expressing importance factors during the reasoning process. AggFNS was applied to a traffic level-of-service evaluation problem, and the experimental results showed that AggFNS has strong nonlinear mapping and approximation capabilities when trained, and could be used for complex system modeling, prediction and control.
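One building block named in the abstract, the OWA operator, is simple enough to sketch. The Choquet-integral part of AggFNS is omitted, and the weight choices below are only illustrative.

```python
import numpy as np

def owa(values, weights):
    """Ordered Weighted Averaging: weights attach to the sorted
    (descending) values rather than to particular inputs."""
    v = np.sort(np.asarray(values, dtype=float))[::-1]
    return float(np.dot(weights, v))

# max, min and the arithmetic mean are all special cases of OWA
assert owa([3, 1, 2], [1, 0, 0]) == 3.0                    # max
assert owa([3, 1, 2], [0, 0, 1]) == 1.0                    # min
assert np.isclose(owa([3, 1, 2], [1/3, 1/3, 1/3]), 2.0)    # mean
```

Because the weight vector selects where on the ordered scale the operator sits, a network can learn it to interpolate between conjunctive (min-like) and disjunctive (max-like) aggregation.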
ES2015-126
Measuring scoring efficiency through goal expectancy estimation
Héctor Ruiz, Paulo Lisboa, Paul Neilson, Warren Gregson
Abstract:
Association football is characterized by the lowest scoring rate of all major sports. A typical value of less than 3 goals per game makes it difficult to find strong effects on goal scoring. Instead of goals, one can focus on the production of shots, increasing the available sample size. However, the value of shots depends heavily on different factors, and it is important to take this variability into account. In this paper, we use a multilayer perceptron to build a goal expectancy model that estimates the conversion probability of shots, and use it to evaluate the scoring performance of Premier League footballers.
ES2015-29
Predicting the profitability of agricultural enterprises in dairy farming
Maria Yli-Heikkilä, Jukka Tauriainen, Mika Sulkava
Abstract:
Profitability and other economic aspects of agriculture can be analyzed using various machine learning methods. In this paper, we compare linear, additive and recursive-partitioning-based models for predicting the profitability of farms using information easily available to a dairy farmer. We find that an ensemble of recursive partitioning methods provides the best prediction accuracy. We also analyze the importance of the predictor variables. These findings may prove useful in increasing our understanding of the factors affecting farm profitability and in developing a web service for farmers to predict the performance of their own farm enterprise.
ES2015-67
The use of RBF neural network to predict building’s corners hygrothermal behavior
Roberto Z. Freire, Gerson H. dos Santos, Leandro dos S. Coelho, Viviana C. Mariani, Divani da S. Carvalho
Abstract:
In this paper, a radial basis function neural network (RBF-NN) is combined with two optimization techniques: the expectation-maximization clustering method is used to tune the centers of the Gaussian activation functions, and differential evolution is adopted to optimize the spreads and to perform a local search of the centers. The modified RBF-NN is employed to predict the hygrothermal behavior of building corners. These specific regions of buildings are still barely explored due to modelling complexity, long computer run times, numerical divergence and highly moisture-dependent properties. Moreover, these areas are constantly affected by moisture accumulation and mould growth, conditions that favor structural damage.
ES2015-2
I see you: on neural networks for indoor geolocation
Johannes Pohl, Andreas Noack
Abstract:
We propose a new passive system for indoor localization of mobile nodes. After the setup, our system only relies on arbitrary wireless communication from the nodes, whereby neither the mobile nodes nor the communication needs to be under our control. The presented system is composed of three Artificial Neural Networks (ANN) using a radiomap approach and the Received Signal Strength (RSS) for localization. A Probabilistic Neural Network (PNN) decides between two Generalized Regression Neural Networks (GRNN) that process the actual RSS measurement. In practical experiments we achieve a mean location error of 0.58m which is 22.64% better than a single GRNN approach in our setup.
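A GRNN such as the two used above is essentially a Nadaraya-Watson kernel-weighted average of training targets. A minimal sketch of how such a network could map an RSS fingerprint to a position; the bandwidth `sigma`, the toy data, and the function name are assumptions, not the authors' configuration.

```python
import numpy as np

def grnn_predict(X_train, y_train, x, sigma=1.0):
    """Generalized Regression Neural Network prediction: a Gaussian
    kernel-weighted average of the training targets.

    X_train: (n, d) RSS fingerprints; y_train: (n, 2) known positions.
    """
    d2 = ((X_train - x) ** 2).sum(axis=1)
    w = np.exp(-d2 / (2.0 * sigma ** 2))
    return (w[:, None] * y_train).sum(axis=0) / w.sum()

X = np.array([[-40.0, -70.0], [-70.0, -40.0]])   # toy RSS fingerprints (dBm)
y = np.array([[0.0, 0.0], [5.0, 5.0]])           # corresponding positions (m)
p = grnn_predict(X, y, np.array([-40.0, -70.0]), sigma=5.0)
assert np.allclose(p, [0.0, 0.0], atol=1e-6)     # exact fingerprint match
```

The PNN gating described in the abstract would then pick which of two such regressors handles a given measurement.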
Feature and kernel learning
ES2015-13
Feature and kernel learning
Veronica Bolon-Canedo, Michele Donini, Fabio Aiolli
Abstract:
Feature selection and weighting has been an active research area in the last few decades, finding success in many different applications. With the advent of Big Data, the adequate identification of the relevant features has made feature selection an even more indispensable step. On the other hand, in kernel methods features are implicitly represented by means of feature mappings and kernels. It has been shown that the correct selection of the kernel is a crucial task, since an erroneous selection can lead to poor performance. Unfortunately, manually searching for an optimal kernel is time-consuming and often yields a sub-optimal choice. This tutorial is concerned with the use of data to learn features and kernels automatically. We provide a survey of recent methods developed for feature selection/learning and their application to real-world problems, together with a review of the contributions to the ESANN 2015 special session on Feature and Kernel Learning.
ES2015-52
Discovering temporally extended features for reinforcement learning in domains with delayed causalities
Robert Lieck, Marc Toussaint
Abstract:
Discovering temporally delayed causalities from data raises notoriously hard problems in reinforcement learning. In this paper we define a space of temporally extended features, designed to capture such causal structures, using a generating operation. Our discovery algorithm PULSE exploits the generating operation to efficiently discover a sparse subset of features. We provide convergence guarantees and apply our method to train a model-based as well as a model-free agent in different domains. In terms of achieved rewards and the number of required features our methods can achieve much better results than other feature expansion methods.
ES2015-104
ESNigma: efficient feature selection for echo state networks
Davide Bacciu, Filippo Benedetti, Alessio Micheli
Abstract:
The paper introduces a feature selection wrapper designed specifically for Echo State Networks. It defines a feature scoring heuristic, applicable to generic subset search algorithms, which reduces the need for model retraining compared with wrappers in the literature. The experimental assessment on real-world noisy sequential data shows that the proposed method can identify a compact set of relevant, highly predictive features in as little as 60% of the time required by the original wrapper.
ES2015-83
Learning features on tear film lipid layer classification
Beatriz Remeseiro, Veronica Bolon-Canedo, Amparo Alonso-Betanzos, Manuel G. Penedo
Abstract:
Dry eye is a prevalent disease which leads to irritation of the ocular surface, and is associated with symptoms of discomfort and dryness. The Guillon tear film classification system is one of the most common procedures to diagnose this disease. Previous research has demonstrated that this classification can be automated by means of image processing and machine learning techniques. However, all approaches to automatic classification have focused on dark eyes, since they are the most common in humans. This paper introduces a methodology that makes use of feature selection methods to learn which features are the most relevant for each type of eye and thus improve the automatic classification of the tear film lipid layer independently of eye color. Experimental results showed the adequacy of the proposed methodology, achieving classification rates over 90% while producing unbiased results and working in real time.
ES2015-114
PCA-based algorithm for feature score measures ensemble construction
Andrey Filchenkov, Vladislav Dolganov, Ivan Smetannikov
Abstract:
Feature filtering algorithms are commonly used in feature selection for high-dimensional datasets due to their simplicity and efficacy. Each of these algorithms has its own strengths and weaknesses. An ensemble of different ranking methods is a way to obtain a stable and efficacious ranking algorithm. We propose a PCA-based algorithm for building ensembles of filter ranking algorithms. We compared this algorithm with four other rank aggregation algorithms on five datasets used in the NIPS-2003 feature selection challenge. We evaluated the stability of the resulting rankings and the AUC score of four classifiers learnt on the resulting feature sets. The proposed method shows better stability and above-average efficacy.
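The abstract does not spell out the aggregation scheme, but a generic PCA-style aggregation of filter scores can be sketched as follows; the standardization step, the sign convention, and the function name are my assumptions, not the authors' exact algorithm.

```python
import numpy as np

def pca_aggregate(scores):
    """Aggregate per-feature scores from several filter methods.

    scores: (n_features, n_methods) matrix, one column per filter.
    Projects standardized columns onto the leading principal
    direction to obtain a single aggregated score per feature.
    """
    Z = (scores - scores.mean(axis=0)) / scores.std(axis=0)
    _, _, Vt = np.linalg.svd(Z, full_matrices=False)
    agg = Z @ Vt[0]
    # orient the axis so a higher aggregated score tracks a higher mean score
    if np.dot(agg, Z.mean(axis=1)) < 0:
        agg = -agg
    return agg

rng = np.random.default_rng(0)
base = rng.normal(size=100)   # hidden "true" relevance of 100 features
# three filters observe noisy copies of the same relevance signal
S = np.stack([base + 0.1 * rng.normal(size=100) for _ in range(3)], axis=1)
agg = pca_aggregate(S)
assert np.corrcoef(agg, base)[0, 1] > 0.9   # aggregation recovers the signal
```

The leading principal component captures the direction the filters agree on, which is the intuition behind using PCA for rank aggregation.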
Graphs in machine learning
ES2015-14
Graphs in machine learning. An introduction
Pierre Latouche, Fabrice Rossi
ES2015-130
Exploiting the ODD framework to define a novel effective graph kernel
Giovanni Da San Martino, Nicolò Navarin, Alessandro Sperduti
Abstract:
In this paper, we show how the Ordered Decomposition DAGs kernel framework, which allows the definition of graph kernels from tree kernels, makes it easy to define new state-of-the-art graph kernels. Here we consider a fast graph kernel based on the Subtree kernel (ST) and increase its expressivity by adding new features involving partial tree features. While the worst-case complexity of the resulting graph kernel does not increase, its effectiveness is improved, as shown on several chemical datasets, reaching state-of-the-art performance.
ES2015-106
Exact ICL maximization in a non-stationary time extension of latent block model for dynamic networks
Marco Corneli, Pierre Latouche, Fabrice Rossi
Abstract:
The latent block model (LBM) is a powerful probabilistic tool to describe interactions between node sets in bipartite networks, but it does not account for interactions of time-varying intensity between nodes in unknown classes. Here we propose a non-stationary temporal extension of the LBM that simultaneously clusters the two node sets of a bipartite network and constructs classes of time intervals on which interactions are stationary. The number of clusters as well as the class memberships are obtained by maximizing the exact complete-data integrated likelihood by means of a greedy search approach. Experiments on simulated and real data illustrate the potential of such a model.
ES2015-87
A State-Space Model for the Dynamic Random Subgraph Model
Rawyia Zreik, Pierre Latouche, Charles Bouveyron
Abstract:
In recent years, many random graph models have been proposed to extract information from networks. The principle is to look for groups of vertices with homogeneous connection profiles. Most of these models are suitable for static networks and can handle different types of edges. This work is motivated by the need to analyze an evolving network describing email communications between employees of the Enron company, where social positions play an important role. Therefore, in this paper, we consider the random subgraph model (RSM), which was proposed recently to model networks through latent clusters built within known partitions. Using a state-space model to characterize the cluster proportions, RSM is then extended to deal with dynamic networks. We call the latter the dynamic random subgraph model (dRSM).
ES2015-132
Gabriel Graph for Dataset Structure and Large Margin Classification: A Bayesian Approach
Luiz Carlos Torres, Cristiano Castro, Antônio Braga
Abstract:
This paper presents a geometrical approach for obtaining large margin classifiers. The method aims at exploring the geometrical properties of the dataset from the structure of a Gabriel graph, which represents pattern relations according to a given distance metric, such as the Euclidean distance. Once the graph is generated, geometric vectors, analogous to SVM's support vectors are obtained in order to yield the final large margin solution from a Gaussian mixture model approach. Preliminary experiments have shown that the solutions obtained with the proposed method are close to those obtained with SVMs.
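The Gabriel graph at the heart of this approach has a simple definition that can be sketched directly. This is a brute-force O(n³) construction for illustration only; the Gaussian-mixture step that follows in the paper is not shown.

```python
import numpy as np
from itertools import combinations

def gabriel_edges(X):
    """Edges of the Gabriel graph of the points X, shape (n, d).

    (a, b) is an edge iff no third point c lies strictly inside the
    ball whose diameter is the segment ab, i.e. iff for every c:
        d(a,c)^2 + d(b,c)^2 >= d(a,b)^2.
    """
    diff = X[:, None, :] - X[None, :, :]
    D2 = (diff ** 2).sum(axis=-1)          # squared Euclidean distances
    n = len(X)
    return [(a, b) for a, b in combinations(range(n), 2)
            if all(D2[a, c] + D2[b, c] >= D2[a, b]
                   for c in range(n) if c != a and c != b)]

# the middle of three collinear points blocks the long edge
X = np.array([[0.0, 0.0], [1.0, 0.0], [2.0, 0.0]])
assert gabriel_edges(X) == [(0, 1), (1, 2)]
```

Edges that join points of different classes identify the boundary region, which is where the method looks for its analogue of support vectors.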
Manifold learning and optimization
ES2015-109
Supervised Manifold Learning with Incremental Stochastic Embeddings
Oliver Kramer
Abstract:
In this paper, we introduce an incremental dimensionality reduction approach for labeled data. The algorithm incrementally samples in latent space and chooses a solution that minimizes the nearest-neighbor classification error, taking label information into account. We introduce and compare two optimization approaches to generate supervised embeddings, i.e., an incremental solution construction method and a re-embedding approach. Both methods share the objective of minimizing the nearest-neighbor classification error computed in the low-dimensional space. The resulting embedding is a surrogate of the high-dimensional labeled set. It allows conclusions about the structure of the data set and can be used as a preprocessing step for the classification of labeled patterns.
ES2015-99
Rank-constrained optimization: a Riemannian manifold approach
Guifang Zhou, Wen Huang, Gallivan Kyle, Van Dooren Paul, Pierre-Antoine Absil
Abstract:
This paper presents an algorithm that solves optimization problems on a matrix manifold $\mathcal{M} \subseteq \mathbb{R}^{m \times n}$ with an additional rank inequality constraint. New geometric objects are defined to facilitate efficiently finding a suitable rank. The convergence properties of the algorithm are given, and a weighted low-rank approximation problem is used to illustrate the efficiency and effectiveness of the algorithm.
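For context, the weighted low-rank approximation problem used as the illustrative application can be written (in one common formulation; the notation here is ours, not necessarily the paper's) as:

```latex
\min_{X \in \mathbb{R}^{m \times n},\ \operatorname{rank}(X) \le k}
  \; \left\| W \odot (A - X) \right\|_F^2
```

where $A$ is the data matrix, $W$ a nonnegative weight matrix, $\odot$ the Hadamard (entrywise) product, and $k$ the rank bound; the rank inequality constraint is what distinguishes this from fixed-rank manifold optimization.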
ES2015-131
Asynchronous decentralized convex optimization through short-term gradient averaging
Jérôme FELLUS, David Picard, Philippe-Henri Gosselin
Abstract:
This paper considers decentralized convex optimization over a network in large-scale contexts, where "large" simultaneously applies to the number of training examples, the dimensionality, and the number of networking nodes. We first propose a centralized optimization scheme that generalizes successful existing methods based on gradient averaging, improving their flexibility by making the number of averaged gradients an explicit parameter of the method. We then propose an asynchronous distributed algorithm that implements this scheme for large decentralized computing networks.
Feature and model selection, sparse models
ES2015-50
Model Selection for Big Data: Algorithmic Stability and Bag of Little Bootstraps on GPUs
Luca Oneto, Bernardo Pilarz, Alessandro Ghio, Davide Anguita
Abstract:
Model selection is a key step in learning from data, because it allows one to select optimal models by avoiding both under- and over-fitting. In the Big Data framework, however, the effectiveness of a model selection approach is assessed not only through the accuracy of the learned model but also through the time and computational resources needed to complete the procedure. In this paper, we propose two model selection approaches for Least Squares Support Vector Machine (LS-SVM) classifiers, based on Fully-empirical Algorithmic Stability (FAS) and Bag of Little Bootstraps (BLB). The two methods scale sub-linearly with respect to the size of the learning set and are therefore well suited for big data applications. Experiments are performed on a Graphical Processing Unit (GPU), showing up to 30x speed-ups with respect to conventional CPU-based implementations.
ES2015-95
Solving constrained Lasso and Elastic Net using nu-SVMs
Carlos M. Alaíz, Alberto Torres, José R. Dorronsoro
Abstract:
Many important linear sparse models have the Lasso problem at their core, for which the GLMNet algorithm is often considered the current state of the art. Recently, M. Jaggi observed that Constrained Lasso (CL) can be reduced to an SVM-like problem, which opens the way to using efficient SVM algorithms to solve CL. We refine Jaggi's arguments to reduce CL, as well as the constrained Elastic Net, to a Nearest Point Problem, and show experimentally that the well-known LIBSVM library results in faster convergence than GLMNet for small problems and also, if properly adapted, for larger ones.
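For reference, the constrained (as opposed to penalized) formulations involved are standard and can be sketched as follows; the notation here is ours:

```latex
\text{Constrained Lasso:}\quad
  \min_{x \in \mathbb{R}^d} \; \|Ax - b\|_2^2
  \quad \text{s.t.} \quad \|x\|_1 \le t,
\qquad
\text{Constrained Elastic Net:}\quad
  \min_{x \in \mathbb{R}^d} \; \|Ax - b\|_2^2 + \lambda \|x\|_2^2
  \quad \text{s.t.} \quad \|x\|_1 \le t.
```

The constraint set $\{x : \|x\|_1 \le t\}$ is a scaled cross-polytope, which is what makes the geometric reduction to an SVM-style nearest point problem possible.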
ES2015-10
Assessment of feature saliency of MLP using analytic sensitivity
Tommi Kärkkäinen
Abstract:
A novel technique to determine the saliency of features for the multilayer perceptron (MLP) neural network is presented. It is based on the analytic derivative of the feedforward mapping with respect to inputs, which is then integrated over the training data using the mean of the absolute values. Experiments demonstrating the viability of the approach are given with small benchmark data sets. The cross-validation based framework for reliable determination of MLP that has been used in the experiments was introduced in Kärkkäinen et al. (ESANN 2014, pp. 213-218) and Kärkkäinen (LNCS 8621, pp. 291-300).
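The general recipe described here — averaging the absolute analytic input gradient over the training data — can be sketched in a few lines. The toy one-hidden-layer network and random weights below are our own stand-in for a trained MLP, not the paper's setup:

```python
import numpy as np

# Hypothetical toy network: f(x) = W2 @ tanh(W1 @ x + b1) + b2,
# with random weights standing in for a trained MLP.
rng = np.random.default_rng(0)
n_in, n_hid, n_out = 4, 8, 1
W1 = rng.normal(size=(n_hid, n_in))
b1 = rng.normal(size=n_hid)
W2 = rng.normal(size=(n_out, n_hid))
b2 = rng.normal(size=n_out)

def input_jacobian(x):
    """Analytic derivative of the feedforward mapping w.r.t. the input x."""
    h = np.tanh(W1 @ x + b1)
    # Chain rule with tanh'(z) = 1 - tanh(z)^2; result has shape (n_out, n_in).
    return (W2 * (1.0 - h**2)) @ W1

# Saliency: mean of the absolute input-gradients over the training data,
# summed over outputs so each feature gets a single score.
X = rng.normal(size=(100, n_in))
saliency = np.mean([np.abs(input_jacobian(x)) for x in X], axis=0).sum(axis=0)
ranking = np.argsort(saliency)[::-1]  # most salient feature first
```

Features with near-zero mean absolute gradient barely influence the network's output anywhere on the training data, so they are candidates for removal.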
ES2015-41
Morisita-based feature selection for regression problems
Jean Golay, Michael Leuenberger, Mikhaïl Kanevski
Abstract:
Data acquisition, storage and management have improved, while the factors underlying many phenomena remain poorly understood. Consequently, irrelevant and redundant features artificially increase the size of datasets, which complicates learning tasks such as regression. To address this problem, feature selection methods have been proposed. This research introduces a new supervised filter based on the Morisita estimator of intrinsic dimension. The algorithm is simple and does not rely on arbitrary parameters. It is applied to both synthetic and real data, and a comparison with a wrapper based on extreme learning machines is conducted.
ES2015-48
A new genetic algorithm for multi-label correlation-based feature selection
Suwimol Jungjit, Alex Freitas
Abstract:
This paper proposes a new Genetic Algorithm for Multi-Label Correlation-Based Feature Selection (GA-ML-CFS). This GA performs a global search in the space of candidate feature subsets, in order to select a high-quality feature subset that is used by a multi-label classification algorithm – in this work, the Multi-Label k-NN algorithm. We compare the results of GA-ML-CFS with the results of the previously proposed Hill-Climbing for Multi-Label Correlation-Based Feature Selection (HC-ML-CFS) across 10 multi-label datasets.
ES2015-102
Search Strategies for Binary Feature Selection for a Naive Bayes Classifier
Tsirizo Rabenoro, Jérôme Lacaille, Marie Cottrell, Fabrice Rossi
Abstract:
We compare in this paper several feature selection methods for the Naive Bayes Classifier (NBC) when the data under study are described by a large number of redundant binary indicators. Wrapper approaches guided by the NBC estimation of the classification error probability outperform filter approaches while retaining a reasonable computational cost.
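A wrapper of the kind compared here can be sketched as a greedy forward search guided by the Naive Bayes classifier's cross-validated accuracy. The synthetic binary indicators and the specific greedy strategy below are illustrative assumptions, not the paper's exact search strategies:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import BernoulliNB

# Synthetic stand-in for redundant binary indicators.
X, y = make_classification(n_samples=500, n_features=12, n_informative=4,
                           random_state=1)
X = (X > 0).astype(int)  # binarise into 0/1 indicator features

# Greedy forward wrapper: add the indicator that most improves the
# NBC's cross-validated accuracy, stop when no indicator helps.
selected, remaining, best_score = [], list(range(X.shape[1])), 0.0
while remaining:
    scores = {j: cross_val_score(BernoulliNB(), X[:, selected + [j]], y,
                                 cv=3).mean() for j in remaining}
    j_best = max(scores, key=scores.get)
    if scores[j_best] <= best_score:
        break  # no remaining indicator improves the estimated accuracy
    best_score = scores[j_best]
    selected.append(j_best)
    remaining.remove(j_best)
```

Because the wrapper re-trains the NBC for each candidate, its cost grows with the number of indicators, which is why filter approaches remain attractive despite typically lower accuracy.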
Advances in learning analytics and educational data mining
ES2015-18
Advances in learning analytics and educational data mining
Mehrnoosh Vahdat, Alessandro Ghio, Luca Oneto, Davide Anguita, Mathias Funk, Matthias Rauterberg
Abstract:
The growing interest in recent years in Learning Analytics (LA) and Educational Data Mining (EDM) has enabled novel approaches and advancements in educational settings. The wide variety of research and practice in this context has opened up important possibilities and applications, from the adaptation and personalization of Technology Enhanced Learning (TEL) systems to the improvement of instructional design and pedagogy choices based on students' needs. LA and EDM play an important role in enhancing learning processes by offering innovative methods for the development and integration of more personalized, adaptive, and interactive educational environments. This has motivated the organization of the ESANN 2015 Special Session on Advances in Learning Analytics and Educational Data Mining. Here, a review of research and practice in LA and EDM is presented, accompanied by the most central methods, benefits, and challenges of the field. Additionally, this paper reviews the novel contributions to the Special Session.
ES2015-43
Adaptive structure metrics for automated feedback provision in Java programming
Benjamin Paassen, Bassam Mokbel, Barbara Hammer
Abstract:
Today's learning support systems for programming mostly rely on pre-coded feedback provision, so that their applicability is restricted to modelled tasks. In this contribution, we investigate the suitability of machine learning techniques to automate this process by presenting similar solution strategies from a set of stored examples. To this end, we apply structure metric learning methods in local and global alignment, which can be used to compare Java programs. We demonstrate that automatically adapted metrics identify the underlying programming strategy better than their default counterparts in a benchmark example from programming.
ES2015-49
Human Algorithmic Stability and Human Rademacher Complexity
Mehrnoosh Vahdat, Luca Oneto, Alessandro Ghio, Davide Anguita, Mathias Funk, Matthias Rauterberg
Abstract:
In Machine Learning (ML), the learning process of an algorithm given a set of evidence is studied via complexity measures. The way towards using ML complexity measures in the Human Learning (HL) domain was paved by a previous study, which introduced Human Rademacher Complexity (HRC); in this work, we introduce Human Algorithmic Stability (HAS). Exploratory experiments, performed on a group of students, show the superiority of HAS over HRC, since HAS allows grasping the nature and complexity of the task to learn.
ES2015-86
High-School Dropout Prediction Using Machine Learning: A Danish Large-scale Study
Nicolae-Bogdan Sara, Rasmus Halland, Christian Igel, Stephen Alstrup
Abstract:
Pupils not finishing their secondary education are a big societal problem. Previous studies indicate that machine learning can be used to predict high-school dropout, which allows early interventions. To the best of our knowledge, this paper presents the first large-scale study of that kind. It considers pupils that were at least six months into their Danish high-school education, with the goal to predict dropout in the subsequent three months. We combined information from the MaCom Lectio study administration system, which is used by most Danish high schools, with data from public online sources (name database, travel planner, governmental statistics). In contrast to existing studies that were based on only a few hundred students, we considered a considerably larger sample of 36299 pupils for training and 36299 for testing. We evaluated different machine learning methods. A random forest classifier achieved an accuracy of 93.47% and an area under the curve of 0.965. Given the large sample, we conclude that machine learning can be used to reliably detect high-school dropout given the information already available to many schools.
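The evaluation pipeline described — train a random forest on pupil features and report accuracy and area under the ROC curve on held-out data — follows a standard pattern. The synthetic, class-imbalanced data below is a stand-in for the (non-public) Lectio and registry features; the numbers it produces are not the paper's:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, roc_auc_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for pupil features (grades, absences, activity logs, ...),
# with dropout as the rarer class.
X, y = make_classification(n_samples=2000, n_features=20, weights=[0.8],
                           random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.5, random_state=0)

clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_tr, y_tr)
acc = accuracy_score(y_te, clf.predict(X_te))
# AUC uses the predicted dropout probability, not the hard label.
auc = roc_auc_score(y_te, clf.predict_proba(X_te)[:, 1])
```

With imbalanced classes like dropout prediction, AUC is the more informative of the two figures, since a trivial "nobody drops out" classifier already scores high accuracy.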
ES2015-22
The prediction of learning performance using features of note taking activities
Minoru Nakayama, Kouichi Mutsuura, Hiroh Yamamoto
Abstract:
To promote effective learning in online learning environments, predicting learning performance from various features of learning behaviour is necessary. In a blended learning course, participants' note-taking activity reflects learning performance, and the possibility of predicting final exam performance is examined using metrics of participants' characteristics and features of the contents of notes taken during the course. According to the results, features of note-taking activities are a significant source of information for predicting final exam scores. The accuracy of this prediction was also evaluated with respect to the feature extraction procedure and the course instructions.
ES2015-113
Enhancing learning at work. How to combine theoretical and data-driven approaches, and multiple levels of data?
Virpi Kalakoski, Henriikka Ratilainen, Linda Drupsteen
Abstract:
This research plan focuses on learning at work. Our aim is to gather empirical data on multiple factors that can affect learning for work, and to apply computational methods in order to understand the preconditions of effective learning. The design will systematically combine theory- and data-driven approaches to study (i) whether principles of effective learning found in previous studies apply to real life settings, (ii) what interactions between individual and organizational factors are related to learning outcomes, and (iii) new connections and phenomena relevant to enhance learning in real life.
ES2015-24
Weighted Clustering of Sparse Educational Data
Mirka Saarela, Tommi Kärkkäinen
Abstract:
Clustering as an unsupervised technique is predominantly used in unweighted settings. In this paper, we present an efficient version of a robust clustering algorithm for sparse educational data that takes the weights, aligning a sample with the corresponding population, into account. The algorithm is utilized to divide the Finnish student population of PISA 2012 (the latest data from the Programme for International Student Assessment) into groups, according to their attitudes and perceptions towards mathematics, for which one third of the data is missing. Furthermore, necessary modifications of three cluster indices to reveal an appropriate number of groups are proposed and demonstrated.
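The core idea of weighted clustering — each sample enters the clustering objective with a survey weight that aligns the sample with the population — can be illustrated with off-the-shelf k-means, which accepts per-sample weights. This is only a minimal sketch of the weighting mechanism, not the paper's robust algorithm for sparse data with missing values:

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 5))            # stand-in for student survey responses
w = rng.uniform(0.5, 2.0, size=300)      # hypothetical survey weights

# sample_weight scales each point's contribution to the within-cluster
# sum of squares, so heavily weighted students pull centroids harder.
km = KMeans(n_clusters=3, n_init=10, random_state=0)
labels = km.fit_predict(X, sample_weight=w)
```

Cluster validity indices would need the same weighting applied to their within- and between-cluster terms, which is the modification of the three indices the abstract refers to.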
Classification
ES2015-32
An affinity matrix approach for structure selection of extreme learning machines
David Pinto, Andre Lemos, Antônio Braga
Abstract:
This paper proposes a novel pruning approach for Extreme Learning Machines. Hidden neurons ranking and selection are performed using a priori information expressed by affinity matrices. We show that the similarity between the affinity matrix of the input patterns and the affinity matrix of the hidden layer output patterns can be seen as a measure of the data structural retention through the network. However, from a certain similarity level, adding new hidden nodes will have small or no effect on the amount of information propagated from the input. The proposed approach automatically determines this level and hence the suitable number of hidden nodes. Experiments are performed using classification problems to validate the proposed approach.
ES2015-80
A generalised label noise model for classification
Jakramate Bootkrajang
Abstract:
Learning from labelled data is becoming more and more challenging due to inherent imperfection of training labels. In this paper, we propose a new, generalised label noise model which is able to withstand the negative effect of both random noise and a wide range of non-random label noises. Empirical studies using three real-world datasets with inherent annotation errors demonstrate that the proposed generalised label noise model improves, in terms of classification accuracy, over existing label noise modelling approaches.
ES2015-84
On the use of machine learning techniques for the analysis of spontaneous reactions in automated hearing assessment
Veronica Bolon-Canedo, Alba Fernández, Amparo Alonso-Betanzos, Marcos Ortega, Manuel G. Penedo
Abstract:
Hearing loss is one of the most frequent sensory deficits among the elderly population. Its correct assessment becomes complicated for audiologists when there are severe difficulties in communicating with the patient. To facilitate this task, this paper proposes a methodology for the correct classification of eye-gestural reactions to auditory stimuli using machine learning approaches. After extracting features from the existing videos, we applied several classifiers and improved the detection of the most important classes through a novel use of oversampling techniques. The methodology showed promising results, with true positive rates over 0.96 for the critical classes and global classification rates over 97%, paving the way to its inclusion in a fully automated tool.
ES2015-120
Combining higher-order N-grams and intelligent sample selection to improve language modeling for Handwritten Text Recognition
Jafar Tanha, Jesse De Does, Katrien Depuydt
Abstract:
We combine two techniques to improve the language modeling component of a Handwritten Text Recognition (HTR) system. On the one hand, we apply a previously developed intelligent sample selection approach to language model adaptation for handwritten text recognition, which exploits a combination of in-domain and out-of-domain data for construction of language models. On the other hand, we apply rescoring methods to enable more complex language modeling in HTR. It is shown that these techniques complement each other very well, and that the combination leads to a significant error reduction in a practical HTR task for historical data.
ES2015-40
Learning Sparse Feature Representations using Probabilistic Quadtrees and Deep Belief Nets
Saikat Basu, Manohar Karki, Sangram Ganguly, Robert DiBiano, Supratik Mukhopadhyay, Ramakrishna Nemani
Abstract:
Learning sparse feature representations is a useful instrument for solving an unsupervised learning problem. In this paper, we present three labeled handwritten digit datasets, collectively called n-MNIST. Then, we propose a novel framework for the classification of handwritten digits that learns sparse representations using probabilistic quadtrees and Deep Belief Nets. On the MNIST and n-MNIST datasets, our framework shows promising results and significantly outperforms traditional Deep Belief Networks.
ES2015-61
Optimal transport for semi-supervised domain adaptation
Denis Rousselle, Stéphane Canu
Abstract:
Domain adaptation for semi-supervised learning is still a challenging task. Indeed, available solutions are often slow and fail to provide relevant interpretations. Here we propose a new algorithm that solves this semi-supervised domain adaptation problem efficiently by using an adapted combination of transportation algorithms. Our empirical evidence supports our initial intuition and demonstrates the merit of the proposed method.
ES2015-46
Resource-efficient Incremental learning in very high dimensions
Alexander Gepperth, Mathieu Lefort, Thomas Hecht
Abstract:
We propose a three-layer neural architecture for incremental multi-class learning that remains resource-efficient even when the number of input dimensions is very high ($\ge 1000$). This so-called projection-prediction (PROPRE) architecture is strongly inspired by biological information processing in that it uses a prototype-based, topologically organized hidden layer trained with the SOM learning rule, controlled by a global, task-related error signal. Furthermore, the SOM learning adapts only the weights of localized neural sub-populations that are similar to the input, which explicitly avoids the catastrophic forgetting effect of MLPs when new input statistics are presented to the architecture. As the readout layer uses simple linear regression, the approach essentially applies locally linear models to "receptive fields" (RF) defined by SOM prototypes, whereas RF shape is implicitly defined by adjacent prototypes (which avoids the storage of covariance matrices, prohibitive for high input dimensionality). Both RF centers and shapes are jointly adapted w.r.t. input statistics and the classification task. Tests on the MNIST dataset show that the algorithm compares favorably to the state-of-the-art LWPR algorithm at vastly decreased resource requirements.
ES2015-5
One-vs-all binarization technique in the context of random forest
Md Nasim Adnan, Md Zahidul Islam
Abstract:
Binarization techniques are widely used to solve multi-class classification problems. They reduce the complexity of a multi-class classification problem by dividing the original data set into two-class segments or replicas. A set of simpler classifiers is then learnt from these two-class segments or replicas, and the outputs of these classifiers are combined for the final classification. Binarization can improve prediction accuracy compared to a single classifier. However, to be declared superior, binarization techniques need to prove themselves in the context of ensemble classifiers such as Random Forest, a popular state-of-the-art decision forest building algorithm which focuses on generating diverse decision trees as base classifiers. In this paper we evaluate the one-vs-all binarization technique in the context of Random Forest. We present elaborate experimental results involving ten widely used data sets from the UCI Machine Learning Repository, which exhibit the effectiveness of one-vs-all binarization in the context of Random Forest.
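The one-vs-all scheme the abstract describes can be sketched as follows. This is not the authors' Random-Forest-based setup: it is a minimal stand-in on synthetic data that uses least-squares linear scorers as the per-replica binary classifiers, purely to illustrate how the two-class replicas are built and their outputs combined.

```python
import numpy as np

rng = np.random.default_rng(1)
# Synthetic 3-class data as a stand-in for a UCI data set.
X = np.vstack([rng.normal(m, 0.5, (40, 2)) for m in ([0, 0], [3, 0], [0, 3])])
y = np.repeat([0, 1, 2], 40)
Xb = np.hstack([X, np.ones((len(X), 1))])  # append a bias column

# One binary "class c vs. rest" scorer per class (targets +1 / -1);
# the paper trains a Random Forest on each replica instead.
W = np.column_stack([
    np.linalg.lstsq(Xb, np.where(y == c, 1.0, -1.0), rcond=None)[0]
    for c in range(3)
])

# Combine the binary outputs: the class with the highest score wins.
pred = np.argmax(Xb @ W, axis=1)
acc = (pred == y).mean()
```

Swapping the least-squares scorer for any binary classifier (here, a Random Forest) leaves the combination rule unchanged.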
ES2015-21
Improving the random forest algorithm by randomly varying the size of the bootstrap samples for low dimensional data sets
Md Nasim Adnan, Md Zahidul Islam
Abstract:
The Random Forest algorithm generates quite diverse decision trees as base classifiers for high-dimensional data sets. For low-dimensional data sets, however, the diversity among the trees falls sharply. In Random Forest, the size of the bootstrap samples generally remains the same for every decision tree generated as a base classifier. In this paper we propose to vary the size of the bootstrap samples randomly within a predefined range in order to increase diversity among the trees. We conduct elaborate experiments on several low-dimensional data sets from the UCI Machine Learning Repository; the results show the effectiveness of the proposed technique.
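A minimal sketch of the idea, under assumptions not taken from the paper: decision stumps stand in for fully grown trees, the data are synthetic, and the size range [n/2, n] is illustrative.

```python
import numpy as np

def fit_stump(X, y):
    """Exhaustively pick the (feature, threshold, labels) split minimising training error."""
    best = None
    for f in range(X.shape[1]):
        for thr in np.unique(X[:, f]):
            left = X[:, f] <= thr
            for ll, rl in ((0, 1), (1, 0)):
                err = np.mean(np.where(left, ll, rl) != y)
                if best is None or err < best[4]:
                    best = (f, thr, ll, rl, err)
    return best

def predict_stump(stump, X):
    f, thr, ll, rl, _ = stump
    return np.where(X[:, f] <= thr, ll, rl)

rng = np.random.default_rng(0)
# Synthetic low-dimensional two-class data.
X = np.vstack([rng.normal(0, 1, (50, 3)), rng.normal(2.5, 1, (50, 3))])
y = np.repeat([0, 1], 50)
n = len(X)

stumps = []
for _ in range(15):
    size = rng.integers(n // 2, n + 1)  # bootstrap size varied at random
    idx = rng.integers(0, n, size)      # sample with replacement
    stumps.append(fit_stump(X[idx], y[idx]))

# Majority vote over the ensemble.
votes = np.stack([predict_stump(s, X) for s in stumps])
pred = (votes.mean(axis=0) > 0.5).astype(int)
acc = (pred == y).mean()
```

The only departure from standard bagging is the single `size = rng.integers(...)` line; everything else is unchanged, which is what makes the modification attractive.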
ES2015-112
An Ensemble Learning Technique for Multipartite Ranking
Stéphan Clémençon, Sylvain Robbiano
Abstract:
Decision tree induction algorithms, possibly combined with a consensus technique, have recently been successfully extended to multipartite ranking. The goal of this paper is to address certain of their weaknesses, namely instability and lack of smoothness, by proposing dedicated ensemble learning strategies. As shown by numerical experiments, bootstrap aggregation combined with a certain amount of feature randomization dramatically improves the performance of such ranking methods, in terms of both accuracy and robustness.
ES2015-82
Online multiclass learning with "bandit" feedback under a Passive-Aggressive approach
Hongliang Zhong, Emmanuel Daucé, Liva Ralaivola
Abstract:
This paper presents a new approach to online multi-class learning with bandit feedback. The algorithm, named PAB (Passive-Aggressive in Bandit), is a variant of the Online Passive-Aggressive Algorithm proposed by [Crammer, 2006], the latter being an effective framework for max-margin online learning. We analyze some of its operating principles and show that it provides a good and scalable solution to the bandit classification problem, in particular on a real-world dataset where it is found to outperform the best existing methods.
ES2015-85
Data Analytics for Drilling Operational States Classifications
Galina Veres, Zoheir Sabeur
Abstract:
This paper provides benchmarks for identifying the best-performing classifiers for the detection of operational states in industrial drilling operations. Multiple scenarios for the detection of the operational states are tested on a rig with various drilling wells. Drilling data are extremely challenging due to their non-linear and stochastic nature, as well as their embedded noise and class imbalance. Nevertheless, it is possible to deploy robust classifiers that overcome these challenges and achieve good automated detection of states. Three classifiers with the best classification rates for drilling operational states were identified in this study.
ES2015-79
Prediction of concrete carbonation depth using decision trees
Woubishet Taffese, Esko Sistonen , Jari Puttonen
Abstract:
In this paper, three carbonation depth prediction models based on decision trees are developed. Carbonation, in urban areas, is a major cause of corrosion of reinforcement steel, which leads to premature degradation and loss of serviceability and safety of reinforced concrete structures. The adopted decision trees are a regression tree, a bagged ensemble, and a reduced bagged ensemble of regression trees. The evaluation of the models' prediction performance reveals that all three models perform reasonably well. Among them, the reduced bagged ensemble regression tree has the highest prediction and generalization capability.
ES2015-66
Powered-Two-Wheeler safety critical events recognition using a mixture model with quadratic logistic functions
Ferhat ATTAL, Abderrahmane Boubezoul, Allou Samé, Latifa Oukhellou
Abstract:
This paper presents a simple and efficient methodology that uses both acceleration and angular velocity signals to detect critical safety events for Powered Two-Wheelers (PTW). The recognition of critical events is performed in two steps: (1) a feature extraction step, where the multidimensional time trajectories of accelerometer/gyroscope data are modelled and segmented using a specific mixture model with quadratic logistic functions; (2) a classification step, which uses the k-nearest neighbor (k-NN) algorithm to assign each trajectory, characterized by its extracted features, to one of three classes, namely Fall, Near-Fall and Naturalistic riding. The results show the ability of the proposed methodology to detect critical safety events for PTW.
Image processing and vision systems
ES2015-76
Real-time activity recognition via deep learning of motion features
Kishore Konda, Pramod Chandrashekhariah, Roland Memisevic, Jochen Triesch
Abstract:
Activity recognition is a challenging computer vision problem with countless applications. Here we present a real-time activity recognition system using deep learning of local motion feature representations. Our approach learns to directly extract energy-based motion features from video blocks. We implement the system on a distributed computing architecture and evaluate its performance on the iCub humanoid robot. We demonstrate real-time performance using GPUs, paving the way for wide deployment of activity recognition systems in real-world scenarios.
ES2015-115
Designing semantic feature spaces for brain-reading
Luepol Pipanmaekaporn, Ludmilla Tajtelbom, Vincent Guigue, Thierry Artieres
Abstract:
We focus on a brain-reading task which consists in discovering the word a person is thinking of from an fMRI image of their brain. Previous studies have demonstrated the feasibility of this task through the design of what has been called a semantic space, i.e. a continuous low-dimensional space reflecting the similarity between words. Up to now, better results have been achieved when carefully designing the semantic space by hand, which limits the generality of the method. We propose to automatically build several semantic spaces from linguistic resources and to combine them in a principled way so as to reach results as accurate as those obtained with a manually built semantic space.
ES2015-124
Learning objects from RGB-D sensors using point cloud-based neural networks
Marcelo Borghetti Soares, Pablo Barros, German Ignacio Parisi, Stefan Wermter
Abstract:
In this paper we present a scene understanding approach for assistive robotics based on learning to recognize different objects from RGB-D devices. Using the depth information, it is possible to compute descriptors that capture the geometrical relations among the points that constitute an object, or to extract features from multiple viewpoints. We developed a framework for testing different neural models that receive this depth information as input. We also propose a novel approach using three-dimensional RGB-D information as input to Convolutional Neural Networks. We found F1-scores greater than 0.9 for the majority of the objects tested, showing that the adopted approach is effective for classification as well.
ES2015-116
A Robust Neural Robot Navigation Using a Combination of Deliberative and Reactive Control Architectures
Dalia Marcela Rojas Castro, Arnaud Revel, Michel Ménard
Abstract:
This paper proposes a hybrid neural-based control architecture for robot indoor navigation. The architecture preserves all the advantages of reactive architectures, such as rapid responses to unforeseen problems in dynamic environments, while combining them with the global knowledge of the world used in deliberative architectures. In order to make the right decision during navigation, the reactive module allows the robot to corroborate its dynamic visual perception with a priori knowledge of the world gathered from a previously examined floor plan. Experiments with the robot operating under the proposed architecture in a simple navigation scenario prove the feasibility of the approach.
ES2015-122
Robust Visual Terrain Classification with Recurrent Neural Networks
Sebastian Otte, Stefan Laible, Richard Hanten, Marcus Liwicki, Andreas Zell
Abstract:
A novel approach to robust visual terrain classification is presented, based on generating feature sequences from repeatedly mutated image patches. These sequences, which capture the progression of the feature vector under a certain image operation, are learned with Recurrent Neural Networks (RNNs). The approach is studied for image-patch-based terrain classification for wheeled robots. Various RNN architectures, namely standard RNNs, Long Short-Term Memory networks (LSTMs), Dynamic Cortex Memory networks (DCMs), as well as bidirectional variants of these architectures, are investigated and compared to recently used state-of-the-art methods for real-time terrain classification. The results show that the presented approach significantly outperforms previous methods.
ES2015-89
Revisiting ant colony algorithms to seismic faults detection
Walther Maciel, Cristina Vasconcelos, Pedro Silva, Marcelo Gattass
Abstract:
Seismic fault extraction is a time-consuming task that can be aided by image enhancement of fault areas. The recent literature addresses this task by using ant colony optimization (ACO) algorithms to highlight the fault edges. This work proposes improvements to current state-of-the-art methodologies by revisiting and/or reincorporating classic aspects of ACO, such as ant distribution and pheromone evaporation and deposition, not previously considered in this seismic fault enhancement scenario. The proposed approach yields good results, producing images with little noise and accurate localization of fault edges.
ES2015-101
Depth and height aware semantic RGB-D perception with convolutional neural networks
Hannes Schulz, Nico Höft, Sven Behnke
Abstract:
Convolutional neural networks are popular for image labeling tasks because of their built-in translation invariance. However, they do not adapt well to scale changes and cannot easily adjust to classes which regularly appear in certain scene regions. This is especially true when the network is applied in a sliding window. When depth data is available, we can address both problems. We propose to adjust the size of the processed windows to the depth and to supply the inferred height above ground to the network, which significantly improves object-class segmentation results on the NYU depth dataset.
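The depth-dependent window scaling can be illustrated with a small helper. The function name, the reference depth, and the size bounds below are illustrative assumptions, not values taken from the paper; the only point is that window size varies inversely with depth so each window covers a roughly constant metric area.

```python
def window_size(depth_m, base_px=64, ref_depth_m=2.0, lo=16, hi=128):
    """Edge length (pixels) of a sliding window, scaled inversely with the
    depth of its centre pixel, clamped to [lo, hi]."""
    size = round(base_px * ref_depth_m / depth_m)
    return max(lo, min(hi, size))

# Nearby objects get larger windows, distant ones smaller windows.
sizes = [window_size(d) for d in (1.0, 2.0, 4.0, 10.0)]
```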
ES2015-136
A simple technique for improving multi-class classification with neural networks
Thomas Kopinski, Alexander Gepperth, Uwe Handmann
Abstract:
We present a novel method to perform multi-class pattern classification with neural networks and test it on a challenging 3D hand gesture recognition problem. Our method consists of a standard one-against-all (OAA) classification, followed by another network layer classifying the resulting class scores, possibly augmented by the original raw input vector. This allows the network to disambiguate hard-to-separate classes as the distribution of class scores carries considerable information as well, and is in fact often used for assessing the confidence of a decision. We show that by this approach we are able to significantly boost our results, overall as well as for particular difficult cases, on the hard 10-class gesture classification task.
ES2015-128
Dynamic gesture recognition using Echo State Networks
Doreen Jirak, Pablo Barros, Stefan Wermter
Abstract:
In the last decade, training recurrent neural networks (RNNs) using techniques from the area of reservoir computing (RC) has become popular for learning sequential data due to the ease of network training. Although successfully applied in language and speech research, little is known about using RC techniques for dynamic gesture recognition. We therefore conduct experiments on command gestures using Echo State Networks (ESNs) to investigate the effect of both different gesture sequence representations and different parameter configurations. For recognition we employ an ensemble technique, i.e. using ESNs as weak classifiers. Our results show that using ESNs is a promising approach, and we give indications for future experiments in this research area.
ES2015-127
A flat neural network architecture to represent movement primitives with integrated sequencing
Andre Lemme, Jochen Steil
Abstract:
The paper proposes a minimalistic network that learns a set of movement primitives and their sequencing in one single feedforward network. Utilizing an extreme learning machine with output feedback and a simple inhibition mechanism, this approach can sequence movement primitives efficiently with a very moderate network size. It can interpolate between movement primitives to create new motions. This work thus demonstrates that an unspecific single hidden layer, that is, a flat representation, is sufficient to efficiently compose complex sequences, a task which usually requires hierarchy, multiple timescales and multi-level control mechanisms.
Unsupervised nonlinear dimensionality reduction
ES2015-16
Unsupervised dimensionality reduction: the challenge of big data visualization
Kerstin Bunte, John Aldo Lee
ES2015-37
Autoencoding time series for visualisation
Nikolaos Gianniotis, Sven Dennis Kügler, Peter Tino, Kai Polsterer, Ranjeev Misra
Abstract:
We present an algorithm for the visualisation of time series. To that end we employ echo state networks to convert time series into a suitable vector representation which is capable of capturing the latent dynamics of the time series. Subsequently, the obtained vector representations are put through an autoencoder and the visualisation is constructed using the activations of the “bottleneck”. The crux of the work lies with defining an objective function that quantifies the reconstruction error of these representations in a principled manner. We demonstrate the method on synthetic and real data.
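The bottleneck-visualisation step can be sketched as follows; the ESN encoding and the paper's principled reconstruction objective are replaced here by stand-ins (random vectors as "time-series representations", and a plain linear autoencoder, which under squared error reduces to PCA):

```python
import numpy as np

# Toy linear autoencoder: map fixed-length vector representations of time
# series to a 2-D "bottleneck" used for visualisation.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 10))   # stand-in for ESN-derived representations
X = X - X.mean(axis=0)

# A linear autoencoder with squared error is solved by PCA: the top-2
# principal directions give the optimal 2-D bottleneck activations.
U, S, Vt = np.linalg.svd(X, full_matrices=False)
Z = X @ Vt[:2].T                 # bottleneck activations (100 x 2) to plot
X_hat = Z @ Vt[:2]               # reconstruction from the bottleneck

recon_err = float(np.mean((X - X_hat) ** 2))
print(Z.shape, recon_err)
```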
ES2015-97
Diffusion Maps parameters selection based on neighbourhood preservation
Carlos M. Alaíz, Ángela Fernández, José R. Dorronsoro
Abstract:
Diffusion Maps is one of the leading methods for dimensionality reduction, although it requires fixing a certain number of parameters that can be crucial for its performance. This parameter selection is usually based on the expertise of the user, as there is no unified criterion for evaluating the quality of the embedding. We propose to use a neighbourhood preservation measure as the criterion for fixing these parameters. As we shall see, this approach provides good embedding parameters without needing problem-specific knowledge.
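A minimal sketch of this selection strategy, under simplifying assumptions (a basic diffusion map, a plain k-nearest-neighbour overlap as the preservation score, and an invented bandwidth grid; not the authors' exact measure):

```python
import numpy as np

def diffusion_map(X, sigma, n_components=2, t=1):
    # Gaussian affinities and the row-normalised Markov matrix.
    D2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    K = np.exp(-D2 / (2 * sigma ** 2))
    P = K / K.sum(axis=1, keepdims=True)
    vals, vecs = np.linalg.eig(P)      # eigenvalues are real (P ~ symmetric)
    order = np.argsort(-vals.real)
    idx = order[1:n_components + 1]    # skip the trivial constant eigenvector
    return (vals.real[idx] ** t) * vecs.real[:, idx]

def neighbourhood_preservation(X, Y, k=5):
    # Fraction of each point's k nearest neighbours kept in the embedding.
    def knn(A):
        D = ((A[:, None, :] - A[None, :, :]) ** 2).sum(-1)
        np.fill_diagonal(D, np.inf)
        return np.argsort(D, axis=1)[:, :k]
    return float(np.mean([len(set(a) & set(b)) / k
                          for a, b in zip(knn(X), knn(Y))]))

rng = np.random.default_rng(0)
X = rng.normal(size=(60, 5))
scores = {s: neighbourhood_preservation(X, diffusion_map(X, s))
          for s in (0.1, 1.0, 10.0)}   # hypothetical bandwidth grid
best_sigma = max(scores, key=scores.get)
print(best_sigma, scores[best_sigma])
```

The bandwidth maximising the preservation score is kept, which is the spirit of the criterion: no labels or problem knowledge are needed.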
ES2015-134
Unsupervised Dimensionality Reduction for Transfer Learning
Patrick Blöbaum, Alexander Schulz, Barbara Hammer
Abstract:
We investigate the suitability of unsupervised dimensionality reduction (DR) for transfer learning in the context of different representations of the source and target domain. Essentially, unsupervised DR establishes a link between source and target domain by representing the data in a common latent space. We consider two settings: a linear DR of source and target data which establishes correspondences of the data and an according transfer, and its combination with a nonlinear DR which allows adaptation to more complex data characterised by a global nonlinear structure.
ES2015-74
Efficient unsupervised clustering for spatial birds population analysis along the river Loire
Aurore Payen, Ludovic Journaux, Clément Delion, Lucile Sautot, Bruno Faivre
Abstract:
This paper focuses on the application and comparison of Non Linear Dimensionality Reduction (NLDR) methods on a natural high-dimensional bird communities dataset along the Loire River (France). In this context, biologists usually use the well-known linear PCA on their data in order to explain the longitudinal distribution pattern and find discontinuities along the upstream-downstream gradient. Unfortunately, this method was unsuccessful on this kind of nonlinear dataset. The goal of this paper is to compare recent NLDR methods coupled with different data transformations in order to find the best approach on this nonlinear real-life dataset. Results show that Multiscale Jensen-Shannon Embedding (Ms JSE) is the most successful method on this dataset.
ES2015-75
NLDR methods for high dimensional NIRS dataset : application to vineyard soils characterization
Clément Delion, Ludovic Journaux, Aurore Payen, Lucile Sautot, Emmanuel Chevigny, Pierre Curmi
Abstract:
In the context of vineyard soils characterization, this paper explores and compares different recent Non Linear Dimensionality Reduction (NLDR) methods on a high-dimensional Near InfraRed Spectroscopy (NIRS) dataset. The NLDR methods are based on a k-neighborhood criterion, and both Euclidean and fractional distance metrics are tested. Results show that Multiscale Jensen-Shannon Embedding (Ms JSE) coupled with the Euclidean distance outperforms all other methods. The data are analysed at a global scale and at different soil depths.
ES2015-137
Geometrical homotopy for data visualization
Diego Hernán Peluffo-Ordóñez, Juan Carlos Alvarado-Pérez, John Aldo Lee, Michel Verleysen
Abstract:
This work presents an approach allowing for an interactive visualization of dimensionality reduction outcomes, which is based on an extended view of conventional homotopy. The pairwise functional that follows from a simple homotopic function can be incorporated within a geometrical framework in order to yield a bi-parametric approach able to combine several kernel matrices. Therefore, users can establish the mixture of kernels in an intuitive fashion by only varying two parameters. Our approach is tested using kernel alternatives for conventional methods of spectral dimensionality reduction such as multidimensional scaling, locally linear embedding and Laplacian eigenmaps. The resulting mixture represents every single dimensionality reduction approach and helps users to find a suitable representation of the embedded data.
Unsupervised learning
ES2015-28
On the equivalence between regularized NMF and similarity-augmented graph partitioning
Anthony Coutant, Hoel Le Capitaine, Philippe Leray
Abstract:
Many papers have pointed out the interest of (co-)clustering both data and features in a dataset to obtain better performance than methods focused on data only. In addition, recent work has shown that data and features lie in low-dimensional manifolds embedded into the original space, and this information has been introduced as regularization terms in clustering objectives. Very popular and recent examples are regularized NMF algorithms. However, these techniques have difficulties avoiding local optima and require high computation times, making them inadequate for large-scale data. In this paper, we show that NMF with manifold regularization on a binary matrix is mathematically equivalent to an edge-cut partitioning in a graph augmented with manifold information in the case of hard co-clustering. Based on these results, we explore experimentally the efficiency of regularized graph partitioning methods for hard co-clustering on more relaxed datasets and show that regularized multi-level graph partitioning is much faster and often finds better clustering results than regularized NMF and other well-known algorithms.
ES2015-64
Ranking Overlap and Outlier Points in Data using Soft Kernel Spectral Clustering
Raghvendra Mall, Rocco Langone, Johan Suykens
Abstract:
Soft clustering algorithms can handle real-life datasets better as they capture the presence of inherent overlapping clusters. A soft kernel spectral clustering (SKSC) method proposed in [1] exploited the eigen-projections of the points to assign them different cluster membership probabilities. In this paper, we detect points in dense overlapping regions as overlap points. We also identify the outlier points by exploiting the eigen-projections. We then propose novel ranking techniques using structure and similarity properties in the eigen-space to rank these overlap and outlier points. By ranking the overlap and outlier points we provide an order for the most and least influential points in the dataset. We demonstrate the effectiveness of our ranking measures on several datasets.
ES2015-108
Towards a Tomographic Index of Systemic Risk Measures
Kaj-Mikael Bjork, Patrick Kouontchou, Amaury Lendasse, Yoan Miché, Betrand Maillet
Abstract:
Due to the recent financial crisis, several systemic risk measures have been proposed in the literature for quantifying financial system-wide distress. In this note we propose an aggregated index for financial systemic risk measurement based on EOF and ICA analyses of the several systemic risk measures released in the recent literature. We use this index to further identify the states of the market as suggested in Kouontchou et al. [2013]. We show, by characterizing market conditions with a robust Kohonen Self-Organizing Map algorithm, that this measure is directly linked to crisis market states and that there is a strong link between return and systemic risk.
ES2015-58
An objective function for self-limiting neural plasticity rules.
Rodrigo Echeveste, Claudius Gros
Abstract:
Self-organization provides a framework for the study of systems in which complex patterns emerge from simple rules, without the guidance of external agents or fine tuning of parameters. Within this framework, one can formulate a guiding principle for plasticity in the context of unsupervised learning, in terms of an objective function. In this work we derive Hebbian, self-limiting synaptic plasticity rules from such an objective function and then apply the rules to the non-linear bars problem.
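The paper derives its own rules from an objective function; as a generic, hedged illustration of *self-limiting* Hebbian plasticity, Oja's classical rule shows how a decay term bounds the weight norm without external constraints or parameter fine-tuning:

```python
import numpy as np

# Oja's rule: Hebbian growth (eta * y * x) plus a self-limiting decay
# (-eta * y^2 * w) that keeps the weight norm bounded. The weight vector
# converges to the unit-norm leading eigenvector of the input covariance.
rng = np.random.default_rng(0)
C = np.array([[3.0, 1.0], [1.0, 1.0]])   # input covariance (illustrative)
L = np.linalg.cholesky(C)

w = rng.normal(size=2)
eta = 0.01
for _ in range(5000):
    x = L @ rng.normal(size=2)           # zero-mean input with covariance C
    y = w @ x                            # linear neuron output
    w += eta * y * (x - y * w)           # Hebbian term minus decay

print(w, float(np.linalg.norm(w)))
```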
Kernel methods
ES2015-7
Probabilistic Classification Vector Machine at large scale
Frank-Michael Schleif, Andrej Gisbrecht, Peter Tino
Abstract:
Probabilistic kernel classifiers are effective approaches to solve classification problems, but only a few of them can be applied to indefinite kernels, as typically observed in life science problems, and they are often limited to rather small-scale problems. We provide a novel batch formulation of the Probabilistic Classification Vector Machine for large-scale metric and non-metric data.
ES2015-111
Online Learning with Operator-valued Kernels
Julien Audiffren, Hachem Kadri
Abstract:
We consider the problem of learning a vector-valued function f in an online learning setting. The function f is assumed to lie in a reproducing kernel Hilbert space of operator-valued kernels. We describe an online algorithm for learning f while taking into account the output structure. This algorithm, OLOK, extends the standard kernel-based online learning algorithm NORMA from the scalar-valued to the operator-valued setting. We report a cumulative error bound that holds both for classification and regression. Our experiments show that the proposed algorithm achieves good performance results with low computational cost.
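For orientation, a scalar-valued sketch of the NORMA-style online update that OLOK extends (squared loss and a Gaussian kernel chosen here for illustration; the operator-valued machinery of the paper is not reproduced):

```python
import numpy as np

def gauss_k(x, y, gamma=5.0):
    return np.exp(-gamma * np.sum((x - y) ** 2))

# NORMA-style online kernel regression: the hypothesis is a kernel expansion
# over past inputs; each step shrinks old coefficients (the regularisation
# part of the stochastic gradient) and appends a new coefficient from the
# loss gradient at the current example.
rng = np.random.default_rng(0)
eta, lam = 0.5, 0.01                     # step size and regularisation
centers, alphas = [], []

def predict(x):
    return sum(a * gauss_k(c, x) for c, a in zip(centers, alphas))

errors = []
for t in range(200):
    x = rng.uniform(-1, 1, size=1)
    y = np.sin(3 * x[0])                 # toy target function
    y_hat = predict(x)
    errors.append((y - y_hat) ** 2)
    alphas = [(1 - eta * lam) * a for a in alphas]   # shrink old terms
    alphas.append(eta * (y - y_hat))                 # new expansion term
    centers.append(x)

early, late = float(np.mean(errors[:50])), float(np.mean(errors[-50:]))
print(round(early, 3), round(late, 3))
```

In practice NORMA also truncates the oldest (exponentially shrunk) terms to keep the expansion, and thus the per-step cost, bounded.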
ES2015-45
Online One-class Classification for Intrusion Detection Based on the Mahalanobis Distance
Patric Nader, Paul Honeine, Pierre Beauseroy
Abstract:
Machine learning techniques have been very popular in the past decade for their ability to detect hidden patterns in large volumes of data. Researchers have been developing online intrusion detection algorithms based on these techniques. In this paper, we propose an online one-class classification approach based on the Mahalanobis distance which takes into account the covariance in each feature direction and the different scaling of the coordinate axes. We define the one-class problem by two concentric hyperspheres enclosing the support vectors of the description. We update the classifier at each time step. The tests are conducted on real data.
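A hedged sketch of the underlying idea (a single threshold on the Mahalanobis distance; the paper's two-hypersphere description and its online updates are not reproduced):

```python
import numpy as np

# One-class scoring with the Mahalanobis distance: fit mean and covariance on
# "normal" training data, then flag test points whose distance exceeds a
# radius estimated from the training distances. The covariance accounts for
# the different scaling of the coordinate axes.
rng = np.random.default_rng(1)
X_train = rng.normal(size=(500, 3)) * np.array([1.0, 5.0, 0.2])

mu = X_train.mean(axis=0)
Sigma_inv = np.linalg.inv(np.cov(X_train, rowvar=False))

def mahalanobis(x):
    d = x - mu
    return float(np.sqrt(d @ Sigma_inv @ d))

# Radius: here, the 99th percentile of the training distances (a common,
# illustrative choice, not the paper's rule).
radius = float(np.quantile([mahalanobis(x) for x in X_train], 0.99))

inlier = mahalanobis(np.zeros(3)) <= radius
outlier = mahalanobis(np.array([10.0, 50.0, 2.0])) > radius
print(inlier, outlier)
```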
ES2015-34
I/S-Race: An iterative Multi-Objective Racing Algorithm for the SVM Parameter Selection Problem
Miranda Péricles, Ricardo Silva, Ricardo Prudêncio
Abstract:
Finding appropriate values for the parameters of an algorithm is an important and time-consuming task. Recent studies have shown that racing algorithms can effectively handle this task. This paper presents a multi-objective racing algorithm called iterative S-Race (I/S-Race), which efficiently addresses multi-objective model selection problems in the sense of Pareto optimality. We evaluate the I/S-Race for selecting parameters of SVMs, considering 20 widely used classification datasets. The results reveal that the I/S-Race is an efficient and effective algorithm for automatic model selection when compared to a brute-force multi-objective selection approach and the S-Race algorithm.
ES2015-110
SMO Lattices for the Parallel Training of Support Vector Machines
Markus Kächele, Günther Palm, Friedhelm Schwenker
Abstract:
In this work, a method is proposed to train Support Vector Machines in parallel. The difference to other parallel implementations is that the problem is decomposed into hierarchically connected nodes and that each node does not have to fully optimize its local problem. Instead, Lagrange multipliers are filtered and transferred between nodes during runtime, with important ones ascending and unimportant ones descending inside the architecture. Experimental validation demonstrates the advantages in terms of speed in comparison to other approaches.
ES2015-59
Pareto front of bi-objective kernel-based nonnegative matrix factorization
Fei Zhu, Paul Honeine
Abstract:
The nonnegative matrix factorization (NMF) is a powerful data analysis and dimensionality reduction technique. So far, the NMF has been limited to a single-objective problem in either its linear or nonlinear kernel-based formulation. This paper presents a novel bi-objective NMF model based on kernel machines, where the decomposition is performed simultaneously in both input and feature spaces. The problem is solved employing the weighted-sum approach. Without loss of generality, we study the case of the Gaussian kernel, where the multiplicative update rules are derived and the Pareto front is approximated. The performance of the proposed method is demonstrated for unmixing hyperspectral images.
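For reference, the classical single-objective multiplicative updates (Lee-Seung, Frobenius objective) that the bi-objective kernel-based model generalises; this is a generic sketch, not the paper's algorithm:

```python
import numpy as np

# Linear NMF, V ~ W @ H with W, H >= 0, fitted by the Lee-Seung
# multiplicative update rules, which preserve nonnegativity and do not
# increase the Frobenius reconstruction error.
rng = np.random.default_rng(0)
V = rng.random((20, 15))                 # nonnegative data matrix
r = 4                                    # factorization rank
W = rng.random((20, r))
H = rng.random((r, 15))

eps = 1e-9                               # guard against division by zero
err0 = float(np.linalg.norm(V - W @ H))
for _ in range(200):
    H *= (W.T @ V) / (W.T @ W @ H + eps)
    W *= (V @ H.T) / (W @ H @ H.T + eps)
err = float(np.linalg.norm(V - W @ H))
print(err0, err)
```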
ES2015-69
Learning missing edges via kernels in partially-known graphs
Senka Krivic, Sandor Szedmak, Hanchen Xiong, Justus Piater
Abstract:
This paper deals with the problem of learning unknown edges with attributes in a partially given multigraph. The method is an extension of the Maximum Margin Multi-Valued Regression (M³VR) to the case where those edges are characterized by different attributes. It is applied to a large-scale problem where an agent tries to learn unknown object-object relations by exploiting known such relations. The method can handle not only binary relations but also complex structured relations such as text, images, collections of labels, categories, etc., which can be represented by kernels. We compare the performance with a specialized state-of-the-art matrix completion method.