Bruges, Belgium, April 23-25, 2008
Content of the proceedings
Dynamical and recurrent systems, control
Feature selection, imputation and projection
Machine learning methods in cancer research
Learning I
Biological systems and biologically-inspired networks
Clustering and vector quantization
Methodology and standards for data analysis with machine learning tools
Learning II
Neural Networks for Computational Neuroscience
Kernel methods
Machine Learning Approaches and Pattern Recognition for Spectral Data
Learning III
Dynamical and recurrent systems, control
ES2008-88
Pruning and Regularisation in Reservoir Computing: a First Insight
Xavier Dutoit, Benjamin Schrauwen, Jan Van Campenhout, Dirk Stroobandt, Hendrik Van Brussel, Marnix Nuttin
Abstract:
Reservoir Computing is a new paradigm for using recurrent neural networks that shows promising performance. However, as the recurrent part is created randomly, it typically needs to be large enough to capture the dynamic features of the data considered. Moreover, this random creation still lacks a strong methodology. We propose to study how pruning some connections from the reservoir to the readout can help to increase the generalisation ability, in much the same way as regularisation techniques do, and improve the implementability of reservoirs in hardware. Furthermore, we study the actual sub-reservoir that is kept after pruning, which leads to important insights into what we should expect from a good reservoir.
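The following minimal sketch (not the authors' method; toy signal, hypothetical sizes and hyperparameters) illustrates the general idea of pruning reservoir-to-readout connections: an echo state network is trained with a ridge readout, the smallest readout weights are dropped, and the readout is refitted on the surviving sub-reservoir.

import numpy as np

rng = np.random.default_rng(0)
n_res, T = 100, 1000
u = np.sin(0.2 * np.arange(T + 1))               # toy input signal
y_target = u[1:]                                 # task: predict the next input value
u = u[:-1]

W_in = rng.uniform(-0.5, 0.5, n_res)
W = rng.normal(size=(n_res, n_res))
W *= 0.9 / np.max(np.abs(np.linalg.eigvals(W)))  # scale spectral radius below 1

x = np.zeros(n_res)
states = np.empty((T, n_res))
for t in range(T):
    x = np.tanh(W @ x + W_in * u[t])
    states[t] = x

def ridge_readout(S, y, lam=1e-6):
    return np.linalg.solve(S.T @ S + lam * np.eye(S.shape[1]), S.T @ y)

w_out = ridge_readout(states, y_target)

# prune the 50% smallest readout weights, then refit on the kept sub-reservoir
keep = np.abs(w_out) >= np.median(np.abs(w_out))
w_pruned = ridge_readout(states[:, keep], y_target)

full_err = np.mean((states @ w_out - y_target) ** 2)
pruned_err = np.mean((states[:, keep] @ w_pruned - y_target) ** 2)
print(f"MSE full readout: {full_err:.2e}, after pruning to {keep.sum()} units: {pruned_err:.2e}")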
ES2008-114
Design of Oscillatory Recurrent Neural Network Controllers with Gradient based Algorithms
Guillaume Jouffroy
Abstract:
In this paper we address the problem of finding parameter values that give recurrent neural networks an oscillatory behavior. A generalized partial teacher forcing is formalized for the case where not all target signals are available. The drawbacks of these algorithms are covered, and a modified version is proposed, working toward a better general hybrid partial teacher forcing algorithm. The question of shaping the oscillator is also addressed.
ES2008-46
Learning Inverse Dynamics: a Comparison
Duy Nguyen Tuong, Jan Peters, Matthias Seeger, Bernhard Schoelkopf
Abstract:
While it is well known that models can enhance control performance in terms of precision or energy efficiency, their practical application has often been limited by the complexity of manually obtaining sufficiently accurate models. In the past, learning has proven a viable alternative to using a combination of rigid-body dynamics and handcrafted approximations of nonlinearities. However, a major open question is which nonparametric learning method is best suited for learning dynamics. Traditionally, locally weighted projection regression (LWPR) has been the standard method, as it is capable of online, real-time learning for very complex robots. However, while LWPR has had significant impact on learning in robotics, alternative nonparametric regression methods such as support vector regression (SVR) and Gaussian process regression (GPR) offer interesting alternatives with fewer open parameters and potentially higher accuracy. In this paper, we evaluate these three alternatives for model learning. Our comparison consists of evaluating the learning quality of each regression method on original data from the SARCOS robot arm, as well as the robot tracking performance obtained with the learned models. The results show that GPR and SVR achieve superior learning precision and can be applied for real-time control, obtaining higher accuracy. However, for online learning, LWPR is the better method due to its lower computational requirements.
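As an illustration of the kind of comparison described above, the hedged sketch below fits GPR and SVR with scikit-learn on a synthetic single-joint inverse-dynamics-style task; the SARCOS data and LWPR are not used here, and data, kernels and hyperparameters are illustrative assumptions.

import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel
from sklearn.svm import SVR
from sklearn.metrics import mean_squared_error

rng = np.random.default_rng(1)
N = 400
q, dq, ddq = rng.uniform(-1, 1, (3, N))                    # toy joint position, velocity, acceleration
tau = 2.0 * ddq + 0.5 * dq**2 * np.sign(dq) + np.sin(q) + 0.05 * rng.normal(size=N)
X = np.column_stack([q, dq, ddq])

X_tr, X_te, y_tr, y_te = X[:300], X[300:], tau[:300], tau[300:]

gpr = GaussianProcessRegressor(kernel=RBF() + WhiteKernel(), normalize_y=True).fit(X_tr, y_tr)
svr = SVR(C=10.0, epsilon=0.01).fit(X_tr, y_tr)

for name, model in [("GPR", gpr), ("SVR", svr)]:
    print(name, "test MSE:", mean_squared_error(y_te, model.predict(X_te)))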
ES2008-8
Model-Based Reinforcement Learning with Continuous States and Actions
Marc Deisenroth, Carl Edward Rasmussen, Jan Peters
Abstract:
Finding an optimal policy in a reinforcement learning (RL) framework with continuous state and action spaces is challenging. Approximate solutions are often inevitable. GPDP is an approximate dynamic programming algorithm based on Gaussian process (GP) models for the value functions. In this paper, we extend GPDP to the case of unknown transition dynamics. After building a GP model for the transition dynamics, we apply GPDP to this model and determine a continuous-valued policy in the entire state space. We apply the resulting controller to the underpowered pendulum swing up. Moreover, we compare our results on this RL task to a nearly optimal discrete DP solution in a fully known environment.
Feature selection, imputation and projection
ES2008-39
Using the Delta Test for Variable Selection
Emil Eirola, Elia Liitiäinen, Amaury Lendasse, Francesco Corona, Michel Verleysen
Abstract:
Input selection is an important consideration in all large-scale modelling problems. We propose that using an established noise variance estimator known as the Delta test as the target to minimise can provide an effective input selection methodology. Theoretical justifications and experimental results are presented.
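A minimal sketch of the Delta test as a selection criterion, assuming the usual nearest-neighbour form delta = 1/(2N) sum_i (y_i - y_NN(i))^2 computed on a candidate input subset; the exhaustive search over subsets shown on a toy problem does not reflect the search strategy of the paper.

import itertools
import numpy as np
from sklearn.neighbors import NearestNeighbors

def delta_test(X, y):
    nn = NearestNeighbors(n_neighbors=2).fit(X)      # first neighbour is the point itself
    _, idx = nn.kneighbors(X)
    return np.mean((y - y[idx[:, 1]]) ** 2) / 2.0

rng = np.random.default_rng(2)
X = rng.normal(size=(500, 5))
y = np.sin(X[:, 0]) + 0.5 * X[:, 1] ** 2 + 0.1 * rng.normal(size=500)   # inputs 2-4 are irrelevant

best = min(
    (delta_test(X[:, list(s)], y), s)
    for r in range(1, 6)
    for s in itertools.combinations(range(5), r)
)
print("selected inputs:", best[1], "estimated noise variance:", best[0])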
ES2008-74
Metric adaptation for supervised attribute rating
Marc Strickert, Frank-Michael Schleif, Thomas Villmann
Abstract:
A new approach for faithful relevance rating of attributes is proposed, enabling class-specific discriminatory data space transformations. The method is based on the adaptation of the underlying data similarity measure by using class information linked to the data vectors. For adaptive Minkowski metrics and parametric Pearson similarity, the obtained attribute weights can be used for back-transforming data for further analysis with methods utilizing non-adapted measures as demonstrated for benchmark and mass spectrum data.
ES2008-42
K-nearest neighbours based on mutual information for incomplete data classification
Pedro J. García-Laencina, José-Luis Sancho-Gómez, Anibal R. Figueiras-Vidal, Michel Verleysen
Abstract:
Incomplete data is a common drawback that machine learning techniques need to deal with when solving real-life classification tasks. One of the most popular procedures for solving this kind of problem is the K-nearest neighbours (KNN) algorithm. In this paper, we present a weighted KNN approach using mutual information to impute and classify incomplete input data. Numerical results on both artificial and real data are given to demonstrate the effectiveness of the proposed method.
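The hedged sketch below illustrates the general idea only (not necessarily the authors' exact scheme): mutual information between each input and the class label, estimated with scikit-learn, weights the distance used by a KNN imputation of missing entries.

import numpy as np
from sklearn.datasets import load_iris
from sklearn.feature_selection import mutual_info_classif

X, y = load_iris(return_X_y=True)
rng = np.random.default_rng(3)
X_miss = X.copy()
mask = rng.random(X.shape) < 0.1                 # knock out 10% of the entries
X_miss[mask] = np.nan

w = mutual_info_classif(X, y, random_state=0)    # MI-based feature weights
w = w / w.sum()

def weighted_knn_impute(X_miss, w, k=5):
    X_imp = X_miss.copy()
    complete = np.where(~np.isnan(X_miss).any(axis=1))[0]        # donor rows without gaps
    for i, j in zip(*np.where(np.isnan(X_miss))):
        obs = ~np.isnan(X_miss[i])                               # features observed in row i
        d = np.sqrt(((X_miss[complete][:, obs] - X_miss[i, obs]) ** 2 * w[obs]).sum(axis=1))
        nearest = complete[np.argsort(d)[:k]]
        X_imp[i, j] = X_miss[nearest, j].mean()                  # impute with the neighbours' mean
    return X_imp

X_imputed = weighted_knn_impute(X_miss, w)
print("mean absolute imputation error:", np.abs(X_imputed[mask] - X[mask]).mean())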
ES2008-66
Nonlinear data projection on a sphere with controlled trade-off between trustworthiness and continuity
Victor Onclinx, Vincent Wertz, Michel Verleysen
Abstract:
This paper presents a nonlinear data projection method aimed at projecting data onto a non-Euclidean manifold when their structure is too complex to be embedded in a Euclidean space. The method optimizes a pairwise distance criterion that implements a controlled trade-off between trustworthiness and continuity, two criteria that respectively represent the risks of flattening and tearing the projection. The method is illustrated by projecting data on a sphere, but can be extended to other manifolds such as the torus and the cylinder.
ES2008-105
Rank-based quality assessment of nonlinear dimensionality reduction
John Lee, Michel Verleysen
Abstract:
Nonlinear dimensionality reduction aims at providing low-dimensional representations of high-dimensional data sets. Many new methods have been proposed in recent years, but the question of their assessment and comparison remains open. This paper reviews some of the existing quality measures that are based on distance ranking and $K$-ary neighborhoods. Many quality criteria actually rely on the analysis of one or several sub-blocks of a co-ranking matrix. The analogy between the co-ranking matrix and a Shepard diagram is highlighted. Finally, a unifying framework is sketched, and new measures are proposed and illustrated in a short experiment.
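The sketch below builds the co-ranking matrix referred to above and evaluates Q_NX(K), the fraction of neighbour ranks preserved within a K-ary neighbourhood, as one simple rank-based summary; it is an illustration on random data, not the exact set of measures proposed in the paper.

import numpy as np
from scipy.spatial.distance import pdist, squareform
from sklearn.decomposition import PCA

def rank_matrix(X):
    D = squareform(pdist(X))
    return np.argsort(np.argsort(D, axis=1), axis=1)     # rank 0 is the point itself

def coranking(X_high, X_low):
    n = len(X_high)
    Rh, Rl = rank_matrix(X_high), rank_matrix(X_low)
    Q = np.zeros((n - 1, n - 1), dtype=int)
    mask = ~np.eye(n, dtype=bool)                         # ignore self-ranks
    np.add.at(Q, (Rh[mask] - 1, Rl[mask] - 1), 1)
    return Q

def q_nx(Q, K):
    return Q[:K, :K].sum() / (K * (Q.shape[0] + 1))       # rank preservation within K neighbours

rng = np.random.default_rng(4)
X = rng.normal(size=(200, 10))
X2 = PCA(n_components=2).fit_transform(X)
print("Q_NX(10) =", q_nx(coranking(X, X2), 10))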
Machine learning methods in cancer research
ES2008-6
Machine learning in cancer research: implications for personalised medicine
Alfredo Vellido, Elia Biganzoli, Paulo J.G. Lisboa
ES2008-81
Multi-class classification of ovarian tumors
Ben Van Calster, Dirk Timmerman, Antonia C. Testa, Lil Valentin, Sabine Van Huffel
Abstract:
In this work, we developed classifiers to distinguish between four ovarian tumor types using Bayesian least squares support vector machines and kernel logistic regression. A rank-one update method for input selection performed better than automatic relevance determination, and was used to select inputs for the classifiers. Evaluation on an independent test set showed good performance of the classifiers in distinguishing between all groups, even the small and difficult group of borderline tumors.
ES2008-63
A new method of DNA probes selection and its use with multi-objective neural network for predicting the outcome of breast cancer preoperative chemotherapy
René Natowicz, Antônio Braga, Roberto Incitti, Euler Horta, Roman Rouzier, Thiago Rodrigues, Marcelo Costa
Abstract:
DNA microarray technology has emerged as a major tool to explore cancer biology and solve clinical issues. The response to chemotherapy represents such an issue, because its prediction would make it possible to give patients the most appropriate chemotherapy regimen. We propose a new method of probe selection, together with predictors designed with multi-objective neural networks (MOBJ-NN) that take as input the individual predictions of the selected probes. The novelty of this paper is to link the probe selection method and the MOBJ-NN model.
ES2008-71
Feature Selection in Proton Magnetic Resonance Spectroscopy for Brain Tumor Classification
Félix Fernando González Navarro, Luis Antonio Belanche Muñoz
Abstract:
1H-MRS is a technique that uses the response of protons under certain magnetic conditions to reveal the biochemical structure of human tissue. An important application is found in brain tumor diagnosis, due to the known complications of physical exploration and as a complement to other kinds of non-invasive methods. Spectral data can be analyzed with machine learning methods to classify tumor classes in an automated fashion. One important characteristic of these data is their high dimensionality. In this work we present a contribution to alleviate this situation with an algorithm based on entropic measures of subsets of spectral data. Experimental results show that the approach has good classification performance, in terms of prediction accuracy and number of involved spectra.
ES2008-103
A method for robust variable selection with significance assessment
Annalisa Barla, Sofia Mosci, Lorenzo Rosasco, Alessandro Verri
Abstract:
Our goal is to propose an unbiased framework for gene expression data analysis based on variable selection combined with a significance assessment step. We start by discussing the need for such a framework, illustrating the dramatic effect of a biased approach, especially when the sample size is small. We then describe our analysis protocol, based on two main ingredients. The first is a variable selection core based on elastic net regularization, where we explicitly take into account regularization parameter tuning. The second is a general architecture to assess the statistical significance of the model via cross-validation and permutation testing. Finally, we challenge the system on real-data experiments, studying its performance when changing the variable selection algorithm or dealing with small-sample datasets.
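A hedged sketch of the two ingredients: elastic-net variable selection with cross-validated regularisation, and a permutation test comparing the cross-validated score on the true labels with scores on shuffled labels. Data are synthetic and the details differ from the authors' protocol.

import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegressionCV
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=60, n_features=200, n_informative=10, random_state=0)

enet = LogisticRegressionCV(penalty="elasticnet", solver="saga", l1_ratios=[0.5],
                            Cs=5, cv=5, max_iter=5000)
enet.fit(X, y)
selected = np.flatnonzero(enet.coef_[0])
print("selected variables:", len(selected))

real_score = cross_val_score(enet, X, y, cv=5).mean()
rng = np.random.default_rng(5)
perm_scores = [cross_val_score(enet, X, rng.permutation(y), cv=5).mean() for _ in range(10)]
p_value = (np.sum(np.array(perm_scores) >= real_score) + 1) / (len(perm_scores) + 1)
print(f"CV accuracy {real_score:.2f}, permutation p-value {p_value:.2f}")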
ES2008-95
Survival SVM: a practical scalable algorithm
Vanya Van Belle, Kristiaan Pelckmans, Johan Suykens, Sabine Van Huffel
Abstract:
This work advances the Support Vector Machine (SVM) based approach for predictive modeling of failure time data. The main results concern a drastic reduction in computation time, an improved criterion for model selection, and the use of additive models for improved interpretability in this context. Particular attention is given to the influence of right-censoring in the methods. The approach is illustrated on a case study in prostate cancer.
ES2008-52
DSS-oriented exploration of a multi-centre magnetic resonance spectroscopy brain tumour dataset through visualization
Enrique Romero, Margarida Julià-Sapé, Alfredo Vellido
Abstract:
The exploration of brain tumours usually requires non-invasive techniques such as Magnetic Resonance Imaging (MRI) or Magnetic Resonance Spectroscopy (MRS). While radiologists are used to interpreting MRI, many of them are not used to the biochemical information provided by MRS. In this situation, radiologists may benefit from the use of computer-based support for their decisions. As part of the development of a medical Decision Support System (DSS), MRS data corresponding to various tumour pathologies are used to assist expert diagnosis. The high dimensionality of the data might obscure peculiarities and anomalies that would jeopardize automated DSS diagnostic assistance. We illustrate how visualization, combined with expert opinion, can be used to explore data in a process that should improve computer-based tumour classification.
ES2008-104
Handling almost-deterministic relationships in constraint-based Bayesian network discovery : Application to cancer risk factor identification
Sergio Rodrigues de Morais, Alexandre Aussem, Marilys Corbex
Abstract:
In this paper, we discuss simple methods for the identification and handling of almost-deterministic relationships (ADR) in automatic constraint-based Bayesian network structure discovery. The problem with ADR is that conditional independence tests become unreliable when the conditioning set almost determines one of the variables in the test. Such errors usually have a cascading effect that causes many errors in the final graph. Several methods for identification and handling of ADR are discussed to provide insight into their advantages and disadvantages. The methods are applied on standard benchmarks to recover the original structure from data in order to assess their capabilities. We then discuss efforts to apply our findings to Nasopharyngeal Carcinoma (NPC) survey data. The aim is to help identify the important risk factors involved in NPC.
Learning I
ES2008-18
Using graph-theoretic measures to predict the performance of associative memory models
Lee Calcraft, Rod Adams, Weiliang Chen, Neil Davey
Abstract:
We test a selection of associative memory models built with different connection strategies, exploring the relationship between the structural properties of each network and its pattern-completion performance. It is found that the Local Efficiency of the network can be used to predict pattern completion performance for associative memory models built with a range of different connection strategies. This relationship is maintained as the networks are scaled up in size, but breaks down under conditions of very sparse connectivity.
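As a short illustration of the graph measure highlighted above, the sketch below computes local efficiency with networkx for a few connection strategies; the associative memory models themselves are not simulated here.

import networkx as nx

n, k = 200, 10
nets = {
    "random": nx.gnm_random_graph(n, n * k // 2, seed=0),
    "small-world": nx.watts_strogatz_graph(n, k, p=0.1, seed=0),
    "regular ring": nx.watts_strogatz_graph(n, k, p=0.0, seed=0),
}
for name, g in nets.items():
    # local efficiency: average efficiency of each node's neighbourhood subgraph
    print(f"{name:12s} local efficiency = {nx.local_efficiency(g):.3f}")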
ES2008-76
A novel autoassociative memory on the complex hypercubic lattice
Rama Murthy Garimella, Praveen Dasigi
Abstract:
In this paper we define a novel activation function, called the multi-level signum, for real- and complex-valued associative memories. The main motivation for such a function is to increase the number of patterns that can be stored in a memory without increasing the number of neurons. The state of such a network can be described as one of the points lying on a complex bounded lattice. The convergence behavior of such a network is studied and supported with simulation results on a sample dataset of 1000 instances.
ES2008-79
Word recognition and incremental learning based on neural associative memories and hidden Markov models
Zoehre Kara Kayikci, Günther Palm
Abstract:
An architecture for word recognition and incremental learning of new words in a language processing system is presented. The architecture is based on neural associative memories and hidden Markov models. The hidden Markov models generate subword-unit transcriptions of the spoken words and provide them as input to the associative memory module. The associative memory module is a network of binary auto- and heteroassociative memories and is responsible for assembling words from subword units. The basic version of the system is implemented for simple command sentences. Its performance is compared with that of the hidden Markov models.
ES2008-69
Conditional prediction of time series using spiral recurrent neural network
Huaien Gao, Rudolf Sollacher
Abstract:
Frequently, sequences of state transitions are triggered by specific signals. Learning these triggered sequences with recurrent neural networks implies storing them as different attractors of the recurrent hidden layer dynamics. A challenging test, which is also useful in applications, is the conditional prediction of sequences: given just the trigger signal as input, the recurrent neural network has to evolve the sequence automatically. This paper addresses this problem with the spiral recurrent neural network (SpiralRNN) architecture.
ES2008-118
Learning to play Tetris applying reinforcement learning methods
Alexander Gross, Jan Friedland, Friedhelm Schwenker
Abstract:
In this paper the application of reinforcement learning to Tetris is investigated; in particular, the idea of temporal difference learning is applied to estimate the state value function V. For two predefined reward functions, Tetris agents have been trained using an $\epsilon$-greedy policy.
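A generic sketch of the learning scheme named in the abstract, TD(0) estimation of a state value function V combined with an epsilon-greedy policy, demonstrated on a toy chain task rather than on Tetris; states, rewards and hyperparameters are illustrative assumptions.

import numpy as np

n_states, goal = 12, 11
V = np.zeros(n_states)
alpha, gamma, epsilon = 0.1, 0.95, 0.1
rng = np.random.default_rng(6)

def step(s, a):                                   # a = -1 (left) or +1 (right)
    s2 = int(np.clip(s + a, 0, n_states - 1))
    return s2, 1.0 if s2 == goal else 0.0

for episode in range(300):
    s = 0
    for _ in range(1000):                                    # cap the episode length
        if rng.random() < epsilon:                           # explore
            a = int(rng.choice([-1, 1]))
        else:                                                # greedy on reward + value of successor
            vals = {act: step(s, act)[1] + gamma * V[step(s, act)[0]] for act in (-1, 1)}
            best = max(vals.values())
            a = int(rng.choice([act for act, v in vals.items() if v == best]))
        s2, r = step(s, a)
        V[s] += alpha * (r + gamma * V[s2] - V[s])           # TD(0) update
        s = s2
        if s == goal:
            break

print("learned state values:", np.round(V, 2))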
ES2008-80
QL2, a simple reinforcement learning scheme for two-player zero-sum Markov games
Benoît Frénay, Marco Saerens
Abstract:
Markov games are a framework which formalises n-agent reinforcement learning. For instance, Littman proposed the minimax-Q algorithm to model two-agent zero-sum problems. This paper proposes a new simple algorithm, QL2, and compares it to several standard algorithms (Q-learning, Minimax and minimax-Q). Experiments show that QL2 converges to optimal mixed policies, as minimax-Q does, while using a surprisingly simple and cheap gradient-based updating rule.
ES2008-36
Safe exploration for reinforcement learning
Alexander Hans, Daniel Schneegass, Anton Maximilian Schäfer, Steffen Udluft
Abstract:
In this paper we define and address the problem of safe exploration in the context of reinforcement learning. Our notion of safety is concerned with states or transitions that can lead to damage and thus must be avoided. We introduce the concept of a safety function for determining a state's safety degree and that of a backup policy that is able to lead the controlled system from a critical state back to a safe one. Moreover, we present a level-based exploration scheme that is able to generate a comprehensive base of observations while adhering to safety constraints. We evaluate our approach on a simplified simulation of a gas turbine.
ES2008-47
Similarities and differences between policy gradient methods and evolution strategies
Verena Heidrich-Meisner, Christian Igel
Abstract:
Natural policy gradient methods and the covariance matrix adaptation evolution strategy, two variable metric methods proposed for solving reinforcement learning tasks, are contrasted to point out their conceptual similarities and differences. Experiments on the cart pole benchmark are conducted as a first attempt to compare their performance.
ES2008-119
Improvement in Game Agent Control Using State-Action Value Scaling
Leo Galway, Darryl Charles, Michaela Black
Abstract:
The aim of this paper is to enhance the performance of a reinforcement learning game agent controller within a dynamic game environment through the retention of learned information between a series of consecutive games. Using a variation of the classic arcade game Pac-Man, the Sarsa algorithm has been utilised for the control of the game agent. The results indicate that the use of state-action value scaling between games is successful in preserving prior knowledge, thereby improving the performance of the game agent.
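The hedged sketch below illustrates the retention idea only: a tabular Sarsa learner whose state-action values are rescaled, rather than reset, when a new game starts. A toy task with random transitions stands in for the Pac-Man variant used in the paper, and the scaling factor is an assumption.

import numpy as np

n_states, n_actions = 5, 3
rng = np.random.default_rng(7)
true_reward = rng.normal(size=(n_states, n_actions))     # stays fixed across games

alpha, gamma, epsilon, scale = 0.1, 0.9, 0.1, 0.5
Q = np.zeros((n_states, n_actions))

def policy(s):
    return rng.integers(n_actions) if rng.random() < epsilon else int(np.argmax(Q[s]))

for game in range(3):
    Q *= scale                                           # scale, do not discard, prior values
    s = rng.integers(n_states)
    a = policy(s)
    for t in range(2000):                                # one game
        r = true_reward[s, a] + 0.1 * rng.normal()
        s2 = rng.integers(n_states)                      # toy random transitions
        a2 = policy(s2)
        Q[s, a] += alpha * (r + gamma * Q[s2, a2] - Q[s, a])   # Sarsa update
        s, a = s2, a2
    greedy_return = true_reward[np.arange(n_states), Q.argmax(axis=1)].mean()
    print(f"game {game}: mean reward of the greedy policy = {greedy_return:.2f}")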
ES2008-23
Multilayer Perceptrons with Radial Basis Functions as Value Functions in Reinforcement Learning
Victor Uc Cetina
Abstract:
Using multilayer perceptrons (MLPs) to approximate the state-action value function in reinforcement learning (RL) algorithms can become a nightmare due to the constant possibility of unlearning past experiences. Moreover, since the target values in the training examples are bootstrapped values, that is, estimates of other estimates, the chances of getting stuck in a local minimum are increased. These problems occur very often in the mountain car task, as shown by Boyan and Moore. In this paper we present empirical evidence showing that MLPs augmented with one layer of radial basis functions (RBFs) can avoid these problems. Our experimental testbeds are the mountain car task and a difficult robot learning problem.
ES2008-73
Selection of important input variables for RBF network using partial derivatives
Jarkko Tikka, Jaakko Hollmén
Abstract:
In regression problems, making accurate predictions is often the primary goal. In addition, the relevance of inputs in the prediction of an output is valuable information in many cases. A sequential input selection algorithm for radial basis function networks (SISAL-RBF) is presented to analyze the importance of the inputs. The ranking of inputs is based on values evaluated from the partial derivatives of the network output. The proposed method is applied to benchmark data sets and yields accurate prediction models that are parsimonious in terms of the input variables.
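An illustrative sketch of ranking inputs by the partial derivatives of a trained RBF network (k-means centres, ridge output weights, importance taken as the mean absolute partial derivative over the data); the sequential selection loop of SISAL-RBF is omitted and all settings are assumptions.

import numpy as np
from sklearn.cluster import KMeans
from sklearn.linear_model import Ridge

rng = np.random.default_rng(8)
X = rng.normal(size=(400, 5))
y = np.sin(X[:, 0]) + X[:, 1] ** 2 + 0.1 * rng.normal(size=400)   # inputs 2-4 are irrelevant

centres = KMeans(n_clusters=20, n_init=10, random_state=0).fit(X).cluster_centers_
sigma = 1.0
diff = X[:, None, :] - centres[None, :, :]                        # shape (N, K, d)
phi = np.exp(-np.sum(diff ** 2, axis=2) / (2 * sigma ** 2))       # Gaussian basis activations
w = Ridge(alpha=1e-3).fit(phi, y).coef_

# d f / d x_j = sum_k w_k * phi_k(x) * (-(x_j - c_kj) / sigma^2)
grads = np.einsum("k,nk,nkj->nj", w, phi, -diff / sigma ** 2)
importance = np.abs(grads).mean(axis=0)
print("input importance:", np.round(importance, 3))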
ES2008-108
A multiple testing procedure for input variable selection in neural networks
Michele La Rocca, Cira Perna
Abstract:
In this paper a novel procedure to select the input nodes in neural network modeling is presented and discussed. The approach is developed in a multiple testing framework and is therefore able to keep under control the well-known data snooping problem, which arises when the same sample is used more than once for estimation and model selection. The performance of the proposed procedure is illustrated by numerical examples.
ES2008-83
Computationally Efficient Neural Field Dynamics
Alexander Gepperth, Jannik Fritsch, Christian Goerick
Abstract:
We propose a modification of the dynamic neural field model of Amari, aiming at reducing the simulation effort by employing space- and frequency representations of the dynamic state in parallel. Additionally, we show how the correct treatment of boundary conditions (wrap-around, zero-padding) can be ensured, which is of particular importance for, e.g., vision processing. We present theoretical predictions as well as measurements of the performance differences between original and modified dynamics. In addition, we show analytically that key properties of the original model are retained by the modified version. This allows us to deduce simple conditions for the applicability and the computational advantage of the proposed model in any given application scenario.
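The sketch below shows the computational point in a 1-D Amari-type field with an assumed difference-of-Gaussians kernel: the lateral interaction is a convolution, so it can be evaluated in the frequency domain, with wrap-around boundaries handled naturally by a circular FFT convolution (zero-padding would require padding before the transform). Parameters and the stimulus are illustrative.

import numpy as np

n, dt, tau = 256, 0.05, 1.0
x = np.linspace(-10, 10, n)
dx = x[1] - x[0]
kernel = 1.5 * np.exp(-x**2 / 2) - 0.5 * np.exp(-x**2 / 18)   # local excitation, broad inhibition
stim = 2.0 * np.exp(-(x - 2.0) ** 2)

def f(u):
    return 1.0 / (1.0 + np.exp(-5 * u))                        # firing-rate nonlinearity

K_hat = np.fft.rfft(np.fft.ifftshift(kernel))                  # kernel spectrum (wrap-around case)

u = np.zeros(n)
for step in range(400):
    lateral = np.fft.irfft(np.fft.rfft(f(u)) * K_hat, n) * dx  # circular convolution via FFT
    u += dt / tau * (-u + lateral + stim - 1.0)                # Amari field dynamics, resting level -1

print(f"peak activity u={u.max():.2f} at x={x[u.argmax()]:.2f}; {np.sum(u > 0)} units above threshold")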
Biological systems and biologically-inspired networks
ES2008-77
The gamma cycle and its role in the formation of assemblies
Thomas Burwick
Abstract:
Rhythmic synchronization of activated neural groups in the gamma-frequency range (30-100 Hz) is observed in many brain regions. Interneuron networks are key to the generation of these rhythms. Motivated by the inhibitory effect of interneurons and summarizing experimental findings, it was recently proposed that the corresponding gamma cycle realizes a rapidly repeating winner-take-all algorithm. Here, this interpretation is considered from the modeling perspective, starting from an oscillatory network model with several stored patterns. A gradient formulation is used to include inhibitory pulses. The resulting dynamics is discussed, identifying temporal coding assemblies with coherent patterns. Thereby, the winner-take-all hypothesis is combined with binding-by-synchrony and confirmed.
ES2008-100
The impact of axon wiring costs on small neuronal networks
Conrad Attard, Andreas Albrecht
Abstract:
Recent papers by D. Chklovskii and E.M. Izhikevich suggest that wiring costs may play a significant role in the physical layout and function of neuronal structures. About eighty years ago, in his paper on the relationship between diameter and branching angles in trees, C.D. Murray proposed the volume as the cost function which dictates growth. O. Shefi et al. grafted this idea into neuroscience as a possible optimisation mechanism. Our paper presents computational experiments on the impact of wiring cost functions proposed by D. Chklovskii and O. Shefi et al. when applied to interneuronal connections in small ML neuronal networks.
ES2008-56
An FPGA-based model suitable for evolution and development of spiking neural networks
Hooman Shayani, Peter J. Bentley, Andrew M. Tyrrell
Abstract:
We propose a digital neuron model suitable for evolving and growing heterogeneous spiking neural networks on FPGAs using a piecewise linear approximation of the Quadratic Integrate and Fire (QIF) model. A network of 161 neurons and 1610 synapses with 4210 times real-time neuron simulation speed was simulated and synthesized for a Virtex-5 chip.
Clustering and vector quantization
ES2008-67
Self-Organizing Maps for cyclic and unbounded graphs
Markus Hagenbuchner, Alessandro Sperduti, Ah Chung Tsoi
Abstract:
This paper introduces a new concept for the processing of graph structured information using the self organising map framework. Previous approaches to this problem were limited to the processing of bounded graphs; their computational complexity grows rapidly with the level of connectivity, and they are restricted to the processing of positional graphs. The concept proposed in this paper addresses these issues by reducing the computational demand and by allowing the processing of non-positional graphs. This is achieved by utilising the state space of the self organising map instead of the states of the nodes in the graph for processing.
ES2008-65
Clustering of Self-Organizing Map
Hanane Azzag, Mustapha Lebbah
Abstract:
In this paper, we present a new similarity measure for clustering a self-organizing map, which is carried out using a new hierarchical clustering approach. (1) The similarity measure is composed of two terms: a weighted Ward distance and a Euclidean distance weighted by the neighbourhood function. (2) An algorithm inspired by artificial ants, named AntTree, is used to cluster the self-organizing map. This algorithm has the advantage of providing a hierarchy of referents with low complexity (close to $n\log (n)$). The SOM clustering including the new measure is validated on several public databases.
ES2008-51
Explaining Ant-Based Clustering on the basis of Self-Organizing Maps
Lutz Herrmann, Alfred Ultsch
Abstract:
Ant-based clustering is a nature-inspired technique whereby stochastic agents perform the task of clustering high-dimensional data. This paper analyzes the popular technique of Lumer/Faieta. It is shown that the Lumer/Faieta approach is strongly related to Kohonen's Self-Organizing Batch Map. A unifying basis is derived in order to assess strengths and weaknesses of both techniques. The behaviour of several popular ant-based clustering techniques is explained.
ES2008-57
Phase transitions in Vector Quantization
Aree Witoelar, Anarta Ghosh, Michael Biehl
Abstract:
We study Winner-Takes-All and rank-based Vector Quantization along the lines of the statistical physics of off-line learning. Typical behavior of the system is obtained within a model where high-dimensional training data are drawn from a mixture of Gaussians. The analysis becomes exact in the simplifying limit of high training temperature. Our main findings concern the existence of phase transitions, i.e. a critical or discontinuous dependence of VQ performance on the training set size. We show how the nature and properties of the transition depend on the number of prototypes and the control parameter of rank-based cost functions.
ES2008-28
Parallelizing single patch pass clustering
Nikolai Alex, Barbara Hammer
Abstract:
Clustering algorithms such as k-means, the self-organizing map (SOM), or Neural Gas (NG) constitute popular tools for automated information analysis. Since data sets are becoming larger and larger, it is vital that the algorithms perform efficiently on huge data sets. Here we propose a parallelization of patch neural gas which requires only a single run over the data set and which can work with limited memory, thus being very efficient for streaming or massive data sets. The realization is very general, such that it can easily be transferred to alternative prototype-based methods and distributed settings. An approximately linear relative speed-up is observed, depending on the number of processors.
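A hedged sketch of the single-pass patch idea: the stream is processed in patches, and the prototypes of the previous patch, weighted by the number of points they already represent, are added to the next patch before re-clustering. Weighted k-means stands in for neural gas here and the patches could be distributed over processors; neither choice reproduces the paper's algorithm.

import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(9)
stream = np.concatenate([rng.normal(c, 0.3, size=(2000, 2)) for c in (-2, 0, 2)])
rng.shuffle(stream)

k, patch_size = 3, 500
protos, counts = None, None
for start in range(0, len(stream), patch_size):
    patch = stream[start:start + patch_size]
    if protos is None:
        data, weights = patch, np.ones(len(patch))
    else:
        data = np.vstack([patch, protos])                        # carry prototypes into the next patch
        weights = np.concatenate([np.ones(len(patch)), counts])  # weight them by their multiplicities
    km = KMeans(n_clusters=k, n_init=5, random_state=0).fit(data, sample_weight=weights)
    protos = km.cluster_centers_
    counts = np.array([weights[km.labels_ == c].sum() for c in range(k)])

print("final prototypes:\n", np.round(protos, 2))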
ES2008-44
Learning Data Representations with Sparse Coding Neural Gas
Kai Labusch, Erhardt Barth, Thomas Martinetz
Abstract:
We consider the problem of learning an unknown (overcomplete) basis from an unknown sparse linear combination. Introducing the ``sparse coding neural gas'' algorithm, we show how to employ a combination of the original neural gas algorithm and Oja's rule to learn a simple sparse code that represents each training sample by a multiple of one basis vector. We generalise this algorithm using orthogonal matching pursuit to learn a sparse code where each training sample is represented by a linear combination of k basis elements. We show that this method can be used to learn artificial sparse overcomplete codes.
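The sketch below illustrates the first variant described above, neural-gas ranking combined with Oja's rule so that each sample is represented by a multiple of (ideally) one basis vector; data, annealing schedules and sizes are illustrative, and the OMP-based k-term generalisation is not shown.

import numpy as np

rng = np.random.default_rng(10)
d, n_basis, n_samples = 10, 5, 5000

true_basis = np.linalg.qr(rng.normal(size=(d, n_basis)))[0].T    # orthonormal rows
coeffs = rng.normal(size=n_samples)
X = coeffs[:, None] * true_basis[rng.integers(n_basis, size=n_samples)]   # 1-sparse training data

W = rng.normal(size=(n_basis, d))
W /= np.linalg.norm(W, axis=1, keepdims=True)

alpha0, lam0, T = 0.1, 2.0, len(X)
for t, x in enumerate(X):
    alpha = alpha0 * (0.01 / alpha0) ** (t / T)          # annealed learning rate
    lam = lam0 * (0.01 / lam0) ** (t / T)                # annealed neighbourhood range
    y = W @ x
    ranks = np.argsort(np.argsort(-np.abs(y)))           # rank 0 = best matching basis vector
    h = np.exp(-ranks / lam)
    W += alpha * h[:, None] * y[:, None] * (x[None, :] - y[:, None] * W)   # Oja's rule per unit

overlap = np.abs(W @ true_basis.T).max(axis=1)
print("alignment of each learned vector with its closest true basis vector:", np.round(overlap, 2))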
Methodology and standards for data analysis with machine learning tools
ES2008-4
Methodology and standards for data analysis with machine learning tools
Damien Francois
Abstract:
Many tools for data mining are complex and require skills and experience to be used successfully. Therefore, data mining is often considered an art as much as a science. This paper presents some ideas on how to move forward from art to science, through the use of methodological standards and meta-learning.
ES2008-37
A Methodology for Building Regression Models using Extreme Learning Machine: OP-ELM
Yoan Miché, Patrick Bas, Christian Jutten, Olli Simula, Amaury Lendasse
Abstract:
This paper proposes a methodology (named OP-ELM) based on a recent development --called Extreme Learning Machine-- which drastically reduces the training time of networks. Variable selection is performed beforehand on the original dataset to ensure proper results from OP-ELM: the network is first created using the Extreme Learning Machine procedure, then the most relevant nodes are selected using a Least Angle Regression (LARS) ranking of the nodes and a Leave-One-Out estimation of the performance. Results are globally equivalent to those of LSSVM, with reduced computational time.
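As background for the abstract above, this is a minimal Extreme Learning Machine regressor: a random, untrained hidden layer followed by a least-squares fit of the output weights, which is what makes training so fast. The LARS-based node ranking and Leave-One-Out selection that turn this into OP-ELM are not shown; the hidden-layer size, activation and ridge term are arbitrary choices.

```python
import numpy as np

class ELMRegressor:
    """Single-hidden-layer feedforward net with a random, fixed hidden layer."""
    def __init__(self, n_hidden=100, ridge=1e-6, seed=0):
        self.n_hidden, self.ridge, self.seed = n_hidden, ridge, seed

    def fit(self, X, y):
        rng = np.random.default_rng(self.seed)
        self.W = rng.normal(size=(X.shape[1], self.n_hidden))   # random input weights
        self.b = rng.normal(size=self.n_hidden)                 # random biases
        H = np.tanh(X @ self.W + self.b)                        # hidden activations
        # Output weights by (ridge-regularized) least squares -- the only training step.
        A = H.T @ H + self.ridge * np.eye(self.n_hidden)
        self.beta = np.linalg.solve(A, H.T @ y)
        return self

    def predict(self, X):
        return np.tanh(X @ self.W + self.b) @ self.beta

rng = np.random.default_rng(1)
X = rng.uniform(-3, 3, size=(500, 1))
y = np.sin(X[:, 0]) + 0.1 * rng.normal(size=500)
model = ELMRegressor(n_hidden=50).fit(X, y)
print(np.mean((model.predict(X) - y) ** 2))   # training MSE
```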
ES2008-72
Do we need experts for time series forecasting?
Christiane Lemke, Bogdan Gabrys
Abstract:
This study examines a selection of off-the-shelf forecasting and forecast combination algorithms with the focus on assessing practical relevance by drawing conclusions for non-expert users. Some of the methods have only recently been introduced and have not been part of comparative empirical evaluations before. Considering the advances in forecasting techniques, this analysis is carried out in the context of the question whether we need human expertise for forecasting or whether the methods used provide comparable performance.
ES2008-115
Homogeneous bipartition based on multidimensional ranking
Michaël Aupetit
Abstract:
We present an algorithm which partitions a data set into two parts of equal size and with experimentally nearly the same distribution, measured through the likelihood of a Parzen kernel density estimator. The generation of the partition takes O(N^2) operations (where N is the number of data points) and is two orders of magnitude faster than the state of the art.
ES2008-40
Feature selection on process fault detection and visualization
Tuomas Alhonnoro, Miki Sirola
Abstract:
Feature subset selection has become an essential part of data mining applications. In this article, feature subset selection is integrated into real-time process fault detection. Various methods based on both dependency measures and cluster separability measures are discussed. An intuitive tool for process visualization is introduced. Experiments on nuclear power plant simulator data are carried out to assess the effectiveness and performance of the methods. Early detection of failures, one important goal of the project, is achieved with the help of the visualizations developed in this work; an illustrative example is given for a leak scenario.
ES2008-9
Classification of chestnuts with feature selection by noise resilient classifiers
Elena Roglia, Rossella Cancelliere, Rosa Meo
Abstract:
In this paper we solve the problem of classifying chestnut plants according to their place of origin. We compare the results obtained by state-of-the-art classifiers, among which MLP, RBF, SVM, the C4.5 decision tree and random forest. We determine which features are meaningful for the classification, the classification accuracy achievable by these classifier families with the available features, and how robust the classifiers are to noise. Among the obtained classifiers, neural networks show the greatest robustness to noise.
ES2008-75
GeoKernels: modeling of spatial data on geomanifolds
Alexei Pozdnoukhov, Mikhaïl Kanevski
Abstract:
This paper presents a review of methodology for semi-supervised modeling with kernel methods when the manifold assumption is guaranteed to be satisfied. It concerns environmental data modeling in the complex topographies of mountainous regions, where environmental processes are highly influenced by the relief. These relations, possibly regionalized and non-linear, can be modeled from data with machine learning, using digital elevation models in semi-supervised kernel methods. The methods used in the study include data-driven feature selection and extraction and semi-supervised Support Vector algorithms.
Learning II
ES2008-50
Constructing ensembles of classifiers using linear projections based on misclassified instances
Cesar Garcia-Osorio, Nicolás García-Pedrajas
Abstract:
In this paper we propose a novel approach for ensemble construction based on the use of linear projections to achieve both accuracy and diversity of individual classifiers. The proposed approach uses the philosophy of boosting, putting more effort on difficult instances, but instead of learning the classifier on a biased distribution of the training set it uses misclassified instances to find a supervised linear projection that favors their correct classification. Supervised linear projections are used to find the most suitable projection at each step of the creation of the ensemble. In a previous work we validated this approach using non-linear projections. In this work we show that linear projections can be used as well, with the advantage of being simpler and faster to obtain. The method is compared with AdaBoost, showing an improved performance on a large set of 45 problems from the UCI Machine Learning Repository.
ES2008-82
A Regularized Learning Method for Neural Networks Based on Sensitivity Analysis
Bertha Guijarro-Berdiñas, Oscar Fontenla-Romero, Beatriz Pérez-Sánchez, Amparo Alonso-Betanzos
Abstract:
The Sensitivity-Based Linear Learning Method (SBLLM) is a learning method for two-layer feedforward neural networks, based on sensitivity analysis, that calculates the weights by solving a linear system of equations. Therefore, there is an important saving in computational time which significantly enhances the behavior of this method compared to other learning algorithms. This paper introduces a generalization of the SBLLM by adding a regularization term in the cost function. The theoretical basis for the method is given and its performance is illustrated.
ES2008-14
A Method for Time Series Prediction using a Combination of Linear Models
David Martínez-Rego, Oscar Fontenla-Romero, Amparo Alonso-Betanzos
Abstract:
This paper presents a new approach for time series prediction using local dynamic modeling. The proposed method is composed of three blocks: a Time Delay Line that transforms the original time series into a set of N-dimensional vectors, an Information-Theoretic based clustering method that segments the previous set into subspaces of similar vectors and a set of single layer neural networks that adjust a local model for each subspace created by the clustering stage. The results of this model are compared with those of another local modeling approach and of two representative global models in time series prediction: Tapped Delay Line Multilayer Perceptron (TDL-MLP) and Support Vector Regression (SVR).
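The three-block structure described above can be sketched roughly as follows: a time-delay line builds N-dimensional vectors, a clustering step splits them into subspaces, and one linear model is fitted per subspace. The sketch uses k-means and ordinary least squares as simple stand-ins for the information-theoretic clustering and single-layer networks of the paper; the embedding dimension and number of clusters are arbitrary.

```python
import numpy as np

def delay_embed(series, dim):
    """Time Delay Line: turn a 1-D series into dim-dimensional input vectors
    paired with the next value as the prediction target."""
    X = np.array([series[i:i + dim] for i in range(len(series) - dim)])
    y = series[dim:]
    return X, y

def kmeans(X, k, iters=50, seed=0):
    rng = np.random.default_rng(seed)
    C = X[rng.choice(len(X), k, replace=False)]
    for _ in range(iters):
        a = np.argmin(((X[:, None] - C[None]) ** 2).sum(-1), axis=1)
        C = np.array([X[a == j].mean(axis=0) if (a == j).any() else C[j]
                      for j in range(k)])
    return C

# Toy series and embedding
t = np.arange(2000) * 0.05
series = np.sin(t) + 0.05 * np.random.default_rng(0).normal(size=t.size)
X, y = delay_embed(series, dim=6)

# Cluster the delay vectors, then fit one linear model per cluster
C = kmeans(X, k=4)
assign = np.argmin(((X[:, None] - C[None]) ** 2).sum(-1), axis=1)
models = {j: np.linalg.lstsq(np.c_[X[assign == j], np.ones((assign == j).sum())],
                             y[assign == j], rcond=None)[0]
          for j in range(4)}

# One-step prediction for the last delay vector
x = X[-1]
j = int(np.argmin(((C - x) ** 2).sum(-1)))
print(np.r_[x, 1.0] @ models[j], y[-1])
```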
ES2008-34
Interpretable ensembles of local models for safety-related applications
Sebastian Nusser, Clemens Otte, Werner Hauptmann
Abstract:
This paper discusses a machine learning approach for binary classification problems which satisfies the specific requirements of safety-related applications. The approach is based on ensembles of local models. Each local model utilizes only a small subspace of the complete input space. This ensures the interpretability and verifiability of the local models, which is a crucial prerequisite for applications in safety-related domains. A feature construction method based on a multi-layer perceptron architecture is proposed to overcome limitations of the local modeling strategy, while keeping the global model interpretable.
ES2008-43
Multi-View Forests of Tree-Structured Radial Basis Function Networks Based on Dempster-Shafer Evidence Theory
Mohamed Farouk Abdel Hady, Günther Palm, Friedhelm Schwenker
Abstract:
An essential requirement to create an accurate classifier ensemble is the diversity among the individual base classifiers. In this paper, Multi-View Forests, a method to construct ensembles of tree-structured radial basis function (RBF) networks using multi-view learning, is proposed. In multi-view learning it is assumed that the patterns to be classified are described by multiple feature sets (views). Multi-View Forests have been evaluated on a benchmark data set for handwritten digit recognition. Results show that multi-view learning can improve the performance of the ensemble by enforcing the diversity among the individual classifiers.
ES2008-97
SOM based clustering with instance-level constraints
Fazia Bellal, Khalid Benabdeslem, Alexandre Aussem
Abstract:
This paper describes a new topological map dedicated to clustering under instance-level constraints. In general, traditional clustering is used in an unsupervised manner. However, in some cases, background information about the problem domain is available or imposed in the form of constraints, in addition to the data instances. In this context, we modify the popular SOM algorithm to take these constraints into account during the construction of the topology. We present experiments on known synthetic databases with artificial constraints. We then apply the new method to a real problem of clustering melanoma data in the health domain.
ES2008-98
Robust object segmentation by adaptive metrics in Generalized LVQ
Alexander Denecke, Heiko Wersing, Jochen J. Steil, Edgar Koerner
Abstract:
We investigate the effect of several adaptive metrics in the context of figure-ground segregation, using Generalized LVQ to train a classifier for image regions. Extending the Euclidean metric towards local matrices of relevance factors not only leads to higher classification accuracy and increased robustness on heterogeneous/noisy data, but also enables considerably higher recognition performance on segmented objects of real image data when these adaptive metrics are used for figure-ground segregation.
ES2008-59
Magnification Control in Relational Neural Gas
Alexander Hasenfuss, Barbara Hammer, Tina Geweniger, Thomas Villmann
Abstract:
Prototype-based clustering algorithms such as the Self Organizing Map (SOM) or Neural Gas (NG) offer powerful tools for automated data inspection. The distribution of prototypes, however, does not coincide with the underlying data distribution and magnification control is necessary to obtain information theoretic optimum maps. Recently, several extensions of SOM and NG to general non-vectorial dissimilarity data have been proposed, such as Relational NG (RNG). Here, we derive a magnification control scheme for RNG based on localized learning, and we demonstrate its applicability for various data sets, opening the way towards information theoretic optimum clustering of general dissimilarity data.
ES2008-48
Parallel asynchronous neighborhood mechanism for WTM Kohonen network implemented in CMOS technology
Marta Kolasa, Rafal Dlugosz
Abstract:
In this paper we present a new neighborhood mechanism for a WTM self-organizing Kohonen map implemented in CMOS 0.18 um technology. The proposed mechanism is an asynchronous circuit and does not require a controlling clock generator. Propagation of an enable signal from a winning neuron to the neurons that belong to the winner's neighborhood is very fast: for example, for a neighborhood radius (R) equal to 15, the total time required to start the adaptation of all neighboring neurons is below 10 ns. This makes the proposed WTM network as fast as a WTA network. Compared to WTM networks implemented in software, for example in C++, the proposed solution can be even several hundred times faster and consumes much less power.
ES2008-49
Initialization mechanism in Kohonen neural network implemented in CMOS technology
Tomasz Talaska, Rafal Dlugosz
Abstract:
An initialization mechanism is presented for a Kohonen neural network implemented in CMOS technology. Proper selection of the initial values of the neurons' weights has a large influence on the speed of the learning algorithm and, finally, on the quantization error of the network, which for different initial parameters can vary by several orders of magnitude. Experiments with the software model of the designed network show that results can be further improved when a conscience mechanism is used during the learning phase. This mechanism additionally decreases the number of dead neurons, which minimizes the quantization error. The initialization mechanism, together with an experimental Kohonen neural network with four neurons and three inputs, has been designed in CMOS 0.18 um technology.
ES2008-107
Neural network hardware architecture for pattern recognition in the HESS2 project
Narayanan Ramanan, Sonia Khatchadourian, Jean-Christophe Prévotet, Lounis Kessal
Abstract:
In this paper, we consider the problem of implementing a neural network in the context of the level 2 trigger of the HESS2 project. We propose a hardware architecture which takes advantage of high parallelism, pipelining and the intrinsic nature of FPGAs.
ES2008-21
Active and reactive use of virtual neural sensors
Massimo De Gregorio
Abstract:
This paper addresses the possible use of virtual neural sensors, implemented by means of weightless systems, as active or reactive sensors. The latter is made possible by the intrinsic characteristic of weightless systems that they can be trained on-line. These virtual neural sensors have been adopted in actual applications in different domains.
ES2008-26
Noise influence on correlated activities in a modular neuronal network: from synapses to functional connectivity
Carlo Casarino, Gaetano Aiello, Davide Valenti, Bernardo Spagnolo
Abstract:
In this work we propose taking noise into account when modeling the neuronal activity in a correlation-based type network. Volume transmission effects on connectivity are considered. As a result, an individual module can be set in an “activated” state via noise produced by the remaining modules. The stochastic approach could provide a new insight into the relation between functional and anatomical connectivity.
ES2008-106
Neuromimetic motion indicator for visual perception
Claudio Castellanos Sánchez
Abstract:
This paper presents a bio-inspired model for the visual perception of motion through its principal indicator: the neuromimetic motion indicator (NMI). This indicator emerges from the mechanism of antagonist interactions (MAI), which operates on an architecture of oriented columns with local and distributed interactions of the neurons in the primary visual cortex (V1). The NMI classifies the motion into two types, null motion and motion, and estimates the number of moving objects in the scene.
Neural Networks for Computational Neuroscience
ES2008-5
Neural networks for computational neuroscience
David Meunier, Hélène Paugam-Moisy
Abstract:
Computational neuroscience is an appealing interdisciplinary domain at the interface between biology and computer science. It aims at understanding the experimental data obtained in neuroscience using models. Several kinds of models can be used, one of them being artificial neural networks. In this tutorial we review some of the advances neural networks have brought to computational neuroscience, focusing on spiking neural networks. We describe several artificial neuron models which are able to grasp the temporal properties of biological neurons. We also briefly describe data obtained in neuroscience, and some artificial neural networks developed to understand the mechanisms underlying these experimental data.
ES2008-93
Emergence of stimulus-specific synchronous response through STDP in recurrent neural networks
Frédéric Henry, Emmanuel Daucé
Abstract:
This paper presents learning simulation results on a balanced recurrent neural network of spiking neurons with a simple implementation of the STDP plasticity rule, whose potentiation and depression effects compensate. The synaptic weights and delays are randomly set and the network activity, which is a combination of an input signal and a recurrent feedback, is initially strong and irregular. Under a static stimulation, the learning process shapes the initial activity toward a more regular and synchronous response. The response is specific to this particular stimulus: the network has learned to select by synchrony one arbitrary stimulus from a set of random static stimuli.
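For context, a common "simple implementation" of the STDP rule of the kind mentioned above uses an exponential pairing window in which potentiation for pre-before-post spike pairs is balanced against depression for post-before-pre pairs. The sketch below shows that weight update in isolation; the time constants, amplitudes and the near-compensation of the two effects are generic textbook choices, not the parameters used by the authors.

```python
import numpy as np

def stdp_dw(dt, a_plus=0.01, a_minus=0.0105, tau=20.0):
    """Weight change for one pre/post spike pair.
    dt = t_post - t_pre (ms); positive dt (pre before post) potentiates,
    negative dt depresses. a_minus is slightly larger so that potentiation and
    depression roughly compensate under random pairings."""
    if dt > 0:
        return a_plus * np.exp(-dt / tau)
    elif dt < 0:
        return -a_minus * np.exp(dt / tau)
    return 0.0

# Apply the rule to one synapse for a sequence of random spike pairings
rng = np.random.default_rng(0)
w = 0.5
for _ in range(1000):
    dt = rng.uniform(-50.0, 50.0)            # random relative spike timing (ms)
    w = np.clip(w + stdp_dw(dt), 0.0, 1.0)   # keep the weight bounded
print(w)
```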
ES2008-113
Visual focus with spiking neurons
Sylvain Chevallier, Philippe Tarroux
Abstract:
We propose to implement a network of leaky integrate-and-fire neurons able to detect and focus on a stimulus even in the presence of distractors. The experimental data show that this behavior is very robust to noise. This process is similar to an early visual attention mechanism.
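Since several abstracts in this session rely on leaky integrate-and-fire (LIF) units, a minimal discrete-time LIF neuron is sketched below: the membrane potential leaks towards rest, integrates the input current, and emits a spike followed by a reset when a threshold is crossed. The time step, time constant and threshold are illustrative values only, not taken from any of the papers.

```python
import numpy as np

def lif_run(I, dt=1.0, tau=20.0, v_rest=0.0, v_th=1.0, v_reset=0.0, R=1.0):
    """Simulate one leaky integrate-and-fire neuron driven by input current I.
    Returns the membrane trace and the spike times (in steps)."""
    v = v_rest
    trace, spikes = [], []
    for t, i_t in enumerate(I):
        # Euler step of: tau * dv/dt = -(v - v_rest) + R * I(t)
        v += dt / tau * (-(v - v_rest) + R * i_t)
        if v >= v_th:              # threshold crossing -> spike and reset
            spikes.append(t)
            v = v_reset
        trace.append(v)
    return np.array(trace), spikes

rng = np.random.default_rng(0)
I = 1.2 + 0.3 * rng.normal(size=500)   # noisy constant drive above threshold
trace, spikes = lif_run(I)
print(len(spikes), "spikes in", len(I), "steps")
```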
ES2008-101
Simulation of a recurrent neurointerface with sparse electrical connections
Andreas Herzog, Karsten Kube, Bernd Michaelis, Ana D. de Lima, Thomas Voigt
Abstract:
With the technical development of multi-electrode arrays, the monitoring of many individual neurons has become feasible. However, for practical use of those arrays as bidirectional neurointerfaces, feedback signals have to be generated in real-time to integrate the electrodes into the existing spatio-temporal context as a new information source. In this modeling study we will introduce a recurrent neurointerface, which uses a biologically plausible artificial neural network to pre-process electrode signals and generate adequate feedback signals to the biological network. The artificial network is more transparent for advanced methods to analyze synchronous firing patterns and reacts more stably to external input signals.
ES2008-29
Direct and inverse solution for a stimulus adaptation problem using SVR
Dominik Brugger, Sergejus Butovas, Martin Bogdan, Cornelius Schwarz, Wolfgang Rosenstiel
Abstract:
Adapting stimuli to stabilize neural responses is an important problem in the context of cortical prostheses. This paper describes two approaches for stimulus adaptation using support vector regression (SVR). One approach involves the solution of an inverse problem, and it is shown that for linear SVR an analytical solution exists. The proposed algorithms are evaluated in conjunction with different preprocessing methods on three datasets recorded from the barrel cortex of anesthetized rats.
ES2008-30
Computational model for amygdala neural networks
Jean Marc Salotti
Abstract:
We present a computational model of amygdala neural networks. It is used to simulate neuronal activation in amygdala nuclei at different stages of aversive conditioning experiments with rats. Our model is based on neurobiological data. Simple formal neurons and an adaptive Hebbian rule are the key elements of the model. The results are compatible with neuronal activation maps obtained with C-Fos markers. The model also enables interesting predictions.
Kernel methods
ES2008-53
Factored sequence kernels
Pierre Mahé, Nicola Cancedda
Abstract:
In this paper we propose an extension of sequence kernels to the case where the symbols that define the sequences have multiple representations. This configuration occurs in natural language processing for instance, where words can be characterized according to different linguistic dimensions. The core of our contribution is to integrate early the different representations in the kernel, in a way that generates rich composite features defined across the various symbol dimensions.
ES2008-85
Regularization path for Ranking SVM
Karina Zapien, Thomas Gaertner, Gilles Gasso, Stéphane Canu
Abstract:
Ranking algorithms are often introduced with the aim of automatically personalising search results. However, most ranking algorithms developed in the machine learning community rely on a careful choice of some regularisation parameter. Building upon work on the regularisation path for kernel methods, we propose a parameter selection algorithm for ranking SVM. Empirical results on synthetic data are promising.
ES2008-91
An accelerated MDM algorithm for SVM training
Alvaro Barbero, Jorge López, José Dorronsoro
Abstract:
In this work we will propose an acceleration procedure for the Mitchell-Demyanov-Malozemov (MDM) algorithm (a fast geometric algorithm for SVM construction) that may yield quite large training savings. While decomposition algorithms such as SVMLight or SMO are usually the SVM methods of choice, we shall show that there is a relationship between SMO and MDM that suggests that, at least in their simplest implementations, they should have similar training speeds. Thus, and although we will not discuss it here, the proposed MDM acceleration might be used as a starting point to new ways of accelerating SMO.
ES2008-84
Approximation of Gaussian process regression models after training
Thorsten Suttorp, Christian Igel
Abstract:
The evaluation of a standard Gaussian process regression model takes time linear in the number of training data points. In this paper, the models are approximated in the feature space after training. It is empirically shown that the time required for evaluation can be drastically reduced without considerable loss in performance.
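To see why evaluation of a standard GP regression model is linear in the number of training points, note that the predictive mean at a test input x is k(x, X) @ alpha, where alpha = (K + sigma^2 I)^{-1} y can be precomputed once after training. The sketch below makes this explicit with an RBF kernel; it shows only the baseline cost that the paper's feature-space approximation is designed to avoid, and all kernel hyperparameters are arbitrary.

```python
import numpy as np

def rbf(A, B, length=1.0):
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * d2 / length**2)

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(400, 1))
y = np.sin(X[:, 0]) + 0.1 * rng.normal(size=400)

# One-off cost after training: solve an n x n system for alpha.
sigma2 = 0.01
alpha = np.linalg.solve(rbf(X, X) + sigma2 * np.eye(len(X)), y)

def gp_mean(x_star):
    """Predictive mean: one kernel evaluation per training point -> O(n) per query."""
    return rbf(x_star[None, :], X)[0] @ alpha

print(gp_mean(np.array([0.5])), np.sin(0.5))
```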
Machine Learning Approaches and Pattern Recognition for Spectral Data
ES2008-7
Machine learning approaches and pattern recognition for spectral data
Thomas Villmann, Erzsebet Merenyi, Udo Seiffert
ES2008-110
Consistency of Derivative Based Functional Classifiers on Sampled Data
Fabrice Rossi, Nathalie Villa
Abstract:
In some applications, especially spectrometric ones, curve classifiers achieve better performances if they work on the m-th order derivatives of their inputs. This paper proposes a smoothing-spline-based approach that gives a strong theoretical background to this common practice.
ES2008-54
Generalized matrix learning vector quantizer for the analysis of spectral data
Petra Schneider, Frank-Michael Schleif, Thomas Villmann, Michael Biehl
Abstract:
The analysis of spectral data constitutes new challenges for machine learning algorithms due to the functional nature of the data. Special attention is given to the metric used in such analyses. Recently, a prototype-based algorithm has been proposed which allows the integration of a full adaptive matrix in the metric. In this contribution we analyse this approach with respect to band matrices and its usage for the analysis of functional spectral data. The approach is tested on satellite data and on data taken from food chemistry.
ES2008-35
Linear Projection based on Noise Variance Estimation - Application to Spectral Data
Amaury Lendasse, Francesco Corona
Abstract:
In this paper, we propose a new methodology to build latent variables that are optimal if a nonlinear model is used afterwards. The method is based on Nonparametric Noise Estimation (NNE), which provides an estimate of the variance of the noise between input and output variables. The linear projection that builds the latent variables is optimized in order to minimize the NNE. We successfully tested the proposed methodology on a reference spectral dataset from the food industry (Tecator).
ES2008-55
Inverting hyperspectral images with Gaussian Regularized Sliced Inverse Regression
Caroline Bernard-Michel, Sylvain Douté, Laurent Gardes, Stephane Girard
Abstract:
In the context of hyperspectral image analysis in planetology, we show how to estimate the physical parameters that generate the spectral infrared signal reflected by Mars. The training approach we develop is based on the estimation of the functional relationship between parameters and spectra, using a database of synthetic spectra generated by a physical model. The high dimension of spectra is reduced by using Gaussian regularized inverse regression to overcome the curse of dimensionality. Compared with a basic k-nearest neighbors approach, estimates are more accurate and are thus promising.
Learning III
ES2008-2
Comparison of sparse least squares support vector regressors trained in primal and dual
Shigeo Abe
Abstract:
In our previous work, we developed sparse least squares support vector regressors (sparse LS SVRs) trained in the primal form in the reduced empirical feature space. In this paper we develop sparse LS SVRs trained in the dual form in the empirical feature space. Namely, the support vectors that span the reduced empirical feature space are first selected by Cholesky factorization, and the LS SVR is then trained in the dual form by solving a set of linear equations. We compare the computational cost of the LS SVRs in the primal and dual forms and clarify that, if the dimension of the reduced empirical feature space is almost equal to the number of training data, the dual form is faster. The primal form, however, is computationally more stable: for a large margin parameter, the coefficient matrix of the dual form becomes nearly singular. We verify these results by computer experiments using several benchmark data sets.
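The "set of linear equations" solved in the dual is, in the standard (non-sparse) LS-SVR formulation, the familiar LS-SVM system sketched below; the paper's contribution restricts this to a reduced empirical feature space selected by Cholesky factorization, which the sketch does not include. The kernel and regularization parameter are arbitrary.

```python
import numpy as np

def rbf(A, B, gamma=1.0):
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

def lssvr_fit(X, y, C=10.0, gamma=1.0):
    """Standard LS-SVR dual: solve one linear system for the bias b and alpha.
    System: [[0, 1^T], [1, K + I/C]] [b; alpha] = [0; y]."""
    n = len(X)
    K = rbf(X, X, gamma)
    A = np.zeros((n + 1, n + 1))
    A[0, 1:] = 1.0
    A[1:, 0] = 1.0
    A[1:, 1:] = K + np.eye(n) / C
    rhs = np.concatenate([[0.0], y])
    sol = np.linalg.solve(A, rhs)
    return sol[0], sol[1:]          # b, alpha

def lssvr_predict(Xq, X, b, alpha, gamma=1.0):
    return rbf(Xq, X, gamma) @ alpha + b

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(200, 1))
y = np.sinc(X[:, 0]) + 0.05 * rng.normal(size=200)
b, alpha = lssvr_fit(X, y)
print(np.mean((lssvr_predict(X, X, b, alpha) - y) ** 2))   # training MSE
```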
ES2008-58
On related violating pairs for working set selection in SMO algorithms
Tobias Glasmachers
Abstract:
Sequential Minimal Optimization (SMO) is currently the most popular algorithm to solve large quadratic programs for Support Vector Machine (SVM) training. For many variants of this iterative algorithm proofs of convergence to the optimum exist. Nevertheless, to find such proofs for elaborated SMO-type algorithms is challenging in general. We provide a basic tool for such convergence proofs in the context of cache-friendly working set selection. Finally this result is applied to notably simplify the convergence proof of the very efficient Hybrid Maximum Gain algorithm.
ES2008-89
Discrimination of regulatory DNA by SVM on the basis of over- and under-represented motifs
Rene te Boekhorst, Irina Abnizova, Lorenz Wernisch
Abstract:
In this paper we apply three pattern recognition methods (support vector machine, cluster analysis and principal component analysis) to distinguish regulatory regions from coding and non-coding, non-regulatory DNA sequences. Using a new feature representation (the degree to which motifs are over- and under-represented), we demonstrate the remarkable power of this methodology in identifying regulatory regions of Drosophila melanogaster.
ES2008-87
Automatic alignment of medical vs. general terminologies
Laura Diosan, Alexandrina Rogozan, Jean Pierre Pecuchet
Abstract:
We propose an original automatic alignment of definitions taken from different dictionaries that may be associated with the same concept although they have different labels. The alignment between a specialized terminology, used by librarians to index concepts, and a general vocabulary, employed by a neophyte user to retrieve documents on the Internet, will certainly improve the performance of the information retrieval process. The selected framework is a medical one. We propose a terminology alignment by an SVM classifier trained on a compact but relevant representation of each definition pair, built from several similarity measures and the length of the definitions. Three syntactic levels are investigated: Nouns, Nouns-Adjectives, and Nouns-Adjectives-Verbs. Our aim is to show how the combination of similarity measures offers better semantic access to the document content than a single measure and improves the performance of the automatic alignment. The results obtained on the test set show the relevance of our approach, as the F-measure reaches 88%. However, this conclusion should be validated on larger corpora.
ES2008-116
Improving a statistical language model by modulating the effects of context words
Zhang Yuecheng, Andriy Mnih, Geoffrey Hinton
Abstract:
We show how to improve a state-of-the-art neural network language model that converts the previous "context" words into feature vectors and combines these feature vectors to predict the feature vector of the next word. Significant improvements in predictive accuracy are achieved by using higher-level features to modulate the effects of the context words. This is more effective than using the higher-level features to directly predict the feature vector of the next word, but it is also possible to combine both methods.
ES2008-19
An emphasized target smoothing procedure to improve MLP classifiers performance
Soufiane El Jelali, Abdelouahid Lyhyaoui, Anibal R. Figueiras-Vidal
Abstract:
Standard learning procedures are better fitted to estimation than to classification problems, and focusing the training on appropriate samples provides performance advantages in classification tasks. In this paper, we combine these ideas creating smooth targets for classification by means of a convex combination of the original target and the output of an auxiliary classifier, the combination parameter being a function of the auxiliary classifier error. Experimental results with Multilayer Perceptron architectures support the usefulness of this approach.
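The core construction described above, a smoothed target formed as a convex combination of the original label and an auxiliary classifier's output, can be written in a few lines. The choice of lambda as a simple decreasing function of the auxiliary classifier's error, and the use of per-sample errors, are assumptions made for this sketch, not necessarily the emphasis function used by the authors.

```python
import numpy as np

def smoothed_targets(t, o_aux, lam_fn=lambda err: 1.0 - err):
    """Convex combination of the original targets t (in {-1, +1}) and the
    auxiliary classifier outputs o_aux (in [-1, +1]).
    lam_fn maps the per-sample auxiliary error in [0, 1] to the mixing weight;
    the exact form of this function is an assumption of the sketch."""
    err = np.abs(t - o_aux) / 2.0            # 0 when the auxiliary output matches t
    lam = np.clip(lam_fn(err), 0.0, 1.0)
    return lam * t + (1.0 - lam) * o_aux     # smooth targets used to train the MLP

t = np.array([1.0, -1.0, 1.0, -1.0])
o_aux = np.array([0.8, -0.2, -0.6, -0.9])    # auxiliary classifier outputs
print(smoothed_targets(t, o_aux))
```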
ES2008-102
A neural model with feedback for robust disambiguation of motion
Mauricio Cerda, Bernard Girau
Abstract:
The aperture problem is a direct consequence of any local detection in the visual perception of motion. It results in ambiguous responses of the local motion detectors. Biological systems, such as the brains of different mammals, are able to disambiguate motion detection. Such disambiguation is usually seen as a possible result of pyramidal feedforward processing with growing receptive fields, but this approach is not able to detect motion in a simultaneously unambiguous and precise way. In this work we define a neural model of motion disambiguation that achieves both criteria, mainly with the help of excitatory feedback connections. Our model differs from previous ones mostly in the nature of the competition, by incorporating lateral inhibition. Its main advantages are tolerance to noise and stability. We perform tests on synthetic image sequences that show the effectiveness of our approach.
ES2008-38
Multilayer perceptron to model the decarburization process in stainless steel production
Carlos Spínola, Carlos J. Gálvez-Fernández, Jose Muñoz-Perez, José Bonelo, Javier Ferrer, Julio Vizoso
Abstract:
Argon-Oxygen Decarburization (AOD) is the refining process by which stainless steel obtains its final chemical composition through several stages in which tons of materials are added and oxygen and inert gas are blown. The decarburization efficiency and the final temperature in each stage are two important values of this process. We present in this paper an empirical model, based on a Multilayer Perceptron, to predict these values in order to automate and enhance the production performance of the AOD process. Two architectures are proposed and compared.
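The following sketch illustrates the two kinds of architecture one might compare for such a model: a single MLP with two outputs versus two dedicated single-output MLPs. The input variables and data are random placeholders, not the plant measurements, and the hidden-layer sizes are arbitrary.

# Hypothetical sketch of two candidate architectures for the AOD model.
# In practice inputs and targets would be standardized before training.
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(1)
X = rng.normal(size=(500, 6))          # e.g. blown O2/inert gas, additions, initial temperature, ...
y = np.column_stack([
    rng.uniform(0.2, 0.9, 500),        # decarburization efficiency
    rng.uniform(1600.0, 1750.0, 500),  # final temperature (degrees C)
])

# Architecture 1: one network predicting both outputs jointly
joint = MLPRegressor(hidden_layer_sizes=(20,), max_iter=2000).fit(X, y)

# Architecture 2: one dedicated network per output
eff_net = MLPRegressor(hidden_layer_sizes=(20,), max_iter=2000).fit(X, y[:, 0])
temp_net = MLPRegressor(hidden_layer_sizes=(20,), max_iter=2000).fit(X, y[:, 1])

x_new = rng.normal(size=(1, 6))
print(joint.predict(x_new), eff_net.predict(x_new), temp_net.predict(x_new))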
ES2008-27
An automatic identifier of Confinement Regimes at JET combining Fuzzy Logic and Classification Trees
Guido Vagliasindi, Paolo Arena, Luigi Fortuna, Andrea Murari, Giuseppe Mazzitelli, Antonio Gallo, Umberto Vagliasindi
Abstract:
In modern thermonuclear fusion devices it is possible to distinguish distinct types of plasma confinement regimes, which differ in performance in terms of confinement time. Discriminating among them could represent a useful feature for the efficient control of a plasma experiment. An automatic identifier based on fuzzy logic is proposed here, together with an unsupervised technique based on classification and regression trees for selecting, among the several diagnostic signals available, the inputs to be provided to the identifier.
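As a rough illustration of the input-selection step, the sketch below ranks candidate signals with a CART-style classification tree and keeps the most important ones; the signal names, data, and importance threshold are invented placeholders, not JET diagnostics.

# Sketch of using a classification tree to rank candidate diagnostic signals
# before building the fuzzy identifier.
import numpy as np
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(2)
signals = ["signal_%d" % i for i in range(8)]        # candidate diagnostic signals
X = rng.normal(size=(1000, len(signals)))
y = (X[:, 2] + 0.5 * X[:, 5] > 0).astype(int)        # toy confinement-regime label

tree = DecisionTreeClassifier(max_depth=4, random_state=0).fit(X, y)
ranked = sorted(zip(signals, tree.feature_importances_), key=lambda p: -p[1])
selected = [name for name, imp in ranked if imp > 0.05]
print(selected)   # these signals would feed the fuzzy-logic regime identifier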
ES2008-61
Combining neural networks and optimization techniques for visuokinesthetic prediction and motor planning
Wolfram Schenck, Dennis Sinder, Ralf Möller
Abstract:
We present a method for motor planning based on visuokinesthetic prediction by a forward model (FM) and the optimization method ``differential evolution'' (DE) for a block-pushing task of a robot arm. The FM is implemented by a set of multi-layer perceptrons and used for the iterative prediction of future sensory states in an internal simulation process. DE is applied to determine via this internal simulation the movement sequences by which a target block can be successfully pushed from an arbitrary start to an arbitrary goal position. The presented method shows a good performance on the pushing task.
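A conceptual sketch of planning by internal simulation: differential evolution searches over an action sequence whose outcome, predicted by iterating a forward model, should bring the block to the goal. The forward model here is a trivial stand-in for the learned multi-layer perceptrons, and all dimensions and bounds are assumptions.

# Conceptual sketch: DE optimizes a pushing-action sequence scored by rolling
# out a (placeholder) forward model to the predicted end state.
import numpy as np
from scipy.optimize import differential_evolution

goal = np.array([0.8, 0.3])
start = np.array([0.1, 0.1])
T = 5                                    # number of pushing movements

def forward_model(state, action):
    """Placeholder for the learned FM: predicts the next block position."""
    return state + 0.2 * np.tanh(action)

def cost(flat_actions):
    """Iterate the FM over the action sequence and score the final state."""
    state = start.copy()
    for a in flat_actions.reshape(T, 2):
        state = forward_model(state, a)
    return np.linalg.norm(state - goal)

bounds = [(-1.0, 1.0)] * (T * 2)         # each action is a 2-D push direction
result = differential_evolution(cost, bounds, maxiter=50, seed=0)
print(result.x.reshape(T, 2), result.fun)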
ES2008-68
Petri nets design based on neural networks
Edouard Leclercq, Souleiman Ould el Medhi, Dimitri Lefebvre
Abstract:
Faulty Petri net models are useful for reliability analysis and fault diagnosis of discrete event systems. Such models are difficult to work out because they must be computed according to alarm propagation. This paper deals with Petri net model synthesis and identification based on neural network approaches, with regard to event propagation and state propagation datasets. A neural learning algorithm is proposed to build Petri net models; these models are suitable for the diagnosis of discrete event systems.
ES2008-92
Detecting zebra crossings utilizing AdaBoost
Ludwig Lausser, Friedhelm Schwenker, Günther Palm
Abstract:
The paper introduces a zebra crossing detector based on the Viola-Jones approach. Useful pre- and post-processing procedures for this task are also described.
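To show how such a cascade detector would typically be applied, the sketch below runs an OpenCV Viola-Jones cascade over an image; the cascade file "zebra_cascade.xml" is hypothetical (it would have to be trained on zebra-crossing examples), and the pre-/post-processing shown is generic, not the procedures described in the paper.

# Sketch of applying a Viola-Jones style cascade (trained with AdaBoost) to
# detect zebra crossings. "zebra_cascade.xml" and the input image are
# hypothetical placeholders.
import cv2

cascade = cv2.CascadeClassifier("zebra_cascade.xml")   # hypothetical trained cascade
frame = cv2.imread("street_scene.jpg")                  # placeholder input image
gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
gray = cv2.equalizeHist(gray)                           # simple generic preprocessing

# Multi-scale sliding-window detection, as in the original Viola-Jones scheme
boxes = cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=3)
for (x, y, w, h) in boxes:
    cv2.rectangle(frame, (x, y), (x + w, y + h), (0, 255, 0), 2)
cv2.imwrite("detections.jpg", frame)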