Bruges, Belgium, April 25-26-27
Content of the proceedings
Dynamic and complex systems
Prototype-based learning
Model selection and regularization
Fuzzy and Probabilistic Methods in Neural Networks and Machine Learning
Learning I
Convex Optimization for the Design of Learning Machines
Generative models and maximum likelihood approaches
Kernel methods and Support Vector Machines
Reinforcement Learning
Learning II
Biologically motivated learning
Learning causality
Reservoir Computing
Learning III
Dynamic and complex systems
ES2007-40
Synchronization and acceleration: complementary mechanisms of temporal coding
Thomas Burwick
Abstract:
Temporal coding is studied with an oscillatory network model that is a complex-valued generalization of the Cohen-Grossberg-Hopfield system. The model is considered with synchronization and acceleration, where acceleration refers to a mechanism that causes the units of the network to oscillate with higher phase velocity in the case of stronger and/or more coherent input. Applying Hebbian memory, we demonstrate that acceleration introduces the desynchronization that is needed to segment two overlapping patterns without using inhibitory couplings.
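As a loose illustration only (a hypothetical Kuramoto-style phase model with assumed names and parameters, not the paper's complex-valued Cohen-Grossberg-Hopfield system), the interplay of synchronization and acceleration might be sketched as:

```python
import numpy as np

def step(theta, W, dt=0.01, k=1.0, alpha=2.0):
    """One Euler step of a toy phase model: couplings W pull phases
    together (synchronization), while the alpha term raises the phase
    velocity of units receiving strong, coherent input (acceleration)."""
    z = W @ np.exp(1j * theta)                    # complex input to each unit
    sync = k * np.imag(z * np.exp(-1j * theta))   # phase attraction
    accel = alpha * np.abs(z)                     # coherent input speeds units up
    return theta + dt * (sync + accel)

# Two fully coupled units starting in phase remain in phase,
# but their common phase advances due to the acceleration term.
theta = step(np.zeros(2), np.full((2, 2), 0.5))
print(theta)
```

In this sketch, incoherent or weak input shrinks |z| and slows a unit down, which is the desynchronizing effect the abstract attributes to acceleration.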
ES2007-64
Pattern Recognition using Chaotic Transients
Wee Jin Goh, Nigel Crook
Abstract:
This paper proposes a novel nonlinear transient computation device, referred to as the LTCM, that uses the chaotic attractor of the Lorenz system of equations to perform pattern recognition. Previous work on nonlinear transient computation has demonstrated that such devices can process time-varying input signals. This paper investigates the ability of the LTCM to correctly classify static, linearly inseparable data sets commonly used as benchmarks in the pattern recognition research community. The results from the LTCM are compared with those from support vector machines and multi-layer perceptrons on the same data sets.
ES2007-102
Order in Complex Systems of Nonlinear Oscillators: Phase Locked Subspaces
Jan-Hendrik Schleimer, Ricardo Vigário
Abstract:
Any order parameter quantifying the degree of organisation in a physical system can be studied in connection with source extraction algorithms. Independent component analysis (ICA), which minimises the mutual information of the sources, falls into that line of thought, since it can be interpreted as searching for components with low complexity. Complexity pursuit, a modification minimising Kolmogorov complexity, is a further example. In this article a specific case of order in complex networks of self-sustained oscillators is discussed, with the objective of recovering the original synchronisation patterns between them. The approach is put in relation to ICA.
Prototype-based learning
ES2007-57
"Kernelized" Self-Organizing Maps for Structured Data
Fabio Aiolli, Giovanni Da San Martino, Alessandro Sperduti, Markus Hagenbuchner
Abstract:
The suitability of the well-known kernels for trees, and of the lesser-known Self-Organizing Map for Structures, for categorization tasks on structured data is investigated in this paper. It is shown that a suitable combination of the two approaches, obtained by defining new kernels on the activation map of a Self-Organizing Map for Structures, can result in a system that is significantly more accurate for categorization tasks on structured data. The effectiveness of the proposed approach is demonstrated experimentally on a relatively large corpus of XML-formatted data.
ES2007-138
Model collisions in the dissimilarity SOM
Fabrice Rossi
Abstract:
We investigate in this paper the problem of model collisions in the Dissimilarity Self-Organizing Map (SOM). This extension of the SOM to dissimilarity data suffers from constraints imposed on the model representation, which lead to strong map folding: several units share a common prototype. We propose an efficient way to address this problem via a branch-and-bound approach.
ES2007-78
Clustering a medieval social network by SOM using a kernel based distance measure
Nathalie Villa, Romain Boulet
Abstract:
In order to explore the social organization of a medieval peasant community before the Hundred Years' War, we propose the use of an adaptation of the well-known Kohonen Self Organizing Map to dissimilarity data. In this paper, the algorithm is used with a distance based on a kernel which allows the choice of a smoothing parameter to control the importance of local or global proximities.
ES2007-81
Relevance matrices in LVQ
Petra Schneider, Michael Biehl, Barbara Hammer
Abstract:
We propose a new matrix learning scheme to extend Generalized Relevance Learning Vector Quantization (GRLVQ). By introducing a full matrix of relevance factors into the distance measure, correlations between different features and their importance for the classification scheme can be taken into account. In comparison to the weighted Euclidean metric used in GRLVQ, this metric is more powerful in representing the internal structure of the data appropriately, while maintaining GRLVQ's excellent generalization ability as a large-margin optimizer. The algorithm is tested and compared to alternative LVQ schemes using an artificial dataset and the image segmentation data from the UCI repository.
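The core of such a matrix extension is the generalized distance d(x, w) = (x - w)^T Lambda (x - w) with Lambda = Omega^T Omega, which keeps the metric positive semi-definite. A minimal sketch (variable names and values are illustrative, not from the paper):

```python
import numpy as np

def matrix_distance(x, w, omega):
    """Generalized squared distance (x - w)^T Lambda (x - w),
    with Lambda = Omega^T Omega positive semi-definite by construction."""
    diff = x - w
    lam = omega.T @ omega
    return float(diff @ lam @ diff)

# A relevance matrix that weights feature 0 strongly and couples
# features 0 and 1 through an off-diagonal term.
omega = np.array([[1.0, 0.5],
                  [0.0, 0.2]])
d = matrix_distance(np.array([1.0, 2.0]), np.array([0.0, 0.0]), omega)
print(d)  # 4.16 (up to floating point)
```

With Omega diagonal this reduces to the weighted Euclidean metric of plain GRLVQ; the off-diagonal entries are what let the matrix variant capture feature correlations.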
ES2007-89
Tracking fast changing non-stationary distributions with a topologically adaptive neural network: application to video tracking
Georges Adrian Drumea, Hervé Frezza-Buet
Abstract:
In this paper, an original method named GNG-T, extended from the GNG-U algorithm by Fritzke, is presented. The method continuously performs vector quantization over a distribution that changes over time. It deals with both sudden and continuous changes, and is thus suited to a video tracking framework, where continuous tracking is required as well as fast adaptation to incoming and outgoing people. The central mechanism relies on the management of the quantization resolution, which copes with the stopping-condition problems of the usual Growing-Neural-Gas-inspired methods. An application to video tracking is briefly presented.
ES2007-110
Systematicity in sentence processing with a recursive self-organizing neural network
Igor Farkas, Matthew W. Crocker
Abstract:
As potential candidates for human cognition, connectionist models of sentence processing must learn to behave systematically by generalizing from a small training set. It was recently shown that Elman networks and, to a greater extent, echo state networks (ESN) possess a limited ability to generalize in artificial language learning tasks. We study this capacity for the recently introduced recursive self-organizing neural network model and show that its performance is comparable with that of ESNs.
Model selection and regularization
ES2007-22
Agglomerative Independent Variable Group Analysis
Antti Honkela, Jeremias Seppä, Esa Alhoniemi
Abstract:
Independent Variable Group Analysis (IVGA) is a principle for grouping dependent variables together while keeping mutually independent or weakly dependent variables in separate groups. In this paper an agglomerative method for learning a hierarchy of IVGA groupings is presented. The method resembles hierarchical clustering, but the distance measure is based on an approximation of mutual information between groups of variables. The approach also allows determining optimal cutoff points for the hierarchy. The method is demonstrated to find sensible groupings of variables that ease construction of a predictive model.
ES2007-75
Classifying n-back EEG data using entropy and mutual information features
Liang Wu, Predrag Neskovic, Etienne Reyes, Elena Festa, William Heindel
Abstract:
In this work we show that entropy (H) and mutual information (MI) can be used to extract spatially localized features for classification purposes. In order to increase the accuracy of entropy estimation, we use a Bayesian approach with a Dirichlet prior to derive the estimation equations. We calculate the H and MI features for each electrode (H) and pair of electrodes (MI) in three frequency bands and use them to train a Naive Bayes classifier. We test the H and MI features on one- and five-trial-long segments of n-back memory EEG signals and show that they outperform power spectrum and linear correlation features, respectively.
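As a rough sketch of the idea (a posterior-mean plug-in under a symmetric Dirichlet prior; the paper derives the exact estimation equations, and all names here are illustrative):

```python
import numpy as np

def dirichlet_entropy(counts, alpha=1.0):
    """Plug-in entropy (bits) of the posterior-mean distribution under a
    symmetric Dirichlet(alpha) prior: p_i = (n_i + alpha) / (N + K*alpha).
    The pseudo-counts regularize the estimate for sparse histograms."""
    counts = np.asarray(counts, dtype=float)
    k = counts.size
    p = (counts + alpha) / (counts.sum() + k * alpha)
    return float(-np.sum(p * np.log2(p)))

print(dirichlet_entropy([10, 10, 10, 10]))  # uniform histogram: 2.0 bits
```

The same smoothed probabilities can feed a mutual information estimate for electrode pairs, MI(A, B) = H(A) + H(B) - H(A, B), computed from the joint histogram.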
ES2007-62
Nearest Neighbor Distributions and Noise Variance Estimation
Elia Liitiäinen, Francesco Corona, Amaury Lendasse
Abstract:
In this paper, we address the problem of deriving bounds for the moments of nearest neighbor distributions. The bounds are formulated for the general case and specifically applied to the problem of noise variance estimation with the Delta and the Gamma test. For this problem, we focus on the rate of convergence and the bias of the estimators and validate the theoretical achievement with experimental results.
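For orientation, the Delta test mentioned above has a compact form: the noise variance is estimated as half the mean squared difference between each output and the output of its input-space nearest neighbour. A brute-force sketch (function names and the toy data are illustrative):

```python
import numpy as np

def delta_test(X, y):
    """Delta test: estimate Var(noise) as the average of
    0.5 * (y[nn(i)] - y[i])^2 over all points, where nn(i) is the
    nearest neighbour of X[i] in input space."""
    X, y = np.asarray(X, float), np.asarray(y, float)
    est = 0.0
    for i in range(len(X)):
        d = np.sum((X - X[i]) ** 2, axis=1)
        d[i] = np.inf                 # exclude the point itself
        j = int(np.argmin(d))         # first nearest neighbour
        est += 0.5 * (y[j] - y[i]) ** 2
    return est / len(X)

# Noisy samples of a smooth 1D function: the estimate approaches
# the true noise variance sigma^2 = 0.01 as the sample grows.
rng = np.random.default_rng(0)
X = rng.uniform(0.0, 1.0, size=(2000, 1))
y = np.sin(2 * np.pi * X[:, 0]) + rng.normal(0.0, 0.1, size=2000)
print(delta_test(X, y))
```

The bias analysed in the paper comes from the residual variation of the underlying function between a point and its neighbour, which shrinks as nearest-neighbour distances shrink.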
ES2007-79
Complexity bounds of radial basis functions and multi-objective learning
Illya Kokshenev, Antônio Braga
Abstract:
In this paper, the problem of multi-objective (MOBJ) learning is discussed. The problem of obtaining an apparent (effective) complexity measure, which is one of the objectives, is considered. For the specific case of RBF networks, bounds on the smoothness-based complexity measure are proposed. As shown in the experimental part, the bounds can be used for Pareto set approximation.
Fuzzy and Probabilistic Methods in Neural Networks and Machine Learning
ES2007-7
How to process uncertainty in machine learning?
Barbara Hammer, Thomas Villmann
ES2007-23
An Estimation of Response Certainty using Features of Eye-movements
Minoru Nakayama, Yosiyuki Takahasi
Abstract:
To examine the feasibility of estimating the degree of ``strength of belief (SOB)'' of viewers' responses using support vector machines (SVM) trained with gaze features, gazing features were analyzed while subjects reviewed their own responses to multiple-choice tasks. Subjects freely reported the certainty of their chosen answers, and these responses were then classified as high and low SOB. All gazing points of eye movements were classified into visual areas, or cells, which corresponded with the positions of the answers, so that training data consisting of the features and the SOB was produced. A discrimination model for SOB was trained with several combinations of features to see whether a significant level of performance could be obtained. As a result, a model trained with 3 features, consisting of the interval time, vertical difference and length between gazes, provides significant discrimination performance for SOB.
ES2007-115
Visualisation of tree-structured data through generative probabilistic modelling
Nikolaos Gianniotis, Peter Tino
Abstract:
We present a generative probabilistic model for the topographic mapping of tree-structured data. The model is formulated as a constrained mixture of hidden Markov tree models. A natural measure of likelihood arises as a cost function that guides the model fitting. We compare our approach with an existing neural-based methodology for constructing topographic maps of directed acyclic graphs. We argue that the probabilistic nature of our model brings several advantages, such as a principled interpretation of the visualisation plots.
ES2007-145
Visualization of Fuzzy Information in Fuzzy-Classification for Image Segmentation using MDS
Thomas Villmann, Marc Strickert, Cornelia Brüß, Frank-Michael Schleif, Udo Seiffert
Learning I
ES2007-99
SOM for intensity inhomogeneity correction in MRI
Maite García-Sebastián, Manuel Graña
Abstract:
Given an appropriate imaging resolution, a common Magnetic Resonance Imaging (MRI) model assumes that the object under study is composed of piecewise constant materials, so that MRI produces piecewise constant images. The intensity inhomogeneity (IIH) is modeled by a multiplicative inhomogeneity field. It is due to the spatial inhomogeneity of the excitatory Radio Frequency (RF) signal and other effects. It has been acknowledged as a greater source of error for automatic segmentation algorithms than additive noise. We propose a new non-parametric IIH correction algorithm in which the Self-Organizing Map (SOM) is used to estimate the IIH field.
ES2007-83
SOM+EOF for finding missing values
Antti Sorjamaa, Paul Merlin, Bertrand Maillet, Amaury Lendasse
Abstract:
In this paper, a new method for the determination of missing values in temporal databases is presented. This new method is based on two projection methods: a nonlinear one (Self-Organizing Maps) and a linear one (Empirical Orthogonal Functions). The global methodology combines the advantages of both methods to obtain accurate candidates for the missing values. An application to the determination of missing values in a fund return database is presented.
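The linear (EOF) half of such a scheme can be sketched as an iterative truncated-SVD imputation; this is a generic EOF-style fill-in under assumed parameters, not the authors' exact SOM+EOF combination:

```python
import numpy as np

def eof_impute(X, mask, rank=1, n_iter=50):
    """Fill entries where mask is True: initialize the gaps with column
    means, then alternate between a rank-`rank` SVD reconstruction and
    copying the reconstruction back into the gaps."""
    X = X.astype(float).copy()
    col_means = np.nanmean(np.where(mask, np.nan, X), axis=0)
    X[mask] = np.take(col_means, np.where(mask)[1])
    for _ in range(n_iter):
        U, s, Vt = np.linalg.svd(X, full_matrices=False)
        recon = (U[:, :rank] * s[:rank]) @ Vt[:rank]
        X[mask] = recon[mask]
    return X

# Demo: recover a deleted entry of an exactly rank-1 matrix.
rng = np.random.default_rng(1)
M = rng.normal(size=(20, 1)) @ rng.normal(size=(1, 8))
mask = np.zeros_like(M, dtype=bool)
mask[3, 4] = True
filled = eof_impute(np.where(mask, 0.0, M), mask, rank=1)
print(filled[3, 4], M[3, 4])
```

In a combined pipeline, candidates from a nonlinear model such as a SOM could serve as the initial fill before the EOF iterations refine them.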
ES2007-124
Self-organized chains for clustering
Hassan Ghaziri
Abstract:
This paper presents a new algorithm for clustering, a generalisation of the K-means algorithm. Each cluster is represented by a chain of prototypes instead of a single prototype as in K-means. The chains compete to represent clusters and evolve according to the Kohonen map adaptation rule. It is well known that K-means performs very well on hyper-spherical data and has difficulties dealing with irregular data. We show on artificial data that the new algorithm performs very well for different types of data sets. In addition, it is robust with respect to initial conditions.
ES2007-59
On the dynamics of Vector Quantization and Neural Gas
Aree Witoelar, Michael Biehl, Anarta Ghosh, Barbara Hammer
Abstract:
A large variety of machine learning models which aim at vector quantization have been proposed. However, only very preliminary rigorous mathematical analysis concerning their learning behavior, such as convergence speed and robustness with respect to initialization, exists. In this paper, we use the theory of on-line learning for an exact mathematical description of the training dynamics of vector quantization mechanisms in model situations. We study update rules including the basic Winner-Takes-All mechanism and the rank-based update of the popular Neural Gas network. We investigate a model with three competing prototypes trained from a mixture of Gaussian clusters and compare performances in terms of dynamics, sensitivity to initial conditions and asymptotic results. We demonstrate that rank-based Neural Gas achieves both robustness to initial conditions and the best asymptotic quantization error.
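The two update rules under comparison can be sketched in a few lines (learning rate and neighbourhood range are illustrative choices):

```python
import numpy as np

def wta_update(w, x, eta=0.05):
    """Winner-Takes-All: only the prototype closest to x moves."""
    j = int(np.argmin(np.sum((w - x) ** 2, axis=1)))
    w[j] += eta * (x - w[j])
    return w

def neural_gas_update(w, x, eta=0.05, lam=1.0):
    """Neural Gas: every prototype moves, weighted by exp(-rank/lam),
    where rank 0 is the winner."""
    d = np.sum((w - x) ** 2, axis=1)
    ranks = np.argsort(np.argsort(d))
    h = np.exp(-ranks / lam)
    w += eta * h[:, None] * (x - w)
    return w

# Three prototypes, one sample: WTA moves only the winner, while
# Neural Gas also drags the runners-up, which is what reduces its
# sensitivity to a poor initialization.
w0 = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
x = np.array([0.2, 0.0])
print(wta_update(w0.copy(), x)[0])
print(neural_gas_update(w0.copy(), x)[1])
```

Annealing lam toward zero makes the Neural Gas rule converge to the WTA rule, which is why the rank-based scheme can combine robust early training with sharp final quantization.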
ES2007-33
Three-dimensional self-organizing dynamical systems for discrete structures memorizing and retrieval
Alexander Yudashkin
Abstract:
A synthesis concept is considered for dynamical systems with memory of multiple states defined using quaternion algebra. The system memorizes numerous configurations consisting of separate nodes and retrieves any of them from a non-stationary distorted state. Each stored configuration corresponds to a particular attractor of the dynamical system, defined by a set of nonlinear ordinary differential equations in the hypercomplex domain. The model demonstrates intelligent assembly of sample structures from an initial desired configuration, which is shown numerically in the paper. Such models can be used in robotics, complex information systems and pattern recognition tasks.
ES2007-76
Clustering using genetic algorithm combining validation criteria
Murilo Naldi, André Carvalho
Abstract:
Clustering techniques have been a valuable tool for several data analysis applications. However, one of the main difficulties associated with clustering is the validation of the results obtained. Both clustering algorithms and validation criteria present an inductive bias, which can favor datasets with particular characteristics. Besides, different runs of the same algorithm on the same data set may produce different clusters. In this work, traditional clustering and validation techniques are combined with Genetic Algorithms (GAs) to build clusters that better approximate the real distribution of the dataset. The GA employs a fitness function that combines two validation criteria. This combination allows the GA to improve the evaluation of the candidate solutions and avoids the individual weaknesses of each criterion. A set of experiments is run to compare the proposed model with other clustering algorithms, with promising results.
ES2007-112
Toward a robust 2D spatio-temporal self-organization
Thomas Girod, Laurent Bougrain, Frédéric Alexandre
Abstract:
Several models have been proposed for spatio-temporal self-organization, among which the TOM model by Wiemer is particularly promising. In this paper, we propose to adapt and extend this model to 2D maps to make it more generic and biologically plausible and more adapted to realistic applications, illustrated here by an application to speech analysis.
ES2007-58
Adaptive Weight Change Mechanism for Kohonen's Neural Network Implemented in CMOS 0.18 um Technology
Tomasz Talaska, Rafal Dlugosz, Witold Pedrycz
Abstract:
In this paper, we present a block implementing an adaptive weight change (AWC) mechanism for an analog current-mode Kohonen's Neural Network (KNN) implemented in CMOS 0.18 um technology. Since the other essential building blocks of KNNs, dealing with the calculation of the Euclidean distance, the formation of a conscience mechanism and the winner-takes-all (WTA) circuits, have already been developed, the AWC forms another essential step towards the realization of the network. We show that the proposed network works with small values of analog signals, thus resulting in low power dissipation and chip area when compared with digital realizations of KNNs. Each neuron occupies a chip area of about 1000 um2 and dissipates 20 uW of power at a 20 MHz input data rate.
ES2007-146
Feature clustering and mutual information for the selection of variables in spectral data
Catherine Krier, Damien Francois, Fabrice Rossi, Michel Verleysen
Abstract:
Spectral data often have a large number of highly correlated features, making feature selection both necessary and difficult. A methodology combining hierarchical constrained clustering of spectral variables with selection of clusters by mutual information is proposed. The clustering reduces the number of features to be selected by grouping similar and consecutive spectral variables together, which also eases interpretation. The approach is applied to two spectroscopy datasets from the food industry.
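A minimal sketch of the two-stage idea described in this abstract, under assumptions of my own (correlation as the similarity measure, a plug-in histogram estimate of mutual information, synthetic data); the paper's actual clustering criterion and MI estimator may differ:

```python
import numpy as np

def cluster_consecutive(X, n_clusters):
    """Greedy constrained clustering: repeatedly merge the pair of
    *adjacent* variable groups whose mean profiles are most correlated,
    so every cluster stays a contiguous band of the spectrum."""
    groups = [[j] for j in range(X.shape[1])]
    while len(groups) > n_clusters:
        means = [X[:, g].mean(axis=1) for g in groups]
        corrs = [abs(np.corrcoef(means[i], means[i + 1])[0, 1])
                 for i in range(len(groups) - 1)]
        i = int(np.argmax(corrs))          # most similar adjacent pair
        groups[i:i + 2] = [groups[i] + groups[i + 1]]
    return groups

def mutual_information(x, y, bins=8):
    """Plug-in histogram estimate of I(X;Y) in nats."""
    pxy, _, _ = np.histogram2d(x, y, bins=bins)
    pxy /= pxy.sum()
    px, py = pxy.sum(axis=1), pxy.sum(axis=0)
    nz = pxy > 0
    return float((pxy[nz] * np.log(pxy[nz] / np.outer(px, py)[nz])).sum())

rng = np.random.default_rng(0)
t = rng.normal(size=200)                   # target variable
# 6 "spectral variables": a contiguous informative band (0-2) plus noise
X = np.column_stack([t + 0.1 * rng.normal(size=200) for _ in range(3)] +
                    [rng.normal(size=200) for _ in range(3)])
clusters = cluster_consecutive(X, n_clusters=2)
scores = [mutual_information(X[:, g].mean(axis=1), t) for g in clusters]
best = clusters[int(np.argmax(scores))]
print(best)                                # the informative band is selected
```

Selecting whole contiguous clusters instead of individual variables is what keeps the result interpretable as spectral bands.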
ES2007-67
Prediction of post-synaptic activity in proteins using recursive feature elimination
Bernardo Carvalho, Ricardo Ribeiro, Talles Medeiros
Abstract:
This work presents a new approach to predicting post-synaptic activities in proteins. It uses a feature selection technique called Recursive Feature Elimination to select only the relevant features from the complete database. Once the reduced subset is found, a Least Squares Support Vector Machine, an SVM-based classifier, is used to predict the classes. The experiments were performed on a database harvested from Swiss-Prot/UniProt, a public-domain database with a rich source of information on a very large number of proteins. The results show that the proposed approach leads to a reduced representation of the database, using only 6% of the original information, and improves classification compared with two prediction techniques applied to the complete database, a Decision Tree and a Least Squares Support Vector Machine.
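The elimination loop behind Recursive Feature Elimination can be sketched as follows; this is an illustrative stand-in using a regularized linear least-squares classifier (in the spirit of a linear LS-SVM, not the authors' implementation) on synthetic data:

```python
import numpy as np

def train_linear(X, y, lam=1e-2):
    """Regularized least-squares fit to +/-1 labels: a simple linear
    classifier standing in for the LS-SVM used in the paper."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)

def rfe(X, y, n_keep):
    """Recursive Feature Elimination: refit the linear model and drop
    the feature with the smallest absolute weight until n_keep remain."""
    active = list(range(X.shape[1]))
    while len(active) > n_keep:
        w = train_linear(X[:, active], y)
        active.pop(int(np.argmin(np.abs(w))))
    return active

rng = np.random.default_rng(1)
n = 300
y = rng.choice([-1.0, 1.0], size=n)
# 10 features: only features 0 and 1 carry class information
X = rng.normal(size=(n, 10))
X[:, 0] += 1.5 * y
X[:, 1] -= 1.0 * y
kept = rfe(X, y, n_keep=2)
print(sorted(kept))
```

Refitting after every removal is what distinguishes RFE from one-shot weight ranking: weights are re-estimated on the surviving features at each step.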
ES2007-18
A new feature selection scheme using data distribution factor for transactional data
Piyang Wang, Tommy W. S. Chow
Abstract:
A new efficient unsupervised feature selection method is proposed to handle transactional data. The method introduces a new Data Distribution Factor (DDF) to select appropriate clusters. It combines compactness and separation with the newly introduced concept of a singleton item. The method is computationally inexpensive and able to deliver very promising results. Four datasets from the UCI machine learning repository are used in this study. The results show that the proposed method is efficient and delivers reliable results.
ES2007-55
Informational cost in correlation-based neuronal networks
Gaetano Liborio Aiello, Carlo Casarino
Abstract:
The cost of maintaining a given level of activity in a neuronal network depends on its size and degree of connectivity. Should a neural function require large, fully connected networks, the cost can easily exceed metabolic resources, especially for high-level neural functions. We show that, even in this case, the cost can still match the energetic resources provided the function is broken down into a set of subfunctions, each assigned to a highly connected, small-size module, all together forming a correlation-based network. Cell assemblies are the best examples of such networks.
ES2007-13
Controlling complexity of RBF networks by similarity
Ulrich Rückert, Ralf Eickhoff
Abstract:
Using radial basis function networks for function approximation tasks suffers from a lack of knowledge about an adequate network size. In this work, a measuring technique is proposed which can control the model complexity and is based on the correlation coefficient between two basis functions. Simulation results show good performance; this technique can therefore be integrated into the RBF training procedure.
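The correlation-based pruning idea in this abstract can be illustrated with a small sketch (my own greedy pruning rule and threshold, assumed for illustration, not the paper's exact procedure):

```python
import numpy as np

def rbf_activations(X, centers, width):
    """Gaussian basis function outputs for every sample/center pair."""
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-d2 / (2.0 * width ** 2))

def prune_by_correlation(X, centers, width, threshold=0.95):
    """Keep a basis function only if its activations on the data are
    not almost perfectly correlated with an already kept one: such a
    pair is redundant, so the network can shrink without losing much."""
    Phi = rbf_activations(X, centers, width)
    C = np.corrcoef(Phi.T)                  # unit-vs-unit correlations
    keep = []
    for j in range(len(centers)):
        if all(abs(C[j, k]) < threshold for k in keep):
            keep.append(j)
    return keep

rng = np.random.default_rng(2)
X = rng.uniform(-2, 2, size=(400, 1))
# two almost coincident centers plus one distant center
centers = np.array([[0.0], [0.05], [1.5]])
keep = prune_by_correlation(X, centers, width=0.5)
print(keep)                                 # one of the twin units is dropped
```

Because the criterion is computed on the training inputs, it measures redundancy where it matters: on the data the network actually sees.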
ES2007-37
Adaptive Global Metamodeling with Neural Networks
Dirk Gorissen, Wouter Hendrickx, Tom Dhaene
Abstract:
Due to the scale and computational complexity of current simulation codes, metamodels (or surrogate models) have become indispensable tools for exploring and understanding the design space. Consequently, there is great interest in techniques that facilitate the construction and evaluation of such approximation models while minimizing the computational cost and maximizing metamodel accuracy. This paper presents a novel, adaptive, integrated approach to global metamodeling with neural networks based on the Multivariate Metamodeling Toolbox. An adaptive, evolutionary-inspired modeling algorithm is presented and its performance is compared with rational metamodeling on a number of test problems.
Convex Optimization for the Design of Learning Machines
ES2007-5
Convex optimization for the design of learning machines
Kristiaan Pelckmans, Johan Suykens, Bart De Moor
ES2007-72
Deploying SDP for machine learning
Tijl De Bie
Abstract:
We discuss the use in machine learning of a general type of convex optimisation problem known as semi-definite programming (SDP). We argue that SDPs arise quite naturally in a variety of situations, accounting for their omnipresence in modern machine learning approaches, and we provide examples in support.
ES2007-73
A metamorphosis of Canonical Correlation Analysis into multivariate maximum margin learning
Sandor Szedmak, Tijl De Bie, David R. Hardoon
Abstract:
Canonical Correlation Analysis (CCA) is a useful tool to discover relationships between different sources of information represented by vectors. The solution of the underlying optimization problem involves a generalized eigenproblem and is nonconvex. We show a sequence of transformations which turn CCA into a convex maximum margin problem. The new formulation can be applied to the same class of problems at a significantly lower computational cost and with better numerical stability.
ES2007-38
Model Selection for Kernel Probit Regression
Gavin Cawley
Abstract:
The convex optimisation problem involved in fitting a kernel probit regression (KPR) model can be solved efficiently via an iteratively re-weighted least-squares (IRWLS) approach. The use of successive quadratic approximations of the true objective function suggests an efficient approximate form of leave-one-out cross-validation for KPR, based on an existing exact algorithm for the weighted least-squares support vector machine. This forms the basis for an efficient gradient descent model selection procedure used to tune the values of the regularisation and kernel parameters. Experimental results are given demonstrating the utility of this approach.
ES2007-29
Interval discriminant analysis using support vector machines
Cecilio Angulo, Davide Anguita, Luis González
Abstract:
Imprecision, incompleteness, prior knowledge or improved learning speed can motivate interval-represented data. Most approaches for SVM learning of interval data use local kernels based on interval distances. We present here a novel approach, suitable for linear SVMs, which deals with interval data without resorting to interval distances. The experimental results confirm the validity of our proposal.
Generative models and maximum likelihood approaches
ES2007-126
Mixtures of robust probabilistic principal component analyzers
Cédric Archambeau, Nicolas Delannay, Michel Verleysen
Abstract:
Discovering low-dimensional (nonlinear) manifolds is an important open problem in Machine Learning. In many applications, the data live in a high-dimensional space. This can lead to serious problems in practice due to the curse of dimensionality. Fortunately, the core of the data often lies on one or several low-dimensional manifolds. A way to handle these is to pre-process the data by nonlinear data projection techniques (see for example Tenenbaum et al., 2000). An alternative approach is to combine local linear models. In particular, mixtures of probabilistic principal component analyzers (Tipping and Bishop, 1999) are very attractive, as each component is specifically designed to extract the local principal orientations in the data. However, an important issue is the model's sensitivity to data lying off the manifold, possibly leading to mismatches between successive local models. The mixtures of robust probabilistic principal component analyzers introduced in this paper heal this problem, as each component is able to cope with atypical data while identifying the local principal directions. Interestingly, the standard mixture of Gaussians is a particular instance of this more general model.
ES2007-53
Learning topology of a labeled data set with the supervised generative gaussian graph
Pierre Gaillard, Michaël Aupetit, Gérard Govaert
Abstract:
Discovering the topology of a set of labeled data in a Euclidean space can help to design better decision systems. In this work, we propose a supervised generative model based on the Delaunay Graph of prototypes representing the labeled data, in order to extract the topology of the classes.
ES2007-91
Markovian blind separation of non-stationary temporally correlated sources
Rima Guidara, Shahram Hosseini, Yannick Deville
Abstract:
In a previous work, we developed a quasi-efficient maximum likelihood approach for blindly separating stationary, temporally correlated sources modeled by Markov processes. In this paper, we propose to extend this idea to separate mixtures of non-stationary sources. To handle non-stationarity, two methods, based respectively on blocking and kernel smoothing, are used to find parametric estimates of the score functions of the sources, required for implementing the maximum likelihood approach. The proposed methods thus exploit non-Gaussianity, non-stationarity and time correlation simultaneously, in a quasi-efficient manner. Experimental results on artificial and real data clearly show the better performance of the proposed methods with respect to classical source separation methods.
ES2007-111
Collaborative Filtering with interlaced Generalized Linear Models
Nicolas Delannay, Michel Verleysen
Abstract:
Collaborative Filtering (CF) aims at finding patterns in a sparse contingency matrix. It can be used, for example, to mine the ratings given by users on a set of items. In this paper, we introduce a new model for CF based on the Generalized Linear Models formalism. Interestingly, it shares specificities of both the model-based and the factorization approaches. The model is simple, and yet it performs very well on the popular MovieLens and Jester datasets.
Kernel methods and Support Vector Machines
ES2007-105
Computing and stopping the solution paths for $\nu$-SVR
Gilles Gasso, Karina Zapien, Stéphane Canu
Abstract:
The paper describes the computation of the full solution paths of the well-known $\nu$-SVR. In the classical method, the user provides two parameters: the regularization parameter $\lambda$ and $\nu$, which sets the width of the tube of the $\epsilon$-insensitive cost optimized by the SVR. The paper proposes an efficient way to obtain all the solutions by varying $\nu$ and $\lambda$. It also analyses stopping the algorithm using a leave-one-out criterion.
ES2007-30
Optimizing kernel parameters by second-order methods
Shigeo Abe
Abstract:
Radial basis function (RBF) kernels are widely used for support vector machines (SVMs). But for model selection of an SVM, we need to optimize the kernel parameter and the margin parameter by time-consuming cross-validation. In this paper we propose determining the parameters of RBF and Mahalanobis kernels by maximizing class separability with a second-order optimization. For multi-class problems, we determine the kernel parameters for all the two-class problems and assign the average parameter value to all the kernel parameters. Then we determine the margin parameter by cross-validation. Computer experiments on multi-class problems show that the proposed method selects optimal or near-optimal parameters.
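The class-separability criterion can be made concrete with a toy sketch; note the assumptions here are mine (a crude kernel-space separability score and a grid search in place of the paper's second-order optimization):

```python
import numpy as np

def rbf_kernel(X, gamma):
    """RBF kernel matrix K(x_i, x_j) = exp(-gamma * ||x_i - x_j||^2)."""
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

def separability(K, y):
    """Mean within-class minus mean between-class kernel value:
    a crude class-separability score in feature space."""
    same = y[:, None] == y[None, :]
    return K[same].mean() - K[~same].mean()

rng = np.random.default_rng(4)
X = np.vstack([rng.normal(0, 1, (50, 2)), rng.normal(3, 1, (50, 2))])
y = np.array([0] * 50 + [1] * 50)
gammas = [10 ** k for k in range(-4, 3)]
scores = [separability(rbf_kernel(X, g), y) for g in gammas]
best_gamma = gammas[int(np.argmax(scores))]
print(best_gamma)
```

The appeal of such criteria is that no SVM training is required while scanning the kernel parameter; only the margin parameter is then left for cross-validation.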
ES2007-32
A novel kernel-based method for local pattern extraction in random process signals
Majid Beigi, Andreas Zell
Abstract:
We consider a class of random process signals which contain randomly positioned local similarities representing the texture of an object. Such repetitive parts may occur in speech, musical pieces and sonar signals. We suggest a warped time-resolved spectrum kernel for extracting subsequence similarity in time series in general, and in biosonar signals as an example. Having a set of such kernels for similarity extraction over subsequences of different sizes, we propose a new method to find an optimal linear combination and selection of those kernels. We formulate the optimal kernel selection via maximizing the Kernel Fisher Discriminant criterion (KFD) and use the Mesh Adaptive Direct Search (MADS) method to solve the optimization problem. Our method is used for biosonar landmark classification with promising results.
ES2007-113
One-class SVM regularization path and comparison with alpha seeding
Alain Rakotomamonjy, Manuel Davy
Abstract:
One-class support vector machines (1-SVMs) estimate the level set of the underlying density of observed data. Aside from the kernel selection issue, one difficulty concerns the choice of the 'level' parameter. In this paper, following the work of Hastie et al. (2004), we derive the entire regularization path for $\nu$-1-SVMs. Since this regularization path is efficient for building different level set estimates, we empirically compare this approach to the state-of-the-art approach based on alpha seeding, and we show that the regularization path is far more efficient.
Reinforcement Learning
ES2007-4
Reinforcement learning in a nutshell
Verena Heidrich-Meisner, Martin Lauer, Christian Igel, Martin Riedmiller
ES2007-93
A unified view of TD algorithms, introducing Full-gradient TD and Equi-gradient descent TD
Manuel Loth, Philippe Preux, Manuel Davy
Abstract:
This paper addresses policy evaluation in MDPs. It provides a unified view of algorithms such as TD(lambda), LSTD(lambda), iLSTD, and residual-gradient TD. We assert that they all consist of minimizing a gradient function and differ in the form of this function and their means of minimizing it. Building on this unified view, two new schemes are introduced: Full-gradient TD, which uses a generalization of the principle introduced in iLSTD, and EGD TD, which reduces the gradient by successive equi-gradient descents (a generalisation of the LARS algorithm). These algorithms share the worthy property of using samples much more efficiently than TD, while keeping the good properties of gradient descent schemes.
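For readers outside RL, the baseline all these methods build on is plain TD(0) policy evaluation, whose update is a semi-gradient step on the TD error. A minimal tabular sketch on a toy chain (my own example, not the paper's Full-gradient or EGD variants):

```python
import numpy as np

# Tabular TD(0) policy evaluation on a deterministic 3-state cycle:
# each update moves V along the (semi-)gradient of the TD error,
# the quantity around which the paper's unified view is built.
rng = np.random.default_rng(3)
P = np.array([[0.0, 1.0, 0.0],      # fixed-policy transition matrix
              [0.0, 0.0, 1.0],
              [1.0, 0.0, 0.0]])
R = np.array([1.0, 0.0, 0.0])       # reward received on leaving each state
gamma = 0.9
V = np.zeros(3)
alpha = 0.05
s = 0
for _ in range(20000):
    s2 = rng.choice(3, p=P[s])
    td_error = R[s] + gamma * V[s2] - V[s]
    V[s] += alpha * td_error        # semi-gradient step on the TD error
    s = s2

# closed-form solution V = (I - gamma P)^{-1} R for comparison
V_exact = np.linalg.solve(np.eye(3) - gamma * P, R)
print(V, V_exact)
```

TD's sample inefficiency, visible here in the many sweeps needed for convergence, is precisely what the least-squares and gradient-reduction variants discussed in the paper aim to improve.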
ES2007-125
Applying the Episodic Natural Actor-Critic Architecture to Motor Primitive Learning
Jan Peters, Stefan Schaal
Abstract:
In this paper, we investigate motor primitive learning with the Natural Actor-Critic approach. The Natural Actor-Critic consists of actor updates achieved using natural stochastic policy gradients, while the critic obtains the natural policy gradient by linear regression. We show that this architecture can be used to learn the "building blocks of movement generation", called motor primitives. Motor primitives are parameterized control policies such as splines or nonlinear differential equations with desired attractor properties. We show that our most recent algorithm, the Episodic Natural Actor-Critic, outperforms previous algorithms by at least an order of magnitude. We demonstrate the efficiency of this reinforcement learning method by learning to hit a baseball with an anthropomorphic robot arm.
ES2007-24
Neural Rewards Regression for near-optimal policy identification in Markovian and partial observable environments
Daniel Schneegass, Steffen Udluft, Thomas Martinetz
Abstract:
Neural Rewards Regression (NRR) is a generalisation of Temporal Difference Learning (TD-Learning) and Approximate Q-Iteration with neural networks. The method allows trading off between these two techniques, as well as between approaching the fixed point of the Bellman iteration and minimising the Bellman residual. NRR explicitly finds the optimal Q-function using no algorithmic framework beyond back-propagation for neural networks. We further extend the approach with a recurrent substructure, yielding Recurrent Neural Rewards Regression for partially observable environments or higher-order Markov Decision Processes. It carries past information to the present and the future in order to reconstruct the Markov property.
ES2007-35
Immediate Reward Reinforcement Learning for Projective Kernel Methods
Colin Fyfe, Pei Ling Lai
Abstract:
We extend a reinforcement learning algorithm which has previously been shown to cluster data. We have previously applied the method to unsupervised projection methods, principal component analysis, exploratory projection pursuit and canonical correlation analysis. We now show how the same methods can be used in feature spaces to perform kernel principal component analysis and kernel canonical correlation analysis.
ES2007-49
Replacing eligibility trace for action-value learning with function approximation
Kary Främling
Abstract:
The eligibility trace is one of the most widely used mechanisms for speeding up reinforcement learning. Replacing eligibility traces generally seem to perform better than accumulating eligibility traces. However, replacing traces are currently not applicable when using function approximation methods where states are not represented uniquely by binary values. This paper proposes two modifications to replacing traces that overcome this limitation. Experimental results from the Mountain-Car task indicate that the new replacing traces outperform both the accumulating and the 'ordinary' replacing traces.
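The two baseline trace types being contrasted are easy to state in code. This sketch shows only the classical binary-feature setting; the paper's modifications for non-binary function approximation are not reproduced here, and the decay parameters are illustrative.

```python
import numpy as np

# Accumulating vs. replacing eligibility traces for binary features
# (the classical setting the paper starts from).

def trace_update(e, phi, gamma, lam, kind):
    """One trace update; phi is the binary feature vector of the visited state."""
    e = gamma * lam * e                  # decay all traces
    if kind == "accumulating":
        return e + phi                   # traces add up on revisits
    return np.where(phi > 0, 1.0, e)     # replacing: active features reset to 1

phi = np.array([1.0, 0.0, 1.0])          # same two features active every step
e_acc = np.zeros(3)
e_rep = np.zeros(3)
for _ in range(3):
    e_acc = trace_update(e_acc, phi, 0.9, 0.8, "accumulating")
    e_rep = trace_update(e_rep, phi, 0.9, 0.8, "replacing")

print(e_acc)  # active traces grow beyond 1 on repeated visits
print(e_rep)  # active traces are capped at 1
```

The divergence of accumulating traces on frequently revisited features is exactly why replacing traces tend to behave better, and why their restriction to binary features matters.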
ES2007-54
The Recurrent Control Neural Network
Anton Maximilian Schaefer, Steffen Udluft, Hans-Georg Zimmermann
Abstract:
This paper presents our Recurrent Control Neural Network (RCNN), a model-based approach for data-efficient modelling and control of reinforcement learning problems in discrete time. Its architecture is based on a recurrent neural network (RNN), which is extended by an additional control network whose particular task is to learn the optimal policy. Using neural networks has the advantage that we can easily deal with high-dimensional or continuous state and action spaces, and can profit from their high system-identification and approximation quality. We show that our RCNN is able to learn a potentially optimal policy by testing it on two different settings of the mountain car problem.
Learning II
ES2007-3
The Intrinsic Recurrent Support Vector Machine
Daniel Schneegass, Anton Maximilian Schaefer, Thomas Martinetz
Abstract:
In this work, we present a new model for a Recurrent Support Vector Machine. We call it intrinsic because the complete recurrence is directly incorporated within the considered optimisation problem. This approach offers the advantage that the model straightforwardly leads to an algorithmic solution. We test the algorithm on several simple time series. The results are promising and can be seen as a starting point for further research. With better and more efficient methods and algorithms, we expect that Recurrent Support Vector Machines could become an alternative for handling and simulating dynamical systems.
ES2007-21
A-LSSVM: an Adaline based iterative sparse LS-SVM classifier
Bernardo Carvalho, Antônio Braga
Abstract:
LS-SVM aims at solving the learning problem with a system of linear equations. Although this solution is simpler, there is a loss of sparseness in the feature vectors. We present in this work a new method, A-LSSVM, which uses the Adaline neural model to solve the LS-SVM linear system while automatically finding the support vectors. The proposed approach is compared with other methods in the literature for imposing sparseness in LS-SVM: Pruning, LS2-SVM, Ada-Pinv and IP-LSSVM. The experiments, performed on three important benchmark databases in machine learning, show that all sparse LS-SVMs have an accuracy near that of SVM, but differ in training time and in the support vectors found.
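The linear system at the heart of the LS-SVM classifier can be written down explicitly. The sketch below solves the standard (Suykens-form) system exactly with numpy on hypothetical toy data; A-LSSVM's contribution, solving this system iteratively with an Adaline while selecting support vectors, is not reproduced.

```python
import numpy as np

# Direct solution of the LS-SVM classifier's linear system
#   [ 0   y^T         ] [ b     ]   [ 0 ]
#   [ y   Omega + I/g ] [ alpha ] = [ 1 ]
# with Omega_ij = y_i y_j K(x_i, x_j). All data and hyperparameters
# here are illustrative.

rng = np.random.default_rng(1)
X = np.vstack([rng.normal(-2, 0.5, (20, 2)), rng.normal(2, 0.5, (20, 2))])
y = np.r_[-np.ones(20), np.ones(20)]
g, sigma = 10.0, 1.0                       # regularisation and RBF width

sq = ((X[:, None] - X[None]) ** 2).sum(-1)
K = np.exp(-sq / (2 * sigma ** 2))         # RBF kernel matrix
Omega = (y[:, None] * y[None]) * K

n = len(y)
A = np.zeros((n + 1, n + 1))
A[0, 1:], A[1:, 0] = y, y
A[1:, 1:] = Omega + np.eye(n) / g
sol = np.linalg.solve(A, np.r_[0.0, np.ones(n)])
b, alpha = sol[0], sol[1:]

f = K @ (alpha * y) + b                    # decision values on the training set
print((np.sign(f) == y).mean())            # training accuracy
```

Because the exact solution gives a nonzero alpha for every training point, every point is a support vector; this is the loss of sparseness that A-LSSVM and the other listed methods try to repair.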
ES2007-25
Explicit Kernel Rewards Regression for data-efficient near-optimal policy identification
Daniel Schneegass, Steffen Udluft, Thomas Martinetz
Abstract:
We present the Explicit Kernel Rewards Regression (EKRR) approach, an extension of Kernel Rewards Regression (KRR), for Optimal Policy Identification in Reinforcement Learning. The method uses the Structural Risk Minimisation paradigm to achieve a high generalisation capability. This explicit version of KRR offers at least two important advantages. On the one hand, finding the optimal policy reduces to a quadratic program, so no Policy Iteration techniques are necessary. On the other hand, the approach allows the use of further constraints and certain regularisation techniques, as e.g. in Ridge Regression and Support Vector Machines.
ES2007-69
Kernel-based online machine learning and support vector reduction
Sumeet Agarwal, Saradhi Vedula, Harish Karnick
Abstract:
We apply kernel-based machine learning methods to online learning situations, and the related requirement of reducing the complexity of the learnt classifier. Online methods are particularly useful in situations which involve streaming data, such as medical or financial applications. We show that the concept of span of support vectors can be used to build a classifier that performs reasonably well while satisfying given space and time constraints, thus making it potentially suitable for such online situations.
ES2007-97
Kernel PCA based clustering for inducing features in text categorization
Zsolt Minier, Lehel Csato
Abstract:
We study dimensionality reduction and feature selection for the text document categorization problem. We focus on the first step in building text categorization systems: choosing an efficient numerical representation of the natural language text, which is then used by machine learning algorithms. We propose a representation based on word clusters. We build a kernel matrix from the word distribution over the different categories and apply kernel PCA to extract a low-dimensional representation of words. On this low-dimensional representation we use K-means clustering to group words into clusters, which are used subsequently in the document categorization task. We show that kernel PCA based clustering gives performance better than or comparable to several advanced clustering methods when applied to the standard Reuters corpus.
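The pipeline described (word-over-category distributions, kernel PCA, K-means word clusters) can be sketched end to end. The toy distributions, RBF kernel width and cluster count below are illustrative assumptions, not the paper's Reuters setup.

```python
import numpy as np

# Sketch: word-over-category distributions -> centred kernel PCA ->
# K-means clustering of words. Toy data with two obvious word groups.

rng = np.random.default_rng(0)
# 10 words x 4 categories: the first 5 words concentrate on categories
# 0-1, the last 5 on categories 2-3 (hypothetical distributions).
P = np.vstack([rng.dirichlet([10, 10, 1, 1], 5),
               rng.dirichlet([1, 1, 10, 10], 5)])

sq = ((P[:, None] - P[None]) ** 2).sum(-1)
K = np.exp(-sq / (2 * 0.3 ** 2))                  # RBF kernel between words
n = len(P)
H = np.eye(n) - np.ones((n, n)) / n
Kc = H @ K @ H                                    # centre in feature space
w, V = np.linalg.eigh(Kc)
Z = V[:, -2:] * np.sqrt(np.maximum(w[-2:], 0))    # top-2 kernel principal comps

# Plain K-means (k=2) on the low-dimensional word representation.
centers = Z[[0, -1]]                              # init from the two extremes
for _ in range(20):
    labels = ((Z[:, None] - centers[None]) ** 2).sum(-1).argmin(1)
    centers = np.array([Z[labels == k].mean(0) for k in range(2)])

print(labels)
```

The resulting word clusters would then replace individual words as document features in the categorization step.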
ES2007-104
Kernel on Bag of Paths For Measuring Similarity of Shapes
Frederic Suard, Alain Rakotomamonjy, Abdelaziz Bensrhair
Abstract:
A common approach to classifying shock graphs is to use a dissimilarity measure on graphs together with a distance-based classifier. In this paper, we propose the use of kernel functions for data mining problems on shock graphs. The first contribution of the paper is to extend the class of graph kernels by proposing kernels based on bags of paths. We then propose a methodology for using these kernels for shock graph retrieval. Our experimental results show that our approach is very competitive with graph matching approaches and is rather robust.
ES2007-42
Electroencephalogram signal classification for brain computer interfaces using wavelets and support vector machines
Francesc Benimeli, Ken Sharman
Abstract:
An electroencephalogram (EEG) signal classification procedure for use in real-time synchronous brain computer interfaces (BCI) is proposed. The features used to perform the classification consist of the coefficients of a discrete wavelet transform (DWT) computed for each trial. A support vector machine (SVM) algorithm is applied to classify the resulting feature vectors. Experimental results obtained from applying the proposed procedure to the classification of two mental states are presented.
ES2007-119
Bat echolocation modelling using spike kernels with Support Vector Regression.
Bertrand Fontaine, Herbert Peremans, Benjamin Schrauwen
Abstract:
From the echoes of their vocalisations, bats extract information about the positions of reflectors. To gain an understanding of how target position is translated into neural features, we model the bat's peripheral auditory system up to the auditory nerve. This model assumes multiple threshold-detecting neurons for each frequency channel, where the inter-spike times are linked to the location of the reflector. To show that this coding process can be reversed, we compute the kernel product of the spike trains using a non-binned spike kernel function. This approach allows regression on azimuth and elevation using Support Vector Machines.
ES2007-20
Ensemble neural classifier design for face recognition
Terry Windeatt
Abstract:
A method for tuning MLP learning parameters in an ensemble classifier framework is presented. No validation set or cross-validation technique is required to optimize parameters for generalisability. In this paper, the technique is applied to face recognition using Error-Correcting Output Coding strategy to solve multi-class problems.
ES2007-44
Data reduction using classifier ensembles
J.S. Sánchez, L.I. Kuncheva
Abstract:
We propose a data reduction approach for finding a reference set for the nearest neighbour classifier. The approach is based on classifier ensembles. Each ensemble member is given a subset of the training data. Using Wilson's editing method, the ensemble member produces a reduced reference set. We explored several routes to make use of these reference sets. The results with 10 real and artificial data sets indicated that merging the reference sets and subsequent editing of the merged set provides the best trade-off between the error and the size of the resultant reference set. This approach can also handle large data sets because only small fractions of the data are edited at a time.
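The core editing step used by each ensemble member can be sketched concisely. This is a single-pass Wilson's editing on hypothetical toy data; the paper's scheme of editing subsets, merging the reference sets and re-editing the merged set is not reproduced here.

```python
import numpy as np

# Wilson's editing: remove every point that the k-NN rule, applied to the
# remaining points, misclassifies. Labels are +/-1; k is illustrative.

def wilson_edit(X, y, k=3):
    d = ((X[:, None] - X[None]) ** 2).sum(-1)
    np.fill_diagonal(d, np.inf)                 # a point is not its own neighbour
    keep = np.empty(len(X), dtype=bool)
    for i in range(len(X)):
        nn = np.argsort(d[i])[:k]
        keep[i] = np.sign(y[nn].sum()) == y[i]  # majority vote of neighbours
    return X[keep], y[keep]

rng = np.random.default_rng(2)
X = np.vstack([rng.normal(-1, 0.4, (30, 2)), rng.normal(1, 0.4, (30, 2))])
y = np.r_[-np.ones(30), np.ones(30)]
y[0] = 1                                        # inject one label-noise point
Xe, ye = wilson_edit(X, y)
print(len(X) - len(Xe), "points edited out")    # the noisy point is among them
```

The edited set serves as the reduced reference set for the final nearest-neighbour classifier.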
ES2007-108
ICA-based High Frequency VaR for Risk Management
Patrick Kouontchou, Bertrand Maillet
Abstract:
Independent Component Analysis (ICA, see Comon, 1994 and Hyvärinen et al., 2001) is more appropriate when non-linearity and non-normality are at stake, as mentioned by Back and Weigend (1997) in a financial context. Using high-frequency data on the French stock market, we evaluate this technique for generating scenarios for accurate Value-at-Risk computations, thereby reducing the effective dimensionality of the scenario specification problem in several cases without leaving aside the main characteristics of the dataset. Various methods for specifying stress scenarios are discussed and compared to other published ones (see Giot and Laurent, 2004), and classical rejection tests are presented (Kupiec, 1997; Christoffersen and Pelletier, 2003).
ES2007-80
Algebraic inversion of an artificial neural network classifier
Travis Wiens, Rich Burton, Greg Schoenau
Abstract:
Artificial neural networks are, by their definition, non-linear functions. Typically, this means that it is impossible to find a closed-form solution for the inverse function of a neural network. This paper presents a special form of neural network classifier that allows for its algebraic inversion in order to find the boundary between classes. The control of the fuel-air ratio in a spark ignition engine is given as an example.
ES2007-123
Estimation of tangent planes for neighborhood graph correction
Karina Zapien, Gilles Gasso, Stéphane Canu
Abstract:
Local algorithms for non-linear dimensionality reduction and semi-supervised learning algorithms based on spectral decomposition have become quite popular. One drawback of these lies in the fact that a nearest-neighbour graph has to be built in order to decide which points are to be kept close. In the presence of shortcuts (edges joining two points whose distance measured along the submanifold is actually large), the resulting embedding will be unsatisfactory. This paper proposes an algorithm to detect wrong graph connections based on the tangent plane of the manifold at each point, which leads to an estimate of the proper number of neighbours for each point in the dataset. Experiments show that the construction of the graph can be improved with this method.
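A tangent-plane test of this kind can be sketched with local PCA: estimate the tangent plane at a point from its neighbourhood, then flag an edge whose direction sticks out of that plane. The hairpin data, neighbourhood size and threshold below are illustrative assumptions, not the paper's algorithm.

```python
import numpy as np

# Tangent-plane test for suspicious graph edges. The tangent plane at a
# point is estimated by PCA (SVD) on its local neighbourhood; an edge is
# flagged when its direction has a large component orthogonal to the plane.

def edge_is_suspect(X, i, j, n_local=6, dim=1, tol=0.5):
    d = ((X - X[i]) ** 2).sum(-1)
    nbrs = np.argsort(d)[1:n_local + 1]        # local neighbourhood of X[i]
    local = X[nbrs] - X[nbrs].mean(0)
    _, _, Vt = np.linalg.svd(local, full_matrices=False)
    T = Vt[:dim]                               # estimated tangent direction(s)
    v = (X[j] - X[i]) / np.linalg.norm(X[j] - X[i])
    residual = v - T.T @ (T @ v)               # part orthogonal to the plane
    return bool(np.linalg.norm(residual) > tol)

# Hairpin-shaped 1-D manifold: two parallel strands one unit apart. An edge
# across the strands is a shortcut; an edge along a strand is legitimate.
t = np.linspace(0, 5, 40)
X = np.vstack([np.c_[t, np.zeros(40)], np.c_[t[::-1], np.ones(40)]])

print(edge_is_suspect(X, 10, 11))   # along the strand -> False
print(edge_is_suspect(X, 10, 69))   # across the strands -> True
```

Note that this orthogonality test only catches edges leaving the tangent plane; edges that run along the plane but jump far along the manifold need additional criteria.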
ES2007-128
Estimating the Number of Components in a Mixture of Multilayer Perceptrons
Madalina Olteanu, Joseph Rynkiewicz
Abstract:
In this paper we are interested in estimating the number of components in a mixture of multilayer perceptrons. The penalized marginal-likelihood criterion for mixture models and hidden Markov models introduced by Keribin (2000) and Gassiat (2002) is extended to mixtures of multilayer perceptrons. We prove the consistency of the BIC criterion under some hypotheses which essentially involve the bracketing entropy of the class of generalized score functions, and we check the assumptions of the main result.
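The criterion being analysed can be illustrated in a much simpler setting: selecting the number of components of a one-dimensional Gaussian mixture fitted by EM. This is only an analogue of the paper's mixtures of multilayer perceptrons; the data, initialisation and parameter count below are illustrative assumptions.

```python
import numpy as np

# BIC selection of the number of mixture components, with EM on a 1-D
# Gaussian mixture (a simple analogue of mixtures of experts/MLPs).

def gmm_bic(x, k, n_iter=100):
    n = len(x)
    pi = np.full(k, 1.0 / k)
    mu = np.quantile(x, (np.arange(k) + 0.5) / k)   # spread-out initial means
    var = np.full(k, x.var())
    for _ in range(n_iter):
        dens = pi * np.exp(-(x[:, None] - mu) ** 2 / (2 * var)) \
            / np.sqrt(2 * np.pi * var)              # per-component densities
        r = dens / dens.sum(1, keepdims=True)       # E-step: responsibilities
        nk = r.sum(0)                               # M-step
        pi, mu = nk / n, (r * x[:, None]).sum(0) / nk
        var = (r * (x[:, None] - mu) ** 2).sum(0) / nk + 1e-6
    dens = pi * np.exp(-(x[:, None] - mu) ** 2 / (2 * var)) \
        / np.sqrt(2 * np.pi * var)
    loglik = np.log(dens.sum(1)).sum()
    n_params = 3 * k - 1                            # weights, means, variances
    return n_params * np.log(n) - 2 * loglik        # BIC: lower is better

rng = np.random.default_rng(4)
x = np.r_[rng.normal(-3, 1, 100), rng.normal(3, 1, 100)]
bics = [gmm_bic(x, k) for k in (1, 2, 3)]
print(int(np.argmin(bics)) + 1)  # number of components selected by BIC
```

The paper's contribution is proving that this kind of penalized-likelihood selection remains consistent when the mixture components are multilayer perceptrons rather than simple densities.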
Biologically motivated learning
ES2007-88
Derivation of nonlinear amplitude equations for the normal modes of a self-organizing system
Junmei Zhu, Christoph von der Malsburg
Abstract:
We point out a basically well-known pathway to the analysis of self-organizing systems that is now well within reach of numerical methods. Systems of coupled nonlinear differential equations are decomposed into normal modes and reduced, by adiabatic elimination of the stable modes, to a much smaller system of unstable modes and their nonlinear interaction. In the past, this treatment was accessible only for highly idealized model systems. Guided by an application to retinotopic map formation, we discuss the extension to more realistic cases.
ES2007-27
A neural model of cross-modal association in insects
Jan Wessnitzer, Barbara Webb
Abstract:
We developed a computational model of learning in the Mushroom Body, a region of multimodal integration in the insect brain. Using realistic neural dynamics and a biologically based learning rule (spike-timing-dependent plasticity), the model is tested as part of an insect-brain-inspired architecture within a closed-loop behavioural task. Replicating in simulation an experiment carried out on bushcrickets, we show the system can successfully associate visual with auditory cues, so as to maintain a steady heading towards an intermittent sound source.
ES2007-106
Transition from initialization to working stage in biologically realistic networks
Andreas Herzog, Karsten Kube, Bernd Michaelis, Ana D. deLima, Thomas Voigt
Abstract:
In biology, a complex sequence of developmental steps is necessary for early cortical neurons to mature into a functional network. One of the key elements is the transition of the network dynamics, which start from slow synchronous activity in an early differentiation phase and develop into mature firing with complex high-order patterns of spikes and bursts. In this modeling study we investigate the network properties required to initiate this transition by switching the reversal potential of the GABAergic synapses. The simulated networks are generated by a statistical first-order description of parameters for the neuron model and the network architecture.
ES2007-95
A supervised learning approach based on STDP and polychronization in spiking neuron networks
Hélène Paugam-Moisy, Régis Martinez, Samy Bengio
Abstract:
We propose a network model of spiking neurons, without pre-imposed topology, driven by STDP (Spike-Time-Dependent Plasticity), a biologically observed form of temporal Hebbian unsupervised learning. The model is further driven by a supervised learning algorithm, based on a margin criterion, that acts on the synaptic delays linking the network to the output neurons, with classification as the goal task. The network processing and the resulting performance are completely explainable by the concept of polychronization, proposed by Izhikevich (2006). The model emphasizes the computational capabilities of this concept.
Learning causality
ES2007-6
Computational Intelligence approaches to causality detection
Katerina Hlavackova-Schindler, Pablo F. Verdes
ES2007-149
Distinguishing between cause and effect via kernel-based complexity measures for conditional distributions
Xiaohai Sun, Dominik Janzing, Bernhard Schoelkopf
Abstract:
We propose a method to evaluate the complexity of probability measures from data, based on a reproducing kernel Hilbert space seminorm of the logarithm of conditional probability densities. The motivation is to provide a tool for a causal inference method which assumes that conditional probabilities for effects given their causes are typically simpler and smoother than vice versa. We present experiments with toy data where the quantitative results are consistent with our intuitive understanding of complexity and smoothness. In some examples with real-world data, too, the probability measure corresponding to the true causal direction turned out to be less complex than that of the reversed order.
ES2007-147
Causality analysis of LFPs in micro-electrode arrays based on mutual information
Nikolay Manyakov, Marc Van Hulle
Abstract:
Since perceptual and motor processes in the brain are the result of interactions between neurons, layers and brain areas, much attention has been directed towards the development of techniques to unveil these interactions, both in terms of connectivity and direction of interaction. Several techniques are derived from the Granger causality principle and are based on multivariate autoregressive modeling, so they can only account for the linear aspect of these interactions. We propose here a technique based on conditional mutual information which enables us not only to describe the directions of nonlinear connections, but also their time delays. We compare our technique with others using ground-truth data, i.e., data for which the connectivity is known. As an application, we consider local field potentials (LFPs) recorded with the 96 micro-electrode UTAH array implanted in area V4 of the macaque monkey's visual cortex.
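To make the approach concrete, here is a minimal sketch of direction detection with a binned plug-in estimate of conditional mutual information. This illustrates the general idea only, not the estimator of the paper; the coupling strength, delay, and bin count are assumptions chosen for the demo:

```python
import numpy as np

def cond_mutual_info(x, y, z, bins=6):
    """Plug-in estimate of I(X; Y | Z) from equal-width-binned samples."""
    def binned(v):
        return np.digitize(v, np.histogram_bin_edges(v, bins)[1:-1])
    xb, yb, zb = binned(x), binned(y), binned(z)
    p = np.zeros((bins, bins, bins))
    for i, j, k in zip(xb, yb, zb):
        p[i, j, k] += 1.0
    p /= p.sum()
    pxz = p.sum(axis=1, keepdims=True)      # p(x, z)
    pyz = p.sum(axis=0, keepdims=True)      # p(y, z)
    pz = p.sum(axis=(0, 1), keepdims=True)  # p(z)
    num, den = p * pz, pxz * pyz
    mask = p > 0
    return float(np.sum(p[mask] * np.log(num[mask] / den[mask])))

rng = np.random.default_rng(0)
n, d = 5000, 2                               # x drives y with delay d
x = rng.normal(size=n)
y = np.zeros(n)
y[d:] = 0.8 * x[:-d] + 0.3 * rng.normal(size=n - d)

fwd = cond_mutual_info(x[:-d], y[d:], y[d - 1:-1])  # I(x_{t-d}; y_t | y_{t-1})
bwd = cond_mutual_info(y[:-d], x[d:], x[d - 1:-1])  # reverse direction
print(fwd > bwd)  # the driven direction carries more conditional information
```

Conditioning on the target's own past is what separates a genuine directed influence from shared history, which plain mutual information cannot do.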
ES2007-43
Learning causality by identifying common effects with kernel-based dependence measures
Xiaohai Sun, Dominik Janzing
Abstract:
We describe a method for causal inference that measures the strength of statistical dependence by the Hilbert-Schmidt norm of kernel-based conditional cross-covariance operators. We consider the increase of the dependence of two variables X and Y by conditioning on a third variable Z as a hint for Z being a common effect of X and Y. Based on this assumption, we collect "votes" for hypothetical causal directions and orient the edges according to the majority vote. For most of our experiments with artificial and real-world data our method has outperformed the conventional constraint-based inductive causation (IC) algorithm.
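The conditional cross-covariance operators used here build on the unconditional Hilbert-Schmidt dependence measure (HSIC), whose biased empirical form is compact. Below is a sketch of that unconditional measure, not the paper's conditional variant; the kernel width and sample data are arbitrary choices for illustration:

```python
import numpy as np

def hsic(x, y, sigma=1.0):
    """Biased empirical HSIC with Gaussian kernels: trace(K H L H) / n^2."""
    n = len(x)
    K = np.exp(-np.subtract.outer(x, x) ** 2 / (2 * sigma ** 2))
    L = np.exp(-np.subtract.outer(y, y) ** 2 / (2 * sigma ** 2))
    H = np.eye(n) - np.ones((n, n)) / n     # centering matrix
    return float(np.trace(K @ H @ L @ H)) / n ** 2

rng = np.random.default_rng(1)
x = rng.normal(size=300)
z = x ** 2 + 0.1 * rng.normal(size=300)  # nonlinearly dependent on x
w = rng.normal(size=300)                 # independent of x
print(hsic(x, z), hsic(x, w))            # dependent pair scores higher
```

Note that x and z are uncorrelated in the linear sense, yet the kernel measure still detects their dependence, which is exactly why such measures are attractive for causal inference.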
ES2007-26
Causality and communities in neural networks
Leonardo Angelini, Daniele Marinazzo, Mario Pellicoro, Sebastiano Stramaglia
Abstract:
A recently proposed nonlinear extension of Granger causality is used to map the dynamics of a neural population onto a graph, whose community structure characterizes the collective behavior of the system. Both the number of communities and the modularity depend on transmission delays and on the learning capacity of the system.
ES2007-148
Exploring the causal order of binary variables via exponential hierarchies of Markov kernels
Xiaohai Sun, Dominik Janzing
Abstract:
We propose a new algorithm for estimating the causal structure that underlies the observed dependence among n (n>=4) binary variables X_1,...,X_n. Our inference principle states that the factorization of the joint probability into conditional probabilities for X_j given X_1,...,X_{j-1} often leads to simpler terms if the order of variables is compatible with the directed acyclic graph representing the causal structure. We study joint measures of OR/AND gates and show that the complexity of the conditional probabilities (the so-called Markov kernels), defined by a hierarchy of exponential models, depends on the order of the variables. Some toy and real-data experiments support our inference rule.
Reservoir Computing
ES2007-8
An overview of reservoir computing: theory, applications and implementations
Benjamin Schrauwen, David Verstraeten, Jan Van Campenhout
ES2007-39
Spiral Recurrent Neural Network for Online Learning
Huaien Gao, Rudolf Sollacher, Hans-Peter Kriegel
Abstract:
Autonomous, self* sensor networks require sensor nodes with a certain degree of "intelligence". An elementary component of such "intelligence" is the ability to learn, online, to predict sensor values. We consider recurrent neural network (RNN) models trained with an extended Kalman filter algorithm based on real-time recurrent learning (RTRL) with teacher forcing. We compare the performance of conventional neural network architectures with that of spiral recurrent neural networks (Spiral RNN), a novel RNN architecture combining a trainable hidden recurrent layer with the "echo state" property of echo state neural networks (ESNN). We find that this novel architecture shows more stable performance and faster convergence.
ES2007-74
Several ways to solve the MSO problem
Jochen Jakob Steil
Abstract:
The so-called MSO problem (a simple superposition of two or more sinusoidal waves) has recently been discussed as a benchmark problem for reservoir computing and was shown not to be learnable by standard echo state regression. However, we show that there are at least three simple ways to learn the MSO signal: by introducing a time window on the input, by changing the network time step to match the sampling rate of the signal, and by reservoir adaptation. The latter approach is based on a universal principle of implementing a sparsity constraint on the activity patterns of the network neurons, which improves spatio-temporal encoding in the network.
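The first of the three remedies, a time window on the input, can be sketched without a reservoir at all: a sum of two sinusoids satisfies a fourth-order linear recurrence, so a linear readout over a 4-sample delay window already reproduces the signal. A minimal sketch using MSO frequencies commonly seen in the literature (an assumption here, not taken from the paper):

```python
import numpy as np

# Two-sine MSO benchmark signal (frequencies as commonly used in the literature)
t = np.arange(600)
y = np.sin(0.2 * t) + np.sin(0.311 * t)

# Delay-embed the signal: each input row holds the last W samples.
# A superposition of two sinusoids obeys a linear recurrence of order 4,
# so W = 4 suffices for an exact linear one-step prediction.
W = 4
X = np.column_stack([y[i:len(y) - W + i] for i in range(W)])
target = y[W:]

# Linear readout fitted by least squares (the "time window" remedy)
w, *_ = np.linalg.lstsq(X, target, rcond=None)
err = np.max(np.abs(X @ w - target))
print(err)  # close to machine precision
```

This makes clear why the time window works: it hands the readout exactly the state information that a plain echo state reservoir fails to preserve at this sampling rate.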
ES2007-114
Adapting reservoir states to get Gaussian distributions
David Verstraeten, Benjamin Schrauwen, Dirk Stroobandt
Abstract:
We present an online adaptation rule for reservoirs that is inspired by Intrinsic Plasticity (IP). The IP rule maximizes the information content of the reservoir state by adapting it so that its distribution approximates a given target. Here we fix the variance of the target distribution, which results in a Gaussian distribution. We apply the rule to two tasks with quite different temporal and computational characteristics.
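IP adapts each neuron's gain and bias so that its output distribution approaches a target; the published update is derived from a Kullback-Leibler objective, but its qualitative effect can be sketched with a simpler moment-matching stand-in that steers the mean and variance of a tanh neuron's output. All constants below are illustrative assumptions, not taken from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)
a, b = 1.0, 0.5            # gain and bias of a tanh neuron, adapted online
mu_t, var_t = 0.0, 0.04    # target output mean and variance (assumed)
eta = 0.01                 # learning rate (assumed)

m, v = 0.0, var_t          # running estimates of the output statistics
for _ in range(20000):
    x = rng.normal()                     # stand-in for the neuron's net input
    out = np.tanh(a * x + b)
    # Exponential moving averages of output mean and variance
    m = 0.99 * m + 0.01 * out
    v = 0.99 * v + 0.01 * (out - m) ** 2
    b -= eta * (m - mu_t)                # nudge bias toward the target mean
    a -= eta * (v - var_t) * np.sign(a)  # nudge gain toward the target variance

print(round(m, 2), round(v, 2))  # output statistics settle near the target
```

The gain shrinks until the tanh operates in a regime whose output variance matches the target, which is the mechanism by which IP regularizes reservoir dynamics.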
ES2007-134
Structured reservoir computing with spatiotemporal chaotic attractors
Carlos Lourenço
Abstract:
We approach the themes of "computing with chaos" and "reservoir computing" in a unified setting. Different neural architectures are mentioned which display chaotic as well as reservoir properties. The architectures share a common topology of close-neighbor connections which supports different types of spatiotemporal dynamics in continuous time. We bring up the role of spatiotemporal structure and associated symmetries in reservoir-mediated pattern processing. This type of computing differs somewhat from most other examples of reservoir computing.
ES2007-68
A first attempt of reservoir pruning for classification problems
Xavier Dutoit, Hendrik Van Brussel, Marnix Nuttin
Abstract:
Reservoir Computing is a new paradigm for using artificial neural networks. Despite its promising performance, it still has some drawbacks: as the reservoir is created randomly, it needs to be large enough to capture all the features of the data. Here we propose a method that starts with a large reservoir and then reduces its size by pruning out neurons. We apply this method to a prototypical problem and to a real one. Both applications show that it improves the performance for a given number of neurons.
ES2007-98
Intrinsic plasticity for reservoir learning algorithms
Marion Wardermann, Jochen Jakob Steil
Abstract:
Recently, a new class of learning algorithms has been proposed: reservoir algorithms [1]. Their learning ability relies heavily on the properties of the reservoir, which is held fixed during learning. In 2005, a fast, biologically plausible learning algorithm, intrinsic plasticity (IP, [2]), was proposed, which steers an analog neuron's output distribution. We show in this article how IP alters the properties of the reservoir and enhances the learning behaviour of reservoir learning algorithms, especially that of Backpropagation-Decorrelation recurrent learning (BPDC, [3]).
Learning III
ES2007-19
Bifurcation analysis for a discrete-time Hopfield neural network of two neurons with two delays
Eva Kaslik, Stefan Balint
Abstract:
In this paper, a bifurcation analysis is undertaken for a discrete-time Hopfield neural network of two neurons with two different delays and self-connections. Conditions ensuring the asymptotic stability of the null solution are found with respect to two characteristic parameters of the system. It is shown that for certain values of these parameters, fold or Neimark-Sacker bifurcations occur, but codimension-2 (fold-Neimark-Sacker, double Neimark-Sacker and resonance 1:1) bifurcations may also be present. The direction and stability of the Neimark-Sacker bifurcations are investigated by applying the center manifold theorem and normal form theory.
ES2007-45
Spicules-based competitive neural network
Jose Antonio Gomez-Ruiz, Jose Muñoz-Perez, M. Angeles Garcia-Bernal, Ezequiel Lopez-Rubio
Abstract:
We present a new model of unsupervised competitive neural network based on spicules. This model is capable of detecting topological information of an input space, determining its orientation and, in most cases, its skeleton.
ES2007-47
Sparsely-connected associative memory models with displaced connectivity
Lee Calcraft, Rod Adams, Neil Davey
Abstract:
Our work is concerned with finding optimum connection strategies in high-performance associative memory models. Taking inspiration from axonal branching in biological neurons, we impose a displacement of the point of efferent arborisation, so that the output from each node travels a certain distance before branching to connect to other units. This technique is applied to networks constructed with a connectivity profile based on Gaussian distributions, and the results compared to those obtained with a network containing purely local connections, displaced in the same manner. It is found that displacement of the point of arborisation has a very beneficial effect on the performance of both network types, with the displaced locally-connected network performing the best.
ES2007-94
RNN-based Learning of Compact Maps for Efficient Robot Localization
Alexander Förster, Alex Graves, Jürgen Schmidhuber
Abstract:
We describe a new algorithm for robot localization, efficient both in terms of memory and processing time. It transforms a stream of laser range sensor data into a probabilistic calculation of the robot's position, using a bidirectional Long Short-Term Memory (LSTM) recurrent neural network (RNN) to learn the structure of the environment and to answer queries such as: in which room is the robot? To achieve this, the RNN builds an implicit map of the environment.
ES2007-92
Human motion recognition using Nonlinear Transient Computation
Nigel Crook, Wee Jin Goh
Abstract:
A novel approach to human motion recognition is proposed that is based on a variation of the Nonlinear Transient Computation Machine (NTCM). The motion data used to train the NTCM comes from point-light display video sequences of a human walking. The NTCM is trained to distinguish between sequences of video frames that depict coordinated walking motion from those that depict uncoordinated (random) motion.
ES2007-14
Automatically searching near-optimal artificial neural networks
Leandro Almeida, Teresa Ludermir
Abstract:
The idea of automatically searching neural networks that learn faster and generalize better is becoming increasingly widespread. In this paper, we present a new method for searching near-optimal artificial neural networks that include initial weights, transfer functions, architectures and learning rules that are specially tailored to a given problem. Experimental results have shown that the method is able to produce compact, efficient networks with satisfactory generalization power and shorter training times.
ES2007-129
A new decision strategy in multi-objective training of the artificial neural networks
Talles Medeiros, Ricardo Takahashi, Antônio Braga
Abstract:
In this work, a new proposal is presented for selecting a model in the multi-objective training of Artificial Neural Networks (NNs). To this end, information from the residue of the Pareto-optimal solution is used. The principle of deciding for minimum autocorrelation is a criterion that guarantees the extraction of the information present in the noisy data. The experiments show the performance of the proposed decision-making (DM) strategy on variations of supervised learning problems.
ES2007-86
Functional elements and networks in fMRI
Jarkko Ylipaavalniemi, Eerika Savia, Ricardo Vigário, Samuel Kaski
Abstract:
We propose a two-step approach for the analysis of functional magnetic resonance images, in the context of natural stimuli. In the first step, elements of functional brain activity emerge, based on spatial independence assumptions. The second step exploits temporal covariation between the elements and given features of the natural stimuli to identify functional networks. The networks can have complex activation patterns related to common task goals.
ES2007-77
Feature extraction for EEG classification: representing electrode outputs as a Markov stochastic process
Liang Wu, Predrag Neskovic
Abstract:
In this work we introduce a new model for representing EEG signals and extracting discriminative features. We treat the outputs of each electrode as a stochastic process and assume that the sequence of variables forming a process is stationary and Markov. To capture temporal dependences within an electrode, we use conditional entropy and to capture dependences between different electrodes we use conditional mutual information features of increasing complexities. We show that even when using a small number of sampling points for their estimation (e.g. a single trial) these features carry discriminative information. We test the usefulness of these features by classifying the EEG data from n-back memory tasks.
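The within-electrode feature can be illustrated for a single channel with a binned first-order Markov estimate of the conditional entropy H(X_t | X_{t-1}); a temporally structured signal scores lower than white noise. A minimal sketch (the bin count and test signals are illustrative assumptions, not the paper's setup):

```python
import numpy as np

def markov_cond_entropy(signal, bins=4):
    """H(X_t | X_{t-1}) in bits, from binned first-order transition counts."""
    s = np.digitize(signal, np.histogram_bin_edges(signal, bins)[1:-1])
    counts = np.zeros((bins, bins))
    for a, b in zip(s[:-1], s[1:]):
        counts[a, b] += 1.0
    joint = counts / counts.sum()            # p(x_{t-1}, x_t)
    prev = joint.sum(axis=1, keepdims=True)  # p(x_{t-1})
    mask = joint > 0
    cond = joint / np.where(prev > 0, prev, 1.0)
    return float(-np.sum(joint[mask] * np.log2(cond[mask])))

rng = np.random.default_rng(0)
white = rng.normal(size=5000)
smooth = np.convolve(white, np.ones(20) / 20, mode="valid")  # strong temporal structure
print(markov_cond_entropy(white), markov_cond_entropy(smooth))
```

The drop in conditional entropy for the smoothed signal is exactly the kind of temporal dependence the EEG features are meant to capture; the between-electrode conditional mutual information features extend the same counting scheme to pairs of channels.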
ES2007-107
A hierarchical model for syllable recognition
Xavier Domont, Martin Heckmann, Heiko Wersing, Frank Joublin, Christian Goerick
Abstract:
Inspired by recent findings on the similarities between the primary auditory and visual cortex, we propose a neural network for speech recognition based on a hierarchical feedforward architecture for visual object recognition. When a Gammatone filterbank is used for the spectral analysis, the resulting spectrograms of syllables can be interpreted as images. After preprocessing that enhances the formants in the speech signal, and a length normalization, the images can then be fed into the visual hierarchy. We demonstrate the validity of our approach on the recognition of 25 different monosyllabic words and compare the results to the Sphinx-4 speech recognition system. Especially for noisy speech, our hierarchical model achieves a clear improvement.
ES2007-50
Classification of computer intrusions using functional networks. A comparative study
Amparo Alonso-Betanzos, Noelia Sánchez-Maroño, Félix M. Carballal-Fortes, Juan A. Suárez-Romero, Beatriz Pérez-Sánchez
Abstract:
Intrusion detection is a problem that has attracted a great deal of attention from computer scientists lately, due to the exponential increase in computer attacks in recent years. DARPA KDD Cup 99 is a standard dataset for classifying computer attacks to which several machine learning techniques have been applied. In this paper, we describe the results obtained using functional networks, a paradigm that extends feedforward neural networks, and compare them to the results obtained by other techniques applied to the same dataset. Of particular interest is the generalization capacity of the approach used.
ES2007-36
Identification of churn routes in the Brazilian telecommunications market
David L. García, Alfredo Vellido, Angela Nebot
Abstract:
The globalization and deregulation of business environments are rapidly shifting the competitive challenges that telecommunications service providers face. As a result, many of these companies are focusing on the preservation of existing customers and the limitation of customer attrition damages. In this brief paper, we investigate the existence of abandonment routes in the Brazilian telecommunications market, according to the customers’ service consumption pattern. A non-linear latent variable model of the manifold learning family is used to segment and visualize the data, as well as to identify typical churn routes.