Bruges, Belgium, April 26-28, 2006
Content of the proceedings
Self-organization, vector quantization and clustering
Man-Machine-Interfaces - Processing of nervous signals
Vision and applications
Online Learning in Cognitive Robotics
Learning I
Feature extraction and variable projection
Visualization methods for data mining
Semi-blind approaches for Source Separation and Independent Component Analysis (ICA)
Learning II
Biologically inspired models
Kernel methods
Nonlinear dynamics
Neural Networks and Machine Learning in Bioinformatics - Theory and Applications
Learning III
Self-organization, vector quantization and clustering
ES2006-142
Unsupervised clustering of continuous trajectories of kinematic trees with SOM-SD
Jochen Steil, Risto Koiva, Alessandro Sperduti
Abstract:
We explore the capability of SOM-SD to compress continuous-time data recorded from a kinematic tree, which can represent a robot or an artificial stick figure. We compare different encodings of this data as trees or sequences, which preserve to different degrees the structural dependencies introduced by the physical constraints in the model. Besides computing a standard quantization error, we propose a new measure to account for the amount of compression in the temporal domain, based on the correlation between the degree of locality of a tree and the number of winners in the map for that tree. The approach is demonstrated for a stick figure moving in a physics-based simulation world. It turns out that SOM-SD achieves a very accurate representation of the data together with reasonable compression if tree encodings rather than sequence encodings are used.
ES2006-83
Magnification control for batch neural gas
Barbara Hammer, Alexander Hasenfuss, Thomas Villmann
Abstract:
It is well known that online neural gas (NG) possesses a magnification exponent different from the information-theoretically optimal one in adaptive map formation. The exponent can be controlled explicitly by a small change to the learning algorithm. Batch NG constitutes a fast alternative optimization scheme for NG vector quantizers that possesses the same magnification factor as standard online NG. In this paper, we propose a method to integrate magnification control by local learning into batch NG by linking magnification control to an underlying cost function. We validate the learning rule in an experimental setting.
ES2006-90
Weighted differential topographic function: a refinement of topographic function
Lili Zhang, Erzsebet Merenyi
Abstract:
Topology preservation of Self-Organizing Maps (SOMs) is an advantageous property for correct clustering. Among several existing measures of topology violation, this paper studies the Topographic Function (TF) [1]. We find that this measure, demonstrated for low-dimensional data in [1], has a reliable foundation in its distance metric for interpreting neighborhood relationships in the input space, even for high-dimensional data. Based on the TF, we present a Differential Topographic Function (DTF) that reveals topology violations more clearly and informatively. In addition, a Weighted Differential Topographic Function (WDTF) has been developed. For real-world data, the DTF and WDTF unravel more details than the original TF and help estimate the quality of topology preservation more accurately.
ES2006-121
Cluster detection algorithm in neural networks
David Meunier, Hélène Paugam-Moisy
Abstract:
Complex networks have received much attention in the last few years and reveal global properties of interacting systems in domains such as biology, the social sciences, and technology. One of the key features of complex networks is their clustered structure. Most methods applied to study complex networks are based on undirected graphs. However, when considering neural networks, the directionality of links is fundamental. In this article, a cluster-detection method is extended to directed graphs. We show that the extended method is more effective at detecting clustered structure in neural networks, without a significant increase in computational cost.
ES2006-86
Enhanced maxcut clustering with multivalued neural networks and functional annealing
Enrique Mérida-Casermeiro, Domingo López-Rodríguez, Juan Miguel Ortiz-de-Lazcano-Lobato
Abstract:
In this work, a new algorithm to improve the performance of optimization methods by avoiding certain local optima is described. Its theoretical basis is presented in a rigorous but intuitive way. It has been applied concretely to recurrent neural networks, in particular to MREM, a multivalued recurrent model that has proved to obtain very good results when dealing with NP-complete combinatorial optimization problems. To show its efficiency, the well-known MaxCut problem for graphs has been selected as a benchmark. Our proposal outperforms other specialized and powerful techniques, as shown by simulations.
Man-Machine-Interfaces - Processing of nervous signals
ES2006-5
Artificial neural networks and machine learning for man-machine-interfaces - processing of nervous signals
Martin Bogdan, Michael Bensch
Abstract:
Man-Machine-Interfaces that contact the nervous system, either to extract information from it or to introduce information into it, have recently gained increasing importance. To establish systems such as neural prostheses or Brain-Computer-Interfaces, powerful (real-time) algorithms for processing nerve signals or their field potentials are required. Another important point is the introduction of information into nervous systems by means such as functional neuroelectrical stimulation (FNS). This paper gives a short introduction and reviews different approaches towards the development of Man-Machine-Interfaces that use artificial neural networks and machine learning algorithms for signal processing.
ES2006-45
Linking non-binned spike train kernels to several existing spike train metrics
Benjamin Schrauwen, Jan Van Campenhout
Abstract:
This work presents two kernels that can be applied to sets of spike times, allowing state-of-the-art classification techniques to be used on spike trains. The presented kernels are closely related to several recent and widely used spike train metrics. One of their main advantages is that they do not require the spike trains to be binned; a high temporal resolution is thus preserved, which is needed when temporal coding is used. As a test of the classification possibilities, a jittered spike-train template classification problem is solved.
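As background for readers unfamiliar with non-binned spike-train kernels, the following is a minimal sketch of one common construction of this kind: a Gaussian pair-sum kernel evaluated directly on raw spike times. It illustrates the idea of avoiding binning, but it is not necessarily one of the kernels proposed in the paper; the function name and the sigma value are assumptions for the example.

```python
import numpy as np

def gaussian_spike_kernel(s, t, sigma=0.005):
    """Pairwise-sum Gaussian kernel on two lists of spike times (seconds).

    No binning is performed: every pair of spikes (one from each train)
    contributes according to its time difference, so temporal resolution
    is limited only by the recording, not by a bin width.
    """
    s = np.asarray(s, dtype=float)[:, None]  # column vector of spike times
    t = np.asarray(t, dtype=float)[None, :]  # row vector of spike times
    return float(np.exp(-(s - t) ** 2 / (2 * sigma ** 2)).sum())
```

With a kernel of this form, nearly coincident spikes contribute close to 1 per matching pair, while spikes further apart than a few sigma contribute almost nothing, so jittered copies of a template remain similar to it.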
ES2006-111
Spatial filters for the classification of event-related potentials
Ulrich Hoffmann, Jean-Marc Vesin, Touradj Ebrahimi
Abstract:
Spatial filtering is a widely used dimension-reduction method in electroencephalogram-based brain-computer interface systems. In this paper, a new algorithm is proposed that learns spatial filters from a training dataset. In contrast to existing approaches, the proposed method yields spatial filters that are explicitly designed for the classification of event-related potentials, such as the P300 or movement-related potentials. The algorithm is tested, in combination with support vector machines, on several benchmark datasets from past BCI competitions and achieves state-of-the-art results.
ES2006-51
On-line adaptation of neuro-prostheses with neuronal evaluation signals
Klaus R. Pawelzik, Udo A. Ernst, David Rotermund
Abstract:
Experiments have demonstrated that prosthetic devices can in principle be controlled by brain signals. However, in stable long-term applications neuroprostheses may suffer substantially from non-stationarities of the recorded signals. Such changes currently require supervised re-learning procedures which must be conducted under laboratory conditions, hampering the envisioned everyday use of such devices. As an alternative we here propose an on-line adaptation scheme that exploits a secondary signal source from brain regions reflecting the user's affective evaluation of the neuro-prosthetic's performance. Using realistic assumptions about recordable signals and their noise levels, our simulations show that prosthetic devices can be adapted successfully during normal, everyday usage.
ES2006-44
Using distributed genetic programming to evolve classifiers for a brain computer interface
Eva Alfaro-Cid, Anna I. Esparcia-Alcázar, Ken Sharman
Abstract:
The objective of this paper is to illustrate the application of genetic programming to evolve classifiers for multi-channel time-series data. The paper shows how high-performance distributed genetic programming has been implemented for evolving classifiers. The particular application discussed herein is the classification of human electroencephalographic signals for a brain-computer interface. The resulting classifying structures provide classification rates comparable to those obtained using traditional, human-designed classification methods.
Vision and applications
ES2006-39
A Cyclostationary Neural Network model for the prediction of the NO2 concentration
Monica Bianchini, Ernesto Di Iorio, Marco Maggini, Chiara Mocenni, Augusto Pucci
Abstract:
Air pollution control is a major environmental concern. The quality of air is an important factor in everyday life in cities, since it affects the health of the community and directly influences the sustainability of our lifestyles and production methods. In this paper we propose a cyclostationary neural network (CNN) model for the prediction of the NO2 concentration. The cyclostationary nature of the problem guides the construction of the CNN architecture, which is composed of a number of MLP blocks equal to the cyclostationary period of the analyzed phenomenon, and is independent of exogenous inputs. Preliminary experiments show that the CNN model significantly outperforms standard statistical tools usually employed for this task.
ES2006-89
Learning Visual Invariance
Alessio Plebe
Abstract:
Invariance is a necessary feature of a visual system able to recognize real objects in all their possible appearances. It is also the processing step most problematic to understand in biological systems, and the most difficult to simulate in computational models. This work investigates the possibility of achieving viewpoint invariance without adopting any explicit theoretical solution to the problem, but simply by exposing a hierarchical architecture of self-organizing artificial cortical maps to series of images taken from various viewpoints.
Online Learning in Cognitive Robotics
ES2006-4
Recent trends in online learning for cognitive robots
Jochen Steil, Heiko Wersing
Abstract:
We present a review of recent trends in cognitive robotics that deal with online learning approaches to the acquisition of knowledge, control strategies and behaviors of a cognitive robot or agent. Along this line we focus on the topics of object recognition in cognitive vision, trajectory learning and adaptive control of multi-DOF robots, task learning from demonstration, and general developmental approaches in robotics. We argue for the relevance of online learning as a key ability for future intelligent robotic systems to allow flexible and adaptive behavior within a changing and unpredictable environment.
ES2006-73
Extended model of conditioned learning within latent inhibition
Nicolas Gomond, Jean-Marc Salotti
Abstract:
Due to the varied and dynamic nature of stimuli, decisions of intelligent agents must rely on the coordination of complex cognitive systems. This paper focuses on a general learning architecture for autonomous agents. It is based on a neural network model that enables the specific behaviours of classical conditioning and a biologically inspired attentional phenomenon called latent inhibition. We propose a neural network implementation of an extended model of classical conditioning and present some results.
ES2006-118
Construction of a memory management system in an on-line learning mechanism
Francisco Bellas, Jose Antonio Becerra, Richard Duro
Abstract:
This paper is the first of a two-paper series that deals with an important problem in on-line learning mechanisms for autonomous agents that must perform non-trivial tasks and operate over extended periods of time. The problem has to do with memory and, in particular, with what is to be stored in what representation, and with the need for a memory management system to control the interplay between different types of memory. To study the problem, a two-level memory structure consisting of a short-term and a long-term memory is introduced in an evolutionary cognitive mechanism called the Multilevel Darwinist Brain. A management system for their operation and interaction is proposed that benefits from the evolutionary nature of the mechanism. Some results obtained during operation with real robots are presented in the second paper of the series.
ES2006-92
Adaptive scene-dependent filters in online learning environments
Michael Götting, Jochen Steil, Heiko Wersing, Edgar Körner, Helge Ritter
Abstract:
In this paper we propose Adaptive Scene-Dependent Filters (ASDF) to enhance the online learning capabilities of an object recognition system in real-world scenes. The proposed ASDF method extends the idea of unsupervised segmentation to a flexible, highly dynamic image segmentation architecture. We combine unsupervised segmentation, which defines coherent groups of pixels, with a recombination step that uses top-down information to determine which segments together constitute the object. We show the successful application of this approach to online learning in cluttered environments.
ES2006-19
A multiagent architecture for concurrent reinforcement learning
Victor Uc Cetina
Abstract:
In this paper we propose a multiagent architecture for implementing concurrent reinforcement learning, an approach in which several agents, sharing the same environment, perceptions and actions, work towards a single objective: learning one value function. We present encouraging experimental results from the initial phase of our research on combining concurrent reinforcement learning with learning from demonstration.
ES2006-120
Some experimental results with a two level memory management system in the multilevel darwinist brain
Francisco Bellas, Jose Antonio Becerra, Richard Duro
Abstract:
This paper provides a description and discussion of several experiments carried out with simulated and real agents operating with the Multilevel Darwinist Brain cognitive mechanism, including a two-level memory management system. The agents interacted with real environments, including teachers, and the results show the interplay between the parameters that regulate replacement strategies in both short-term and long-term memories. This type of structure allows the agents to learn autonomously, paying attention to the relevant information and transforming data into knowledge, creating subjective internal representations that can easily be reused or modified to adapt to new situations.
Learning I
ES2006-21
Robust Local Cluster Neural Networks
Ralf Eickhoff, Joaquin Sitte, Ulrich Rückert
Abstract:
Artificial neural networks are intended for use in future nanoelectronics, since their biological counterparts seem to be robust to noise. In this paper, we analyze the robustness of Local Cluster Neural Networks and determine upper bounds on the mean square error for noise-contaminated weights and inputs.
ES2006-24
Topological Correlation
Kevin Doherty, Rod Adams, Neil Davey
Abstract:
Quantifying the success of the topographic preservation achieved with a neural map is difficult. In this paper we present Topological Correlation, Tc, a method that assesses the degree of topographic preservation based on the linear correlation between the topological distances in the neural map and the topological distances in the induced Delaunay triangulation of the network nodes. In contrast to previous indices, Tc has been explicitly devised to assess the topographic success of neural maps composed of many sub-graph structures. The Tc index is bounded and unequivocally identifies a perfect mapping; more importantly, it provides the ability to quantitatively compare less successful mappings. The Tc index has also been used successfully to determine the maximum network size.
ES2006-53
An algorithm for fast and reliable ESOM learning
Mario Nöcker, Fabian Mörchen, Alfred Ultsch
Abstract:
The training of Emergent Self-organizing Maps (ESOM) with large datasets can be a computationally demanding task. Batch learning may be used to speed up training. It is demonstrated here, however, that the representation of clusters in the data space on maps trained with batch learning is poor compared to sequential training. This effect occurs even for very clear cluster structures. The k-batch learning algorithm is preferrable, because it creates the same quality of representation as sequential learning but maintains important properties of batch learning that can be exploited for speedup.
ES2006-16
EM-algorithm for training of state-space models with application to time series prediction
Elia Liitiäinen, Nima Reyhani, Amaury Lendasse
Abstract:
In this paper, an improvement to the E-step of the EM-algorithm for nonlinear state-space models is presented. We also propose strategies for model structure selection when the EM-algorithm and state-space models are used for time series prediction. Experiments on the Poland electricity load benchmark show that the method gives good short-term predictions and can also be used for long-term prediction.
ES2006-77
Time series prediction using DirRec strategy
Antti Sorjamaa, Amaury Lendasse
Abstract:
This paper demonstrates how the selection of the prediction strategy is important in the long-term prediction of time series. Two strategies, called Recursive and Direct, are already used for this purpose. This paper presents a third one, DirRec, which combines the advantages of the two existing ones. A simple k-NN approximation method is used, and all three strategies are applied to two benchmarks: the Santa Fe and Poland Electricity Load time series.
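As a rough illustration of the strategy this abstract describes, the sketch below combines a plain k-NN approximator with a DirRec-style loop: at each horizon step a fresh model is fitted whose regressor is widened with the values predicted at the previous steps. The function names and the simplified training procedure are illustrative assumptions, not the authors' code.

```python
import numpy as np

def knn_predict(X_train, y_train, x, k=3):
    """Plain k-NN regression: average the targets of the k nearest rows."""
    d = np.linalg.norm(X_train - x, axis=1)
    return y_train[np.argsort(d)[:k]].mean()

def embed(series, order):
    """Build (input, target) pairs from a series with a given regressor size."""
    X = np.array([series[i:i + order] for i in range(len(series) - order)])
    y = np.array(series[order:])
    return X, y

def dirrec_forecast(series, order, horizon, k=3):
    """DirRec-style forecast: one model per horizon step, each step's
    regressor widened by the predictions made so far (simplified sketch;
    the paper's exact training procedure may differ)."""
    history = list(series)
    preds = []
    for h in range(horizon):
        X, y = embed(history, order + h)        # regressor grows each step
        x_last = np.array(history[-(order + h):])
        p = knn_predict(X, y, x_last, k)
        preds.append(p)
        history.append(p)                       # prediction feeds next model
    return preds
```

For comparison, the Recursive strategy would reuse the single one-step model at every horizon, and the Direct strategy would fit one model per horizon using observed values only.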
ES2006-30
Consistent estimation of the architecture of multilayer perceptrons
Joseph Rynkiewicz
Abstract:
We consider regression models involving multilayer perceptrons (MLP) with one hidden layer and Gaussian noise. The parameters of the MLP can be estimated by maximizing the likelihood of the model. In this framework, it is difficult to determine the true number of hidden units using an information criterion, like the BIC, because the Fisher information matrix is not invertible if the number of hidden units is overestimated. Indeed, the classical theoretical justification of information criteria relies entirely on the invertibility of this matrix. However, using recent methodology introduced to deal with models with a loss of identifiability, we prove that a suitable information criterion leads to consistent estimation of the true number of hidden units.
ES2006-57
Optimal design of hierarchical wavelet networks for time-series forecasting
Chen Yuehui, Yang Bo, Abraham Ajith
Abstract:
The purpose of this study is to identify the Hierarchical Wavelet Neural Networks (HWNN) and select important input features for each sub-wavelet neural network automatically. Based on the pre-defined instruction/operator sets, a HWNN is created and evolved using tree-structure based Extended Compact Genetic Programming (ECGP), and the parameters are optimized by the Differential Evolution (DE) algorithm. This framework also allows input variable selection. Empirical results on benchmark time-series approximation problems indicate that the proposed method is effective and efficient.
ES2006-85
Recognition of handwritten digits using sparse codes generated by local feature extraction methods
Rebecca Steinert, Martin Rehn, Anders Lansner
Abstract:
We investigate when sparse coding of sensory inputs can improve performance in a classification task. For this purpose, we use a standard data set, the MNIST database of handwritten digits. We systematically study combinations of sparse coding methods and neural classifiers in a two-layer network. We find that processing the image data into a sparse code can indeed improve the classification performance, compared to directly classifying the images. Further, increasing the level of sparseness leads to even better performance, up to a point where the reduction of redundancy in the codes is offset by loss of information.
ES2006-82
Iterative context compilation for visual object recognition
Jens Teichert, Rainer Malaka
Abstract:
This contribution describes an almost parameterless iterative context compilation method that produces feature layers especially suited for mixed bottom-up/top-down association architectures. The context model is simple and enables fast calculation. The resulting structures are invariant to position, scale and rotation of input patterns.
ES2006-134
FPGA implementation of an integrate-and-fire LEGION model for image segmentation
Bernard Girau, Cesar Torres-Huitzil
Abstract:
Despite several previous studies, little progress has been made in building successful neural systems for image segmentation in digital hardware. Spiking neural networks offer an opportunity to develop models of visual perception without any complex structure based on multiple neural maps. Such models use elementary asynchronous computations that have motivated several implementations on analog devices, whereas digital implementations appear unable to handle large spiking neural networks for lack of density. In this work, we consider a model of integrate-and-fire neurons organized according to the standard LEGION architecture to segment grey-level images. Taking advantage of the local and distributed structure of the model, a massively distributed implementation on FPGA using pipelined serial computations is developed. Results show that digital and flexible solutions may efficiently handle large networks of spiking neurons.
ES2006-137
Visual object classification by sparse convolutional neural networks
Alexander Gepperth
Abstract:
A convolutional network (CNN) architecture termed sparse convolutional neural network is proposed and tested on a real-world classification task (car classification). In addition to the usual error function based on the mean squared error (MSE), a penalty term for correlation between hidden layer neurons is introduced with the aim of enforcing a sparse coding of the objects' visual appearance. It is demonstrated that classification accuracies can be improved by this method compared to purely MSE-trained convolutional networks.
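A correlation penalty of the kind this abstract mentions can be sketched in a few lines; the particular form below (sum of squared off-diagonal correlation coefficients of the hidden-unit activations) is an assumed instantiation, since the abstract does not spell out the exact penalty term used in the paper.

```python
import numpy as np

def decorrelation_penalty(H):
    """Penalty on pairwise correlation between hidden-unit activations.
    H: (batch, units) activation matrix. Returns the sum of squared
    off-diagonal correlation coefficients, halved to count each pair once.
    (Assumed form of the penalty; the paper may define it differently.)"""
    Hc = H - H.mean(axis=0)                 # center each unit's activations
    cov = Hc.T @ Hc / len(H)
    std = np.sqrt(np.diag(cov))
    corr = cov / np.outer(std, std)
    off = corr - np.eye(corr.shape[0])      # zero out the diagonal
    return (off ** 2).sum() / 2
```

In training, such a term would be added to the MSE loss with some weight, pushing hidden units toward a decorrelated, sparser code.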
ES2006-133
Modelling switching dynamics using prediction experts operating on distinct wavelet scales
Alexandre Aussem, Pierre Chainais
Abstract:
We present a framework for modelling the switching dynamics of a time series with correlation structures spanning distinct time scales, based on a neural-based multi-expert prediction model. First, an orthogonal wavelet transform is used to decompose the time series into varying levels of temporal resolution so that the underlying temporal structures of the original time series become more tractable. The transitions between the resolution scales are assumed to be governed by a hidden Markov model (HMM). The best state sequence is obtained by the Viterbi algorithm assuming some prior knowledge on the state transition probabilities and energy-dependent observation probabilities. The model achieves a hard segmentation of the time series into distinct dynamical modes and the simultaneous specialization of the prediction experts on the segments. The predictive ability of this strategy is assessed on a synthetic time series.
ES2006-107
Learning for stochastic dynamic programming
Sylvain Gelly, Jérémie Mary, Olivier Teytaud
Abstract:
We present experimental results about learning function values (i.e. Bellman values) in stochastic dynamic programming (SDP). All results come from openDP (opendp.sourceforge.net), a freely available source code, and therefore can be reproduced. The goal is an independent comparison of learning methods in the framework of SDP.
ES2006-54
Adaptive Sensor Modelling and Classification using a Continuous Restricted Boltzmann Machine (CRBM)
Tong Boon Tang, Alan Murray
Abstract:
This paper presents a neural approach to sensor modelling and classification as the basis of local data fusion in a wireless sensor network. Data distributions are non-Gaussian. Data clusters are sufficiently complex that the classification problem is markedly non-linear. We prove that a Continuous Restricted Boltzmann Machine can model complex data distributions and can autocalibrate against real sensor drift. To highlight the adaptation, two trained but subsequently non-adaptive neural classifiers (SLP and MLP) were employed as benchmarks.
ES2006-48
Non-linear gating network for the large scale classification model CombNET-II
Mauricio Kugler, Toshiyuki Miyatani, Susumu Kuroyanagi, Anto Satriyo Nugroho, Akira Iwata
Abstract:
The linear gating classifier (stem network) of the large-scale model CombNET-II has always been the limiting factor restricting the number of expert classifiers (branch networks). The linear boundaries between its clusters cause a rapid decrease in performance as the number of clusters increases and, consequently, impair the overall performance. This work proposes the use of a non-linear classifier to learn the complex boundaries between the clusters, which increases the gating performance while keeping the balanced split of samples produced by the original sequential clustering algorithm. The experiments show that, for some problems, the proposed model outperforms the monolithic classifier.
ES2006-94
Saliency extraction with a distributed spiking neural network
Sylvain Chevallier, Philippe Tarroux, Hélène Paugam-Moisy
Abstract:
We present a distributed spiking neuron network (SNN) for handling low-level visual perception in order to extract salient locations in robot camera images. We describe a new method, stemming from our architectural choices, which reduces the computational load of the whole system. We also describe a model of the post-synaptic potential that allows the contribution of a sum of incoming spikes to a neuron's membrane potential to be computed quickly. The advantages of this saliency extraction method, which differs from classical image processing, are also discussed.
ES2006-100
Connection strategies in neocortical networks
Andreas Herzog, Karsten Kube, Bernd Michaelis, Anna D. de Lima, Thomas Voigt
Abstract:
This study considers the impact of different connection strategies in developing neocortical networks. Adequate connectivity is a requisite for synaptogenesis and for the development of synchronous oscillatory network activity during the maturation of cortical networks. In a defined time window early in development, neurites have to grow out and connect to other neurons. Based on morphological observations, we postulate that the underlying mechanism differs from common unspecific global or small-world strategies. We show here that a displaced local connection mode is a very effective approach to connect neurons at minimal cost.
Feature extraction and variable projection
ES2006-130
Random Forests Feature Selection with K-PLS: Detecting Ischemia from Magnetocardiograms
Long Han, Mark J. Embrechts, Boleslaw Szymanski, Karsten Sternickel, Alexander Ross
Abstract:
Random Forests were introduced by Breiman for feature (variable) selection and improved predictions with decision tree models. The resulting model is often superior to AdaBoost and bagging approaches. In this paper the random forest approach is extended for variable selection with other learning models, in this case Partial Least Squares (PLS) and Kernel Partial Least Squares (K-PLS), to estimate the importance of variables. This variable selection method is demonstrated on two benchmark datasets (Boston Housing and South African heart disease data). Finally, the methodology is applied to magnetocardiogram data for the detection of ischemic heart disease.
ES2006-101
Determination of the Mahalanobis matrix using nonparametric noise estimations
Amaury Lendasse, Francesco Corona, Jin Hao, Nima Reyhani, Michel Verleysen
Abstract:
In this paper, the problem of an optimal transformation of the input space for function approximation problems is addressed. The transformation is defined by determining the Mahalanobis matrix that minimizes the variance of the noise. To compute the variance of the noise, a nonparametric estimator called the Delta Test is used. The proposed approach is illustrated on two different benchmarks.
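The Delta Test this abstract relies on has a compact standard form: the noise variance is estimated from the target differences between each point and its nearest neighbour in input space. The sketch below shows that estimator (a minimal O(N^2) version; how the paper then searches over Mahalanobis matrices is not reproduced here).

```python
import numpy as np

def delta_test(X, y):
    """Delta Test noise-variance estimate:
    delta = 1/(2N) * sum_i (y[nn(i)] - y[i])^2,
    where nn(i) is the nearest neighbour of X[i] in input space."""
    N = len(X)
    total = 0.0
    for i in range(N):
        d = np.linalg.norm(X - X[i], axis=1)
        d[i] = np.inf                        # exclude the point itself
        nn = int(np.argmin(d))
        total += (y[nn] - y[i]) ** 2
    return total / (2 * N)
```

Scaling the inputs by a candidate Mahalanobis-type transform before calling `delta_test`, and keeping the transform with the smallest estimate, gives the selection loop the abstract describes.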
ES2006-105
Bootstrap feature selection in support vector machines for ventricular fibrillation detection
Felipe Alonso Atienza, José Luis Rojo-Álvarez, Gustavo Camps-Valls, Alfredo Rosado Muñoz, Arcadio García Alberola
Abstract:
Support Vector Machines (SVM) for classification are receiving special attention in a number of practical applications. When using nonlinear Mercer kernels, the mapping of the input space to a high-dimensional feature space makes input feature selection a difficult task. In this paper, we propose the use of a nonparametric bootstrap resampling technique to provide a statistical, distribution-independent criterion for input space feature selection. The confidence interval of the difference in error probability between the complete input space and a reduced-by-one-variable input space is estimated via bootstrap resampling. Hence, a backward variable elimination procedure can be stated, removing one variable at each step according to its associated confidence interval. A practical application to early-stage detection of cardiac ventricular fibrillation (VF) is presented. Building on a previous nonlinear analysis based on temporal and spectral VF parameters, we use the SVM with a Gaussian kernel and bootstrap resampling to find the minimum input feature set that still holds the classification performance of the complete data. The use of bootstrap resampling is a powerful input feature selection procedure for SVM classifiers.
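The core statistical step of this abstract, a bootstrap confidence interval for the difference in error probability between two feature sets, can be sketched independently of any SVM. Everything below (function name, the 0/1 per-sample error representation, the 95% level) is an illustrative assumption, not the authors' implementation.

```python
import numpy as np

def bootstrap_error_diff(err_full, err_reduced, n_boot=1000, rng=None):
    """Bootstrap CI for the difference in error rate between the full
    feature set and a set with one feature removed.
    err_full, err_reduced: 0/1 per-sample error indicators obtained by
    two classifiers on the same test samples (illustrative setup)."""
    rng = rng or np.random.default_rng(0)
    n = len(err_full)
    diffs = []
    for _ in range(n_boot):
        idx = rng.integers(0, n, n)          # resample cases with replacement
        diffs.append(err_reduced[idx].mean() - err_full[idx].mean())
    lo, hi = np.percentile(diffs, [2.5, 97.5])
    return lo, hi   # if the interval contains 0, the feature may be dropped
```

Backward elimination then repeatedly removes the variable whose interval most comfortably contains zero, stopping when every remaining removal would significantly increase the error.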
ES2006-50
The permutation test for feature selection by mutual information
Damien Francois, Vincent Wertz, Michel Verleysen
Abstract:
The estimation of mutual information for feature selection is often subject to inaccuracies due to noise, small sample size, a bad choice of parameters for the estimator, etc. The choice of a threshold above which a feature will be considered useful is thus difficult to make. Therefore, the use of the permutation test to assess the reliability of the estimation is proposed. The permutation test allows performing a non-parametric hypothesis test to select the relevant features and to build a Feature Relevance Diagram that visually synthesizes the result of the test.
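The test this abstract proposes is straightforward to sketch: permuting the target destroys any real feature-target dependence, so repeated permutations give a null distribution of mutual information estimates, and a feature is kept only if its observed estimate exceeds a high quantile of that distribution. The histogram MI estimator and the parameter values below are illustrative choices, not the paper's.

```python
import numpy as np

def mutual_information(x, y, bins=8):
    """Plug-in histogram estimate of mutual information between two 1-D samples."""
    pxy, _, _ = np.histogram2d(x, y, bins=bins)
    pxy /= pxy.sum()
    px = pxy.sum(axis=1, keepdims=True)
    py = pxy.sum(axis=0, keepdims=True)
    nz = pxy > 0
    return (pxy[nz] * np.log(pxy[nz] / (px @ py)[nz])).sum()

def permutation_test(x, y, n_perm=200, alpha=0.05, rng=None):
    """Relevance decision: keep the feature if its MI exceeds the
    (1 - alpha) quantile of MI values computed on permuted targets."""
    rng = rng or np.random.default_rng(0)
    observed = mutual_information(x, y)
    null = [mutual_information(x, rng.permutation(y)) for _ in range(n_perm)]
    threshold = np.quantile(null, 1 - alpha)
    return observed, threshold, observed > threshold
```

Plotting each feature's observed MI against its permutation threshold yields exactly the kind of Feature Relevance Diagram the abstract mentions.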
ES2006-74
Stochastic Processes for Canonical Correlation Analysis
Colin Fyfe, Gayle Leen
Abstract:
We consider two stochastic process methods for performing canonical correlation analysis (CCA). The first uses a Gaussian Process formulation of regression in which we use the current projection of one data set as the target for the other and then repeat in the opposite direction. The second uses a Dirichlet process of Gaussian models where the Gaussian models are determined by Probabilistic CCA \cite{jordan:bach}. The latter method is more computationally intensive but has the advantages of non-parametric approaches.
Visualization methods for data mining
ES2006-3
Visual Data Mining and Machine Learning
Fabrice Rossi
Abstract:
Information visualization and visual data mining leverage the human visual system to provide insight and understanding of unorganized data. In order to scale to massive sets of high dimensional data, simplification methods are needed, so as to select important dimensions and objects. Some machine learning algorithms try to solve those problems. We give in this paper an overview of information visualization and survey the links between this field and machine learning.
ES2006-97
Sanger-driven MDSLocalize - a comparative study for genomic data
Marc Strickert, Nese Sreenivasulu, Udo Seiffert
Abstract:
Multidimensional scaling (MDS) methods are designed to establish a one-to-one correspondence of input-output relationships. While the input may be given as high-dimensional data items or as an adjacency matrix characterizing data relations, the output space is usually chosen as low-dimensional Euclidean, ready for visualization. MDSLocalize, an existing method, is reformulated in terms of Sanger's rule, which replaces the original foundation of computationally costly singular value decomposition. The derived method is compared to the recently proposed high-throughput multi-dimensional scaling (HiT-MDS) and to the well-established XGvis system. For comparison, real-valued gene expression data and corresponding DNA sequences, given as proximity data, are considered.
ES2006-34
Visualizing the trustworthiness of a projection
Michaël Aupetit
Abstract:
The visualization of continuous multi-dimensional data based on their projection into a 2-dimensional space is a way to detect visually interesting patterns, as long as the projection provides a faithful image of the original data. We propose to visualize, directly in the projection space, how much the neighborhood has been preserved during the projection. We color the Voronoi cells associated with the segments of the Delaunay graph of the projections according to their stretching or compression. We demonstrate these techniques with Principal Component Analysis and Curvilinear Component Analysis applied to different databases.
ES2006-138
Data topology visualization for the Self-Organizing Map
Kadim Tasdemir, Erzsebet Merenyi
Abstract:
The Self-Organizing Map (SOM), a powerful method for data mining and cluster extraction, is very useful for processing high-dimensional and complex data. Visualization methods present different aspects of the information learned by the SOM to provide more insight into the data. In this work, we propose a new visualization scheme that represents the data topology superimposed on the SOM grid, and we show how it helps in the discovery of data structure.
ES2006-26
Visual nonlinear discriminant analysis for classifier design
Tomoharu Iwata, Kazumi Saito, Naonori Ueda
Abstract:
We present a new method for analyzing classifiers by visualization, which we call visual nonlinear discriminant analysis. Classifiers that output posterior probabilities are visualized by embedding samples and classes so as to approximate posteriors using parametric embedding. The visualization provides a better intuitive understanding of such classifier characteristics as separability and generalization ability than conventional methods. We evaluate our method by visualizing classifiers for artificial and real data sets.
ES2006-78
Outlier identification with the Harmonic Topographic Mapping
Marian Pena, Colin Fyfe
Abstract:
We review two versions of a topology preserving algorithm, one of which we had previously found to be more successful in defining smooth manifolds and tight clusters. In the context of outlier detection, however, the other is shown to be more successful. We show that, by using local kernels for the calculation of responsibilities, the first one can also be used in this manner.
ES2006-155
A new hyperbolic visualization method for displaying the results of a neural gas model: application to Webometrics
Shadi Al Shehabi, Jean-Charles Lamirel
Abstract:
The core model considered in this paper is the neural gas model. The paper proposes an original hyperbolic visualization approach suitable for the results of such a model. The main principle of this approach is to use a hierarchical algorithm to summarize the gas contents in the form of a hypertree in which information on data density, issued from the original neuron (i.e. class) description space, is preserved. An application of this approach to a dataset of websites from European universities is presented in order to demonstrate its accuracy.
Semi-blind approaches for Source Separation and Independent Component Analysis (ICA)
ES2006-2
Semi-Blind Approaches for Source Separation and Independent component Analysis
Massoud Babaie-Zadeh, Christian Jutten
Abstract:
This paper is a survey of semi-blind source separation approaches. Since Gaussian iid signals are not separable, the simplest priors assume non-Gaussian iid signals, or Gaussian non-iid signals. Other priors can also be used, for instance discrete or bounded sources, positivity, etc. Although providing a generic framework for semi-blind source separation, Sparse Component Analysis and Bayesian ICA are only sketched in this paper, since two other survey papers develop these approaches in depth.
ES2006-154
Bayesian source separation: beyond PCA and ICA
Ali Mohammad-Djafari
Abstract:
Blind source separation (BSS) has become one of the major signal and image processing areas in many applications. Principal component analysis (PCA) and independent component analysis (ICA) have become the two main classical approaches to this problem. However, these two approaches have their limits, mainly the assumptions that the data are temporally iid and that the model is exact (no noise). In this paper, we first show that the Bayesian inference framework makes it possible to go beyond these limits while obtaining PCA and ICA algorithms as particular cases. Then, we propose different a priori models for sources which progressively account for different properties of the sources. Finally, we illustrate the application of these different models in spectrometry, astrophysical imaging, satellite imaging and hyperspectral imaging.
ES2006-157
A survey of Sparse Component Analysis for blind source separation: principles, perspectives, and new challenges
Rémi Gribonval, Sylvain Lesage
Abstract:
In this survey, we highlight the appealing features and challenges of Sparse Component Analysis (SCA) for blind source separation (BSS). SCA is a simple yet powerful framework to separate several sources from few sensors, even when the independence assumption is dropped. So far, SCA has been most successfully applied when the sources can be represented sparsely in a given basis, but many other potential uses of SCA remain unexplored. Among other challenging perspectives, we discuss how SCA could be used to exploit both the spatial diversity corresponding to the mixing process and the morphological diversity between sources to unmix even underdetermined convolutive mixtures. This raises several challenges, including the design of both provably good and numerically efficient algorithms for large-scale sparse approximation with overcomplete signal dictionaries.
ES2006-62
Source separation with priors on the power spectrum of the sources
Jorge Igual, Raul Llinares, Andres Camacho
Abstract:
A general approach introducing priors on the correlation function, or equivalently the power spectrum, of the sources in the Blind Source Separation problem is presented. This prior modifies or constrains the contrast function that measures the independence of the recovered signals, depending on its characteristics. Considering the case where the priors correspond to the sources that we are interested in recovering, the deflation approach is stated. This formulation is especially useful for large-dimension problems where the ancillary sources do not need to be estimated. We show its application to the biomedical problem of extracting the atrial activity from atrial fibrillation episodes, where discriminant information about the frequency content of the atrial activity with respect to the other components is available in advance.
ES2006-153
A time-scale correlation-based blind separation method applicable to correlated sources
Yannick Deville, Dass Bissessur, Matthieu Puigt, Shahram Hosseini, Hervé Carfantan
Abstract:
We first propose a correlation-based blind source separation (BSS) method based on time-scale (TS) representations of the observed signals. This approach consists in identifying the columns of the (permuted scaled) mixing matrix in TS zones where this method detects that a single source is active. It thus sets very limited constraints on the sparsity of the sources in the TS domain. Both the detection and identification stages of this approach use local correlation parameters of the TS transforms of the observed signals. This BSS method, called TISCORR (for TIme-Scale CORRelation-based BSS), is an extension of our previous two temporal and time-frequency versions of this class of methods. Our second contribution in this paper consists in proving that all three approaches apply if the (transformed) source signals are linearly independent, thus allowing them to be correlated. This extends our previous demonstration, which only guaranteed our previous two approaches to be applicable to uncorrelated sources. Experimental tests show that our TISCORR method achieves good separation for linear instantaneous mixtures of real, correlated or uncorrelated, speech signals (output SIRs are above 40 dB).
ES2006-72
Independent dynamics subspace analysis
Alexander Ilin
Abstract:
The paper presents an algorithm for identifying the independent subspace analysis model based on source dynamics. We propose to separate subspaces by decoupling their dynamic models. Each subspace is extracted by minimizing the prediction error given by a first-order nonlinear autoregressive model. The learning rules are derived from a cost function and implemented in the framework of denoising source separation.
ES2006-103
Non-orthogonal Support Width ICA
John A. Lee, Frédéric Vrins, Michel Verleysen
Abstract:
Independent Component Analysis (ICA) is a powerful tool with applications in many areas of blind signal processing; however, its key assumption, i.e. the statistical independence of the source signals, can be somewhat restricting in some particular cases. For example, when considering several images, it is tempting to look on them as independent sources (the picture subjects are different), although they may actually be highly correlated (subjects are similar). Pictures of several landscapes (or faces) fall in this category. How to separate mixtures of such pictures? This paper proposes an ICA algorithm that can tackle this apparently paradoxical problem. Experiments with mixtures of real images demonstrate the soundness of the approach.
ES2006-156
Hierarchical markovian models for joint classification, segmentation and data reduction of hyperspectral images
Nadia Bali, Ali Mohammad-Djafari, Adel Mohammadpour
Abstract:
Spectral classification, segmentation and data reduction are the three main problems in hyperspectral image analysis. In this paper we propose a Bayesian estimation approach which tries to solve these three problems jointly. The data reduction problem is modeled as blind source separation (BSS), where the data are the m hyperspectral images and the sources are the n < m images which must be mutually the most independent and piecewise homogeneous. To ensure these properties, we propose a hierarchical model for the sources with a common hidden classification variable which is modelled via a Potts Markov field. The joint Bayesian estimation of this hidden variable, as well as of the sources and the mixing matrix of the BSS problem, gives a solution to all three problems of spectral classification, segmentation and data reduction of hyperspectral images. An appropriate Gibbs Sampling (GS) algorithm is proposed for the Bayesian computation, and a few simulation results are given to illustrate the performance of the proposed method, together with a comparison with the classical PCA and ICA methods used for BSS.
ES2006-20
A simple idea to separate convolutive mixtures in an undetermined scenario
Maciej Pedzisz, Ali Mansour
Abstract:
We consider a blind separation problem for undetermined mixtures of two BPSK signals in a multi-path fading channel. We use the independence and frequency diversity of the two source signals to identify the mixture parameters, estimate the Pulse Shaping Filters (PSF) and channel responses, and extract both binary sequences from only one observation. The presented method uses a gradient descent algorithm to directly adapt the symbols, which are then used as a feedback sequence for PSF roll-off factor identification as well as for channel equalization.
ES2006-68
FastISA: A fast fixed-point algorithm for independent subspace analysis
Aapo Hyvärinen, Urs Köster
Abstract:
Independent Subspace Analysis (ISA; Hyvarinen & Hoyer, 2000) is an extension of ICA. In ISA, the components are divided into subspaces, so that components in different subspaces are assumed independent, whereas components in the same subspace have dependencies. In this paper we describe a fast fixed-point algorithm for ISA estimation, analogous to FastICA. In particular we give a proof of the quadratic convergence of the algorithm, and present simulations to confirm the fast convergence.
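FastISA itself is not spelled out in the abstract; as background, the one-unit FastICA fixed-point step it generalizes to subspaces can be sketched as follows (tanh nonlinearity; all names illustrative):

```python
import numpy as np

def fastica_one_unit(Z, n_iter=200, seed=0):
    """One-unit FastICA fixed-point iteration (tanh nonlinearity).
    Z is whitened data, shape (samples, features)."""
    rng = np.random.default_rng(seed)
    w = rng.normal(size=Z.shape[1])
    w /= np.linalg.norm(w)
    for _ in range(n_iter):
        y = Z @ w
        g = np.tanh(y)                           # nonlinearity
        g_prime = 1.0 - g ** 2                   # its derivative
        # fixed-point update: E[x g(w.x)] - E[g'(w.x)] w, then renormalize
        w_new = (Z * g[:, None]).mean(axis=0) - g_prime.mean() * w
        w_new /= np.linalg.norm(w_new)
        if abs(w_new @ w) > 1.0 - 1e-10:         # converged up to sign
            return w_new
        w = w_new
    return w
```

Roughly speaking, FastISA keeps this fixed-point-plus-renormalization structure but applies the nonlinearity to the norm of a projection onto a whole subspace rather than to a single scalar component.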
ES2006-115
Discriminacy of the minimum range approach to blind separation of bounded sources
Dinh-Tuan Pham, Frédéric Vrins
Abstract:
The Blind Source Separation (BSS) problem is often solved by maximizing objective functions reflecting the statistical dependency between outputs. Since global maximization may be difficult without exhaustive search, criteria for which it can be proved that all the local maxima correspond to an acceptable solution of the BSS problem have been developed. These criteria are used in a deflation procedure. This paper shows that the "spurious maximum free" property still holds for the minimum range approach when the sources are extracted simultaneously.
Learning II
ES2006-148
Entropy-based principle and generalized contingency tables
Vincent Vigneron
Abstract:
It is well known that the entropy-based concept of mutual information provides a measure of dependence between two discrete random variables. There are several ways to normalize this measure in order to obtain a coefficient similar, e.g., to Pearson's coefficient of contingency. This paper presents a measure of independence between categorical variables and applies it to the clustering of multidimensional contingency tables. We propose and study a class of measures of directed discrepancy. Two factors make our divergence function attractive: first, with this coefficient we obtain a framework in which a Bregman divergence can be used as the objective function; second, we allow the specification of a larger class of constraints that preserves various statistics.
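The paper's own coefficient is not given in the abstract; one standard member of this family, mutual information normalized by the geometric mean of the marginal entropies, can be sketched as follows (function name illustrative):

```python
import numpy as np

def normalized_mutual_information(table):
    """Mutual information of a contingency table, normalised by the
    geometric mean of the marginal entropies; lies in [0, 1]."""
    p = np.asarray(table, dtype=float)
    p = p / p.sum()                              # joint distribution
    px, py = p.sum(axis=1), p.sum(axis=0)        # marginals

    def entropy(q):
        q = q[q > 0]                             # 0 log 0 taken as 0
        return -(q * np.log(q)).sum()

    mask = p > 0
    mi = (p[mask] * np.log(p[mask] / np.outer(px, py)[mask])).sum()
    return mi / np.sqrt(entropy(px) * entropy(py))
```

The coefficient is 0 for an independent table and 1 for a diagonal (perfectly associated) one, playing the same role as Pearson's contingency coefficient.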
ES2006-46
On the selection of hidden neurons with heuristic search strategies for approximation
Ignacio Barrio, Enrique Romero, Lluís Belanche
Abstract:
Feature Selection techniques usually follow some search strategy to select a suitable subset from a set of features. Most neural network growing algorithms perform a search with Forward Selection with the objective of finding a reasonably good subset of neurons. Using this link between both fields (feature selection and neuron selection), we propose and analyze different algorithms for the construction of neural networks based on heuristic search strategies coming from the feature selection field. The results of an experimental comparison to Forward Selection using both synthetic and real data show that a much better approximation can be achieved, though at the expense of a higher computational cost.
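The Forward Selection baseline the paper compares against can be sketched as a greedy least-squares search over candidate hidden-unit outputs (a minimal sketch; all names are illustrative):

```python
import numpy as np

def forward_selection(Phi, y, k):
    """Greedy Forward Selection: pick k columns of Phi (candidate
    hidden-unit outputs) that minimise the least-squares error."""
    selected, remaining = [], list(range(Phi.shape[1]))
    for _ in range(k):
        best_j, best_err = None, np.inf
        for j in remaining:
            cols = Phi[:, selected + [j]]        # try adding neuron j
            w = np.linalg.lstsq(cols, y, rcond=None)[0]
            err = ((cols @ w - y) ** 2).sum()
            if err < best_err:
                best_j, best_err = j, err
        selected.append(best_j)
        remaining.remove(best_j)
    return selected
```

The heuristic strategies studied in the paper replace this purely greedy loop with richer search procedures borrowed from the feature selection literature.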
ES2006-71
Lag selection for regression models using high-dimensional mutual information
Geoffroy Simon, Michel Verleysen
Abstract:
Mutual information may be used to select the embedding lag of a time series. However, this lag selection is usually limited to the analysis of the mutual information between a pair of lagged values in the series. In this paper, generalized mutual information estimators are proposed to take into account more than two variables in the lag selection. Experimental results show that lag selection using mutual information should also take into account the output of the regression model.
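The generalized multi-variable estimators are the paper's contribution; the standard pairwise criterion they extend can be sketched with a simple histogram estimate of mutual information (bin count and names illustrative):

```python
import numpy as np

def mutual_information(x, y, bins=16):
    """Histogram estimate of the mutual information I(x; y), in nats."""
    pxy, _, _ = np.histogram2d(x, y, bins=bins)
    pxy = pxy / pxy.sum()                        # joint distribution
    px, py = pxy.sum(axis=1), pxy.sum(axis=0)    # marginals
    mask = pxy > 0
    return (pxy[mask] * np.log(pxy[mask] / np.outer(px, py)[mask])).sum()

def best_lag(series, max_lag):
    """Pairwise criterion: the lag whose delayed copy shares the most
    information with the series."""
    scores = [mutual_information(series[:-k], series[k:])
              for k in range(1, max_lag + 1)]
    return 1 + int(np.argmax(scores))
```

This pairwise score only looks at one lagged pair at a time, which is exactly the limitation the paper addresses with higher-dimensional estimators.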
ES2006-99
Learning what is important: feature selection and rule extraction in a virtual course
Terence Etchells, Angela Nebot, Alfredo Vellido, Paulo Lisboa, Francisco Mugica
Abstract:
Virtual campus environments are becoming a mainstream alternative to traditional distance higher education. The Internet medium they use allows the gathering of information on students’ usage behaviour. The knowledge extracted from this information can be fed back to the e-learning environment to ease advisors’ workload. In this context, two problems are addressed in the current study: finding which usage features are best at predicting online students’ marks, and explaining mark prediction in the form of parsimonious and interpretable rules. To that effect, two methods are used: Fuzzy Inductive Reasoning (FIR) for feature selection and Orthogonal Search-Based Rule Extraction (OSRE). Experiments carried out on the available data indicate that students’ marks can be accurately predicted and that a small subset of variables explains the accuracy of such prediction, which can be described through a set of simple and actionable rules.
ES2006-43
Data mining techniques for feature selection in blood cell recognition
Tomasz Markiewicz, Stanislaw Osowski
Abstract:
The paper presents and compares data mining techniques for the selection of diagnostic features in the problem of blood cell recognition in leukaemia. Different techniques are compared: linear SVM ranking, correlation analysis, and statistical analysis of the centers and variances of the clusters corresponding to the classes. The applied classifier network is a Support Vector Machine with a radial kernel. The results of the recognition of 10 classes of cells are presented and discussed.
ES2006-104
A Gaussian process latent variable model formulation of canonical correlation analysis
Gayle Leen, Colin Fyfe
Abstract:
We investigate a nonparametric model with which to visualize the relationship between two datasets. We base our model on the Gaussian Process Latent Variable Model (GPLVM) [1],[7], a probabilistically defined latent variable model which takes the alternative approach of marginalizing the parameters and optimizing the latent variables. We optimize a latent variable set for each dataset so as to preserve the correlations between the datasets, resulting in a GPLVM formulation of canonical correlation analysis which can be nonlinearised by the choice of covariance function.
ES2006-35
Designing neural network committees by combining boosting ensembles
Vanessa Gómez-Verdejo, Anibal R. Figueiras-Vidal
Abstract:
Constructing modified Real Adaboost ensembles by placing weighted emphasis on erroneous and critical (near the classification boundary) samples has been shown to lead to improved designs, both in performance and in ensemble size. In this paper, we propose to take advantage of the diversity among different weighted combinations to build committees of modified Real Adaboost designs. Experiments show that the expected improvements are obtained.
ES2006-91
Using Regression Error Characteristic Curves for Model Selection in Ensembles of Neural Networks
Aloisio Carlos de Pina, Gerson Zaverucha
Abstract:
Regression Error Characteristic (REC) analysis is a technique for the evaluation and comparison of regression models that facilitates the visualization of the performance of many regression functions simultaneously in a single graph. This work presents a new approach to model selection in ensembles of Neural Networks: we propose using REC curves to select a good threshold value, so that only residuals greater than that value are counted as errors. The algorithm was empirically evaluated and its results were also analyzed by means of REC curves.
ES2006-113
Diversity creation in local search for the evolution of neural network ensembles
Pete Duell, Iris Fermin, Xin Yao
Abstract:
The EENCL algorithm [1] automatically designs neural network ensembles for classification, combining global evolution with local search based on gradient descent. Two mechanisms encourage diversity: Negative Correlation Learning (NCL) and implicit fitness sharing. This paper analyses EENCL, finding that NCL is not an essential component of the algorithm, while implicit fitness sharing is. Furthermore, we find that a local search based on independent training is equally effective in both accuracy and diversity. We propose that NCL is unnecessary in EENCL for the tested datasets, and that complementary diversity in local search and global evolution may lead to better ensembles.
ES2006-67
Immune Network based Ensembles
Nicolás García-Pedrajas, Colin Fyfe
Abstract:
This paper presents a new method for constructing ensembles of classifiers based on Immune Network Theory, one of the most interesting paradigms within the field of Artificial Immune Systems. Ensembles of classifiers are a very interesting alternative to single classifiers when facing difficult problems. In general, ensembles are able to achieve better performance in terms of learning and generalization error. We construct an Immune Network that constitutes an ensemble of classifiers. Using a neural network as the base classifier, we have compared the performance of this ensemble with five standard methods of ensemble construction. The comparison is made using 35 real-world classification problems from the UCI Machine Learning Repository. The results show a general advantage of the proposed model over the standard methods.
ES2006-128
Classification by means of Evolutionary Response Surfaces
Rafael del Castillo-Gomariz, Nicolás García-Pedrajas
Abstract:
Response surfaces are a powerful tool for both classification and regression, as they can model many different phenomena and construct complex boundaries between classes. Nevertheless, the absence of efficient methods for obtaining manageable response surfaces for real-world problems, due to the large number of terms needed, greatly undermines their applicability. In this paper we propose the use of real-coded genetic algorithms to overcome these limitations. We apply the evolved response surfaces to two-class classification. The proposed algorithm selects a model of minimum dimensionality, improving the robustness and generalisation ability of the obtained classifier. The algorithm uses a dual codification (real and binary) and specific operators adapted from the standard operators for real-coded genetic algorithms. The fitness function considers the classification error and a regularisation term that takes into account the number of terms in the model. The results obtained on 10 real-world classification problems from the UCI Machine Learning Repository are comparable to those of well-known classification algorithms, with a more interpretable polynomial function.
ES2006-36
Hierarchical analysis of GSM network performance data
Mikko Multanen, Kimmo Raivio, Pasi Lehtimäki
Abstract:
In this study, a method for the hierarchical examination and visualization of GSM data using the Self-Organizing Map (SOM) is described. The data is examined in a few phases: at first, temporally averaged data is used; then, in each phase, some of the data is discarded and the rest is examined in more detail. The SOM is used both for clustering and for visualization. The actual clustering is performed on the nodes of the SOM to lower the computational cost and to help better understand the properties of the clusters.
ES2006-125
Learning with monotonicity requirements for optimal routing with end-to-end quality of service constraints
Antoine Mahul, Alexandre Aussem
Abstract:
In this paper, we adapt the classical learning algorithm for feed-forward neural networks to the case where monotonicity is required in the input-output mapping. Monotonicity can be imposed by adding suitable penalization terms to the error function. This yields a computationally efficient algorithm with little overhead compared to back-propagation. The algorithm is used to train neural networks for delay evaluation in an optimization scheme for optimal routing in a communication network.
Biologically inspired models
ES2006-10
Evolving multi-segment 'super-lamprey' CPG's for increased swimming control
Leena Patel, Alan Murray, John Hallam
Abstract:
'Super-lamprey' swimmers which operate over a greater control range are evolved. Propulsion in the lamprey, an eel-like fish, is governed by activity in its spinal neural network. This CPG is simulated, in accordance with Ekeberg's model, and then optimised alternatives are generated with genetic algorithms. Extending our prior lamprey work on single segment oscillators to multiple segments (including interaction with a mechanical model) demonstrates that Ekeberg's CPG is not a unique solution and that simpler versions with wider operative ranges can be generated. This work 'out-evolves' nature as an initial step in understanding how to control wave power devices, with similar motion to the lamprey.
ES2006-140
Exploring the role of intrinsic plasticity for the learning of sensory representations
Nicholas Butko, Jochen Triesch
Abstract:
Intrinsic plasticity (IP) refers to a neuron's ability to regulate its firing activity by adapting its intrinsic excitability. Previously, we showed that model neurons combining IP with Hebbian synaptic plasticity can adapt their weight vector to discover heavy-tailed directions in the input space. In this paper we consider networks of coupled model neurons and show how a population of such units can solve a standard non-linear ICA problem. We also present a simple model for the formation of maps of oriented receptive fields in primary visual cortex. Together, our results indicate that intrinsic plasticity may play an important role for learning efficient representations in populations of cortical neurons.
Kernel methods
ES2006-75
LS-SVM functional network for time series prediction
Tuomas Kärnä, Fabrice Rossi, Amaury Lendasse
Abstract:
Time series prediction is usually done with regularly sampled data. In practice, however, the available data may be irregularly sampled, in which case the conventional prediction methods cannot be used. One solution is Functional Data Analysis (FDA), in which an interpolating function is fitted to the data and the fitting coefficients are analyzed instead of the original data points. In this paper, we propose a functional approach to time series prediction: a Radial Basis Function Network (RBFN) is used for the interpolation, the interpolation parameters are optimized with a k-Nearest Neighbors (k-NN) model, and a Least Squares Support Vector Machine (LS-SVM) is used for the prediction.
ES2006-61
Synthesis of maximum margin and multiview learning using unlabeled data
Sandor Szedmak, John Shawe-Taylor
Abstract:
In this presentation we show that semi-supervised learning with two input sources can be transformed into a maximum margin problem similar to a binary SVM. Our formulation exploits the unlabeled data to reduce the complexity of the class of learning functions. To measure how the complexity is decreased, we use Rademacher Complexity Theory. The corresponding optimization problem is convex and is efficiently solvable for large-scale applications as well.
ES2006-116
Efficient Forward Regression with Marginal Likelihood
Ping Sun, Xin Yao
Abstract:
We propose an efficient forward regression algorithm based on greedy optimization of the marginal likelihood. It can be understood as a forward selection procedure which, at each step, adds the new basis vector yielding the largest increase in the marginal likelihood. The computational cost of our algorithm is linear in the number $n$ of training examples and quadratic in the number $k$ of selected basis vectors, i.e. $\mathcal{O}(nk^2)$. Moreover, our approach only needs to store a small fraction of the columns of the full design matrix. We compare our algorithm with the well-known Relevance Vector Machine (RVM), which also optimizes the marginal likelihood iteratively. The results show that our algorithm achieves comparable prediction accuracy but with significantly better scaling in terms of both computational cost and memory requirements.
Nonlinear dynamics
ES2006-6
Nonlinear dynamics in neural computation
Tjeerd olde Scheper, Nigel Crook
Abstract:
This tutorial reports on the use of nonlinear dynamics in several different models of neural systems. We discuss a number of distinct approaches to neural information processing based on nonlinear dynamics. The models we consider combine controlled chaotic models with phenomenological models of spiking mechanisms as well as using weakly chaotic systems. The recent work of several major researchers in this field is briefly introduced.
ES2006-136
Dynamical reservoir properties as network effects
Carlos Lourenço
Abstract:
It has been proposed that chaos can serve as a reservoir providing an infinite number of dynamical states. These can be interpreted as different behaviors, search actions or computational states which are selectively adequate for different tasks. The high flexibility of chaotic regimes has been noted, as well as other advantages over regular regimes. However, the model neurons used to demonstrate these ideas could be criticized as lacking physical or biological realism. In the present paper we show that the same kind of rich behavior displayed by the toy models can be found with a more realistic neural model [6]. Furthermore, much of the complex behavior arises from network properties often overlooked in the literature.
ES2006-29
Nonlinear transient computation and variable noise tolerance
Nigel Crook
Abstract:
A novel nonlinear transient computation device is presented, designed to perform computations on multiple spike-train input signals. The input signals perturb the internal dynamic state of the device in a way that is characteristic of the input signal presented in each case. These characteristics are reflected in the output spike train of the device. Experimental evidence presented in this paper shows that this output spike train is both a noise-tolerant and a noise-sensitive response to the input signal presented.
ES2006-149
Cultures of dissociated neurons display a variety of avalanche behaviours
Roberta Alessio, Laura Cozzi, Vittorio Sanguineti
Abstract:
Avalanche dynamics has been described in organotypic cultures and acute slices from rat cortex. Its distinctive feature is a statistical distribution of avalanche size and duration following a power law with specific exponents, corresponding to a near-critical state. We asked whether the same dynamics is present in dissociated cultures from rat embryos, which are characterized by a complete lack of anatomical structure and by high, random synaptic connectivity. We indeed observed such dynamics in some, but not all, experimental preparations. We conclude that the variability found in the dynamics of dissociated cultures also affects general features such as the criticality of avalanche behavior.
Neural Networks and Machine Learning in Bioinformatics - Theory and Applications
ES2006-7
Neural networks and machine learning in bioinformatics - theory and applications
Udo Seiffert, Barbara Hammer, Samuel Kaski, Thomas Villmann
Abstract:
Bioinformatics is a promising and innovative research field. Despite a high number of techniques specifically dedicated to bioinformatics problems, as well as many successful applications, we are only at the beginning of a process of massively integrating the aspects and experiences of the different core subjects, such as biology, medicine, computer science, engineering, chemistry, physics, and mathematics. Within this rather wide area, we focus on neural network and machine learning related approaches in bioinformatics, with particular emphasis on integrative research against the background of the above-mentioned scope.
ES2006-38
Using sampling methods to improve binding site predictions
Yi Sun, Mark Robinson, Rod Adams, Rene te Boeckhorst, Alistair G. Rust, Neil Davey
Abstract:
Currently, the best algorithms for transcription factor binding site prediction are severely limited in accuracy. In previous work, we combined random selection under-sampling with the SMOTE over-sampling technique, working with several classification algorithms from the machine learning field to integrate binding site predictions. In this paper, we improve the classification results with the aid of Tomek links, used either as an under-sampling technique or to remove further noisy data after sampling.
ES2006-28
Margin based Active Learning for LVQ Networks
Frank-Michael Schleif, Barbara Hammer, Thomas Villmann
Abstract:
In this article, we extend a local prototype-based learning model by active learning, which gives the learner the capability to select training samples and thereby increase speed and accuracy of the model. Our algorithm is based on the idea of selecting a query on the borderline of the actual classification. This can be done by considering margins in an extension of learning vector quantization based on an appropriate cost function. The performance of the query algorithm is demonstrated on real life data.
ES2006-129
Classification of Boar Sperm Head Images using Learning Vector Quantization
Michael Biehl, Piter Pasma, Marten Pijl, Lidia Sanchez, Nicolai Petkov
Abstract:
We apply Learning Vector Quantization (LVQ) in the domain of medical image analysis for automated boar semen quality assessment. The classification of single boar spermatozoid heads into healthy (normal) and damaged (non-normal) ones is based on greyscale microscopic images. Sample data was classified by veterinary experts and is used for training a system with a number of prototypes for each class. We apply as training schemes Kohonen's LVQ1 and the recent variants Generalized LVQ (GLVQ) and Generalized Relevance LVQ (GRLVQ). We compare their performance and furthermore study the influence of the employed metric.
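Kohonen's LVQ1, the baseline scheme in this comparison, moves the winning prototype towards a correctly classified sample and away from a misclassified one. A minimal sketch (function names, learning rate, and epoch count are illustrative, not the authors' implementation):

```python
import math

def lvq1_train(samples, labels, prototypes, proto_labels, lr=0.1, epochs=20):
    """LVQ1 sketch: find the nearest prototype (winner) for each sample,
    attract it if its label matches, repel it otherwise."""
    W = [list(p) for p in prototypes]
    for _ in range(epochs):
        for x, y in zip(samples, labels):
            dists = [math.dist(w, x) for w in W]
            k = dists.index(min(dists))              # winner prototype
            sign = 1.0 if proto_labels[k] == y else -1.0
            W[k] = [w + sign * lr * (xi - w) for w, xi in zip(W[k], x)]
    return W

def lvq_classify(W, proto_labels, x):
    """Assign x the label of its nearest prototype."""
    dists = [math.dist(w, x) for w in W]
    return proto_labels[dists.index(min(dists))]
```

The GLVQ and GRLVQ variants mentioned above replace this heuristic update with gradients of an explicit cost function, and GRLVQ additionally adapts the metric.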
ES2006-33
Selection of more than one gene at a time for cancer prediction from gene expression data
Oleg Okun, Nikolay Zagoruiko, Alexessander Alves, Olga Kutnenko, Irina Borisova
Abstract:
A new gene selection method capable of selecting more than one gene at a time is introduced. This characteristic contrasts it with almost all known methods, which assume that there are no interactions between genes. The only exception is the pairwise gene selection method recently proposed by B{\o} and Jonassen~\cite{bj02}. Motivated by this method, we compare it with ours. Classification into healthy tissue and cancerous tumor is studied, where gene selection finds gene subsets well suited for discriminating between these classes. Experiments demonstrate the superiority of our method in terms of leave-one-out cross-validation error.
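The leave-one-out cross-validation error used as the comparison criterion can be sketched generically; the nearest-mean classifier below is a stand-in for the actual classifiers applied to the selected gene subsets:

```python
import math

def loocv_error(samples, labels, classify):
    """Leave-one-out cross-validation: hold out each sample in turn,
    train on the remainder, and count misclassifications."""
    errors = 0
    for i in range(len(samples)):
        train_x = samples[:i] + samples[i + 1:]
        train_y = labels[:i] + labels[i + 1:]
        errors += int(classify(train_x, train_y, samples[i]) != labels[i])
    return errors / len(samples)

def nearest_mean(train_x, train_y, x):
    """Toy classifier: assign x to the class whose mean is closest."""
    best, best_d = None, float("inf")
    for c in set(train_y):
        pts = [p for p, y in zip(train_x, train_y) if y == c]
        mean = [sum(col) / len(pts) for col in zip(*pts)]
        d = math.dist(mean, x)
        if d < best_d:
            best, best_d = c, d
    return best
```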
ES2006-69
Visualizing gene interaction graphs with local multidimensional scaling
Jarkko Venna, Samuel Kaski
Abstract:
Several bioinformatics data sets are naturally represented as graphs, for instance gene regulation, metabolic pathways, and protein-protein interactions. The graphs are often large and complex, and their straightforward visualizations are incomprehensible. We have recently developed a new method called local multidimensional scaling for visualizing high-dimensional data sets. In this paper we adapt it to visualize graphs, and compare it with two commonly used graph visualization packages in visualizing yeast gene interaction graphs. The new method outperforms the alternatives in two crucial respects: it produces graph layouts that are more trustworthy and have fewer edge crossings.
ES2006-141
Fuzzy image segmentation with Fuzzy Labelled Neural Gas
Cornelia Brüß, Felix Bollenbeck, Frank-Michael Schleif, Winfriede Weschke, Thomas Villmann, Udo Seiffert
Abstract:
Processing biological data often requires handling uncertain and sometimes inconsistent information. Particularly in image segmentation tasks with a biomedical background, a clear delineation of, for example, tissue borders is often hard to define. On the other hand, there are only a few promising segmentation algorithms able to process fuzzy input data. This paper describes one novel alternative, applying the recently introduced Fuzzy Labelled Neural Gas (FLNG) as a subsequent classification step to a biologically relevant fuzzy labelling with underlying image feature extraction.
ES2006-144
Elucidating the structure of genetic regulatory networks: a study of a second order dynamical model on artificial data
Minh Quach, Pierre Geurts, Florence d'Alché-Buc
Abstract:
Learning regulatory networks from time-series of gene expression is a challenging task. We propose to use synthetic data to analyze the ability of a state-space model to retrieve the network structure while varying a number of relevant problem parameters. ROC curves together with new tools such as spectral clustering of local solutions found by EM are used to analyze these results and provide relevant insights.
Learning III
ES2006-56
OnlineDoubleMaxMinOver: a simple approximate time and information efficient online Support Vector Classification method
Daniel Schneegaß, Thomas Martinetz, Michael Clausohm
Abstract:
We present the OnlineDoubleMaxMinOver approach to obtaining the support vectors in two-class classification problems. With its linear time complexity and linear convergence, the algorithm achieves a competitive speed. We approach the impossibility of perfect non-trivial online support vector learning by parameterising the exactness. Even in the case of linearly inseparable data within the feature space, the method converges to a solution expressible by a finite amount of information while observing an arbitrarily large number of input vectors. The results of the online method are comparable to those of the batch version, occasionally even better.
ES2006-80
Variants of Unsupervised Kernel Regression: General cost functions
Stefan Klanke, Helge Ritter
Abstract:
We present an extension to a recent method for learning nonlinear manifolds, which allows the incorporation of general cost functions. We focus on the epsilon-insensitive loss and visually demonstrate our method on both toy and real data.
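The epsilon-insensitive loss at the focus of this paper is simple to state: deviations below a threshold eps cost nothing, and larger ones grow linearly. A minimal sketch (the function name and default eps are illustrative):

```python
def eps_insensitive(residual, eps=0.1):
    """Epsilon-insensitive loss: |residual| below eps is ignored,
    larger deviations are penalised by |residual| - eps."""
    return max(abs(residual) - eps, 0.0)
```

Because the loss has a flat region around zero, a manifold fitted under it is insensitive to small noise in the observed data, which is the usual motivation for this cost in support vector regression.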
ES2006-102
Degeneracy in model selection for SVMs with radial Gaussian kernel
Tobias Glasmachers
Abstract:
We consider the model selection problem for support vector machines applied to binary classification. As the data generating process is unknown, we have to rely on heuristics as model selection criteria. In this study, we analyze the behavior of two criteria, the radius margin quotient and kernel polarization, applied to SVMs with radial Gaussian kernel. We prove necessary and sufficient conditions for local optima at the boundary of the kernel parameter space in the limit of arbitrarily narrow kernels. The theorems show that multi-modality of the model selection objectives can arise due to insignificant properties of the training dataset.
ES2006-117
Evolino for recurrent support vector machines
Juergen Schmidhuber, Matteo Gagliolo, Daan Wierstra, Faustino Gomez
Abstract:
We introduce a new class of recurrent, truly sequential SVM-like devices with internal adaptive states, trained by a novel method called EVOlution of systems with KErnel-based outputs (Evoke), an instance of the recent Evolino class of methods. Evoke evolves recurrent networks to detect and represent temporal dependencies while using SVM to produce precise outputs. Evoke is the first SVM-based mechanism learning to classify a context-sensitive language. It also outperforms recent state-of-the-art gradient-based recurrent neural networks (RNNs) on various time series prediction tasks.
ES2006-114
Hybrid generative/discriminative training of radial basis function networks
Artur Ferreira, Mario Figueiredo
Abstract:
We propose a new training algorithm for radial basis function networks (RBFN), which incorporates both generative (mixture-based) and discriminative (logistic) criteria. Our algorithm incorporates steps from the classical expectation-maximization algorithm for mixtures of Gaussians with a logistic regression step to update (in a discriminative way) the output weights. We also describe an incremental version of the algorithm, which is robust regarding initial conditions. Comparison of our approach with existing training algorithms, on (both synthetic and real) binary classification problems, shows that it achieves better performance.
ES2006-124
Rotation-based ensembles of RBF networks
Juan J. Rodriguez, Jesus Maudes, Carlos Alonso
Abstract:
Ensemble methods make it possible to improve the accuracy of classification methods. This work considers the application of one of these methods, named Rotation-based, when the classifiers to combine are RBF networks. For each member of the ensemble, this method transforms the data set using a pseudo-random rotation of the axes; the classifier is then constructed using the rotated data. The results of the ensembles obtained with this method are compared with the results of other ensemble methods (including Bagging and Boosting) over 34 data sets.
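The rotation step can be sketched as follows; restricting to two features and combining members by majority vote are simplifications for illustration (the paper trains RBF networks on general pseudo-random rotations), and all names are assumed:

```python
import math
import random

def random_rotation_2d(rng):
    """A pseudo-random rotation of two feature axes by a random angle."""
    a = rng.uniform(0.0, 2.0 * math.pi)
    return [[math.cos(a), -math.sin(a)], [math.sin(a), math.cos(a)]]

def rotate(R, x):
    """Apply the 2x2 rotation matrix R to the point x."""
    return [R[0][0] * x[0] + R[0][1] * x[1],
            R[1][0] * x[0] + R[1][1] * x[1]]

def rotation_ensemble(samples, labels, fit, n_members, seed=0):
    """Train each ensemble member on a differently rotated copy of the
    data set, storing the rotation alongside the fitted classifier."""
    rng = random.Random(seed)
    members = []
    for _ in range(n_members):
        R = random_rotation_2d(rng)
        members.append((R, fit([rotate(R, x) for x in samples], labels)))
    return members

def vote(members, x):
    """Classify x by majority vote over the rotated members."""
    votes = [clf(rotate(R, x)) for R, clf in members]
    return max(set(votes), key=votes.count)
```

With rotation-invariant base learners the members would be identical; the diversity that makes the ensemble useful comes from base learners (such as RBF networks with axis-dependent training) that react to the change of basis.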
ES2006-119
Learning and discrimination through STDP in a top-down modulated associative memory
Anthony Mouraud, Hélène Paugam-Moisy
Abstract:
This article underlines the learning and discrimination capabilities of a model of associative memory based on artificial networks of spiking neurons. Inspired from neuropsychology and neurobiology, the model implements top-down modulations, as in neocortical layer V pyramidal neurons, with a learning rule based on synaptic plasticity (STDP), for performing a multimodal association learning task. A temporal correlation method of analysis proves the ability of the model to associate specific activity patterns to different samples of stimulation. Even in the absence of initial learning and with continuously varying weights, the activity patterns become stable enough for discrimination.
ES2006-60
Gaussian and exponential architectures in small-world associative memories
Lee Calcraft, Rod Adams, Neil Davey
Abstract:
The performance of sparsely-connected associative memory models built from a set of perceptrons is investigated using different patterns of connectivity. Architectures based on Gaussian and exponential distributions are compared to networks created by progressively rewiring a locally-connected network. It is found that while all three architectures are capable of good pattern-completion performance, the Gaussian and exponential architectures require a significantly lower mean wiring length to achieve the same results. In the case of networks of low connectivity, relatively tight Gaussian and exponential distributions achieve the best overall performance.
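A Gaussian wiring-length profile of the kind compared here can be sketched on a ring topology; the sampling scheme below (drawing ring distances from |N(0, sigma)|) is an assumption for illustration, not the authors' construction:

```python
import random

def gaussian_ring_connectivity(n_nodes, k, sigma, seed=0):
    """Each node on a ring of n_nodes units receives k connections whose
    wiring lengths (ring distances) are drawn from a Gaussian profile."""
    rng = random.Random(seed)
    conn = {i: set() for i in range(n_nodes)}
    for i in range(n_nodes):
        while len(conn[i]) < k:
            d = round(abs(rng.gauss(0.0, sigma)))
            if d == 0 or d > n_nodes // 2:
                continue                       # reject self/overlong links
            conn[i].add((i + rng.choice((-1, 1)) * d) % n_nodes)
    return conn

def mean_wiring_length(conn, n_nodes):
    """Average ring distance over all existing connections."""
    lengths = [min(abs(i - j), n_nodes - abs(i - j))
               for i, targets in conn.items() for j in targets]
    return sum(lengths) / len(lengths)
```

A tighter sigma yields a smaller mean wiring length while still admitting occasional longer-range links, which is the trade-off the paper examines against progressively rewired local networks.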
ES2006-76
Parallel hardware implementation of a broad class of spiking neurons using serial arithmetic
Benjamin Schrauwen, Jan Van Campenhout
Abstract:
Current digital, directly mapped implementations of spiking neural networks use serial processing and parallel arithmetic. On a standard CPU this may be a good choice, but when using a Field Programmable Gate Array (FPGA), other implementation architectures are possible. This work presents a hardware implementation of a broad class of integrate-and-fire spiking neurons with synapse models using parallel processing and serial arithmetic. This results in very fast and compact implementations of spiking neurons on FPGAs.
ES2006-135
Generalization properties of spiking neurons trained with ReSuMe method
Filip Ponulak, Andrzej Kasiński
Abstract:
In this paper we demonstrate the generalization ability of spiking neurons trained with the ReSuMe method. We show in a set of experiments that the learning neuron can approximate the input-output transformations defined by another (reference) neuron with high precision, and that the learning process converges very quickly. We discuss the relationship between the neuron's I/O properties and the weight distribution of its input connections. Finally, we discuss the conditions under which the neuron can approximate given I/O transformations.
ES2006-95
A sequence-encoding neural network for face recognition
Marek Barwiński, Rolf P. Würtz
Abstract:
We propose a feature-based system for face recognition using contextual information to improve the recognition rate. A small (6 memory blocks, 3 cells each) recurrent neural network with internal memory cell states (LSTM) is trained on single images of 49 different identities randomly picked from the FERET database and tested on images with different facial expressions using a predefined saccade path. We present the improvement in recognition rate and an outlook on the future development of the system, including autonomous saccade generation, evidence accumulation, and novelty detection.
ES2006-147
Freeform surface induction from projected planar curves via neural networks
Usman Khan, Abdelaziz Terchi, Sungwoo Lim, David Wright, Sheng-Feng Qin
Abstract:
We propose a novel intelligent approach to inferring 3D shape from 2D on-line sketches in conceptual design. A multilayer perceptron neural network is employed to construct 3D freeform surfaces from 2D freehand curves. Planar curves were used to represent the boundary strokes of a freeform surface patch and varied iteratively to produce a training set. Sampled curves were used to train and test the network. The results obtained demonstrate that the network successfully learned the inverse-projection map and correctly inferred the respective surfaces from curves not previously encountered.
ES2006-139
The combination of STDP and intrinsic plasticity yields complex dynamics in recurrent spiking networks
Andreea Lazar, Gordon Pipa, Jochen Triesch
Abstract:
We analyze the dynamics of deterministic recurrent spiking neural networks with spike-timing dependent plasticity (STDP) and intrinsic plasticity (IP) that changes the excitability of individual units. We find that STDP and IP can synergistically interact to produce complex network dynamics. These dynamics are quite different from the dynamics of networks that lack one or the other form of plasticity. Our results suggest that a synergistic combination of different forms of plasticity may contribute to cortical dynamics of high complexity, and they underscore the need to carefully study the interaction of different plasticity forms.
ES2006-22
Reducing policy degradation in neuro-dynamic programming
Thomas Gabel, Martin Riedmiller
Abstract:
We focus on neuro-dynamic programming methods to learn state-action value functions and outline some of the inherent problems to be faced when performing reinforcement learning in combination with function approximation. In an attempt to overcome some of these problems, we develop a reinforcement learning method that monitors the learning process, enables the learner to reflect on whether it is better to cease learning, and thus obtains more stable learning results.
ES2006-122
Probabilistic classifiers and time-scale representations: application to the monitoring of a tramway guiding system
Zahra HAMOU MAMAR, Pierre Chainais, Alexandre Aussem
Abstract:
We discuss a new diagnosis system combining wavelet analysis techniques and probabilistic classifiers for detecting tramway roller defects. A continuous wavelet transform is applied to the vibration signals measured by specific accelerometers located on the rail. A temporal segmentation of the signals is carried out in order to identify the contribution of each pair of rollers to the overall vibration signal. The singular value decomposition (SVD) method is applied to segments of the time-scale representation to extract the most significant features. The resulting multi-class problem is then solved using pairwise classifiers trained on two-class sub-problems. The efficiency of this approach is successfully illustrated in several experiments on the tramway.
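The SVD feature-extraction step can be sketched directly: each time-scale segment (a scales-by-time matrix) is compressed into its leading singular values. The function name and the number of features retained are illustrative, not the authors' settings:

```python
import numpy as np

def svd_features(segment, n_features):
    """Reduce a time-scale segment (rows: scales, columns: time) to its
    leading singular values, used as a compact feature vector."""
    s = np.linalg.svd(np.asarray(segment, dtype=float), compute_uv=False)
    return s[:n_features]
```

Singular values capture the dominant energy patterns of the segment while discarding phase and position details, which makes them convenient inputs for the pairwise classifiers.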
ES2006-146
Pattern analysis in illicit heroin seizures: a novel application of machine learning algorithms
Frédéric Ratle, Anne-Laure Terrettaz, Mikhaïl Kanevski, Pierre Esseiva, Olivier Ribaux
Abstract:
An application of machine learning algorithms to the clustering and classification of chemical data concerning heroin seizures is presented. The data comprise the chemical constituents of heroin as given by gas chromatography analysis. Following a preprocessing step, where the six initial constituents are reduced to only two significant features, the data are clustered in order to find natural classes, which we assume correspond to the country of origin. Classification is then performed using a multi-layer perceptron, a probabilistic neural network, a radial basis function network, and the k-nearest neighbors method. Results are encouraging and add important information to previous work in the field.
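Of the classifiers compared, k-nearest neighbors is the easiest to sketch on the two-feature data described above; the function name and default k are illustrative:

```python
import math

def knn_predict(train_x, train_y, x, k=3):
    """k-nearest neighbors: return the majority label among the k
    training samples closest to x."""
    ranked = sorted(zip(train_x, train_y), key=lambda p: math.dist(p[0], x))
    votes = [y for _, y in ranked[:k]]
    return max(set(votes), key=votes.count)
```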