Bruges, Belgium, April 27-29, 2011
Content of the proceedings
Information theory related learning
Self-organizing maps and recurrent networks
Semi-supervised learning
Computational Intelligence in Life Sciences
Learning I
Seeing is believing: The importance of visualization in real-world machine learning applications
Learning theory
Feature selection and dimensionality reduction
Learning of causal relations
Learning II
Sequence and time processing
Optimization and learning
Deep Learning
Information theory related learning
ES2011-7
Information theory related learning
Thomas Villmann, Jose C. Principe, Andrej Cichocki
Abstract:
This is the introductory paper to a special session held at the ESANN 2011 conference. It reviews and highlights recent developments and new directions in information theory related learning, which is a rapidly developing research area. These algorithms are based on the fundamental principles of information theory and relate them implicitly or explicitly to learning algorithms and strategies.
ES2011-100
Optimization of Parametrized Divergences in Fuzzy c-Means
Tina Geweniger, Marika Kaestner, Thomas Villmann
Abstract:
We propose the use of divergences as dissimilarity measures in the Fuzzy c-Means algorithm for the clustering of functional data. Furthermore, we adapt the relevance parameters to improve the data representation and thereby obtain more accurate clusterings in terms of separation and compactness. We show for two example applications that this method leads to improved performance.
ES2011-66
Multivariate class labeling in Robust Soft LVQ
Petra Schneider, Tina Geweniger, Frank-Michael Schleif, Michael Biehl, Thomas Villmann
Abstract:
We introduce a generalization of Robust Soft Learning Vector Quantization (RSLVQ). This algorithm for nearest prototype classification is derived from an explicit cost function and follows the dynamics of a stochastic gradient ascent. We generalize the RSLVQ cost function with respect to vectorial class labels: Probabilistic LVQ (PLVQ) realizes multivariate class memberships for prototypes and training samples, and the prototype labels can be learned from the data during training. We present experiments to demonstrate the new algorithm in practice.
ES2011-11
Statistical dependence measure for feature selection in microarray datasets
Veronica Bolon-Canedo, Sohan Seth, Noelia Sánchez-Maroño, Amparo Alonso-Betanzos, Jose C. Principe
Abstract:
Feature selection is the domain of machine learning that studies data-driven methods to select, among a set of input variables, the ones that will lead to the most accurate predictive model. In this paper, a statistical dependence measure is presented for variable selection in the context of classification. Its performance is tested on DNA microarray data, a challenging dataset for machine learning researchers due to the high number of genes and relatively small number of measurements. This measure is compared against the so-called mRMR approach and is shown to obtain better or equal performance on the binary datasets.
ES2011-72
Mathematical Foundations of the Self Organized Neighbor Embedding (SONE) for Dimension Reduction and Visualization
Kerstin Bunte, Frank-Michael Schleif, Sven Haase, Thomas Villmann
Abstract:
In this paper we propose a generalization of the recently introduced Neighbor Embedding Exploratory Observation Machine (NE-XOM) for dimension reduction and visualization. We provide a general mathematical framework called Self Organized Neighbor Embedding (SONE). It treats components such as data similarity measures and neighborhood functions independently, making them easily interchangeable, and it enables the use of different divergences, based on the theory of Fréchet derivatives. In this way we propose a new dimension reduction and visualization algorithm that can easily be adapted to user-specific requirements and the problem at hand.
Self-organizing maps and recurrent networks
ES2011-71
Sparsity Issues in Self-Organizing-Maps for Structures
Markus Hagenbuchner, Giovanni Da San Martino, Ah Chung Tsoi, Alessandro Sperduti
Abstract:
Recent developments with Self-Organizing Maps (SOMs) have produced methods capable of clustering graph structured data onto a fixed dimensional display space. These methods have been applied successfully to a number of benchmark problems and have produced state-of-the-art results. This paper discusses a limitation of the most powerful version of these SOMs, known as probability measure graph SOMs (PMGraphSOMs), viz., the sparsity induced by processing a large number of small graphs, which prevents a successful application of PMGraphSOM to such problems. An approach that compactifies the generated state space to address this sparsity problem is proposed. An application to an established benchmark problem, viz., the Mutag dataset in toxicology, shows that the proposed method is effective when dealing with a large number of small graphs. Hence, this work fills a gap between the processing of a number of small graphs and the processing of densely connected graphs using PMGraphSOMs.
ES2011-26
Multi-Goal Path Planning Using Self-Organizing Map with Navigation Functions
Jan Faigl, Jan Mačák
Abstract:
In this paper, we present a combination of the Self-Organizing Map (SOM) approach and navigation functions in the Traveling Salesman Problem (TSP) with segment goals. The problem arises in inspection planning, where a path is sought from which all points of a given polygonal environment have to be ``seen''. This type of problem belongs to the TSP with neighborhoods family. Moreover, the problem also has to deal with paths among obstacles, so it is not the Euclidean TSP. The proposed approach demonstrates the applicability of SOM principles to problems in which SOM has not yet been applied.
ES2011-40
Statistical properties of the `Hopfield estimator' of dynamical systems
Miguel Atencia, Gonzalo Joya
Abstract:
This paper analyses the statistical properties of a method for estimating the parameters of systems defined by ordinary differential equations. In previous work, this estimation method was defined as an adapted version of Hopfield neural networks, and its convergence and robustness with respect to signal disturbances were proved, even when parameters are time-varying. This contribution aims at establishing the behaviour of the estimation error by performing a set of simulations in which random noise with a known probability distribution is added to the signals. It is shown that, asymptotically, the estimator is unbiased and its variance vanishes. Further theoretical work is being undertaken in order to rigorously support these empirical findings.
ES2011-31
Negatively Correlated Echo State Networks
Ali Rodan, Peter Tino
Abstract:
Echo State Network (ESN) is a special type of recurrent neural network with a fixed random recurrent part (reservoir) and a trainable reservoir-to-output readout mapping (typically obtained by linear regression). In this work we utilise an ensemble of ESNs with diverse reservoirs whose collective readout is obtained through Negative Correlation Learning (NCL) of an ensemble of Multi-Layer Perceptrons (MLPs), where each individual MLP realises the readout from a single ESN. Experimental results on three data sets confirm that, compared with both single ESNs and flat ensembles of ESNs, NCL-based ESN ensembles achieve better generalisation performance.
ES2011-48
Reservoir regularization stabilizes learning of Echo State Networks with output feedback
R. Felix Reinhart, Jochen J. Steil
Abstract:
Output feedback is crucial for autonomous and parameterized pattern generation with reservoir networks. Read-out learning can lead to error amplification in these settings and therefore regularization is important for both generalization and reduction of error amplification. We show that regularization of the inner reservoir network mitigates parameter dependencies and boosts the task-specific performance.
Semi-supervised learning
ES2011-57
A Multi-kernel Framework for Inductive Semi-supervised Learning
Xilan TIAN, Gilles Gasso, Stéphane Canu
Abstract:
We investigate the benefit of combining both the cluster assumption and the manifold assumption underlying most semi-supervised algorithms, using the flexibility and efficiency of multi-kernel learning. A multiple kernel version of the Transductive SVM (a cluster assumption based approach) is proposed and solved via DC (Difference of Convex functions) programming. Experiments on benchmark data sets indicate the promising results of the proposed approach.
ES2011-80
Training of multiple classifier systems utilizing partially labeled sequential data sets
Martin Schels, Patrick Schillinger, Friedhelm Schwenker
Abstract:
Making use of unlabeled data samples for training a classifier is a desirable aim for many real-world applications in pattern recognition. In this study, a multiple classifier system is utilized to investigate this matter. Further, cluster analysis is used to group the available data while neglecting the actual labels. Then, by implementing an information fusion architecture based on these clusters, a classification architecture is constructed. This architecture is investigated by means of a facial expression data collection, focusing on one-against-one class decisions to produce locally “unlabeled” data, i.e. data not assigned to one of the considered classes.
Computational Intelligence in Life Sciences
ES2011-5
Recent trends in computational intelligence in life sciences
Udo Seiffert, Frank-Michael Schleif, Dietlind Zühlke
ES2011-36
Adaptive Kernel Smoothing Regression for Spatio-Temporal Environmental Datasets
Federico Montesino Pouzols, Amaury Lendasse
Abstract:
This paper describes a method for performing kernel smoothing regression in an incremental, adaptive manner. A simple and fast combination of incremental vector quantization with kernel smoothing regression using adaptive bandwidth is shown to be effective for online regression modeling of environmental datasets. The method is illustrated on openly available datasets corresponding to the Tropical Atmosphere Ocean array and the Helsinki Commission hydrographical database for the Baltic Sea.
ES2011-18
Generalized functional relevance learning vector quantization
Marika Kaestner, Barbara Hammer, Michael Biehl, Thomas Villmann
Abstract:
Generalized relevance learning vector quantization (GRLVQ) is a prototype based classification algorithm with metric adaptation, weighting each data dimension according to its relevance for the classification task. We present in this paper an extension for functional data, which are usually very high dimensional. This approach assumes that the data vectors are functional representations. Taking this information into account, the so-called relevance profiles are modeled as superpositions of simple basis functions depending on only a few parameters. As a consequence, the resulting functional GRLVQ has a drastically reduced number of parameters to be adapted for relevance learning. We demonstrate the ability of the new algorithm on standard functional data sets using different basis functions, namely Gaussians and Lorentzians.
ES2011-89
Patch Affinity Propagation
Xibin Zhu, Barbara Hammer
Abstract:
Affinity propagation constitutes an exemplar based clustering technique which reliably optimizes the quantization error, given a matrix of pairwise data dissimilarities, by means of the max-sum algorithm for factor graphs. Albeit very efficient for sparse matrices, it displays squared complexity in the worst case; hence it is not suited as a high-throughput method due to time and memory constraints. We propose an extension of affinity propagation to patch clustering such that data are treated in chunks of fixed size with limited memory requirements and linear time. We test the suitability of the approach for two biomedical applications.
ES2011-20
Multispectral image characterization by partial generalized covariance
Marc Strickert, Björn Labitzke, Andreas Kolb, Thomas Villmann
Abstract:
A general method is presented for the assessment of data attribute variability, which plays an important role in the initial screening of multi- and high-dimensional data sets. Instead of the commonly used second centralized moment, known as variance, the proposed method allows a mathematically rigorous characterization of attribute sensitivity given not only Euclidean distances but also partial data comparisons by general similarity measures. Depending on the choice of measure, different spectral features are highlighted by attribute assessment, creating new image segmentation aspects, as shown in a comparison of Euclidean distance, Pearson correlation and gamma divergence applied to multi-spectral images.
Learning I
ES2011-14
Increased robustness and intermittent dynamics in structured Reservoir Networks with feedback
Sarah Jarvis, Stefan Rotter, Ulrich Egert
Abstract:
Recent studies using feedforward Echo State Networks (ESNs) demonstrate that reservoir stability can be strongly affected by reservoir substructures, such as clusters. Here, we evaluate the impact of including feedback on clustered ESNs and assert that certain cluster configurations extend the permissible range of spectral radius values. We also report a new class of reservoir activity: intermittent dynamics, characterized by variable periods of chaotic activity before returning to quiescent behaviour. Using a non-linear benchmark data set, we establish that clustered ESNs have performance comparable to conventional ESNs but, importantly, display increased tolerance and robustness to spectral radius and input choice.
ES2011-69
Anticipating Rewards in Continuous Time and Space with Echo State Networks and Actor-Critic Design
Mohamed Oubbati
Abstract:
In this paper we implement echo state networks (ESNs) within the concept of actor-critic design to obtain an optimal control policy for a mobile robot. The robot is asked to anticipate future rewards/punishments and react accordingly. In this design, the ESN critic has to consider a continuous state/action space, deal with uncertainties, and be computationally cheap in order to cope with real-world constraints. Benefits and problems involved in an obstacle avoidance scenario are discussed, supported by real-world experiments.
ES2011-60
Application of stochastic recurrent reinforcement learning to index trading
Denise Gorse
Abstract:
A novel stochastic adaptation of the recurrent reinforcement learning (RRL) methodology is applied to daily, weekly, and monthly stock index data, and compared to results obtained elsewhere using genetic programming (GP). The data sets used have been considered a challenging test for algorithmic trading. It is demonstrated that RRL can reliably outperform buy-and-hold for the higher frequency data, in contrast to GP, which performed best for monthly data.
ES2011-110
A brief tutorial on reinforcement learning: The game of Chung Toi
Christopher Gatti, Jonathan Linton, Mark Embrechts
Abstract:
This work presents a simple implementation of reinforcement learning, using the temporal difference algorithm and a neural network, applied to the board game of Chung Toi, which is a challenging variation of Tic-Tac-Toe. The implementation of this learning algorithm is fully described and includes all parameter settings and various techniques to improve the ability of the network to learn the board game. With relatively little training, the network was able to win nearly 90% of games played against a 'smart' random opponent.
ES2011-62
D-VisionDraughts: a draughts player neural network that learns by reinforcement in a high performance environment
Araujo Barcelos Ayres Roberto, Silva Julia Rita Maria, Matias Junior Rivalino
Abstract:
This paper describes D-VisionDraughts, a distributed player agent for draughts based on Neural Networks trained by Temporal Differences. D-VisionDraughts is trained in a high performance environment and achieves a high level of play without expert game analysis and with minimal human intervention. D-VisionDraughts corresponds to a distributed version of the efficient agent player VisionDraughts. The main contributions of this paper consist in substituting the distributed Young Brothers Wait Concept (YBWC) algorithm for the serial alpha-beta search algorithm used in VisionDraughts, and in measuring the impact of a high performance environment on the unsupervised learning abilities of the player. Evaluative tests showed that even a modest distributed version running on just 10 processors is able to reduce the search runtime by 68% and to increase its winning rate by 15%.
ES2011-58
SO-VAT: Self-Organizing Visual Assessment of cluster Tendency for large data sets
Enrique Pelayo, Carlos Orrite, David Buldain
Abstract:
A new method, Self-Organizing Visual Assessment of cluster Tendency (SO-VAT), is given for visually assessing the cluster tendency in large data sets. It is based on training a SOM with the input samples and then calculating the VAT image from a selected group of the generated neurons, where the selection is made according to a certain activation density. Tests with synthetic and real examples demonstrate that the new SO-VAT algorithm results in clearer images and shorter computing time than applying the VAT procedure directly to the whole input data set.
ES2011-104
New conditioning model for robots
Jean Marc Salotti
Abstract:
We present a neural network for the prediction of rewards in a conditioning model. It is based on two noisy-OR nodes and one noisy-AND node, with update rules inspired by the BANNER technique. Interesting results are presented, with an application to robotics.
ES2011-94
Stability of Neural Network Control for Uncertain Sampled-Data Systems
Pornchai Khlaeo-om, Sasikanchana Yenaeng, Sunya Pasuk, Supachai Aroonpun, Sompun Aumpawan
Abstract:
This paper derives robust stability conditions for neural network control of sampled-data systems whose parameters are uncertain. The controllers are nonlinear, full state regulators implemented as single hidden layer, feedforward neural networks. The controlled systems must be locally controllable and full-state accessible. The robust stability is confirmed by the existence of a Lyapunov function of the closed loop systems. A modified backpropagation algorithm with a model reference technique is employed to determine the weights of the controllers. Simulation results on the classical motor-driven inverted pendulum model are presented to demonstrate the applications of these conditions.
ES2011-55
Thresholds tuning of a neuro-symbolic net controlling a behavior-based robotic system
Mariacarla Staffa, Silvia Rossi, Massimo De Gregorio, Ernesto Burattini
Abstract:
In this paper we present the results obtained by adopting an evolutionary approach to tune some critical neuron thresholds of a neuro-symbolic net that regulates the overall emergent behavior of a behavior-based robotic system.
ES2011-101
Ensemble Usage for More Reliable Policy Identification in Reinforcement Learning
Alexander Hans, Steffen Udluft
Abstract:
Reinforcement learning (RL) methods employing powerful function approximators like neural networks have become an interesting approach for optimal control. Since they learn a policy from observations, they are also applicable when no analytical description of the system is available. Although impressive results have been reported, their handling in practice is still hard, as they can fail at reliably determining a good policy. In previous work, we used ensembles of policies from independent runs of neural fitted Q-iteration (NFQ) to produce successful policies more reliably. In this paper we evaluate the approach on more problems and propose forming ensembles from successive iterations of a single NFQ run as a computationally cheap alternative to completely independent runs.
ES2011-16
Fisherman learning algorithm of the SOM realized in the CMOS technology
Rafal Dlugosz, Marta Kolasa, Witold Pedrycz
Abstract:
This study presents an idea for a transistor-level realization of the fisherman learning algorithm of Self-Organizing Maps (SOMs). The realization of this algorithm in hardware calls for a solution of several specific problems not present in software implementations. The main problem is related to the iterative nature of the adaptation process of the neighboring neurons positioned in particular rings surrounding the winning neuron, which makes the circuit structure of the SOM very complex. To come up with a feasible realization, we introduce some modifications to the original fisherman algorithm. Detailed simulations of the software model of the SOM show that these modifications do not have a negative impact on the learning process and help bring a significant reduction of the circuit complexity. In consequence, a fully parallel adaptation of all neurons is possible, which makes the SOM very fast.
ES2011-12
Abstract category learning
Atsushi Hashimoto, Haruo Hosoya
Abstract:
Motivated by a neurophysiological experiment on prefrontal cortex, we study a scheme for learning abstract categories. An abstract category represents a set of vectors that are identical to each other modulo substitution, e.g., 'ABAB', 'BABA', 'ACAC', etc. We employ a clustering-based unsupervised learning method for such abstract categories, in which the recognition step is reduced to the problem of maximal perfect weight matching. Our simulations using artificial inputs confirm that the scheme learns abstract categories robustly even with a certain level of noise in the inputs.
ES2011-97
Symbolic computing of LS-SVM based models
Siamak Mehrkanoon, Li Jiang, Carlos Alzate, J.A.K. Suykens
Abstract:
This paper introduces a software tool, SYM-LS-SVM-SOLVER, written in Maple to symbolically derive the dual system and the dual model representation of LS-SVM based models. SYM-LS-SVM-SOLVER constructs the Lagrangian from the given objective function and list of constraints. Afterwards it obtains the KKT (Karush-Kuhn-Tucker) optimality conditions and finally formulates a linear system in terms of the dual variables. The effectiveness of the developed solver is illustrated by applying it to a variety of problems involving LS-SVM based models.
ES2011-43
Sparse LS-SVMs with L0–norm minimization
J. López, K. De Brabanter, J.R. Dorronsoro, J.A.K. Suykens
Abstract:
Least-Squares Support Vector Machines (LS-SVMs) have been successfully applied in many classification and regression tasks. Their main drawback is the lack of sparseness of the final models. Thus, a procedure to sparsify LS-SVMs is a frequent desideratum. In this paper, we adapt to the LS-SVM case a recent work for sparsifying classical SVM classifiers, which is based on an iterative approximation to the L0-norm. Experiments on real-world classification and regression datasets illustrate that this adaptation achieves very sparse models, without significant loss of accuracy compared to standard LS-SVMs or SVMs.
ES2011-29
A post-processing strategy for SVM learning from unbalanced data
Haydemar Núñez, Luis Gonzalez-Abril, Cecilio Angulo
Abstract:
Standard learning algorithms may perform poorly when learning from unbalanced datasets. Based on Fisher's discriminant analysis, a post-processing strategy is introduced to deal with datasets showing significant imbalance in the data distribution. Empirical results from experiments with a learned SVM model on twelve UCI datasets indicate that the proposed solution improves the original SVM, and also improves on the results reported when using a z-SVM, in terms of g-mean and sensitivity.
ES2011-87
Clustering data streams with weightless neural networks
Douglas de O. Cardoso, Priscila M. V. Lima, Massimo De Gregorio, João Gama, Felipe M. G. França
Abstract:
Producing good quality clustering of data streams in real time is a difficult problem, since it is necessary to analyze data points arriving in a continuous stream with the support of quite limited computational resources. The incremental and evolving nature of the resulting clustering structures must reflect the dynamics of the target data stream. The WiSARD weightless perceptron, and its associated DRASiW extension, are intrinsically capable of, respectively, performing one-shot learning and producing prototypes of the learnt categories. This work introduces a simple generalization of RAM-based neurons in order to explore both weightless neural models in the data stream clustering problem.
ES2011-49
Locating Anomalies Using Bayesian Factorizations and Masks
Li Yao, Amaury Lendasse, Francesco Corona
Abstract:
A plethora of methods have been developed to handle anomaly detection in various application domains. This work focuses on locating anomalies inside a categorical data set without assuming any specific domain knowledge. By exploiting the conditional dependence and independence relationships among data attributes, data analysts can not only recognize an anomaly but also locate the potentially anomalous attributes inside an anomalous instance by following its masks. Masks are geometrically generated based on the factorization of the joint probability from a Bayesian network automatically learnt from the given data set.
ES2011-42
Comparison of the Complex Valued and Real Valued Neural Networks Trained with Gradient Descent and Random Search Algorithms
Hans-Georg Zimmermann, Alexey Minin, Viktoria Kusherbaeva
Abstract:
Complex-valued neural networks are one of the open topics in the machine learning community. In this paper we address several problems of complex-valued neural networks. The outcome of the current research is a combined global-local algorithm for training a complex-valued feedforward neural network, which is appropriate for the chaotic problem considered.
Seeing is believing: The importance of visualization in real-world machine learning applications
ES2011-3
Seeing is believing: The importance of visualization in real-world machine learning applications
Alfredo Vellido, José D Martín, Fabrice Rossi, Paulo Lisboa
Abstract:
The increasing availability of data sets with a huge amount of information, coded in many different features, justifies research on new methods of knowledge extraction: the great challenge is the translation of raw data into useful information that can be used to improve decision-making processes, detect relevant profiles, find out relationships among features, etc. It is undoubtedly true that a picture is worth a thousand words, which makes visualization methods arguably the most appealing and one of the most relevant kinds of knowledge extraction methods. At ESANN 2011, the special session ``Seeing is believing: The importance of visualization in real-world machine learning applications'' reflects some of the main emerging topics in the field. This tutorial prefaces the session, summarizing some of its contributions, while also providing some clues to the current state and the near future of visualization methods within the framework of Machine Learning.
ES2011-24
Hierarchical clustering for graph visualization
Stéphan Clémençon, Hector De Arazoza, Fabrice Rossi, Viet Chi Tran
Abstract:
This paper describes a graph visualization methodology based on hierarchical maximal modularity clustering, with interactive and significant coarsening and refining possibilities. An application of this method to HIV epidemic analysis in Cuba is outlined.
ES2011-44
A probabilistic approach to the visual exploration of G Protein-Coupled Receptor sequences
Alfredo Vellido, Martha Ivón Cárdenas, Iván Olier, Xavier Rovira, Jesús Giraldo
Abstract:
The study of G protein-coupled receptors (GPCRs) is of great interest in pharmaceutical research, but only a few of their 3D structures are known at present. In contrast, their amino acid sequences are known and accessible. Sequence analysis can provide new insight into GPCR function. Here, we use a kernel-based statistical machine learning model for the visual exploration of GPCR functional groups from their sequences. This is based on the rich information provided by the model regarding the probability of each sequence belonging to a certain receptor group.
ES2011-74
Growing Hierarchical Sectors on Sectors
José M Martínez, Pablo Escandell, Emilio Soria, José D Martín, Juan Gómez, Joan Vila
Abstract:
Self-organizing maps are widely used in visual data mining. This paper proposes a new visualization approach for the GHSOM algorithm, a hierarchical variant of the SOM. The method is based on pie charts, which improve the visualization of hierarchical data structures, making it possible to extract all the existing relationships among the attributes of the neurons at any hierarchy level. The methodology is tested on one synthetic data set and one real data set. The achieved results show the suitability and usefulness of the proposed approach.
ES2011-56
Analysis of a Reinforcement Learning algorithm using Self-Organizing Maps
Vicente Buendia-Ramon, Emilio Soria, José D Martín, Pablo Escandell, José M Martínez
Abstract:
The scenario of this work is defined by the need of many Machine Learning algorithms to tune a number of parameters that define their behavior; the resulting performance can be evaluated with different indices. The relationship between parameters and performance may be neither linear nor straightforward. This work proposes a qualitative approach to the aforementioned relationship by using Self-Organizing Maps, due to their visual information processing. The approach is evaluated in the framework of Reinforcement Learning algorithms.
Learning theory
ES2011-15
General bound of overfitting for MLP regression models
Joseph Rynkiewicz
Abstract:
Multilayer perceptrons (MLPs) with one hidden layer have long been used to deal with non-linear regression. However, in some tasks, MLPs are too powerful models, and a small mean square error (MSE) may be due more to overfitting than to actual modelling. If the noise of the regression model is Gaussian, the overfitting of the model is totally determined by the behavior of the likelihood ratio test statistic (LRTS); however, in numerous cases the assumption of normality of the noise is arbitrary if not false. In this paper, we present a universal bound for the overfitting of such models under smoothness assumptions; this bound is valid without Gaussian or identifiability assumptions. The main application of this bound is to give a hint about determining the true architecture of the MLP model as the number of data points goes to infinity.
ES2011-78
Maximal Discrepancy vs. Rademacher Complexity for error estimation
Davide Anguita, Alessandro Ghio, Luca Oneto, Sandro Ridella
Abstract:
The Maximal Discrepancy and the Rademacher Complexity are powerful statistical tools that can be exploited to obtain reliable, albeit not tight, upper bounds on the generalization error of a classifier. We study the different behavior of the two methods when applied to linear classifiers and suggest a practical procedure to tighten the bounds. The resulting generalization estimation can be successfully used for classifier model selection.
Feature selection and dimensionality reduction
ES2011-8
Class-Specific Feature Selection for One-Against-All Multiclass SVMs
Gaël de Lannoy, Damien Francois, Michel Verleysen
Abstract:
This paper proposes a method to perform class-specific feature selection in multiclass problems solved with the one-against-all strategy. The main issue arises at the final step of the classification process, where binary classifier outputs must be compared against one another to elect the winning class. This comparison may be biased towards one specific class when the binary classifiers are built on distinct feature subsets. This paper proposes a normalization of the binary classifier outputs that allows fair comparisons in such cases.
ES2011-50
Mutual information for feature selection with missing data
Gauthier Doquire, Michel Verleysen
Abstract:
Feature selection is an important task for many machine learning applications; moreover, missing data are encountered very often in practice. This paper proposes to adapt a nearest-neighbors-based mutual information estimator to handle missing data and to use it for feature selection. Results on artificial and real-world datasets show that the method is able to select important features without the need for any imputation algorithm. Moreover, experiments also indicate that selecting the features before imputing the data generally increases the precision of the prediction models.
ES2011-19
Probabilistic Fisher discriminant analysis
Charles Bouveyron, Camille Brunet
Abstract:
Fisher Discriminant Analysis (FDA) is a popular method for dimensionality reduction and classification, but it performs poorly in the cases of label noise and sparse labeled data. To overcome these limitations, we propose a probabilistic framework for FDA and extend it to the semi-supervised case. Experiments on real-world datasets show that the proposed approach works as well as FDA in standard situations and outperforms it in the label noise and sparse label cases.
ES2011-73
Supervised dimension reduction mappings
Kerstin Bunte, Michael Biehl, Barbara Hammer
Abstract:
We propose a general principle to extend dimension reduction tools to explicit dimension reduction mappings and we show that this can serve as an interface to incorporate prior knowledge in the form of class labels. We explicitly demonstrate this technique by combining locally linear mappings which result from matrix learning vector quantization schemes with the t-distributed stochastic neighbor embedding cost function. The technique is tested on several benchmark data sets.
Learning of causal relations
ES2011-2
Learning of causal relations
John A. Quinn, Joris Mooij, Tom Heskes, Michael Biehl
Abstract:
To learn about causal relations between variables just by observing samples from them, particular assumptions must be made about those variables' distributions. This article gives a practical description of how such a learning task can be undertaken based on different possible assumptions. Two categories of assumptions lead to different methods, constraint-based and Bayesian learning, and in each case we review both the basic ideas and some recent extensions and alternatives to them.
ES2011-93
Inferring the causal decomposition under the presence of deterministic relations
Jan Lemeire, Stijn Meganck, Francesco Cartella, Tingting Liu, Alexander Statnikov
Abstract:
The presence of deterministic relations poses problems for current algorithms that learn the causal structure of a system based on the observed conditional independencies. Deterministic variables lead to information equivalences: two sets of variables have the same information about a third variable. Based on information content, one cannot decide on the direct causes; several edges model the dependencies equally well. We call them equivalent edges. We propose to select among the equivalent edges the one with the simplest descriptive complexity. This approach assumes that the descriptive complexity increases along a causal path. As confirmed by our experimental results, the accuracy of the method depends on the chance of accidental matches of complexities.
ES2011-106
A unified approach to estimation and control of the False Discovery Rate in Bayesian network skeleton identification
Angelos P. Armen, Ioannis Tsamardinos
Abstract:
Constraint-based Bayesian network (BN) structure learning algorithms typically control the False Positive Rate (FPR) of their skeleton identification phase. The False Discovery Rate (FDR), however, may be of greater interest, and methods for its utilization by these algorithms have recently been devised. We present a unified approach to FDR estimation and control in BN skeleton identification and experimentally evaluate the performance of FDR estimators in both tasks over several networks. We demonstrate that estimation is too conservative for most networks and that strong control at common FDR thresholds is not achieved with some networks; finally, we identify the possible causes of this situation.
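FDR control in this line of work typically builds on the Benjamini-Hochberg step-up procedure applied to the p-values of the conditional independence tests. A minimal sketch of that procedure (not the paper's unified estimator):

```python
import numpy as np

def benjamini_hochberg(pvals, alpha=0.05):
    """Benjamini-Hochberg step-up procedure: returns a boolean mask of
    discoveries with the False Discovery Rate controlled at level alpha."""
    p = np.asarray(pvals, dtype=float)
    m = len(p)
    order = np.argsort(p)
    thresh = alpha * np.arange(1, m + 1) / m   # i-th smallest p compared to alpha*i/m
    below = p[order] <= thresh
    keep = np.zeros(m, dtype=bool)
    if below.any():
        k = np.max(np.nonzero(below)[0])       # largest i passing its threshold
        keep[order[:k + 1]] = True             # reject all hypotheses up to rank k
    return keep
```

In a skeleton-identification setting, the p-values would come from the independence tests attached to candidate edges, and surviving edges form the estimated skeleton.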
ES2011-82
A structure independent algorithm for causal discovery
Tom Claassen, Tom Heskes
Abstract:
We present two inference rules, based on so-called minimal conditional independencies, that are sufficient to find all invariant arrowheads in a single causal DAG, even when selection bias may be present. It turns out that the set of seven graphical orientation rules that are usually employed to identify these arrowheads are, in fact, just different instances of these two rules. The resulting algorithm to obtain all definite causal information is elegant and fast, once the (often surprisingly small) set of minimal independencies is found.
ES2011-84
Causal relevance learning for robust classification under interventions
Ernest Mwebaze, John A. Quinn, Michael Biehl
Abstract:
In some classification problems the distribution of the test data is different from that of the training data because of external manipulations to the variables we observe. We propose a classification scheme which is robust to outside interventions by identifying causes in the training data, given that causes of a target variable remain predictive even when the data is manipulated. We do this by extending Relevance Learning Vector Quantization (RLVQ), a classification scheme that learns a relevance profile for the classification task presented. Our proposed algorithm, Causal-RLVQ, learns a relevance profile that weights causally relevant features more strongly. The algorithm can determine a tradeoff between robustness to intervention and accuracy on non-manipulated data, yielding RLVQ as a special case.
ES2011-76
A constraint-based approach to incorporate prior knowledge in causal models
Giorgos Borboudakis, Sofia Triantafillou, Vincenzo Lagani, Ioannis Tsamardinos
Abstract:
In this paper we address the problem of incorporating prior knowledge, in the form of causal relations, in causal models. We use the framework of Maximal Ancestral Graphs and follow a constraint-based approach, allowing the incorporation of prior knowledge in a more flexible manner than previous approaches. Finally, we discuss the limitations of the graphical models used to capture causal knowledge, as well as open questions regarding this problem.
Learning II
ES2011-23
Selecting from an infinite set of features in SVM
Remi Flamary, Florian Yger, Alain Rakotomamonjy
Abstract:
Dealing with the continuous parameters of a feature extraction method has always been a difficult task that is usually solved by cross-validation. In this paper, we propose an active set algorithm for automatically selecting these parameters in an SVM classification context. Our experiments on texture recognition and BCI signal classification show that optimizing the feature parameters in a continuous space while learning the decision function yields better performance than using fixed parameters obtained from a grid sampling.
ES2011-51
Mutual information based feature selection for mixed data
Gauthier Doquire, Michel Verleysen
Abstract:
The problem of feature selection is crucial for many applications and has thus been studied extensively. However, most of the existing methods are designed to handle data consisting only of categorical or only of real-valued features, while a mix of both kinds of features is often encountered in practice. This paper proposes an approach based on mutual information and the maximal Relevance minimal Redundancy principle to handle the case of mixed data. It combines aspects of both wrapper and filter methods and is well suited for regression problems. Experiments on artificial and real-world datasets show the interest of the methodology.
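The maximal Relevance minimal Redundancy (mRMR) principle mentioned here greedily adds the feature whose relevance MI(f; y) most exceeds its average redundancy with the features already chosen. An illustrative numpy sketch using a binned MI estimate (the paper's estimator and its mixed-data handling differ):

```python
import numpy as np

def mi_binned(a, b, bins=8):
    """Binned plug-in estimate of the mutual information between two samples (nats)."""
    h, _, _ = np.histogram2d(a, b, bins=bins)
    p = h / h.sum()
    pa = p.sum(axis=1, keepdims=True)
    pb = p.sum(axis=0, keepdims=True)
    nz = p > 0
    return float((p[nz] * np.log(p[nz] / (pa @ pb)[nz])).sum())

def mrmr_select(X, y, k):
    """Greedy mRMR: at each step add the feature j maximizing
    MI(x_j; y) - mean over already-selected s of MI(x_j; x_s)."""
    relevance = [mi_binned(X[:, j], y) for j in range(X.shape[1])]
    selected = []
    while len(selected) < k:
        best, best_score = None, -np.inf
        for j in range(X.shape[1]):
            if j in selected:
                continue
            redundancy = (np.mean([mi_binned(X[:, j], X[:, s]) for s in selected])
                          if selected else 0.0)
            score = relevance[j] - redundancy
            if score > best_score:
                best, best_score = j, score
        selected.append(best)
    return selected
```

On data where one feature duplicates another, the redundancy term steers the second pick toward a fresh (even weakly relevant) feature rather than the copy.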
ES2011-91
Unsupervised feature selection for sparse data
Artur Ferreira, Mario Figueiredo
Abstract:
Feature selection is a well-known problem in machine learning and pattern recognition. Many high-dimensional datasets are sparse, that is, many features have zero value. In some cases, we do not know the class label for some (or even all) patterns in the dataset, leading us to semi-supervised or unsupervised learning problems. For instance, in text classification with bag-of-words (BoW) representations, there is usually a large number of features, many of which may be irrelevant (or even detrimental) for categorization tasks. In this paper, we propose an efficient unsupervised feature selection technique for sparse data, suitable for both standard floating point and binary features. The experimental results on standard datasets show that the proposed method yields efficient feature selection, reducing the number of features while simultaneously improving the classification accuracy.
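As a toy stand-in for the paper's criterion (which is not spelled out in this abstract), a minimal unsupervised ranking for sparse data can score each feature by its variance — for a binary BoW column this equals p(1-p), so near-constant columns are dropped first:

```python
import numpy as np

def top_k_by_variance(X, k):
    """Unsupervised relevance proxy: keep the k highest-variance features.
    For a binary column with activation rate p, the variance is p*(1-p),
    so columns that are almost always 0 (or always 1) rank last.
    (Illustrative stand-in, not the paper's actual criterion.)"""
    return np.argsort(X.var(axis=0))[::-1][:k]
```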
ES2011-53
Multi-class classification in the presence of labelling errors
Jakramate Bootkrajang, Ata Kaban
Abstract:
Learning a classifier from a training set that contains labelling errors is a difficult, yet not very well studied problem. Here we present a model-based approach that extends multi-class quadratic normal discriminant analysis with a model of the mislabelling process. We demonstrate the benefits of this approach in terms of parameter recovery as well as improved classification performance, on both synthetic and real-world multi-class problems. We also obtain enhanced accuracy in comparison with a previous model-free approach.
ES2011-28
Principal component analysis for unsupervised calibration of bio-inspired airflow array sensors
Michiel Van Dyck, Herbert Peremans
Abstract:
This paper describes the automatic calibration of a set of airflow-sensitive sensors on a robot exposed to unknown random airflow stimuli. This might support the idea that the cricket cercus neural system in the terminal abdominal ganglion evolved through learning. The algorithm makes use of the singular value decomposition (SVD) and the known reduced model dimension of the system for learning the sensor array setup. The absolute orientation of the array can only be determined relative to a reference flow or reference sensor, which must be calibrated manually. When only a change in airflow needs to be measured, the reference sensor can be left uncalibrated.
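The SVD step can be illustrated on a linear model S = G F, where G holds the unknown sensor gains and F the random stimuli of known low dimension d: the top-d left singular vectors of the recordings span the column space of G up to rotation — which is exactly why the absolute orientation still needs an external reference. A sketch under those assumptions (variable names are ours):

```python
import numpy as np

def estimate_response_subspace(S, d):
    """Truncated SVD of recordings S (sensors x samples): under S = G F with
    d-dimensional random stimuli F, the top-d left singular vectors span the
    column space of the unknown gain matrix G, up to an arbitrary rotation."""
    U, _, _ = np.linalg.svd(S, full_matrices=False)
    return U[:, :d]
```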
ES2011-27
Effects of sparseness and randomness of pairwise distance matrix on t-SNE results
Eli Parviainen
Abstract:
We apply ideas from random graph theory to sparse pairwise distance matrices in dimension reduction. We use matrices with some short and some randomly chosen distances, and study the effects of matrix sparseness and randomness on the trustworthiness and continuity of t-SNE visualizations. Existing works have either concentrated on matrices with only short distances, or implemented heuristics with mixed distances without explaining the effects. We find that trustworthiness generally increases with randomness, but not without limit. Continuity is less affected, but drops if matrices become too random. Sparseness has little effect on continuity, but decreases trustworthiness. The decrease in quality appears sublinear, which suggests that sparse t-SNE could be made subquadratic in complexity without too much effect on quality.
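A mixed short/random sparse distance matrix of the kind studied here can be built by keeping, for each point, its nearest neighbours plus a few randomly chosen remote points. An illustrative construction (parameter names are ours; the paper's exact sampling scheme may differ):

```python
import numpy as np

def mixed_distance_mask(X, n_near, n_random, rng):
    """Boolean sparsity mask: per point, keep its n_near nearest neighbours
    (short distances) plus n_random randomly chosen remote points, then
    symmetrize. Only masked entries of the distance matrix would be used."""
    n = len(X)
    D = np.sqrt(((X[:, None, :] - X[None, :, :]) ** 2).sum(-1))
    mask = np.zeros((n, n), dtype=bool)
    for i in range(n):
        order = np.argsort(D[i])                 # order[0] is the point itself
        mask[i, order[1:n_near + 1]] = True      # short distances
        far = order[n_near + 1:]
        mask[i, rng.choice(far, size=n_random, replace=False)] = True  # random ones
    return mask | mask.T
```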
ES2011-38
Nearest neighbors and correlation dimension for dimensionality estimation. Application to factor analysis of real biological time series data.
Jérôme Lapuyade-Lahorgue, Ali Mohammad-Djafari
Abstract:
Determining the number of components in dimensionality reduction techniques is still one of the open problems in data analysis research. These methods are often used for knowledge extraction from high-dimensional multivariate data, but very often the number of components is assumed to be known. One of the classical methods to estimate this dimensionality is based on the Principal Component Analysis (PCA) eigenvalues. However, this method supposes that the model is linear and the signals are Gaussian. To be able to handle non-linear and non-Gaussian cases, we propose in this paper “measure-based methods” such as the nearest neighbors dimension and the correlation dimension. The comparison between the three methods is evaluated both with simulated data and with real biological data, namely gene expression time series. The main goal of this study is to estimate the minimum number of factors.
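The correlation dimension can be estimated with the Grassberger-Procaccia correlation integral: C(r), the fraction of point pairs closer than r, scales as r^D in the small-r regime, so D is the slope of log C(r) against log r. A compact sketch:

```python
import numpy as np

def correlation_dimension(X, r_values):
    """Grassberger-Procaccia estimate: compute the correlation integral
    C(r) = fraction of point pairs with distance < r, then fit the slope
    of log C(r) versus log r, which approximates the intrinsic dimension."""
    n = len(X)
    D = np.sqrt(((X[:, None, :] - X[None, :, :]) ** 2).sum(-1))
    pairwise = D[np.triu_indices(n, k=1)]        # each pair once
    C = np.array([(pairwise < r).mean() for r in r_values])
    slope, _ = np.polyfit(np.log(r_values), np.log(C), 1)
    return slope
```

The r-values must lie in the scaling region: large enough that C(r) > 0, small enough that boundary effects do not flatten the curve.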
ES2011-61
A Similarity Function with Local Feature Weighting for Structured Data
Rubén Suárez, Rocío García-Durán, Fernando Fernández
Abstract:
The application of learning approaches such as kernel or instance-based methods to tree-structured data requires the definition of similarity functions able to deal with such data. A new similarity function for nearest prototype classification in relational data that follows a tree structure is defined in this paper. Its main characteristic is its capability to weight the importance of the different data features in different areas of the feature space. This work is built on two previous ideas: a similarity function for Local Feature Weighting (LFW), and a Relational Nearest Prototype Classification algorithm (RNPC).
ES2011-83
Exploiting vertices states in GraphESN by weighted nearest neighbor
Claudio Gallicchio, Alessio Micheli
Abstract:
Graph Echo State Networks (GraphESN) extend the Reservoir Computing approach to directly process graph structures. The reservoir is applied to every vertex of an input graph, realizing a contractive encoding process and resulting in a structured state isomorphic to the input. Whenever an unstructured output is required, a state mapping function maps the structured state into a fixed-size feature representation that feeds the linear readout. In this paper we propose an alternative approach, based on distance-weighted nearest neighbor, to realize a more flexible readout exploiting the state information computed for every vertex according to its individual relevance.
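The distance-weighted nearest-neighbour idea can be illustrated in its plain vector-space form, where each of the k nearest training points votes with weight 1/distance (the GraphESN readout applies this kind of weighting to per-vertex state information; this sketch is not the paper's readout itself):

```python
import numpy as np

def weighted_nn_predict(X_train, y_train, x, k=3, eps=1e-12):
    """Distance-weighted k-NN vote: each of the k nearest training points
    contributes weight 1/(distance + eps) to its class; highest total wins."""
    d = np.sqrt(((X_train - x) ** 2).sum(axis=1))
    idx = np.argsort(d)[:k]
    w = 1.0 / (d[idx] + eps)
    classes = np.unique(y_train)
    scores = [w[y_train[idx] == c].sum() for c in classes]
    return classes[int(np.argmax(scores))]
```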
ES2011-86
The role of Fisher information in primary data space for neighbourhood mapping
Héctor Ruiz, Ian Jarman, José D Martín, Paulo Lisboa
Abstract:
Clustering methods and nearest neighbour classifiers typically compute distances between data points as a measure of similarity, with nearby pairs of points considered more like each other than remote pairs. The distance measure of choice is often Euclidean, implicitly treating all directions in space as equally relevant. This paper reviews the application of Fisher information to derive a metric in primary data space. The aim is to provide a natural coordinate space to represent pairwise distances with respect to a probability distribution p(c|x), defined by an external label c, and use it to compute more informative distances.
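For a concrete instance, take a binary logistic model p(1|x) = sigmoid(w·x + b): the local Fisher information matrix J(x) = sum_c p(c|x) grad log p(c|x) grad log p(c|x)^T reduces to p(1-p) w w^T, so directions orthogonal to w contribute nothing to distance — exactly the label-informed behaviour described above. A sketch that integrates this metric along a straight path (model choice and names are ours):

```python
import numpy as np

def fisher_metric_logistic(x, w, b):
    """Local Fisher information for binary logistic p(1|x) = sigmoid(w.x + b):
    J(x) = p(1-p) w w^T, a rank-one metric aligned with the decision direction."""
    p = 1.0 / (1.0 + np.exp(-(w @ x + b)))
    return p * (1.0 - p) * np.outer(w, w)

def fisher_distance(x1, x2, w, b, steps=100):
    """Approximate Fisher path length along the straight line from x1 to x2
    by summing sqrt(seg^T J(mid) seg) over small segments."""
    ts = np.linspace(0.0, 1.0, steps + 1)
    pts = x1[None, :] + ts[:, None] * (x2 - x1)[None, :]
    seg = (x2 - x1) / steps
    total = 0.0
    for i in range(steps):
        J = fisher_metric_logistic(0.5 * (pts[i] + pts[i + 1]), w, b)
        total += np.sqrt(seg @ J @ seg)
    return total
```

Moving across the decision boundary accumulates distance; moving parallel to it costs nothing, which is what makes the resulting neighbourhoods informative about the label.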
ES2011-95
Communication Challenges in Cloud K-means
Fabrice Rossi, Matthieu Durut
Abstract:
This paper studies how parallel machine learning algorithms can be implemented on top of the Microsoft Windows Azure cloud computing platform. More specifically, the proposed algorithm focuses on communication mechanisms through the storage.
ES2011-107
A Spectral Based Clustering Algorithm for Categorical Data with Maximum Modularity
Lazhar Labiod, Younès Bennani
Abstract:
In this paper we propose a spectral-based clustering algorithm to maximize an extended modularity measure for categorical data. First, we establish the connection with the Relational Analysis criterion. Second, the maximization of the extended modularity is shown to be a trace maximization problem. A spectral-based algorithm is then presented to search for the partitions maximizing the extended modularity criterion. Experimental results indicate that the new algorithm is efficient and effective at finding a good clustering across a variety of real-world data sets.
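For ordinary graphs, the analogous spectral step is Newman's leading-eigenvector method: form the modularity matrix B = A - k k^T / 2m and split on the sign of its top eigenvector. The paper's extended modularity for categorical data generalizes this idea; the sketch below is the classical graph version, not the authors' algorithm:

```python
import numpy as np

def modularity_bipartition(A):
    """Newman's leading-eigenvector bipartition: build the modularity matrix
    B = A - k k^T / 2m (k = degrees, 2m = total degree) and assign each node
    to a community by the sign of the eigenvector of the largest eigenvalue."""
    k = A.sum(axis=1)
    two_m = k.sum()
    B = A - np.outer(k, k) / two_m
    vals, vecs = np.linalg.eigh(B)
    leading = vecs[:, np.argmax(vals)]
    return leading >= 0
```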
ES2011-41
Single-trial P300 detection with Kalman filtering and SVMs
Lucie Daubigney, Olivier Pietquin
Abstract:
Brain Computer Interfaces (BCI) are systems enabling humans to communicate with machines through signals generated by the brain. Several kinds of signals can be envisioned, as well as means to measure them. In this paper we are particularly interested in event-related brain potentials (ERP) and especially visually evoked potential signals (P300) measured with surface electroencephalograms (EEG). When the human is stimulated with visual inputs, the P300 signal arises about 300 ms after the visual stimulus has been received. Yet, the EEG signal is often very noisy, which makes P300 detection hard. It is customary to use an average of several trials to enhance the P300 signal and reduce the random noise, but this results in a lower bit rate of the interface. In this contribution, we propose a novel approach to P300 detection using Kalman filtering and SVMs. Experiments show that this method is a promising step toward single-trial detection of P300.
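The Kalman filtering stage can be illustrated with a scalar random-walk model, where the filter trades process noise q against measurement noise r to denoise a single trial before classification (the paper's actual state model is not specified in this abstract):

```python
import numpy as np

def kalman_filter_1d(z, q, r):
    """Scalar Kalman filter for the random-walk model x_t = x_{t-1} + w_t,
    z_t = x_t + v_t, with process variance q and measurement variance r.
    Returns the filtered estimate of the underlying signal."""
    xhat = np.zeros_like(z)
    x, P = z[0], r                    # initialize at the first observation
    for t, zt in enumerate(z):
        P = P + q                     # predict: uncertainty grows by q
        K = P / (P + r)               # Kalman gain
        x = x + K * (zt - x)          # update toward the new measurement
        P = (1.0 - K) * P
        xhat[t] = x
    return xhat
```

A small q/r ratio averages over many samples (strong smoothing, more lag); a large ratio tracks the raw signal. The filtered trial would then be fed to the SVM.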
ES2011-35
Classifying mental states with machine learning algorithms using alpha activity decline
Carina Walter, Gabriele Cierniak, Peter Gerjets, Wolfgang Rosenstiel, Martin Bogdan
Abstract:
This publication aims at developing computer-based learning environments that adapt to learners’ individual cognitive condition. The adaptive mechanism, based on Brain-Computer Interface (BCI) methodology, relies on electroencephalogram (EEG) data to diagnose learners’ mental states. A first within-subjects study (10 students) was conducted, aiming at differentiating between states of learning and non-learning by means of EEG data. Support Vector Machines classified characteristics in the EEG signals for these two different stimuli with an average accuracy of 74.55%. For individual students, the percentage of correct classifications reached 92.22%. The results indicate that continuous EEG data combined with BCI methodology is a promising approach to measuring learners’ mental states online.
ES2011-30
Approaches for Automatic Speaker Recognition in a Binaural Humanoid Context
Karim Youssef, Bastien Breteau, Sylvain Argentieri, Jean-Luc Zarader, Zefeng Wang
Abstract:
This paper presents two methods for Automatic Speaker Recognition (ASkR). ASkR has been studied extensively in recent decades, but mostly in mono-microphone or microphone-array contexts. Our systems are placed in a binaural humanoid context, where the signals captured by both ears of a humanoid robot are exploited to perform ASkR. Both methods use Mel-Frequency Cepstral Coding (MFCC), but one performs the classification with Predictive Neural Networks (PNN) and the other with Gaussian Mixture Models (GMM). Tests are performed on a database simulating the functioning of the human ears and study the influence of noise, reverberation and speaker spatial position on the recognition rate.
ES2011-37
Fast Data Mining with Sparse Chemical Graph Fingerprints by Estimating the Probability of Unique Patterns
Georg Hinselmann, Lars Rosenbaum, Andreas Jahn, Andreas Zell
Abstract:
The aim of this work is to introduce a modification of chemical graph fingerprints for data mining. The algorithm reduces the number of features by taking into account the probability of producing a unique feature at a specific search depth. We observed that the probability of generating a non-unique feature depends on a search parameter (which leads to a power-law growth of features) and modeled it by a sigmoid function. This function was integrated into a fingerprinting routine to prune features according to their probability. The predictive performance was convincing, with a considerable speedup in training a linear support vector machine on sparse instances.
ES2011-85
Automatic Enhancement of Correspondence Detection in an Object Tracking System
Denis Schulze, Sven Wachsmuth, Katharina J. Rohlfing
Abstract:
This paper proposes a strategy to automatically detect the correspondence between measurements of different sensors using object tracking. In addition, the strategy includes the ability to learn new features to facilitate the correspondence computation for future measurements. To this end, a correlation between objects of different modalities is first computed using time-synchronous changes of attribute values. Using statistical methods to determine the dependencies between changes of different attributes, it is shown how a multi-layer perceptron (MLP) can be used to enhance the correspondence detection in ambiguous situations.
Sequence and time processing
ES2011-70
Time Experiencing by Robotic Agents
Michail Maniadakis, Marc Wittmann, Panos Trahanias
Abstract:
Biological organisms perceive and act in the world based on spatiotemporal experiences and interpretations. However, artificial agents consider mainly the spatial relationships that exist in the world, typically ignoring its temporal aspects. In an attempt to direct research interest towards the fundamental issue of time experiencing, the current work explores two temporally different versions of a robotic rule switching task. An evolutionary process is employed to design a neural network controller capable of accomplishing both versions of the task. The systematic exploration of neural network dynamics revealed a self-organized time perception capacity in the agent's cognitive system that significantly facilitates the accomplishment of tasks, through the modulation of the supplementary behavioural and cognitive processes.
ES2011-45
Visual place recognition using Bayesian filtering with Markov chains
Mathieu Dubois, Hervé Guillaume, Emmanuelle Frenoux, Philippe Tarroux
Abstract:
We present a novel approach using Bayesian filtering for place recognition. More precisely, our system combines global image characterization, Learning Vector Quantization, Markov chains and Bayesian filtering. The goal is to integrate several images seen by a robot during exploration of the environment, together with the dependencies between them. We present our system and the new Bayesian filtering algorithm. Our system has been evaluated on a standard database and shows promising results.
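As a rough sketch of the filtering idea in this abstract, the following discrete Bayes filter propagates a belief over places through a Markov transition model and reweights it with per-image observation likelihoods. The two-place setup and all matrix values are hypothetical toy numbers, not the authors' system.

```python
# Illustrative discrete Bayes filter for place recognition:
# a belief over places is propagated through a Markov transition
# model (predict) and reweighted by the likelihood of the current
# image under each place hypothesis (update).

def bayes_filter_step(belief, transition, likelihood):
    """One predict/update step of a discrete Bayes filter.

    transition[j][i] is P(place i at time t | place j at time t-1);
    likelihood[i] is P(current observation | place i).
    """
    n = len(belief)
    # Predict: propagate the belief through the place transition model.
    predicted = [sum(transition[j][i] * belief[j] for j in range(n))
                 for i in range(n)]
    # Update: weight by the observation likelihood, then normalize.
    posterior = [likelihood[i] * predicted[i] for i in range(n)]
    z = sum(posterior)
    return [p / z for p in posterior]
```

With a "sticky" transition matrix and repeated observations favoring one place, the belief concentrates on that place over a few steps, which is the effect of integrating several images rather than classifying each independently.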
ES2011-52
Iterative multi-task sequence labeling for predicting structural properties of proteins
Francis Maes, Julien Becker, Louis Wehenkel
Abstract:
Developing computational tools for predicting protein structural information from the amino acid sequence is of primary importance in protein science. Problems such as the prediction of secondary structures, of solvent accessibility, or of disordered regions can be expressed as sequence labeling problems and could be solved independently by existing machine-learning-based sequence labeling approaches. But since these problems are closely related, we propose to approach them jointly in a multi-task setting. To this end, we introduce a new generic framework for iterative multi-task sequence labeling. We apply this conceptually simple but quite effective strategy to jointly solve a set of five protein annotation tasks. Our empirical results on two protein datasets show that the proposed strategy significantly outperforms the single-task approaches.
ES2011-54
Identification of sparse spatio-temporal features in Evoked Response Potentials
Nisrine Jrad, Marco Congedo
Abstract:
Electroencephalographic Evoked Response Potentials (ERPs) exhibit distinct and individualized spatial and temporal characteristics. Identification of spatio-temporal features improves single-trial classification performance and allows a better understanding of the underlying physiology. This paper presents a method for analyzing the spatio-temporal characteristics associated with Error-related Potentials (ErrPs). First, a resampling procedure based on Global Field Power (GFP) extracts temporal features. Second, a spatially weighted SVM (sw-SVM) is proposed to learn a spatial filter optimizing the classification performance for each temporal feature. Third, the resulting ensemble of sw-SVM classifiers is combined using a weighted combination of all sw-SVM outputs. Results indicate that inclusion of temporal features provides useful insight into the spatio-temporal characteristics of error potentials.
ES2011-102
Hybrid HMM and HCRF model for sequence classification
Yann Soullard, Thierry Artières
Abstract:
We propose a hybrid model combining a generative model and a discriminative model for signal labelling and classification tasks, aiming to take the best of both worlds. The idea is to focus the learning of the discriminative model on the most likely state sequences output by the generative model. This takes advantage of the usual increased accuracy of generative models on small training datasets and of discriminative models on large training datasets. We instantiate this framework with Hidden Markov Models and Hidden Conditional Random Fields, and validate our model on financial time series and on handwriting data.
ES2011-105
A Neural Filter for Electrolocation in Weakly Electric Fish
DaeEun Kim
Abstract:
Weakly electric fish have an electric organ that generates an electric field and an electrosensory mechanism that reads changes in this field through their electroreceptors. The electric organ produces a waveform known as the electric organ discharge (EOD). Their active electrolocation system can detect distortion of the self-generated electric field caused by a target object and estimate the object's position. In this paper, we suggest the hypothesis that the periodic EOD signals can be exploited to extract localization features from the noisy electrosensory signals, and we propose a possible neural network that performs noise filtering to obtain accurate information about a target's position. The network has sinusoidal weights to process a time series of sensor readings for each electroreceptor and to recover the clean signal depending purely on the object perturbation.
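One simple reading of a "neural filter with sinusoidal weights" is a unit that correlates a window of sensor readings with fixed sine and cosine weights at the known EOD frequency, recovering the amplitude of the periodic component while averaging out broadband noise. The sketch below illustrates that idea only; it is our assumption, not the paper's exact network.

```python
import math

# Illustrative sketch: estimate the amplitude of a periodic (EOD-like)
# component in noisy sensor readings using fixed sinusoidal weights
# over a time window. Frequency and sampling interval are assumed known.

def sinusoidal_filter(samples, freq, dt):
    """Estimate the amplitude of the freq component via sin/cos weights."""
    n = len(samples)
    s = sum(x * math.sin(2 * math.pi * freq * k * dt)
            for k, x in enumerate(samples))
    c = sum(x * math.cos(2 * math.pi * freq * k * dt)
            for k, x in enumerate(samples))
    # For an integer number of cycles, each weighted sum has gain n/2.
    return 2.0 / n * math.hypot(s, c)
```

For a clean sinusoid spanning an integer number of cycles, the estimate matches the true amplitude; added zero-mean noise perturbs it only by a term that shrinks as the window grows.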
Optimization and learning
ES2011-75
Non-linearly increasing resampling in racing algorithms
Verena Heidrich-Meisner, Christian Igel
Abstract:
Racing algorithms are iterative methods trying to find the best among several options with high probability. The quality of each option is a random variable and is estimated by its empirical mean and concentration bounds obtained from repeated sampling. In each iteration of a standard racing algorithm each promising option is reevaluated once before being statistically compared with its competitors. We argue that Hoeffding and empirical Bernstein races benefit from decoupling the racing iteration from the number of samples per option and illustrate this on an artificial benchmark problem.
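To make the decoupling concrete, here is a minimal Hoeffding race in which the number of samples drawn per iteration comes from a pluggable schedule, so a non-linearly increasing schedule (e.g. doubling) can replace the standard one-sample-per-iteration rule. Function names, the confidence level, and the toy setup are illustrative, not the authors' code.

```python
import math

# Illustrative Hoeffding race: options are eliminated once their upper
# confidence bound falls below the best lower bound among survivors.

def hoeffding_radius(n, delta, value_range=1.0):
    """Hoeffding confidence radius after n i.i.d. samples in [0, value_range]."""
    return value_range * math.sqrt(math.log(2.0 / delta) / (2.0 * n))

def race(sample_option, n_options, schedule, delta=0.05):
    """Run a race; schedule gives the samples drawn per option per iteration."""
    alive = set(range(n_options))
    sums = [0.0] * n_options
    counts = [0] * n_options
    for batch in schedule:                    # e.g. [1, 2, 4, 8, ...]
        for i in alive:
            for _ in range(batch):
                sums[i] += sample_option(i)
                counts[i] += 1
        bounds = {i: hoeffding_radius(counts[i], delta) for i in alive}
        best_lower = max(sums[i] / counts[i] - bounds[i] for i in alive)
        alive = {i for i in alive
                 if sums[i] / counts[i] + bounds[i] >= best_lower}
        if len(alive) == 1:
            break
    return alive
```

With a doubling schedule, the confidence bounds shrink after each iteration at the same asymptotic rate while the per-iteration overhead of the statistical comparison is paid only logarithmically often.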
ES2011-13
A distributed learning algorithm based on two-layer artificial neural networks and genetic algorithms
Diego Peteiro-Barral, Bertha Guijarro-Berdiñas, Beatriz Pérez-Sánchez, Oscar Fontenla-Romero
Abstract:
In many real-world applications of machine learning, the amount of data is now beyond the capability of learning algorithms because they cannot process all available data in a reasonable time. Moreover, most large datasets are naturally distributed or they are being stored in a distributed manner. A promising line of research in order to deal with large and/or distributed data is distributed learning. We propose a new distributed learning algorithm based on two-layer artificial neural networks and genetic algorithms. The results obtained show that our method performs better than other distributed learning algorithms.
Deep Learning
ES2011-4
An Introduction to Deep Learning
Ludovic Arnold, Sébastien Rebecchi, Sylvain Chevallier, Hélène Paugam-Moisy
Abstract:
The deep learning paradigm tackles problems on which shallow architectures (e.g. SVM) are affected by the curse of dimensionality. As part of a two-stage learning scheme involving multiple layers of nonlinear processing, a set of statistically robust features is automatically extracted from the data. The present tutorial, introducing the ESANN deep learning special session, details the state-of-the-art models and summarizes the current understanding of this learning approach, which has become a reference for many difficult classification tasks.
ES2011-10
Using very deep autoencoders for content-based image retrieval
Alex Krizhevsky, Geoffrey Hinton
Abstract:
We show how to learn many layers of features on color images and we use these features to initialize deep autoencoders. We then use the autoencoders to map images to short binary codes. Using semantic hashing, 28-bit codes can be used to retrieve images that are similar to a query image in a time that is independent of the size of the database. This extremely fast retrieval makes it possible to search using multiple different transformations of the query image. 256-bit binary codes allow much more accurate matching and can be used to prune the set of images found using the 28-bit codes.
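The two-stage lookup this abstract describes can be sketched as follows: a hash table keyed by short binary codes gives candidate images in time independent of database size, and longer codes then rerank the candidates by Hamming distance. The integer codes below are toy stand-ins for the learned 28-bit and 256-bit codes.

```python
from collections import defaultdict

# Illustrative semantic-hashing-style retrieval: short codes index a
# hash table for O(1) candidate lookup; long codes rerank candidates.

def build_index(short_codes):
    """Map each short code to the list of image ids that share it."""
    index = defaultdict(list)
    for image_id, code in enumerate(short_codes):
        index[code].append(image_id)
    return index

def retrieve(query_short, query_long, index, long_codes, k=5):
    """Look up candidates by short code, rerank by long-code Hamming distance."""
    candidates = index.get(query_short, [])
    return sorted(candidates,
                  key=lambda i: bin(long_codes[i] ^ query_long).count("1"))[:k]
```

Because the coarse lookup touches only one hash bucket, its cost does not grow with the database; only the reranking scales with the (small) candidate set.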
ES2011-79
Training RBMs based on the signs of the CD approximation of the log-likelihood derivatives
Asja Fischer, Christian Igel
Abstract:
Contrastive Divergence (CD) learning is frequently applied to Restricted Boltzmann Machines (RBMs), the building blocks of deep belief networks. It relies on biased approximations of the log-likelihood gradient. This bias can deteriorate the learning process. It has been claimed that the signs of most components of the CD update equal the corresponding signs of the log-likelihood gradient. This suggests using optimization techniques that depend only on the signs. Resilient backpropagation is such a method, and we combine it with CD learning. However, it does not prevent divergence caused by the approximation bias.
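A sign-only update of the resilient-backpropagation kind mentioned here can be sketched generically: each parameter keeps its own step size, which grows while the gradient sign repeats and shrinks when it flips. Combining it with CD would mean passing the CD gradient estimates in as `grads`. The constants are the common Rprop defaults; this is a sketch under those assumptions, not the authors' implementation.

```python
# Illustrative Rprop-style update that uses only the signs of the
# (e.g. CD-approximated) gradient components. Gradient ascent on the
# log-likelihood is assumed, so parameters move in the sign direction.

def rprop_step(weights, grads, steps, prev_signs,
               eta_plus=1.2, eta_minus=0.5,
               step_min=1e-6, step_max=1.0):
    """Update weights in place using only gradient signs."""
    for i, g in enumerate(grads):
        sign = (g > 0) - (g < 0)
        if sign * prev_signs[i] > 0:        # same sign: accelerate
            steps[i] = min(steps[i] * eta_plus, step_max)
        elif sign * prev_signs[i] < 0:      # sign flipped: slow down
            steps[i] = max(steps[i] * eta_minus, step_min)
        weights[i] += sign * steps[i]
        prev_signs[i] = sign
```

Note that the magnitude of the CD estimate never enters the update, which is exactly why a sign-consistency claim about CD motivates this optimizer; as the abstract reports, however, sign-only updates alone do not remove divergence caused by the bias.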
ES2011-21
A supervised strategy for deep kernel machine
Florian Yger, Maxime Berar, Gilles Gasso, Alain Rakotomamonjy
Abstract:
This paper presents an alternative approach for learning a Multilayer Kernel Machine (MKM) based on kernel partial least squares. After explaining the characteristics and motivation of our approach, we show compelling evidence on real-world applications.