Bruges, Belgium, April 25-26-27
Content of the proceedings
Deep learning and image processing
Interaction and User Integration in Machine Learning for Information Visualisation
Nonlinear dimensionality reduction
Classification
Regression and recommendation systems
Shallow and Deep models for transfer learning and domain adaptation
Machine Learning and Data Analysis in Astroinformatics
Deep Learning in Bioinformatics and Medicine
Randomized Neural Networks
Clustering and feature selection
Mathematical aspects of learning, and reinforcement learning
Emerging trends in machine learning: beyond conventional methods and data
Temporal data, sequences and incremental learning
Impact of Biases in Big Data
Optimization and metaheuristics
Deep learning and image processing
ES2018-166
A Sub-Layered Hierarchical Pyramidal Neural Architecture for Facial Expression Recognition
Henrique Siqueira, Pablo Barros, Sven Magg, Cornelius Weber, Stefan Wermter
Abstract:
In domains where computational resources and labeled data are limited, such as robotics, deep networks with millions of weights might not be the optimal solution. In this paper, we introduce a connectivity scheme for pyramidal architectures that increases their capacity for learning features. Experiments on facial expression recognition of unseen people demonstrate that our approach is a potential candidate for applications with restricted resources, owing to its good generalization performance and low computational cost. We show that our approach generalizes as well as convolutional architectures in this task while using fewer trainable parameters, and that it is more robust to low-resolution faces.
ES2018-102
Interpretation of Convolutional Neural Networks for Speech Regression from Electrocorticography
Miguel Angrick, Christian Herff, Garett Johnson, Jerry Shih, Dean Krusienski, Tanja Schultz
Abstract:
The direct synthesis of continuously spoken speech from neural activity is envisioned to enable fast and intuitive Brain-Computer Interfaces. Earlier results indicate that intracranial recordings have signal characteristics well suited to direct synthesis. Convolutional Neural Networks (CNNs) can be trained to map the complex dynamics of neural activity to spectral representations of speech. However, the resulting networks are hard to interpret and thus offer little opportunity to gain insight into the neural processes underlying speech. Here, we show that CNNs are useful for reconstructing speech from intracranial recordings of brain activity, and we propose an approach to interpret the trained CNNs.
ES2018-188
Transferring Style in Motion Capture Sequences with Adversarial Learning
Qi Wang, Mickael Chen, Thierry Artieres, Ludovic Denoyer
Abstract:
We focus on style transfer for sequential data in a supervised setting. Assuming that sequential data contain both content and style information, we want to learn models that transform a sequence into a new one with the same content but the style of another sequence, using a training dataset in which content and style labels are available. Following work on image generation and editing with adversarial learning, we explore the design of neural network architectures for sequence editing and apply them to motion capture sequences.
ES2018-164
Properties of adv⁻¹ – Adversarials of Adversarials
Nils Worzyk, Oliver Kramer
Abstract:
Neural networks are very successful in the domain of image processing, but they are still vulnerable to adversarial images, i.e. images carefully crafted to fool the neural network during image classification. Several attacks for creating such adversarial images already exist, so the transition from original images to adversarial images is well understood. In this paper, we apply adversarial attacks to adversarial images. We call the resulting images adv⁻¹. The goal is to investigate the transition from adversarial images to adv⁻¹ images. This knowledge can be used to (1) identify adversarial images and (2) find the original class of adversarial images.
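To make "applying an adversarial attack to an adversarial image" concrete, the sketch below uses the fast gradient sign method (FGSM) against a hypothetical two-feature logistic-regression classifier. The abstract does not name the attacks used, so FGSM, the weights w and b, the step size eps and the helper names are all assumptions for illustration. The label fed to each attack is the model's current prediction, which is the only label available for an adversarial input.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def predict(x, w, b):
    """Binary prediction of a toy logistic-regression 'network' (assumed model)."""
    return int(sigmoid(x @ w + b) > 0.5)

def fgsm(x, label, w, b, eps):
    """One FGSM step: move x by eps along the sign of the loss gradient w.r.t. x.

    For logistic regression with cross-entropy loss, d(loss)/dx = (p - label) * w.
    """
    p = sigmoid(x @ w + b)
    grad_x = (p - label) * w
    return x + eps * np.sign(grad_x)

w, b, eps = np.array([1.0, -1.0]), 0.0, 0.3  # hypothetical parameters
x = np.array([0.2, 0.0])                     # hypothetical "original image"

x_adv = fgsm(x, predict(x, w, b), w, b, eps)          # adversarial of the original
x_inv = fgsm(x_adv, predict(x_adv, w, b), w, b, eps)  # adv^-1: adversarial of the adversarial

print(predict(x, w, b), predict(x_adv, w, b), predict(x_inv, w, b))  # 1 0 1
```

In this linear toy the second attack exactly reverses the first and restores the original class; that is a property of the toy setup, not a claim about the behaviour of real networks reported in the paper.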
ES2018-96
An analysis of subtask-dependency in robot command interpretation with dilated CNNs
Manfred Eppe, Tayfun Alpay, Fares Abawi, Stefan Wermter
Abstract:
In this paper, we tackle sequence-to-tree transduction for language processing with neural networks that implement several subtasks, namely tokenization, semantic annotation, and tree generation. Our research question is how the individual subtasks influence the overall end-to-end learning performance in the case of a convolutional network with dilated receptive fields. We investigate a benchmark problem for robot command interpretation and conclude that dilation has a strong positive effect on character-level transduction and on generating parse trees.
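One reason dilation helps at the character level is how quickly it widens the receptive field. The sketch below assumes stride-1 1D convolutions with kernel size 3 and a doubling dilation schedule; neither setting is taken from the paper. Each layer with dilation d adds (k - 1) * d input positions to the field.

```python
def receptive_field(kernel_size, dilations):
    """Receptive field (in input positions) of stacked stride-1 1D convolutions.

    A layer with dilation d widens the field by (kernel_size - 1) * d positions.
    """
    field = 1
    for d in dilations:
        field += (kernel_size - 1) * d
    return field

# Hypothetical four-layer stacks with kernel size 3 over a character sequence.
print(receptive_field(3, [1, 1, 1, 1]))  # 9: undilated, grows linearly with depth
print(receptive_field(3, [1, 2, 4, 8]))  # 31: doubling dilation, grows exponentially
```

With the same depth and the same number of weights, the dilated stack sees 31 characters instead of 9.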
ES2018-200
Image retrieval and ranking through Deep Comparative Neural Networks
Aymen Cherif, Salim Jouili
Abstract:
Information retrieval is the task of extracting the most relevant documents from an existing collection with respect to a given query. We focus on instance-level image retrieval and approach this problem from the point of view of learning to rank. We explore using a pairwise ranking model instead of simply providing a similarity measure between a query and a candidate document. We also investigate the ability of such a model to capture high-level features that are joint query-document features and category-independent.
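A pairwise ranking model differs from a plain similarity measure in that it is asked which of two candidates should rank higher for a given query. The minimal sketch below assumes a comparator prefers(query, a, b) and turns its decisions into a ranking by counting pairwise wins, a simple aggregation chosen here and not necessarily the one used in the paper. toy_prefers is a hypothetical stand-in that relies on Euclidean distance only so the example runs; a learned comparator would instead score joint query-document features.

```python
from itertools import combinations

import numpy as np

def rank_by_pairwise_wins(query, candidates, prefers):
    """Order candidate indices by how many pairwise comparisons each one wins.

    `prefers(query, a, b)` returns True if a should rank above b for this query.
    """
    wins = [0] * len(candidates)
    for i, j in combinations(range(len(candidates)), 2):
        if prefers(query, candidates[i], candidates[j]):
            wins[i] += 1
        else:
            wins[j] += 1
    return sorted(range(len(candidates)), key=lambda k: -wins[k])

def toy_prefers(query, a, b):
    """Hypothetical comparator: prefer the candidate closer to the query."""
    return np.linalg.norm(query - a) < np.linalg.norm(query - b)

query = np.array([0.0, 0.0])
candidates = [np.array([3.0, 0.0]), np.array([0.5, 0.5]), np.array([1.0, 1.0])]
print(rank_by_pairwise_wins(query, candidates, toy_prefers))  # [1, 2, 0]
```

Counting wins needs n(n-1)/2 comparator calls for n candidates, so this kind of aggregation is best suited to re-ranking a short candidate list.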
ES2018-154
Incremental learning with deep neural networks using a test-time oracle
Alexander Gepperth, Saad Abdullah Gondal
Abstract:
We present a simple idea to avoid catastrophic forgetting when training deep neural networks (DNNs) on class-incremental tasks. In this setting, initial training is conducted on a sub-task described by a dataset $D_1$, and re-training is subsequently conducted on a sub-task described by a dataset $D_2$ that is composed of different classes. As our recent work suggests that DNNs perform very poorly on this problem, we propose a simple extension that uses an individually trained readout layer for each sub-task. While this is unproblematic for training, at test time a clustering method is used to determine to which sub-task a sample most likely belongs. Experiments on simple benchmarks derived from MNIST show the effectiveness of this method, for which a dedicated TensorFlow implementation is made available.
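A minimal sketch of the test-time routing idea, assuming that each sub-task is summarised by the centroid of its training features and that the "clustering" step is nearest-centroid assignment. The class OracleRouter, the 2D toy features and the linear readouts are all hypothetical; this is not the authors' TensorFlow implementation. Each sub-task keeps its own readout, so training on a later sub-task never overwrites the readout of an earlier one.

```python
import numpy as np

class OracleRouter:
    """Hypothetical sketch: per-sub-task readouts plus a nearest-centroid test-time oracle."""

    def __init__(self):
        self.centroids = []  # one feature centroid per sub-task
        self.readouts = []   # one (weight matrix, class ids) pair per sub-task

    def add_subtask(self, features, readout_weights, class_ids):
        # Adding a sub-task only appends; earlier readouts are left untouched.
        self.centroids.append(features.mean(axis=0))
        self.readouts.append((readout_weights, class_ids))

    def predict(self, feature):
        # "Oracle": assign the sample to the sub-task whose centroid is closest.
        dists = [np.linalg.norm(feature - c) for c in self.centroids]
        task = int(np.argmin(dists))
        # Classify with that sub-task's own readout layer.
        weights, class_ids = self.readouts[task]
        return task, class_ids[int(np.argmax(weights @ feature))]

router = OracleRouter()
# Sub-task 1 (classes 0 and 1) and sub-task 2 (classes 2 and 3) with toy 2D features.
router.add_subtask(np.array([[1.0, 0.1], [0.9, -0.1]]),
                   np.array([[0.0, 1.0], [0.0, -1.0]]), [0, 1])
router.add_subtask(np.array([[0.1, 1.0], [-0.1, 0.9]]),
                   np.array([[1.0, 0.0], [-1.0, 0.0]]), [2, 3])
print(router.predict(np.array([0.95, 0.2])))  # (0, 0)
print(router.predict(np.array([-0.2, 1.0])))  # (1, 3)
```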
ES2018-162
Image-to-Text Transduction with Spatial Self-Attention
Sebastian Springenberg, Egor Lakomkin, Cornelius Weber, Stefan Wermter
Abstract:
Attention mechanisms have been shown to improve recurrent encoder-decoder architectures in sequence-to-sequence learning scenarios. Recently, the Transformer model was proposed, which relies solely on dot-product attention and omits recurrent operations to obtain a source-target mapping. This paper shows that the concepts of self- and inter-attention can be applied effectively to an image-to-text task. The encoder applies pre-trained convolution and pooling operations followed by self-attention to obtain an image feature representation. Self-attention combines the features of image regions based on their similarity before they are made accessible to the decoder through inter-attention.
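The operation behind both self- and inter-attention is the Transformer's scaled dot-product attention, softmax(Q K^T / sqrt(d_k)) V. The sketch below applies it to a hypothetical 7x7 grid of 512-dimensional CNN region features; the random projection matrices stand in for learned ones, and d_k = 64 and the decoder state are assumptions for illustration only.

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def attention(queries, keys, values):
    """Scaled dot-product attention: softmax(Q K^T / sqrt(d_k)) V."""
    d_k = keys.shape[-1]
    weights = softmax(queries @ keys.T / np.sqrt(d_k), axis=-1)
    return weights @ values, weights

rng = np.random.default_rng(0)
d_model, d_k = 512, 64

# Hypothetical CNN output: a 7x7 feature map flattened into 49 region vectors.
regions = rng.standard_normal((49, d_model))

# Random stand-ins for learned query/key/value projections.
w_q, w_k, w_v = (rng.standard_normal((d_model, d_k)) / np.sqrt(d_model) for _ in range(3))

# Self-attention: each region becomes a similarity-weighted mix of all regions.
encoded, self_weights = attention(regions @ w_q, regions @ w_k, regions @ w_v)

# Inter-attention: a (hypothetical) decoder state queries the encoded regions.
decoder_state = rng.standard_normal((1, d_k))
context, inter_weights = attention(decoder_state, encoded, encoded)

print(encoded.shape, self_weights.shape, context.shape)  # (49, 64) (49, 49) (1, 64)
```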
ES2018-88
Hierarchical Recurrent Filtering for Fully Convolutional DenseNets
Jörg Wagner, Volker Fischer, Michael Herman, Sven Behnke
Abstract:
Generating a robust representation of the environment is a crucial ability of learning agents. Deep learning based methods have greatly improved perception systems but still fail in challenging situations. These failures are often not solvable on the basis of a single image. In this work, we present a parameter-efficient temporal filtering concept which extends an existing single-frame segmentation model to work with multiple frames. The resulting recurrent architecture temporally filters representations on all abstraction levels in a hierarchical manner, while decoupling temporal dependencies from scene representation. Using a synthetic dataset, we show the ability of our model to cope with data perturbations and highlight the importance of recurrent and hierarchical filtering.
ES2018-70
Towards cognitive automotive environment modelling: reasoning based on vector representations
Florian Mirus, Terrence C. Stewart, Jörg Conradt
Abstract:
In this paper, we propose a novel approach to knowledge representation for automotive environment modelling based on Vector Symbolic Architectures (VSAs). We build a vector representation describing structured information and relations within the current scene based on high-level object lists perceived by individual sensors. Such a representation can be applied to different tasks with few modifications. In a sample instantiation, we focus on two example tasks, namely driving context classification and simple behavior prediction, to demonstrate the general applicability of our approach. Since the representation allows an efficient implementation in Spiking Neural Networks (SNNs), we envision improving the task performance of our approach through online learning.
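VSAs encode structure with two operations: binding, which associates two vectors, and superposition (addition), which bundles several bound pairs into a single vector of the same dimensionality. The sketch below uses Holographic Reduced Representations (circular-convolution binding), one common VSA; the abstract does not say which VSA the authors use, and the dimensionality D, the vocabulary and the scene are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(42)
D = 512  # vector dimensionality (assumed)

def random_vector():
    """Random unit vector; random high-dimensional vectors are nearly orthogonal."""
    v = rng.standard_normal(D)
    return v / np.linalg.norm(v)

def bind(a, b):
    """HRR binding: circular convolution, computed via the FFT."""
    return np.fft.irfft(np.fft.rfft(a) * np.fft.rfft(b), n=D)

def inverse(a):
    """Approximate HRR inverse (involution): a_inv[i] = a[-i mod D]."""
    return np.concatenate(([a[0]], a[:0:-1]))

# Hypothetical vocabulary for a driving scene.
vocab = {name: random_vector() for name in ["CAR", "TRUCK", "LEFT", "AHEAD"]}

# Scene vector: "car on the left" + "truck ahead", bundled by addition.
scene = bind(vocab["CAR"], vocab["LEFT"]) + bind(vocab["TRUCK"], vocab["AHEAD"])

# Query "what is ahead?": unbind AHEAD, then clean up against the vocabulary.
answer = bind(scene, inverse(vocab["AHEAD"]))
best = max(vocab, key=lambda name: np.dot(answer, vocab[name]))
print(best)
```

The unbound vector is only a noisy copy of TRUCK, which is why the clean-up step against the vocabulary is needed; a larger D lowers that noise.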
ES2018-61
Inferencing based on unsupervised learning of disentangled representations
Tobias Hinz, Stefan Wermter
Abstract:
Combining Generative Adversarial Networks (GANs) with encoders that learn to encode data points has shown promising results in learning data representations in an unsupervised way. We propose a framework that combines an encoder and a generator to learn disentangled representations which encode meaningful information about the data distribution without the need for any labels. While current approaches focus mostly on the generative aspects of GANs, our framework can be used to perform inference on both real and generated data points. Experiments on several data sets show that the encoder learns interpretable, disentangled representations which encode descriptive properties and can be used to sample images that exhibit specific characteristics.
ES2018-32
Dynamic autonomous image segmentation based on Grow Cut
Alexandru-Ion Marinescu, Zoltán Bálint, Laura Dioșan, Anca Andreica
Abstract:
The main aim of this paper is to provide an enhanced approach to 2D medical image segmentation based on the Unsupervised Grow Cut algorithm, a method that requires no prior training. The paper assumes that the reader is, to some extent, familiar with cellular automata and how they function, as they form the core of this technique. The benchmarks were performed on 2D MRI images of the heart and chest cavity. Measured with standard metrics against accurate ground truth, our approach significantly increases output quality compared to classical Unsupervised Grow Cut. This increase was obtained by dynamically altering the local threshold parameter. In conclusion, our approach could become a building block of a computer-aided diagnostic system.
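For readers less familiar with the cellular automaton at the core of Grow Cut, the sketch below implements the classic seeded GrowCut update rule: each cell holds a label and a strength, and a neighbour conquers it when the neighbour's strength, attenuated by their intensity difference through g(x) = 1 - x / max_diff, exceeds the cell's own strength. This is the supervised, seed-based rule, not the Unsupervised Grow Cut variant or the dynamic local threshold studied in the paper; the function name and the toy image are illustrative.

```python
import numpy as np

def grow_cut(image, labels, strength, max_iter=100):
    """Classic seeded GrowCut cellular automaton on a 2D grid (4-neighbourhood).

    image:    2D float array of pixel intensities
    labels:   2D int array, 0 = unlabelled, > 0 = seed label
    strength: 2D float array in [0, 1], typically 1 at seeds and 0 elsewhere
    """
    labels, strength = labels.copy(), strength.copy()
    max_diff = float(image.max() - image.min()) or 1.0  # avoid division by zero
    rows, cols = image.shape
    for _ in range(max_iter):
        new_labels, new_strength = labels.copy(), strength.copy()
        changed = False
        for r in range(rows):
            for c in range(cols):
                for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                    nr, nc = r + dr, c + dc
                    if not (0 <= nr < rows and 0 <= nc < cols):
                        continue
                    # Attack force decays with intensity difference: g(x) = 1 - x / max_diff.
                    g = 1.0 - abs(image[r, c] - image[nr, nc]) / max_diff
                    attack = g * strength[nr, nc]
                    # The strongest neighbour whose attack beats the cell's strength wins.
                    if attack > new_strength[r, c]:
                        new_labels[r, c] = labels[nr, nc]
                        new_strength[r, c] = attack
                        changed = True
        labels, strength = new_labels, new_strength
        if not changed:
            break
    return labels

# Toy 1x6 "image" with an intensity edge in the middle and one seed at each end.
image = np.array([[0.0, 0.0, 0.0, 1.0, 1.0, 1.0]])
seeds = np.array([[1, 0, 0, 0, 0, 2]])
strength = (seeds > 0).astype(float)
print(grow_cut(image, seeds, strength))  # [[1 1 1 2 2 2]]
```

The intensity edge between the third and fourth pixels attenuates attacks across it to zero, so each seed's label stops at the edge.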
ES2018-169
Continuous convolutional object tracking
Peer Springstübe, Stefan Heinrich, Stefan Wermter
Abstract:
Tracking arbitrary objects is a challenging task in visual computing. A central problem is the need to adapt to the changing appearance of an object, particularly under strong transformation and occlusion. We propose a tracking framework that utilises the strengths of Convolutional Neural Networks (CNNs) to create a robust and adaptive model of the object from training data produced during tracking. An incremental update mechanism increases performance and reduces the training effort during tracking, enabling real-time use.
ES2018-155
Active Learning based on Transfer Learning Techniques for Image Classification
Daniela Onita, Adriana Birlutiu
Abstract:
In many imaging tasks, only an expert can annotate the data. Although domain experts are available, their labor is expensive, and we would like to avoid querying them whenever possible. Our goal is to use our resources as efficiently as possible for a learning task. There are various ways of handling a shortage of labelled data; such learning problems can be approached with Active and Transfer Learning techniques. Active Learning and Transfer Learning have demonstrated their efficiency and their ability to train accurate models with significantly reduced amounts of training data in many real-life applications. In this paper, we investigate the combination of Active and Transfer Learning for building an efficient image classification algorithm. The experimental results show that by combining active and transfer learning, we can learn faster, with fewer labels on a target domain, than by random selection.
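The abstract does not specify the query strategy, so the sketch below assumes entropy-based uncertainty sampling, a common active learning criterion, next to the random-selection baseline the abstract compares against. In a combined setup, the class probabilities would come from a classifier trained on features of a network pre-trained on a source domain; here they are hypothetical numbers.

```python
import numpy as np

def entropy_sampling(probs, batch_size):
    """Return indices of the unlabelled samples with the most uncertain predictions."""
    entropy = -np.sum(probs * np.log(probs + 1e-12), axis=1)  # epsilon avoids log(0)
    return np.argsort(-entropy)[:batch_size]

def random_sampling(n_unlabelled, batch_size, rng):
    """Baseline: ask the expert to label randomly chosen samples."""
    return rng.choice(n_unlabelled, size=batch_size, replace=False)

# Hypothetical predicted class probabilities for four unlabelled images.
probs = np.array([
    [0.98, 0.02],  # confident
    [0.50, 0.50],  # maximally uncertain
    [0.70, 0.30],
    [0.55, 0.45],
])
print(entropy_sampling(probs, batch_size=2))  # [1 3]
print(random_sampling(len(probs), 2, np.random.default_rng(0)))
```

Only the selected samples are sent to the expert; the classifier is then retrained on the enlarged labelled set and the selection repeats.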
ES2018-141
Near-optimal facial emotion classification using a WiSARD-based weightless system
Leopoldo Lusquino Filho, Felipe França, Priscila Lima
Abstract:
This work explores the recognition of facial expressions with a WiSARD-based n-tuple classifier. The competitiveness of this weightless neural network is tested on the specific challenge of identifying emotions from photos of faces, limited to the six basic emotions described in the seminal work of Ekman and Friesen (1977) on the identification of facial expressions. The current state of the art for this problem uses a convolutional neural network (CNN), with accuracies of 100% and 99.6% on the Cohn-Kanade and MMI datasets, respectively, while the proposed WiSARD-based architecture reaches 100% and 99.4% on the same datasets.
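WiSARD is a weightless neural network: a binary input is split into n-tuples of bits by a fixed pseudo-random mapping, each tuple addresses a RAM node, training writes the observed addresses, and a class discriminator scores an input by how many of its RAMs recognise their address. The minimal sketch below assumes the input has already been binarised (the abstract does not describe how face images are binarised); the tuple size, the 8-bit patterns and the class names are hypothetical, and RAMs are modelled as Python sets of seen addresses.

```python
import random

class WiSARD:
    """Minimal WiSARD n-tuple classifier: one discriminator (a list of RAM nodes) per class."""

    def __init__(self, input_size, tuple_size, seed=0):
        order = list(range(input_size))
        random.Random(seed).shuffle(order)  # fixed pseudo-random mapping of input bits to RAMs
        self.tuples = [order[i:i + tuple_size] for i in range(0, input_size, tuple_size)]
        self.discriminators = {}  # class label -> list of RAMs, each a set of seen addresses

    def _addresses(self, bits):
        # Each RAM is addressed by the values of the input bits in its tuple.
        return [tuple(bits[i] for i in t) for t in self.tuples]

    def train(self, bits, label):
        rams = self.discriminators.setdefault(label, [set() for _ in self.tuples])
        for ram, address in zip(rams, self._addresses(bits)):
            ram.add(address)  # "write a 1" at this address

    def score(self, bits, label):
        # Number of RAMs in the class discriminator that have seen their current address.
        rams = self.discriminators[label]
        return sum(address in ram for ram, address in zip(rams, self._addresses(bits)))

    def classify(self, bits):
        return max(self.discriminators, key=lambda label: self.score(bits, label))

# Hypothetical 8-bit binarised patterns standing in for preprocessed face images.
model = WiSARD(input_size=8, tuple_size=2)
model.train([1, 1, 1, 1, 0, 0, 0, 0], "happy")
model.train([0, 0, 0, 0, 1, 1, 1, 1], "sad")
print(model.classify([1, 1, 1, 0, 0, 0, 0, 0]))  # happy
```

No weights are adjusted: training is a single pass of memory writes, which is what makes this family of models cheap to train.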
ES2018-142
Spatial pooling as feature selection method for object recognition
Murat Kirtay, Lorenzo Vannucci, Ugo Albanese, Alessandro Ambrosano, Egidio Falotico, Cecilia Laschi
Abstract:
This paper reports our work on object recognition using the spatial pooler of Hierarchical Temporal Memory (HTM) as a feature selection method. To perform the recognition task, we employed this pooling mechanism to select features from the COIL-100 dataset. We benchmarked the results against state-of-the-art feature extraction methods using different amounts of training data (from 5% to 45%). The results indicate that the proposed method is effective for object recognition with small amounts of training data, a setting in which hand-engineered state-of-the-art feature extraction methods show limitations.
Interaction and User Integration in Machine Learning for Information Visualisation
ES2018-3
Information visualisation and machine learning: latest trends towards convergence
Benoît Frénay, Bruno Dumas, John A. Lee
Abstract:
Many machine learning (ML) methods have been developed for information visualisation (infovis). For example, PCA, MDS, t-SNE and their improvements are standard tools for reducing the dimensionality of high-dimensional datasets for visualisation purposes. However, many other means are regularly used in infovis when tackling high-dimensional datasets. Letting the user manipulate the visualisation, through selection, navigation or filtering, is one of them. Introducing manipulation of the visualisation also makes the user a core aspect of a given system. In the context of machine learning, beyond the informational and exploratory use of infovis, user feedback can, for example, be highly informative for driving the dimensionality reduction process. This special session of the ESANN conference is a follow-up to the special session on "Information Visualisation and Machine Learning: Techniques, Validation and Integration" at ESANN 2016. It aims to gather researchers who integrate users at the core of ML methods for infovis. New algorithms and frameworks are welcome, as well as experimental use cases that bring new insight into interaction and user integration in ML for infovis. This special session aims to provide practitioners from both communities with a common forum for discussing issues at the crossroads of machine learning and information visualisation.
ES2018-74
VisCoDeR: A tool for visually comparing dimensionality reduction algorithms
Rene Cutura, Stefan Holzer, Michaël Aupetit, Michael Sedlmair
Abstract:
We propose VisCoDeR, a tool that leverages comparative visualization to support learning about and analyzing different dimensionality reduction (DR) methods. VisCoDeR offers two modes. The Discover mode allows users to qualitatively compare several DR results by juxtaposing and linking the resulting scatterplots. The Explore mode allows users to analyze hundreds of differently parameterized DR results quantitatively. We present use cases showing that our approach helps to understand similarities and differences between DR algorithms.
ES2018-158
G-Rap: interactive text synthesis using recurrent neural network suggestions
Udo Schlegel, Eren Cakmak, Juri Buchmüller, Daniel Keim
Abstract:
Finding the best neural network configuration for a given goal can be challenging, especially when the output quality of a network cannot be assessed automatically. We present G-Rap, an interactive interface based on Visual Analytics principles for comparing the outputs of multiple RNNs trained on the same data. G-Rap enables an iterative result generation process that lets a user work productively while simultaneously evaluating the outputs with contextual statistics. We demonstrate the applicability of G-Rap on the example of interactive music lyrics generation.
ES2018-47
Interactive dimensionality reduction of large datasets using interpolation
Ignacio Diaz-Blanco, Daniel Pérez, Abel A. Cuadrado, Diego Garcia-Perez, Dominguez Manuel
Abstract:
In this work we present an approach to achieve interactive dimensionality reduction (iDR) on large datasets. The main idea of the paper is to use generalized regression neural network (GRNN) interpolation to obtain massive out-of-sample projections from iDR projections computed on a reduced sample of the original dataset. The proposed method achieves fluid iDR interaction on datasets between 45 and 100 times larger than with the original DR method at similar latencies, while still achieving good distance preservation. The paper includes a rank-based comparison between the proposed method and the DR method used alone for different datasets and parameter values.
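The out-of-sample step can be pictured with a GRNN, whose prediction is a Nadaraya-Watson kernel regression: each new point's 2-D position is a Gaussian-weighted average of the positions of the points that were actually embedded. This is a minimal sketch under assumptions not taken from the paper: a single bandwidth `sigma`, Euclidean distances, and the hypothetical function name `grnn_project`; it is not the authors' implementation.

```python
import math

def grnn_project(new_points, sample_points, sample_coords, sigma=1.0):
    """Project new high-dimensional points into an existing 2-D embedding.

    Hypothetical helper: GRNN prediction as Nadaraya-Watson kernel regression,
    i.e. a Gaussian-weighted average of the 2-D coordinates of embedded samples.
    """
    projections = []
    for x in new_points:
        # Squared Euclidean distance from x to every embedded sample point
        d2 = [sum((a - b) ** 2 for a, b in zip(x, s)) for s in sample_points]
        # Subtract the minimum before exponentiating: normalised weights are
        # unchanged, and at least one weight equals exp(0) = 1 (no division by 0)
        m = min(d2)
        w = [math.exp(-(d - m) / (2.0 * sigma ** 2)) for d in d2]
        total = sum(w)
        u = sum(wi * c[0] for wi, c in zip(w, sample_coords)) / total
        v = sum(wi * c[1] for wi, c in zip(w, sample_coords)) / total
        projections.append((u, v))
    return projections

# Hypothetical toy data: two embedded samples and two new points
sample = [(0.0, 0.0, 0.0), (10.0, 10.0, 10.0)]
coords = [(0.0, 0.0), (1.0, 1.0)]
print(grnn_project([(0.0, 0.0, 0.1), (5.0, 5.0, 5.0)], sample, coords))
```

The first new point lands essentially on the first sample's position and the equidistant point lands midway at (0.5, 0.5). Each new point costs one pass over the embedded sample, which is far cheaper than re-running the DR method on the full dataset; `sigma` controls how smoothly positions are interpolated.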
Nonlinear dimensionality reduction
ES2018-185
Perplexity-free t-SNE and twice Student tt-SNE
Cyril de Bodt, Dounia Mulders, Michel Verleysen, John A. Lee
Abstract:
In the fields of dimensionality reduction and data visualisation, t-SNE has recently become a very popular method. In this paper, we propose two variants of the Gaussian neighbourhoods used to characterise the neighbourhood around each high-dimensional datum in t-SNE. The first alternative is to use t distributions, just as they are already used in the low-dimensional embedding space; a variable number of degrees of freedom accounts for the intrinsic dimensionality of the data. The second variant relies on compounds of Gaussian neighbourhoods with growing widths, thereby suppressing the need for the user to adjust a single size or perplexity. In both cases, neighbourhoods with heavy tails are thus used in the data space. Experiments show that both variants are competitive, at no extra cost.
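The first variant can be sketched by replacing the Gaussian similarities in data space with Student t ones, p_{j|i} proportional to (1 + d_ij^2 / nu)^(-(nu + 1)/2). In this sketch `nu` is simply a fixed argument, whereas the abstract ties the degrees of freedom to the intrinsic dimensionality; per-point calibration and the optimisation of the embedding are omitted, and the function name is illustrative.

```python
def student_t_affinities(points, nu=1.0):
    """Conditional and joint neighbourhood probabilities with Student t kernels.

    p_{j|i} is proportional to (1 + d_ij^2 / nu) ** (-(nu + 1) / 2);
    nu = 1 gives the Cauchy kernel that t-SNE already uses in the embedding.
    """
    n = len(points)
    cond = []
    for i in range(n):
        row = []
        for j in range(n):
            if i == j:
                row.append(0.0)  # a point is not its own neighbour
                continue
            d2 = sum((a - b) ** 2 for a, b in zip(points[i], points[j]))
            row.append((1.0 + d2 / nu) ** (-(nu + 1.0) / 2.0))
        total = sum(row)
        cond.append([r / total for r in row])
    # Symmetrised joint probabilities, as in t-SNE: p_ij = (p_{j|i} + p_{i|j}) / (2n)
    joint = [[(cond[i][j] + cond[j][i]) / (2.0 * n) for j in range(n)] for i in range(n)]
    return cond, joint

cond, joint = student_t_affinities([(0.0, 0.0), (1.0, 0.0), (5.0, 0.0)], nu=2.0)
```

As `nu` grows, the kernel approaches the Gaussian exp(-d^2 / 2), so small values of `nu` are what produce the heavy-tailed neighbourhoods mentioned in the abstract.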
ES2018-173
Generative Kernel PCA
Joachim Schreurs, Johan Suykens
Abstract:
Kernel PCA has been shown to be a powerful feature extractor in many applications. Using the Restricted Kernel Machine formulation, a representation with visible and hidden units is obtained. This enables the exploration of new insights and connections between Restricted Boltzmann Machines and kernel methods. This paper explores these connections, introducing a generative kernel PCA that can be used to generate new data as well as to denoise a given training dataset. Moreover, relations with linear PCA and a pre-image reconstruction method are introduced.
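As a reference point, the sketch below shows plain kernel PCA, the feature extractor the abstract starts from; it is not the generative Restricted Kernel Machine formulation. The RBF kernel, the `gamma` value and the function name are assumptions for illustration.

```python
import numpy as np

def kernel_pca(X, n_components=2, gamma=1.0):
    """Standard (non-generative) kernel PCA with an RBF kernel.

    Returns the training-point scores and the corresponding eigenvalues.
    """
    X = np.asarray(X, dtype=float)
    sq = np.sum(X ** 2, axis=1)
    # Pairwise squared distances; clip tiny negatives caused by rounding
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * X @ X.T, 0.0)
    K = np.exp(-gamma * d2)
    n = K.shape[0]
    # Centre the kernel matrix in feature space
    one = np.full((n, n), 1.0 / n)
    Kc = K - one @ K - K @ one + one @ K @ one
    eigvals, eigvecs = np.linalg.eigh(Kc)  # eigenvalues in ascending order
    idx = np.argsort(eigvals)[::-1][:n_components]
    lam = np.clip(eigvals[idx], 0.0, None)
    # Score of point i on component k is sqrt(lambda_k) * alpha_k[i]
    return eigvecs[:, idx] * np.sqrt(lam), lam

rng = np.random.default_rng(0)
X = rng.normal(size=(12, 3))
scores, lam = kernel_pca(X, n_components=2, gamma=0.5)
```

With a linear kernel K = X X^T in place of the RBF kernel, the same computation yields ordinary PCA scores (up to sign), which is the simplest instance of the relation with linear PCA mentioned above.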
ES2018-76
Extensive assessment of Barnes-Hut t-SNE
Cyril de Bodt, Dounia Mulders, Michel Verleysen, John A. Lee
Abstract:
Stochastic Neighbor Embedding (SNE) and its variants are dimensionality reduction (DR) methods able to foil the curse of dimensionality and deliver outstanding experimental results. By mitigating the crowding problem, t-SNE became an extremely popular DR scheme. Its quadratic time complexity in the number of samples is nevertheless unaffordable for big data sets. This motivates its Barnes-Hut (BH) acceleration for large-scale use. Although the latter is faster by orders of magnitude, few studies quantify its DR quality with respect to t-SNE. Extensive comparisons between t-SNE and its BH version are conducted using neighborhood preservation-based criteria. Both methods perform very similarly, suggesting that the BH scheme is superior thanks to its reduced time complexity.
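Neighborhood preservation criteria compare K-nearest-neighbor sets before and after embedding. The sketch computes Q_NX(K), the average fraction of preserved K-ary neighborhoods, and R_NX(K), its rescaling so that a random embedding scores about 0. It assumes Euclidean distances and brute-force neighbor search; the exact criteria and their aggregation over K in the paper may differ.

```python
def knn_sets(points, k):
    """Index sets of the k nearest neighbours of every point (ties broken by index)."""
    n = len(points)
    sets = []
    for i in range(n):
        dists = sorted(
            (sum((a - b) ** 2 for a, b in zip(points[i], points[j])), j)
            for j in range(n) if j != i
        )
        sets.append({j for _, j in dists[:k]})
    return sets

def q_nx(high_dim, low_dim, k):
    """Average fraction of K-ary neighbourhoods preserved by the embedding."""
    hd, ld = knn_sets(high_dim, k), knn_sets(low_dim, k)
    overlap = sum(len(a & b) for a, b in zip(hd, ld))
    return overlap / (k * len(high_dim))

def r_nx(high_dim, low_dim, k):
    """Q_NX rescaled so that a random embedding scores about 0 (requires k < n - 1)."""
    n = len(high_dim)
    return ((n - 1) * q_nx(high_dim, low_dim, k) - k) / (n - 1 - k)

X = [(float(i), float(i % 3), float(i * i % 5)) for i in range(10)]
Y = [(float(i),) for i in range(10)]  # a toy 1-D "embedding"
print(q_nx(X, Y, 3), r_nx(X, Y, 3))
```

Evaluating the same functions on an exact t-SNE embedding and on a Barnes-Hut embedding of the same data gives a quality comparison that is independent of how each embedding was computed.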
ES2018-41
Understanding wafer patterns in semiconductor production with variational auto-encoders
Tiago Santos, Roman Kern
Abstract:
Semiconductor manufacturing processes critically depend on hundreds of highly complex process steps, which may cause critical deviations in the end product. Hence, a better understanding of wafer test data patterns, which represent stress tests conducted on devices in semiconductor material slices, may lead to an improved production process. However, the shapes and types of these wafer patterns, as well as their relation to individual process steps, are unknown. As a first step towards addressing these issues, we tailor and apply a variational auto-encoder (VAE) to wafer pattern images. We find that the VAE's generator allows for explorative wafer pattern analysis, and that its encoder provides an effective dimensionality reduction algorithm which, in a clustering application, performs better than several baselines such as t-SNE and yields interpretable clusters of wafer patterns.
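Two ingredients let a VAE's encoder act as a dimensionality reduction: the reparameterization trick and the KL regularizer towards a standard normal prior. The sketch shows both for a diagonal Gaussian encoder; the encoder and decoder networks and the paper's tailoring of the VAE to wafer images are not reproduced, and the function names are illustrative.

```python
import math
import random

def reparameterize(mu, log_var, rng=random):
    """Sample z = mu + sigma * eps with eps ~ N(0, 1), keeping z differentiable in mu, log_var."""
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0) for m, lv in zip(mu, log_var)]

def kl_to_standard_normal(mu, log_var):
    """Closed-form KL(N(mu, diag(exp(log_var))) || N(0, I)), the VAE regulariser."""
    return -0.5 * sum(1.0 + lv - m * m - math.exp(lv) for m, lv in zip(mu, log_var))

z = reparameterize([2.0, -1.0], [-40.0, -40.0], rng=random.Random(0))
```

For clustering, the encoder mean `mu` of each wafer image would typically serve as its low-dimensional coordinates, since it is deterministic.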
Classification
ES2018-53
Feature noise tuning for resource efficient Bayesian Network Classifiers
Laura Isabel Galindez Olascoaga, Jonas Vlasselaer, Wannes Meert, Marian Verhelst
Abstract:
Emerging portable applications require always-on sensing technologies to continuously monitor the environment and their users' needs. Yet the high power consumption resulting from this continuous sensing often hampers these systems' always-on functionality. In this paper we propose a hardware-aware Machine Learning scheme that exploits a device's ability to trade off the quality of its sensors against its power consumption. We introduce a technique that extends Bayesian Network classifiers with hardware description nodes that encode the probabilistic relation between sensory features and their degraded versions. We show how this allows the device's power consumption versus inference accuracy trade-off space to be tuned with fine granularity, resulting in operating points that achieve significant power savings at almost no accuracy loss. This is shown empirically on various Machine Learning benchmark datasets.
ES2018-97
Reliable Patient Classification in Case of Uncertain Class Labels Using a Cross-Entropy Approach
Andrea Villmann, Marika Kaden, Sascha Saralajew, Wieland Hermann, Thomas Villmann
Abstract:
Classification learning crucially depends on correct label information in the training data. We consider the problem where label uncertainty can neither be neglected nor be approximated by a statistical model. In the proposed approach, each training sample is equipped with a certainty value reflecting the probability that its label is correct. This information is used in the learning process of the classifier. For this purpose, we adopt the cross-entropy cost function from deep learning for a modified learning vector quantization model. We show the usefulness of this knowledge integration in medical diagnostic data analysis, using the detection of Wilson's disease as an example.
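One simple way to feed a per-sample label certainty into a cross-entropy cost is to turn it into a soft target distribution. The sketch assumes that the remaining probability mass is spread uniformly over the other classes (at least two classes) and that class probabilities come from a softmax over negative squared distances to the closest prototype of each class. Both choices are illustrative assumptions, not the paper's exact LVQ model, and no prototype update rule is shown.

```python
import math

def soft_target(label, certainty, n_classes):
    """Hypothetical encoding: `certainty` on the given label, the rest spread uniformly."""
    rest = (1.0 - certainty) / (n_classes - 1)
    return [certainty if c == label else rest for c in range(n_classes)]

def class_probabilities(x, prototypes, proto_labels, n_classes):
    """Softmax over negative squared distances to the closest prototype of each class."""
    best = [math.inf] * n_classes
    for w, c in zip(prototypes, proto_labels):
        d2 = sum((a - b) ** 2 for a, b in zip(x, w))
        best[c] = min(best[c], d2)
    m = min(best)  # shift for numerical stability; does not change the softmax
    exps = [math.exp(-(d - m)) for d in best]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(target, probs, eps=1e-12):
    """Cross-entropy between a (soft) target distribution and predicted probabilities."""
    return -sum(t * math.log(max(p, eps)) for t, p in zip(target, probs))

p = class_probabilities((0.1, 0.0), [(0.0, 0.0), (3.0, 3.0)], [0, 1], 2)
```

With this encoding, a sample whose label is uncertain is penalised less when the classifier disagrees with its label, which is the intended effect of attaching certainty values to training data.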
ES2018-108
Behaviour-based working memory capacity classification using recurrent neural networks
Mazen Salous, Felix Putze
Abstract:
A user's working memory capacity is a crucial factor for successful Human-Computer Interaction. While reliable tests for working memory capacity are available, they are time-consuming, stressful, and not well integrated into HCI applications. This paper presents a classifier based on Long Short-Term Memory networks that exploits sparse temporal dependencies in behavioural data, collected in a complex, memory-intensive interaction task, to classify working memory capacity. A cognitive user simulation is introduced to generate additional training episodes that follow the behaviour of the existing real data. We show that the classifier outperforms a linear baseline, especially for short segments of data.
ES2018-118
Structuring and Solving Multi-Criteria Decision Making Problems using Artificial Neural Networks: a smartphone recommendation case
Victor Amaral De Sousa, Anthony Simonofski, Monique Snoeck, Ivan Jureta
Abstract:
Several techniques can be used to solve multi-criteria decision making (MCDM) problems and to provide a global ranking of the alternatives considered. However, in a context with a high number of alternatives and where decision criteria relate to soft goals, the decision problem is particularly hard to solve. This paper analyzes the use of artificial neural networks to improve the relevance of the ranking of alternatives delivered by MCDM problem-solving techniques. Afterwards, a model using a combination of artificial neural networks and of the weighted sum model, a particular MCDM problem-solving technique, is built to recommend smartphones.
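The weighted sum model itself is short to state: normalise each criterion, multiply it by its weight and add the results. The sketch assumes min-max normalisation and uses hypothetical smartphone data and weights; the artificial neural network part of the paper's model, which is what improves the relevance of the ranking, is not reproduced.

```python
def weighted_sum_ranking(alternatives, weights, benefit):
    """Rank alternatives with the weighted sum model.

    alternatives: name -> list of raw criterion values
    weights: one weight per criterion
    benefit[k]: True if larger is better for criterion k, False if smaller is better
    """
    names = list(alternatives)
    scores = {name: 0.0 for name in names}
    for k, weight in enumerate(weights):
        col = [alternatives[name][k] for name in names]
        lo, hi = min(col), max(col)
        span = hi - lo
        for name in names:
            v = alternatives[name][k]
            if span == 0:
                norm = 1.0  # criterion does not discriminate between alternatives
            elif benefit[k]:
                norm = (v - lo) / span
            else:
                norm = (hi - v) / span
            scores[name] += weight * norm
    ranking = sorted(names, key=lambda nm: scores[nm], reverse=True)
    return ranking, scores

# Hypothetical phones: [price, battery mAh, camera MP]
phones = {"A": [300, 4000, 12], "B": [600, 3000, 48], "C": [450, 5000, 16]}
ranking, scores = weighted_sum_ranking(phones, [0.5, 0.3, 0.2], [False, True, True])
print(ranking)
```

With these weights the cheap phone A ranks first, C second and B last. The weights encode the user's preferences, and a fixed weighting of this kind is exactly what struggles when criteria relate to soft goals.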
ES2018-127
Efficient accuracy estimation for instance-based incremental active learning
Christian Limberg, Heiko Wersing, Helge Ritter
Abstract:
Estimating a system's accuracy is crucial for applications of incremental learning. In this paper, we introduce the Distogram Estimation (DGE) approach to estimate the accuracy of instance-based classifiers. By calculating relative distances to samples, it is possible to train an offline regression model capable of predicting the classifier's accuracy on unseen data. Our approach requires only a few supervised samples for training and can then be applied instantaneously to unseen data. We evaluate our method on five benchmark data sets and on a robot object recognition task. Our algorithm clearly outperforms two baseline methods for both random and active selection of incremental training examples.
ES2018-168
Boolean kernels for interpretable kernel machines
Mirko Polato, Fabio Aiolli
Abstract:
Most of the machine learning (ML) community's efforts in recent decades have been devoted to improving the power and prediction quality of ML models at the expense of their interpretability. However, as ML becomes more and more ubiquitous, there is an increasing demand for models that can be interpreted. To this end, in this work we propose a method for extracting explanation rules from a kernel machine. The core idea is to use kernels whose feature spaces are composed of logical propositions. On top of that, a search algorithm tries to retrieve the most relevant features/rules that can be used to explain the trained model. Experiments on several benchmark and artificial datasets show the effectiveness of the proposed approach.
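A kernel whose feature space consists of logical propositions can be illustrated with a monotone conjunctive kernel on binary vectors: each feature is a conjunction of d distinct variables, and the kernel counts the conjunctions satisfied by both inputs, which equals C(<x, z>, d). The sketch is an illustration of this kind of Boolean kernel, with illustrative function names; the paper's exact kernel family and its rule-search algorithm are not reproduced.

```python
from itertools import combinations
from math import comb

def monotone_conjunctive_kernel(x, z, d):
    """Number of conjunctions of d distinct variables that are true in both x and z."""
    shared = sum(1 for a, b in zip(x, z) if a and b)  # <x, z> for binary vectors
    return comb(shared, d)

def explicit_features(x, d):
    """Explicit feature map: one boolean feature per conjunction of d variables."""
    return {S: all(x[i] for i in S) for S in combinations(range(len(x)), d)}

x = [1, 0, 1, 1, 0, 1]
z = [1, 1, 1, 0, 0, 1]
fx, fz = explicit_features(x, 2), explicit_features(z, 2)
print(monotone_conjunctive_kernel(x, z, 2))
```

Every feature is a readable rule such as "x0 AND x2", so weights learned in this feature space can be mapped back to propositions. The closed form avoids ever enumerating the C(n, d) features.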
ES2018-181
The minimum effort maximum output principle applied to Multiple Kernel Learning
Ivano Lauriola, Mirko Polato, Fabio Aiolli
Abstract:
The Multiple Kernel Learning (MKL) paradigm aims at learning the representation from data, reducing the effort devoted to choosing the kernel's hyperparameters. Typically, the resulting kernel is obtained as the maximal-margin combination of a set of base kernels. When overly expressive base kernels are provided to an MKL algorithm, the solution it finds can overfit the data. In this paper, we propose a novel MKL algorithm that takes the expressiveness of the obtained representation into account in its objective function, so that a trade-off between large margins and simple hypothesis spaces can be found. Moreover, an empirical comparison against strong baselines and state-of-the-art MKL methods on several real-world datasets is presented, showing the merits of the proposed algorithm, especially with respect to robustness to overfitting.
ES2018-113
One-class Autoencoder approach to classify Raman spectra outliers
Katharina Hofer-Schmitz, Phuong-Ha Nguyen, Kristian Berwanger
Abstract:
We present a one-class anomaly detector for Raman spectra based on a (deep) autoencoder. Omitting preprocessing of the spectra, we use raw data of our main class to learn the reconstruction, so that many typical noise sources are automatically reduced in the output. To separate anomalies from the normal class, we combine several independent statistical metrics by majority voting. Our evaluation shows an F1-score of up to 99%.
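The decision step can be sketched as follows: compare a spectrum with its autoencoder reconstruction under several error metrics, threshold each metric, and flag an outlier when most metrics exceed their threshold. The three metrics (MSE, MAE, maximum absolute error), the quantile-based thresholds and all function names are hypothetical choices for illustration; the autoencoder itself is not shown.

```python
def reconstruction_metrics(spectrum, reconstruction):
    """Several error metrics between a spectrum and its reconstruction."""
    errors = [s - r for s, r in zip(spectrum, reconstruction)]
    n = len(errors)
    return {
        "mse": sum(e * e for e in errors) / n,
        "mae": sum(abs(e) for e in errors) / n,
        "max": max(abs(e) for e in errors),
    }

def fit_thresholds(normal_metrics, quantile=0.99):
    """Per-metric threshold: empirical quantile of the metric on normal-class data."""
    thresholds = {}
    for key in normal_metrics[0]:
        values = sorted(m[key] for m in normal_metrics)
        idx = min(len(values) - 1, int(quantile * len(values)))
        thresholds[key] = values[idx]
    return thresholds

def is_outlier(metrics, thresholds):
    """Majority vote: outlier if more than half of the metrics exceed their threshold."""
    votes = sum(1 for key, t in thresholds.items() if metrics[key] > t)
    return votes > len(thresholds) / 2

# Hypothetical normal-class reconstructions with small errors
normal = [reconstruction_metrics([1, 2, 3], [1 + e, 2 - e, 3]) for e in (0.01, 0.02, 0.03, 0.04, 0.05)]
th = fit_thresholds(normal)
```

Voting over several metrics makes the decision less sensitive to any single metric that happens to be dominated by one noisy spectral region.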
ES2018-161
Radar Based Pedestrian Detection using Support Vector Machine and the Micro Doppler Effect
Joao Victor Bruneti Severino, Alessandro Zimmer, Leandro dos Santos Coelho, Roberto Zanetti Freire
Abstract:
Motivated by alarming statistics on pedestrian fatalities and injuries in traffic accidents, this paper presents the development of a pedestrian detection method for an Advanced Driver Assistance System (ADAS). Using a 79 GHz automotive radar, a signal processing application that can identify pedestrians early in short-range situations using a Support Vector Machine (SVM) is presented and evaluated, with the aim of improving the velocity resolution for micro-Doppler feature extraction. By applying multiobjective optimization to the pre-processing, promising results in terms of velocity resolution and measuring time were obtained, improving the accuracy of the classifier.
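The tension between velocity resolution and measuring time follows from a standard FMCW radar relation: the Doppler velocity resolution equals the wavelength divided by twice the frame (observation) time. The sketch assumes a chirp-sequence FMCW radar with hypothetical chirp counts and durations; it is not the paper's processing chain.

```python
SPEED_OF_LIGHT = 299_792_458.0  # m/s

def wavelength(carrier_hz):
    """Carrier wavelength in metres."""
    return SPEED_OF_LIGHT / carrier_hz

def velocity_resolution(carrier_hz, n_chirps, chirp_time_s):
    """FMCW Doppler velocity resolution: lambda / (2 * T_frame), T_frame = n_chirps * chirp_time."""
    frame_time = n_chirps * chirp_time_s
    return wavelength(carrier_hz) / (2.0 * frame_time)

# Hypothetical chirp settings, not the paper's configuration
for n_chirps in (128, 256):
    dv = velocity_resolution(79e9, n_chirps, 50e-6)
    print(f"{n_chirps} chirps -> frame {n_chirps * 50e-6 * 1e3:.1f} ms, resolution {dv:.3f} m/s")
```

At 79 GHz the wavelength is about 3.8 mm, giving roughly 0.30 m/s resolution for a 6.4 ms frame and 0.15 m/s for 12.8 ms. Halving the resolution cell doubles the measuring time, so resolving the micro-Doppler signature of moving limbs and detecting pedestrians quickly pull in opposite directions.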
ES2018-198
Opposite neighborhood: a new method to select reference points of minimal learning machines
Madson Dias, Lucas Sousa, Ajalmar Rocha Neto, Amauri Souza Junior
Abstract:
This paper introduces a new approach to select reference points of minimal learning machines (MLMs) for classification tasks. The MLM training procedure is related to the selection of a subset of the training set, named reference points (RPs), that is used to build a mapping between the input geometric configurations and their corresponding labels. We propose a method, named opposite neighborhood (ON), that explores the Euclidean distance in input space to select RPs. Experiments were performed using UCI data sets. The proposal was able to both reduce the number of reference points and achieve competitive performance when compared to conventional approaches for selecting RPs.
ES2018-60
A neural network cost function for highly class-imbalanced data sets
David Twomey, Denise Gorse
Abstract:
We introduce a new cost function for the training of a neural network classifier in conditions of high class imbalance. This function, based on an approximate confusion matrix, represents a balance of sensitivity and specificity and is thus well suited to problems where cost functions such as the mean squared error and cross entropy are prone to overpredicting the majority class. The benefit of the new measure is shown on a set of common class-imbalanced datasets using the Matthews Correlation Coefficient as an independent scoring measure.
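A differentiable approximate confusion matrix can be obtained by summing predicted probabilities instead of counting hard decisions. The sketch assumes sigmoid outputs in [0, 1] and uses one minus the mean of sensitivity and specificity as the loss; the paper's exact combination may differ, and the function names are illustrative. The Matthews Correlation Coefficient used for scoring is computed on hard predictions.

```python
import math

def soft_confusion(y_true, p_pred):
    """Approximate confusion matrix entries from probabilities (differentiable in p)."""
    tp = sum(y * p for y, p in zip(y_true, p_pred))
    fn = sum(y * (1 - p) for y, p in zip(y_true, p_pred))
    fp = sum((1 - y) * p for y, p in zip(y_true, p_pred))
    tn = sum((1 - y) * (1 - p) for y, p in zip(y_true, p_pred))
    return tp, fn, fp, tn

def balanced_loss(y_true, p_pred, eps=1e-12):
    """1 - (sensitivity + specificity) / 2, computed on the soft confusion matrix."""
    tp, fn, fp, tn = soft_confusion(y_true, p_pred)
    sensitivity = tp / (tp + fn + eps)
    specificity = tn / (tn + fp + eps)
    return 1.0 - 0.5 * (sensitivity + specificity)

def mcc(y_true, y_pred):
    """Matthews Correlation Coefficient on hard 0/1 predictions (0 if undefined)."""
    tp = sum(1 for y, p in zip(y_true, y_pred) if y == 1 and p == 1)
    tn = sum(1 for y, p in zip(y_true, y_pred) if y == 0 and p == 0)
    fp = sum(1 for y, p in zip(y_true, y_pred) if y == 0 and p == 1)
    fn = sum(1 for y, p in zip(y_true, y_pred) if y == 1 and p == 0)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom

y = [1] + [0] * 9  # 1:9 class imbalance
all_majority = [0.0] * 10
```

On this toy set, predicting the majority class everywhere gives a mean squared error of only 0.1, but a loss of 0.5 here and an MCC of 0, which shows why a cost of this kind discourages overpredicting the majority class.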
ES2018-78
Self-learning assembly systems during ramp-up
Ralf Schönherr, Maximilian Knaller, Markus Philipp
Abstract:
Achieving the targeted production volume during the ramp-up phase plays an important role in the economic success of manufacturing companies. However, ramp-up phases are usually characterized by a high degree of uncertainty, as many situations arise for the first time. These unexpected events lead to errors and faults in automated processes, which cause losses in the overall production volume. This paper proposes an architecture for assembly systems that predicts and avoids faults in the assembly process during ramp-up through self-learning. Different algorithms for the self-learning components are evaluated. Using real production data sets, neural networks were identified as the best solution.
ES2018-87
Feasibility based Large Margin Nearest Neighbor metric learning
Babak Hosseini, Barbara Hammer
Abstract:
Large margin nearest neighbor (LMNN) is a metric learner that optimizes the performance of the popular kNN classifier. However, its resulting metric relies on pre-selected target neighbors. In this paper, we address the feasibility of LMNN's optimization constraints regarding these target points and introduce a mathematical measure to evaluate the size of the feasible region of the optimization problem. We enhance the optimization framework of LMNN with a weighting scheme that favors data triplets yielding a larger feasible region. This increases the chances of obtaining a good metric as the solution of LMNN's problem. We evaluate the performance of the resulting feasibility-based LMNN algorithm on synthetic and real datasets. The empirical results show improved accuracy over regular LMNN for different types of datasets.
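For reference, the standard LMNN objective combines a pull term over target-neighbor pairs and a hinge push term over impostor triplets, under a metric parameterized as d(a, b) = ||L(a - b)||^2. In this sketch the `weights` argument is a hypothetical hook showing where a per-triplet weighting, such as the feasibility-based one, could enter; how the paper computes those weights is not reproduced, and no optimization of L is shown.

```python
def sq_dist(L, a, b):
    """Squared distance under the linear map L: (a - b)^T L^T L (a - b)."""
    diff = [x - y for x, y in zip(a, b)]
    proj = [sum(l_k * d_k for l_k, d_k in zip(row, diff)) for row in L]
    return sum(p * p for p in proj)

def lmnn_loss(L, X, targets, impostors, mu=0.5, weights=None):
    """Standard LMNN loss with an optional per-triplet weight.

    targets: list of (i, j) target-neighbour pairs
    impostors: list of (i, j, l) triplets where l has a different label from i
    weights: hypothetical per-triplet weights (defaults to 1 for every triplet)
    """
    pull = sum(sq_dist(L, X[i], X[j]) for i, j in targets)
    push = 0.0
    for t, (i, j, l) in enumerate(impostors):
        w = 1.0 if weights is None else weights[t]
        # Hinge: the impostor should be at least one unit farther than the target neighbour
        hinge = 1.0 + sq_dist(L, X[i], X[j]) - sq_dist(L, X[i], X[l])
        push += w * max(0.0, hinge)
    return (1.0 - mu) * pull + mu * push

identity = [[1.0, 0.0], [0.0, 1.0]]
X_far = [(0.0, 0.0), (1.0, 0.0), (0.0, 3.0)]    # impostor outside the margin
X_close = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]  # impostor violates the margin
```

Only triplets whose impostor violates the margin contribute to the push term, so a weighting scheme of this kind changes which violated constraints the optimizer prioritizes.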
ES2018-101
Combining latent tree modeling with a random forest-based approach, for genetic association studies
Christine Sinoquet, Kamel Mekhnacha
Abstract:
Association studies have been widely used to discover the genetic basis of complex phenotypes. However, standard univariate tests and their alternatives do not fully exploit the dependencies between genetic markers. In this paper, we propose Sylva, a hybrid approach in which a random forest framework based on embedded trees benefits from a probabilistic graphical model. The latter is a collection of tree-shaped Bayesian networks with latent variables. We extensively compared Sylva and T-Trees on simulated and real data. Sylva outperforms the already highly performant T-Trees in the vast majority of cases.
ES2018-63
Graph based neural networks for automatic classification of multiple sclerosis clinical courses
Francesco Calimeri, Aldo Marzullo, Claudio Stamile, Giorgio Terracina
Abstract:
Automatic classification of biomedical images has become an important field of research in recent years. Advances in image acquisition and processing techniques, along with the success of novel deep learning methods and architectures, have provided considerable support in the search for better biomarkers to characterize several diseases, brain diseases in particular. In this work, we propose a novel neural network approach that is applied to graphs generated from MRI data in order to predict the clinical status of a patient. Results show high classification performance and open interesting perspectives in the field.
Regression and recommendation systems
ES2018-72
Extreme Minimal Learning Machine
Tommi Kärkkäinen
Abstract:
Extreme Learning Machine (ELM) and Minimal Learning Machine (MLM) are nonlinear and scalable machine learning techniques with a randomly generated basis. Both techniques share a step in which a matrix of weights for the linear combination of the basis is recovered. In MLM, the kernel in this step corresponds to distance calculations between the training data and a set of reference points, whereas in ELM a transformation with a sigmoidal activation function is most commonly used. MLM then needs an additional interpolation step to estimate the actual output from the distance regression. A natural combination of the two techniques is proposed here: using the distance-based kernel characteristic of MLM in ELM. The experimental results show the promising potential of the proposed technique.
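A minimal sketch of the combination described above, under stated assumptions: the reference points are sampled from the training inputs, the hidden-layer features are plain Euclidean distances to those points (the MLM-style kernel), and the output weights are obtained, as in ELM, by a ridge-regularized least-squares solve. The function names and the small `ridge` term are illustrative choices, not taken from the paper.

```python
import math
import random


def solve(A, b):
    """Solve the square system A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x


def fit_distance_elm(X, y, n_ref, ridge=1e-8, seed=0):
    """ELM whose random hidden layer is replaced by distances to reference points."""
    refs = random.Random(seed).sample(X, n_ref)
    H = [[math.dist(x, r) for r in refs] for x in X]  # n x m distance features
    # Normal equations of ridge regression: (H^T H + ridge * I) w = H^T y
    A = [[sum(h[a] * h[b] for h in H) + (ridge if a == b else 0.0)
          for b in range(n_ref)] for a in range(n_ref)]
    rhs = [sum(h[a] * yi for h, yi in zip(H, y)) for a in range(n_ref)]
    return refs, solve(A, rhs)


def predict(model, x):
    """Linear combination of the distances from x to the reference points."""
    refs, w = model
    return sum(wk * math.dist(x, r) for wk, r in zip(w, refs))
```

Unlike MLM, no interpolation step is needed here: the output is read directly from the linear combination of distances.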
ES2018-182
Learning with a Fisher surrogate loss in a small data regime
Moussab Djerrab, Alexandre Garcia
Abstract:
We introduce a novel framework, Output Fisher Embedding Regression (OFER), that uses a Fisher vector representation of the outputs and makes predictions by solving an appropriate pre-image problem. OFER takes advantage of the implicit structure of the marginal probability distribution of the output to improve prediction performance. Although the proposed approach is general and versatile, we focus on the Gaussian mixture model for modelling the output data and design a closed-form solution for the corresponding pre-image problem. Numerical results on a drug activity prediction task and on a multi-class classification problem cast as a semantic regression problem show the relevance of the approach in the small data regime.
ES2018-94
Fast Power system security analysis with Guided Dropout
Benjamin Donnot, Isabelle Guyon, Antoine Marot, Marc Schoenauer, Patrick Panciatici
Abstract:
We propose a new method to efficiently compute load-flows (the steady state of the power grid for given productions, consumptions and grid topology), replacing conventional simulators based on differential equation solvers. We use a deep feed-forward neural network trained with load-flows precomputed by simulation. Our architecture allows a network to be trained on so-called "n-1" problems, in which load-flows are evaluated for every possible line disconnection, and then to generalize to "n-2" problems without retraining (a clear advantage given the combinatorial nature of the problem). To that end, we developed a technique bearing similarity to "dropout", which we named "guided dropout".
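The abstract does not spell out how guided dropout works, so the following is only a hypothetical sketch of one reading: a few hidden units are always active, each power line is assigned its own block of further units, and a block is switched on only when its line is disconnected. The block sizes, the ReLU layer and all names are assumptions.

```python
def guided_mask(disconnected, n_lines, units_per_line, n_always_on=0):
    """Binary mask over hidden units (hypothetical layout): n_always_on shared units,
    then one block of units_per_line units per power line, active only if that
    line is in the set of disconnected lines."""
    mask = [1.0] * n_always_on
    for line in range(n_lines):
        on = 1.0 if line in disconnected else 0.0
        mask.extend([on] * units_per_line)
    return mask


def hidden_layer(x, W, b, mask):
    """Dense ReLU layer whose outputs are gated by the topology-dependent mask."""
    pre = [sum(wi * xi for wi, xi in zip(row, x)) + bi for row, bi in zip(W, b)]
    return [max(0.0, p) * m for p, m in zip(pre, mask)]
```

Under this reading, the mask for two disconnected lines is simply the element-wise union of the two single-line masks, which is what would let a network trained on n-1 cases be queried on n-2 cases without retraining.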
ES2018-51
Neural Networks for Implicit Feedback Datasets
Josef Feigl, Martin Bogdan
Abstract:
Most users interact with products only through implicit feedback, such as clicks or purchases, rather than through explicit user-provided information like product ratings. Learning to rank products according to individual preferences using only this implicit feedback can help make useful recommendations. In this paper, a neural network architecture is presented that solves collaborative filtering problems for personalized ranking on implicit feedback datasets. It is shown how a layer of constant weights forces the network to learn pairwise rankings. Additionally, similarities between the network and a matrix factorization model trained with Bayesian Personalized Ranking are proven. The experiments indicate state-of-the-art performance for the task of personalized ranking.
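For reference, the matrix factorization model trained with Bayesian Personalized Ranking (BPR), to which the abstract relates the network, can be sketched as below. The score difference x_ui - x_uj between an observed item i and an unobserved item j is a fixed +1/-1 combination of two item scores, which is one possible way to read the "layer of constant weights"; that reading, the learning rate and the function names are assumptions, not the paper's architecture.

```python
import math


def score(P, Q, u, i):
    """Predicted preference of user u for item i: dot product of latent factors."""
    return sum(p * q for p, q in zip(P[u], Q[i]))


def bpr_loss(P, Q, u, i, j):
    """-ln sigma(x_ui - x_uj) for an observed item i and an unobserved item j."""
    x_uij = score(P, Q, u, i) - score(P, Q, u, j)
    return math.log1p(math.exp(-x_uij))


def bpr_step(P, Q, u, i, j, lr=0.1, reg=0.0):
    """One stochastic gradient step on the BPR loss for the triple (u, i, j).
    P (user factors) and Q (item factors) are lists of lists, updated in place."""
    x_uij = score(P, Q, u, i) - score(P, Q, u, j)
    g = 1.0 / (1.0 + math.exp(x_uij))  # sigma(-x_uij) = d ln sigma(x) / dx
    pu, qi, qj = P[u][:], Q[i][:], Q[j][:]
    P[u] = [p + lr * (g * (a - b) - reg * p) for p, a, b in zip(pu, qi, qj)]
    Q[i] = [a + lr * (g * p - reg * a) for a, p in zip(qi, pu)]
    Q[j] = [b + lr * (-g * p - reg * b) for b, p in zip(qj, pu)]
```

Each step raises the score of the observed item relative to the unobserved one for that user, which is the pairwise ranking objective BPR optimizes.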
ES2018-152
Regularize and explicit collaborative filtering with textual attention
Charles-Emmanuel Dias, Vincent Guigue, Patrick Gallinari
Abstract:
Recommendation can be seen as tantamount to blind sentiment analysis, i.e. sentiment prediction without text data. In that sense, we aim to encode priors on users and items while reading their reviews, using a deep architecture with personalized attention modeling. Following this idea, we build a hybrid hierarchical sentiment classifier, which is then used as a recommender system at inference time.
ES2018-183
Adaptive random forests for data stream regression
Heitor Murilo Gomes, Jean Paul Barddal, Luis Eduardo Boiko, Albert Bifet
Abstract:
Data stream mining is a hot topic in the machine learning community; it tackles the problem of learning and updating predictive models as new data becomes available over time. Even though several new methods are proposed every year, most focus on the classification task and overlook the regression task. In this paper, we propose an adaptation of the Adaptive Random Forest, named ARF-Reg, so that it can handle regression tasks. ARF-Reg is empirically evaluated and compared to existing work in the area, highlighting its applicability in different data stream scenarios.
ES2018-33
Cache-efficient Gradient Descent Algorithm
Imen Chakroun, Tom Vander Aa, Thomas Ashby
Abstract:
Best practice when using Stochastic Gradient Descent (SGD) suggests randomising the order of training points and streaming the whole set through the learner. This results in extremely low temporal locality of access to the training set and thus makes minimal use of the small, fast layers of memory in an HPC memory hierarchy. While mini-batch SGD is often used to control the noise on the gradient and to make convergence smoother and easier to identify than with SGD, it suffers from the same extremely low temporal locality. In this paper we introduce Sliding Window SGD (SW-SGD), which exploits temporal locality of training point access in an attempt to combine the advantages of SGD and mini-batch SGD by leveraging HPC memory hierarchies. We give initial results on a classification and a regression problem, using the MNIST and CHEMBL datasets, showing that memory hierarchies can be used to improve the performance of gradient algorithms.
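A minimal sketch of the sliding-window idea for least-squares linear regression, assuming that each update averages the gradient of the newly streamed point with those of the most recently visited points still held in a fixed-size window (standing in for data resident in fast cache). The window size, learning rate and linear model are illustrative assumptions; the paper's exact update rule and cache management may differ.

```python
import random
from collections import deque


def sw_sgd_linreg(X, y, window=4, lr=0.05, epochs=50, seed=0):
    """Least-squares linear regression trained with a sliding-window SGD variant.

    Each step streams one new (shuffled) training point into a bounded window;
    the update uses the average gradient over all points currently in the window,
    so recently touched points (assumed to still be in fast memory) are reused.
    """
    rng = random.Random(seed)
    d = len(X[0])
    w = [0.0] * d
    recent = deque(maxlen=window)  # the oldest index is evicted automatically
    order = list(range(len(X)))
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            recent.append(i)
            grad = [0.0] * d
            for j in recent:
                err = sum(wk * xk for wk, xk in zip(w, X[j])) - y[j]
                for k in range(d):
                    grad[k] += err * X[j][k]
            w = [wk - lr * g / len(recent) for wk, g in zip(w, grad)]
    return w
```

With `window=1` this reduces to plain SGD; larger windows give mini-batch-like gradient averaging while only one new point per step has to be fetched from slower memory.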
ES2018-73
Sensitivity analysis for predictive uncertainty
Stefan Depeweg, José Miguel Hernández-Lobato, Steffen Udluft, Thomas Runkler
Abstract:
We derive a novel sensitivity analysis of input variables for predictive epistemic and aleatoric uncertainty. We use Bayesian neural networks with latent variables as a model class and illustrate the usefulness of our sensitivity analysis on real-world datasets. Our method increases the interpretability of complex black-box probabilistic models.
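For Bayesian neural networks with latent variables, predictive uncertainty is commonly split with the law of total variance: the variance of the predictive means across posterior weight samples is the epistemic part, and the mean of the predicted noise variances is the aleatoric part. The sketch below computes that split and a central finite-difference sensitivity of each part with respect to one input. Representing the posterior by a list of sampled predictor functions returning (mean, variance), and using finite differences instead of automatic differentiation, are simplifying assumptions rather than the paper's procedure.

```python
import statistics


def decompose(preds):
    """Split predictive variance via the law of total variance.
    preds: list of (mean, noise_variance) pairs, one per posterior weight sample."""
    means = [m for m, _ in preds]
    epistemic = statistics.pvariance(means)             # Var_W[E[y | W, x]]
    aleatoric = statistics.fmean(v for _, v in preds)   # E_W[Var[y | W, x]]
    return epistemic, aleatoric


def sensitivity(samples, x, d, eps=1e-4):
    """Central finite-difference derivatives of (epistemic, aleatoric) w.r.t. x[d].
    samples: callables mapping an input list to a (mean, noise_variance) pair."""
    x_plus, x_minus = list(x), list(x)
    x_plus[d] += eps
    x_minus[d] -= eps
    ep_p, al_p = decompose([f(x_plus) for f in samples])
    ep_m, al_m = decompose([f(x_minus) for f in samples])
    return (ep_p - ep_m) / (2 * eps), (al_p - al_m) / (2 * eps)
```

An input with a large epistemic sensitivity is one where the model's uncertainty about its own parameters changes quickly, while a large aleatoric sensitivity points to inputs that drive the predicted noise level.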
ES2018-81
Revisiting FISTA for Lasso: Acceleration Strategies Over The Regularization Path
Alejandro Catalina, Carlos M. Alaíz, José R. Dorronsoro
Abstract:
In this work we revisit the FISTA algorithm for Lasso, showing that recent acceleration techniques may greatly improve its basic version, resulting in a much more competitive procedure. We study the contribution of the different improvement strategies, showing experimentally that the final version becomes much faster than the standard one.
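For readers unfamiliar with the baseline being accelerated, a minimal plain-Python sketch of standard FISTA for the Lasso objective $\frac{1}{2}\|Ax-b\|_2^2 + \lambda\|x\|_1$ follows, together with warm-starting along a decreasing sequence of $\lambda$ values, a common way to traverse the regularization path. The step size uses $\|A\|_F^2$ as a cheap upper bound on the Lipschitz constant $\|A\|_2^2$; this, the fixed iteration count and the warm-start loop are illustrative assumptions, not the specific acceleration strategies studied in the paper.

```python
import math


def soft_threshold(v, t):
    """Proximal operator of t * ||.||_1, applied element-wise."""
    return [math.copysign(max(abs(vi) - t, 0.0), vi) for vi in v]


def matvec(A, x):
    """Compute A x."""
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]


def rmatvec(A, r):
    """Compute A^T r."""
    return [sum(A[i][j] * r[i] for i in range(len(A))) for j in range(len(A[0]))]


def fista_lasso(A, b, lam, x0=None, n_iter=500):
    """Minimise 0.5 * ||A x - b||^2 + lam * ||x||_1 with standard FISTA."""
    lip = sum(a * a for row in A for a in row)  # ||A||_F^2 >= ||A||_2^2
    x = list(x0) if x0 is not None else [0.0] * len(A[0])
    y, t = x[:], 1.0
    for _ in range(n_iter):
        residual = [ai - bi for ai, bi in zip(matvec(A, y), b)]
        grad = rmatvec(A, residual)
        x_new = soft_threshold([yi - gi / lip for yi, gi in zip(y, grad)], lam / lip)
        t_new = (1.0 + math.sqrt(1.0 + 4.0 * t * t)) / 2.0
        y = [xn + (t - 1.0) / t_new * (xn - xo) for xn, xo in zip(x_new, x)]
        x, t = x_new, t_new
    return x


def lasso_path(A, b, lams, n_iter=500):
    """Solve for decreasing lambdas, warm-starting each solve from the previous one."""
    path, x = [], None
    for lam in sorted(lams, reverse=True):
        x = fista_lasso(A, b, lam, x0=x, n_iter=n_iter)
        path.append((lam, x))
    return path
```

Starting from the largest $\lambda$, where the solution is sparse (often all zeros), each warm start begins close to the next solution, which is why path-wise solvers usually order the penalties this way.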
Shallow and Deep models for transfer learning and domain adaptation
ES2018-5
Shallow and Deep Models for Domain Adaptation problems
Siamak Mehrkanoon, Matthew Blaschko, Johan Suykens
Abstract:
Manual labeling of sufficient training data for diverse application domains is a costly, laborious and often prohibitive task. Therefore, designing models that can leverage rich labeled data in one domain and be applied to a different but related domain is highly desirable. In particular, domain adaptation or transfer learning algorithms seek to generalize a model trained in a source domain to a new target domain. Recent years have witnessed increasing interest in these types of models due to their practical importance in real-life applications. In this paper we provide a brief overview of recent techniques with both shallow and deep architectures for domain adaptation.
ES2018-145
Unsupervised domain adaptation of deep object detectors
Debjeet Majumdar, Vinay Namboodiri
Abstract:
Domain adaptation has been studied and adopted in computer vision. Recently, with the advent of deep learning, a number of techniques have been proposed for deep-learning-based domain adaptation. However, these methods have been used to adapt object classification models. In this paper, we address domain adaptation for object detection, which is more commonly used. We adapt deep domain adaptation techniques to the Faster R-CNN framework, namely recent techniques based on Gradient Reversal and on Maximum Mean Discrepancy (MMD) reduction. Among them, we show that the multi-kernel MMD (MK-MMD) based method, when used appropriately, provides the best results. We evaluate our model in standard real-world settings, using Pascal VOC as source and MS-COCO as target, and show a gain of 2.5 mAP at an IoU of 0.5 over a model trained on the source only. We show that this improvement is statistically significant.
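The statistic behind MMD-based adaptation can be illustrated with a minimal sketch: a biased estimate of the squared MMD between source and target feature sets, using a multi-kernel (MK-MMD style) weighted sum of Gaussian kernels. The bandwidths and the equal kernel weights are assumptions; in the paper, MMD reduction is applied within the Faster R-CNN framework, which this sketch does not attempt to reproduce.

```python
import math


def gaussian_kernel(a, b, sigma):
    """Gaussian (RBF) kernel with bandwidth sigma."""
    sq = sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    return math.exp(-sq / (2.0 * sigma * sigma))


def mk_mmd2(source, target, sigmas=(0.5, 1.0, 2.0), betas=None):
    """Biased estimate of squared MMD with kernel k = sum_u beta_u * k_sigma_u."""
    if betas is None:
        betas = [1.0 / len(sigmas)] * len(sigmas)  # assumed equal kernel weights

    def k(a, b):
        return sum(beta * gaussian_kernel(a, b, s) for beta, s in zip(betas, sigmas))

    def mean_k(A, B):
        return sum(k(a, b) for a in A for b in B) / (len(A) * len(B))

    return mean_k(source, source) + mean_k(target, target) - 2.0 * mean_k(source, target)
```

The estimate is zero when the two feature sets coincide and grows as the distributions drift apart; minimizing it on intermediate features is what pushes source and target representations together.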
Machine Learning and Data Analysis in Astroinformatics
ES2018-2
Machine learning and data analysis in astroinformatics
Michael Biehl, Kerstin Bunte, Giuseppe Longo, Peter Tino
Abstract:
Astroinformatics is a new discipline at the crossroads of astronomy, advanced statistics and computer science. With next-generation sky surveys, space missions and modern instrumentation, astronomy will enter the Petascale regime, raising the demand for advanced computer science techniques with hardware and software solutions for data management, analysis, efficient automation and knowledge discovery. This tutorial reviews important developments in astroinformatics over the past years and discusses some relevant research questions and concrete problems. The contribution ends with a short review of the special session papers in these proceedings, as well as perspectives and challenges for the near future.
ES2018-125
Anomaly detection in star light curves using hierarchical Gaussian processes
Haoyan Chen, Tom Diethe, Niall Twomey, Peter Flach
Abstract:
Here we examine astronomical time series called light-curve data, which represent the brightness of celestial objects over time. We focus specifically on the task of finding anomalies in three sets of light curves of periodic variable stars. We employ a hierarchical Gaussian process to create a general and stable model of time series for anomaly detection, and apply this approach to the light curve problem. Hierarchical Gaussian processes require only a few more parameters than Gaussian processes and incur negligible additional complexity. Moreover, the additional parameters are objectively optimised in a principled probabilistic framework. Experimentally, our approach outperforms several baselines and highlights several anomalous light curves in the datasets investigated.
ES2018-130
Latent representations of transient candidates from an astronomical image difference pipeline using Variational Autoencoders
Pablo Huijse, Nicolas Astorga, Pablo Estevez, Giuliano Pignata
Abstract:
The Chilean Automatic Supernovae SEarch (CHASE) is a survey designed to detect supernovae at an early stage. In this paper we explore deep autoencoders to obtain a compressed latent space for a large transient candidate database from the CHASE image difference pipeline. Compared to conventional methods, the latent variables obtained with variational autoencoders preserve more information and are more discriminative towards real astronomical transients.
ES2018-86
Globular Cluster Detection in the Gaia Survey
Mohammad Mohammadi, Reynier Peletier, Frank-Michael Schleif, Nicolai Petkov, Kerstin Bunte
Abstract:
Existing algorithms for the detection of stellar structures in the Milky Way are most efficient when full phase-space and color information is available. This, however, is often not the case. The Gaia satellite has recently been surveying the whole sky and provides highly accurate positions for more than one billion sources. In this contribution, we propose two independent strategies to find globular clusters in this database, based on magnitude distributions only: one based on nearest-neighbor retrieval and the other on anomaly detection. Both techniques consistently find the known globular clusters within our test frame, as well as additional candidates for further investigation.
ES2018-100
Stellar formation rates in galaxies using machine learning models
Michele Delli Veneri, Stefano Cavuoti, Massimo Brescia, Giuseppe Riccio, Giuseppe Longo
Abstract:
Global Stellar Formation Rates (SFRs) are crucial to constrain theories of galaxy formation and evolution. SFRs are usually estimated via spectroscopic observations, which require too much telescope time and therefore cannot match the needs of modern precision cosmology. We therefore propose a novel method to estimate SFRs for large samples of galaxies using a variety of supervised ML models.
ES2018-115
Prototype-based analysis of GAMA galaxy catalogue data
Aleke Nolte, Lingyu Wang, Michael Biehl
Abstract:
We present a prototype-based machine learning analysis of labeled galaxy catalogue data containing parameters from the Galaxy and Mass Assembly (GAMA) survey. Using both an unsupervised and a supervised method, the Self-Organizing Map and Generalized Relevance Matrix Learning Vector Quantization, we find that the data does not fully support the popular visual-inspection-based galaxy classification scheme employed to categorize the galaxies. In particular, only one class, the Little Blue Spheroids, is consistently separable from the other classes. In a proof-of-concept experiment, we present the galaxy parameters that are most discriminative for this class.
Deep Learning in Bioinformatics and Medicine
ES2018-1
Bioinformatics and medicine in the era of deep learning
Davide Bacciu, Paulo Lisboa, José D. Martín, Ruxandra Stoean, Alfredo Vellido
Abstract:
Many of the current scientific advances in the life sciences have their origin in the intensive use of data for knowledge discovery. In no area is this as clear as in bioinformatics, driven by technological breakthroughs in data acquisition. It has been argued that bioinformatics could quickly become the field of research generating the largest data repositories, beating other data-intensive areas such as high-energy physics or astroinformatics. Over the last decade, deep learning has become a disruptive advance in machine learning, giving new life to the long-standing connectionist paradigm in artificial intelligence. Deep learning methods are ideally suited to large-scale data and, therefore, they should also be well suited to knowledge discovery in bioinformatics and biomedicine at large. In this brief paper, we review key aspects of the application of deep learning in bioinformatics and medicine, drawing from the themes covered by the contributions to an ESANN 2018 special session devoted to this topic.
ES2018-128
Controlling biological neural networks with deep reinforcement learning
Jan Wülfing, Sreedhar Saseendran Kumar, Joschka Boedecker, Martin Riedmiller, Ulrich Egert
Abstract:
Targeted interaction with networks in the brain is of immense therapeutic relevance. The highly dynamic nature of neuronal networks, and the changes they undergo as diseases progress, create an urgent need for closed-loop control. Without adequate mathematical models of such complex networks, however, it remains unclear how tractable control problems can be formulated for neurobiological systems. Reinforcement learning (RL) could be a promising tool to address such challenges. Nevertheless, RL methods have rarely been applied to live, plastic neural networks. This study demonstrates that RL methods could help control response properties of biological neural networks with little prior knowledge of their complex dynamics.
ES2018-14
Learning compressed representations of blood samples time series with missing data
Filippo Maria Bianchi, Karl Øyvind Mikalsen, Robert Jenssen
Abstract:
Clinical measurements collected over time are naturally represented as multivariate time series (MTS), which often contain missing data. An autoencoder can learn low-dimensional vectorial representations of MTS that preserve important data characteristics, but it cannot deal explicitly with missing data. In this work, we propose a new framework that combines an autoencoder with the Time series Cluster Kernel (TCK), a kernel that accounts for missingness patterns in MTS. Via kernel alignment, we incorporate TCK into the autoencoder to improve the learned representations in the presence of missing data. We consider a classification problem of MTS with missing values, representing blood samples of patients with surgical site infection. Compared with a standard autoencoder, our approach learns low-dimensional representations that can be classified more accurately.
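Kernel alignment scores how similar two kernel matrices are. It can therefore serve as an extra training term that pulls the autoencoder's code space toward a kernel that already handles missing values. What follows is a minimal sketch. It assumes the common normalized Frobenius form of alignment and a linear kernel on the codes. `kernel_alignment` and `alignment_penalty` are hypothetical names, and any precomputed positive semidefinite matrix stands in for the TCK matrix. The exact loss used in the paper may differ.

```python
import numpy as np


def kernel_alignment(k1: np.ndarray, k2: np.ndarray) -> float:
    """Normalized Frobenius inner product <K1, K2>_F / (||K1||_F * ||K2||_F)."""
    numerator = np.sum(k1 * k2)
    denominator = np.linalg.norm(k1, "fro") * np.linalg.norm(k2, "fro")
    return float(numerator / denominator)


def alignment_penalty(codes: np.ndarray, prior_kernel: np.ndarray) -> float:
    """Equals 0 when the kernel of the codes is perfectly aligned with prior_kernel.

    codes: one row per time series (the autoencoder bottleneck output).
    prior_kernel: stand-in for a precomputed TCK matrix (hypothetical here).
    """
    code_kernel = codes @ codes.T  # linear kernel between the learned codes
    return 1.0 - kernel_alignment(code_kernel, prior_kernel)


rng = np.random.default_rng(0)
codes = rng.normal(size=(6, 3))
prior = codes @ codes.T  # a prior kernel that the codes already match exactly
print(alignment_penalty(codes, prior))  # close to 0, up to rounding
```

During training, such a penalty would be added to the reconstruction loss with a weighting coefficient and minimized jointly. The weighting scheme is an assumption here.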
ES2018-59
Sleep staging with deep learning: a convolutional model
Isaac Fernández-Varela, Dimitrios Athanasakis, Samuel Parsons, Elena Hernández-Pereira, Vicente Moret-Bonillo
Abstract:
Sleep staging is a crucial task in sleep studies that involves the analysis of multiple signals, which makes it tedious and complex. Even for a trained expert, it can take several hours to annotate the signals recorded from a patient's sleep during a single night. Several automatic methods have been developed to solve this problem, although most of them rely on hand-engineered features. To address the inherent limitations of this approach, in this work we explore solving the problem with a deep learning network that learns the relevant features directly from the signals. In particular, we propose a convolutional network that outperforms previous methods, achieving an average precision of 0.91, recall of 0.90, and F1 score of 0.90.
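For multi-class tasks such as sleep staging, averaged precision, recall and F1 are usually computed per class and then averaged. The abstract does not state which averaging was used, so the sketch below assumes unweighted (macro) averaging. `macro_scores` is a hypothetical helper.

A macro F1 is in general not equal to 2PR/(P + R) computed from the averaged P and R. Even so, the reported figures are consistent: 2 × 0.91 × 0.90 / 1.81 ≈ 0.905.

```python
from collections import Counter


def macro_scores(y_true, y_pred):
    """Return macro-averaged (precision, recall, F1) over the classes present in y_true."""
    classes = sorted(set(y_true))
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # predicted class p was wrong
            fn[t] += 1  # true class t was missed
    precisions, recalls, f1s = [], [], []
    for c in classes:
        prec = tp[c] / (tp[c] + fp[c]) if tp[c] + fp[c] else 0.0
        rec = tp[c] / (tp[c] + fn[c]) if tp[c] + fn[c] else 0.0
        f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
        precisions.append(prec)
        recalls.append(rec)
        f1s.append(f1)
    n = len(classes)
    return sum(precisions) / n, sum(recalls) / n, sum(f1s) / n


# Hypothetical sleep-stage labels for four epochs.
print(macro_scores([0, 0, 1, 1], [0, 1, 1, 1]))
```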
ES2018-82
Interpreting deep learning models for ordinal problems
José P. Amorim, Inês Domingues, Pedro Henriques Abreu, João Santos
Abstract:
Machine learning algorithms have evolved by trading simplicity and interpretability for accuracy, which hinders their adoption in critical tasks such as healthcare. Progress can be made by improving the interpretability of complex models while preserving performance. This work introduces an extension of interpretable mimic learning, which teaches interpretable models to mimic the predictions of complex deep neural networks, not only on binary problems but also in ordinal settings. The results show that the mimic models achieve performance comparable to that of deep neural network models, with the advantage of being interpretable.
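Mimic learning trains an interpretable student on the teacher's soft outputs rather than on the hard labels. What follows is a minimal sketch for the ordinal case. It makes two assumptions: the teacher's class probabilities are summarized as an expected rank, and the student is an ordinary least-squares linear model. Both choices and all names are illustrative, and the student models used in the paper may differ.

```python
import numpy as np


def expected_rank(probs: np.ndarray) -> np.ndarray:
    """Collapse teacher probabilities over ordered classes 0..K-1 into a soft rank."""
    return probs @ np.arange(probs.shape[1])


def fit_linear_student(X: np.ndarray, soft_targets: np.ndarray) -> np.ndarray:
    """Least-squares weights (last entry is the bias) fitted to the teacher's soft ranks."""
    Xb = np.hstack([X, np.ones((X.shape[0], 1))])
    weights, *_ = np.linalg.lstsq(Xb, soft_targets, rcond=None)
    return weights


def predict_ordinal(weights: np.ndarray, X: np.ndarray, n_classes: int) -> np.ndarray:
    """Round the student's continuous output to the nearest valid ordinal class."""
    Xb = np.hstack([X, np.ones((X.shape[0], 1))])
    return np.clip(np.rint(Xb @ weights), 0, n_classes - 1).astype(int)


# Hypothetical teacher outputs for 4 samples and 3 ordered classes.
X = np.array([[0.0], [1.0], [2.0], [3.0]])
teacher_probs = np.array([[0.9, 0.1, 0.0],
                          [0.4, 0.5, 0.1],
                          [0.1, 0.5, 0.4],
                          [0.0, 0.1, 0.9]])
w = fit_linear_student(X, expected_rank(teacher_probs))
print(w, predict_ordinal(w, X, 3))
```

The student's weights can be read directly as per-feature effects on the predicted rank, which is the kind of interpretability mimic learning aims for.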
ES2018-62
Non-negative Matrix Factorization for Medical Imaging
Miguel Atencia, Ruxandra Stoean
Abstract:
A non-negative matrix factorization approach to dimensionality reduction is proposed to aid image classification. The original images can be stored as lower-dimensional columns of a matrix that hold degrees of belonging to feature components, so they can be used in the training phase of the classifier with lower runtime and without loss of accuracy. The extracted features can be visually examined, and images can be reconstructed with limited error. The proof of concept is performed on a benchmark of handwritten digits, followed by an application to histopathological slides of colorectal cancer. Results are encouraging, though dealing with real-world medical data raises a number of issues.
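A minimal sketch of NMF using the classic Lee-Seung multiplicative updates for the Frobenius objective. It assumes each image is a non-negative column of V, so the columns of H become the low-dimensional codes. The update rule, function name and iteration count are assumptions, not necessarily the authors' choices.

```python
import numpy as np


def nmf(V: np.ndarray, rank: int, n_iter: int = 200, seed: int = 0, eps: float = 1e-10):
    """Factor V (pixels x images) as approximately W @ H with W, H >= 0.

    W holds `rank` basis images (feature components) as columns; each column of H
    gives one image's non-negative degrees of belonging to those components.
    """
    rng = np.random.default_rng(seed)
    n_pixels, n_images = V.shape
    W = rng.random((n_pixels, rank))
    H = rng.random((rank, n_images))
    for _ in range(n_iter):
        # Multiplicative updates keep all entries non-negative; eps avoids division by zero.
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H


rng = np.random.default_rng(1)
V = rng.random((20, 2)) @ rng.random((2, 15))  # synthetic non-negative "images"
W, H = nmf(V, rank=2)
print(np.linalg.norm(V - W @ H) / np.linalg.norm(V))  # relative reconstruction error
```

A classifier would then be trained on the columns of H instead of the raw pixels. The basis images in W are what can be inspected visually.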
ES2018-93
Multi-omics data integration using cross-modal neural networks
Ioana Bica, Petar Velickovic, Hui Xiao
Abstract:
Successful integration of multi-omics data for prediction tasks can bring significant advantages to precision medicine and to understanding molecular systems. This paper introduces a novel neural network architecture for exploring and integrating modalities in omics datasets, especially in scenarios with a limited number of training examples available. The proposed cross-modal neural network achieves up to 99% accuracy on omics datasets and it can be reliably used as a tool for performing inference. Moreover, we show how analysis of the weights and activations in the network can give us biological insights into understanding which genes are most relevant for the decision process and how different types of omics influence each other.
ES2018-131
DEEP: decomposition feature enhancement procedure for graphs
Van Dinh Tran, Nicolò Navarin, Alessandro Sperduti
Abstract:
When dealing with machine learning on graphs, kernel methods are among the most successful approaches. Depending on whether one is interested in predicting properties of graphs (e.g. graph classification) or properties of nodes in a single graph (e.g. graph node classification), different kernel functions should be adopted. In the last few years, several graph kernels that extract local features from the input graphs have been defined in the literature, achieving both efficiency and state-of-the-art predictive performance. Recently, some work has been done in this direction for graph node kernels as well, but the majority of the graph node kernels available in the literature consider only global information, which may not be optimal for many tasks. In this paper, we propose a procedure that transforms a local graph kernel into a kernel for nodes in a single, huge graph. We apply a specific instantiation to the task of disease gene prioritization from the bioinformatics domain, improving the state of the art for many diseases.
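The general idea of turning a graph kernel into a node kernel can be illustrated in two steps. First, represent each node by the subgraph induced by its r-hop neighbourhood. Then compare those subgraphs with a graph kernel.

This is a deliberately simple sketch. The degree-histogram kernel is a stand-in chosen for brevity, not the decomposition kernel instantiated in the paper. All function names are hypothetical, and node identifiers are assumed to be integers.

```python
from collections import Counter, deque


def ego_subgraph(adj, root, radius):
    """Nodes within `radius` hops of `root` (BFS) and the edges they induce."""
    dist = {root: 0}
    queue = deque([root])
    while queue:
        u = queue.popleft()
        if dist[u] == radius:
            continue  # do not expand beyond the requested radius
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    nodes = set(dist)
    edges = {(min(u, v), max(u, v)) for u in nodes for v in adj[u] if v in nodes}
    return nodes, edges


def degree_histogram(nodes, edges):
    """Histogram of node degrees inside the induced subgraph."""
    deg = Counter()
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    return Counter(deg[n] for n in nodes)


def local_node_kernel(adj, a, b, radius=1):
    """Linear kernel between the degree histograms of two nodes' neighbourhoods."""
    ha = degree_histogram(*ego_subgraph(adj, a, radius))
    hb = degree_histogram(*ego_subgraph(adj, b, radius))
    return sum(count * hb[d] for d, count in ha.items())


path = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}  # path graph 0-1-2-3
print(local_node_kernel(path, 1, 2), local_node_kernel(path, 0, 3))
```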
ES2018-163
Deep Echo State Networks for Diagnosis of Parkinson's Disease
Claudio Gallicchio, Alessio Micheli, Luca Pedrelli
Abstract:
In this paper, we introduce a novel approach for the diagnosis of Parkinson's Disease (PD) based on deep Echo State Networks (ESNs). PD is identified by analyzing the whole time series collected from a tablet device while the spiral tests are sketched, without the need for feature extraction or data preprocessing. We evaluated the proposed approach on a public dataset of spiral tests. The experimental results show that deep ESNs perform significantly better than the shallow ESN model. Overall, the proposed approach obtains state-of-the-art results in the identification of PD on this kind of temporal data.
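In a deep ESN, several untrained reservoirs are stacked so that each layer is driven by the state of the layer below, and only a linear readout is trained. The minimal sketch below rests on several assumptions:

- leaky tanh units;
- uniform random weights rescaled to a spectral radius below 1;
- the mean state over time as the fixed-size descriptor of a spiral test;
- a ridge-regression readout.

The hyperparameters, the descriptor and all names are illustrative, not the paper's settings.

```python
import numpy as np


def init_layer(n_in, n_units, rho, input_scale, rng):
    """Random input and recurrent weights; the recurrent matrix is rescaled to spectral radius rho."""
    W_in = rng.uniform(-input_scale, input_scale, size=(n_units, n_in))
    W = rng.uniform(-1.0, 1.0, size=(n_units, n_units))
    W *= rho / np.max(np.abs(np.linalg.eigvals(W)))
    return W_in, W


def deep_esn_states(u, n_layers=3, n_units=50, rho=0.9, leak=0.5, input_scale=1.0, seed=0):
    """Run stacked untrained leaky reservoirs; layer k is driven by layer k-1's state.

    u has shape (T, n_inputs); the result has shape (T, n_layers * n_units).
    """
    rng = np.random.default_rng(seed)
    T, n_in = u.shape
    layers, dim = [], n_in
    for _ in range(n_layers):
        layers.append(init_layer(dim, n_units, rho, input_scale, rng))
        dim = n_units
    x = [np.zeros(n_units) for _ in range(n_layers)]
    states = np.empty((T, n_layers * n_units))
    for t in range(T):
        drive = u[t]
        for k, (W_in, W) in enumerate(layers):
            x[k] = (1.0 - leak) * x[k] + leak * np.tanh(W_in @ drive + W @ x[k])
            drive = x[k]  # the next layer sees this layer's state as its input
        states[t] = np.concatenate(x)
    return states


def ridge_readout(X, Y, reg=1e-6):
    """Closed-form ridge regression; this linear readout is the only trained part."""
    return np.linalg.solve(X.T @ X + reg * np.eye(X.shape[1]), X.T @ Y)


# Hypothetical pen trajectory: T = 100 samples of (x, y, pressure).
rng = np.random.default_rng(1)
pen = rng.normal(size=(100, 3))
S = deep_esn_states(pen)
feature = S.mean(axis=0)  # one fixed-size vector per spiral test
print(S.shape, feature.shape)
```

With one such feature vector per subject, `ridge_readout` would be fitted on a matrix of features against PD/healthy targets.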
ES2018-180
Capturing variabilities from Computed Tomography images with Generative Adversarial Networks (GANs)
Umair Javaid, John A. Lee
Abstract:
With the advent of Deep Learning (DL) techniques, especially Generative Adversarial Networks (GANs), data augmentation and generation have become quickly evolving domains that have recently attracted much interest. However, DL techniques are data-demanding and, since medical data are not easily accessible, they suffer from data insufficiency. Different data augmentation techniques are used to deal with this limitation. Here, we propose a novel unsupervised data-driven approach for data augmentation that can generate 2D Computed Tomography (CT) images using a simple GAN. The generated CT images have good global and local features of a real CT image and can be used to augment training datasets for effective learning. In this proof-of-concept study, we show that our proposed GAN-based solution is able to capture some of the global and local CT variabilities. Our network is able to generate visually realistic CT images, and we aim to further enhance its output by scaling it to a higher resolution and potentially from 2D to 3D.
ES2018-199
Pollen grain recognition using convolutional neural network
Natalia Khanzhina, Evgeny Putin, Andrey Filchenkov, Elena Zamyatina
Abstract:
This paper addresses two problems: automated recognition of pollen species and counting of pollen grains in an image obtained with a light microscope. Automation of pollen recognition is required in several domains, including allergy and asthma prevention in medicine and honey quality control in the nutrition industry. We propose a deep learning solution based on a convolutional neural network for classification, feature extraction and image segmentation. Our approach achieves state-of-the-art accuracy: 99.8% for 5 species and 95.9% for 11 species.
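Once a network has produced a per-pixel segmentation, counting grains can be reduced to counting connected foreground regions. The abstract does not say how the authors perform counting. This minimal sketch assumes a binary mask and 4-connectivity. Touching grains would be merged into one region, and `count_objects` is a hypothetical name.

```python
from collections import deque


def count_objects(mask):
    """Count 4-connected foreground regions in a binary mask (list of lists of 0/1)."""
    rows, cols = len(mask), len(mask[0])
    seen = [[False] * cols for _ in range(rows)]
    count = 0
    for r in range(rows):
        for c in range(cols):
            if mask[r][c] and not seen[r][c]:
                count += 1  # new region: flood-fill it so it is counted only once
                seen[r][c] = True
                queue = deque([(r, c)])
                while queue:
                    i, j = queue.popleft()
                    for di, dj in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                        a, b = i + di, j + dj
                        if 0 <= a < rows and 0 <= b < cols and mask[a][b] and not seen[a][b]:
                            seen[a][b] = True
                            queue.append((a, b))
    return count


segmentation = [
    [1, 1, 0, 0, 0],
    [1, 1, 0, 0, 1],
    [0, 0, 0, 1, 1],
    [0, 0, 0, 0, 0],
]
print(count_objects(segmentation))
```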
Randomized Neural Networks
ES2018-6
Randomized Recurrent Neural Networks
Claudio Gallicchio, Alessio Micheli, Peter Tino
ES2018-49
Bidirectional deep-readout echo state networks
Filippo Maria Bianchi, Simone Scardapane, Sigurd Løkse, Robert Jenssen
Abstract:
We propose a deep architecture for the classification of multivariate time series. By means of a recurrent, untrained reservoir we generate a vectorial representation that embeds temporal relationships in the data. To improve the memorization capability, we implement a bidirectional reservoir, whose last state also captures past dependencies in the input. We apply dimensionality reduction to the final reservoir states to obtain compressed, fixed-size representations of the time series. These are subsequently fed into a deep feedforward network trained to perform the final classification. We test our architecture on benchmark datasets and on a real-world use case of blood sample classification. Results show that our method performs better than a standard echo state network and, at the same time, achieves results comparable to a fully trained recurrent network, but with faster training.
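The bidirectional idea can be sketched in three steps:

1. Drive the same untrained reservoir with the time series and with its time-reversed copy.
2. Concatenate the two final states.
3. Compress the result with PCA.

This minimal sketch assumes a shared reservoir for both directions and PCA as the dimensionality reduction. These choices and the names are illustrative, and the deep feedforward classifier is omitted.

```python
import numpy as np


def make_reservoir(n_in, n_units, rho=0.9, seed=0):
    """Random input weights and a recurrent matrix rescaled to spectral radius rho."""
    rng = np.random.default_rng(seed)
    W_in = rng.uniform(-1.0, 1.0, size=(n_units, n_in))
    W = rng.uniform(-1.0, 1.0, size=(n_units, n_units))
    W *= rho / np.max(np.abs(np.linalg.eigvals(W)))
    return W_in, W


def last_state(u, W_in, W):
    """Drive an untrained tanh reservoir with u (T x n_in) and return its final state."""
    x = np.zeros(W.shape[0])
    for u_t in u:
        x = np.tanh(W_in @ u_t + W @ x)
    return x


def bidirectional_state(u, W_in, W):
    """Concatenate the final states obtained from the forward and time-reversed input."""
    return np.concatenate([last_state(u, W_in, W), last_state(u[::-1], W_in, W)])


def pca_reduce(R, k):
    """Project the rows of R onto their first k principal components."""
    centred = R - R.mean(axis=0)
    _, _, Vt = np.linalg.svd(centred, full_matrices=False)
    return centred @ Vt[:k].T


rng = np.random.default_rng(1)
series = [rng.normal(size=(30, 4)) for _ in range(10)]  # 10 hypothetical MTS with 4 variables
W_in, W = make_reservoir(n_in=4, n_units=40)
R = np.stack([bidirectional_state(s, W_in, W) for s in series])
Z = pca_reduce(R, k=5)  # fixed-size codes for a downstream feedforward classifier
print(R.shape, Z.shape)
```

The backward pass processes the first time step last. Its final state is therefore dominated by the beginning of the series, complementing the forward state, which mostly reflects the end.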
ES2018-105
Forecasting Business Failure in Highly Imbalanced Distribution based on Delay Line Reservoir
Ali Rodan, Pedro A. Castillo, Hossam Faris, A.M. Mora, Huthaifa Jawazneh
Abstract:
Bankruptcy is a critical financial problem that affects a large number of companies around the world. Consequently, in recent years an increasing number of researchers have tried to address it by applying different machine learning models as powerful tools for the various economic agents related to a company. In this work, we propose combining a simple deterministic delay line reservoir (DLR) state space with three popular classification algorithms (J48, k-NN, and MLP) as an efficient and accurate solution to the bankruptcy prediction problem. The proposed approach is evaluated on a real-world dataset collected from Spanish companies. The results show that the proposed models have a higher predictive ability than traditional classification approaches (without the DLR reservoir state), making them a suitable and efficient alternative for this complex problem.
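A delay line reservoir (DLR) replaces the random recurrent matrix of an echo state network with a fixed chain. Each unit feeds only the next one through the same weight r, and all input weights share one magnitude v. The minimal sketch below shows that topology under several assumptions:

- the values of r and v;
- the alternating input signs;
- the use of the final state as the feature vector for J48, k-NN or an MLP.

The abstract does not describe how company records are presented to the reservoir.

```python
import numpy as np


def delay_line_reservoir(n_units, r=0.5):
    """Recurrent matrix with weight r only on the lower sub-diagonal (unit i feeds unit i+1)."""
    return np.diag(np.full(n_units - 1, r), k=-1)


def dlr_states(u, n_units=10, r=0.5, v=0.1):
    """Deterministic reservoir state sequence for input u of shape (T, n_in).

    All input weights share magnitude v; the alternating sign pattern is an
    illustrative assumption.
    """
    n_in = u.shape[1]
    signs = np.where(np.arange(n_units * n_in) % 2 == 0, 1.0, -1.0)
    W_in = v * signs.reshape(n_units, n_in)
    W = delay_line_reservoir(n_units, r)
    x = np.zeros(n_units)
    states = []
    for u_t in u:
        x = np.tanh(W_in @ u_t + W @ x)
        states.append(x)
    return np.array(states)


# Hypothetical sequence of two financial ratios for one company over three years.
company = np.array([[0.2, 1.3], [0.4, 1.1], [0.1, 0.9]])
features = dlr_states(company)[-1]  # final state as input to a standard classifier
print(features.shape)
```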
ES2018-172
Estimation of the Human Concentration using Echo State Networks
Hikmat Dashdamirov, Sebastián Basterrech
Abstract:
We introduce a very simple and portable device for estimating human concentration. We developed a Brain-Computer Interface system based on EEG signals that produces highly accurate predictions of human mental activity. We consider two types of mental activity: one that requires high concentration and one that requires relaxation. We show that it is possible to estimate human concentration from few brain signals. The classification problem is solved using neural networks; in particular, we obtain a very accurate classifier using the fast and robust Echo State Network method.
ES2018-176
Quantifying the Reservoir Quality using Dimensionality Reduction Techniques
Tomas Burianek, Sebastián Basterrech
Abstract:
An Echo State Network is a particular type of Recurrent Neural Network that combines principles from kernels, linear regression and dynamical systems. The network has randomly initialized hidden-to-hidden weights (the reservoir) that remain fixed during training. The reservoir projects the input patterns onto a feature map. Here, we present a correlation analysis between the input space and the feature map. We use a dimensionality reduction technique (Sammon Mapping) to represent the input space. We show a correlation between the Sammon energy and the model accuracy, which can be useful for defining good reservoir topologies.
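Sammon mapping looks for a low-dimensional configuration that minimizes Sammon's stress, the quantity referred to as the Sammon energy. The sketch covers only the stress function. Two things are not shown: the iterative optimization that produces the embedding, and how the paper relates the energy to reservoir accuracy. The function names are hypothetical.

```python
import numpy as np


def pairwise_distances(X):
    """Euclidean distance matrix between the rows of X."""
    diff = X[:, None, :] - X[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))


def sammon_stress(X_high, Y_low):
    """Sammon's stress between points X_high and their embedding Y_low.

    E = (1 / sum_{i<j} D_ij) * sum_{i<j} (D_ij - d_ij)^2 / D_ij,
    with D the input-space distances and d the embedding distances.
    Assumes all input points are distinct (every D_ij > 0).
    """
    iu = np.triu_indices(len(X_high), k=1)
    D = pairwise_distances(X_high)[iu]
    d = pairwise_distances(Y_low)[iu]
    return float(np.sum((D - d) ** 2 / D) / np.sum(D))


rng = np.random.default_rng(0)
X = rng.normal(size=(8, 5))
Y = X[:, :2]  # a crude 2-D embedding: keep the first two coordinates
print(sammon_stress(X, Y))
```

A stress of 0 means the embedding preserves all pairwise distances exactly. Larger values indicate more distortion, with small distances weighted more heavily.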
Clustering and feature selection
ES2018-134
Scalable robust clustering method for large and sparse data
Joonas Hämäläinen, Tommi Kärkkäinen, Tuomo Rossi
Abstract:
Datasets for unsupervised clustering can be large and sparse, with a significant portion of missing values. We present here a scalable version of a robust clustering method with the available data strategy. More precisely, a general algorithm is described, and the accuracy and scalability of a distributed implementation of the algorithm are tested. The obtained results confirm the viability of the proposed approach.
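The "available data strategy" means that distances and prototype updates use only the observed values of each sample. Below is a minimal sketch of that idea for a spatial-median prototype, a robust alternative to the mean. It assumes a Weiszfeld-type update and omits the paper's distributed implementation and exact algorithm. All names are hypothetical, and NaN marks a missing value.

```python
import numpy as np


def available_distance(x, c):
    """Euclidean distance using only the coordinates observed in x (NaN = missing)."""
    observed = ~np.isnan(x)
    return float(np.linalg.norm(x[observed] - c[observed]))


def spatial_median_available(X, n_iter=100, eps=1e-8):
    """Weiszfeld-type iteration for a spatial median of the rows of X, skipping NaNs.

    Each row is weighted by 1 / (its available-data distance to the current
    estimate), and every coordinate is averaged over the rows that observe it.
    """
    observed = ~np.isnan(X)
    X_filled = np.where(observed, X, 0.0)
    m = np.nanmean(X, axis=0)  # start from the available-data mean
    for _ in range(n_iter):
        residual = np.where(observed, X - m, 0.0)
        weights = 1.0 / np.maximum(np.linalg.norm(residual, axis=1), eps)
        m = (weights[:, None] * X_filled).sum(axis=0) / (weights[:, None] * observed).sum(axis=0)
    return m


def assign(X, prototypes):
    """Index of the nearest prototype for each row, under the available-data distance."""
    return np.array([np.argmin([available_distance(x, c) for c in prototypes]) for x in X])


X = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0],
              [100.0, 100.0],   # outlier
              [np.nan, 0.5]])   # first value missing
m = spatial_median_available(X)
print(m, np.nanmean(X, axis=0))  # robust prototype vs. available-data mean
```

Alternating `assign` with a per-cluster call to `spatial_median_available` gives a k-prototypes loop that tolerates both outliers and missing entries.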
ES2018-22
Clustering with decision trees: divisive and agglomerative approach
Lauriane Castin, Benoît Frénay
Abstract:
Decision trees are mainly used to perform classification tasks. Samples are submitted to a test in each node of the tree and guided through the tree based on the result. With a few adjustments, decision trees can also be used to perform clustering. On the one hand, new split criteria must be devised to construct the tree without knowledge of the sample labels. On the other hand, new algorithms must be applied to merge sub-clusters at leaf nodes into actual clusters. In this paper, new split criteria and agglomeration algorithms are developed for clustering, with results comparable to other existing clustering techniques.
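One possible label-free split criterion is to choose the split that most reduces within-node variance, a k-means-like objective. This is an illustrative assumption, not one of the criteria proposed in the paper. The agglomeration of leaves into clusters is not shown, and `best_unsupervised_split` is a hypothetical name.

```python
import numpy as np


def best_unsupervised_split(X):
    """Return (feature, threshold, cost) minimizing the children's within-node variance.

    No labels are used: the cost is the sum of squared deviations from each
    child's mean, computed over all features.
    """
    best = (None, None, np.inf)
    for f in range(X.shape[1]):
        values = np.unique(X[:, f])
        for lo, hi in zip(values[:-1], values[1:]):
            threshold = (lo + hi) / 2.0  # midpoint between consecutive distinct values
            left, right = X[X[:, f] <= threshold], X[X[:, f] > threshold]
            cost = ((left - left.mean(axis=0)) ** 2).sum() + ((right - right.mean(axis=0)) ** 2).sum()
            if cost < best[2]:
                best = (f, threshold, cost)
    return best


X = np.array([[0.0, 3.0], [0.1, 1.0], [0.2, 2.0],
              [5.0, 2.5], [5.1, 1.5], [5.2, 3.0]])
print(best_unsupervised_split(X))
```

Applying the function recursively to each child until a stopping rule is met yields a divisive tree whose leaves are candidate sub-clusters.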
ES2018-16
Comparison of cluster validation indices with missing data
Marko Niemelä, Sami Äyrämö, Tommi Kärkkäinen
Abstract:
Clustering is an unsupervised machine learning technique that aims to divide a given set of data into subsets. The number of hidden groups in cluster analysis is not always obvious, and various cluster validation indices have been suggested for determining it. Several recent studies have reviewed validation indices, but no experiments with missing data are yet available. In this paper, the performance of ten well-known indices on ten synthetic data sets with various ratios of missing values is measured using clustering based on the squared Euclidean and city-block distances. The original indices are modified for the city-block distance in a novel way. Experiments illustrate different degrees of stability of the indices with respect to missing data.
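This sketch shows how one well-known index, the silhouette width, can be computed with a city-block distance while skipping missing coordinates pairwise. The missing-value treatment (a pairwise available-data distance) is an assumption. The paper's specific modifications of the indices for the city-block distance are not reproduced, and the function names are hypothetical.

```python
import numpy as np


def cityblock(x, y):
    """City-block (L1) distance over the coordinates observed in both vectors (NaN = missing)."""
    both = ~np.isnan(x) & ~np.isnan(y)
    return float(np.abs(x[both] - y[both]).sum())


def silhouette(X, labels, dist=cityblock):
    """Mean silhouette width; requires at least two clusters, singletons contribute 0."""
    labels = np.asarray(labels)
    n = len(X)
    D = np.array([[dist(X[i], X[j]) for j in range(n)] for i in range(n)])
    scores = []
    for i in range(n):
        same = (labels == labels[i]) & (np.arange(n) != i)
        if not same.any():
            scores.append(0.0)
            continue
        a = D[i, same].mean()  # mean distance to the point's own cluster
        b = min(D[i, labels == c].mean() for c in set(labels.tolist()) if c != labels[i])
        scores.append((b - a) / max(a, b) if max(a, b) > 0 else 0.0)
    return float(np.mean(scores))


X = np.array([[0, 0], [0, 1], [10, 10], [10, 11]], dtype=float)
print(silhouette(X, [0, 0, 1, 1]))
```

Values close to 1 indicate compact, well-separated clusters. Recomputing the index as entries are removed is one way to probe its stability under missing data.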
ES2018-122
Efficient approximate representations for computationally expensive features
Raul Santos-Rodriguez, Niall Twomey
Abstract:
High computational complexity is often a barrier to obtaining desired representations in resource-constrained settings. This paper introduces a simple and computationally cheap method of approximating complex features. We do so by carefully constraining the architecture of a neural network and regressing from raw data to the desired feature representation. Our analysis focuses on spectral features, and we demonstrate how low-capacity networks can capture the end-to-end dynamics of cascaded composite functions. Besides simplifying the analysis pipeline, approximating neural networks produce feature representations up to 20 times faster. Our experiments show excellent fidelity of the approximated features, and we also report nearly indistinguishable predictive performance between exact and approximate representations.
ES2018-167
Regularised maximum-likelihood inference of mixture of experts for regression and clustering
Bao Tuyen Huynh, Faicel Chamroukhi
Abstract:
Variable selection is fundamental to high-dimensional statistical modeling and is particularly challenging in unsupervised modeling, including mixture models. We propose a regularised maximum-likelihood inference of the Mixture of Experts model that can deal with potentially correlated features and encourages sparse models in potentially high-dimensional scenarios. We develop a hybrid Expectation-Majorization-Maximization (EM/MM) algorithm for model fitting. Unlike state-of-the-art regularised ML inference [1,2], the proposed approach does not require an approximation of the regularisation. The proposed algorithm automatically obtains sparse solutions without thresholding and uses coordinate descent updates that avoid matrix inversion. An experimental study shows the algorithm's ability to retrieve sparse solutions and to fit models in model-based clustering of regression data.
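The abstract mentions coordinate descent updates that avoid matrix inversion and yield sparse solutions without a post-hoc thresholding step. As a minimal sketch of that mechanism only, assuming a plain lasso-penalised linear regression rather than the authors' EM/MM algorithm for mixtures of experts, each coordinate update below is a scalar closed form, and the soft-thresholding operator sets weak coefficients exactly to zero.

```python
import numpy as np


def soft_threshold(z: float, t: float) -> float:
    """Proximal operator of t*|.|: shrinks z toward 0 and returns exactly 0 when |z| <= t."""
    return float(np.sign(z) * max(abs(z) - t, 0.0))


def lasso_coordinate_descent(X: np.ndarray, y: np.ndarray, lam: float, n_iter: int = 200) -> np.ndarray:
    """Minimise (1/(2n)) * ||y - X w||^2 + lam * ||w||_1 by cyclic coordinate descent.

    Each coordinate update is a scalar closed form, so no matrix is ever inverted.
    """
    n, p = X.shape
    w = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0) / n  # curvature of the smooth term along each coordinate
    residual = y - X @ w
    for _ in range(n_iter):
        for j in range(p):
            residual += X[:, j] * w[j]  # partial residual without feature j
            rho = X[:, j] @ residual / n
            w[j] = soft_threshold(rho, lam) / col_sq[j]
            residual -= X[:, j] * w[j]  # restore the full residual with the new w[j]
    return w


rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
y = X @ np.array([2.0, 0.0, 0.0, -1.5, 0.0]) + 0.01 * rng.normal(size=100)
w = lasso_coordinate_descent(X, y, lam=0.1)
print(np.round(w, 2))
```

The irrelevant coefficients come out exactly zero because the update itself returns 0 whenever the partial correlation falls below the penalty level.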
ES2018-179
Feature selection for label ranking
Noelia Sánchez-Maroño, Beatriz Pérez-Sánchez
Abstract:
In recent years, feature selection and label ranking have attracted considerable attention in Artificial Intelligence research. Feature selection has been applied to many machine learning problems with excellent results. However, its combination with label ranking has received little study. This paper presents novel work that uses feature selection filters as a preprocessing step for label ranking. Experimental results show a significant reduction, of up to 33%, in the number of features used for the label ranking problems, while performance remains competitive in terms of the similarity measure.
ES2018-57
A novel filter algorithm for unsupervised feature selection based on a space filling measure
Mohamed Laib, Mikhaïl Kanevski
Abstract:
This research proposes a novel filter algorithm for unsupervised feature selection based on a space-filling measure. A well-known criterion of space-filling design, called the coverage measure, is adapted to dimensionality reduction problems. This measure was originally developed to judge the quality of a space-filling design; in this work it is used to reduce redundancy in data. The proposed algorithm is evaluated on simulated data under several noise-injection scenarios. Furthermore, it is compared with several benchmark feature selection methods on real UCI datasets.
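For reference, a minimal sketch of the coverage measure, assuming the usual definition from the space-filling design literature: with gamma_i the distance from point i to its nearest neighbour, coverage is the standard deviation of the gamma_i divided by their mean. How the paper turns this into a feature-ranking filter is not reproduced here.

```python
import numpy as np


def coverage(points: np.ndarray) -> float:
    """Coverage measure of a point set (rows = points).

    gamma_i is the Euclidean distance from point i to its nearest neighbour;
    coverage = std(gamma) / mean(gamma), using the 1/n (population) standard deviation.
    Lower values mean the points fill the space more regularly; a regular grid gives 0.
    """
    diffs = points[:, None, :] - points[None, :, :]
    dist = np.sqrt((diffs ** 2).sum(axis=-1))
    np.fill_diagonal(dist, np.inf)  # exclude each point's zero distance to itself
    gamma = dist.min(axis=1)
    return float(gamma.std() / gamma.mean())


grid = np.array([[i, j] for i in range(5) for j in range(5)], dtype=float)
random_pts = np.random.default_rng(1).random((25, 2))
print(coverage(grid), coverage(random_pts))
```

On the regular grid every nearest-neighbour distance equals 1, so the measure is exactly 0; an irregular random sample gives a positive value.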
Mathematical aspects of learning, and reinforcement learning
ES2018-45
Asymptotic statistics for multilayer perceptron with ReLu hidden units
Joseph Rynkiewicz
Abstract:
We consider regression models involving multilayer perceptrons (MLP) with rectified linear unit (ReLu) functions for hidden units. Studying the statistical properties of such models is difficult for several reasons: first, these activation functions are not differentiable everywhere; second, in practice these models may be heavily overparametrized. In general, the parameters of the MLP are estimated by minimizing a cost function; here we focus on the sum of squared errors (SSE), which is the standard cost function for regression. In this framework, we characterize the asymptotic behavior of the SSE of estimated models, which gives information on the possible overfitting of such models. This is done using a recent and very flexible methodology introduced to deal with models that lose identifiability. Hence, we do not have to assume that a true model exists or that a finite set of parameters realizes the best regression function.
ES2018-139
Local Rademacher Complexity Machine
Luca Oneto, Sandro Ridella, Davide Anguita
Abstract:
In this paper we present the Local Rademacher Complexity Machine, a transposition of Local Rademacher Complexity Theory into a learning algorithm. Using a series of real-world small-sample datasets, we show the advantages of our proposal over Support Vector Machines, i.e., the transposition of the milestone results of V. N. Vapnik and A. Chervonenkis into a learning algorithm.
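The quantity behind Rademacher complexity theory is the empirical Rademacher complexity: the expected best correlation, over a function class, between the function values on the sample and random signs. A minimal Monte Carlo sketch for a finite class follows; the local variant restricts the supremum to low-variance functions, which this sketch does not implement.

```python
import numpy as np


def empirical_rademacher(predictions: np.ndarray, n_draws: int = 2000, seed: int = 0) -> float:
    """Monte Carlo estimate of the empirical Rademacher complexity of a finite function class.

    predictions[k, i] = f_k(x_i) for the k-th function and i-th sample (shape K x m).
    Estimates E_sigma[ max_k (1/m) * sum_i sigma_i * f_k(x_i) ] with sigma_i uniform on {-1, +1}.
    """
    rng = np.random.default_rng(seed)
    m = predictions.shape[1]
    sigma = rng.choice([-1.0, 1.0], size=(n_draws, m))
    correlations = sigma @ predictions.T / m  # shape (n_draws, K)
    return float(correlations.max(axis=1).mean())


m = 50
single = np.ones((1, m))                      # a class with a single function
pair = np.vstack([np.ones(m), -np.ones(m)])   # the same function and its negation
r_single = empirical_rademacher(single)
r_pair = empirical_rademacher(pair)
print(r_single, r_pair)
```

A single fixed function cannot correlate with random signs on average, so its estimate is near zero; adding its negation lets the class always pick the better sign, which increases the complexity.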
ES2018-174
A sharper bound on the Rademacher complexity of margin multi-category classifiers
Khadija Musayeva, Fabien Lauer, Yann Guermeur
Abstract:
One of the main open problems in the theory of margin multi-category pattern classification is the dependency of a guaranteed risk on the number C of categories, the sample size m and the margin parameter gamma. This paper derives a new bound on the probability of error of margin multi-category classifiers under minimal learnability assumptions. It improves the dependency on C over the state of the art. This is achieved through the introduction of a new Sauer-Shelah lemma.
ES2018-123
Slowness-based neural visuomotor control with an Intrinsically motivated Continuous Actor-Critic
Muhammad Burhan Hafez, Matthias Kerzel, Cornelius Weber, Stefan Wermter
Abstract:
In this paper, we present a new visually guided exploration approach for autonomous learning of visuomotor skills. Our approach uses hierarchical Slow Feature Analysis for unsupervised learning of efficient state representation and an Intrinsically motivated Continuous Actor-Critic learner for neuro-optimal control. The system learns an ensemble of local forward models online and generates an intrinsic reward based on the learning progress of each forward model. Combined with the external reward, the intrinsic reward guides the system’s exploration strategy. We evaluate the approach on the task of learning to reach an object from raw pixel data in a realistic robot simulator. The results show that the control policies learned with our approach are significantly better, in terms of both length and average reward, than those learned with any of the baseline algorithms.
ES2018-177
A variable projection method for block term decomposition of higher-order tensors
Guillaume Olikier, Pierre-Antoine Absil, Lieven De Lathauwer
Abstract:
Higher-order tensors have become popular in many areas of applied mathematics such as statistics, scientific computing, signal processing and machine learning, notably thanks to the many possible ways of decomposing a tensor. In this paper, we focus on the best least-squares approximation of a higher-order tensor by a block term decomposition. Using variable projection, we express the tensor approximation problem as the minimization of a cost function on a Cartesian product of Stiefel manifolds. We present numerical experiments in which variable projection makes a steepest-descent method approximately twice as fast.
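Variable projection eliminates the linear "core" variables in closed form, leaving a cost that depends only on factors with orthonormal columns (points on a Stiefel manifold). The sketch below shows the idea on the simplest matrix case, a rank-r least-squares approximation, rather than a full block term decomposition; the steepest-descent solver over the manifold is not shown.

```python
import numpy as np


def full_cost(A: np.ndarray, U: np.ndarray, C: np.ndarray) -> float:
    """Least-squares cost ||A - U C||_F^2 over both variables U and C."""
    return float(np.linalg.norm(A - U @ C) ** 2)


def projected_cost(A: np.ndarray, U: np.ndarray) -> float:
    """Variable-projection cost: C is replaced by its optimal value C* = U^T A.

    Valid when U has orthonormal columns, so that
    min_C ||A - U C||_F^2 = ||A - U U^T A||_F^2 depends on U alone.
    """
    return float(np.linalg.norm(A - U @ (U.T @ A)) ** 2)


rng = np.random.default_rng(0)
A = rng.normal(size=(6, 4))
U, _ = np.linalg.qr(rng.normal(size=(6, 2)))  # random orthonormal 6 x 2 factor
C = rng.normal(size=(2, 4))                   # arbitrary, non-optimal core
print(projected_cost(A, U) <= full_cost(A, U, C))  # True
```

When U holds the top singular vectors of A, the projected cost equals the sum of the discarded squared singular values (Eckart-Young), which is a convenient sanity check.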
ES2018-50
Reinforcement Learning for High-Frequency Market Making
Ye-Sheen Lim, Denise Gorse
Abstract:
In this paper we present the first practical application of reinforcement learning to optimal market making in high-frequency trading. States, actions, and reward formulations unique to high-frequency market making are proposed, including a novel use of the CARA utility as a terminal reward for improving learning. We show that the optimal policy trained using Q-learning outperforms state-of-the-art market making algorithms. Finally, we analyse the optimal reinforcement learning policies, and the influence of the CARA utility from a trading perspective.
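A minimal sketch of a tabular Q-learning update with a CARA (constant absolute risk aversion) utility used as the terminal reward. The paper's state, action and reward formulations are not reproduced: the labels below are hypothetical, and taking the utility of the episode's final profit and loss is an assumption made for illustration.

```python
import math
from collections import defaultdict


def cara_utility(wealth: float, risk_aversion: float) -> float:
    """CARA utility U(w) = -exp(-a * w), with risk aversion a > 0."""
    return -math.exp(-risk_aversion * wealth)


def q_update(Q, state, action, reward, next_state, actions,
             alpha=0.1, gamma=0.99, terminal=False):
    """One tabular Q-learning step; terminal transitions do not bootstrap."""
    if terminal:
        target = reward
    else:
        target = reward + gamma * max(Q[(next_state, a)] for a in actions)
    Q[(state, action)] += alpha * (target - Q[(state, action)])


# Hypothetical toy encoding: states and actions are plain labels.
Q = defaultdict(float)
actions = ["tight_quotes", "wide_quotes"]
final_pnl = 2.0  # hypothetical profit and loss at the end of the episode
q_update(Q, "end_of_day", "tight_quotes", cara_utility(final_pnl, 0.5),
         None, actions, terminal=True)
print(Q[("end_of_day", "tight_quotes")])
```

Because U is concave, a risky outcome is valued below the utility of its mean, so a CARA terminal reward penalises variable end-of-episode results.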
Emerging trends in machine learning: beyond conventional methods and data
ES2018-4
Emerging trends in machine learning: beyond conventional methods and data
Luca Oneto, Nicolò Navarin, Michele Donini, Davide Anguita
Abstract:
Recently, promising new theoretical results, techniques, and methodologies have attracted the attention of many researchers and have broadened the range of applications in which machine learning can be effectively applied to extract useful and actionable information from the huge amount of heterogeneous data produced every day by an increasingly digital world. Examples of these methods and problems include: learning under privacy and anonymity constraints; learning from structured, semi-structured, and multi-modal (heterogeneous) data; constructive machine learning; reliable machine learning; learning to learn; mixing deep and structured learning; semantics-enabled recommender systems; reproducibility and interpretability in machine learning; human-in-the-loop learning; and adversarial learning. This special session aims to attract both solid contributions and preliminary results that show the potential and the limitations of new ideas, refinements, or cross-fertilizations between the different fields of machine learning and other fields of research in solving real-world problems. Both theoretical and practical results are welcome in our special session.
ES2018-89
Finding the most interpretable MDS rotation for sparse linear models based on external features
Adrien Bibal, Rebecca Marion, Benoît Frénay
Abstract:
One approach to interpreting multidimensional scaling (MDS) embeddings is to estimate a linear relationship between the MDS dimensions and a set of external features. However, because MDS only preserves distances between instances, the MDS embedding is invariant to rotation. As a result, the weights characterizing this linear relationship are arbitrary and difficult to interpret. This paper proposes a procedure for selecting the most pertinent rotation for interpreting a 2D MDS embedding.
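Because MDS only preserves distances, any rotation of a 2D embedding is an equally valid solution, and the weights of a linear model linking the embedding to external features rotate with it. The sketch below checks that invariance and then selects a rotation by grid search. The selection criterion, the L1 norm of the least-squares weights, is a hypothetical stand-in for interpretability and may differ from the paper's procedure.

```python
import numpy as np


def rotate(Y: np.ndarray, theta: float) -> np.ndarray:
    """Rotate a 2-D embedding (n x 2, one point per row) by angle theta (row-vector convention)."""
    R = np.array([[np.cos(theta), -np.sin(theta)],
                  [np.sin(theta), np.cos(theta)]])
    return Y @ R


def pairwise_distances(Y: np.ndarray) -> np.ndarray:
    return np.linalg.norm(Y[:, None, :] - Y[None, :, :], axis=-1)


def sparsest_rotation(Y: np.ndarray, F: np.ndarray, n_angles: int = 180):
    """Grid-search the rotation whose least-squares weights W (F W ~ rotated Y) have minimal L1 norm.

    Rotating by pi/2 only permutes and flips the columns of W, so angles in [0, pi/2) suffice.
    """
    best_theta, best_score = 0.0, np.inf
    for theta in np.linspace(0.0, np.pi / 2, n_angles, endpoint=False):
        W, *_ = np.linalg.lstsq(F, rotate(Y, theta), rcond=None)
        score = float(np.abs(W).sum())
        if score < best_score:
            best_theta, best_score = float(theta), score
    return best_theta, best_score


rng = np.random.default_rng(0)
F = rng.normal(size=(30, 3))                   # external features
W_sparse = np.array([[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]])
Y = rotate(F @ W_sparse, -0.5)                 # embedding in an arbitrary orientation
print(np.allclose(pairwise_distances(Y), pairwise_distances(rotate(Y, 1.2))))  # True
theta, score = sparsest_rotation(Y, F)
print(theta, score)
```

Here the embedding was generated from sparse weights and then rotated by -0.5, so the search recovers an angle close to 0.5, where the weights are sparse again.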
ES2018-112
Mixture of Hidden Markov Model as Tree Encoder
Davide Bacciu, Daniele Castellana
Abstract:
The paper introduces a new probabilistic tree encoder based on a mixture of Bottom-up Hidden Markov Tree Models. The model's ability to recognise similar structures in data is experimentally assessed in both clustering and classification tasks. The results obtained in this preliminary experiment suggest that the model can be used successfully to compress the information content of a tree into a fixed representation.
ES2018-143
Set point thresholds from topological data analysis and an outlier detector
Alessio Carrega
Abstract:
We provide an unsupervised or semi-supervised learning algorithm that, once the input settings are given, determines a very easily described zone of optimal execution settings for a production process. A region is very easily described if anyone can determine whether a point lies inside it and can select a point in it with a certain range of choice. This can be applied both to production optimization and to predictive maintenance. Part of the method is based on a topological data analysis tool, Mapper. We also provide a method to detect outliers in new data.
ES2018-119
Differential private relevance learning
Johannes Brinkrolf, Kolja Berger, Barbara Hammer
Abstract:
Digital information is collected daily in growing volumes. Mutual benefits drive the demand for the exchange and publication of data among parties. However, it is often unclear how to handle these data properly when they contain sensitive information. Differential privacy has become a powerful principle for privacy-preserving data analysis in the last few years, since it provides a formal privacy guarantee for such settings. This is obtained by separating the utility of the database from the risk of an individual losing his/her privacy. In this contribution, we introduce the Laplace mechanism and a stochastic gradient descent methodology that guarantee differential privacy. We then show how these paradigms can be incorporated into two popular machine learning algorithms, namely GLVQ and GMLVQ. We demonstrate the results of privacy-preserving LVQ on three benchmarks.
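The Laplace mechanism releases a query result plus Laplace noise whose scale is the query's L1 sensitivity divided by the privacy parameter epsilon. A minimal sketch follows; the incorporation into GLVQ and GMLVQ described in the abstract is not shown, and the private-mean example is illustrative.

```python
import numpy as np


def laplace_mechanism(true_value, sensitivity: float, epsilon: float, rng=None):
    """Release true_value plus Laplace(0, sensitivity / epsilon) noise (epsilon-differential privacy).

    sensitivity is the L1 sensitivity of the released query: the largest change in its value
    when one individual's record is changed.
    """
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    rng = np.random.default_rng() if rng is None else rng
    values = np.asarray(true_value, dtype=float)
    noise = rng.laplace(loc=0.0, scale=sensitivity / epsilon, size=values.size).reshape(values.shape)
    return values + noise


# Private mean of n values known to lie in [0, 1]: changing one record moves the mean by at most 1/n.
data = np.random.default_rng(0).random(1000)
private_mean = laplace_mechanism(data.mean(), sensitivity=1 / len(data), epsilon=0.5,
                                 rng=np.random.default_rng(1))
print(data.mean(), private_mean)
```

Smaller epsilon means stronger privacy and proportionally larger noise; the noise standard deviation is sqrt(2) times the scale.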
ES2018-124
On aggregation in ranking median regression
Stéphan Clémençon, Anna Korba
Abstract:
In the present era of personalized customer services and recommender systems, predicting the preferences of an individual/user over a set of items indexed by $[\![n]\!]=\{1,\; \ldots,\; n\}$, $n\geq 1$, based on the user's characteristics, modelled as a random variable $X$, is a ubiquitous issue. Though easy to state, this predictive problem, referred to as ranking median regression (RMR in short), is very difficult to solve in practice. The major challenge lies in the fact that the (discrete) output space is the symmetric group $\mathfrak{S}_n$, composed of all permutations of $[\![n]\!]$, which has explosive cardinality $n!$ and is not a subset of a vector space. It is thus far from straightforward to build predictive rules taking their values in $\mathfrak{S}_n$, except by means of ranking aggregation techniques implemented at a local level, as proposed in [YWL10] or [CKS17bis]. However, such local learning techniques exhibit high instability, and the main goal of this paper is to investigate to what extent Kemeny ranking aggregation of randomized RMR rules may remedy this drawback. Beyond a theoretical analysis establishing its validity, the relevance of this novel ensemble learning technique is supported by experimental results.
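Kemeny aggregation returns the consensus permutation that minimises the total Kendall tau distance (number of discordant item pairs) to a collection of rankings. The brute-force sketch below is only feasible for small n, given the n! cardinality of the output space; the randomized RMR rules whose outputs would be aggregated are not modelled here.

```python
from itertools import combinations, permutations


def kendall_tau_distance(a, b):
    """Number of item pairs ordered differently by rankings a and b (lists of items, best first)."""
    pos_a = {item: i for i, item in enumerate(a)}
    pos_b = {item: i for i, item in enumerate(b)}
    return sum(
        1
        for x, y in combinations(a, 2)
        if (pos_a[x] - pos_a[y]) * (pos_b[x] - pos_b[y]) < 0
    )


def kemeny_median(rankings):
    """Exhaustive Kemeny aggregation: the permutation minimising total Kendall tau distance."""
    items = rankings[0]
    return min(
        permutations(items),
        key=lambda cand: sum(kendall_tau_distance(list(cand), r) for r in rankings),
    )


votes = [["a", "b", "c"], ["a", "c", "b"], ["a", "b", "c"]]
print(kemeny_median(votes))  # ('a', 'b', 'c')
```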
ES2018-202
Temporal transfer learning for drift adaptation
Daegun Won, Peter Jansen, Jaime Carbonell
Abstract:
Whereas detecting and adapting to concept drift has been well studied, predicting the temporal drift of decision boundaries has received much less attention. This paper proposes a method for drift prediction, drift projection, and active learning to adjust the projected decision boundary so as to regain accuracy with minimal additional labeled samples. The method works with different underlying learning algorithms. Results on several data sets with translational and rotational drift and corresponding boundary projection show regained accuracy with significantly fewer labeled samples, even in the presence of noisy drift.
ES2018-140
LANN-DSVD: A privacy-preserving distributed algorithm for machine learning
Oscar Fontenla-Romero, Bertha Guijarro-Berdiñas, Beatriz Pérez-Sánchez, Marcelo Gómez-Casal
Abstract:
In the Big Data era, new challenges have arisen for the machine learning field related to Volume (a high number of samples or variables), Velocity, etc., making many classic and brilliant methods no longer applicable. One of these concerns stems from privacy issues when data are distributed and cannot be shared. In this paper we present the LANN-DSVD algorithm, a non-iterative method for one-layer neural networks that allows distributed learning while guaranteeing privacy among locations. Moreover, being non-iterative, parameter-free, and capable of incremental learning, it is well suited to managing huge and/or continuous data. Results demonstrate its competitiveness in both efficiency and efficacy.
ES2018-192
Vector Field Based Neural Networks
Daniel Vieira, Fabio Rangel, Fabrício Firmino, Joao Paixao
Abstract:
A novel Neural Network architecture is proposed using the mathematically and physically rich idea of vector fields as hidden layers to perform nonlinear transformations in the data. The data points are interpreted as particles moving along a flow defined by the vector field which intuitively represents the desired movement to enable classification. The architecture moves the data points from their original configuration to a new one following the streamlines of the vector field with the objective of achieving a final configuration where classes are separable. An optimization problem is solved through gradient descent to learn this vector field.
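A minimal sketch of moving points along the streamlines of a vector field with explicit Euler steps, x <- x + h K(x). The field parameterisation (a sum of Gaussian bumps, each carrying a vector) is a hypothetical choice for illustration, and the gradient-descent training of the field described in the abstract is not shown.

```python
import numpy as np


def make_rbf_field(centers, vectors, width: float):
    """Hypothetical vector field K(x) = sum_k v_k * exp(-||x - c_k||^2 / (2 * width^2)).

    In a trained model, the centers c_k and vectors v_k would be learned parameters.
    """
    centers = np.asarray(centers, dtype=float)
    vectors = np.asarray(vectors, dtype=float)

    def field(x: np.ndarray) -> np.ndarray:
        sq = ((x[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)  # (n_points, n_centers)
        return np.exp(-sq / (2 * width ** 2)) @ vectors                  # (n_points, dim)

    return field


def flow(points, field, n_steps: int = 10, step: float = 0.1) -> np.ndarray:
    """Move every point along the field with explicit Euler steps x <- x + step * K(x)."""
    x = np.array(points, dtype=float)
    for _ in range(n_steps):
        x = x + step * field(x)
    return x


# One Gaussian bump at the origin that pushes nearby points to the right.
field = make_rbf_field(centers=[[0.0, 0.0]], vectors=[[1.0, 0.0]], width=1.0)
moved = flow([[0.0, 0.0], [5.0, 5.0]], field)
print(moved)
```

The point at the origin is transported rightwards, while the distant point barely moves because the field decays away from the bump; the hope in such architectures is that the transported configuration becomes linearly separable.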
Temporal data, sequences and incremental learning
ES2018-156
Non-Negative Tensor Dictionary Learning
Abraham Traoré, Maxime Berar, Alain Rakotomamonjy
Abstract:
A challenge faced by dictionary learning and non-negative matrix factorization is to efficiently model, in a feature-learning context, temporal patterns in data presenting sequential (two-dimensional) structure, such as spectrograms. In this paper, we address this issue through tensor factorization. For this purpose, we make explicit the connection between dictionary learning and tensor factorization when several examples are available. From this connection, we derive a novel (supervised) learning problem that induces the emergence of temporal patterns in the learned dictionary. The obtained features are compared in a classification framework with those obtained by NMF and achieve promising results.
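The abstract compares the learned features with those obtained by NMF. For reference, a common way to compute NMF is Lee and Seung's multiplicative update rule for the Frobenius loss; the sketch below shows that baseline only, not the paper's tensor dictionary learning method, and the iteration count and data are illustrative.

```python
import numpy as np


def nmf(V: np.ndarray, rank: int, n_iter: int = 1000, seed: int = 0, eps: float = 1e-10):
    """Lee-Seung multiplicative updates for V ~ W H with W, H >= 0 (Frobenius loss).

    Each update multiplies by a non-negative ratio, so non-negativity is preserved;
    eps guards against division by zero.
    """
    rng = np.random.default_rng(seed)
    n, m = V.shape
    W = rng.random((n, rank)) + eps
    H = rng.random((rank, m)) + eps
    for _ in range(n_iter):
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H


rng = np.random.default_rng(42)
V = rng.random((8, 2)) @ rng.random((2, 6))  # exactly rank 2 and non-negative
W, H = nmf(V, rank=2, n_iter=3000)
rel_err = np.linalg.norm(V - W @ H) / np.linalg.norm(V)
print(rel_err)
```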
ES2018-157
An extension of nonstationary fuzzy sets to heteroskedastic fuzzy time series
Marcos Antonio Alves, Petrônio Cândido de Lima e Silva, Carlos Alberto Severiano Junior, Gustavo Linhares Vieira, Frederico Gadelha Guimarães, Hossein Javedani Sadaei
Abstract:
Most applications deal with the unconditional variance of time series. Fuzzy time series allow inexpensive computation for forecasting dynamic processes and uncertainties. In this paper we extend the concept of nonstationary fuzzy sets to Fuzzy Time Series (FTS), yielding Nonstationary Fuzzy Time Series (NSFTS). While some models require new data before adapting, NSFTS is capable of adapting to heteroskedastic time series. In the experiments, NSFTS outperformed other known FTS methods, with Box-Cox transformations available. Statistical tests on three different datasets indicate that the results achieved by the proposed model are either superior or non-inferior to those of other FTS models.
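To make the idea of a nonstationary fuzzy set concrete, the sketch below uses a triangular membership function whose parameters can be perturbed over time. The perturbation rule in `perturb` (a location shift plus a width scale, which would let a set widen when the variance of the series grows) is an illustrative assumption, not the NSFTS update defined in the paper.

```python
from dataclasses import dataclass


@dataclass
class TriangularSet:
    """Triangular fuzzy set with feet a, c and peak b (requires a < b < c)."""
    a: float
    b: float
    c: float

    def membership(self, x: float) -> float:
        if x <= self.a or x >= self.c:
            return 0.0
        if x <= self.b:
            return (x - self.a) / (self.b - self.a)
        return (self.c - x) / (self.c - self.b)

    def perturb(self, shift: float, scale: float) -> "TriangularSet":
        """Hypothetical perturbation: move the peak by `shift` and stretch both sides by `scale`."""
        peak = self.b + shift
        return TriangularSet(peak - scale * (self.b - self.a), peak, peak + scale * (self.c - self.b))


base = TriangularSet(0.0, 1.0, 2.0)
wider = base.perturb(shift=0.0, scale=2.0)  # e.g. after the series' variance grows
print(base.membership(2.5), wider.membership(2.5))  # 0.0 0.25
```

The point of the example is that a value outside the original support (2.5) acquires non-zero membership once the set is widened, without redefining the partition of the universe of discourse.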
ES2018-120
Surprisal-based activation in recurrent neural networks
Tayfun Alpay, Fares Abawi, Stefan Wermter
Abstract:
Learning hierarchical abstractions from sequences is a challenging and open problem for Recurrent Neural Networks (RNNs). This is mainly due to the difficulty of detecting features that span long distances and occur at different frequencies. In this paper, we address this challenge by introducing surprisal-based activation, a novel method that preserves activations contingent on encoding-based self-information. The preserved activations can be considered temporal shortcuts with perfect memory. We evaluate surprisal-based activation on language modelling using the Penn Treebank corpus and find that it can improve performance compared to a baseline RNN.
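Surprisal (self-information) is the quantity the method conditions on: an event with probability p carries I = -log2 p bits. A minimal sketch, assuming a unigram model in place of the paper's encoding and a simple threshold rule (`preserve_mask` is hypothetical), shows how per-step surprisal could decide which activations to keep.

```python
import math
from collections import Counter
from typing import Dict, List


def unigram_model(tokens: List[str]) -> Dict[str, float]:
    """Maximum-likelihood unigram probabilities from token counts."""
    counts = Counter(tokens)
    total = len(tokens)
    return {tok: n / total for tok, n in counts.items()}


def surprisal(p: float) -> float:
    """Self-information in bits: I(x) = -log2 p(x)."""
    return -math.log2(p)


def preserve_mask(tokens: List[str], model: Dict[str, float], threshold: float) -> List[bool]:
    """Hypothetical gate: flag time steps whose surprisal exceeds `threshold`."""
    return [surprisal(model[t]) > threshold for t in tokens]


corpus = "the cat saw the dog".split()
model = unigram_model(corpus)
print(preserve_mask(corpus, model, threshold=2.0))  # [False, True, True, False, True]
```

Frequent tokens ("the", p = 0.4, about 1.32 bits) fall below the threshold, while rarer ones (p = 0.2, about 2.32 bits) are flagged; in an RNN the flag would decide whether a hidden activation is carried forward unchanged.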
ES2018-116
K-spectral centroid: extension and optimizations
Brieuc Conan-Guez, Alain Gély, Lydia Boudjeloud-Assala, Alexandre Blansché
Abstract:
In this work, we address the problem of unsupervised classification of large time series datasets. We focus on K-Spectral Centroid (KSC), a k-means-like model devised for time series clustering. KSC relies on a custom dissimilarity measure between time series that is invariant to time shifting and Y-scaling. KSC has two downsides: first, its dissimilarity measure only makes sense for non-negative time series; second, the KSC algorithm is relatively demanding in terms of computation time. In this paper, we present a natural extension of the KSC dissimilarity measure to time series of arbitrary sign. We show that this new measure is a metric distance. We propose four exact optimizations to speed up this extended KSC (EKSC). Finally, we compare EKSC to a similar model, K-Shape, on real-world datasets.
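The KSC dissimilarity, as it is usually stated, is d(x, y) = min over a shift q and a scale alpha of ||x - alpha * y_q|| / ||x||. For a fixed shift the optimal alpha has a closed form, so only the shifts need to be searched. The sketch below is a brute-force version of the original KSC measure, not EKSC and none of the paper's four optimizations; the circular shift and the assumption of non-zero series are choices made for illustration.

```python
import math
from typing import List, Sequence


def circular_shift(y: Sequence[float], q: int) -> List[float]:
    """Shift y right by q steps, wrapping around (an assumption; zero-padding is another option)."""
    q %= len(y)
    return list(y[-q:]) + list(y[:-q]) if q else list(y)


def ksc_distance(x: Sequence[float], y: Sequence[float]) -> float:
    """min over shift q and scale alpha of ||x - alpha * y_q|| / ||x||.

    For a fixed shift the optimal alpha is <x, y_q> / ||y_q||^2, which gives
    d^2 = 1 - <x, y_q>^2 / (||x||^2 ||y_q||^2). Assumes x and y are non-zero.
    """
    nx2 = sum(v * v for v in x)
    best = math.inf
    for q in range(len(y)):
        yq = circular_shift(y, q)
        dot = sum(a * b for a, b in zip(x, yq))
        ny2 = sum(v * v for v in yq)
        # clamp tiny negative values caused by floating-point rounding
        best = min(best, math.sqrt(max(0.0, 1.0 - dot * dot / (nx2 * ny2))))
    return best


x = [0.0, 1.0, 3.0, 1.0]
print(ksc_distance(x, [2.0, 6.0, 2.0, 0.0]))  # shifted and scaled copy of x, distance ~ 0
```

The brute-force search over all shifts costs O(n^2) per pair of series, which illustrates why speeding up KSC is worthwhile on large datasets.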
ES2018-133
Temporal modeling of ALS using longitudinal data and long-short term memory-based algorithm
Aviv Nahon, Boaz Lerner
Abstract:
ALS is a neurodegenerative disease in which factors such as the rate and pattern of disease progression vary greatly among patients. Since patient functionality deteriorates over time, we model ALS temporally, mimicking the physician's reasoning by combining old and new information using a long short-term memory (LSTM) network. We demonstrate that the LSTM achieves higher accuracy than a random forest in disease state prediction, and that its accuracy improves with data from additional clinic visits. Being an anytime predictor, our model can help physicians and caregivers adjust patients' treatment and living environment over the course of the disease, improving patients' quality of life.
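How an LSTM combines earlier and newer visits can be read off its cell update: the forget gate scales the old cell state and the input gate scales the contribution of the new input. The single-unit sketch below follows the standard LSTM gate equations; the parameter values and the scalar "visit score" input are hypothetical, since the abstract does not describe the network's size or its clinical features.

```python
import math
from typing import Dict, Tuple


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def lstm_step(x: float, h_prev: float, c_prev: float, p: Dict[str, float]) -> Tuple[float, float]:
    """One step of a single-unit LSTM; p holds weights w_*, u_* and biases b_* per gate."""
    f = sigmoid(p["w_f"] * x + p["u_f"] * h_prev + p["b_f"])  # forget gate: how much old state to keep
    i = sigmoid(p["w_i"] * x + p["u_i"] * h_prev + p["b_i"])  # input gate: how much new information to add
    o = sigmoid(p["w_o"] * x + p["u_o"] * h_prev + p["b_o"])  # output gate
    g = math.tanh(p["w_g"] * x + p["u_g"] * h_prev + p["b_g"])  # candidate update from the new input
    c = f * c_prev + i * g  # blend of old cell state and new candidate
    h = o * math.tanh(c)
    return h, c


# Hypothetical parameters and a hypothetical per-visit score sequence.
params = {f"{k}_{gate}": 0.5 for k in ("w", "u", "b") for gate in "fiog"}
h, c = 0.0, 0.0
for visit_score in [0.2, 0.4, 0.9]:
    h, c = lstm_step(visit_score, h, c, params)
print(h, c)
```

Because the state (h, c) is available after every visit, a prediction head on h can be queried at any point in the sequence, which matches the "anytime" use described above.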
ES2018-190
Person Identification and Discovery With Wrist Worn Accelerometer Data
Ryan McConville, Raul Santos-Rodriguez, Niall Twomey
Abstract:
Internet of Things devices with embedded accelerometers continue to grow in popularity. These are often attached to individuals, whether as a mobile phone in a pocket or a wrist-worn smartwatch, and capture data of a personal nature. In this work we propose a method for person identification from accelerometer data using supervised machine learning techniques. Further, we introduce the first unsupervised method for discovering individuals from the same accelerometer data. We report high performance in terms of both classification and clustering on a publicly available dataset covering a large number of activities of daily living. While this has numerous benefits in tasks such as activity recognition, this work also motivates debate around the privacy concerns raised by the analysis of accelerometer data.
ES2018-203
CDTW-based classification for Parkinson's Disease diagnosis
Nicolas Khoury, Ferhat Attal, Yacine Amirat, Abdelghani Chibani, Samer Mohammed
Abstract:
This paper presents a new classification approach for Parkinson's Disease (PD) diagnosis using the Continuous Dynamic Time Warping (CDTW) technique and gait cycle data. These data are vertical Ground Reaction Force (vGRF) recordings collected from eight force sensors placed in each shoe sole worn by each subject. The proposed approach exploits the repetition of gait cycle patterns to discriminate healthy subjects from PD subjects. The repetition of gait cycles is evaluated by applying the CDTW technique to measure the similarity between the time series corresponding to stance phases. The CDTW distances extracted from gait cycles are used as inputs to a binary classifier that discriminates healthy subjects from PD subjects. Several classification methods are evaluated, including four supervised methods, namely K-Nearest Neighbours (K-NN), Decision Tree (DT), Random Forest (RF) and Support Vector Machines (SVM), and two unsupervised ones, Gaussian Mixture Model (GMM) and K-means. The proposed approach compares favorably with a classification based on standard features.
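CDTW is a continuous refinement of dynamic time warping. As a simpler stand-in, the sketch below computes classical discrete DTW between two sampled stance-phase curves; it is not the CDTW algorithm used in the paper, and the toy force profiles are invented for illustration.

```python
import math
from typing import Sequence


def dtw_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Classical (discrete) DTW with absolute-difference local cost."""
    n, m = len(a), len(b)
    d = [[math.inf] * (m + 1) for _ in range(n + 1)]
    d[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            # extend the cheapest of the three admissible predecessor cells
            d[i][j] = cost + min(d[i - 1][j], d[i][j - 1], d[i - 1][j - 1])
    return d[n][m]


# Toy force profiles (made-up values): the second is a slower copy of the first.
stance_1 = [0.0, 0.6, 1.0, 0.6, 0.0]
stance_2 = [0.0, 0.6, 0.6, 1.0, 0.6, 0.0]
print(dtw_distance(stance_1, stance_2))  # 0.0: the warping absorbs the timing difference
```

In the setting described above, distances like this, computed between a subject's successive stance phases, would form the features passed to the binary classifier: regular gait yields small distances, irregular gait larger ones.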
ES2018-48
Personalizing human activity recognition models using incremental learning
Pekka Siirtola, Heli Koskimäki, Juha Röning
Abstract:
In this study, the aim is to personalize inertial sensor data-based human activity recognition models using incremental learning. Initially, recognition is based on a user-independent model. However, when personal streaming data become available, the incremental learning-based recognition model can be updated, and therefore personalized, based on these data without interrupting the user. The incremental learning algorithm used is Learn++, an ensemble method that can use any classifier as its base classifier. The study compares three different base classifiers: linear discriminant analysis (LDA), quadratic discriminant analysis (QDA) and classification and regression tree (CART). Experiments are based on a publicly available data set and show that even a small personal training data set can improve the classification accuracy. The improvement is 4.6 percentage points with LDA as the base classifier, 2.0 percentage points with QDA and 2.3 percentage points with CART. However, if the user-independent model used in the first phase of the recognition process is not accurate enough, personalization cannot improve recognition accuracy.
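Learn++ combines its members by weighted majority voting: each hypothesis with normalized error beta = error / (1 - error) votes with weight log(1 / beta). The sketch below shows only that final voting step, with stub classifiers and made-up error rates standing in for trained LDA, QDA or CART models; the Learn++ training loop (resampling instances from a distribution and computing each member's error) is omitted.

```python
import math
from collections import defaultdict
from typing import Callable, Dict, Hashable, List, Tuple

Hypothesis = Callable[[object], Hashable]


def voting_weight(error: float) -> float:
    """Learn++ weight log(1 / beta) with beta = error / (1 - error); needs 0 < error < 0.5."""
    if not 0.0 < error < 0.5:
        raise ValueError("Learn++ discards hypotheses with error >= 0.5")
    return math.log((1.0 - error) / error)


def weighted_majority_vote(ensemble: List[Tuple[Hypothesis, float]], x: object) -> Hashable:
    """Return the class with the largest total voting weight."""
    scores: Dict[Hashable, float] = defaultdict(float)
    for hypothesis, error in ensemble:
        scores[hypothesis(x)] += voting_weight(error)
    return max(scores, key=scores.get)


# Stub base classifiers with hypothetical error rates.
ensemble = [
    (lambda x: "walking", 0.10),  # e.g. a member added after personal data arrived
    (lambda x: "running", 0.40),
    (lambda x: "running", 0.40),
]
print(weighted_majority_vote(ensemble, x=None))  # walking: log(9) > 2 * log(1.5)
```

This weighting is what lets a few accurate members trained on personal data override several weaker user-independent ones; it also suggests why personalization cannot help much when the initial members are poor.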
ES2018-107
Short-term Memory of Deep RNN
Claudio Gallicchio
Abstract:
The extension of deep learning towards temporal data processing is attracting increasing research interest. In this paper we investigate the short-term memory properties of the state dynamics developed in successive levels of deep recurrent neural networks (RNNs). Our results shed light on the role of layering as a factor in RNN design. Notably, higher layers in a hierarchically organized RNN architecture turn out to be inherently biased towards longer memory spans, even prior to training of the recurrent connections. Moreover, in the context of the Reservoir Computing framework, our analysis also points out the benefit of a layered recurrent organization as an efficient way to improve the memory skills of reservoir models.
ES2018-104
Effect of context in swipe gesture-based continuous authentication on smartphones
Pekka Siirtola, Jukka Komulainen, Vili Kellokumpu
Abstract:
This work investigates how context should be taken into account in continuous authentication of a smartphone user based on touchscreen and accelerometer readings from swipe gestures. The study is based on a publicly available data set of 100 subjects performing predefined reading and navigation tasks while sitting or walking. It is shown that context-specific models are needed for different smartphone usage and human activity scenarios to minimize authentication error. The experimental results also suggest that using phone movement improves swipe gesture-based verification performance only when the user is moving.
Impact of Biases in Big Data
ES2018-7
Impact of Biases in Big Data
Patrick Glauner, Petko Valtchev, Radu State
Abstract:
The underlying paradigm of big data-driven machine learning reflects the desire to derive better conclusions simply by analyzing more data, without the need to consider theory and models. Is having more data always helpful? In 1936, The Literary Digest collected 2.3M filled-in questionnaires to predict the outcome of that year's US presidential election. The outcome of this big data prediction proved to be entirely wrong, whereas George Gallup needed only 3K handpicked people to make an accurate prediction. Generally, biases occur in machine learning whenever the distributions of the training set and the test set differ. In this work, we review different sorts of biases in (big) data sets in machine learning. We provide definitions and discussions of the most common biases in machine learning: class imbalance and covariate shift. We also show how these biases can be quantified and corrected. This work is an introductory text to help both researchers and practitioners become more aware of this topic and thus derive more reliable models for their learning problems.
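Two standard corrections for the biases named above can be sketched in a few lines: inverse-frequency class weights for class imbalance, and density-ratio weights p_test(x) / p_train(x) for covariate shift, here estimated by simple counting on a single discrete feature. These are textbook remedies chosen for illustration; the review may present different estimators.

```python
from collections import Counter
from typing import Dict, Hashable, Sequence


def balanced_class_weights(labels: Sequence[Hashable]) -> Dict[Hashable, float]:
    """w_c = n / (k * n_c), so every class carries the same total weight in the loss."""
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {c: n / (k * n_c) for c, n_c in counts.items()}


def importance_weights(train: Sequence[Hashable], test: Sequence[Hashable]) -> Dict[Hashable, float]:
    """Covariate-shift weights p_test(x) / p_train(x) from value frequencies of a discrete feature.

    Values that never occur in the training set get no weight (the ratio is undefined).
    """
    p_train, p_test = Counter(train), Counter(test)
    return {v: (p_test[v] / len(test)) / (p_train[v] / len(train)) for v in p_train}


print(balanced_class_weights([0] * 90 + [1] * 10))
print(importance_weights(["a", "a", "a", "b"], ["a", "b", "b", "b"]))
```

With 90 negatives and 10 positives, each positive example is weighted 5.0 and each negative about 0.56, so both classes contribute 50 to a weighted loss; the importance weights up-weight training values that are over-represented at test time.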
ES2018-58
Analysis of imputation bias for feature selection with missing data
Borja Seijo-Pardo, Amparo Alonso-Betanzos, Kristin Bennett, Veronica Bolon-Canedo, Isabelle Guyon, Julie Josse, Mehreen Saeed
Abstract:
We study the risk/benefit tradeoff of missing value imputation in the context of feature selection. We caution against using imputation methods that may yield false positives: features not associated with the target becoming dependent on it as a result of imputation. We also investigate situations in which imputing missing values may be beneficial to reduce false negatives. We use causal graphs to characterize when structural bias arises and introduce a de-biased version of the t-test.
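A plain t-test is the starting point that the paper de-biases. The sketch below computes Welch's two-sample t statistic for one feature split by class and shows one simple way imputation can inflate it: mean imputation adds values exactly at the group mean, which shrinks the variance and increases n. This variance-shrinkage effect is only one mechanism, chosen for illustration; it is neither the causal-graph analysis nor the de-biased test proposed in the paper, and the feature values are made up.

```python
import math
import statistics
from typing import List, Optional, Sequence


def welch_t(a: Sequence[float], b: Sequence[float]) -> float:
    """Two-sample t statistic without assuming equal variances (Welch)."""
    se = math.sqrt(statistics.variance(a) / len(a) + statistics.variance(b) / len(b))
    return (statistics.mean(a) - statistics.mean(b)) / se


def complete_cases(values: Sequence[Optional[float]]) -> List[float]:
    """Drop missing entries (None)."""
    return [v for v in values if v is not None]


def mean_impute(values: Sequence[Optional[float]]) -> List[float]:
    """Replace missing entries with the mean of the observed ones."""
    m = statistics.mean(complete_cases(values))
    return [m if v is None else v for v in values]


feature_pos = [1.0, None, 3.0, None]  # feature values in one class, two missing
feature_neg = [4.0, 5.0, 6.0]
t_cc = welch_t(complete_cases(feature_pos), feature_neg)
t_mi = welch_t(mean_impute(feature_pos), feature_neg)
print(t_cc, t_mi)  # |t| grows after mean imputation
```

Here |t| rises from about 2.6 to about 4.2 although no new information was added, which is the kind of overstated evidence that can turn into a false positive in feature selection.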
ES2018-99
Systematics aware learning: a case study in high energy physics
Victor Estrade, Cecile Germain, Isabelle Guyon, David Rousseau
Abstract:
Experimental science often has to cope with systematic errors that coherently bias data. We analyze this issue for data produced by experiments at the Large Hadron Collider at CERN, treating it as a case of supervised domain adaptation. Systematics-aware learning should create an efficient representation that is insensitive to perturbations induced by the systematic effects. We present an experimental comparison of the adversarial knowledge-free approach and a less data-intensive alternative.
Optimization and metaheuristics
ES2018-149
Evolutionary RL for Container Loading
Sarmimala Saikia, Richa Verma, Puneet Agarwal, Gautam Shroff, Lovekesh Vig, Ashwin Srinivasan
Abstract:
Loading containers from the yard onto a ship is an important part of port operations. Finding the optimal loading sequence is known to be computationally hard; it is an instance of combinatorial optimization, which leads to the use of simple heuristics in practice. In this paper, we propose an approach that combines Evolutionary Strategies and Reinforcement Learning (RL) to find an approximation of the optimal solution. The RL-based agent uses a policy gradient method, an evolutionary reward strategy and a pool of good (but not optimal) solutions to find the approximation. We find that the RL agent learns near-optimal solutions that outperform the heuristic solutions. We also observe that the RL agent assisted by a pool generalizes better to unseen problems than an RL agent without a pool. We present our results on synthetic data as well as real-world data taken from a container terminal. The results show that our approach performs better than the available heuristic solutions and adapts better to unseen problems.
ES2018-175
Enhancement of a stochastic Markov-blanket framework with ant colony optimization, to uncover epistasis in genetic association studies
Christine Sinoquet, Clément Niel
Abstract:
In association genetics, many studies rely on univariate statistical tests to reveal genotype-phenotype relationships, and are thus prone to missing epistasis (interactions between genes). We designed SMMB (Multiple Stochastic Markov Blankets), and SMMB-ACO, a variant combined with ant colony optimization, to detect epistasis. We compare our proposals with three other methods. SMMB-ACO outperforms the other methods on 50% of the simulated datasets. On real datasets, the detection ability of SMMB-ACO is close to that of the best approach, which is slow, and SMMB-ACO is the second-fastest algorithm, behind a much less effective method.
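The ant colony component can be pictured with the generic Ant System ingredients: variables are sampled in proportion to their pheromone level, all trails evaporate at rate rho, and variables that took part in a good solution receive a deposit. The sketch below is a generic illustration with hypothetical names (`snp1` and so on) and parameters, not SMMB-ACO's actual sampling or update rules.

```python
import random
from typing import Dict, List, Set


def sample_variables(pheromone: Dict[str, float], k: int, rng: random.Random) -> List[str]:
    """Draw k distinct variables, each with probability proportional to its pheromone."""
    pool = dict(pheromone)
    chosen = []
    for _ in range(k):
        names = list(pool)
        pick = rng.choices(names, weights=[pool[v] for v in names], k=1)[0]
        chosen.append(pick)
        del pool[pick]  # sample without replacement
    return chosen


def update_pheromone(pheromone: Dict[str, float], rewarded: Set[str],
                     rho: float = 0.1, deposit: float = 1.0) -> Dict[str, float]:
    """Evaporate every trail by (1 - rho), then deposit on variables from a good solution."""
    return {v: (1.0 - rho) * tau + (deposit if v in rewarded else 0.0) for v, tau in pheromone.items()}


tau = {"snp1": 1.0, "snp2": 1.0, "snp3": 1.0}
rng = random.Random(0)
subset = sample_variables(tau, k=2, rng=rng)
tau = update_pheromone(tau, rewarded={"snp1"})
print(subset, tau)
```

Over iterations, variables that repeatedly appear in high-scoring subsets accumulate pheromone and are sampled more often, which biases the stochastic search towards promising interacting variables instead of sampling uniformly.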
ES2018-35
Meerkats-inspired Algorithm for Global Optimization Problems
Carlos Eduardo Klein, Leandro dos Santos Coelho
Abstract:
Bio-inspired computing has been a relevant topic in scientific, computing and engineering fields in recent years. Most bio-inspired metaheuristics model a specific phenomenon or mechanism on which they base their approach to optimization problems. This paper introduces the meerkats-inspired algorithm (MEA), a novel population-based swarm intelligence algorithm for global optimization in the continuous domain. The performance of MEA is demonstrated on six classical constrained engineering problems from the literature. Numerical results and comparisons with other state-of-the-art stochastic algorithms are also provided. Analysis of the results reveals that MEA produces consistent results compared with other optimizers.
ES2018-18
Cheetah Based Optimization Algorithm: A Novel Swarm Intelligence Paradigm
Carlos Eduardo Klein, Viviana Cocco Mariani, Leandro dos Santos Coelho
Abstract:
New gadgets, systems and technological advances are confronting today's engineers with problems of increasing complexity. To solve these problems, optimization algorithms keep emerging to support and even improve current practice. Several stochastic optimization paradigms, called metaheuristics, are proposed each year, drawing inspiration from animals, plants, experiments, chemical processes or simply mathematics. In this paper, a cheetah based optimization algorithm (CBA) is proposed, capturing the social behavior of these animals. The proposed CBA is validated against seven known optimizers on three different benchmark problems. Finally, some considerations on research issues and directions in CBA design are given.
ES2018-77
Evolutionary Composition of Customized Fault Localization Heuristics
Diogo de-Freitas, Plinio Leitao-Junior, Celso Camilo-Junior, Rachel Harrison
Abstract:
Fault localisation is one of the most difficult and costly parts of software debugging. Researchers have tried to automate this process by formulating measures that assess the suspiciousness of code elements. This paper reports an evolutionary approach that nonlinearly combines 34 existing measures into a new program-oriented fault localisation heuristic. The method was evaluated on 107 single-bug programs and compared against 35 approaches: 34 spectrum-based heuristics and a previous evolutionary linear combination approach. The experiments show that the proposal consistently achieved results competitive with the others according to several effectiveness metrics.
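Spectrum-based measures score each code element from four counts over the test suite: ef and ep (failing and passing tests that execute the element) and nf and np (failing and passing tests that do not). The sketch below computes two well-known measures, Ochiai and Tarantula, and ranks elements with a hypothetical nonlinear combination (`combined`, the geometric mean of the two) standing in for a formula an evolutionary search might produce. It is not the composition reported in the paper, and the coverage data are a toy example.

```python
import math
from typing import Callable, Dict, List, Set, Tuple

Spectrum = Tuple[int, int, int, int]


def spectrum(coverage: Dict[str, Set[str]], failed: Set[str], element: str) -> Spectrum:
    """Counts (ef, ep, nf, np) for one code element from per-test coverage."""
    ef = sum(1 for t, cov in coverage.items() if element in cov and t in failed)
    ep = sum(1 for t, cov in coverage.items() if element in cov and t not in failed)
    nf = len(failed) - ef
    np_ = (len(coverage) - len(failed)) - ep
    return ef, ep, nf, np_


def ochiai(ef: int, ep: int, nf: int, np_: int) -> float:
    denom = math.sqrt((ef + nf) * (ef + ep))
    return ef / denom if denom else 0.0


def tarantula(ef: int, ep: int, nf: int, np_: int) -> float:
    fail_ratio = ef / (ef + nf) if ef + nf else 0.0
    pass_ratio = ep / (ep + np_) if ep + np_ else 0.0
    total = fail_ratio + pass_ratio
    return fail_ratio / total if total else 0.0


def combined(ef: int, ep: int, nf: int, np_: int) -> float:
    """Hypothetical nonlinear combination, standing in for an evolved formula."""
    return math.sqrt(ochiai(ef, ep, nf, np_) * tarantula(ef, ep, nf, np_))


def rank(coverage: Dict[str, Set[str]], failed: Set[str],
         score: Callable[[int, int, int, int], float]) -> List[str]:
    """Order code elements from most to least suspicious."""
    elements = sorted(set().union(*coverage.values()))
    return sorted(elements, key=lambda e: score(*spectrum(coverage, failed, e)), reverse=True)


coverage = {"t1": {"s1", "s2"}, "t2": {"s2", "s3"}, "t3": {"s3"}}
print(rank(coverage, failed={"t1"}, score=combined))  # ['s1', 's2', 's3']
```

Because every measure is a function of the same four counts, an evolutionary method can treat the existing measures as building blocks and search over expressions that combine them.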
ES2018-90
Order Crossover for the Inventory Routing Problem
Mohamed Salim Amri Sakhri, Mounira Tlili, Hamid Allaoui, Ouajdi Korbaa
Abstract:
In this paper, we aim to find a solution that reduces logistical activity costs by using new hybrid metaheuristics. We develop a genetic algorithm (GA) with a hybrid crossover operator. The operator considered is the Order Crossover (OX), and we test our hybrid algorithm on a Periodic Inventory Routing Problem (PIRP). Our study shows the performance of the hybrid OX operator compared with the classic GA, and demonstrates the competitiveness of this approach for solving large-scale instances with better solution quality.
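Order Crossover copies a slice of one parent into the child and fills the remaining positions with the other parent's genes in the order they appear, starting after the second cut point and skipping genes already copied, so the child is always a valid permutation. A minimal sketch of classic OX follows, assuming chromosomes are permutations (for instance, a customer visiting order); the paper's hybrid variant and its PIRP encoding are not reproduced.

```python
import random
from typing import List, Optional, Sequence, Tuple


def order_crossover(p1: Sequence[int], p2: Sequence[int],
                    cut: Optional[Tuple[int, int]] = None,
                    rng: Optional[random.Random] = None) -> List[int]:
    """Classic OX on two permutations of the same genes."""
    n = len(p1)
    if cut is None:
        rng = rng or random.Random()
        i, j = sorted(rng.sample(range(n + 1), 2))
    else:
        i, j = cut
    child: List[Optional[int]] = [None] * n
    child[i:j] = p1[i:j]  # keep the slice of the first parent in place
    taken = set(p1[i:j])
    # Genes of p2 read cyclically from the second cut point, minus those already copied.
    fill = [p2[(j + k) % n] for k in range(n) if p2[(j + k) % n] not in taken]
    # Free positions, also visited cyclically from the second cut point.
    free = [(j + k) % n for k in range(n - (j - i))]
    for pos, gene in zip(free, fill):
        child[pos] = gene
    return child  # type: ignore[return-value]


parent_a = [1, 2, 3, 4, 5, 6, 7, 8, 9]
parent_b = [9, 3, 7, 8, 2, 6, 5, 1, 4]
print(order_crossover(parent_a, parent_b, cut=(3, 7)))  # [3, 8, 2, 4, 5, 6, 7, 1, 9]
```

OX preserves a contiguous block of one parent and the relative order of the other, which is why it suits routing-style encodings where adjacency and order matter more than absolute positions.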