Bruges, Belgium October 02 - 04
Content of the proceedings
Adversarial learning, robustness and fairness
Image and signal processing, matrix computations and topological data
Deep learning and graph neural networks
Machine Learning Applied to Computer Networks - organized by Alexander Gepperth (University of Applied Sciences Fulda, Germany), Sebastian Rieger (University of Applied Sciences Fulda, Germany)
Quantum Machine Learning - organized by José D. Martín-Guerrero (Universitat de València, Spain), Lucas Lamata (Universidad de Sevilla, Spain)
Recurrent networks and reinforcement learning
Unsupervised learning
Feature selection and dimensionality reduction
Statistical learning and optimization
Tensor Decompositions in Deep Learning - organized by Davide Bacciu (Università di Pisa, Italy), Danilo Mandic (Imperial College, United Kingdom)
Image and text analysis
Learning from partially labeled data - organized by Siamak Mehrkanoon (Maastricht University, The Netherlands), Xiaolin Huang (Shanghai Jiao Tong University, China), Johan Suykens (KU Leuven, Belgium)
Machine learning in the pharmaceutical industry - organized by Paul Smyth (GlaxoSmithKline Tech Data & Analytics, Belgium), Thibault Helleputte (DNAlytics, Belgium), Gael de Lannoy (GlaxoSmithKline, CMC Statistical Sciences, Belgium)
Frontiers in Reservoir Computing - organized by Claudio Gallicchio (University of Pisa, Italy), Mantas Lukosevicius (Kaunas University of Technology, Lithuania), Simone Scardapane (Sapienza University of Rome, Italy)
Language processing in the era of deep learning - organized by Ivano Lauriola (University of Padova, Italy), Alberto Lavelli (Fondazione Bruno Kessler, Italy), Fabio Aiolli (University of Padova, Italy)
Supervised learning
Adversarial learning, robustness and fairness
ES2020-175
Attacking Model Sets with Adversarial Examples
István Megyeri, István Hegedűs, Mark Jelasity
Abstract:
Adversarial input perturbation is a well-studied problem in machine learning. Here, we introduce a generalized variant of this problem, where we look for adversarial examples that satisfy multiple constraints simultaneously over a set of multi-class models. For example, we might want to force an entire set of models to make the same mistake over the same example, in order to create transferable attacks. Or we might want to fool just a single model, without fooling the rest of the models, in order to target only a specific manufacturer. Known attacks are not directly suitable for addressing this problem. The generated example has to satisfy multiple constraints and no feasible solution may exist for any amount of perturbation. We introduce an iterative heuristic algorithm inspired by the DeepFool attack. We evaluate our method over the MNIST and CIFAR-10 data sets. We show that it can find feasible multi-model adversarial perturbations, and that the magnitude of these perturbations is similar to the single model case.
ES2020-159
GraN: An Efficient Gradient-Norm Based Detector for Adversarial and Misclassified Examples
Julia Lust, Alexandru Paul Condurache
Abstract:
Deep neural networks (DNNs) are vulnerable to adversarial examples and other data perturbations. Especially in safety-critical applications of DNNs, it is therefore crucial to detect misclassified samples. The current state-of-the-art detection methods require either significantly more runtime or more parameters than the original network itself. This paper therefore proposes GraN, a time- and parameter-efficient method that is easily adaptable to any DNN. GraN is based on the layer-wise norm of the DNN's gradient of the loss for the current input-output combination, which can be computed via backpropagation. GraN achieves state-of-the-art performance on numerous problem set-ups.
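As an illustration of the quantity this abstract refers to, the following minimal sketch computes per-layer gradient norms of a cross-entropy loss for a tiny two-layer network in plain Python. The toy network, the use of the predicted class as the label, the choice of the L2 norm over weight gradients, and the function name `layerwise_grad_norms` are all assumptions made for this example and are not taken from the paper.

```python
import math


def softmax(z):
    """Numerically stable softmax over a list of logits."""
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]


def layerwise_grad_norms(x, W1, W2):
    """Return [||dL/dW1||, ||dL/dW2||] for a tanh MLP without biases.

    Assumption: the loss L is the cross-entropy with respect to the
    class the network itself predicts for x.
    """
    # Forward pass.
    h = [math.tanh(sum(w * xi for w, xi in zip(row, x))) for row in W1]
    logits = [sum(w * hi for w, hi in zip(row, h)) for row in W2]
    p = softmax(logits)
    y = max(range(len(p)), key=p.__getitem__)  # predicted class

    # Backward pass (manual backpropagation).
    d_logits = [pi - (1.0 if k == y else 0.0) for k, pi in enumerate(p)]
    grad_W2 = [[dl * hi for hi in h] for dl in d_logits]
    d_h = [sum(W2[k][j] * d_logits[k] for k in range(len(W2)))
           for j in range(len(h))]
    d_pre = [dh * (1.0 - hj * hj) for dh, hj in zip(d_h, h)]
    grad_W1 = [[dp * xi for xi in x] for dp in d_pre]

    def l2(G):
        return math.sqrt(sum(g * g for row in G for g in row))

    return [l2(grad_W1), l2(grad_W2)]


# Hypothetical toy weights and inputs.
W1 = [[1.0, 0.0], [0.0, 1.0]]
W2 = [[3.0, 0.0], [0.0, 3.0]]
print(layerwise_grad_norms([2.0, -2.0], W1, W2))  # confidently classified input
print(layerwise_grad_norms([0.1, 0.0], W1, W2))   # input near the decision boundary
```

In this toy setting the input near the decision boundary yields larger norms in both layers than the confidently classified one, which is the kind of signal a detector could use.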
ES2020-64
Unsupervised Latent Space Translation Network
Magda Friedjungová, Daniel Vašata, Tomáš Chobola, Marcel Jiřina
Abstract:
One task that is often discussed in computer vision is the mapping of an image from one domain to a corresponding image in another domain, known as image-to-image translation. Several approaches currently address this task. In this paper, we present an enhancement of the UNIT framework that helps remove its main drawbacks. More specifically, we introduce an additional adversarial discriminator on the latent representation, used instead of the VAE, which forces the latent space distributions of both domains to be similar. On MNIST and USPS domain adaptation tasks, this approach greatly outperforms competing approaches.
ES2020-55
Efficient computation of counterfactual explanations of LVQ models
André Artelt, Barbara Hammer
Abstract:
The increasing use of machine learning in practice and legal regulations like the EU's GDPR make it necessary to be able to explain the predictions and behavior of machine learning models. Counterfactual explanations are a prominent example of particularly intuitive explanations of AI models in the context of decision making. Yet, it is still an open research problem how to efficiently compute counterfactual explanations for many models. We investigate how to efficiently compute counterfactual explanations for an important class of models: prototype-based classifiers such as learning vector quantization (LVQ) models. In particular, we derive specific convex and non-convex programs depending on the metric used.
ES2020-109
MultiMBNN: Matched and Balanced Causal Inference with Neural Networks
Ankit Sharma, Garima Gupta, Ranjitha Prasad, Arnab Chatterjee, Lovekesh Vig, Gautam Shroff
Abstract:
Causal inference (CI) in observational data is extremely relevant in healthcare, education, ad attribution, etc. Confounding is a typical hazard, where the context affects both the treatment assignment and the response. In a multiple treatment scenario, we propose the neural-network-based MultiMBNN, where we overcome confounding by employing generalized propensity score based matching and learning balanced representations. We benchmark the performance on synthetic and real-world datasets using PEHE and mean absolute percentage error over ATE as metrics. MultiMBNN outperforms state-of-the-art algorithms for CI such as TARNet and Perfect Match (PM).
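For readers unfamiliar with the two metrics named in this abstract, the sketch below shows how they are commonly computed in the simpler binary-treatment case. The function names are hypothetical, the square-root form of PEHE is one common convention, and the exact multi-treatment definitions used in the paper may differ.

```python
import math


def pehe(true_effects, estimated_effects):
    """Root of the mean squared error between true and estimated
    individual treatment effects (one common PEHE convention)."""
    squared = [(e - t) ** 2 for t, e in zip(true_effects, estimated_effects)]
    return math.sqrt(sum(squared) / len(squared))


def ate_percentage_error(true_effects, estimated_effects):
    """Absolute percentage error of the estimated average treatment
    effect (ATE). Assumes the true ATE is non-zero."""
    true_ate = sum(true_effects) / len(true_effects)
    est_ate = sum(estimated_effects) / len(estimated_effects)
    return abs(est_ate - true_ate) / abs(true_ate) * 100.0


# Hypothetical per-individual effects; true effects are only known
# for synthetic or semi-synthetic benchmarks.
true_ite = [1.0, 2.0, 3.0]
estimated_ite = [2.0, 3.0, 4.0]
print(pehe(true_ite, estimated_ite))
print(ate_percentage_error(true_ite, estimated_ite))
```

PEHE penalizes errors on each individual, while the ATE error only compares population averages, so a model can score well on the latter while still being poor on the former.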
ES2020-75
Learning Deep Fair Graph Neural Networks
Luca Oneto, Nicolò Navarin, Michele Donini
Abstract:
Developing learning methods which do not discriminate against subgroups in the population is the central goal of algorithmic fairness. One way to reach this goal is to learn a data representation that is expressive enough to describe the data and fair enough to remove the possibility of discriminating against subgroups when a model is learned on top of the learned representation. This problem is even more challenging when our data are graphs, which nowadays are ubiquitous and allow modeling entities and the relationships between them. In this work we measure fairness according to demographic parity, requiring the probability of the possible model decisions to be independent of the sensitive information. We investigate how to impose this constraint in the different layers of a deep graph neural network through the use of two different regularizers. The first one is based on a simple convex relaxation, and the second one is inspired by a Wasserstein distance formulation of demographic parity. We present experiments on a real-world dataset, showing the effectiveness of our proposal.
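Demographic parity, as described in this abstract, can be checked empirically by comparing decision rates across sensitive groups. The following is a minimal sketch for binary decisions. The function name `demographic_parity_gap` is hypothetical, and the code only measures the violation; it does not implement either of the paper's regularizers.

```python
def demographic_parity_gap(decisions, sensitive):
    """Largest difference in positive-decision rate between groups.

    decisions: list of 0/1 model decisions.
    sensitive: list of group labels, one per decision.
    A gap of 0 means the decision rate is the same in every group.
    """
    rates = []
    for group in set(sensitive):
        members = [d for d, s in zip(decisions, sensitive) if s == group]
        rates.append(sum(members) / len(members))
    return max(rates) - min(rates)


# Hypothetical decisions for two groups, "a" and "b".
print(demographic_parity_gap([1, 0, 1, 0], ["a", "a", "b", "b"]))
print(demographic_parity_gap([1, 1, 0, 0], ["a", "a", "b", "b"]))
```

The first call has equal positive rates in both groups, while the second gives every positive decision to one group and so has the largest possible gap.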
ES2020-97
Interpretation of Model Agnostic Classifiers via Local Mental Images
Aluizio Lima Filho, Gabriel Guarisa, Leopoldo Lusquino, Luiz Oliveira, Carlos Cosenza, Felipe França, Priscila Lima
Abstract:
Although successful black-box learning models have been created, understanding what happens when a machine produces a classification response is still a challenge. This work introduces FRWI – Fuzzy Regression WiSARD Interpreter, a novel fuzzy rules-based algorithm that is capable of interpreting the responses of black-box classifiers via the production of local mental images from a WiSARD n-tuple classifier. FRWI is compared with LIME – Local Interpretable Model-Agnostic Explanations, a pioneering agnostic classification interpreter model. To make a quantitative evaluation of interpretable models, a new metric – InterpretationCapacity Score – is proposed. Using this metric, it is shown that FRWI surpasses LIME in producing coherent interpretations.
ES2020-110
Estimating Individual Treatment Effects through Causal Populations Identification
Celine Beji, Eric Benhamou, Michael Bon, Florian Yger, Jamal Atif
Abstract:
Estimating the Individual Treatment Effect from observational data, defined as the difference between outcomes with and without intervention, while observing only one of the two, is one of the challenging problems in causal learning. In this paper, we formulate this problem as an inference from hidden variables and enforce causal constraints based on a model of four exclusive causal populations. We propose a new version of the EM algorithm, which we call the Expected-Causality-Maximization (ECM) algorithm, and provide hints on its convergence under mild conditions. We assess our algorithm on synthetic and real-world data and discuss its performance with respect to baseline methods.
ES2020-128
Towards Adversarial Attack Resistant Deep Neural Networks
Tiago Alves, Sandip Kundu
Abstract:
Recent publications have shown that neural network based classifiers are vulnerable to adversarial inputs that are virtually indistinguishable from normal data, constructed explicitly for the purpose of forcing misclassification. Further, it has been demonstrated that private data used for training a neural network model can also be exposed by guided model query, even when the adversary lacks access to internal model data. In this paper, we present several defenses to counter these threats. First, we observe that most adversarial attacks succeed by mounting gradient ascent on the confidence returned by the model, which allows the adversary to gain an understanding of the classification boundary. Our defenses are based on denying access to the precise classification boundary. Our first defense adds controlled random noise to the output confidence levels, which prevents an adversary from converging in their numerical approximation attack. In a simple solution, such random noise can be zeroed by averaging results from multiple queries with the same input. Our noise injection mechanism addresses this problem. Our next defense is based on the observation that by varying the order of the training, we often arrive at models which offer the same classification accuracy, yet are numerically different. An ensemble of such models allows us to randomly switch between these equivalent models during query, which further blurs the classification boundary. We demonstrate our defense via an adversarial input generator which defeats previously published defenses but cannot breach the proposed defenses due to their non-static nature.
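The weakness of the "simple solution" mentioned in this abstract can be illustrated with a few lines of Python. This sketch only shows why independent zero-mean noise is averaged away by repeated queries. The function names, the Gaussian noise model, and all parameter values are assumptions for the example, and the sketch does not reproduce the paper's own noise injection mechanism, which the abstract does not detail.

```python
import random


def noisy_confidence(true_confidence, rng, scale=0.1):
    """Return the confidence with independent zero-mean Gaussian noise.

    Simplification: the result is not clipped to the range [0, 1].
    """
    return true_confidence + rng.gauss(0.0, scale)


def averaged_estimate(true_confidence, n_queries, rng, scale=0.1):
    """Attacker's estimate: average many queries on the same input."""
    total = sum(noisy_confidence(true_confidence, rng, scale)
                for _ in range(n_queries))
    return total / n_queries


rng = random.Random(0)  # fixed seed so the run is repeatable
single = noisy_confidence(0.8, rng)
averaged = averaged_estimate(0.8, 10_000, rng)
print(abs(single - 0.8), abs(averaged - 0.8))
```

Because the standard error of the average shrinks with the square root of the number of queries, an attacker can recover the underlying confidence to any desired precision, which is why a naive noise defense is not sufficient on its own.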
ES2020-57
Fast and Stable Interval Bounds Propagation for Training Verifiably Robust Models
Pawel Morawiecki, Przemysław Spurek, Marek Śmieja, Jacek Tabor
Abstract:
We present an efficient technique to train classification networks which are verifiably robust against norm-bounded adversarial attacks. This framework is built upon interval bounds propagation (IBP), which applies interval arithmetic to bound the activations at each layer and keeps the prediction invariant to the input perturbation. To speed up and stabilize training of IBP, we supply its cost function with an additional term, which encourages the model to keep the interval bounds at hidden layers small. Experimental results demonstrate that the training of our model is faster, more stable and less sensitive to the exact specification of the training process than the original IBP.
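The interval arithmetic behind IBP can be sketched for a single affine layer followed by a ReLU. This is a minimal plain-Python illustration. The function names are hypothetical, and `interval_width_penalty` is only one possible form of a term that rewards small hidden-layer intervals, not the exact term proposed in the paper.

```python
def affine_bounds(lower, upper, W, b):
    """Propagate elementwise bounds [lower, upper] through y = W x + b."""
    out_lower, out_upper = [], []
    for row, bias in zip(W, b):
        lo = hi = bias
        for w, l, u in zip(row, lower, upper):
            if w >= 0:
                lo += w * l
                hi += w * u
            else:
                # A negative weight swaps the roles of the two bounds.
                lo += w * u
                hi += w * l
        out_lower.append(lo)
        out_upper.append(hi)
    return out_lower, out_upper


def relu_bounds(lower, upper):
    """ReLU is monotone, so it can be applied to each bound directly."""
    return [max(0.0, l) for l in lower], [max(0.0, u) for u in upper]


def interval_width_penalty(lower, upper):
    """Hypothetical regularizer: total width of the intervals."""
    return sum(u - l for l, u in zip(lower, upper))


# Input x = [1.0, -1.0] with an L-infinity perturbation of 0.5.
x, eps = [1.0, -1.0], 0.5
lower = [v - eps for v in x]
upper = [v + eps for v in x]
W, b = [[1.0, -1.0], [2.0, 0.0]], [0.0, -4.0]

pre_l, pre_u = affine_bounds(lower, upper, W, b)
post_l, post_u = relu_bounds(pre_l, pre_u)
print(pre_l, pre_u, post_l, post_u)
print(interval_width_penalty(pre_l, pre_u))
```

Repeating these two steps layer by layer gives bounds on the logits, and a prediction is verified when the lower bound of the true class exceeds the upper bounds of all other classes.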
ES2020-114
Adversarial domain adaptation without gradient reversal layer
Aymen Cherif, Hugo Serieys
Abstract:
Adversarial domain adaptation is one of the most efficient ways to deal with the domain shift phenomenon. We propose an improvement to the popular gradient reversal layer (GRL) method introduced in [dann], an unsupervised domain adaptation technique (i.e. no labels in the target domain) that is easy to implement. We call our method NoGRL, and it is inspired by generative adversarial networks [gan]. Our main idea is to dissociate prediction optimization and domain adaptation optimization. Our method outperforms results obtained by GRL on small image benchmarks.
Image and signal processing, matrix computations and topological data
ES2020-147
ASAP - A Sub-sampling Approach for Preserving Topological Structures
Abolfazl Taghribi, Kerstin Bunte, Michele Mastropietro, Sven De Rijcke, Peter Tino
Abstract:
Topological data analysis tools enjoy increasing popularity in a wide range of applications. However, due to computational complexity, processing large samples of higher dimensionality quickly becomes infeasible. We propose a novel sub-sampling strategy inspired by Coulomb’s law to decrease the number of data points in d-dimensional point clouds while preserving their homology. The method is not only capable of reducing the memory and computation time needed for the construction of different types of simplicial complexes but also preserves the size of the voids in d dimensions, which is crucial, e.g., for astronomical applications. We demonstrate and compare the strategy in several synthetic scenarios and an astronomical particle simulation of a Jellyfish galaxy for the detection of superbubbles (supernova signatures).
ES2020-116
Image completion via nonnegative matrix factorization using B-splines
Cécile Hautecoeur, François Glineur
Abstract:
When performing image completion, it is common to assume that images are smooth and low-rank when viewed as matrices of pixel intensities. In this work, we use nonnegative matrix factorization (NMF) to successively refine the image by alternately representing rows and columns as smooth signals using splines. Previous work solved this model using an alternating direction method of multipliers. Instead, we propose to use a version of the hierarchical alternating least squares algorithm adapted to handle splines, and show in numerical experiments that it outperforms the existing method. Performance can be further improved by progressively increasing the size of the splines used. We also introduce a non-iterative algorithm using the same NMF approach, where the factorization is computed in a fast and accurate way but for which convergence is harder to achieve.
ES2020-24
Motion Segmentation using Frequency Domain Transformer Networks
Hafez Farazi, Sven Behnke
Abstract:
Self-supervised prediction is a powerful mechanism to learn representations that capture the underlying structure of the data. Despite recent progress, the self-supervised video prediction task is still challenging. One of the critical factors that make the task hard is motion segmentation, which is segmenting individual objects and the background and estimating their motion separately. In video prediction, the shape and transformation of each object should be understood only by predicting the next frame in pixel space. To address this issue, we propose a novel end-to-end learnable architecture that predicts the next frame by modeling foreground and background separately while simultaneously estimating and predicting the foreground motion using Frequency Domain Transformer Networks. Experimental evaluations show that this yields interpretable representations and that our approach can outperform some widely used video prediction methods like Video Ladder Network (VLN) and Predictive Gated Pyramids (PGP) on synthetic datasets.
ES2020-149
Predicting low gamma- from lower frequency band activity in electrocorticography
Marc Van Hulle, Bob Van Dyck, Benjamin Wittevrongel, Flavio Camarrone, Ine Dauwe, Evelien Carrette, Alfred Meurs, Paul Boon, Dirk Van Roost
Abstract:
Electrocorticography (ECoG) has witnessed increasing interest from brain modelers for spanning a broader spectral band than EEG. As human brain activity exhibits broadband modulations, we hypothesize that this should also be reflected by ECoG signal predictability across frequency bands. As a concrete case, we consider the prediction of low gamma- (40-70 Hz) from lower frequency band non-task related activity using the recently developed Block Term Tensor Regression (BTTR) algorithm. As a result, we achieved prediction accuracies up to 89% (Pearson correlation coefficient), providing evidence for a substantial degree of low gamma predictability.
ES2020-157
Lower bounds on the nonnegative rank using a nested polytopes formulation
Julien Dewez, François Glineur
Abstract:
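For reference, the quantity being bounded can be written out using the standard definition; the symbols $M$, $U$, $V$, $m$, $n$ and $r$ are notation chosen here, not taken from the paper. The trivial bounds on the second line are the baseline that any new lower bound aims to improve on.

```latex
\operatorname{rank}_+(M) \;=\; \min\bigl\{\, r \in \mathbb{N} \;:\; M = UV,\;
    U \in \mathbb{R}_+^{m \times r},\; V \in \mathbb{R}_+^{r \times n} \,\bigr\},
\qquad M \in \mathbb{R}_+^{m \times n},
\\[4pt]
\operatorname{rank}(M) \;\le\; \operatorname{rank}_+(M) \;\le\; \min(m, n).
```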
Deep learning and graph neural networks
ES2020-158
Resume: A Robust Framework for Professional Profile Learning & Evaluation
Clara Gainon de Forsan de Gabriac, Constance Scherer, Amina Djelloul, Vincent Guigue, Patrick Gallinari
Abstract:
Professional Profile Extraction is a crucial challenge for any HR department. In this paper, we propose an approach to learn and evaluate professional embeddings. We first highlight the technical issues associated with this specific data; then, we propose an architecture that compares different language models to encode the textual information; finally, we learn user profiles and propose three original evaluation tasks to illustrate the strengths and weaknesses of our approach.
ES2020-140
Invariant Integration in Deep Convolutional Feature Space
Matthias Rath, Alexandru Paul Condurache
Abstract:
In this contribution, we show how to incorporate prior knowledge into a deep neural network architecture in a principled manner. We enforce feature space invariances using a novel layer based on invariant integration. This allows us to construct a complete feature space invariant to finite transformation groups. We apply our proposed layer to explicitly insert invariance properties for vision-related classification tasks, demonstrate our approach for the case of rotation invariance, and report state-of-the-art performance on the Rotated-MNIST dataset. Our method is especially beneficial when training with limited data.
ES2020-92
On Learning a Control System without Continuous Feedback
Georgi Angelov, Bogdan Georgiev
Abstract:
We discuss a class of control problems by means of deep neural networks (DNNs). Our goal is to develop DNN models that, once trained, are able to produce solutions of such problems with an acceptable error rate and in much less computation time than an ordinary numerical solver. In the present note we study two such models for the Brockett integrator control problem.
ES2020-160
Time Series Prediction using Disentangled Latent Factors
Perrine Cribier-Delande, Raphaël Puget, Vincent Guigue, Ludovic Denoyer
Abstract:
We propose a new neural architecture to predict time series that are organised according to underlying factors of variation. Our method typically applies to spatio-temporal prediction of missing series where only certain locations and times are observed. The model is based on an encoder-decoder structure in which the multiple factors are projected into a latent space that is learned by combining the latent factors coming from multiple observed series. We show on several spatio-temporal datasets that our method is able to predict missing series, not only for observed factor values but also for new ones.
ES2020-41
Biochemical Pathway Robustness Prediction with Graph Neural Networks
Marco Podda, Alessio Micheli, Davide Bacciu, Paolo Milazzo
Abstract:
The robustness property of a biochemical pathway refers to maintaining stable levels of molecular concentration against the perturbation of parameters governing the underlying chemical reactions. Its computation requires an expensive integration in parameter space. We present a novel application of Graph Neural Networks (GNNs) to predict robustness indicators on pathways represented as Petri nets, without the need to perform costly simulations. Our assumption is that pathway structure alone is sufficient to be effective in this task. We show experimentally for the first time that this is indeed possible to a good extent, and investigate how different architectural choices influence performance.
ES2020-76
Graph Neural Networks for the Prediction of Protein-Protein Interfaces
Niccolò Pancino, Alberto Rossi, Giorgio Ciano, Giorgia Giacomini, Simone Bonechi, Paolo Andreini, Franco Scarselli, Monica Bianchini, Pietro Bongini
Abstract:
Binding site identification makes it possible to determine the functionality and the quaternary structure of protein-protein complexes. Various approaches to this problem have been proposed without reaching a viable solution. Once the interacting peptides are represented as graphs, a correspondence graph describing their interaction can be built. Finding the maximum clique in the correspondence graph makes it possible to identify the secondary structure elements belonging to the interaction site. Although the maximum clique problem is NP-complete, Graph Neural Networks provide an approximation tool that can solve the problem in affordable time. Our experimental results are promising and suggest that this direction should be explored further.
ES2020-46
Embedding of FRPN in CNN architecture
Alberto Rossi, Markus Hagenbuchner, Franco Scarselli, Ah Chung Tsoi
Abstract:
This paper extends the fully recursive perceptron network (FRPN) model for vectorial inputs to include deep convolutional neural networks (CNNs), which can accept multi-dimensional inputs. An FRPN consists of a recursive layer which, given a fixed input, iteratively computes an equilibrium state. The unfolding realized by this iterative mechanism makes it possible to simulate a deep neural network with any number of layers. The extension of the FRPN to CNNs results in an architecture, which we call convolutional-FRPN (C-FRPN), where the convolutional layers are recursive. The method is evaluated on several image classification benchmarks. It is shown that the C-FRPN consistently outperforms standard CNNs having the same number of parameters. The gap in performance is particularly large for small networks, showing that the C-FRPN is a very powerful architecture, since it achieves equivalent performance with fewer parameters when compared with deep CNNs.
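The idea of a recursive layer that iterates to an equilibrium state for a fixed input can be illustrated with a plain fixed-point iteration. The sketch below is an assumption-laden toy, not the authors' formulation: the update rule `x <- tanh(W x + U u + b)`, the tanh activation, the function name `equilibrium_state` and the small hand-picked weights are all hypothetical, and the recurrent weights are chosen so that the map is a contraction and the iteration is guaranteed to converge.

```python
import math

def equilibrium_state(W, U, u, b, tol=1e-10, max_iter=1000):
    """Iterate x <- tanh(W x + U u + b) from x = 0 until the state stops changing.

    Returns the (approximate) equilibrium state and the number of iterations used.
    """
    n = len(b)
    # The input u is fixed, so its contribution is computed once.
    drive = [sum(U[i][j] * u[j] for j in range(len(u))) + b[i] for i in range(n)]
    x = [0.0] * n
    for step in range(1, max_iter + 1):
        x_new = [
            math.tanh(sum(W[i][j] * x[j] for j in range(n)) + drive[i])
            for i in range(n)
        ]
        delta = max(abs(a - c) for a, c in zip(x_new, x))
        x = x_new
        if delta < tol:
            return x, step
    raise RuntimeError("fixed-point iteration did not converge")

# Hypothetical weights: every row of W has absolute sum below 1, and tanh is
# 1-Lipschitz, so the update is a contraction in the max norm.
W = [[0.2, -0.1], [0.05, 0.3]]
U = [[1.0, 0.5], [-0.3, 0.8]]
b = [0.0, 0.1]
u = [0.4, -0.2]
x_star, steps = equilibrium_state(W, U, u, b)
```

Each pass through the loop plays the role of one more unfolded layer sharing the same weights, which is why the number of effective layers can grow without adding parameters.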
ES2020-49
Verifying Deep Learning-based Decisions for Facial Expression Recognition
Ines Rieger, Rene Kollmann, Bettina Finzel, Dominik Seuss, Ute Schmid
Abstract:
Neural networks with high performance can still be biased towards non-relevant features. However, reliability and robustness are especially important for high-risk fields such as clinical pain treatment. We therefore propose a verification pipeline, which consists of three steps. First, we classify facial expressions with a neural network. Next, we apply layer-wise relevance propagation to create pixel-based explanations. Finally, we quantify these visual explanations based on a bounding-box method with respect to facial regions. Although our results show that the neural network achieves state-of-the-art results, the evaluation of the visual explanations reveals that relevant facial regions may not be considered.
ES2020-51
Cost-free resolution enhancement in Convolutional Neural Networks for medical image segmentation
Oscar J. Pellicer Valero, María J. Rupérez-Moreno, José D. Martín-Guerrero
Abstract:
High-resolution segmentations of medical images are imperative for applications such as treatment planning, image fusion or computer-aided surgery. Nevertheless, these are often hard and time-consuming to produce. This paper presents a method for improving the output resolution of Convolutional Neural Networks (CNNs) for medical image segmentation. It is straightforward to implement and works with any already trained CNN, with no modification or retraining required. It is able to produce better results than binary interpolation methods since it exploits all the contextual information to predict the sought values.
ES2020-96
Linear Graph Convolutional Networks
Nicolò Navarin, Wolfgang Erb, Luca Pasa, Alessandro Sperduti
Abstract:
Many neural networks for graphs are based on the graph convolution operator, proposed more than a decade ago. Since then, many alternative definitions have been proposed that tend to add complexity (and non-linearity) to the model. In this paper, we follow the opposite direction by proposing a linear graph convolution operator. Despite its simplicity, we show that our convolution operator is more theoretically grounded than many proposals in the literature, and shows improved predictive performance.
ES2020-107
Deep Recurrent Graph Neural Networks
Luca Pasa, Nicolò Navarin, Alessandro Sperduti
Abstract:
Graph Neural Networks (GNNs) show good results in classification and regression on graphs, even though most GNN models have limited depth. Indeed, they are composed of only a few stacked graph convolutional layers. One reason for this is that the number of parameters grows with the number of GNN layers. In this paper, we show how using a recurrent graph convolution layer can help in building deeper GNNs without increasing the complexity of the training phase, while improving performance. We also analyze how the depth of the model influences the final result.
ES2020-118
Investigating 3D-STDenseNet for Explainable Spatial Temporal Crime Forecasting
Brian Maguire, Faisal Ghaffar
Abstract:
Crime is a well-known social problem faced worldwide. With the availability of large city datasets, the scientific community for predictive policing has switched its focus from people-centric to place-centric approaches, focusing on heterogeneous data points related to a particular geographic region when predicting crimes. Such data-driven techniques identify micro-level regions with high crime intensity, known as hotspots. In this paper, we adapt the state-of-the-art spatial-temporal prediction model STDenseNetFus to predict crime in a geographic region in the presence of external factors such as a region’s demographics, seasonal events, and weather. We demonstrate that STDenseNet maintains prediction performance compared to previous results [1] on the same dataset despite a significantly reduced parameter count. We further extend the STDenseNetFus architecture from two-dimensional to three-dimensional convolutions and show that this further improves the prediction results. Finally, we determine the important factors in the dataset affecting the crime predictions by applying the DeepShap model explanation method to our models.
ES2020-130
Visualization of the Feature Space of Neural Networks
Carlos M. Alaíz, Ángela Fernández, José R. Dorronsoro
Abstract:
Visualization of a learning machine can be crucial to understand its behaviour, especially in the case of (deep) neural networks, since they are quite difficult to interpret. An approach for visualizing the feature space of a neural network is presented, trying to answer the question "what representation of the data is the network using to make its decision?" The proposed method gives a representation of the space where the network is tackling the problem, reducing it while respecting the linearity of the model. As shown experimentally, this technique makes it possible to study the evolution of the model with respect to the training epochs, to obtain a representation of the data similar to the one used by the neural network, and even to detect groups of patterns that behave differently.
ES2020-132
Theoretically Expressive and Edge-aware Graph Learning
Federico Errica, Davide Bacciu, Alessio Micheli
Abstract:
We propose a new Graph Neural Network that combines recent advancements in the field. We give theoretical contributions by proving that the model is strictly more general than the Graph Isomorphism Network and the Gated Graph Neural Network, as it can approximate the same functions and deal with arbitrary edge values. Then, we show how a single node's information can flow through the graph unchanged.
ES2020-153
Random Signal Cut for Improving Multimodal CNN Robustness of 2D Road Object Detection
Robin Condat, Alexandrina Rogozan, Abdelaziz Bensrhair
Abstract:
Given the large number of deep neural network proposals using only RGB images for 2D object detection for Advanced Driver-Assistance Systems, we propose MMRetina, a CNN taking multimodal data (RGB, Depth from Stereo, Optical Flow, LIDAR) as input for detecting road objects and their 2D localization. We introduce a new data augmentation method, which we call Random Signal Cut, to make our multimodal CNN more robust to sensor malfunctions or breakdowns. Experiments on the KITTI dataset show that using multimodal data with Random Signal Cut significantly improves CNN robustness without lowering its overall performance when all sensors are functioning properly.
ES2020-156
New Results on Sparse Autoencoders for Posture Classification and Segmentation
Doreen Jirak, Stefan Wermter
Abstract:
This paper is a sequel to earlier work on posture recognition using sparse autoencoders. We conduct experiments on a posture dataset and show that shallow sparse autoencoders achieve even better performance than a convolutional neural network, a state-of-the-art model for recognition tasks. Also, our results support robust image representation from the autoencoder model, rendering further finetuning unnecessary. Finally, we suggest using sparse autoencoders for image segmentation.
ES2020-172
Fréchet Mean Computation in Graph Space through Projected Block Gradient Descent
Nicolas Boria, Benjamin Negrevergne, Florian Yger
Abstract:
A fundamental concept in statistics is the concept of Fréchet sample mean. While its computation is a simple task in Euclidean space, the same does not hold for less structured spaces such as the space of graphs, where concepts of distance or mid-point can be hard to compute. We present some work in progress regarding new distance measures and new algorithms to compute the Fréchet mean in the space of graphs.
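For readers unfamiliar with the term, the Fréchet sample mean can be stated using its standard definition; the symbols $(\mathcal{X}, d)$, $x_i$ and $\hat{m}$ are notation chosen here, and the specific graph distances studied in the paper are not reproduced. The second line shows why the Euclidean case is simple: the minimiser is the arithmetic mean.

```latex
\hat{m} \;\in\; \operatorname*{arg\,min}_{m \in \mathcal{X}} \;
    \sum_{i=1}^{n} d(m, x_i)^2,
\qquad x_1, \dots, x_n \in \mathcal{X},
\\[4pt]
\mathcal{X} = \mathbb{R}^p,\; d(m, x) = \lVert m - x \rVert_2
\;\;\Longrightarrow\;\;
\hat{m} = \frac{1}{n} \sum_{i=1}^{n} x_i .
```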
ES2020-185
Improving Light-weight Convolutional Neural Networks for Face Recognition Targeting Resource Constrained Platforms
Iulian-Ionut Felea, Radu Dogaru
Abstract:
A thorough investigation of the possibility of optimizing deep convolutional neural network architectures for face recognition problems is presented, from the perspective of training very compact models to be further deployed on resource-constrained systems. Latencies in the recognition phase and memory usage are minimized while recognition accuracies are maintained close to the state-of-the-art performance of more complicated deep neural networks. Using two widely used datasets, namely VGG-Face and YouTube Faces, several modifications of a recent light-weight CNN model are proposed, and for a reasonable accuracy the most compact solutions were identified. Experiments on VGG-Face show that our proposed models achieve 95.5% accuracy, with 5.6 times less memory storage when compared to the reference slim model.
ES2020-188
Variational Mixture of Normalizing Flows
Guilherme Pires, Mário Figueiredo
Abstract:
In the past few years, deep generative models, such as generative adversarial networks, variational autoencoders, and their variants, have seen wide adoption for the task of modelling complex data distributions. In spite of the outstanding sample quality achieved by those methods, they model the target distributions implicitly, in the sense that the probability density functions approximated by them are not explicitly accessible. This fact renders those methods unfit for tasks that require, for example, scoring new instances of data with the learned distributions. Normalizing flows overcome this limitation by leveraging the change-of-variables formula for probability density functions, and by using transformations designed to have tractable and cheaply computable Jacobians. Although flexible, this framework lacked (until the publication of recent work) a way to introduce discrete structure (such as the one found in mixtures) in the models it allows one to construct, in an unsupervised scenario. The present work overcomes this by using normalizing flows as components in a mixture model, and devising a training procedure for such a model. This procedure is based on variational inference, and uses a variational posterior parameterized by a neural network. As will become clear, this model naturally lends itself to (multimodal) density estimation, semi-supervised learning, and clustering. The proposed model is evaluated on two synthetic datasets, as well as on a real-world dataset.
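The change-of-variables formula mentioned in this abstract, and the generic form of a mixture whose components are flows, can be written as follows. The notation ($f_k$, $p_Z$, $\pi_k$, $K$) is chosen here for illustration; the paper's exact parameterisation and its variational objective are not reproduced.

```latex
% Density of x under one flow: an invertible map f sends data x to a latent z = f(x)
% with a simple base density p_Z.
p_X(x) \;=\; p_Z\bigl(f(x)\bigr)\,
    \left| \det \frac{\partial f(x)}{\partial x} \right|
\\[6pt]
% Mixture of K such flows with mixing weights \pi_k \ge 0, \sum_k \pi_k = 1.
p(x) \;=\; \sum_{k=1}^{K} \pi_k \, p_Z\bigl(f_k(x)\bigr)\,
    \left| \det \frac{\partial f_k(x)}{\partial x} \right|
```

Because every term is an explicit density, new data points can be scored directly, which is the property the abstract contrasts with implicit generative models.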
ES2020-200
Fast Deep Neural Networks Convergence using a Weightless Neural Model
Alan T. L. Bacellar, Brunno F. Goldstein, Victor C Ferreira, Leandro Santiago, Priscila Lima, Felipe França
Abstract:
Deep Neural Networks (DNNs) have surged as a promising technique for AI applications combining a huge parametric space with efficient learning algorithms. The efficiency of the training procedure relies on some optimization algorithms which adjust the initial weights to minimize the loss of the model. Such strategies are essential to speed up the convergence of the optimization steps. Nonetheless, a general initialization procedure is still an open problem since the proposed techniques either require a long processing time or take a considerable number of iterations to figure out an acceptable model. This work presents a weight initialization strategy using transfer learning via Weightless Neural Network (WNN). This WNN initialization strategy reaches up to $5.5\times$ accuracy and $15\times$ loss reduction at the first iterations when compared against well-known techniques such as Xavier and He.
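For context, the two baseline schemes named in the abstract can be sketched as follows. This is a minimal illustration of the normal-distribution variants of Xavier (Glorot) and He initialization only; the function names are hypothetical, and the WNN-based strategy proposed in the paper is not reproduced here.

```python
import math
import random


def xavier_std(fan_in, fan_out):
    # Xavier/Glorot normal initialization: variance 2 / (fan_in + fan_out).
    return math.sqrt(2.0 / (fan_in + fan_out))


def he_std(fan_in):
    # He normal initialization: variance 2 / fan_in.
    return math.sqrt(2.0 / fan_in)


def init_weights(fan_in, fan_out, std, seed=0):
    # Draw a fan_out x fan_in weight matrix from N(0, std^2).
    rng = random.Random(seed)
    return [[rng.gauss(0.0, std) for _ in range(fan_in)] for _ in range(fan_out)]


# Example: a layer with 100 inputs and 100 outputs.
xavier_weights = init_weights(100, 100, xavier_std(100, 100))
he_weights = init_weights(100, 100, he_std(100))
```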
ES2020-205
An Empirical Study of Iterative Knowledge Distillation for Neural Network Compression
Sharan Yalburgi, Tirtharaj Dash, Ramya Hebbalaguppe, Srinidhi Hegde, Ashwin Srinivasan
Abstract:
In this paper we introduce Iterative Knowledge Distillation (IKD), the process of successively minimizing models based on the Knowledge Distillation (KD) approach in [1]. We study two variations of IKD, called parental- and ancestral-training. Both use a single teacher and result in a single student model: the differences arise from which model is used as a teacher. Our results provide support for the utility of the IKD procedure, in the form of increased model compression, without significant losses in predictive accuracy. An important task in IKD is choosing the right model(s) to act as a teacher for a subsequent iteration. Across the variations of IKD studied, our results suggest that the most recent model constructed (parental-training) is the best single teacher for the model in the next iteration. This result suggests that training in IKD can proceed without requiring us to keep all models in the sequence.
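To make the distillation step concrete, here is a minimal sketch of a commonly used distillation loss: a temperature-softened cross-entropy between teacher and student outputs, blended with the usual cross-entropy on the true label. Whether this matches the formulation in [1] exactly is an assumption, and the names `temperature` and `alpha` are illustrative. In parental-training, the teacher logits would come from the most recently constructed model.

```python
import math


def softmax(logits, temperature=1.0):
    # Temperature-scaled softmax; higher temperatures give softer distributions.
    scaled = [v / temperature for v in logits]
    top = max(scaled)
    exps = [math.exp(v - top) for v in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(target_probs, predicted_probs):
    # Cross-entropy H(target, predicted); zero-probability targets contribute nothing.
    return -sum(
        t * math.log(p) for t, p in zip(target_probs, predicted_probs) if t > 0.0
    )


def distillation_loss(student_logits, teacher_logits, true_label,
                      temperature=2.0, alpha=0.5):
    # Soft term: the student imitates the teacher's softened distribution.
    soft_teacher = softmax(teacher_logits, temperature)
    soft_student = softmax(student_logits, temperature)
    soft_term = cross_entropy(soft_teacher, soft_student)
    # Hard term: the student fits the one-hot ground-truth label.
    hard_target = [1.0 if i == true_label else 0.0
                   for i in range(len(student_logits))]
    hard_term = cross_entropy(hard_target, softmax(student_logits))
    # The temperature^2 factor keeps the soft term's gradient scale
    # comparable across temperatures.
    return alpha * temperature ** 2 * soft_term + (1.0 - alpha) * hard_term
```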
ES2020-207
Why state-of-the-art deep learning barely works as good as a linear classifier in extreme multi-label text classification
Mohammadreza Qaraei, Sujay Khandagale, Rohit Babbar
Abstract:
Extreme Multi-label Text Classification (XMTC) refers to supervised learning of a classifier which can predict a small subset of relevant labels for a document from an extremely large set. Even though deep learning algorithms have surpassed linear and kernel methods for most natural language processing tasks over the last decade, recent works show that state-of-the-art deep learning methods can only barely manage to work as well as a linear classifier for the XMTC task. The goal of this work is twofold: (i) to investigate the reasons for the comparable performance of these two strands of methods for XMTC, and (ii) to document this observation explicitly, as the efficacy of linear classifiers in this regime has been ignored in many relevant recent works.
ES2020-32
Incorporating Human Priors into Deep Reinforcement Learning for Robotic Control
Manon Flageat, Kai Arulkumaran, Anil A Bharath
Abstract:
Deep reinforcement learning (DRL) shows promise for robotic control, as it scales to high-dimensional observations and does not require a model of the robot or environment. However, properties such as control continuity or movement smoothness, which are desirable for application in the real world, will not necessarily emerge from training on reward functions based purely on task success. Inspired by human neuromotor control and movement analysis literature, we define a modular set of costs that promote more efficient, human-like movement policies. Using a simulated 3-DoF manipulator robot, we demonstrate the benefits of these costs by incorporating them into the training of a model-free DRL algorithm and decision-time planning of a trained model-based DRL algorithm. We also quantify these benefits through metrics based on the same literature, which allows for greater interpretability of learned policies, a common concern when learning policies with powerful and complex function approximators.
ES2020-103
Sparse K-means for mixed data via group-sparse clustering
Marie Chavent, Jérôme Lacaille, Alex Mourer, Madalina Olteanu
Abstract:
The present manuscript tackles the issue of variable selection for clustering in high-dimensional data described by both numerical and categorical features. First, we build upon the sparse k-means algorithm with lasso penalty, and introduce the group-L_1 penalty -- already known in regression -- in the unsupervised context. Second, we preprocess mixed data and transform categorical features into groups of dummy variables with appropriate scaling, on which one may then apply the group-sparse clustering procedure. The proposed method simultaneously performs clustering and feature selection, and provides meaningful partitions and meaningful features, numerical and categorical, for describing them.
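A group penalty keeps or discards all dummy variables of a categorical feature together. One standard building block with this effect is the group soft-thresholding operator, sketched below. Whether the paper's algorithm uses exactly this update is an assumption, and the function and parameter names are illustrative.

```python
import math


def group_soft_threshold(values, groups, penalty):
    # values: one score per (dummy) variable, e.g. its between-cluster contribution.
    # groups: lists of indices; each list holds the variables of one feature.
    # A group whose Euclidean norm is at most `penalty` is zeroed out entirely;
    # otherwise the whole group is shrunk by the same factor.
    result = [0.0] * len(values)
    for indices in groups:
        norm = math.sqrt(sum(values[i] ** 2 for i in indices))
        if norm > penalty:
            shrink = 1.0 - penalty / norm
            for i in indices:
                result[i] = shrink * values[i]
    return result


# Two features with two dummy variables each: the first group survives,
# the second is removed as a whole.
shrunk = group_soft_threshold([3.0, 4.0, 0.3, 0.4], [[0, 1], [2, 3]], penalty=1.0)
```

A numerical feature is simply a group of size one, for which the operator reduces to ordinary (lasso) soft-thresholding of the magnitude.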
Machine Learning Applied to Computer Networks - organized by Alexander Gepperth (University of Applied Sciences Fulda, Germany), Sebastian Rieger (University of Applied Sciences Fulda, Germany)
ES2020-2
A Survey of Machine Learning applied to Computer Networks
Alexander Gepperth, Sebastian Rieger
Abstract:
We review the current state of the art in the domain of machine learning applied to computer networks. First of all, we describe recent developments in computer networking and outline the potential fields for machine learning that arise from these developments. We discuss challenges for machine learning in this particular field, namely the inherent big data aspect of computer networks, and the fact that learning very often needs to be conducted in a streaming setting with non-stationary data distributions. We discuss practical issues like privacy protection and computing resources before finally outlining potential technological benefits of this emerging scientific field.
ES2020-191
Anomaly Detection Approach in Cyber Security for User and Entity Behavior Analytics System
Vladimir Muliukha, Alexey Lukashin, Lev Utkin, Mikhail Popov, Anna Meldo
Abstract:
This paper presents a prototype of an intelligent system for advanced analytics for integrated security of complex information and cyberphysical systems, implementing analytical models and software developed at Peter the Great St. Petersburg Polytechnic University. The article discusses the practical aspects of applying unsupervised machine learning methods to the task of identifying abnormal objects in the field of information security in computer networks. The format used to present initial data on various events in computer networks is described, as well as the process of preparing a training set for machine learning. The results of detecting anomalies with the Isolation Forest and Local Outlier Factor methods are presented, together with an analysis of these results.
Quantum Machine Learning - organized by José D. Martín-Guerrero (Universitat de València, Spain), Lucas Lamata (Universidad de Sevilla, Spain)
ES2020-6
Quantum Machine Learning
José D. Martín-Guerrero, Lucas Lamata
Abstract:
Machine Learning (ML) is becoming an increasingly popular field of knowledge, and the term is now known well beyond academia thanks to its successful applications to many real-world problems. The advent of Deep Learning and Big Data in the last decade has contributed to making it even more popular. Many companies, both large ones and SMEs, have created specific departments for ML and data analysis, which in many cases are in fact their main activity. This current exploitation of ML should not mislead us; while it is a mature field of knowledge, there is still room for many novel contributions, namely, a better understanding of the underlying mathematics, proposal and tuning of algorithms suitable for new problems (e.g., Natural Language Processing), automation and optimization of the search for parameters, etc. Within this framework of new contributions to ML, Quantum Machine Learning (QML) has recently emerged strongly, speeding up ML calculations and providing alternative representations to existing approaches. This special session includes six high-quality papers dealing with some of the most relevant aspects of QML, including analysis of learning in quantum computing and quantum annealers, quantum versions of classical ML models (such as neural networks or learning vector quantization), and quantum learning approaches for measurement and control.
ES2020-203
Machine learning framework for control in classical and quantum domains
Archismita Dalal, Eduardo J. Páez, Seyed Shakib Vedaie, Barry C. Sanders
Abstract:
Our aim is to construct a framework that relates learning and control for both classical and quantum domains. As an application of the proposed framework, we cast the quantum-control problem of adaptive quantum-enhanced metrology as a supervised learning problem. The novelty of our work lies in the unification of quantum and classical control and learning theories and the pictorial representations of knowledge in these disparate areas. Our work enhances the control toolkit and helps un-confuse this interdisciplinary field of machine learning for control. It also highlights new research directions in areas inter-connecting learning and control.
ES2020-180
Understanding and improving unsupervised training of Boltzmann machines
Przemysław Grzybowski, Gorka Muñoz-Gil, Alejandro Pozas-Kerstjens, Miguel Angel Garcia-March, Maciej Lewenstein
Abstract:
We have analyzed the training of Boltzmann machines from the perspective of statistical physics. We argue that training models in the spin-glass regime is highly inefficient and unnecessary. To that end, we have previously presented RAPID, a method to control the frustration of spin models and to train them without the need for expensive sampling methods. In this contribution we study the effects of initialising Boltzmann machines in a regime that is easy to sample and training them with standard methods.
ES2020-90
Quantum-Inspired Learning Vector Quantization for Classification Learning
Thomas Villmann, Jensun Ravichandran, Alexander Engelsberger, Andrea Villmann, Marika Kaden
Abstract:
This paper introduces a variant of the prototype-based generalized learning vector quantization (GLVQ) for classification learning inspired by quantum computing. Starting from the motivation of kernelized GLVQ, the nonlinear transformation of real data and prototypes into quantum bit vectors allows a GLVQ variant to be formulated in an ($n$-dimensional) quantum bit vector space $\mathcal{H}^{n}$. A key feature of this approach is that $\mathcal{H}^{n}$ is a Hilbert space with particular inner product properties, which ultimately restrict the prototype adaptation to unitary transformations. The resulting approach is denoted as Qu-GLVQ.
ES2020-30
A quantum algorithm for feedforward neural networks tested on existing quantum hardware
Daniele Bajoni, Dario Gerace, Chiara Macchiavello, Francesco Tacchino, Panagiotis Barkoutsos, Ivano Tavernelli
Abstract:
We present a memory-efficient quantum algorithm implementing the action of an artificial neuron according to a binary-valued model of the classical perceptron. The algorithm, tested on real, noisy IBM-Q superconducting quantum processors, succeeds in elementary classification and image-recognition tasks through a hybrid quantum-classical training procedure. We also show that this model can be extended to a multilayered artificial neural network, which is able to solve a task that would be impossible for any single one of its constituent artificial neurons, thus laying the basis for a fully quantum artificial intelligence algorithm run on noisy intermediate-scale quantum hardware.
ES2020-195
Approximating Archetypal Analysis Using Quantum Annealing
Sebastian Feld, Christoph Roch, Katja Geirhos, Thomas Gabor
Abstract:
Archetypes are those extreme values of a data set that can jointly represent all other data points. They often have descriptive meanings and can thus contribute to the understanding of the data. Such archetypes are identified using archetypal analysis and all data points are represented as convex combinations thereof. In this work, archetypal analysis is linked with quantum annealing. For both steps, i.e. the determination of archetypes and the assignment of data points, we derive a QUBO formulation which is solved on D-Wave's 2000Q Quantum Annealer. For selected data sets, called "toy" and "iris", our quantum annealing-based approach can achieve similar results to the original R-package "archetypes".
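A QUBO (quadratic unconstrained binary optimization) problem asks for the binary vector that minimizes a quadratic energy. The minimal sketch below solves a tiny instance by exhaustive search, which is what a quantum annealer approximates on larger instances. The coefficients are a made-up toy example, not the formulation derived in the paper, and the function names are illustrative.

```python
from itertools import product


def qubo_energy(assignment, coefficients):
    # Energy sum_{i<=j} Q_ij * x_i * x_j for a binary assignment x.
    return sum(
        c * assignment[i] * assignment[j] for (i, j), c in coefficients.items()
    )


def solve_qubo_brute_force(num_variables, coefficients):
    # Enumerate all 2^n binary assignments; only feasible for very small n.
    best = min(
        product((0, 1), repeat=num_variables),
        key=lambda bits: qubo_energy(bits, coefficients),
    )
    return best, qubo_energy(best, coefficients)


# Hypothetical toy instance: diagonal terms reward selecting a variable,
# off-diagonal terms penalize selecting neighbouring variables together.
toy_coefficients = {
    (0, 0): -1.0,
    (1, 1): -3.0,
    (2, 2): -1.0,
    (0, 1): 2.0,
    (1, 2): 2.0,
}
best_bits, best_energy = solve_qubo_brute_force(3, toy_coefficients)
```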
ES2020-197
Explorations in Quantum Neural Networks with Intermediate Measurements
Lukas Franken, Bogdan Georgiev
Abstract:
In this short note we explore a few quantum circuits with the particular goal of basic image recognition. The models we study are inspired by recent progress in Quantum Convolution Neural Networks (QCNN) [Cong et al., 2019]. We present a few experimental results, where we attempt to learn basic image patterns motivated by scaling down the MNIST dataset.
Recurrent networks and reinforcement learning
ES2020-161
A Distributed Neural Network Architecture for Robust Non-Linear Spatio-Temporal Prediction
Matthias Karlbauer, Sebastian Otte, Hendrik Lensch, Thomas Scholten, Volker Wulfmeyer, Martin Butz
Abstract:
DISTANA -- a distributed spatio-temporal artificial neural network architecture -- learns to model and predict spatio-temporal time series dynamics. It learns in a parallel, spatially distributed manner while employing a mesh of recurrent, neural prediction kernels (PKs). Individual PKs predict the local data stream and exchange information laterally. DISTANA essentially assumes that generally applicable causes, which may be locally modified, generate the observed data. We show that DISTANA scales and generalizes to large problem spaces, can approximate complex dynamics, and is robust to overfitting, outperforming other competitive ANNs.
ES2020-187
Softmax Recurrent Unit: A new type of RNN cell
Lucas Vos, Twan van Laarhoven
Abstract:
Recurrent Neural Networks (RNNs) have been very successful in many state-of-the-art solutions for natural language tasks like machine translation. However, LSTM, the most common RNN cell, is complex and utilizes a lot of components. We present the Softmax Recurrent Unit (SMRU), a novel and elegant design of a new type of RNN cell. The SMRU has a simple structure, which is solely based around the softmax function. We present four different variants of the SMRU and compare them to both the LSTM and GRU on various tasks and datasets. These experiments show that the SMRU achieves competitive performance, surpassing either the LSTM or the GRU on any of the given tasks, while having a much simpler design.
ES2020-108
Language Grounded Task-Adaptation in Reinforcement Learning
Matthias Hutsebaut-Buysse, Kevin Mets, Steven Latré
Abstract:
Over its lifetime, a Reinforcement Learning agent is often instructed to perform different tasks. How to efficiently adapt a previously learned control policy from one task to another remains an open research question. In this paper, we investigate how instructions formulated in natural language can enable faster and more effective task adaptation. Given a set of previously developed base control policies, our proposed method is capable of assessing which base policy is best suited for adaptation to a new, unseen task.
ES2020-100
Object-centered Fourier Motion Estimation and Segment-Transformation Prediction
Moritz Wolter, Angela Yao, Sven Behnke
Abstract:
The ability to anticipate the future is essential for action planning in autonomous systems. To this end, learning video prediction methods have been developed, but current systems often produce blurred predictions. We address this issue by introducing an object-centered movement estimation, frame prediction, and correction framework using frequency-domain approaches. We transform single objects based on estimated translation and rotation speeds which we correct using a learned encoding of the past. This results in clear predictions with few parameters. Experimental evaluation shows that our approach is accurate and computationally efficient.
ES2020-123
Recurrent Feedback Improves Recognition of Partially Occluded Objects
Markus Roland Ernst, Jochen Triesch, Thomas Burwick
Abstract:
Recurrent connectivity in the visual cortex is believed to aid object recognition for challenging conditions such as occlusion. Here we investigate if and how artificial neural networks also benefit from recurrence. We compare architectures composed of bottom-up, lateral and top-down connections and evaluate their performance using two novel stereoscopic occluded object datasets. We find that classification accuracy is significantly higher for recurrent models when compared to feedforward models of matched parametric complexity. Additionally we show that for challenging stimuli, the recurrent feedback is able to correctly revise the initial feedforward guess.
ES2020-174
Sequence Classification using Ensembles of Recurrent Generative Expert Modules
Marius Hobbhahn, Martin Butz, Sarah Fabi, Sebastian Otte
Abstract:
Successful discriminative deep learning relies on large amounts of data and proper domain coverage. We introduce an ensemble of recurrent generative modules, achieving robust and effective sequence classification facing sparse data. Each module is an expert for only a few variations of a certain class. Given an input trajectory, the latent codes of the experts are adapted via back-propagation of the reconstruction error and the most accurate expert yields the class. In comparison with direct discriminative models, our approach achieves better classification rates with fewer training examples, can be easily extended (lifelong learning), and provides fully transparent decisions (explainable AI).
ES2020-84
Epistemic Risk-Sensitive Reinforcement Learning
Hannes Eriksson, Christos Dimitrakakis
Abstract:
We develop a framework for risk-sensitive behavior in reinforcement learning (RL) due to uncertainty about the environment dynamics by leveraging utility-based definitions of risk sensitivity. In this framework, the preference for risk can be tuned by varying the utility function, for which we develop dynamic programming (DP) and policy gradient-based algorithms. The risk-averse behavior is compared with that of a risk-neutral policy in environments with epistemic risk.
ES2020-204
Tournament Selection Improves Cartesian Genetic Programming for Atari Games
Tim Cofala, Lars Elend, Oliver Kramer
Abstract:
The objective of this paper is to extend Cartesian Genetic Programming (CGP) for the evolution of Atari game agents in the Arcade Learning Environment. Based upon preliminary work on the use of CGP playing Atari games, we propose extensions like the repeated evaluation of elite solutions. Furthermore, we improve the CGP optimization process by increasing the diversity in the population with tournament selection. Experimental studies on four exemplary Atari games show that the modifications decrease premature stagnation during the evolutionary optimization process and result in more robust agent strategies.
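As a rough illustration of the tournament selection mentioned in the abstract above, the following minimal Python sketch picks each parent as the fittest of a few randomly drawn individuals. The function names, the tournament size `k`, and the "higher fitness is better" convention are assumptions made for this example; the sketch is generic and does not reproduce the authors' CGP implementation.

```python
import random


def tournament_select(population, fitness, k=3, rng=random):
    """Return the fittest of k distinct individuals drawn uniformly at random.

    Assumes higher fitness is better (e.g. a game score).
    """
    contestants = rng.sample(range(len(population)), k)
    winner = max(contestants, key=lambda i: fitness[i])
    return population[winner]


def next_parents(population, fitness, n_parents, k=3, rng=random):
    """Run n_parents independent tournaments to build the parent pool."""
    return [tournament_select(population, fitness, k, rng) for _ in range(n_parents)]
```

A small `k` keeps weaker individuals in play and so preserves diversity, while a tournament as large as the population always returns the current best individual.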
ES2020-23
Handling missing data in recurrent neural networks for air quality forecasting
Michel Tokic, Anja von Beuningen, Christoph Tietz, Hans-Georg Zimmermann
Abstract:
Practical applications of air quality forecasting, which typically provide predictions over a horizon of hours and days, often require the handling of missing data due to unobserved relevant variables, sensor defects or communication outages. In this paper we discuss two aspects that are important when building air quality forecasting models for essential air pollution parameters such as particulate matter and nitrogen dioxide. Using a specialized recurrent neural network architecture, we can build models even if (1) unobserved variables or (2) missing data are present.
Unsupervised learning
ES2020-40
Self-organizing maps in manifolds with complex topologies: An application to the planning of closed path for indoor UAV patrols
Hervé Frezza-Buet
Abstract:
In this paper, the ability of 1D-SOMs to address the Euclidean Travelling Salesperson problem is extended to more irregular topologies, in order to compute short closed paths covering an indoor environment. In such environments, wall constraints make the topology of the area to be visited by a patroller very irregular. An application to indoor unmanned aerial vehicle (UAV) security patrols is considered.
ES2020-81
Detection of abnormal driving situations using distributed representations and unsupervised learning
Florian Mirus, Terrence C. Stewart, Jörg Conradt
Abstract:
In this paper, we present an anomaly detection system employing an unsupervised learning model trained on the information encapsulated within distributed vector representations of automotive scenes. Our representation allows us to encode automotive scenes with a varying number of traffic participants in a vector of fixed length. We train a neural network autoencoder in an unsupervised fashion to detect anomalies based on this representation. We demonstrate the usefulness of our approach through a quantitative analysis on two real-world datasets.
ES2020-48
Comparison of Cluster Validity Indices and Decision Rules for Different Degrees of Cluster Separation
Sara Kaczynska, Rebecca Marion, Rainer von Sachs
Abstract:
Clustering algorithms are powerful tools for data exploration but often require the a priori choice of the number of clusters. In practice, cluster validity indices (CVIs) are used to quantify the clustering structure of candidate partitions, then decision rules are applied to the indices to choose the best number of clusters. This study analyzes how dimensionality and the degree of cluster separation impact the choice of the number of clusters according to 7 different indices and various decision rules. In contrast to previous studies, the degree of cluster separation is controlled by a single parameter and several decision rules are tested for each CVI.
Feature selection and dimensionality reduction
ES2020-138
Sparse Metric Learning in Prototype-based Classification
Johannes Brinkrolf, Barbara Hammer
Abstract:
Metric learning schemes can greatly enhance distance-based classifiers, and provide additional model functionality such as interpretability in terms of feature relevance weights. In particular for high dimensional data, it is desirable to obtain sparse feature relevance weights for higher efficiency and interpretability. In this contribution, a new feature selection scheme is proposed for prototype-based classification models with adaptive metric learning. More precisely, we integrate the group lasso penalty and a subsequent optimization of sparsity while leaving the mapping invariant. We evaluate the performance on a variety of benchmarks.
ES2020-21
Joint optimization of predictive performance and selection stability
Victor Hamer, Pierre Dupont
Abstract:
Current feature selection methods, especially applied to high dimensional data, tend to suffer from instability since marginal modifications in the data may result in largely distinct selected feature sets. Such instability strongly limits a sound interpretation of the selected variables by domain experts. We address this issue by optimizing jointly the predictive accuracy and selection stability and by deriving Pareto-optimal trajectories. Our approach extends the Recursive Feature Elimination algorithm by enforcing the selection of some features based on a stable, univariate criterion. Experiments conducted on several high dimensional microarray datasets illustrate that large stability gains are obtained with no significant drop of accuracy.
ES2020-85
Perplexity-free Parametric t-SNE
Francesco Crecchi, Cyril de Bodt, Michel Verleysen, Lee John, Davide Bacciu
Abstract:
The t-distributed Stochastic Neighbor Embedding (t-SNE) algorithm is a ubiquitously employed dimensionality reduction (DR) method. Its non-parametric nature and impressive efficacy motivated its parametric extension. It is however bound to a user-defined perplexity parameter, restricting its DR quality compared to recently developed multi-scale perplexity-free approaches. This paper hence proposes a multi-scale parametric t-SNE scheme, relieved from the perplexity tuning and with a deep neural network implementing the mapping. It produces reliable embeddings with out-of-sample extensions, competitive with the best perplexity adjustments in terms of neighborhood preservation on multiple data sets.
ES2020-105
Explaining t-SNE Embeddings Locally by Adapting LIME
Adrien Bibal, Viet Minh VU, Géraldin Nanfack, Benoit Frénay
Abstract:
Non-linear dimensionality reduction techniques, such as t-SNE, are widely used to visualize and analyze high-dimensional datasets. While non-linear projections can be of high quality, it is hard, or even impossible, to interpret the dimensions of the obtained embeddings. This paper adapts LIME to locally explain t-SNE embeddings. More precisely, the sampling and black-box-querying steps of LIME are modified so that they can be used to explain t-SNE locally. The result of the proposal is to provide, for a particular instance x and a particular t-SNE embedding Y, an interpretable model that locally explains the projection of x on Y.
ES2020-167
Do we need hundreds of classifiers or a good feature selection?
Laura Morán-Fernández, Verónica Bolón-Canedo, Amparo Alonso-Betanzos
Abstract:
Choosing the appropriate classifier for a problem is not easy, given the large number of available algorithms belonging to different families. Most of these classification algorithms exhibit a degradation in performance when faced with many irrelevant and/or redundant features. Thus, in this work we analyze the impact of feature selection on classification. Experimental results over ten synthetic datasets show that the choice of classifier matters less after applying an appropriate preprocessing step. Not only does this ease the choice, but it also improves the results for almost all classifiers tested.
ES2020-13
Random Projection in supervised non-stationary environments
Moritz Heusinger, Frank-Michael Schleif
Abstract:
Random Projection (RP) is a popular and efficient technique to preprocess high-dimensional data and to reduce its dimensionality. While RP has been widely used and evaluated in stationary data analysis scenarios, non-stationary environments are not well analyzed. In this paper we provide a profound evaluation of RP on streaming data. We discuss how RP can be bounded for streaming data using the Johnson-Lindenstrauss (JL) lemma. In particular we analyze the effect of concept drift, as a key challenge for streaming data. We also provide experiments with RP on streaming data, using state-of-the-art streaming classifiers like Adaptive Hoeffding Tree, to evaluate its efficiency.
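To make the role of the Johnson-Lindenstrauss lemma in the abstract above concrete, here is a minimal Python sketch of a Gaussian random projection applied one sample at a time, as it would be on a stream. The helper names and the particular form of the bound (target dimension k >= 4 ln(n) / (eps^2/2 - eps^3/3)) are assumptions chosen for this example, not details taken from the paper.

```python
import math
import random


def jl_min_dim(n_samples, eps):
    """Target dimension sufficient for distortion eps under a standard JL bound."""
    return math.ceil(4.0 * math.log(n_samples) / (eps ** 2 / 2.0 - eps ** 3 / 3.0))


def gaussian_projection_matrix(d, k, rng):
    """k x d matrix with i.i.d. N(0, 1/k) entries, drawn once and then kept fixed."""
    scale = 1.0 / math.sqrt(k)
    return [[rng.gauss(0.0, 1.0) * scale for _ in range(d)] for _ in range(k)]


def project(matrix, x):
    """Project a single d-dimensional sample to k dimensions."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in matrix]
```

Because the matrix is data-independent, it can be drawn before the stream starts and reused for every arriving sample. The bound depends on the number of points n, which is one reason unbounded streams need separate treatment.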
ES2020-16
On Feature Selection Using Anisotropic General Regression Neural Network
Federico Amato, Fabian Guignard, Philippe Jacquet, Mikhail Kanevski
Abstract:
The presence of irrelevant features in the input dataset tends to reduce the interpretability and predictive quality of machine learning models. Therefore, the development of feature selection methods to recognize irrelevant features is a crucial topic in machine learning. Here we show how the General Regression Neural Network used with an anisotropic Gaussian Kernel can be used to perform feature selection. A number of numerical experiments are conducted using simulated data to study the robustness of the proposed methodology and its sensitivity to sample size. Finally, a comparison with four other feature selection methods is performed on several real world datasets.
Statistical learning and optimization
ES2020-80
A preconditioned accelerated stochastic gradient descent algorithm
Alexandru Onose, Seyed Iman Mossavat, Henk-Jan H. Smilde
Abstract:
We propose a preconditioned accelerated stochastic gradient method suitable for large scale optimization. Inspired by recent popular adaptive per-feature algorithms, we propose a specific preconditioner based on the second moment of the gradient. We derive sufficient convergence conditions for the minimization of convex functions using a generic class of diagonal preconditioners and provide a formal convergence proof based on a framework originally used for on-line learning. We show empirical results for the minimization of convex and non-convex cost functions, in the context of neural network training. The method compares favorably with respect to current, first order, stochastic optimization methods.
ES2020-74
Improving the Union Bound: a Distribution Dependent Approach
Luca Oneto, Sandro Ridella, Davide Anguita
Abstract:
Statistical Learning Theory deals with the problem of estimating the performance of a learning procedure. Any learning procedure implies making choices, and these choices imply a risk. When the number of choices is finite, the state-of-the-art tool for evaluating the total risk of all the choices made is the Union Bound. The problem with the Union Bound is that it is very loose in practice if no a priori information is available. In fact, the Union Bound considers all choices equally plausible while, as a matter of fact, a learning procedure targets just particular choices, disregarding the others. In this work we show that it is possible to improve Union Bound based results using a distribution-dependent weighting strategy of the true risks associated with each choice. We then prove that our proposal outperforms the Union Bound or, in the worst case, degenerates into it.
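For orientation, the classical statement that the abstract above sets out to improve can be written as follows. This is the textbook union bound combined with Hoeffding's inequality, plus its standard weighted variant, and not the authors' distribution-dependent construction. The symbols (m choices h_1, ..., h_m, true risk L, empirical risk \hat{L}_n over n samples, a loss bounded in [0, 1], confidence \delta, and weights p_i) are assumptions introduced for this sketch.

```latex
% Union bound over m choices, each controlled by a one-sided Hoeffding inequality:
\Pr\Big[\exists\, i:\; L(h_i) - \hat{L}_n(h_i) \ge \epsilon\Big]
  \;\le\; \sum_{i=1}^{m} \Pr\Big[L(h_i) - \hat{L}_n(h_i) \ge \epsilon\Big]
  \;\le\; m\, e^{-2 n \epsilon^{2}} .

% Weighted variant: for any fixed weights p_i > 0 with \sum_i p_i = 1,
% with probability at least 1 - \delta, simultaneously for all i,
L(h_i) \;\le\; \hat{L}_n(h_i) + \sqrt{\frac{\ln\!\big(1/(p_i\,\delta)\big)}{2n}} .

% Choosing p_i = 1/m treats all choices as equally plausible and recovers
L(h_i) \;\le\; \hat{L}_n(h_i) + \sqrt{\frac{\ln(m/\delta)}{2n}} .
```

The weighted form tightens the bound for choices given a large p_i at the price of loosening it for the others, which is the trade-off a data- or distribution-dependent weighting tries to exploit.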
ES2020-176
Compressive Learning of Generative Networks
Vincent Schellekens, Laurent Jacques
Abstract:
Generative networks implicitly approximate complex densities from their sampling with impressive accuracy. However, because of the enormous scale of modern datasets, this training process is often computationally expensive. We cast generative network training into the recent framework of compressive learning: we reduce the computational burden of large-scale datasets by first harshly compressing them in a single pass as a single sketch vector. We then propose a cost function, which approximates the Maximum Mean Discrepancy metric, but requires only this sketch, which makes it time- and memory-efficient to optimize.
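The sketching idea in the abstract above can be illustrated with a short Python example: the dataset is summarized in one pass by averaging random Fourier features, and a candidate model is scored by the squared distance between its sketch and the data sketch. This is only a sketch under assumptions. The Gaussian frequency distribution, the function names, and the use of a finite batch of generated samples are illustrative choices, not the authors' exact cost function.

```python
import cmath
import random


def draw_frequencies(m, dim, scale, rng):
    """Draw m random frequency vectors w_j ~ N(0, scale^2 I) (illustrative choice)."""
    return [[rng.gauss(0.0, scale) for _ in range(dim)] for _ in range(m)]


def sketch(samples, freqs):
    """One-pass average of the random Fourier features exp(i <w_j, x>)."""
    totals = [0j] * len(freqs)
    n = 0
    for x in samples:
        n += 1
        for j, w in enumerate(freqs):
            totals[j] += cmath.exp(1j * sum(wi * xi for wi, xi in zip(w, x)))
    return [t / n for t in totals]


def sketch_cost(z_data, z_model):
    """Mean squared distance between two sketches; a proxy for an MMD-type discrepancy."""
    return sum(abs(a - b) ** 2 for a, b in zip(z_data, z_model)) / len(z_data)
```

Once the data sketch is computed, the dataset itself is no longer needed: each optimization step only compares the fixed sketch vector against the sketch of freshly generated samples.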
ES2020-8
Learning Step Size Adaptation in Evolution Strategies
Oliver Kramer
Abstract:
Step size adaptation is an essential part of successful evolution strategies in continuous solution spaces, as it moderates between exploration and exploitation. We propose to learn step sizes evolved with a sigma-self-adaptive (1+lambda)-ES using LSTMs. A long short-term memory network (LSTM) is trained on input sequences of multi-variate distances between the best solutions of successive generations and their step sizes. The learned distance-step size pairs guide the search of the LSTM-ES, which is a (1+lambda)-ES with LSTM step size predictions. An experimental analysis illustrates the behavior of the LSTM-ES on the Sphere function with different parameter settings and problem dimensionalities.
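The baseline that produces the training sequences in the abstract above, a sigma-self-adaptive (1+lambda)-ES on the Sphere function, can be sketched in a few lines of Python. The log-normal step size mutation, the learning rate tau = 1/sqrt(dim), the initialization range, and all default parameter values are assumptions for this example. The LSTM that later predicts the step sizes is not part of this sketch.

```python
import math
import random


def sphere(x):
    """Sphere function: sum of squares, minimized at the origin."""
    return sum(v * v for v in x)


def one_plus_lambda_es(dim=5, lam=10, sigma=1.0, generations=200, seed=0):
    """Minimize the Sphere function with a sigma-self-adaptive (1+lambda)-ES."""
    rng = random.Random(seed)
    tau = 1.0 / math.sqrt(dim)  # learning rate of the log-normal step size mutation
    parent = [rng.uniform(-5.0, 5.0) for _ in range(dim)]
    parent_fit = sphere(parent)
    history = [parent_fit]
    for _ in range(generations):
        best = (parent_fit, parent, sigma)
        for _ in range(lam):
            # Mutate the step size first, then use it to mutate the solution.
            child_sigma = sigma * math.exp(tau * rng.gauss(0.0, 1.0))
            child = [v + child_sigma * rng.gauss(0.0, 1.0) for v in parent]
            fit = sphere(child)
            if fit <= best[0]:
                best = (fit, child, child_sigma)
        # Plus selection: the parent survives unless an offspring is at least as good.
        parent_fit, parent, sigma = best
        history.append(parent_fit)
    return parent, sigma, history
```

Logging the successive best solutions together with the surviving `sigma` values would yield the kind of distance and step size sequences that a sequence model could be trained on.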
Tensor Decompositions in Deep Learning - organized by Davide Bacciu (Università di Pisa, Italy), Danilo Mandic (Imperial College, United Kingdom)
ES2020-3
Tensor Decompositions in Deep Learning
Davide Bacciu, Danilo Mandic
Abstract:
The paper surveys the topic of tensor decompositions in modern machine learning applications. It focuses on three active research topics of significant relevance for the community. After a brief review of consolidated works on multi-way data analysis, we consider the use of tensor decompositions in compressing the parameter space of deep learning models. Lastly, we discuss how tensor methods can be leveraged to yield richer adaptive representations of complex data, including structured information. The paper concludes with a discussion on interesting open research challenges.
ES2020-144
Tensor Decompositions in Recursive Neural Networks for Tree-Structured Data
Daniele Castellana, Davide Bacciu
Abstract:
The paper introduces two new aggregation functions to encode structural knowledge from tree-structured data. They leverage the Canonical and Tensor-Train decompositions to yield expressive context aggregation while limiting the number of model parameters. Finally, we define two novel neural recursive models for trees leveraging such aggregation functions, and we test them on two tree classification tasks, showing the advantage of proposed models when tree outdegree increases.
ES2020-202
Mining Temporal Changes in Strengths and Weaknesses of Cricket Players Using Tensor Decomposition
Swarup Ranjan Behera, Vijaya Saradhi
Abstract:
In this work, we present an application of tensor decomposition to a discrete random variable tensor. In particular, we construct a tensor from cricket short text commentary data by employing domain-specific features. The aim is to understand the temporal changes in the strength rules and weakness rules of a player. Three-way correspondence analysis (TWCA) is employed to obtain the factors that show the dependency between batting features, bowling features, and time, respectively. Changes in strength rules and weakness rules for Australian batsman Steve Smith (Test Rank #1 ICC player) are presented.
Image and text analysis
ES2020-127
3D U-Net for Segmentation of Plant Root MRI Images in Super-Resolution
Yi Zhao, Nils Wandel, Magdalena Landl, Andrea Schnepf, Sven Behnke
Abstract:
Magnetic resonance imaging (MRI) enables plant scientists to non-invasively study root system development and root-soil interaction. However, challenging recording conditions, such as low resolution and a high level of noise, hamper the performance of traditional root extraction algorithms. We propose to increase signal-to-noise ratio and resolution by segmenting the scanned volumes into root and soil in super-resolution using a 3D U-Net. Tests on real data show that the trained network is capable of detecting most roots successfully and even finds roots that were missed by human annotators. Our experiments show that the segmentation performance can be further improved with modifications of the loss function.
ES2020-182
Respiratory Pattern Recognition from Low-Resolution Thermal Imaging
Salla Aario, Ajinkya Gorad, Miika Arvonen, Simo Sarkka
Abstract:
Remote monitoring of vital signs has a wide range of applications. In this paper we propose a method to identify respiratory patterns from low-resolution thermal video data using algorithms based on nearest neighbor data association (NNDA) and a nearest neighbor Kalman filter (NNKF), along with a multi-class support vector machine (SVM). The method is evaluated against breathing belt data, collected from healthy volunteers, as a reference. The correlation of the proposed method with the airflow derived from the breathing belt was found to be 0.7. The SVM classifier is able to distinguish between the breathing patterns from the derived airflow with 60% accuracy.
ES2020-193
Missing Image Data Imputation using Variational Autoencoders with Weighted Loss
Ricardo Cardoso Pereira, Joana Cristo Santos, José Pereira Amorim, Pedro Pereira Rodrigues, Pedro Henriques Abreu
Abstract:
Missing data is an issue often addressed with imputation strategies that replace the missing values with plausible ones. A trend in these strategies is the use of generative models, one being Variational Autoencoders. However, the default loss function of this method gives the same importance to all data, while a more suitable solution should focus on the missing values. In this work an extension of this method with a custom loss function is introduced (Variational Autoencoder with Weighted Loss). The method was compared with state-of-the-art generative models and the results showed improvements higher than 40% in several settings.
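The abstract does not state the form of the weighted loss. As a rough illustration only, the sketch below shows one way a reconstruction error could put more weight on missing entries than on observed ones. The function name, the weights `w_missing` and `w_observed`, and the assumption that missingness is simulated during training (so that true values are available at the masked positions) are all hypothetical. The KL term of a variational autoencoder is left out.

```python
import numpy as np

def weighted_reconstruction_loss(x, x_hat, missing_mask,
                                 w_missing=5.0, w_observed=1.0):
    """Weighted mean squared error with a larger weight on masked entries.

    Hypothetical sketch: the weights and the normalisation are assumptions,
    not the loss defined in the paper.
    """
    x = np.asarray(x, dtype=float)
    x_hat = np.asarray(x_hat, dtype=float)
    weights = np.where(np.asarray(missing_mask, dtype=bool),
                       w_missing, w_observed)
    # Normalise by the total weight so the result stays on the MSE scale
    return float(np.sum(weights * (x - x_hat) ** 2) / np.sum(weights))

# Toy example: only the last entry is flagged as missing and poorly imputed
x = [1.0, 2.0, 3.0, 4.0]
x_hat = [1.0, 2.0, 3.0, 6.0]
mask = [False, False, False, True]

weighted = weighted_reconstruction_loss(x, x_hat, mask)
uniform = weighted_reconstruction_loss(x, x_hat, mask, w_missing=1.0)
```

With equal weights the function reduces to the ordinary mean squared error, so the same error on a missing entry is penalised more strongly in the weighted variant.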
ES2020-37
Seq-to-NSeq model for multi-summary generation
Guillaume Le Berre, Christophe Cerisara
Abstract:
Summaries of texts and documents written by people present a high variability, depending on the information they want to focus on and their writing style. Despite recent progress in generative models and controllable text generation, automatic summarization systems are still relatively limited in their capacity to both generate various types of summaries and capture this variability from a corpus. We propose to address this challenge with a multi-decoder model for abstractive sentence summarization that generates several summaries from a single input text. This model is an extension of a sequence-to-sequence model in which multiple concurrent decoders with shared attention and embeddings are trained to generate different summaries that capture the variability of styles present in the corpus. The full model is trained jointly with an Expectation-Maximization algorithm. A first qualitative analysis of the resulting decoders reveals clusters that tend to be consistent with respect to a given style, e.g., passive vs. active voice.
ES2020-196
CNN Encoder to Reduce the Dimensionality of Data Image for Motion Planning
Janderson Ferreira, Agostinho Junior, Yves Mendes Galvao, Bruno Fernandes, Pablo Barros
Abstract:
Many real-world applications need path planning algorithms to solve tasks in different areas, such as social applications, autonomous cars, tracking activities and, most importantly, motion planning. Although path planning is sufficient in most motion planning scenarios, it can become a bottleneck in large environments with dynamic changes. To tackle this problem, the number of possible routes could be reduced to make it easier for path planning algorithms to find the shortest path with less effort. A traditional algorithm for path planning is A*, which uses a heuristic to work faster than other solutions. In this work, we propose a CNN encoder capable of eliminating useless routes for motion planning problems, and we combine the output of the proposed neural network with A*. To measure the efficiency of our solution, we propose a database with different scenarios of motion planning problems. The evaluated metric is the number of iterations needed to find the shortest path. Plain A* was compared with the proposed CNN encoder combined with A*. In all evaluated scenarios, our solution reduced the number of iterations by more than 60%.
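To make the evaluation metric concrete, the following is a minimal sketch of A* on a small 4-connected grid that counts node expansions as iterations. The grid, the `blocked` argument (a stand-in for routes that a learned model has ruled out) and all names are hypothetical. The CNN encoder itself is not reproduced here.

```python
import heapq

def astar(grid, start, goal, blocked=frozenset()):
    """A* on a 4-connected grid; returns (path_length, iterations).

    grid: list of strings, '#' marks an obstacle.
    blocked: extra cells to skip (hypothetical stand-in for pruned routes).
    """
    rows, cols = len(grid), len(grid[0])

    def h(cell):
        # Manhattan distance, admissible for unit-cost 4-connected moves
        return abs(cell[0] - goal[0]) + abs(cell[1] - goal[1])

    open_heap = [(h(start), 0, start)]
    best_g = {start: 0}
    iterations = 0
    while open_heap:
        _, g, cell = heapq.heappop(open_heap)
        if g > best_g.get(cell, float("inf")):
            continue  # stale heap entry, a cheaper route was found already
        iterations += 1
        if cell == goal:
            return g, iterations
        r, c = cell
        for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            if not (0 <= nr < rows and 0 <= nc < cols):
                continue
            if grid[nr][nc] == "#" or (nr, nc) in blocked:
                continue
            ng = g + 1
            if ng < best_g.get((nr, nc), float("inf")):
                best_g[(nr, nc)] = ng
                heapq.heappush(open_heap, (ng + h((nr, nc)), ng, (nr, nc)))
    return None, iterations

grid = [
    "....",
    ".##.",
    "....",
]
length, iters = astar(grid, (0, 0), (2, 3))
# Pretend a model has ruled out the route that starts by moving down
pruned_length, pruned_iters = astar(grid, (0, 0), (2, 3), blocked={(1, 0)})
```

In this toy grid both searches find a path of the same length, while the pruned search expands fewer nodes. How the pruned cells are chosen is the part the paper delegates to the CNN encoder.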
Learning from partially labeled data - organized by Siamak Mehrkanoon (Maastricht University, The Netherlands), Xiaolin Huang (Shanghai Jiao Tong University, China), Johan Suykens (KU Leuven, Belgium)
ES2020-1
Learning from partially labeled data
Siamak Mehrkanoon, Xiaolin Huang, Johan Suykens
Abstract:
Providing sufficient labeled training data in many application domains is a laborious and costly task. Designing models that can learn from partially labeled data, or that leverage labeled data in one domain and unlabeled data in a different but related domain, is of great interest in many applications. In particular, in this context one can refer to semi-supervised modelling, transfer learning, domain adaptation and multi-view learning, among others. There are several possibilities for designing such models, ranging from shallow to deep models. These types of models have received increasing interest due to their successful applications in real-life problems. This paper provides a brief overview of recent techniques in learning from partially labeled data.
ES2020-71
Zero-shot and few-shot time series forecasting with ordinal regression recurrent neural networks
Bernardo Pérez Orozco, Stephen J Roberts
Abstract:
Recurrent neural networks (RNNs) are state-of-the-art in several sequential learning tasks, but they often require considerable amounts of data to generalise well. For many time series forecasting (TSF) tasks, only a few dozen observations may be available at training time, which restricts the use of this class of models. We propose a novel RNN-based model that directly addresses this problem by learning a shared feature embedding over the space of many quantised time series. We show how this enables our RNN framework to accurately and reliably forecast unseen time series, even when there is little to no training data available.
ES2020-31
Domain Invariant Representations with Deep Spectral Alignment
Christoph Raab, Peter Meier, Frank-Michael Schleif
Abstract:
Like traditional algorithms, deep learning networks struggle to generalize across domain boundaries. A current solution is the simultaneous training of the classification model and the minimization of domain differences in the deep network. In this work, we propose a new unsupervised deep domain adaptation architecture, which trains a classifier and minimizes the difference of spectral properties of the covariance matrix of the data. Evaluated against standard architectures and datasets, the approach shows an alignment with respect to the data variance between related domains.
ES2020-120
Weighted Empirical Risk Minimization: Transfer Learning based on Importance Sampling
Robin Vogel, Mastane Achab, Stéphan Clémençon, Charles Tillier
Abstract:
We consider statistical learning problems, when the distribution $P'$ of the training observations $Z'_1,\; \ldots,\; Z'_n$ differs from the distribution $P$ involved in the risk one seeks to minimize (referred to as the \textit{test distribution}) but is still defined on the same measurable space as $P$ and dominates it. In the unrealistic case where the likelihood ratio $\Phi(z)=dP/dP'(z)$ is known, one may straightforwardly extend the Empirical Risk Minimization (ERM) approach to this specific \textit{transfer learning} setup using the same idea as that behind Importance Sampling, by minimizing a weighted version of the empirical risk functional computed from the 'biased' training data $Z'_i$ with weights $\Phi(Z'_i)$. Although the \textit{importance function} $\Phi(z)$ is generally unknown in practice, we show that, in various situations frequently encountered in practice, it takes a simple form and can be directly estimated from the $Z'_i$'s and some auxiliary information on the statistical population $P$. By means of linearization techniques, we then prove that the generalization capacity of the aforementioned approach is preserved when plugging the resulting estimates of the $\Phi(Z'_i)$'s into the weighted empirical risk. Beyond these theoretical guarantees, numerical results provide strong empirical evidence of the relevance of the approach promoted in this article.
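The weighted empirical risk described in the abstract can be illustrated with a small numerical sketch. It assumes the simplest setting, in which the likelihood ratio $\Phi$ is known exactly for two strata. The strata proportions, loss values and function name are hypothetical and chosen only so that the arithmetic is easy to follow.

```python
import numpy as np

def weighted_empirical_risk(losses, weights):
    """Average of per-sample losses, each scaled by its weight Phi(Z'_i)."""
    losses = np.asarray(losses, dtype=float)
    weights = np.asarray(weights, dtype=float)
    return float(np.mean(weights * losses))

# Hypothetical two-stratum population
p_test = np.array([0.5, 0.5])    # stratum proportions under P
p_train = np.array([0.9, 0.1])   # stratum proportions under P'
phi = p_test / p_train           # likelihood ratio dP/dP' per stratum

# Biased training sample: nine points from stratum 0, one from stratum 1
strata = np.array([0] * 9 + [1])
losses = np.where(strata == 0, 1.0, 3.0)   # per-sample loss of some fixed model

naive_risk = float(np.mean(losses))
weighted_risk = weighted_empirical_risk(losses, phi[strata])
target_risk = float(np.sum(p_test * np.array([1.0, 3.0])))
```

In this toy case the unweighted average is 1.2, whereas the weighted average matches the risk under $P$, which is 2.0. The paper's contribution concerns the realistic case where $\Phi$ has to be estimated, which this sketch does not cover.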
ES2020-59
Modelling human sound localization with deep neural networks.
Kiki van der Heijden, Siamak Mehrkanoon
Abstract:
How the brain transforms binaural, real-life sounds into a neural representation of sound location is unclear. This paper introduces a deep learning approach to address these neurocomputational mechanisms: we develop a biologically inspired deep neural network model of sound azimuth encoding operating on auditory nerve representations of real-life sounds. We explore two types of loss functions: Euclidean distance and angular distance. Our results show that a network resembling the early stages of the human auditory pathway can predict sound azimuth location. The type of loss function modulates spatial acuity in different ways. Finally, learning is independent of environment-specific acoustic properties.
ES2020-33
A Real-time PCB Defect Detector Based on Supervised and Semi-supervised Learning
Fan He, Sanli Tang, Siamak Mehrkanoon, Xiaolin Huang, Jie Yang
Abstract:
This paper designs a deep model to detect PCB defects from an input pair of a defect-free template and a defective tested image. A novel group pyramid pooling module is proposed to efficiently extract features in various resolutions to predict defects at different scales. To train the deep model, a dataset including 6 common types of PCB defects is established, namely DeepPCB, which contains 1,500 image pairs with annotations. In addition, a semi-supervised learning approach is examined to effectively utilize the unlabelled images for training the PCB defect detector. Experimental results validate the effectiveness and efficiency of the proposed model, which achieves 98.6% mAP @ 62 FPS on the DeepPCB dataset. DeepPCB is now available at: https://github.com/tangsanli5201/DeepPCB.
Machine learning in the pharmaceutical industry - organized by Paul Smyth (GlaxoSmithKline Tech Data & Analytics, Belgium), Thibault Helleputte (DNAlytics, Belgium), Gael de Lannoy (GlaxoSmithKline, CMC Statistical Sciences, Belgium)
ES2020-5
Machine learning in the biopharma industry
Gael de Lannoy, Thibault Helleputte, Paul Smyth
Abstract:
Modern high-throughput technologies deployed in R&D for new health products have opened the door to Machine Learning applications that allow the automation of tasks and support for data-driven risk-based decision making. Appealing opportunities for applying Machine Learning appear in the development of modern complex drugs, in the optimization of biomanufacturing production lines, or even in elaborating product portfolio strategies. Nevertheless, many practical challenges make it difficult to apply Machine Learning models in the biopharmaceutical field. Innovative approaches must thus be considered in many of these practical cases. This tutorial paper is an attempt to describe the landscape of Machine Learning applications in the biopharmaceutical industry along three dimensions: opportunities, specificities or constraints, and methods.
ES2020-122
Deep Learning to Detect Bacterial Colonies for the Production of Vaccines
Paul Smyth, Lee John, Gael de Lannoy, Thomas Beznik
Abstract:
During the development of vaccines, bacterial colony forming units (CFUs) are counted in order to quantify the yield in the fermentation process. This is often a manual task that is time-consuming and error-prone. In this work we test multiple segmentation algorithms based on the U-Net CNN architecture and show that these offer robust, automated CFU counting. We show that the multiclass generalisation with a bespoke loss function allows distinguishing virulent and avirulent colonies with acceptable accuracy. While many possibilities are left to explore, our results demonstrate the potential of deep learning for separating and classifying bacterial colonies.
ES2020-141
A Systematic Assessment of Deep Learning Models for Molecule Generation
Davide Rigoni, Nicolò Navarin, Alessandro Sperduti
Abstract:
In recent years the scientific community has devoted much effort to the development of deep learning models for the generation of new molecules with desirable properties (i.e. drugs). This has produced many proposals in the literature. However, a systematic comparison among the different VAE methods is still missing. For this reason, we propose an extensive testbed for the evaluation of generative models for drug discovery, and we present the results obtained by many of the models proposed in the literature.
ES2020-178
An agile machine learning project in pharma - developing a Mask R-CNN-based web application for bacterial colony counting
Paul Smyth, Tanguy Naets, Gael de Lannoy, Laurent Sorber
Abstract:
We present a web application to assist lab technicians with the counting of different types of bacterial colonies. We use a Mask R-CNN model trained and tuned specifically to detect the number of BVG- and BVG+ colonies. We achieve an mAP at IoU = 0.5 of 94%. With these encouraging results, we see opportunities to bring the benefits of improved accuracy and time saved to nearby problems and labs, such as generalising to other bacteria types and viral foci counting.
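The reported metric relies on the intersection over union (IoU) between a predicted and a reference mask: at an IoU threshold of 0.5, a detection is typically counted as correct only if its IoU with a ground-truth object reaches 0.5. The sketch below computes IoU for two binary masks. The masks and the function name are hypothetical, and the full mAP computation (ranking by confidence, averaging precision over recall levels and classes) is not shown.

```python
import numpy as np

def mask_iou(pred, truth):
    """Intersection over union of two binary masks of equal shape."""
    pred = np.asarray(pred, dtype=bool)
    truth = np.asarray(truth, dtype=bool)
    union = np.logical_or(pred, truth).sum()
    if union == 0:
        return 0.0  # both masks empty; treated as no overlap here
    return float(np.logical_and(pred, truth).sum() / union)

# Toy 2x3 masks: 2 overlapping pixels, 6 pixels in the union
pred = [[1, 1, 0],
        [1, 1, 0]]
truth = [[0, 1, 1],
         [0, 1, 1]]

iou = mask_iou(pred, truth)
is_match = iou >= 0.5
```

Here the overlap is one third of the union, so this prediction would not count as a match at the 0.5 threshold.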
Frontiers in Reservoir Computing - organized by Claudio Gallicchio (University of Pisa, Italy), Mantas Lukosevicius (Kaunas University of Technology, Lithuania), Simone Scardapane (Sapienza University of Rome, Italy)
ES2020-7
Frontiers in Reservoir Computing
Claudio Gallicchio, Mantas Lukoševičius, Simone Scardapane
Abstract:
Reservoir computing (RC) studies the properties of large recurrent networks of artificial neurons, with either fixed or random connectivity. In recent years, reservoirs have become a key tool for pattern recognition and neuroscience problems, being able to develop a rich representation of the temporal information even if left untrained. The common paradigm has been instantiated into several models, among which the Echo State Network and the Liquid State Machine are the most widely known. Nowadays, RC represents the de facto state-of-the-art approach for efficient learning in the temporal domain. Moreover, theoretical studies in the RC area can contribute to the broader field of Recurrent Neural Networks research by enabling a deeper understanding of the fundamental capabilities of dynamical recurrent models, even in the absence of training of the recurrent connections. The RC paradigm also allows using different dynamical systems, including hardware, for computation. This paper is intended to give an overview of the RC research field, highlighting major frontiers in its development and finally introducing the contributed papers to the ESANN 2020 special session.
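As a rough illustration of the paradigm, the following sketch builds a small Echo State Network in which the input and recurrent weights are random and stay fixed, and only a linear readout is fitted by ridge regression. The reservoir size, spectral radius, washout length, ridge coefficient and the sine-wave task are all assumptions chosen for the example, not values taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

n_res = 50                                   # reservoir size (assumed)
W_in = rng.uniform(-0.5, 0.5, size=n_res)    # fixed random input weights
W = rng.uniform(-0.5, 0.5, size=(n_res, n_res))
# Rescale the fixed recurrent weights to a spectral radius of 0.9
W *= 0.9 / np.max(np.abs(np.linalg.eigvals(W)))

def run_reservoir(u):
    """Collect the reservoir states driven by a 1-D input sequence u."""
    x = np.zeros(n_res)
    states = []
    for u_t in u:
        x = np.tanh(W_in * u_t + W @ x)      # untrained recurrent update
        states.append(x)
    return np.array(states)

# Toy task: predict the next value of a sine wave
signal = np.sin(0.2 * np.arange(400))
u, y = signal[:-1], signal[1:]

washout = 50                                 # discard the initial transient
X = run_reservoir(u)[washout:]
y = y[washout:]

# Only the linear readout is trained, here by ridge regression
ridge = 1e-4
W_out = np.linalg.solve(X.T @ X + ridge * np.eye(n_res), X.T @ y)
mse = float(np.mean((X @ W_out - y) ** 2))
```

Because training reduces to a single linear solve, fitting is cheap compared with backpropagation through time, which is the efficiency argument made in the abstract.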
ES2020-82
Reservoir memory machines
Benjamin Paassen, Alexander Schulz
Abstract:
In recent years, Neural Turing Machines have gathered attention by joining the flexibility of neural networks with the computational capabilities of Turing machines. However, Neural Turing Machines are notoriously hard to train, which limits their applicability. We propose reservoir memory machines, which are still able to solve some of the benchmark tests for Neural Turing Machines, but are much faster to train, requiring only an alignment algorithm and linear regression. Our model can also be seen as an extension of echo state networks with an external memory, enabling arbitrarily long storage without interference.
ES2020-54
Pyramidal Graph Echo State Networks
Filippo Maria Bianchi, Claudio Gallicchio, Alessio Micheli
Abstract:
We analyze graph neural network models that combine iterative message-passing implemented by a function with untrained weights and graph pooling operations. In particular, we alternate randomized neural message passing with graph coarsening operations, which provide multiple views of the underlying graph. The views are concatenated to build a graph embedding for graph-level classification. The main advantage of the proposed architecture is its speed, further improved by the pooling, in computing graph-level representations. Results obtained on popular graph classification benchmarks, comparing different topological pooling techniques, support our claim.
ES2020-112
Simplifying Deep Reservoir Architectures
Claudio Gallicchio, Alessio Micheli, Antonio Sisbarra
Abstract:
We study the impact of architectural simplifications to the design of deep Reservoir Computing (RC) models. To do so, we analyze the effects of shaping the structure of reservoir matrices, reducing the complexity of the deep recurrent network to a minimal setup. Experimental results point out the benefits of a particularly simple deep RC architecture with ring topology in each reservoir layer and deterministically constructed input and inter-reservoir connections.
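As an illustration of the ring topology mentioned in the abstract above, the following minimal sketch builds a single reservoir layer in which each unit feeds only its successor. It is not the authors' implementation: the function names, the alternating-sign input weights, and the values 0.9 and 0.5 are assumptions chosen for the example, and only one layer is shown rather than a deep stack.

```python
import math


def ring_reservoir(n_units, weight):
    """Return an n_units x n_units matrix where unit i feeds unit (i + 1) % n_units."""
    w = [[0.0] * n_units for _ in range(n_units)]
    for i in range(n_units):
        w[(i + 1) % n_units][i] = weight
    return w


def deterministic_input_weights(n_units, scale):
    """Hypothetical deterministic scheme: equal magnitude, alternating sign."""
    return [scale if i % 2 == 0 else -scale for i in range(n_units)]


def run_reservoir(w, w_in, inputs):
    """Drive the untrained reservoir with a scalar sequence and collect its states."""
    n = len(w)
    state = [0.0] * n
    states = []
    for u in inputs:
        # x(t) = tanh(w_in * u(t) + W x(t - 1)); the recurrent weights are never trained
        state = [
            math.tanh(w_in[i] * u + sum(w[i][j] * state[j] for j in range(n)))
            for i in range(n)
        ]
        states.append(state)
    return states


W = ring_reservoir(5, 0.9)
W_IN = deterministic_input_weights(5, 0.5)
STATES = run_reservoir(W, W_IN, [1.0, 0.0, 0.0, 0.0])
```

In the usual RC setting, only a linear readout on the collected states would be trained; the ring matrix and input weights stay fixed.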
ES2020-95
Self-organized dynamic attractors in recurrent neural networks
Benedikt Vettelschoss, Matthias Freiberger, Joni Dambre
Abstract:
Recurrent neural networks usually rely on either transient or attractor dynamics to implement working memory, and some studies suggest that it requires a combination of the two. These studies introduce attractor states by the supervised training of a network's feedback weights. In this work we report the creation of comparable memory states through unsupervised learning. We introduce attractor dynamics into an echo state network in a self-organized way by applying a differential Hebbian rule to its feedback weights. We find that this yields periodic and quasiperiodic attractors in most cases. We analyse the linearized system after the learning phase to understand the origin of these attractors, and connect these findings to other results concerning the dynamical changes induced by neural plasticity.
ES2020-99
Self-Organizing Kernel-based Convolutional Echo State Network for Human Actions Recognition
Gin Chong Lee, Chu Kiong Loo, Wei Shiung Liew, Stefan Wermter
Abstract:
We propose a deterministic initialization of the Echo State Network reservoirs to ensure that the activation of its internal echo state representations reflects topological qualities similar to those of the input signal, which should lead to a self-organizing reservoir. Human actions encoded as a multivariate time series signal are clustered before using the clustered nodes and interconnectivity matrices for initializing the S-ConvESN reservoirs. The capability of S-ConvESN is evaluated using several 3D-skeleton-based action recognition datasets.
Language processing in the era of deep learning - organized by Ivano Lauriola (University of Padova, Italy), Alberto Lavelli (Fondazione Bruno Kessler, Italy), Fabio Aiolli (University of Padova, Italy)
ES2020-4
Language processing in the era of deep learning
Ivano Lauriola, Alberto Lavelli, Fabio Aiolli
Abstract:
Natural Language Processing is a branch of artificial intelligence brimful of intricate, sophisticated, and challenging tasks, such as machine translation, question answering, summarization, and so on. Thanks to the recent advances of deep learning, NLP applications have received an unprecedented boost in performance, generating growing interest from the Machine Learning community. However, even if recent techniques are starting to reach excellent performance on various tasks, there are still several problems that need to be solved, such as the computational cost, the reproducibility of results, and the lack of interpretability. In this contribution, we provide a high-level overview of recent advances in NLP, the role of Machine Learning, and current research directions.
ES2020-53
Modular Length Control for Sentence Generation
Katya Kudashkina, Peter Wittek, Jamie Kiros, Graham W. Taylor
Abstract:
Generating summary-sentences with preserved meaning is important for the summarization of longer documents. Length control of summary-sentences is challenging as sentences cannot simply be cut at the desired length; they must be complete and preserve input meaning. We propose a modular framework for length control of generated sentences, based on sequence-to-sequence models and powered by a two-stage training process involving a summarizer that is trained without explicit length control and a stylizer that is fine-tuned on the output of the summarizer. Our solution achieves the performance of existing models for controlling generated sentence length but is light in implementation and model complexity.
ES2020-56
Entity-Pair Embeddings for Improving Relation Extraction in the Biomedical Domain
Farrokh Mehryary, Hans Moen, Tapio Salakoski, Filip Ginter
Abstract:
We introduce a new approach for training named-entity pair embeddings to improve relation extraction performance in the biomedical domain. These embeddings are trained in an unsupervised manner, based on the principles of distributional semantics. By adding them to neural network architectures, we show that improved F-scores are achieved. Our best performing neural model, which utilizes entity-pair embeddings along with a pre-trained BERT encoder, achieves an F-score of 77.19 on the CHEMPROT (Chemical-Protein) relation extraction corpus, setting a new state-of-the-art result for the task.
ES2020-190
Adversarials-1 in Speech Recognition: Detection and Defence
Nils Worzyk, Stefan Niewerth, Oliver Kramer
Abstract:
Systems that accept voice commands have become established in our daily lives. To process those commands, modern systems usually use neural networks, which have been shown to be very successful. Nevertheless, they are vulnerable to adversarial attacks, i.e., slightly perturbed inputs that fool the system but are not recognizable by humans. In this work we extend the adversarial⁻¹ concept, introduced in the image domain, to the speech recognition domain. By adapting the methodology we are able to identify adversarial inputs, in certain cases, with an accuracy of 99.9%, while still detecting benign inputs with an accuracy of 99.8%, for the investigated attacks. Furthermore, we present a technique to restore the correct label of an adversarial input, with up to 67.6% accuracy. All program code for this work can be found at https://github.com/OLStefan/Adversarials-1Speech-Recognition.
ES2020-35
On the long-term learning ability of LSTM LMs
Wim Boes, Robbe Van Rompaey, Lyan Verwimp, Joris Pelemans, Hugo Van hamme, Patrick Wambacq
Abstract:
We inspect the long-term learning ability of Long Short-Term Memory language models (LSTM LMs) by evaluating a contextual extension based on the Continuous Bag-of-Words (CBOW) model for both sentence- and discourse-level LSTM LMs and by analyzing its performance. We evaluate on text and speech. Sentence-level models using the long-term contextual module perform comparably to vanilla discourse-level LSTM LMs. On the other hand, the extension does not provide gains for discourse-level models. These findings indicate that discourse-level LSTM LMs already rely on contextual information to perform long-term learning.
ES2020-36
Cross-Encoded Meta Embedding towards Transfer Learning
György Kovács, Rickard Brännvall, Johan Öhman, Marcus Liwicki
Abstract:
In this paper we generate word meta-embeddings from already existing embeddings using cross-encoding. Previous approaches can only work with words that exist in each source embedding, while the architecture presented here drops this requirement. We demonstrate the method using two pre-trained embeddings, namely GloVe and FastText. Furthermore, we propose additional improvements to the training process of the meta-embedding. Results on six standard tests for word similarity show that the trained meta-embedding outperforms the original embeddings. Moreover, this performance can be further increased with the proposed improvements, resulting in performance competitive with results reported earlier.
ES2020-104
Exploring the feature space of character-level embeddings
Ivano Lauriola, Stefano Campese, Alberto Lavelli, Fabio Rinaldi, Fabio Aiolli
Abstract:
Recently, character-level embeddings have become popular in the Natural Language Processing community. These methods provide a representation of a word which depends solely on its inner structure, i.e. the sequence of characters. Convolutional and recurrent neural networks are the undisputed protagonists in this context, and they represent the state of the art for many character-level applications. In this work, we firstly compare different neural architectures against adaptive string kernels in simplified scenarios. Then, we propose a hybrid ensemble that injects structural kernel-based features into a neural architecture, providing an efficient and scalable solution. An all-around experimental assessment has been carried out on several string datasets, including biomedical entity recognition and sentiment analysis.
Supervised learning
ES2020-170
Detection of elementary particles with the WiSARD n-tuple classifier
Pedro Xavier, Massimo De Gregorio, Felipe França, Priscila Lima
Abstract:
This work presents a weightless neural network model that learns multiple elementary particle collision phenomena. Having the ATLAS Higgs Boson Machine Learning Challenge as the target dataset, a couple of abstractions were developed in order to achieve a fast and simple algorithm that would otherwise require much more sophisticated tools. Experimental results over the Higgs Boson t - t decay and the B+ meson decay show that the WiSARD n-tuple classifier provides a generic and lightweight method for studying a broad range of particle decay modes.
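For readers unfamiliar with weightless networks, the sketch below shows a generic WiSARD n-tuple classifier on binary inputs: a fixed pseudo-random mapping splits the input into n-tuples, each tuple addresses one RAM node, training stores the addressed locations, and classification picks the discriminator with the most RAM hits. It illustrates the general model only; the class and method names are hypothetical, and the abstractions developed in the paper for the ATLAS data are not reproduced.

```python
import random


class Discriminator:
    """One class discriminator: a list of RAM nodes, each addressed by an n-tuple of bits."""

    def __init__(self, mapping, tuple_size):
        self.mapping = mapping
        self.tuple_size = tuple_size
        # Each RAM is modelled as the set of addresses seen during training.
        self.rams = [set() for _ in range(len(mapping) // tuple_size)]

    def _addresses(self, bits):
        for r in range(len(self.rams)):
            idx = self.mapping[r * self.tuple_size:(r + 1) * self.tuple_size]
            yield r, tuple(bits[i] for i in idx)

    def train(self, bits):
        for r, addr in self._addresses(bits):
            self.rams[r].add(addr)

    def score(self, bits):
        """Number of RAM nodes that recognise their part of the input."""
        return sum(addr in self.rams[r] for r, addr in self._addresses(bits))


class Wisard:
    def __init__(self, input_size, tuple_size, seed=0):
        # Assumes input_size is a multiple of tuple_size; leftover bits would be ignored.
        self.mapping = list(range(input_size))
        random.Random(seed).shuffle(self.mapping)
        self.tuple_size = tuple_size
        self.discriminators = {}

    def train(self, bits, label):
        if label not in self.discriminators:
            self.discriminators[label] = Discriminator(self.mapping, self.tuple_size)
        self.discriminators[label].train(bits)

    def classify(self, bits):
        return max(self.discriminators, key=lambda lab: self.discriminators[lab].score(bits))


model = Wisard(input_size=8, tuple_size=2)
model.train([1, 1, 1, 1, 0, 0, 0, 0], "a")
model.train([0, 0, 0, 0, 1, 1, 1, 1], "b")
noisy_prediction = model.classify([1, 1, 1, 0, 0, 0, 0, 0])
```

Training is a single pass of memory writes with no weights or gradients, which is what makes this family of models lightweight.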
ES2020-171
Automatic Pain Intensity Recognition: Training Set Selection based on Outliers and Centroids
Peter Bellmann, Patrick Thiam, Friedhelm Schwenker
Abstract:
In this study, we evaluate a person-independent pain intensity recognition task, based on the BioVid Heat Pain Database. Previous works show that for such classification tasks, the overall performance can be increased by reducing the training data, based on certain criteria, such as different distance measures. This results in considering only a certain number of participants from the training set, whose data distributions are defined to be the most similar to the data distribution of the participant from the test set. Counterintuitively, we propose to remove participants who are identified as central points from the training set, completely independently of the test set. Our evaluations show that this approach can lead to significant improvement of classification accuracy.
ES2020-194
Binary and Multi-label Defect Classification of Printed Circuit Board based on Transfer Learning
George Azevedo, Leandro Silva, Agostinho Junior, Bruno Fernandes, Sérgio Oliveira
Abstract:
Automatic optical inspection for printed circuit boards (PCBs) is an important step to assure quality control in electronic manufacturing. Recently, deep learning models have been used to detect and classify PCB defects. Since public PCB datasets usually are not large enough to train deep models from scratch, transfer learning has proved to be an effective strategy to overcome this limitation. In this paper we evaluate the influence of input image size for non-referential binary classification of PCB images from the DeepPCB dataset and, moving further, we evaluate a multi-label classification, both based on transfer learning. The best models achieved 99.5% accuracy for binary classification and a mean accuracy of 95.16% for multi-label classification.
ES2020-143
SDOstream: Low-Density Models for Streaming Outlier Detection
Alexander Hartl, Félix Iglesias, Tanja Zseby
Abstract:
Data commonly changes over time. Algorithms for anomaly detection must therefore be adapted to overcome the challenges of evolving data. We present SDOstream, a distance-based outlier detection algorithm for stream data that uses low-density models, therefore operating in linear time and avoiding the limitations of sliding windows and instance-based methods. SDOstream is designed to ensure a good integration in applications, hence the definition of "outlier" is not predetermined, but can be decided by the application based on distances to representative point locations. We describe the algorithm and evaluate its performance on several datasets.
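The idea of scoring a point by its distances to a small set of representative locations can be sketched as follows. This is a simplified, static illustration and not SDOstream itself: the function name, the use of the median over the k closest representatives, and the sample coordinates are assumptions, and the streaming model update is omitted.

```python
import math
import statistics


def outlier_score(point, representatives, k=3):
    """Median Euclidean distance from `point` to its k closest representatives."""
    distances = sorted(math.dist(point, r) for r in representatives)
    return statistics.median(distances[:k])


# Hypothetical low-density model: four representative locations of the "normal" data.
REPRESENTATIVES = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]

NEAR = outlier_score((0.5, 0.5), REPRESENTATIVES)
FAR = outlier_score((10.0, 10.0), REPRESENTATIVES)
```

The score is left as a raw distance so that the application can decide what counts as an outlier, for instance by thresholding; in this example FAR is much larger than NEAR.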
ES2020-89
Locally Adaptive Nearest Neighbors
Jan Philip Göpfert, Heiko Wersing, Barbara Hammer
Abstract:
When training automated systems, it has been shown to be beneficial to adapt the representation of data by learning a problem-specific metric. This metric is global. We extend this idea and, for the widely used family of k nearest neighbors algorithms, develop a method that allows learning locally adaptive metrics. To demonstrate important aspects of how our approach works, we conduct a number of experiments on synthetic data sets, and we show its usefulness on real-world benchmark data sets.
ES2020-106
Equilibrium Propagation for Complete Directed Neural Networks
Matilde Tristany Farinha, Sérgio Pequito, Pedro A. Santos, Mário Figueiredo
Abstract:
Artificial neural networks, one of the most successful approaches to supervised learning, were originally inspired by their biological counterparts. However, the most successful learning algorithm for artificial neural networks, backpropagation, is considered biologically implausible. We contribute to the topic of biologically plausible neuronal learning by building upon and extending the equilibrium propagation learning framework. Specifically, we introduce: a new neuronal dynamics and learning rule for arbitrary network architectures; a sparsity-inducing method able to prune irrelevant connections; a dynamical-systems characterization of the models, using Lyapunov theory.
ES2020-129
On-edge adaptive acoustic models: an application to acoustic person presence detection
Lode Vuegen, Peter Karsmakers
Abstract:
This paper validates a machine learning framework that enables processing on resource-limited devices. The discussed framework allows both inference and learning to be executed on the edge. More specifically, a Least-Squares Support Vector Machine (LS-SVM) framework with a time-recursive learning algorithm is evaluated in an application where person presence is estimated based on acoustic signals only. For this purpose, a real-life acoustical dataset of 555 hours was collected in an office environment for the evaluation of the proposed on-edge machine learning framework.
ES2020-111
Gaussian process regression for the estimation of stable univariate time-series processes
Georgios Birpoutsoukis, Julien M. Hendrickx
Abstract:
In this paper, estimation of AutoRegressive (AR) and AutoRegressive Moving Average (ARMA) models is proposed in a Bayesian framework using a Gaussian Process Regression (GPR) approach. Impulse response properties of the underlying process to be modeled are exploited during the parameter estimation. As such, models of enhanced predictability can be consistently obtained, even in the case of large model orders. It is also proved that the proposed approach is strongly linked with the Prediction Error (PE) model estimation approaches, if the estimated parameters are regularized. Simulations are provided to illustrate the efficiency of the proposed approach.
ES2020-181
Problem Transformation Methods with Distance-Based Learning for Multi-Target Regression
Joonas Hämäläinen, Tommi Kärkkäinen
Abstract:
Multi-target regression is a special subset of supervised machine learning problems. Problem transformation methods are used in the field to improve the performance of basic methods. The purpose of this article is to test the use of recently popularized distance-based methods, the minimal learning machine (MLM) and the extreme minimal learning machine (EMLM), in problem transformation. The main advantage of the full data variants of these methods is the lack of any meta-parameter. The experimental results for the MLM and EMLM show promising potential, emphasizing the utility of the problem transformation especially with the EMLM.
ES2020-117
Adapting Random Forests to Cope with Heavily Censored Datasets in Survival Analysis
Tossapol Pomsuwan, Alex Freitas
Abstract:
We address a survival analysis task where the goal is to predict the time passed until a subject is diagnosed with an age-related disease. The main challenge is that subjects’ data are very often censored, i.e., their time to diagnosis is only partly known. We propose a new Random Forest variant to cope with censored data, and evaluate it in experiments predicting the time to diagnosis of 8 age-related diseases, for data from the English Longitudinal Study of Ageing (ELSA) database. In these experiments, the proposed Random Forest variant, in general, outperformed a well-known Random Forest variant for censored data.
ES2020-15
Model Variance for Extreme Learning Machine
Fabian Guignard, Mohamed Laib, Mikhail Kanevski
Abstract:
We derived theoretical formulas for the variance of an extreme learning machine ensemble in the general case of heteroskedastic noise. They provide a decomposition of the variance that helps in understanding how the different sources of randomness contribute. Applying the proposed method to simulated datasets shows the effectiveness of the newly introduced estimates in replicating the expected variance behaviours.
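For readers unfamiliar with the model, the following is a minimal sketch of an extreme learning machine ensemble, not the authors' variance formulas. It assumes the usual ELM construction (random, fixed hidden-layer weights with output weights fitted by least squares); the names `fit_elm` and `predict_elm`, the tanh activation, the toy data, and all sizes are illustrative assumptions. It measures only the empirical spread across ensemble members caused by random initialization on one fixed dataset, which is one of the sources of randomness the abstract refers to.

```python
import numpy as np

def fit_elm(X, y, n_hidden, rng):
    """Fit one ELM: random fixed hidden layer, least-squares output weights."""
    W = rng.normal(size=(X.shape[1], n_hidden))   # random input weights (never trained)
    b = rng.normal(size=n_hidden)                 # random biases (never trained)
    H = np.tanh(X @ W + b)                        # hidden-layer activations
    beta, *_ = np.linalg.lstsq(H, y, rcond=None)  # only the output weights are fitted
    return W, b, beta

def predict_elm(model, X):
    W, b, beta = model
    return np.tanh(X @ W + b) @ beta

rng = np.random.default_rng(0)
X = np.linspace(-1.0, 1.0, 200).reshape(-1, 1)

# Toy heteroskedastic noise: the standard deviation grows with |x| (assumption).
noise_std = 0.05 + 0.2 * np.abs(X[:, 0])
y = np.sin(3.0 * X[:, 0]) + rng.normal(scale=noise_std)

# Ensemble of 10 ELMs that differ only in their random hidden layers.
models = [fit_elm(X, y, n_hidden=20, rng=rng) for _ in range(10)]
preds = np.stack([predict_elm(m, X) for m in models])   # shape (10, 200)

ensemble_mean = preds.mean(axis=0)                # ensemble prediction
member_variance = preds.var(axis=0, ddof=1)       # spread due to random initialization
```

Repeating the experiment over freshly drawn noise as well would expose the noise-induced part of the variance, which this sketch deliberately holds fixed.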
ES2020-72
Multi-Directional Laplacian Pyramids for Completion of Missing Data Entries
Neta Rabin
Abstract:
A common pre-processing task in machine learning is handling missing data entries, also known as imputation. Standard techniques use mean values, regression, or optimization-based techniques to predict the missing data values. In this paper, a kernel-based technique is used to impute data in a multi-scale manner. The construction is based on Laplacian pyramids, which operate on the row and column spaces of the data at several scales. Experimental results demonstrate the approach on publicly available datasets, and highlight its simple computational construction and convergence stability.
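The multi-scale idea can be illustrated with a basic, single-direction Laplacian pyramid on a 1-D signal. This is a minimal sketch under stated assumptions, not the multi-directional row/column construction of the paper: at each level a row-normalized Gaussian kernel with a halved bandwidth smooths the residual left by the coarser levels, and the same kernel extends that correction to the missing positions. The function names, the bandwidth schedule `sigma0 / 2**level`, and the toy signal are all illustrative choices.

```python
import numpy as np

def normalized_kernel(A, B, sigma):
    """Row-normalized Gaussian kernel between point sets A (m, d) and B (n, d)."""
    sq_dist = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=2)
    K = np.exp(-sq_dist / sigma)
    return K / K.sum(axis=1, keepdims=True)

def laplacian_pyramid_fit_predict(X_obs, f_obs, X_new, sigma0=1.0, levels=8):
    """Approximate f at X_new from observed pairs (X_obs, f_obs), coarse to fine."""
    approx_obs = np.zeros_like(f_obs, dtype=float)   # running approximation at observed points
    approx_new = np.zeros(len(X_new))                # running approximation at missing points
    for level in range(levels):
        sigma = sigma0 / 2 ** level                  # finer scale at every level
        residual = f_obs - approx_obs                # what the coarser levels missed
        approx_obs = approx_obs + normalized_kernel(X_obs, X_obs, sigma) @ residual
        approx_new = approx_new + normalized_kernel(X_new, X_obs, sigma) @ residual
    return approx_new

# Toy example: a sine wave with every fifth sample missing.
x = np.linspace(0.0, 1.0, 50)
signal = np.sin(2.0 * np.pi * x)
missing = np.zeros(50, dtype=bool)
missing[3::5] = True

X_obs = x[~missing].reshape(-1, 1)
X_new = x[missing].reshape(-1, 1)
imputed = laplacian_pyramid_fit_predict(X_obs, signal[~missing], X_new)
```

Because the kernel rows are normalized, each level's correction is a weighted average of the current residuals, so the update stays bounded as the bandwidth shrinks.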
ES2020-61
Navigational Freespace Detection for Autonomous Driving in Fixed Routes
Aparajit Narayan, Elio Tuci, William Sachiti, Aaron Parsons
Abstract:
Vision-based modules are widely used by autonomous vehicles to identify the road area and to avoid collisions with other vehicles, pedestrians, etc. This paper presents the results of a comparative study in which eight different vision-based modules are evaluated for detecting free navigational space in urban environments. All modules are implemented using Convolutional Neural Networks. The distinctive and innovative feature of these modules is the way in which the navigational freespace is identified in the camera image. The modules generate the coordinates of a triangle whose area represents the navigational freespace. The position of the triangle's top corner relative to the image centre points toward the vehicle's direction of motion. Thus, when trained on a fixed route, these modules are able to successfully detect the road freespace and to make appropriate decisions about where to go at roundabouts, intersections, etc., in order to reach the final destination.
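The triangle representation lends itself to a short geometric illustration. The sketch below is hypothetical: the abstract does not specify how the triangle is encoded or how a steering decision is derived from it, so the function names, the pixel coordinates, and the normalization to [-1, 1] are assumptions made only to show the two quantities the abstract mentions (the triangle's area and the position of its top corner relative to the image centre).

```python
def triangle_area(p1, p2, p3):
    """Area of a triangle given three (x, y) pixel coordinates (shoelace formula)."""
    (x1, y1), (x2, y2), (x3, y3) = p1, p2, p3
    return abs(x1 * (y2 - y3) + x2 * (y3 - y1) + x3 * (y1 - y2)) / 2.0

def apex_offset(apex, image_width):
    """Horizontal offset of the apex from the image centre, scaled to [-1, 1].

    Negative values mean the apex lies left of centre, positive values right.
    """
    centre_x = image_width / 2.0
    return (apex[0] - centre_x) / centre_x

# Hypothetical module output for a 640x480 frame: two base corners and one apex.
left, right, apex = (100, 480), (540, 480), (400, 200)

area = triangle_area(left, right, apex)       # 61600.0 square pixels
offset = apex_offset(apex, image_width=640)   # 0.25, i.e. apex right of centre
```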
ES2020-34
Similarities between policy gradient methods in reinforcement and supervised learning
Eric Benhamou, David Saltiel
Abstract:
Reinforcement learning (RL) is about sequential decision making and is traditionally contrasted with supervised learning (SL) and unsupervised learning (USL). In RL, given the current state, the agent makes a decision that may influence the next state, as opposed to SL, where the next state remains the same regardless of the decisions taken. Although this difference is fundamental, SL and RL are not so different. In particular, we emphasize in this paper that policy gradient methods can be cast as a supervised learning problem where true labels are replaced with discounted rewards. We provide a simple experiment where we interchange labels and pseudo-rewards to show that SL techniques can be directly translated into RL methods.
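The correspondence described in the abstract can be sketched in a few lines. This is a minimal illustration, not the authors' experiment: it assumes a softmax policy over discrete actions, and the function names, the discount factor `gamma`, and the toy numbers are hypothetical. With all weights equal to one and true labels as targets, the loss is the ordinary supervised cross-entropy; with the actions taken as targets and discounted returns as weights, the same expression is the REINFORCE-style policy gradient surrogate.

```python
import numpy as np

def discounted_returns(rewards, gamma=0.99):
    """Return G_t = r_t + gamma * r_{t+1} + gamma^2 * r_{t+2} + ... for every step t."""
    returns = np.zeros(len(rewards), dtype=float)
    running = 0.0
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns

def log_softmax(logits):
    """Numerically stable log-softmax over the last axis of a 2-D array."""
    shifted = logits - logits.max(axis=1, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))

def weighted_cross_entropy(logits, targets, weights):
    """Mean over samples of -weight_i * log p(target_i)."""
    log_probs = log_softmax(logits)
    picked = log_probs[np.arange(len(targets)), targets]
    return -(weights * picked).mean()

# Toy episode of three steps with two possible actions (all numbers are made up).
logits = np.array([[2.0, 0.5], [0.1, 1.2], [0.3, 0.3]])   # policy network outputs
actions = np.array([0, 1, 0])                             # actions taken / class labels
rewards = [0.0, 0.0, 1.0]                                 # reward arrives at the last step

# Supervised view: targets are true labels, every sample has weight 1.
sl_loss = weighted_cross_entropy(logits, actions, np.ones(3))

# RL view: same loss, but each sample is weighted by its discounted return.
rl_loss = weighted_cross_entropy(logits, actions, discounted_returns(rewards, gamma=0.5))
```

The only difference between the two calls is the weight vector, which is the sense in which the supervised technique carries over to the RL setting in this sketch.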