Online event (Bruges, Belgium), October 6-8, 2021
Content of the proceedings
Federated Learning – Methods, Applications and Beyond
Evaluation metrics, and concept drift
Deep learning for graphs
Deep learning and image processing
Machine Learning for Measuring and Analyzing Online Social Communications
Natural language processing
Recurrent learning, and reinforcement learning
Complex Data: Learning Trustworthily, Automatically, and with Guarantees
Model selection
Unsupervised learning
Machine learning and data mining for urban mobility intelligence
Supervised learning
Interpretable Models in Machine Learning and Explainable Artificial Intelligence
Time series and signal processing
Classification
Federated Learning – Methods, Applications and Beyond
Federated Learning - Methods, Applications and beyond
Moritz Heusinger, Christoph Raab, Fabrice Rossi, Frank-Michael Schleif
https://doi.org/10.14428/esann/2021.ES2021-4
Abstract:
In recent years the applications of machine learning models have increased rapidly, due to the large amount of available data and technological progress. While some domains like web analysis can benefit from this with only minor restrictions, other fields, like medicine with its patient data, are more strongly regulated. In particular, data privacy plays an important role, as recently highlighted by the trustworthy AI initiative of the EU and general privacy regulations in legislation. Another major challenge is that the required training data is often distributed in terms of features or samples and unavailable for classical batch learning approaches. In 2016 Google introduced a framework called Federated Learning to solve both of these problems. We provide a brief overview of existing methods and applications in the field of vertical and horizontal Federated Learning, as well as Federated Transfer Learning.
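The canonical instantiation of horizontal federated learning is Google's FedAvg: each client trains on its local data and a server averages the resulting parameters, weighted by client data size. As a reading aid for this overview, here is a minimal NumPy sketch of one FedAvg round; the linear model, learning rate and toy data are illustrative assumptions, not details taken from the paper.

```python
import numpy as np

def local_sgd(w, X, y, lr=0.1, epochs=5):
    """One client's local training: plain SGD on a linear least-squares model."""
    w = w.copy()
    for _ in range(epochs):
        grad = 2 * X.T @ (X @ w - y) / len(y)
        w -= lr * grad
    return w

def fedavg_round(w_global, clients):
    """One FedAvg round: clients train locally, the server averages weighted by data size."""
    sizes = np.array([len(y) for _, y in clients], dtype=float)
    local_models = [local_sgd(w_global, X, y) for X, y in clients]
    weights = sizes / sizes.sum()
    return sum(wk * w_local for wk, w_local in zip(weights, local_models))

# Toy horizontal partition: three clients holding different samples of the same features.
rng = np.random.default_rng(0)
w_true = np.array([1.0, -2.0, 0.5])
clients = []
for n in (40, 60, 100):
    X = rng.normal(size=(n, 3))
    clients.append((X, X @ w_true + 0.01 * rng.normal(size=n)))

w = np.zeros(3)
for _ in range(20):          # 20 communication rounds
    w = fedavg_round(w, clients)
print(np.round(w, 3))        # approaches w_true without sharing raw data
```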
Privacy-Preserving Kernel Computation For Vertically Partitioned Data
Mirko Polato, Alberto Gallinaro, Fabio Aiolli
https://doi.org/10.14428/esann/2021.ES2021-152
Abstract:
In this paper, we propose a secure and privacy-preserving technique for computing dot-product kernels on vertically distributed data. Our proposal is based on secure multi-party computation which provides theoretical guarantees on both security and privacy. We also provide a practical application of the method by adapting a kernel-based collaborative filtering technique to the federated setting. An extensive experimental evaluation shows the effectiveness of the proposed approach.
Decay Momentum for Improving Federated Learning
Miguel Fernandes, Catarina Silva, Joel Arrais, Alberto Cardoso, Bernardete Ribeiro
https://doi.org/10.14428/esann/2021.ES2021-106
Abstract:
We propose two novel Federated Learning (FL) algorithms based on decaying momentum (Demon): Federated Demon (FedDemon) and Federated Demon Adam (FedDemonAdam). In particular, we apply Demon to momentum Stochastic Gradient Descent (SGD) and Adam in a federated setting, which has been shown to improve results in a centralized environment. We empirically show that FedDemon and FedDemonAdam have a faster convergence rate and performance improvements compared to state-of-the-art algorithms including FedAvg, FedAvgM and FedAdam.
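For readers unfamiliar with Demon, it anneals the momentum coefficient over training; one commonly cited schedule (assumed here, not quoted from the paper) is beta_t = beta_0 (1 - t/T) / ((1 - beta_0) + beta_0 (1 - t/T)). The sketch below applies that schedule to momentum SGD on a toy quadratic; in the federated variants, such local updates would run on each client before server aggregation.

```python
import numpy as np

def demon_beta(t, T, beta0=0.9):
    # Decaying-momentum schedule (assumed form): beta decays towards 0 as t -> T.
    frac = 1.0 - t / T
    return beta0 * frac / ((1.0 - beta0) + beta0 * frac)

def sgd_demon(grad_fn, w0, lr=0.1, T=100):
    w, v = w0.copy(), np.zeros_like(w0)
    for t in range(T):
        beta = demon_beta(t, T)
        v = beta * v + grad_fn(w)      # momentum buffer with decaying coefficient
        w = w - lr * v
    return w

# Toy objective: f(w) = 0.5 * ||w - target||^2, so grad(w) = w - target.
target = np.array([3.0, -1.0])
w_final = sgd_demon(lambda w: w - target, np.zeros(2))
print(np.round(w_final, 3))   # close to [3, -1]
```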
Continual Learning at the Edge: Real-Time Training on Smartphone Devices
Lorenzo Pellegrini, Vincenzo Lomonaco, Gabriele Graffieti, Davide Maltoni
https://doi.org/10.14428/esann/2021.ES2021-136
Abstract:
On-device training for personalized learning is a challenging research problem. Being able to quickly adapt deep prediction models at the edge is necessary to better suit personal user needs. However, adaptation on the edge poses some questions on both the efficiency and sustainability of the learning process and on the ability to work under shifting data distributions. Indeed, naively fine-tuning a prediction model only on the newly available data results in catastrophic forgetting, a sudden erasure of previously acquired knowledge. In this paper, we detail the implementation and deployment of a hybrid continual learning strategy (AR1*) on a native Android application for real-time on-device personalization without forgetting. Our benchmark, based on an extension of the CORe50 dataset, shows the efficiency and effectiveness of our solution.
Federated Learning Vector Quantization
Johannes Brinkrolf, Barbara Hammer
https://doi.org/10.14428/esann/2021.ES2021-141
Abstract:
Prototype-based methods such as LVQ techniques combine discriminative and generative aspects by representing models in terms of representative locations in the data space which enable an intuitive nearest-neighbor based classification. This fact has already been used in the context of incremental learners for streaming data which might be subject to drift. In this contribution, we demonstrate that this intuitive representation enables a very simple strategy for federated learning.
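Because an LVQ model is just a set of labelled prototype vectors, one simple federated strategy is to let clients adapt prototypes locally and have the server average prototypes of the same class across clients. The sketch below illustrates that idea with a plain LVQ1 update; it is a hypothetical rendering, and the aggregation actually used in the paper may differ.

```python
import numpy as np

def lvq1_local(prototypes, proto_labels, X, y, lr=0.05, epochs=3):
    """Basic LVQ1: attract the winning prototype for same-class samples, repel otherwise."""
    P = prototypes.copy()
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            j = np.argmin(np.linalg.norm(P - xi, axis=1))     # nearest prototype wins
            sign = 1.0 if proto_labels[j] == yi else -1.0
            P[j] += sign * lr * (xi - P[j])
    return P

def server_average(client_protos):
    """Server step: average corresponding prototypes across clients."""
    return np.mean(np.stack(client_protos), axis=0)

rng = np.random.default_rng(1)

def make_client(n=30):
    X0 = rng.normal(loc=[0.0, 0.0], size=(n, 2))
    X1 = rng.normal(loc=[3.0, 3.0], size=(n, 2))
    return np.vstack([X0, X1]), np.array([0] * n + [1] * n)

clients = [make_client() for _ in range(3)]
proto_labels = np.array([0, 1])                               # one prototype per class
prototypes = np.array([clients[0][0][clients[0][1] == c].mean(axis=0) for c in (0, 1)])

for _ in range(10):                                           # communication rounds
    local = [lvq1_local(prototypes, proto_labels, X, y) for X, y in clients]
    prototypes = server_average(local)
print(np.round(prototypes, 2))                                # prototypes near the two class centres
```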
Evaluation metrics, and concept drift
Judging competitions and benchmarks: a candidate election approach
Adrien Pavao, Isabelle Guyon, Michael Vaccaro
https://doi.org/10.14428/esann/2021.ES2021-122
Abstract:
Machine learning progress relies on algorithm benchmarks. We study the problem of declaring a winner, or ranking candidate algorithms, based on results obtained by judges (scores on various tasks). Inspired by social science and game theory on fair elections, we compare various ranking functions, ranging from simple score averaging to Condorcet methods. We devise novel empirical criteria to assess the quality of ranking functions, including the generalization to new tasks and the stability under judge or candidate perturbation. We conduct an empirical comparison on the results of 5 competitions and benchmarks (one artificially generated). While prior theoretical analyses indicate that no single ranking function satisfies all desired properties, our empirical study reveals that the classical average rank method fares well. However, some pairwise comparison methods can get better empirical results.
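Two of the ranking families compared above can be stated in a few lines: average rank orders candidates by their mean per-task rank, while Condorcet-style methods tally pairwise wins between candidates across tasks. The score matrix in the sketch below is invented for illustration.

```python
import numpy as np

# scores[i, j] = score of candidate i on task j (higher is better); illustrative values only.
scores = np.array([
    [0.91, 0.72, 0.65, 0.88],   # candidate A
    [0.89, 0.75, 0.70, 0.80],   # candidate B
    [0.85, 0.78, 0.60, 0.83],   # candidate C
])

def average_rank(scores):
    # Rank candidates per task (1 = best), then average the ranks over tasks.
    ranks = np.array([
        1 + (col[:, None] < col[None, :]).sum(axis=1) for col in scores.T
    ]).T
    return ranks.mean(axis=1)

def pairwise_wins(scores):
    # Condorcet-style tally: wins[i] = number of (opponent, task) pairs candidate i beats.
    n = scores.shape[0]
    wins = np.zeros(n)
    for i in range(n):
        for j in range(n):
            if i != j:
                wins[i] += (scores[i] > scores[j]).sum()
    return wins

print("average rank (lower is better): ", average_rank(scores))
print("pairwise wins (higher is better):", pairwise_wins(scores))
```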
Concept Drift Segmentation via Kolmogorov-Trees
Fabian Hinder, Barbara Hammer
https://doi.org/10.14428/esann/2021.ES2021-93
Abstract:
The notion of concept drift refers to the phenomenon that the data distribution changes over time. If drift occurs, machine learning models need adjustment. Since drift can be inhomogeneous, suitable actions depend on the location in data space. In this paper we address the challenge of partitioning the data space into segments with homogeneous drift characteristics. We formalize this objective as an independence criterion, and derive a robust and efficient training algorithm based thereon. We evaluate the efficiency of the method in comparison to existing technologies: the identification of drifting clusters, and the estimation of a conditional density distribution.
Investigating Intensity and Transversal Drift in Hyperspectral Imaging Data
Valerie Vaquet, Patrick Menz, Udo Seiffert, Barbara Hammer
https://doi.org/10.14428/esann/2021.ES2021-64
Abstract:
When measuring data with hyperspectral cameras, drift in the data distribution occurs over time and when the sensing device is changed. Frequently, this drift is a combination of intensity and wavelength shifts. In this contribution, we demonstrate that transfer component analysis together with subsampling constitutes a particularly efficient and simple technology for spectral offset elimination, which is applied to avoid the negative impact of drift on the classification performance. We demonstrate that this approach performs on par with or better than established methods, and we also provide a theoretical motivation why this technology can deal with both intensity and wavelength shifts, provided bounds on the smoothness of the functional data are given.
Predicting employee attrition with a more effective use of historical events
Abdel-Rahmen Korichi, Hamamache Kheddouci, Daniel West
https://doi.org/10.14428/esann/2021.ES2021-110
Abstract:
Attrition prediction research typically focuses on constructing models that involve one observation per employee over a limited time period, while the rest of the employees are discarded. Time-series attributes are transformed into non-time-series ones by applying statistical operations (e.g. sum, max, etc.). Such methods result in information loss and therefore less effective predictions. In this paper, we introduce a dynamic approach to employee attrition prediction that leverages the longitudinal nature of the data, allows the models to generalize across behaviors, and provides a closer estimate of an employee's risk of leaving.
Enhash: A Fast Streaming Algorithm For Concept Drift Detection
Aashi Jindal, Prashant Gupta, Debarka Sengupta, Jayadeva Jayadeva
https://doi.org/10.14428/esann/2021.ES2021-43
Abstract:
We propose Enhash, a fast ensemble learner that detects concept drift in a data stream. A stream may consist of abrupt, gradual, virtual, or recurring events, or a mixture of various types of drift. Enhash employs projection hash to insert an incoming sample. Benchmark tests on 6 artificial and 4 real data sets consisting of various types of drift show that Enhash is competitive with state-of-the-art ensemble learners while being significantly faster. It also has moderate resource requirements.
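Projection hashing in its simplest form maps a sample to a bucket through the signs of a few random projections; when the stream drifts, previously rare buckets start filling up, which a monitor can detect from bucket-occupancy statistics. The sketch below is a generic illustration of that ingredient, not a reimplementation of Enhash.

```python
import numpy as np

class ProjectionHash:
    """Hash a d-dimensional sample to one of 2**k buckets via k random hyperplanes."""
    def __init__(self, dim, k=8, seed=0):
        rng = np.random.default_rng(seed)
        self.planes = rng.normal(size=(k, dim))

    def bucket(self, x):
        bits = (self.planes @ x > 0).astype(int)          # one bit per hyperplane
        return int("".join(map(str, bits)), 2)

rng = np.random.default_rng(42)
h = ProjectionHash(dim=5, k=6)

before = [h.bucket(x) for x in rng.normal(loc=0.0, size=(500, 5))]
after = [h.bucket(x) for x in rng.normal(loc=2.0, size=(500, 5))]   # shifted stream

# A drift monitor could compare bucket-occupancy histograms over a sliding window.
print(len(set(before) & set(after)), "buckets shared between the two regimes")
```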
Lifelong Learning from Event-based Data
Vadym Gryshchuk, Cornelius Weber, Chu Kiong Loo, Stefan Wermter
https://doi.org/10.14428/esann/2021.ES2021-146
Abstract:
Lifelong learning is a long-standing aim for artificial agents that act in dynamic environments, in which an agent needs to accumulate knowledge incrementally without forgetting previously learned representations. We investigate methods for learning from data produced by event cameras and compare techniques to mitigate forgetting while learning incrementally. We propose a model that is composed of both feature extraction and continuous learning. Furthermore, we introduce a habituation-based method to mitigate forgetting. Our experimental results show that the combination of different techniques can help to avoid catastrophic forgetting while learning incrementally from the features provided by the extraction module.
Sample efficient localization and stage prediction with autoencoders
Sebastian Hoch, Sascha Lange, Janis Keuper
https://doi.org/10.14428/esann/2021.ES2021-24
Abstract:
Engineering, construction and operation of complex machines involve a wide range of complicated, simultaneous tasks, which could potentially be automated. In this work, we focus on perception tasks in such systems, investigating deep learning approaches for multi-task transfer learning with limited training data. We show an approach that takes advantage of a technical system's focus on selected objects and their properties. We create focused representations and simultaneously solve joint objectives in a system through multi-task learning with convolutional autoencoders. The focused representations are used as a starting point for the data-saving solution of the additional tasks. The efficiency of this approach is demonstrated using images and tasks of an autonomous circular crane with a grapple.
Transfer learning in Bayesian optimization for the calibration of a beam line in proton therapy
Valentin Hamaide, François Glineur
https://doi.org/10.14428/esann/2021.ES2021-79
Abstract:
Bayesian optimization (BO) is a type of black-box method used to optimize a costly objective function for which we have no access to derivatives. In practice, it is frequent that a series of similar problems has to be solved, with the problem data changing moderately between instances. We investigate a transfer learning approach based on BO that reuses information from a previous configuration in order to speed up subsequent optimizations. Our approach involves learning the noise variance to apply to the function values of the previous configuration and adapting the exploration-exploitation trade-off of the acquisition function from the previous configuration. We apply those ideas to the calibration of a beam line in proton therapy where the goal is to find magnet currents to obtain a desired shape for the beam of protons, and for which the calibration has to be repeated for several configurations. We show that reusing information from a previous configuration allows a reduction in the number of iterations by more than 80%, and that using BO is superior to the conventional Nelder-Mead algorithm for black box optimization and transfer learning.
Domain Adversarial Tangent Learning Towards Interpretable Domain Adaptation
Christoph Raab, Sascha Saralajew, Frank-Michael Schleif
https://doi.org/10.14428/esann/2021.ES2021-103
Abstract:
Deep learning struggles to generalize well to an unseen target domain of interest. Current domain adaptation methods simultaneously learn a classifier and an adversarial game for invariant representations but inadequately align local structures, while the underlying process is hard to interpret. We propose a new interpretable adversarial domain architecture, matching local manifold approximations across domains. Evaluated against related networks, the approach is competitive, while the adaptation process can be visually verified.
Deep learning for graphs
Deep learning for graphs
Davide Bacciu, Filippo Maria Bianchi, Benjamin Paassen, Cesare Alippi
https://doi.org/10.14428/esann/2021.ES2021-5
Abstract:
Deep learning for graphs encompasses all those models endowed with multiple layers of abstraction, which operate on data represented as graphs. The most common building blocks of these models are graph encoding layers, which compute a vector embedding for each node in a graph based on a sum of messages received from its neighbors. However, the family also includes architectures with decoders from vectors to graphs and models that process time-varying graphs and hypergraphs. In this paper, we provide an overview of the key concepts in the field, point towards open questions, and frame the contributions of the ESANN 2021 special session into the broader context of deep learning for graphs.
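The building block described above, a node embedding computed from a normalized sum of neighbor messages followed by a nonlinearity, can be written as H' = sigma(Â H W), with Â a normalized adjacency matrix with self-loops. A minimal NumPy sketch of one such layer follows; the symmetric normalization used here is one common choice among several, and deeper models simply stack layers of this form.

```python
import numpy as np

def gcn_layer(A, H, W):
    """One graph convolution: sum of neighbor messages with symmetric normalization."""
    A_hat = A + np.eye(A.shape[0])                 # add self-loops
    d = A_hat.sum(axis=1)
    D_inv_sqrt = np.diag(1.0 / np.sqrt(d))
    A_norm = D_inv_sqrt @ A_hat @ D_inv_sqrt       # symmetrically normalized adjacency
    return np.tanh(A_norm @ H @ W)                 # aggregate, transform, nonlinearity

# Toy graph: 4 nodes on a path, 3 input features, 2 output features.
A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
rng = np.random.default_rng(0)
H = rng.normal(size=(4, 3))       # node features
W = rng.normal(size=(3, 2))       # learnable layer weights
print(gcn_layer(A, H, W).shape)   # (4, 2): one embedding per node
```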
Dynamic Graph Echo State Networks
Domenico Tortorella, Alessio Micheli
https://doi.org/10.14428/esann/2021.ES2021-70
Abstract:
Dynamic temporal graphs represent evolving relations between entities, e.g. interactions between social network users or infection spreading. We propose an extension of graph echo state networks for the efficient processing of dynamic temporal graphs, with a sufficient condition for their echo state property, and an experimental analysis of reservoir layout impact. Compared to temporal graph kernels that need to hold the entire history of vertex interactions, our model provides a vector encoding for the dynamic graph that is updated at each time-step without requiring training. Experiments show accuracy comparable to approximate temporal graph kernels on twelve dissemination process classification tasks.
Improving Graph Variational Autoencoders with Multi-Hop Simple Convolutions
Erik Jhones Freitas do Nascimento, Amauri Souza, Diego Mesquita
https://doi.org/10.14428/esann/2021.ES2021-147
Abstract:
Variational auto-encoding architectures represent one of the most popular approaches to graph generative modeling. These models comprise an encoder and a decoder network, which map back and forth between the input and latent spaces. Notably, most of the literature on variational autoencoders (VAEs) for graphs focuses on developing more efficient architectures at the expense of increased complexity. In this work, we pursue an orthogonal direction and leverage multi-hop linear graph convolutional layers to create efficient yet simple encoders, boosting the performance of graph autoencoders. Our results demonstrate that our approach outperforms popular graph VAE baselines in link prediction tasks.
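A multi-hop simple (linear) graph convolution collapses K propagation steps into parameter-free feature smoothing followed by a single linear map, Z = Â^K X W. The sketch below shows such an encoder producing node embeddings; the surrounding VAE machinery (variational sampling and edge decoder) is only indicated in comments and reflects the standard setup rather than the paper's exact design.

```python
import numpy as np

def normalized_adjacency(A):
    A_hat = A + np.eye(A.shape[0])
    d_inv_sqrt = 1.0 / np.sqrt(A_hat.sum(axis=1))
    return (A_hat * d_inv_sqrt[:, None]) * d_inv_sqrt[None, :]

def sgc_encoder(A, X, W, k=3):
    """Multi-hop simple graph convolution: propagate features k hops, then one linear map."""
    A_norm = normalized_adjacency(A)
    Z = X
    for _ in range(k):
        Z = A_norm @ Z          # feature propagation: no parameters, no nonlinearity
    return Z @ W                # single linear transformation -> latent node embeddings

A = np.array([[0, 1, 1, 0],
              [1, 0, 1, 0],
              [1, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
rng = np.random.default_rng(0)
X = rng.normal(size=(4, 5))
W = rng.normal(size=(5, 2))
Z = sgc_encoder(A, X, W)
# In a graph VAE, Z (plus a second head for log-variances) would parameterize the latent
# distribution, and the decoder would reconstruct edges via sigmoid(Z @ Z.T).
print(Z.shape)
```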
Application of Graph Convolutions in a Lightweight Model for Skeletal Human Motion Forecasting
Luca Hermes, Barbara Hammer, Malte Schilling
https://doi.org/10.14428/esann/2021.ES2021-145
Abstract:
Prediction of movements is essential for successful cooperation with intelligent systems. We propose a model that integrates organized spatial information as given through the moving body's skeletal structure. This inherent structure is exploited in our model through application of Graph Convolutions and we demonstrate how this allows leveraging the structured spatial information into competitive predictions that are based on a lightweight model that requires a comparatively small number of parameters.
Tangent Graph Convolutional Network
Luca Pasa, Nicolò Navarin, Alessandro Sperduti
https://doi.org/10.14428/esann/2021.ES2021-143
Abstract:
Most Graph Convolutions (GCs) proposed in the Graph Neural Networks (GNNs) literature share the principle of computing topologically enriched node representations based on the ones of their neighbors. In this paper, we propose a novel GNN named Tangent Graph Convolutional Network (TGCN) that, in addition to the traditional GC approach, exploits a novel GC that computes node embeddings based on the differences between the attributes of a vertex and the attributes of its neighbors. This allows the GC to characterize each node's neighbor by computing its tangent space representation with respect to the considered vertex.
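The tangent-style convolution described here aggregates differences between a node's attributes and those of its neighbors, rather than the neighbor attributes themselves. A minimal sketch of that aggregation follows; the paper's full formulation combines it with a traditional GC and may include further terms.

```python
import numpy as np

def tangent_conv(A, X, W):
    """Aggregate attribute differences (x_u - x_v) over the neighbors u of each node v."""
    deg = A.sum(axis=1, keepdims=True)
    diff_agg = (A @ X - deg * X) / np.maximum(deg, 1.0)   # mean of (x_u - x_v) over neighbors
    return np.tanh(diff_agg @ W)

# Tiny star graph: node 0 connected to nodes 1 and 2.
A = np.array([[0, 1, 1],
              [1, 0, 0],
              [1, 0, 0]], dtype=float)
X = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [2.0, 2.0]])
W = np.eye(2)
print(tangent_conv(A, X, W))   # each row is a node's tangent-space style embedding
```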
Transformers for Molecular Graph Generation
Tim Cofala, Oliver Kramer
https://doi.org/10.14428/esann/2021.ES2021-112
Abstract:
This work introduces an autoregressive generative model for graphs which is based on the transformer architecture and applied to the domain of molecular graph generation. Utilizing the multi-head self-attention mechanism to directly model distributions over atoms and bonds, it can sample new molecular graphs in an autoregressive manner. The benchmark framework MOSES is used to compare the proposed approach to other state-of-the-art molecule generation models. It is shown that the model is capable of generalizing from the training data to generate novel and realistic molecules.
Inductive learning for product assortment graph completion
Marco Trincavelli, Haris Dukic, Georgios Deligiorgis, Pierpaolo Sepe, Davide Bacciu
https://doi.org/10.14428/esann/2021.ES2021-73
Abstract:
Global retailers have assortments that contain hundreds of thousands of products that can be linked by several types of relationships like style compatibility, "bought together", "watched together", etc. Graphs are a natural representation for assortments, where products are nodes and relations are edges. Relations like style compatibility are often produced by a manual process and therefore do not cover uniformly the whole graph. We propose to use inductive learning to enhance a graph encoding style compatibility of a fashion assortment, leveraging rich node information comprising textual descriptions and visual data. Then, we show how the proposed graph enhancement improves substantially the performance on transductive tasks with a minor impact on graph sparsity.
Deep learning and image processing
Evolutionary Deep Multi-Task Learning
Oliver Kramer
https://doi.org/10.14428/esann/2021.ES2021-14
Abstract:
Multi-task learning is an approach to reduce the amount of required training data by learning multiple tasks at the same time. In the context of neural networks, multi-task learning is performed by sharing weights or creating dependencies between weights of task-specific networks. In this work, we propose an approach based on a simple evolutionary algorithm that is able to match and even surpass learned weight sharing. We evaluate the performance of this method on CIFAR-100, cast as a multi-tasking problem, using an 18-layer residual network, and compare our results to the literature.
Semantic Prediction: Which One Should Come First, Recognition or Prediction?
Hafez Farazi, Jan Nogga, Sven Behnke
https://doi.org/10.14428/esann/2021.ES2021-23
Abstract:
The ultimate goal of video prediction is not forecasting future pixel-values given some previous frames. Rather, the end goal of video prediction is to discover valuable internal representations from the vast amount of available unlabeled video data in a self-supervised fashion for downstream tasks. One of the primary downstream tasks is interpreting the scene's semantic composition and using it for decision-making. For example, by predicting human movements, an observer can anticipate human activities and collaborate in a shared workspace. There are two main ways to achieve the same outcome, given a pre-trained video prediction and pre-trained semantic extraction model; one can first apply predictions and then extract semantics or first extract semantics and then predict. We investigate these configurations using the Local Frequency Domain Transformer Network (LFDTN) as the video prediction model and U-Net as the semantic extraction model on synthetic and real datasets.
Deep Graph Convolutional Networks for Wind Speed Prediction
Tomasz Stańczyk, Siamak Mehrkanoon
https://doi.org/10.14428/esann/2021.ES2021-25
Abstract:
In this paper, we introduce a new model for wind speed prediction based on spatio-temporal graph convolutional networks. Here, weather stations are treated as nodes of a graph with a learnable adjacency matrix, which determines the strength of relations between the stations based on the historical weather data. The self-loop connection is added to the learnt adjacency matrix and its strength is controlled by an additional learnable parameter. Experiments performed on real datasets collected from weather stations located in Denmark and the Netherlands show that our proposed model outperforms previously developed baseline models on the referenced datasets.
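The architectural ingredient highlighted above, a fully learnable adjacency matrix plus a self-loop whose strength is itself learnable, can be sketched as Â = softmax(M) + γI applied in the spatial step. The forward pass below is an illustrative assumption of how such a layer could look; the actual model also contains temporal convolutions and is trained end-to-end.

```python
import numpy as np

def softmax(M, axis=-1):
    e = np.exp(M - M.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def graph_layer(M, gamma, H, W):
    """Spatial step with a learnable adjacency M and learnable self-loop strength gamma."""
    A = softmax(M, axis=1) + gamma * np.eye(M.shape[0])   # learnt relations + scaled self-loop
    return np.tanh(A @ H @ W)

n_stations, n_feat = 5, 4
rng = np.random.default_rng(0)
M = rng.normal(size=(n_stations, n_stations))   # learnable parameters (trained by backprop)
gamma = 0.5                                     # learnable self-loop strength
H = rng.normal(size=(n_stations, n_feat))       # historical weather features per station
W = rng.normal(size=(n_feat, 8))
print(graph_layer(M, gamma, H, W).shape)        # (5, 8): one hidden vector per station
```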
Benign overfitting of fully connected Deep Nets: A Sobolev space viewpoint
Stephane Chretien, Emmanuel Caron-Parte
https://doi.org/10.14428/esann/2021.ES2021-37
Abstract:
Deep neural nets have undergone tremendous improvements in the last decade, which revolutionised the field of machine learning in a broad and lasting manner, achieving unprecedented performance in such diverse fields as image analysis, point cloud registration, natural language processing and model-free control. On the theoretical side, understanding the underpinnings of deep learning remains a formidable challenge, despite impressive breakthroughs in the last decade. One particularly interesting new prospect is the analysis of the double descent phenomenon described in Belkin et al. [2019], a counter-intuitive theory bringing new insight on the performance of learning systems in the greatly over-parametrised regime. The list of contributions to the understanding of the double descent paradigm has grown substantially in the last two years, but all available results in the literature mainly focus on the linear and the kernel setups. In the present paper, we study the overparametrised part of the double descent curve introduced in Belkin et al. [2019] and propose a new approach to the study of benign overfitting in the setting of learning Sobolev maps.
Correlated Weights Neural Layer with external control
Slawomir Golak
https://doi.org/10.14428/esann/2021.ES2021-26
Abstract:
The correlated weights neural layer is a generalization of the convolutional layer constituting the core of CNN networks. The CWNL layer takes advantage of weights correlated with coordinates of a neuron and its inputs, calculated by a dedicated neural subnet. In this work, a modified CWNL layer is proposed, which allows the parameterized spatial manipulation (and any other global transformation) of a pattern. The externally controlled CWNL layer can be used in existing neural network architectures, giving them the ability of internal pattern transformation without any modification of the training process.
Comprehensive Analysis of the Screening of COVID-19 Approaches in Chest X-ray Images from Portable Devices
Daniel Iglesias, Joaquim de Moura, Jorge Novo, Marcos Ortega
https://doi.org/10.14428/esann/2021.ES2021-31
Abstract:
Computer-aided diagnosis plays an important role in the COVID-19 pandemic. Currently, it is recommended to use X-ray imaging to diagnose and assess the evolution in patients. In particular, radiologists are asked to use portable acquisition devices to minimize the risk of cross-infection, facilitating an effective separation of suspected patients from other low-risk cases. In this work, we present an automatic COVID-19 screening approach, considering 6 representative state-of-the-art deep network architectures on a portable chest X-ray dataset that was specifically designed for this purpose. Exhaustive experimentation demonstrates that the models can separate COVID-19 cases from NON-COVID-19 cases, achieving a global accuracy of 97.68%.
Data-Efficient Training of High-Resolution Images in Medical Domain
Shruti Kunde, Amey Pandit, Kushagra Mahajan, Monika Sharma, Rekha Singhal, Lovekesh Vig
https://doi.org/10.14428/esann/2021.ES2021-57
Abstract:
The ability of Graphics Processing Units (GPUs) to quickly train data- and compute-intensive deep networks has led to rapid advancements across diverse domains such as robotics, medical imaging and autonomous driving. However, memory constraints with GPU-based training for memory-intensive deep networks have forced researchers to adopt various workarounds: 1) resize the input image, 2) divide the input image into smaller patches, or 3) use smaller batch sizes in order to fit both the model and batch training data into GPU memory. While these alternatives perform well when dealing with natural images, they suffer from 1) loss of high-resolution information, 2) loss of global context and 3) sub-optimal batch sizes. Such issues are likely to become more pressing for domains like medical imaging, where data is scarce and images are often of very high resolution with subtle features. Therefore, in this paper, we demonstrate that training can be made more data-efficient by using a distributed training setup with high-resolution images and larger effective batch sizes, with batches being distributed across multiple nodes. The distributed GPU training framework, which partitions the data and only shares model parameters across different GPUs, gets around the memory constraints of single-GPU training. We conduct a study in which experiments are performed for different image resolutions (ranging from 112×112 to 1024×1024) and different numbers of images per class to determine the effect of image resolution on network performance. We illustrate our findings on two medical imaging datasets, namely the SD-198 skin-lesion and NIH Chest X-ray datasets.
CAS-Net: A Novel Coronary Artery Segmentation Neural Network
Rawaa Hamdi, Asma Kerkeni, Mouhamed Hédi Bedoui, Asma Ben Abdallah
https://doi.org/10.14428/esann/2021.ES2021-157
Abstract:
In conventional X-ray coronary angiography, accurate coronary artery segmentation is a crucial and challenging step in the assessment of coronary artery disease. In this paper, we propose a new architecture (CAS-Net) for coronary artery segmentation. It is based on Residual U-Net and includes both channel and spatial attention mechanisms in the center part to generate rich hierarchical features of coronary arteries. Experiments are conducted on a private dataset of 150 images. The results show that CAS-Net outperforms the state-of-the-art method, achieving the highest accuracy of 96.91% and a Dice score of 82.70%.
Enhancing brain decoding using attention augmented deep neural networks
Ismail Alaoui Abdellaoui, Jesús García Fernández, Caner Sahinli, Siamak Mehrkanoon
https://doi.org/10.14428/esann/2021.ES2021-67
Abstract:
Neuroimaging techniques have shown to be valuable when studying brain activity. This paper uses Magnetoencephalography (MEG) data, provided by the Human Connectome Project (HCP), and different deep learning models to perform brain decoding. Specifically, we investigate to which extent one can infer the task performed by a subject based on its MEG data. In order to capture the most relevant features of the signals, self and global attention are incorporated into our models. The obtained results show that the inclusion of attention improves the performance and generalization of the models across subjects.
Improved and Generalized Vine Line Detection on Aerial Images Using Asymmetrical Neural Networks and ML Subclassifiers
Jérôme Treboux, Rolf Ingold, Dominique Genoud
https://doi.org/10.14428/esann/2021.ES2021-68
Abstract:
It is widely accepted that deep neural networks are very efficient for detecting objects in images. They reach their limit when detecting multiple instances of long lines in low-resolution images. We present an original methodology for the recognition of vine lines in low-resolution satellite images. The method consists in combining an asymmetrical neural network with a sub-classifier. We first compare a traditional U-Net architecture with an asymmetrical U-Net architecture designed for precision agriculture. We then highlight the significant improvement in vine line detection when a Random Forest is added after the customized U-Net. This methodology addresses the complex task of dissociating vine lines from other agricultural objects. As a result, our experiments improve the precision from 0.83 to 0.94 over our optimized neural network.
Cross-modal verification for 3D object detection
Haodi ZHANG, Alexandrina Rogozan, Abdelaziz Bensrhair
https://doi.org/10.14428/esann/2021.ES2021-97
Abstract:
To overcome the deficiency of the single LiDAR point cloud modality, we propose a cross-modal verification (CMV) model for reducing 3D object detection false positives. The abundant color and texture information in the image modality allows the classification of the projection region of a 3D bounding box proposal in the image plane. Three 3D object detectors are adopted as backbones and eight evaluation metrics are used to fully investigate the proposed model. The experimental results show that the proposed CMV model removes more than 50% of false positives in 3D object detection proposals and significantly improves the performance of 3D object detection.
Fourier-based Video Prediction through Relational Object Motion
Malte Mosbach, Sven Behnke
https://doi.org/10.14428/esann/2021.ES2021-125
Abstract:
The ability to predict future outcomes conditioned on observed video frames is crucial for intelligent decision-making in autonomous systems. Recently, deep recurrent architectures have been applied to the task of video prediction. However, this often results in blurry predictions and requires tedious training on large datasets. Here, we explore a different approach by (1) using frequency-domain approaches for video prediction and (2) explicitly inferring object-motion relationships in the observed scene. The resulting predictions are consistent with the observed dynamics in a scene and do not suffer from blur.
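As an illustration of the frequency-domain idea, the sketch below uses the Fourier shift theorem to extrapolate a frame given a known translation. The relational, per-object motion inference of the paper is not reproduced here; the displacement (dy, dx) is simply assumed to be given.

```python
# Translating content by (dy, dx) pixels corresponds to a phase shift of its
# spectrum, which yields sharp (non-blurry) extrapolated frames.
import numpy as np

def shift_frame_fourier(frame, dy, dx):
    """Shift a 2D frame by (dy, dx) pixels via a phase shift in the frequency domain."""
    h, w = frame.shape
    fy = np.fft.fftfreq(h)[:, None]
    fx = np.fft.fftfreq(w)[None, :]
    phase = np.exp(-2j * np.pi * (fy * dy + fx * dx))
    return np.real(np.fft.ifft2(np.fft.fft2(frame) * phase))

# Toy usage: predict the next frame of a uniformly moving pattern.
frame_t = np.zeros((64, 64)); frame_t[20:30, 10:20] = 1.0
frame_t_plus_1 = shift_frame_fourier(frame_t, dy=0, dx=3)
```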
Object Detection on Thermal Images: Performance of YOLOv4 Trained on Small Datasets
Maxence Chaverot, Maxime Carré, Michel Jourlin, Abdelaziz Bensrhair, Richard Grisel
https://doi.org/10.14428/esann/2021.ES2021-130
Abstract:
Thermal sensors are underrepresented in the field of Advanced Driver-Assistance Systems, whereas their capability to acquire images independently of weather or daytime can be very helpful to achieve optimal pedestrian and vehicle detection. This underrepresentation is due to the small number of available public datasets. This lack of training samples, and the difficulty of building such datasets, is a real hurdle to the development of an object detector dedicated to thermal images. Thanks to YOLOv4 and its detection performance, we show in this paper that fine-tuning this neural network requires few samples to achieve satisfying performance, outperforming the results of state-of-the-art detectors.
Temperature as a Regularizer for Semantic Segmentation
Chanho Kim, Won-Sook Lee
https://doi.org/10.14428/esann/2021.ES2021-158
Abstract:
Data-oriented approaches, including all deep learning methods, usually suffer from overfitting. Regularizers have been introduced from the beginning to address this problem. Inspired by the Generative Adversarial Network (GAN), our framework generates an adversarial loss that penalizes a segmentation model like a regularizer. We introduce temperature as a regularizer when calculating least-squares losses. Temperature affects the losses in both the discriminator and the generator of our DCGAN framework. Our experiments suggest adding L2 losses on top of the original LSGAN losses for optimization. This new regularizer using temperature improves semantic segmentation accuracy in both pixel accuracy and mean Intersection-over-Union.
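A hedged reading of the temperature idea is sketched below: the discriminator scores are divided by a temperature T inside the least-squares GAN terms. The exact formulation used by the authors may differ; this only shows where such a temperature could enter the losses.

```python
# One plausible way to add a temperature T to LSGAN losses (an assumption of
# this sketch, not necessarily the paper's exact formulation).
import torch

def lsgan_d_loss(d_real, d_fake, T=2.0):
    # Discriminator: push (temperature-scaled) real scores to 1 and fake scores to 0.
    return 0.5 * ((d_real / T - 1.0) ** 2).mean() + 0.5 * ((d_fake / T) ** 2).mean()

def lsgan_g_loss(d_fake, T=2.0):
    # Generator (here: the segmentation network): push fake scores to 1.
    return 0.5 * ((d_fake / T - 1.0) ** 2).mean()

# Usage with dummy discriminator outputs:
d_real, d_fake = torch.randn(8, 1), torch.randn(8, 1)
loss_d = lsgan_d_loss(d_real, d_fake, T=2.0)
loss_g = lsgan_g_loss(d_fake, T=2.0)
```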
Machine Learning for Measuring and Analyzing Online Social Communications
Machine Learning for Measuring and Analyzing Online Social Communications
Chris Bronk, Amaury Lendasse, Peggy Lindner, Dan S. Wallach, Barbara Hammer
https://doi.org/10.14428/esann/2021.ES2021-3
Abstract:
In this paper, we propose a framework for the application of a novel machine learning-based system for analyzing online social communications. As an example, we target anti-Semitic graphical memes posted to social media. We present very promising preliminary results on a Facebook dataset that consists of a total of 10000 labeled memes. We conclude that machine learning will soon be able to successfully analyze and monitor complex social communications.
Toxicity Detection in Online Comments with Limited Data: A Comparative Analysis
Max Lübbering, Maren Pielka, Kajaree Das, Michael Gebauer, Rajkumar Ramamurthy, Christian Bauckhage, Rafet Sifa
https://doi.org/10.14428/esann/2021.ES2021-48
Abstract:
We present a comparative study on toxicity detection, focusing on the problem of identifying toxicity types of low prevalence and possibly even unobserved at training time. For this purpose, we train our models on a dataset that contains only a weak type of toxicity, and test whether they are able to generalize to more severe toxicity types. We find that representation learning and ensembling exceed the classification performance of simple classifiers on toxicity detection, while also providing significantly better generalization and robustness. All models benefit from a larger training set size, which even extends to the toxicity types unseen during training.
Emotional Intensity Level Analysis of Speech Emotional Intensity Estimation
Megumi Kawase
https://doi.org/10.14428/esann/2021.ES2021-118
Abstract:
An estimation procedure using three models to determine the appropriate emotional intensity from the 10 listed emotional intensity classes for utterances has been developed in order to support better communication between humans and machines. In order to improve estimation performance, utterances were divided into segments and an estimated emotional intensity and its probability were produced as outputs. Two feature vectors were produced from the outputs and these features were used for the utterance-level classification using Support Vector Machine and Random Forest techniques. In the results, the accuracy of emotional intensity estimation in two out of three models was improved using the procedure proposed. In addition, features which contributed to the estimations were analyzed.
Natural language processing
Weightless Neural Networks for text classification using tf-idf
Antonio Sorgente, Massimo De Gregorio, Giuseppe Vettigli
https://doi.org/10.14428/esann/2021.ES2021-58
Abstract:
While Weightless Neural Networks (WNN) have been proven effective in Natural Language Processing (NLP) applications, they require the use of highly customized features as they work on binary inputs. However, recent advancements have brought methodologies able to adapt WNN to real numbers showing competitive results on many classification tasks, but they often struggle on sparse data. In this paper, we show that WNN can successfully use sparse linguistic features, like tf-idf, using appropriate transformations. We also show that WNN can be used to improve the performances of existing models for Mixed Language Sentiment Analysis and that it has competitive performances for news categorization.
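One possible "appropriate transformation" in the spirit of the abstract is a thermometer encoding that binarizes tf-idf values for a weightless network; the thresholds below are illustrative assumptions, not the paper's choice.

```python
# Sketch: turn sparse real-valued tf-idf features into the binary inputs a
# weightless neural network expects, via a thermometer encoding.
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer

def thermometer_encode(X, thresholds=(0.05, 0.1, 0.2, 0.4)):
    """Map each tf-idf value to a monotone binary code, one bit per threshold."""
    X = np.asarray(X)
    bits = [(X > t).astype(np.uint8) for t in thresholds]
    return np.concatenate(bits, axis=1)      # shape: (n_docs, n_terms * n_thresholds)

docs = ["the wine was great", "terrible service, bad wine", "great food and great service"]
tfidf = TfidfVectorizer().fit_transform(docs).toarray()
binary_inputs = thermometer_encode(tfidf)    # ready to feed a WNN such as WiSARD
```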
End-to-end Keyword Spotting using Xception-1d
Juan Gómez-Sanchis, Marcelino Martinez-Sober, Joan Vila-Francés, Antonio-José Serrano López, Emilio Soria Olivas
https://doi.org/10.14428/esann/2021.ES2021-21
Abstract:
The field of conversational agents is growing fast and there is an increasing need for algorithms that enhance natural interaction. In this work we show how we achieved state of the art results in the Keyword Spotting field by adapting and tweaking the Xception algorithm, which achieved outstanding results in several computer vision tasks. We obtained about 96\% accuracy when classifying audio clips belonging to 35 different categories, beating human annotation at the most complex tasks proposed.
Unsupervised Word Representations Learning with Bilinear Convolutional Network on Characters
Thomas Luka, Laure Soulier, David Picard
https://doi.org/10.14428/esann/2021.ES2021-38
Abstract:
In this paper, we propose a new unsupervised method for learning word embeddings with raw characters as input representations, bypassing the problems arising from the use of a dictionary. To achieve this purpose, we translate the distributional hypothesis into an unsupervised metric learning objective, which allows us to consider only an encoder instead of an encoder-decoder architecture. We propose to use a convolutional neural network with bilinear product blocks and residual connections to encode co-occurrence patterns. We show the efficiency of our approach by comparing it with classical word embedding methods such as fastText and GloVe on several benchmarks.
TSR-DSAW: Table Structure Recognition via Deep Spatial Association of Words
Arushi Jain, Shubham Paliwal, Monika Sharma, Lovekesh Vig
https://doi.org/10.14428/esann/2021.ES2021-109
Abstract:
Existing methods for Table Structure Recognition (TSR) from camera-captured or scanned documents perform poorly on complex tables consisting of nested rows/columns, multi-line texts and missing cell data. This is because current data-driven methods work by simply training deep models on large volumes of data and fail to generalize when an unseen table structure is encountered. In this paper, we propose to train a deep network to capture the spatial associations between different word pairs present in the table image for unravelling the table structure. We present an end-to-end pipeline, named TSR-DSAW: TSR via Deep Spatial Association of Words, which outputs a digital representation of a table image in a structured format such as HTML. Given a table image as input, the proposed method begins with the detection of all the words present in the image using a text-detection network like CRAFT, which is followed by the generation of word pairs using dynamic programming. These word pairs are highlighted in individual images and subsequently fed into a DenseNet-121 classifier trained to capture spatial associations such as same-row, same-column, same-cell or none. Finally, we perform post-processing on the output of the word-association classifier to generate the table structure in HTML format. We evaluate our TSR-DSAW pipeline on two publicly available table-image datasets - PubTabNet and ICDAR 2013 - and demonstrate improvement over previous methods such as TableNet and DeepDeSRT.
Sparse mixture of von Mises-Fisher distribution
Florian Barbaro, Fabrice Rossi
https://doi.org/10.14428/esann/2021.ES2021-115
Abstract:
Mixtures of von Mises-Fisher distributions can be used to cluster data on the unit hypersphere. This is particularly adapted for high-dimensional directional data such as texts. We propose in this article to estimate a von Mises mixture using an $l_1$-penalized likelihood. This leads to sparse prototypes that improve both clustering quality and interpretability. We introduce an expectation-maximisation (EM) algorithm for this estimation and show the advantages of the approach on real data benchmarks. We propose to explore the trade-off between the sparsity term and the likelihood one with a simple path-following algorithm.
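For orientation, the sketch below shows the E-step of a von Mises-Fisher mixture on unit-norm data. A shared concentration parameter kappa is assumed here so that the normalizing constants cancel, and the $l_1$ penalty of the paper (which acts on the M-step) is not shown.

```python
# E-step sketch: responsibilities proportional to pi_k * exp(kappa * mu_k^T x),
# assuming a common kappa across components (an assumption of this sketch).
import numpy as np

def vmf_responsibilities(X, mus, log_pi, kappa=20.0):
    """X: (n, d) unit vectors; mus: (K, d) unit prototypes; log_pi: (K,) log weights."""
    logits = kappa * X @ mus.T + log_pi          # (n, K) unnormalized log posteriors
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    resp = np.exp(logits)
    return resp / resp.sum(axis=1, keepdims=True)
```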
Towards Robust Auxiliary Tasks for Language Adaptation
Gil Rocha, Henrique Lopes Cardoso
https://doi.org/10.14428/esann/2021.ES2021-134
Abstract:
To overcome the lack of annotated resources in less-resourced languages, unsupervised language adaptation methods have been explored. Based on multilingual word embeddings, Adversarial Training has been successfully employed in a variety of tasks and languages. With recent neural language models, empirical analysis on the task of natural language inference suggests that more challenging auxiliary tasks for Adversarial Training should be formulated to further improve language adaptation. We propose rethinking such auxiliary tasks for language adaptation.
Recurrent learning, and reinforcement learning
Continual Learning with Echo State Networks
Andrea Cossu, Davide Bacciu, Antonio Carta, Claudio Gallicchio, Vincenzo Lomonaco
https://doi.org/10.14428/esann/2021.ES2021-80
Abstract:
Continual Learning (CL) refers to a learning setup where data is non-stationary and the model has to learn without forgetting existing knowledge. The study of CL for sequential patterns revolves around trained recurrent networks. In this work, instead, we introduce CL in the context of Echo State Networks (ESNs), where the recurrent component is kept fixed. We provide the first evaluation of catastrophic forgetting in ESNs and we highlight the benefits of using CL strategies which are not applicable to trained recurrent models. Our results confirm the ESN as a promising model for CL and open the way to its use in streaming scenarios.
RecLVQ: Recurrent Learning Vector Quantization
Jensun Ravichandran, Thomas Villmann, Marika Kaden
https://doi.org/10.14428/esann/2021.ES2021-90
Abstract:
Learning Vector Quantizers (LVQ) and its cost-function-based variant called Generalized Learning Vector Quantization (GLVQ) are powerful, yet simple and interpretable classification models. Even though GLVQ is an effective tool for classifying vectorial data, it cannot handle raw sequence data of potentially different lengths. Usually, this problem is solved by manually engineering fixed-length features or by employing recurrent networks. Therefore, a natural idea is to incorporate recurrent units for data processing into the GLVQ network structure. The processed data can then be compared in a latent space for classification decisions. We demonstrate the ability of this approach on illustrative classification problems.
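The sketch below illustrates the combination described in the abstract: a recurrent encoder produces a latent vector that is compared to class prototypes with the GLVQ cost $(d^+ - d^-)/(d^+ + d^-)$. The GRU encoder and the simple prototype-to-class assignment are assumptions of this sketch.

```python
# Recurrent encoder + GLVQ cost, in PyTorch. Layer sizes are illustrative.
import torch
import torch.nn as nn

class RecLVQSketch(nn.Module):
    def __init__(self, in_dim, latent_dim, n_prototypes, n_classes):
        super().__init__()
        self.encoder = nn.GRU(in_dim, latent_dim, batch_first=True)
        self.prototypes = nn.Parameter(torch.randn(n_prototypes, latent_dim))
        # Round-robin assignment of prototypes to classes (n_prototypes >= n_classes).
        self.register_buffer("proto_labels", torch.arange(n_prototypes) % n_classes)

    def forward(self, x, y):
        """x: (batch, seq_len, in_dim) sequences; y: (batch,) integer labels."""
        _, h = self.encoder(x)                        # h: (1, batch, latent_dim)
        z = h.squeeze(0)
        d = torch.cdist(z, self.prototypes) ** 2      # squared distances (batch, n_proto)
        same = self.proto_labels[None, :] == y[:, None]
        d_plus = d.masked_fill(~same, float("inf")).min(dim=1).values
        d_minus = d.masked_fill(same, float("inf")).min(dim=1).values
        mu = (d_plus - d_minus) / (d_plus + d_minus)  # GLVQ classifier cost per sample
        return mu.mean()
```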
Improvement on Generative Adversarial Network for Targeted Drug Design
Beatriz P. Santos, Maryam Abbasi, Tiago Pereira, Bernardete Ribeiro, Joel Arrais
https://doi.org/10.14428/esann/2021.ES2021-96
Abstract:
This paper provides a generative network framework that can replicate the molecular space distribution to satisfy a set of desirable features. The approach incorporates two effective machine learning techniques: an Encoder-Decoder architecture that converts the string notations of molecules into latent space and a generative adversarial network to learn the data distribution and generate new compounds. We train this joint model on a dataset that includes stereo-chemical information. The results show an improvement in the Encoder-Decoder performance, reaching 89% of correctly reconstructed molecules. The framework can generate a wide variety of compounds biased towards specific molecular properties using Transfer Learning.
Reservoir Computing by Discretizing ODEs
Claudio Gallicchio
https://doi.org/10.14428/esann/2021.ES2021-101
Abstract:
We draw connections between Reservoir Computing (RC) and Ordinary Differential Equations, introducing a novel class of models called Euler State Networks (EuSNs). The proposed approach is characterized by system dynamics that are both stable and non-dissipative, hence enabling an effective transmission of input signals over time. At the same time, EuSN features untrained recurrent dynamics, preserving all the computational advantages of RC models. Through experiments on several benchmarks for time-series classification, we empirically show that EuSN can substantially narrow the performance gap between RC and fully trainable recurrent neural networks.
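A hedged sketch of the construction is given below: an untrained reservoir obtained by explicit Euler discretization of an ODE whose recurrent matrix is made antisymmetric (hence non-dissipative) up to a small diffusion term. The precise parameterization is the one defined in the paper; the code only illustrates the idea.

```python
# Euler-discretized, untrained reservoir; eps is the step size and gamma a
# small diffusion term for stability. Read out the states with a linear model
# as in standard Reservoir Computing.
import numpy as np

def euler_reservoir_states(U, n_units=100, eps=0.01, gamma=0.001, seed=0):
    """U: (T, input_dim) input time series; returns (T, n_units) reservoir states."""
    rng = np.random.default_rng(seed)
    W = rng.uniform(-1, 1, (n_units, n_units))
    W_rec = W - W.T - gamma * np.eye(n_units)        # antisymmetric part minus diffusion
    W_in = rng.uniform(-1, 1, (n_units, U.shape[1]))
    x = np.zeros(n_units)
    states = []
    for u in U:
        x = x + eps * np.tanh(W_rec @ x + W_in @ u)  # explicit Euler step, untrained
        states.append(x.copy())
    return np.array(states)
```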
Constraint optimization for Echo State Networks applied to satellite image forecasting
Jochen J. Steil, Yannic Lieder
https://doi.org/10.14428/esann/2021.ES2021-135
Abstract:
The paper proposes to deal with noisy, sparse or short training data sequences by adding domain knowledge to the learning process of Echo State Networks (ESNs). Known constraints such as monotonicity of the output, periodicity or bounds on output values are encoded as inequality constraints on the output weights to be learned. Exploiting that the output of an ESN is linear in the weights, Quadratic Programming is then used to optimize them. The method is applied to the prediction of pixel values from monthly, noisy satellite images over a short history of five years, thereby enabling the cleaning of images from clouds or snow.
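Because the ESN output is linear in the readout weights, a constraint such as monotonicity becomes a set of linear inequalities and the least-squares fit becomes a quadratic program, as sketched below. The use of cvxpy and the monotonicity-only constraint are assumptions of this sketch.

```python
# Constrained readout training: least squares on collected reservoir states H
# subject to linear inequality constraints on the predictions.
import numpy as np
import cvxpy as cp

def fit_monotone_readout(H, y):
    """H: (T, n_units) reservoir states; y: (T,) targets; returns readout weights."""
    w = cp.Variable(H.shape[1])
    objective = cp.Minimize(cp.sum_squares(H @ w - y))
    # Monotonicity: consecutive predicted outputs must not decrease.
    constraints = [(H[1:] - H[:-1]) @ w >= 0]
    cp.Problem(objective, constraints).solve()
    return w.value
```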
Deep Echo State Networks for Functional Ambulation Categories Estimation
Luca Pedrelli, Marco Tramontano, Giuseppe Vannozzi, Andrea Mannini
https://doi.org/10.14428/esann/2021.ES2021-149
Abstract:
In this work, we introduce a novel application for the automatic estimation of the Functional Ambulation Category (FAC) based on deep Echo State Networks (ESNs). FAC is a clinical scale for assessing gait ability used in post-stroke rehabilitation and, in general, for disease monitoring. In this application, the estimation is performed automatically by analyzing signals gathered from wearable sensors (located on both tibiae, pelvis, trunk and head) during the execution of a walking test. This is performed by analyzing the whole time series through the DeepESN model without preprocessing. The experimental results show that the use of a deep recurrent neural network allows the model to exploit the richness contained in the whole raw temporal signal, improving the performance w.r.t. the shallow recurrent model. Overall, our approach obtained a mean absolute error of 0.37 with a maximum error of 0.78, proving very accurate in classifying gait ability through the estimation of the FAC value. Considering the experimental results obtained, the proposed approach represents a good baseline for medical applications based on the automatic estimation of the FAC scale.
An Algorithmic Approach to Establish a Lower Bound for the Size of Semiring Neural Networks
Martin Böhm, Thomas Schmid
https://doi.org/10.14428/esann/2021.ES2021-33
Abstract:
Semiring neural networks have been introduced as a recurrent neural network-type representation of weighted automata with the potential to learn a recognizable series. Whether a given semiring neural network actually can or cannot compute a recognizable series, however, depends on the size of the network. Therefore, it is desirable to determine whether a proposed size is too small before initiation of the training procedure. Here, we present an algorithm that achieves this in polynomial time. As there is a one-to-one correspondence between semiring neural networks and weighted automata, our algorithm can also be used to derive lower bounds for the size of a recognizing automaton. Our algorithm complements previous work in this area as it works over commutative zero-sum-free semirings.
Echo-state neural networks forecasting steelworks off-gases for their dispatching in CH4 and CH3OH syntheses reactors
Ismael Matino, Stefano Dettori, Valentina Colla, Katharina Rechberger, Nina Kieberger
https://doi.org/10.14428/esann/2021.ES2021-41
Abstract:
In the era of the European Green Deal, steelworks are committed to reducing their CO2 emissions while preserving their competitiveness. One of the options to achieve this aim is the valorization of process off-gases. Methane and methanol production can be obtained by coupling novel reactors with an advanced control system that dispatches these gases after enrichment with green hydrogen. Knowing the gases' availability and composition in advance is fundamental. The paper presents Echo State Network-based models that are applied to this aim and achieve adequate forecasting accuracy even in the case of highly dynamic processes.
Deep Learning Model for Context-Dependent Survival Analysis
Raphaël Langhendries, Jérôme Lacaille
https://doi.org/10.14428/esann/2021.ES2021-49
Abstract:
In this article, we introduce a deep learning model (denoted thereafter \emph{DCM: Deep Contextual Model}) for survival analysis capable of predicting the probability that a subject meets an \emph{event of interest} according to its past life. The subject and the \emph{event of interest} can be diverse depending on the field of application, thus the model can be applied in various contexts. We present an application in the aerospace field that consists in forecasting hot corrosion in turbofans.
Behavior Constraining in Weight Space for Offline Reinforcement Learning
Phillip Swazinna, Steffen Udluft, Daniel Hein, Thomas Runkler
https://doi.org/10.14428/esann/2021.ES2021-83
Abstract:
In offline reinforcement learning, a policy needs to be learned from a single pre-collected dataset. Typically, policies are thus regularized during training to behave similarly to the data generating policy, by adding a penalty based on a divergence between action distributions of generating and trained policy. We propose a new algorithm, which constrains the policy directly in its weight space instead, and demonstrate its effectiveness in experiments.
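The sketch below illustrates the weight-space idea with a soft penalty that keeps the trained policy's parameters close to those of a behavior-cloned policy. The paper formulates this as a constraint rather than a penalty; the penalty form, the lambda value and the q_net interface are assumptions of this sketch.

```python
# Weight-space regularization for offline RL: maximize the learned Q-value
# while staying close (in parameter space) to a behavior-cloned policy.
import torch

def weight_space_penalty(policy, behavior_policy):
    return sum(((p - b.detach()) ** 2).sum()
               for p, b in zip(policy.parameters(), behavior_policy.parameters()))

def offline_policy_loss(policy, behavior_policy, q_net, states, lam=1.0):
    actions = policy(states)                          # deterministic policy assumed
    return -q_net(states, actions).mean() + lam * weight_space_penalty(policy, behavior_policy)
```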
Multiobjective Reinforcement Learning in Optimized Drug Design
Maryam Abbasi, Tiago Pereira, Beatriz P. Santos, Bernardete Ribeiro, Joel Arrais
https://doi.org/10.14428/esann/2021.ES2021-87
Abstract:
Machine learning has been increasingly applied with success in generating synthetically reasonable molecules. However, a complete system capable of both producing valid molecules and optimizing multiple traits has remained elusive. This paper employs multiobjective reinforcement learning to build a framework for designing compounds. Different multiobjective techniques have been evaluated, such as weighted sum and Chebyshev scalarization. The results show that the implemented model can be effectively optimized towards different and competing molecular properties. Nonetheless, the model implemented with the weighted sum scalarization technique with a weight of 0.55 for biological affinity is the one with the most appropriate trade-off for the different evaluated properties.
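The two scalarization schemes named in the abstract can be written as follows; the utopia point used by the Chebyshev variant and the second objective in the example are assumptions of this sketch.

```python
# Scalarization of a reward vector r (one entry per objective).
import numpy as np

def weighted_sum(r, w):
    """Linear scalarization: larger is better."""
    return float(np.dot(w, r))

def chebyshev(r, w, z_star):
    """Chebyshev scalarization: weighted distance to the utopia point z_star (smaller is better)."""
    return float(np.max(w * np.abs(z_star - r)))

r = np.array([0.7, 0.4])            # e.g. biological affinity plus a second property (illustrative)
w = np.array([0.55, 0.45])          # 0.55 for affinity, as reported in the abstract
print(weighted_sum(r, w), chebyshev(r, w, z_star=np.array([1.0, 1.0])))
```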
Density Independent Self-organized Support for Q-Value Function Interpolation in Reinforcement Learning
Antonin Calba, Alain Dutech, Jeremy Fix
https://doi.org/10.14428/esann/2021.ES2021-62
Abstract:
In this paper, we propose a contribution in the field of Reinforcement Learning (RL) with continuous state spaces. Our work is along the line of previous works involving a vector quantization algorithm for learning the state space representation on top of which a function approximation takes place. In particular, our contribution compares the performances of the Kohonen SOM and the Rougier DSOM with the Göppert function approximation scheme on the mountain car problem. We give a particular focus to DSOM as it is less sensitive to the density of inputs and opens interesting perspectives in RL.
Complex Data: Learning Trustworthily, Automatically, and with Guarantees
Complex Data: Learning Trustworthily, Automatically, and with Guarantees
Luca Oneto, Nicolò Navarin, Battista Biggio, Federico Errica, Alessio Micheli, Franco Scarselli, Monica Bianchini, Alessandro Sperduti
https://doi.org/10.14428/esann/2021.ES2021-6
Abstract:
Machine Learning (ML) achievements have enabled the automatic extraction of actionable information from data in a wide range of decision-making scenarios. This calls for improving both ML technical aspects (e.g., design and automation) and human-related metrics (e.g., fairness, robustness, privacy, and explainability), with performance guarantees at both levels. This scenario poses three main challenges: (i) Learning from Complex Data (i.e., sequence, tree, and graph data), (ii) Learning Trustworthily, and (iii) Learning Automatically with Guarantees. The focus of this special session is on addressing one or more of these challenges with the final goal of Learning Trustworthily, Automatically, and with Guarantees from Complex Data.
The Benefits of Adversarial Defence in Generalisation
Luca Oneto, Sandro Ridella, Davide Anguita
https://doi.org/10.14428/esann/2021.ES2021-28
Abstract:
Recent research has shown that models induced by machine learning, in particular by deep learning, can be easily fooled by an adversary who carefully crafts imperceptible (at least from the human perspective) or physically plausible modifications of the input data. This discovery gave birth to a new field of research, adversarial machine learning, where new methods of attack and defence are developed continuously, mirroring what has long been happening in cybersecurity. In this paper we show that the drawbacks of inducing, from data, models that are less prone to being misled actually provide some benefits when it comes to assessing their generalisation abilities.
Slope: A First-order Approach for Measuring Gradient Obfuscation
Maura Pintor, Luca Demetrio, Giovanni Manca, Battista Biggio, Fabio Roli
https://doi.org/10.14428/esann/2021.ES2021-99
Abstract:
Evaluating adversarial robustness is a challenging problem. Many defenses have been shown to provide a false sense of security by unintentionally obfuscating gradients, hindering the optimization process of gradient-based attacks. Such defenses have been subsequently shown to fail against adaptive attacks crafted to circumvent gradient obfuscation. In this work, we present Slope, a metric that detects obfuscated gradients by comparing the expected and the actual increase of the attack loss after one iteration. We show that our metric can detect the presence of obfuscated gradients in many documented cases, providing a useful debugging tool towards improving adversarial robustness evaluations.
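The sketch below illustrates the underlying intuition: compare the first-order (expected) loss increase after one attack step with the actual increase, since a strong mismatch suggests obfuscated gradients. The exact definition of the Slope metric is given in the paper; the ratio below is only indicative.

```python
# Compare predicted vs. actual loss increase after one gradient-sign step.
import torch

def slope_indicator(model, loss_fn, x, y, step=0.01):
    x = x.clone().requires_grad_(True)
    loss = loss_fn(model(x), y)
    grad, = torch.autograd.grad(loss, x)
    delta = step * grad.sign()                   # one FGSM-like attack step
    expected = (grad * delta).sum()              # first-order predicted increase
    with torch.no_grad():
        actual = loss_fn(model(x + delta), y) - loss
    return (actual / expected).item()            # close to 1 for well-behaved gradients
```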
Robust Malware Classification via Deep Graph Networks on Call Graph Topologies
Federico Errica, Giacomo Iadarola, Fabio Martinelli, Francesco Mercaldo, Alessio Micheli
https://doi.org/10.14428/esann/2021.ES2021-82
Abstract:
We propose a malware classification system that is shown to be robust to some common intra-procedural obfuscation techniques. Indeed, by training the Contextual Graph Markov Model on the call graph representation of a program, we classify it using only topological information, which is unaffected by such obfuscations. In particular, we show that the structure of the call graph is sufficient to achieve good accuracy on a multi-class classification benchmark.
Boundary-Based Fairness Constraints in Decision Trees and Random Forests
Géraldin Nanfack, Valentin Delchevalerie, Benoit Frénay
https://doi.org/10.14428/esann/2021.ES2021-69
Abstract:
Decision Trees (DTs) and Random Forests (RFs) are popular models in Machine Learning (ML) thanks to their interpretability and efficiency in solving real-world problems. However, DTs may sometimes learn rules that treat different groups of people unfairly, by paying attention to sensitive features such as gender, age, income or language. Even if several solutions have been proposed to reduce unfairness for different ML algorithms, few of them apply to DTs. This work aims to transpose a successful method proposed by Zafar et al. for reducing unfairness in boundary-based ML models to DTs.
Model selection
NNBMSS: a Novel and Fast Method for Model Structure Selection
Amaury Lendasse, Kallin Khan, Edward Ratner
https://doi.org/10.14428/esann/2021.ES2021-9
Abstract:
In this paper, we present a new method to perform model structure selection. The proposed method can be used to select the complexity of any continuous regression method. We also present an asymptotic mathematical proof of the proposed method, and the new method is illustrated on a benchmark. Compared to the well-known 10-fold Cross-Validation, the computational time associated with our new method is reduced by approximately a factor of 8, as illustrated on the benchmark.
Pruning Neural Networks with Supermasks
Vincent Rolfs, Matthias Kerzel, Stefan Wermter
https://doi.org/10.14428/esann/2021.ES2021-126
Abstract:
The Lottery Ticket hypothesis by Frankle and Carbin states that a randomly initialized dense network contains a smaller subnetwork that, when trained in isolation, will match the performance of the original network. However, identifying this pruned subnetwork usually requires repeated training to determine optimal pruning thresholds. We present a novel approach to accelerate the pruning: By methodically evaluating different Supermasks, the threshold for selecting neurons as part of a pruned Lottery Ticket network can be determined without additional training. We evaluate the method on the MNIST dataset and achieve a size reduction of over 60\% without a drop in performance.
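A minimal sketch of evaluating candidate masks without retraining is shown below, using simple magnitude thresholds as the masking rule; the actual Supermask construction and threshold search in the paper may differ.

```python
# Sweep magnitude thresholds, mask the network, and measure validation
# performance of the masked (otherwise untouched) model.
import torch

@torch.no_grad()
def apply_supermask(model, threshold):
    saved = {}
    for name, p in model.named_parameters():
        if p.dim() > 1:                              # prune weight matrices, keep biases
            saved[name] = p.clone()
            p.mul_((p.abs() >= threshold).float())
    return saved                                     # original values, for restoring later

@torch.no_grad()
def restore(model, saved):
    for name, p in model.named_parameters():
        if name in saved:
            p.copy_(saved[name])

def sweep_thresholds(model, evaluate, thresholds=(0.01, 0.02, 0.05, 0.1)):
    results = {}
    for t in thresholds:
        saved = apply_supermask(model, t)
        results[t] = evaluate(model)                 # e.g. accuracy on a validation set
        restore(model, saved)
    return results
```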
Compact Neural Architecture Search for Local Climate Zones Classification
Rene Traore, Andrés Camero, Xiaoxiang Zhu
https://doi.org/10.14428/esann/2021.ES2021-55
Abstract:
State-of-the-art Computer Vision models achieve impressive performance but with increasing complexity. Great advances have been made towards automatic model design, but accounting for both model performance and low complexity is still an open challenge. In this study, we propose a neural architecture search strategy for high-performance, low-complexity classification models that combines an efficient search algorithm with mechanisms for reducing complexity. We tested our proposal on a real-world remote sensing problem, Local Climate Zone classification. The results show that our proposal achieves state-of-the-art performance, while being at least 91.8 more compact in terms of size and FLOPs.
Unsupervised learning
Anomalous Cluster Detection in Large Networks with Diffusion-Percolation Testing
Corentin Larroche, Johan Mazel, Stéphan Clémençon
https://doi.org/10.14428/esann/2021.ES2021-32
Abstract:
We propose a computationally efficient procedure for elevated mean detection on a connected subgraph of a network with node-related scalar observations. Our approach relies on two intuitions: first, a significant concentration of high observations in a connected subgraph implies that the subgraph induced by the nodes associated with the highest observations has a large connected component. Secondly, a greater detection power can be obtained in certain cases by denoising the observations using the network structure. Numerical experiments show that our procedure’s detection performance and computational efficiency are both competitive.
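The first intuition of the abstract translates into a simple test statistic, sketched below with networkx: the size of the largest connected component of the subgraph induced by the top-k scoring nodes. The toy graph and placeholder scores are only for illustration.

```python
# Largest-connected-component statistic on the subgraph induced by the k nodes
# with the highest observations (a large component suggests an anomalous cluster).
import networkx as nx
import numpy as np

def largest_component_statistic(graph, observations, k):
    """graph: networkx Graph; observations: dict node -> score; k: number of nodes kept."""
    top_nodes = sorted(observations, key=observations.get, reverse=True)[:k]
    induced = graph.subgraph(top_nodes)
    return max((len(c) for c in nx.connected_components(induced)), default=0)

# Toy usage on a random graph with placeholder node scores.
g = nx.erdos_renyi_graph(100, 0.05, seed=0)
rng = np.random.default_rng(0)
scores = {v: rng.standard_normal() for v in g}
print(largest_component_statistic(g, scores, k=10))
```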
Calliope - A Polyphonic Music Transformer
Andrea Valenti, Stefano Berti, Davide Bacciu
https://doi.org/10.14428/esann/2021.ES2021-63
Abstract:
The polyphonic nature of music makes the application of deep learning to music modelling a challenging task. On the other hand, the Transformer architecture seems to be a good fit for this kind of data. In this work, we present Calliope, a novel autoencoder model based on Transformers for the efficient modelling of multi-track sequences of polyphonic music. The experiments show that our model is able to improve the state of the art on musical sequence reconstruction and generation, with remarkably good results especially on long sequences.
Dynamic clustering and modeling of temporal data subject to common regressive effects
Louise Bonfils, Allou Same, Latifa Oukhellou
https://doi.org/10.14428/esann/2021.ES2021-121
Abstract:
Clustering is used in many application fields to summarize information into a small number of groups. Motivated by behavioral extraction issues in urban data, this paper proposes a classification method that models the evolution of cluster profiles over time while considering common regressive effects. The parameters of the proposed model are estimated using a variational approximation, because maximum likelihood estimation is not suitable in this case. The ability of the model to estimate parameters is evaluated on various simulated data and compared with two other models.
Stochastic quartet approach for fast multidimensional scaling
Pierre Lambert, Cyril de Bodt, Michel Verleysen, Lee John
https://doi.org/10.14428/esann/2021.ES2021-59
Abstract:
Multidimensional scaling is a statistical process that aims to embed high-dimensional data into a lower-dimensional, more manageable space. Common MDS algorithms tend to have some limitations when facing large data sets due to their high time and space complexities. This paper attempts to tackle the problem by using a stochastic approach to MDS which uses gradient descent to optimise a loss function defined on randomly designated quartets of points. This method mitigates the quadratic memory usage by computing distances on the fly, and has iterations in O(N) time complexity, with N samples. Experiments show that the proposed method provides competitive results in reasonable time. Public codes are available at https://github.com/PierreLambert3/SQuaD-MDS.git.
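The following stripped-down sketch illustrates the stochastic-quartet idea under stated simplifications: at each iteration a random quartet is drawn, its high-dimensional distances are computed on the fly, and a gradient step is taken on a plain pairwise stress restricted to the quartet. The actual SQuaD-MDS loss works with quartet-normalised (relative) distances and additional refinements; the hyper-parameters below are illustrative.

import numpy as np

def squad_like_mds(X, n_iter=20000, lr=0.05, seed=0):
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    Y = rng.normal(scale=1e-2, size=(n, 2))   # 2-D embedding
    for _ in range(n_iter):
        q = rng.choice(n, size=4, replace=False)   # random quartet
        for a in range(4):
            for b in range(a + 1, 4):
                i, j = q[a], q[b]
                d_hd = np.linalg.norm(X[i] - X[j])        # computed on the fly: O(N) memory
                diff = Y[i] - Y[j]
                d_ld = np.linalg.norm(diff) + 1e-12
                g = 2.0 * (d_ld - d_hd) * diff / d_ld     # d/dY_i of (d_hd - d_ld)^2
                Y[i] -= lr * g
                Y[j] += lr * g
    return Y

# Toy usage on random high-dimensional data.
Y = squad_like_mds(np.random.rand(500, 30))
print(Y.shape)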
Federated Learning approach for Spectral Clustering
Elena Hernández-Pereira, Oscar Fontenla-Romero, Bertha Guijarro-Berdiñas, Beatriz Pérez Sánchez
https://doi.org/10.14428/esann/2021.ES2021-95
Abstract:
Spectral clustering is a clustering paradigm that has been shown to be more effective in finding clusters with non-convex shapes than some traditional algorithms such as k-means. However, this algorithm is not directly applicable when the data is naturally distributed in different locations, as happens in many Internet of Things scenarios. In this work, we propose a distributed spectral clustering algorithm to create a cooperative federated model for those cases in which the data is distributed across different sites and data privacy is a concern. We demonstrate that sharing a minimal amount of information allows this distributed version of spectral clustering to achieve good behavior when clustering several synthetic data sets.
Validating static call graph-based malware signatures using community detection methods
Attila Mester, Zalán Bodó
https://doi.org/10.14428/esann/2021.ES2021-27
Abstract:
Due to the increasing number of new malware appearing daily, it is impossible to manually inspect each sample. By applying data mining techniques to analyze the program code, we can support manual processing. In this paper we propose a method to extract signatures from the executable binary of a malware sample, in order to query its local neighborhood in real time. The method is validated by applying community detection algorithms on the common fingerprint-based malware graph to identify families, and assessing these with evaluation metrics used in the field (e.g. modularity, family majority, etc.). The signatures are obtained via static code analysis, using function call n-grams and applying locality-sensitive hashing techniques to enable the match between functions with highly similar instruction lists.
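As a rough illustration of the fingerprinting step, the sketch below hashes call n-grams of a function with a hand-rolled MinHash and estimates the Jaccard similarity between signatures; the call sequences, n-gram length and number of hash permutations are illustrative assumptions rather than the paper's exact pipeline.

import hashlib
import numpy as np

def ngrams(seq, n=3):
    return {tuple(seq[i:i + n]) for i in range(len(seq) - n + 1)}

def minhash_signature(items, num_perm=64):
    """One hash value per 'permutation', implemented with salted SHA-1."""
    sig = []
    for salt in range(num_perm):
        sig.append(min(
            int.from_bytes(hashlib.sha1(f"{salt}|{it}".encode()).digest()[:8], "big")
            for it in items))
    return np.array(sig)

def estimated_jaccard(sig_a, sig_b):
    return float(np.mean(sig_a == sig_b))

# Two highly similar call sequences and one unrelated sequence (hypothetical).
f1 = ["push", "call CreateFileA", "test", "jz", "call ReadFile", "ret"]
f2 = ["push", "call CreateFileA", "test", "jnz", "call ReadFile", "ret"]
f3 = ["xor", "call socket", "call connect", "call send", "ret"]

s1, s2, s3 = (minhash_signature(ngrams(f)) for f in (f1, f2, f3))
print(estimated_jaccard(s1, s2), estimated_jaccard(s1, s3))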
Impact of data subsamplings in Fast Multi-Scale Neighbor Embedding.
Pierre Lambert, Lee John, Michel Verleysen, Cyril de Bodt
https://doi.org/10.14428/esann/2021.ES2021-60
Abstract:
Fast multi-scale neighbor embedding (f-ms-NE) is an algorithm that maps high-dimensional data to a low-dimensional space by preserving the multi-scale data neighborhoods. To lower its time complexity, f-ms-NE uses random subsamplings to estimate the data properties at multiple scales. To improve this estimation and study the f-ms-NE sensitivity to randomness, this paper generalizes the f-ms-NE cost function by averaging several subsamplings. Experiments reveal that this can slightly improve DR quality while maintaining reasonable computation times. Codes are available at https://github.com/cdebodt/Fast_Multi-scale_NE.
Semi-supervised learning with Bayesian Confidence Propagation Neural Network
Naresh Balaji Ravichandran, Anders Lansner, Pawel Herman
https://doi.org/10.14428/esann/2021.ES2021-156
Abstract:
Learning internal representations from data using no or few labels is useful for machine learning research, as it allows using massive amounts of unlabeled data. In this work, we use the Bayesian Confidence Propagation Neural Network (BCPNN) model developed as a biologically plausible model of the cortex. Recent work has demonstrated that these networks can learn useful internal representations from data using local Bayesian-Hebbian learning rules. In this work, we show how such representations can be leveraged in a semi-supervised setting by introducing and comparing different classifiers. We also evaluate and compare such networks with other popular semi-supervised classifiers.
Combining Attack Success Rate and Detection Rate for effective Universal Adversarial Attacks
Valentina Poggioni, Alina Elena Baia, Alfredo Milani
https://doi.org/10.14428/esann/2021.ES2021-160
Abstract:
In the framework of Adversarial Machine Learning, several detection and protection techniques are used to characterize specific attack-defense scenarios. In this paper, we present universal, unrestricted black-box adversarial attacks based on a multi-objective nested evolutionary algorithm able to incorporate the detection rate and a measure of image quality into the attack building phase.
Machine learning and data mining for urban mobility intelligence
Machine learning and data mining for urban mobility intelligence
Etienne Come, Latifa Oukhellou, Allou Same, Lijun Sun
https://doi.org/10.14428/esann/2021.ES2021-7
Abstract:
The last few decades have seen a rapid development of digital systems for observing the mobility of people and goods. Various sensing systems - such as radio communication, Wi-Fi, Bluetooth, smart card validation, mobile phones, and road traffic monitoring systems - have enabled researchers and practitioners to acquire large amounts of data, which generally refer to individual and collective trajectories. The mobility data can be further enriched with side information, such as text corpora from social media, survey data, and weather information. These massive data, temporally and spatially structured, can benefit from advanced machine learning and data mining methods, providing decision-aid tools and contributing to the development of safer, cleaner, and more efficient transportation systems. They can also help to implement new mobility services for the user. This article provides an overview of methodological advances in temporal and spatial mobility data processing.
Multivariate Time Series Multi-Coclustering. Application to Advanced Driving Assistance System Validation
Etienne Goffinet, Mustapha Lebbah, Hanane Azzag, Loïc Giraldi, Anthony Coutant
https://doi.org/10.14428/esann/2021.ES2021-119
Abstract:
Driver assistance systems development remains a technical challenge for car manufacturers. Validating these systems requires assessing their performance in a considerable number of driving contexts. Groupe Renault uses massive simulation for this task, which allows reproducing the complexity of physical driving conditions precisely and produces large volumes of multivariate time series. We present the operational constraints and scientific challenges related to these datasets and our proposal of an adapted model-based multiple co-clustering approach, which creates several independent partitions by grouping redundant variables. The method natively performs model selection, missing-value inference, noisy-sample handling and confidence-interval production, while keeping the number of parameters small. The proposed model is evaluated on a synthetic dataset and applied to a driver assistance system validation use case.
Unsupervised Real-time Anomaly Detection for Multivariate Mobile Phone Traffic Series
Evelyne Akopyan, Angelo Furno, Nour-Eddin El Faouzi, Eric Gaume
https://doi.org/10.14428/esann/2021.ES2021-113
Abstract:
Real-time anomaly detection in urban areas from massive data is a recent research field with challenging requirements. This paper presents a lightweight and robust framework for real-time anomaly detection in multivariate time series extracted from large-scale Mobile-phone Network Data (MND). Our solution relies on unsupervised machine learning applied to MND collected at individual antennas of a nation-wide French mobile phone network operator. The proposed framework is based on a two-step approach: (i) the offline stage aims at assessing the typical behaviour of the antennas; (ii) the online stage performs real-time comparison of incoming data with respect to the detected typical behaviour. Results related to a real case study of a terrorist attack in the city of Lyon show that our framework can successfully detect an emergency event almost instantaneously and locate the anomalous area with high precision.
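A minimal sketch of such a two-step scheme (not the authors' exact implementation) could build an offline per-antenna, per-time-slot profile from past days and flag online counts whose robust z-score against that profile exceeds a threshold; the data shapes and threshold below are illustrative.

import numpy as np

def offline_profile(history):
    """history: array of shape (n_days, n_slots, n_antennas)."""
    med = np.median(history, axis=0)
    mad = np.median(np.abs(history - med), axis=0) + 1e-9
    return med, mad

def online_score(x_t, slot, med, mad, threshold=6.0):
    """x_t: current counts per antenna for time slot `slot`; returns anomalous antennas."""
    z = np.abs(x_t - med[slot]) / (1.4826 * mad[slot])   # robust z-score
    return np.where(z > threshold)[0], z

# Toy usage: 30 days x 96 quarter-hour slots x 50 antennas, with a spike at one antenna.
rng = np.random.default_rng(0)
history = rng.poisson(lam=100, size=(30, 96, 50)).astype(float)
med, mad = offline_profile(history)
x_now = rng.poisson(lam=100, size=50).astype(float)
x_now[7] += 400  # sudden surge at antenna 7
anomalous, scores = online_score(x_now, slot=40, med=med, mad=mad)
print(anomalous)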
In-Station Train Movements Prediction: from Shallow to Deep Multi Scale Models
Gianluca Boleto, Luca Oneto, Matteo Cardellini, Marco Maratea, Mauro Vallati, Renzo Canepa, Davide Anguita
https://doi.org/10.14428/esann/2021.ES2021-29
Abstract:
Public railway transport systems play a crucial role in servicing the global society and are the transport backbone of a sustainable economy. While considerable effort has been spent on predicting inter-station train movements to support the decisions of stakeholders (i.e., infrastructure managers, train operators, and travellers), the problem of predicting in-station movements, while crucial to improve train dispatching (i.e., empowering human or automatic dispatchers), has been far less investigated. In fact, stations are the most critical points in a railway network: even small improvements in the estimation of the duration of train movements can strongly improve the dispatching efficiency in coping with the increase in capacity demand and with delays. In this work we first leverage state-of-the-art shallow models, fed by domain experts with domain-specific features, to improve the current predictive systems. Then, we leverage a custom deep multi-scale model able to automatically learn a representation and improve the accuracy of the shallow models. Results on real-world data from the Italian railway network support our proposal.
Deep Neural Networks for Classification of Riding Patterns: with a focus on explainability
Milad Leyli Abadi, Abderrahmane Boubezoul
https://doi.org/10.14428/esann/2021.ES2021-51
Abstract:
The powered two-wheelers (PTW) are among the most vulnerable transport users. It is crucial to identify the appropriate action that should be undertaken during a specific situation to reduce the risk. In this article, the aim is to improve the current state of the art in the identification of riding patterns through neural network architectures and to explain how a decision is made by a model that is considered a black box. In this regard, a new visualization tool specific to time series is suggested to help identify the most influential factors and hopefully to develop appropriate risk mitigation strategies.
A Lightweight Approach for Origin-Destination Matrix Anonymization
Benoit Matet, Etienne Come, Angelo Furno, Loïc Bonnetain, Latifa Oukhellou, Nour-Eddin El Faouzi
https://doi.org/10.14428/esann/2021.ES2021-56
Abstract:
Personal trajectory data are becoming more and more accessible and have a high value for transport planning and mobility characterisation, at the cost of a risk to users' privacy. Addressing this risk is usually computationally expensive and can lead to losing most of the data utility. We explore a new, lightweight approach to Origin/Destination-matrix anonymization that is easily scalable. We apply it to trip records from the New York City Taxi and Limousine Commission (TLC) and measure the resulting utility loss with a generalization error function.
Supervised learning
Supervised learning of convex piecewise linear approximations of optimization problems
Laurine Duchesne, Quentin Louveaux, Louis Wehenkel
https://doi.org/10.14428/esann/2021.ES2021-74
Abstract:
We propose to use input convex neural networks (ICNN) to build convex approximations of non-convex feasible sets of optimization problems, in the form of a set of linear equalities and inequalities in a lifted space. Our approach may be tailored to yield both inner- and outer- approximations, or to maximize its accuracy in regions closer to the minimum of a given objective function. We illustrate the method on two-dimensional toy problems and motivate it by various instances of reliability management problems of large-scale electric power systems.
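For readers unfamiliar with ICNNs, the sketch below shows the core construction that makes the output convex in the input: the weights acting on the previous hidden state are constrained to be non-negative and the activations are convex and non-decreasing. It is a generic illustration, not the architecture used in the paper; layer sizes and the random weights are arbitrary.

import numpy as np

rng = np.random.default_rng(0)

def icnn_forward(x, Wx, Wz, b):
    """Wx[k]: free weights on the input x; Wz[k]: non-negative weights on z_{k-1}."""
    z = np.maximum(Wx[0] @ x + b[0], 0.0)
    for k in range(1, len(Wx)):
        z = np.maximum(np.abs(Wz[k]) @ z + Wx[k] @ x + b[k], 0.0)  # abs() enforces Wz >= 0
    return z

dims, d_in = [16, 16, 1], 4
Wx = [rng.normal(size=(dims[k], d_in)) for k in range(len(dims))]
Wz = [None] + [rng.normal(size=(dims[k], dims[k - 1])) for k in range(1, len(dims))]
b  = [rng.normal(size=dims[k]) for k in range(len(dims))]

# Convexity check along a random segment: f(0.5*(x1+x2)) <= 0.5*(f(x1)+f(x2)).
x1, x2 = rng.normal(size=d_in), rng.normal(size=d_in)
f = lambda x: icnn_forward(x, Wx, Wz, b)[0]
print(f(0.5 * (x1 + x2)) <= 0.5 * (f(x1) + f(x2)) + 1e-9)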
Real-time On-edge Classification: an Application to Domestic Acoustic Event Recognition
Lode Vuegen, Peter Karsmakers
https://doi.org/10.14428/esann/2021.ES2021-84
Abstract:
In this paper two different convolutional neural network (CNN) architectures are investigated for the purpose of real-time on-edge domestic acoustic event classification. For training and evaluation of the models, a real-life acoustical dataset was recorded in 72 different home environments. A quantization-aware training scheme was applied that takes into account that the models need to run on 8-bit fixed-point processing hardware. Once trained, the models were successfully deployed on an ARM Cortex-M7 microcontroller unit (i.MX RT1064). This study indicates that the used procedure can lead to an efficient, real-time, embedded on-edge implementation of a domestic sound event classifier that does not sacrifice classification performance compared to its floating-point counterpart.
Functional Gradient Descent for n-Tuple Regression
Rafael Katopodis, Priscila Lima, Felipe França
https://doi.org/10.14428/esann/2021.ES2021-35
Abstract:
n-tuple neural networks have in the past been applied to a wide range of learning domains. However, for the particular area of regression, existing systems have displayed two shortcomings: little flexibility in the objective function being optimized and an inability to handle nonstationarity in an online learning setting. A novel n-tuple system is proposed to address these issues. The new architecture leverages the idea of functional gradient descent, drawing inspiration from its use in kernel methods. Furthermore, its capabilities are showcased in two reinforcement learning tasks, which involve both nonstationary online learning and task-specific objective functions.
Estimating uncertainty in radiation oncology dose prediction with dropout and bootstrap in U-Net models
Lee John, Alyssa Vanginderdeuren, Margerie Huet-Dastarac, Ana Maria Barragan Montero
https://doi.org/10.14428/esann/2021.ES2021-117
Abstract:
Deep learning models, such as U-Net, can be used to efficiently predict the optimal dose distribution in radiotherapy treatment planning. In this work, we want to supplement the prediction model with a measurement of its uncertainty at each voxel. For this purpose, a full Bayesian approach would, however, be too costly. Instead, we compare, based on their correlation with the actual error, three simpler methods, namely, the dropout, the bootstrap and a modification of the U-Net. These methods can be easily adapted to other architectures. 200 patients with head and neck cancer were used in this work.
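As an illustration of the dropout-based uncertainty estimate (one of the three methods compared), the sketch below keeps dropout active at inference and uses the spread of T stochastic forward passes as a per-voxel uncertainty map; the tiny 2-D network stands in for a full U-Net and is purely illustrative.

import torch
import torch.nn as nn

net = nn.Sequential(
    nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(),
    nn.Dropout2d(p=0.2),
    nn.Conv2d(16, 1, 3, padding=1),
)

def mc_dropout_predict(model, x, T=30):
    model.eval()
    for m in model.modules():          # re-enable only the dropout layers
        if isinstance(m, (nn.Dropout, nn.Dropout2d)):
            m.train()
    with torch.no_grad():
        preds = torch.stack([model(x) for _ in range(T)])
    return preds.mean(dim=0), preds.std(dim=0)   # dose prediction and per-voxel uncertainty

x = torch.randn(1, 1, 64, 64)          # one 2-D slice; real dose prediction is 3-D
mean_dose, uncertainty = mc_dropout_predict(net, x)
print(mean_dose.shape, uncertainty.shape)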
Hierarchical Planning in Multilayered State-Action Networks
Matthias Brucklacher, Hanspeter A. Mallot, Tristan Baumann
https://doi.org/10.14428/esann/2021.ES2021-45
Abstract:
The ability to decompose large tasks into smaller subtasks allows humans to solve complex problems step-by-step. To transfer this ability to an automated system, we propose a spiking neural network inspired by the neurobiological mechanics of spatial cognition to represent space on multiple levels of abstraction. As behavioral experiments suggest that humans integrate spatial knowledge in a graph of places, neurons in the state-action network encode locations while connections between them represent transition actions. In a series of simulation experiments, the influence of hierarchy on planning speed and on the resulting route choice in comparison to single-level models is investigated. We find that the model chooses biased subgoals in line with experiments on human navigation.
Distribution Preserving Multiple Hypotheses Prediction for Uncertainty Modeling
Tobias Leemann, Moritz Sackmann, Jörn Thielecke, Ulrich Hofmann
https://doi.org/10.14428/esann/2021.ES2021-16
Abstract:
Many supervised machine learning tasks, such as future state prediction in dynamic systems, require precise modeling of a forecast's uncertainty. The Multiple Hypotheses Prediction (MHP) approach addresses this problem by providing several hypotheses that represent possible outcomes. Unfortunately, with the common l2 loss function, these hypotheses do not preserve the data distribution's characteristics. We propose an alternative loss for distribution-preserving MHP and review relevant theorems supporting our claims. Furthermore, we empirically show that our approach yields more representative hypotheses on a synthetic and a real-world motion prediction data set. The outputs of the proposed method can directly be used in sampling-based Monte-Carlo methods.
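For context, the sketch below shows the standard (relaxed) winner-takes-all l2 objective that MHP models are commonly trained with, i.e. the loss whose distribution-distorting behaviour the paper addresses; the proposed alternative loss is not reproduced here, and the tensor shapes are illustrative.

import torch

def relaxed_wta_l2(hypotheses, target, eps=0.05):
    """hypotheses: (batch, K, dim); target: (batch, dim)."""
    dists = ((hypotheses - target.unsqueeze(1)) ** 2).sum(dim=-1)   # (batch, K)
    best = dists.argmin(dim=1)
    weights = torch.full_like(dists, eps / (dists.shape[1] - 1))
    weights.scatter_(1, best.unsqueeze(1), 1.0 - eps)               # winner gets most weight
    return (weights * dists).sum(dim=1).mean()

# Toy usage with K = 5 hypotheses in 2-D.
hyp = torch.randn(8, 5, 2, requires_grad=True)
tgt = torch.randn(8, 2)
loss = relaxed_wta_l2(hyp, tgt)
loss.backward()
print(float(loss))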
Orientation Adaptive Minimal Learning Machine for Directions of Atomic Forces
Antti Pihlajamäki, Joakim Linja, Joonas Hämäläinen, Paavo Nieminen, Sami Malola, Tommi Kärkkäinen, Hannu Häkkinen
https://doi.org/10.14428/esann/2021.ES2021-34
Abstract:
Machine learning (ML) force fields are one of the most common applications of ML in nanoscience. However, these methods are commonly trained on potential energies of atomic systems and force vectors are omitted. Here we present an ML framework which tackles the greatest difficulty in using forces in ML: accurate prediction of force direction. We use the idea of the Minimal Learning Machine to devise a method which can adapt to the orientation of an atomic environment to estimate the directions of force vectors. The method was tested with linear alkane molecules.
Estimating Formulas for Model Performance Under Noisy Labels Using Symbolic Regression
Fech Scen Khoo, Dawei Zhu, Michael A. Hedderich, Dietrich Klakow
https://doi.org/10.14428/esann/2021.ES2021-65
Abstract:
We present a generic formula characterizing the learning of our model under a variety of label-noise settings. This is achieved by using the symbolic regressor model, a genetic programming algorithm, from which we learn functions based on a large set of performance evaluations. Equipped with the knowledge from the regressor, we find a universal formula governing the model performance with respect to noise. This result from our empirical approach could have qualitative applications in mitigating the performance degradation caused by real-world noisy data and could complement certain noise-robust models.
A Multi-ELM Model for Incomplete Data
Baichuan Chi, Amaury Lendasse, Edward Ratner, Renjie Hu
https://doi.org/10.14428/esann/2021.ES2021-162
Abstract:
This paper presents a novel model of Extreme Learning Machines (ELMs) for incomplete data. ELMs are fast, accurate, randomized neural networks. Nevertheless, ELMs can only be applied to complete datasets. Therefore, a novel Multi-ELM Model for incomplete data is proposed, consisting of multiple secondary ELMs and one primary ELM. The secondary ELMs approximate the hidden-layer output of the primary ELM for the data with missing values. As summarized in the experimental section, this model can be applied to data with any missing patterns without using imputations and can outperform traditional imputation methods within a reasonable fraction of missing values (0% to 20%), as it avoids the noise introduced by imputation.
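The sketch below shows the basic ELM building block such a model is composed of: a random hidden layer followed by a least-squares fit of the output weights. In the Multi-ELM scheme, secondary ELMs of this kind, trained on the observed features, approximate the primary ELM's hidden-layer output when inputs are missing; that wiring is omitted here and the data are illustrative.

import numpy as np

class ELM:
    def __init__(self, n_hidden=100, seed=0):
        self.n_hidden, self.rng = n_hidden, np.random.default_rng(seed)

    def _hidden(self, X):
        return np.tanh(X @ self.W + self.b)

    def fit(self, X, y):
        self.W = self.rng.normal(size=(X.shape[1], self.n_hidden))  # random, untrained
        self.b = self.rng.normal(size=self.n_hidden)
        H = self._hidden(X)
        self.beta = np.linalg.pinv(H) @ y        # output weights by pseudo-inverse
        return self

    def predict(self, X):
        return self._hidden(X) @ self.beta

# Toy usage.
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 5))
y = np.sin(X[:, 0]) + 0.1 * rng.normal(size=200)
print(ELM().fit(X, y).predict(X[:5]))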
Interpretable Models in Machine Learning and Explainable Artificial Intelligence
The Coming of Age of Interpretable and Explainable Machine Learning Models
Paulo Lisboa, Sascha Saralajew, Alfredo Vellido, Thomas Villmann
https://doi.org/10.14428/esann/2021.ES2021-2
Abstract:
Machine learning-based systems are now part of a wide array of real-world applications seamlessly embedded in the social realm. In the wake of this realisation, strict legal regulations for these systems are currently being developed, addressing some of the risks they may pose. This is the coming of age of the interpretability and explainability problems in machine learning-based data analysis, which can no longer be seen just as an academic research problem. In this tutorial, associated with the ESANN 2021 special session on Interpretable Models in Machine Learning and Explainable Artificial Intelligence, we discuss explainable and interpretable machine learning as post-hoc and ante-hoc strategies to address these problems and highlight several aspects related to them, including their assessment. The contributions accepted for the session are then presented in this context.
AGLVQ - Making Generalized Vector Quantization Algorithms Aware of Context
Torben Graeber, Sebastian Vetter, Sascha Saralajew, Michael Unterreiner, Dieter Schramm
https://doi.org/10.14428/esann/2021.ES2021-40
Abstract:
Generalized Learning Vector Quantization methods are a powerful and robust approach for classification tasks. They compare incoming samples with representative prototypes for each target class. While prototypes are physically interpretable, they do not account for changes in the environment. We propose a novel framework for the incorporation of context information into prototype generation. We can model dependencies in a modular way, ranging from polynomials to neural networks. Evaluations on artificial and real-world datasets show an increase in performance and meaningful prototype adaptations.
A Parameterless t-SNE for Faithful Cluster Embeddings from Prototype-based Learning and CONN Similarity
Josh Taylor, Erzsébet Merényi
https://doi.org/10.14428/esann/2021.ES2021-138
Abstract:
We propose an improvement to t-SNE which allows automated specification of its perplexity parameter using topological information about a data manifold revealed through neural prototype-based learning. This information is contained in the CONN (CONNectivity) similarity of neural prototypes, which expresses the strength (weakness) of topological connectivity at various points within the manifold. Experiments show that improvements, collectively called CONNt-SNE, are capable of producing meaningful and trustworthy low-dimensional embeddings without the need to heuristically optimize over (i.e., grid search) t-SNE's perplexity space. Data-driven perplexity determination improves our confidence that any structure appearing in the embeddings is valid and not merely an artifact of spurious parameterization.
Handling Correlations in Random Forests: which Impacts on Variable Importance and Model Interpretability?
Marie Chavent, Jérôme Lacaille, Alex Mourer, Madalina Olteanu
https://doi.org/10.14428/esann/2021.ES2021-155
Abstract:
The present manuscript tackles the issues of model interpretability and variable importance in random forests, in the presence of correlated input variables. Variable importance criteria based on random permutations are known to be sensitive when input variables are correlated, and may lead for instance to unreliability in the importance ranking. In order to overcome some of the problems raised by correlation, an original variable importance measure is introduced. The proposed measure builds upon an algorithm which clusters the input variables based on their correlations, and summarises each such cluster by a synthetic variable. The effectiveness of the proposed criterion is illustrated through simulations in a regression context, and compared with several existing variable importance measures.
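A rough sketch of the cluster-then-summarise idea (under assumptions, not the paper's exact criterion) is given below: variables are grouped by correlation, each group is replaced by a synthetic variable, here its first principal component, and permutation importance is computed at the group level; the clustering threshold and the toy data are illustrative.

import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import squareform
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestRegressor
from sklearn.inspection import permutation_importance

rng = np.random.default_rng(0)
n = 500
z1, z2 = rng.normal(size=n), rng.normal(size=n)
X = np.column_stack([z1, z1 + 0.05 * rng.normal(size=n),     # correlated pair
                     z2, z2 + 0.05 * rng.normal(size=n),     # correlated pair
                     rng.normal(size=n)])                     # pure noise variable
y = 2 * z1 + z2 + 0.1 * rng.normal(size=n)

# 1. Cluster the variables on the dissimilarity 1 - |corr|.
corr = np.corrcoef(X, rowvar=False)
dist = squareform(1 - np.abs(corr), checks=False)
labels = fcluster(linkage(dist, method="average"), t=0.5, criterion="distance")

# 2. Summarise each cluster by its first principal component.
synth = np.column_stack([
    PCA(n_components=1).fit_transform(X[:, labels == c]).ravel()
    for c in np.unique(labels)])

# 3. Permutation importance of the synthetic (cluster-level) variables.
rf = RandomForestRegressor(n_estimators=200, random_state=0).fit(synth, y)
imp = permutation_importance(rf, synth, y, n_repeats=10, random_state=0)
print(dict(zip(np.unique(labels), imp.importances_mean.round(3))))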
The partial response SVM
Bradley Walters, Sandra Ortega-Martorell, Ivan Olier, Paulo Lisboa
https://doi.org/10.14428/esann/2021.ES2021-36
Abstract:
We introduce a probabilistic algorithm for binary classification based on the SVM. It applies the ANOVA decomposition for multivariate functions to express the logit of the Platt estimate of the posterior probability as a non-redundant sum of functions of fewer variables (partial responses), followed by feature selection with the Lasso. The partial response SVM (prSVM) is compared with previous interpretable models of the SVM. Its accuracy and stability are demonstrated with real-world data sets.
The LVQ-based Counter Propagation Network -- an Interpretable Information Bottleneck Approach
Marika Kaden, Ronny Schubert, Mehrdad Mohannazadeh Bakhtiari, Lucas Schwarz, Thomas Villmann
https://doi.org/10.14428/esann/2021.ES2021-88
Abstract:
In this paper we present a realization of the information-bottleneck-paradigm by means of an improved counter propagation network. It combines an unsupervised vector quantizer for data compression with a subsequent supervised learning vector quantization model. The approach is mathematically justified and yields an interpretable model for classification under the constraint of data compression.
Geometric Probing of Word Vectors
Madina Babazhanova, Maxat Tezekbayev, Zhenisbek Assylbekov
https://doi.org/10.14428/esann/2021.ES2021-105
Abstract:
This paper studies the informativeness of linguistic properties such as part-of-speech and named entities encoded in word representations. First, we find directions that correspond to these properties using the method of Elazar et al. (2020). Then such directions are compared with the principal vectors obtained from application of PCA to word embeddings. As a result, we find that the part-of-speech information is more important for word embeddings than the named entity property.
Context-specific sampling method for contextual explanations
Manik Madhikermi, Avleen Malhi, Kary Främling
https://doi.org/10.14428/esann/2021.ES2021-124
Abstract:
Explaining the results of machine learning models is an active research topic in the Artificial Intelligence (AI) domain, with the objective of providing mechanisms to understand and interpret the results of the underlying black-box model in a human-understandable form. With this objective, several eXplainable Artificial Intelligence (XAI) methods have been designed and developed based on varied fundamental principles. Some methods, such as Local Interpretable Model-agnostic Explanations (LIME) and SHAP (SHapley Additive exPlanations), are based on a surrogate model, while others, such as Contextual Importance and Utility (CIU), do not create or rely on a surrogate model to generate their explanation. Despite the difference in underlying principles, these methods use different sampling techniques, such as uniform or weighted sampling, for generating explanations. CIU, which emphasizes a context-aware decision explanation, employs a uniform sampling method for the generation of representative samples. In this research, we target uniform sampling methods, whose samples are not guaranteed to be representative in the presence of strong non-linearities or exceptional input feature value combinations. The objective of this research is to develop a sampling method that addresses these concerns. To that end, a new adaptive weighted sampling method is proposed. To verify its efficacy in generating explanations, the proposed method has been integrated with CIU and tested on a dedicated test case.
SmoothLRP: Smoothing LRP by Averaging over Stochastic Input Variations
Arne Raulf, Sina Däubener, Ben Hack, Axel Mosig, Asja Fischer
https://doi.org/10.14428/esann/2021.ES2021-139
Abstract:
Explanations of neural network predictions are a necessity for deploying neural networks in safety-critical domains. Several methods have been developed which identify the most relevant input features, such as sensitivity analysis and layer-wise relevance propagation (LRP). It has been shown that the noise in the explanations from sensitivity analysis can be reduced by averaging over noisy input images, a method referred to as SmoothGrad. We investigate the application of the same principle to LRP and find that it smooths the resulting relevance function, leading to improved explanations. Moreover, it can be applied for restoring the correct label of adversarial examples.
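The averaging principle can be illustrated with a few lines of code; the sketch below uses a simple gradient-times-input relevance map as a stand-in for LRP (whose layer-wise rules are not implemented here) and averages it over noisy copies of the input. Model, noise level and sample count are illustrative assumptions.

import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(10, 32), nn.ReLU(), nn.Linear(32, 2))

def relevance(x):
    """Stand-in relevance: gradient of the top logit times the input."""
    x = x.clone().requires_grad_(True)
    score = model(x).max(dim=1).values.sum()
    score.backward()
    return (x.grad * x).detach()

def smoothed_relevance(x, n_samples=50, sigma=0.1):
    """Average the relevance map over stochastic input variations."""
    noisy = [relevance(x + sigma * torch.randn_like(x)) for _ in range(n_samples)]
    return torch.stack(noisy).mean(dim=0)

x = torch.randn(1, 10)
print(relevance(x))
print(smoothed_relevance(x))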
A Baseline for Shapley Values in MLPs: from Missingness to Neutrality
Cosimo Izzo, Aldo Lipani, Ramin Okhrati, Francesca Medda
https://doi.org/10.14428/esann/2021.ES2021-18
Abstract:
Deep neural networks have gained momentum based on their accuracy, but their interpretability is often criticised. As a result, they are labelled as black boxes. In response, several methods have been proposed in the literature to explain their predictions. Among the explanatory methods, Shapley values is a feature attribution method favoured for its robust theoretical foundation. However, the analysis of feature attributions using Shapley values requires choosing a baseline that represents the concept of missingness. An arbitrary choice of baseline could negatively impact the explanatory power of the method and possibly lead to incorrect interpretations. In this paper, we present a method for choosing a baseline according to a neutrality value: as a parameter selected by decision-makers, the point at which their choices are determined by the model predictions being either above or below it. Hence, the proposed baseline is set based on a parameter that depends on the actual use of the model. This procedure stands in contrast to how other baselines are set, i.e. without accounting for how the model is used. We empirically validate our choice of baseline in the context of binary classification tasks, using two datasets: a synthetic dataset and a dataset derived from the financial domain.
Time series and signal processing
Quantifying Resemblance of Synthetic Medical Time-Series
Karan Bhanot, Saloni Dash, Joseph Pedersen, Isabelle Guyon, Kristin Bennett
https://doi.org/10.14428/esann/2021.ES2021-108
Abstract:
Access to medical data is often restricted due to privacy laws such as HIPAA and GDPR. We address the viability of substituting real data with synthetic data to protect privacy while maintaining utility. Medical data records are fundamentally longitudinal, with one patient having multiple health events influenced by covariates such as gender and age. Medical data synthesis therefore falls under time-series generative modeling. We demonstrate methods to measure the quality of synthetic medical time series on datasets from previously published synthetic data research. We deploy four time-series metrics to quantify the resemblance between synthetic and real covariate plots while comparing baseline data generation methods.
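The four metrics used in the paper are not reproduced here; the sketch below is one illustrative resemblance score, assuming equal-length real and synthetic 1-D series, that compares their autocorrelation structure.

```python
import numpy as np

def autocorrelation(series, max_lag):
    """Sample autocorrelation of a 1-D series up to `max_lag`."""
    s = series - series.mean()
    var = np.dot(s, s)
    return np.array([np.dot(s[:-k], s[k:]) / var for k in range(1, max_lag + 1)])

def acf_resemblance(real, synthetic, max_lag=10):
    """Mean absolute difference between the autocorrelation functions of
    paired real and synthetic series (lower means closer temporal structure).
    """
    diffs = [np.abs(autocorrelation(r, max_lag) - autocorrelation(s, max_lag)).mean()
             for r, s in zip(real, synthetic)]
    return float(np.mean(diffs))
```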
Differentially Private Time Series Generation
Hiba Arnout, Johanna Bronner, Thomas Runkler
https://doi.org/10.14428/esann/2021.ES2021-20
Abstract:
Privacy issues prevent data owners from improving Machine Learning (ML) performance, as they restrict external collaborations. To allow data sharing without confidentiality concerns, we propose in this work methods to generate time series in a privacy-preserving manner. We combine the existing generative models for time series, namely TimeGAN [1], ClaRe-GAN [2] and C-RNN-GAN [3], with differential privacy. This is achieved by replacing their original discriminator with a private discriminator that relies on the differentially private stochastic gradient descent method (DPSGD) [4]. Our experiments show that the developed methods, in particular TimeGAN and ClaRe-GAN, outperform RCGAN [5], the only existing differentially private model for time series, in terms of privacy and accuracy.
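A minimal sketch of the DPSGD mechanism referenced above (per-example gradient clipping plus Gaussian noise), which is how the discriminator update is made private; producing per-example gradients from an actual GAN discriminator is framework-specific and omitted here.

```python
import numpy as np

def dpsgd_step(params, per_example_grads, lr=0.01, clip_norm=1.0,
               noise_mult=1.1, rng=None):
    """One differentially private SGD step: clip each per-example gradient
    to `clip_norm`, sum, add Gaussian noise scaled by noise_mult * clip_norm,
    average, and take a gradient step.

    `per_example_grads` has shape (batch, n_params); `params` has shape
    (n_params,).
    """
    rng = np.random.default_rng(rng)
    norms = np.linalg.norm(per_example_grads, axis=1, keepdims=True)
    clipped = per_example_grads / np.maximum(1.0, norms / clip_norm)
    noisy_sum = clipped.sum(axis=0) + rng.normal(
        0.0, noise_mult * clip_norm, size=params.shape)
    return params - lr * noisy_sum / per_example_grads.shape[0]
```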
Fusion of estimations from two modalities using the Viterbi's algorithm: application to fetal heart rate monitoring
Rémi Souriau, Julie Fontecave-Jallon, Bertrand Rivet
https://doi.org/10.14428/esann/2021.ES2021-61
Abstract:
The Viterbi algorithm makes it possible to estimate a latent time series from observations in a hidden Markov model. As proposed in this paper, it can also be used to merge estimations from different modalities. Such multi-modal estimation is more efficient than mono-modal estimation when the modalities are subject to independent noise. In this paper, this improvement is evaluated as a function of the noise level of the modalities. Experiments on toy data and on real signals for estimating the fetal heart rate show that merging modalities provides better estimations on average than using the modalities separately.
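A minimal sketch of Viterbi decoding with two modalities, assuming the observations are conditionally independent given the hidden state so that their log-emission terms simply add; the HMM parameters are assumed given and are not those of the paper.

```python
import numpy as np

def viterbi_fusion(log_init, log_trans, log_emit_a, log_emit_b):
    """Most likely hidden state sequence when two modalities observe the
    same latent state with independent noise.

    log_emit_a / log_emit_b: (T, n_states) log-likelihoods of each
    modality's observation given each state; under conditional
    independence the fused emission term is their sum.
    log_trans[i, j] is the log-probability of moving from state i to j.
    """
    log_emit = log_emit_a + log_emit_b
    T, n = log_emit.shape
    delta = np.full((T, n), -np.inf)
    psi = np.zeros((T, n), dtype=int)
    delta[0] = log_init + log_emit[0]
    for t in range(1, T):
        scores = delta[t - 1][:, None] + log_trans   # (from_state, to_state)
        psi[t] = scores.argmax(axis=0)
        delta[t] = scores.max(axis=0) + log_emit[t]
    path = np.zeros(T, dtype=int)
    path[-1] = delta[-1].argmax()
    for t in range(T - 2, -1, -1):
        path[t] = psi[t + 1, path[t + 1]]
    return path
```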
Convolutional Neural Network Architecture for Classification of Aircraft Engines Flight Time Series
Delphine Bay, Clémence Bisot
https://doi.org/10.14428/esann/2021.ES2021-91
Abstract:
During each flight, an aircraft engine sends data to a ground system. This data corresponds to measurements from different sensors (temperatures, pressures, vibrations...) collected at key moments of the flight, and it constitutes rich multivariate time series used to monitor the engine's health. In this article, we use flight data to predict the main removal cause of the engine. The problem falls within the framework of time series classification. This article proposes an interpretable neural network architecture that fits the physical understanding of the modeled phenomenon in order to address the problem on a real-world, industrial dataset.
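The interpretable architecture of the paper is not described in enough detail here to reproduce; the sketch below is only a generic 1-D convolutional baseline for multivariate flight time-series classification (PyTorch assumed), not the authors' model.

```python
import torch
import torch.nn as nn

class FlightSeriesCNN(nn.Module):
    """Generic 1-D CNN baseline for classifying multivariate flight time
    series (channels = sensors, length = flight snapshots)."""
    def __init__(self, n_sensors, n_classes):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv1d(n_sensors, 32, kernel_size=5, padding=2), nn.ReLU(),
            nn.Conv1d(32, 64, kernel_size=5, padding=2), nn.ReLU(),
            nn.AdaptiveAvgPool1d(1),          # global pooling over time
        )
        self.classifier = nn.Linear(64, n_classes)

    def forward(self, x):                     # x: (batch, n_sensors, time)
        return self.classifier(self.features(x).squeeze(-1))
```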
Multi-perspective embedding for non-metric time series classification
Maximilian Münch, Simon Heilig, Frank-Michael Schleif
https://doi.org/10.14428/esann/2021.ES2021-114
Abstract:
The interest in time series analysis is rapidly increasing, providing new challenges for machine learning. For many decades, Dynamic Time Warping (DTW) has been the de facto standard distance measure for time series and the tool of choice when analyzing such data. Nevertheless, DTW has two major drawbacks: (a) it is non-metric and therefore hard to handle with standard machine learning techniques, and (b) it is not well suited for multi-dimensional time series. To address this, we propose a multi-perspective embedding of the time series into a complex-valued vector space and its evaluation with a model that is able to handle complex-valued data. The approach is evaluated on various multi-dimensional time series data and with different classifier techniques.
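A sketch of one way to embed objects described by an indefinite (non-metric) similarity matrix, such as one derived from DTW, into a complex-valued vector space: negative eigenvalues contribute imaginary coordinates instead of being clipped away. This illustrates the general idea only; the multi-perspective construction of the paper is not reproduced.

```python
import numpy as np

def complex_embedding(S, dim=None):
    """Complex-valued embedding of a symmetric, possibly indefinite
    similarity matrix S, so that X @ X.T (plain transpose) reconstructs S.
    """
    S = 0.5 * (S + S.T)                       # enforce symmetry
    eigvals, eigvecs = np.linalg.eigh(S)
    order = np.argsort(np.abs(eigvals))[::-1]  # dominant eigenvalues first
    if dim is not None:
        order = order[:dim]
    lam = eigvals[order].astype(complex)       # sqrt of negatives -> imaginary
    U = eigvecs[:, order]
    return U * np.sqrt(lam)                    # rows are complex embeddings
```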
IF: Iterative Fractional Optimization
Sarthak Chatterjee, Subhro Das, Sérgio Pequito
https://doi.org/10.14428/esann/2021.ES2021-133
Abstract:
Most optimization problems lack a closed-form expression for the argument that minimizes a given function, and even when such an expression exists, it may be prohibitively expensive to evaluate. As such, we rely on iterative numerical algorithms to find an approximate solution. In this paper, we propose to leverage fractional calculus in the context of time series analysis methods to devise a new iterative algorithm. Specifically, we leverage autoregressive fractional-order integrative moving average time series, whose coefficients encode a proxy for local spatial information. We provide evidence that our algorithm is efficient and particularly suitable for cases where the Hessian is ill-conditioned.
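The ARFIMA-based algorithm of the paper is not reproduced here; the toy sketch below only illustrates the general idea of fractional-order memory in an iterative update, weighting past gradients with Grünwald-Letnikov binomial coefficients of order alpha.

```python
import numpy as np

def fractional_gradient_descent(grad, x0, alpha=0.7, lr=0.1,
                                n_iter=100, memory=20):
    """Toy fractional-order update: the step is a weighted sum of past
    gradients with Gruenwald-Letnikov coefficients c_j = (-1)^j binom(alpha, j),
    computed by the recursion c_j = c_{j-1} * (1 - (alpha + 1) / j).
    """
    coeffs = [1.0]
    for j in range(1, memory):
        coeffs.append(coeffs[-1] * (1.0 - (alpha + 1.0) / j))
    x, grads = np.asarray(x0, dtype=float), []
    for _ in range(n_iter):
        grads.insert(0, grad(x))               # most recent gradient first
        grads = grads[:memory]
        step = sum(c * g for c, g in zip(coeffs, grads))
        x = x - lr * step
    return x

# Toy usage: the gradient of 0.5 * ||x||^2 is x itself.
x_est = fractional_gradient_descent(lambda x: x, x0=[2.0, -1.0])
```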
Classification
A Relational Model for One-Shot Classification
Arturs Polis, Alexander Ilin
https://doi.org/10.14428/esann/2021.ES2021-75
Abstract:
We show that a deep learning model with a built-in relational inductive bias can benefit sample-efficient learning without relying on extensive data augmentation. The proposed one-shot classification model performs relational matching of a pair of inputs in the form of local and pairwise attention. Our approach perfectly solves the Omniglot one-shot image classification challenge. Our model exceeds human-level accuracy, as well as the previous state of the art, with no data augmentation.
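A hedged sketch of relational matching between two inputs via pairwise (cross-)attention over local features; the function names and the scoring rule are illustrative assumptions, not the learned model of the paper.

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def pairwise_attention_score(feat_a, feat_b):
    """Matching score between two inputs: local descriptors of A attend
    over those of B, and the score averages the cosine similarity between
    each A descriptor and its soft-aligned counterpart in B.

    feat_a: (n_a, d) local descriptors of the query image
    feat_b: (n_b, d) local descriptors of the support image
    """
    sims = feat_a @ feat_b.T / np.sqrt(feat_a.shape[1])   # scaled dot products
    attn = softmax(sims, axis=1)                          # A attends over B
    aligned = attn @ feat_b                               # soft-aligned B features
    cos = (feat_a * aligned).sum(1) / (
        np.linalg.norm(feat_a, axis=1) * np.linalg.norm(aligned, axis=1) + 1e-12)
    return float(cos.mean())
```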
Instance-Based Multi-Label Classification via Multi-Target Distance Regression
Joonas Hämäläinen, Paavo Nieminen, Tommi Kärkkäinen
https://doi.org/10.14428/esann/2021.ES2021-104
Abstract:
Interest in multi-target regression and multi-label classification techniques and their applications has been increasing lately. Here, we use a distance-based supervised method, the minimal learning machine (MLM), as a base model for multi-label classification. We also propose and test a hybridization of unsupervised and supervised techniques, in which prototype-based clustering is used to reduce both the training time and the overall model complexity. In computational experiments, we observed competitive or improved quality of the obtained models compared to state-of-the-art techniques.
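A hedged sketch of a minimal learning machine adapted to multi-label classification: a linear model maps input-space distances to label-space distances, and prediction recovers a label vector by multilateration and thresholds it. The reference-point selection and the clustering-based speed-up proposed in the paper are not reproduced; scipy is assumed available.

```python
import numpy as np
from scipy.spatial.distance import cdist
from scipy.optimize import least_squares

class SimpleMLM:
    """Distance-regression model: learn a linear map from distances to
    reference inputs to distances to reference label vectors."""
    def fit(self, X, Y):
        self.R, self.T = X, Y.astype(float)            # all points as references
        Dx = cdist(X, self.R)
        Dy = cdist(self.T, self.T)
        self.B = np.linalg.lstsq(Dx, Dy, rcond=None)[0]
        return self

    def predict(self, X):
        Dhat = cdist(X, self.R) @ self.B               # estimated label-space distances
        preds = []
        for dh in Dhat:
            res = least_squares(
                lambda y: np.linalg.norm(y - self.T, axis=1) - dh,
                x0=self.T.mean(axis=0))                # multilateration
            preds.append((res.x >= 0.5).astype(int))   # threshold each label
        return np.array(preds)
```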
A bag of nodes primer on weightless graph classification
Raul Barbosa, Diego Carvalho, Priscila Lima, Felipe França
https://doi.org/10.14428/esann/2021.ES2021-107
Abstract:
This paper proposes a weightless architecture for graph classification scenarios. The architecture is a three-headed arrangement composed of hand-picked graph features, a quantization method and a final classifier. Although multiple new strategies for graph classification have been proposed in recent years, comparative studies involving weightless neural networks are still lacking. The proposed architecture is evaluated alongside other baseline classifiers and independent strategies, showing that weightless architectures are able to compete with well-established methods such as graph kernels.
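A hedged sketch of a weightless (WiSARD-style) classifier operating on thermometer-quantized features, illustrating the quantization-plus-RAM-discriminator pipeline; the paper's specific graph features and quantization choices are not reproduced here.

```python
import numpy as np

def thermometer(x, n_bits, lo, hi):
    """Thermometer-encode each real-valued feature into n_bits binary levels."""
    levels = np.floor((np.clip(x, lo, hi) - lo) / (hi - lo) * n_bits).astype(int)
    return (np.arange(n_bits) < levels[:, None]).astype(np.uint8).ravel()

class WiSARD:
    """Weightless classifier: the binary input is split into tuples, each
    tuple addresses a RAM, and each class discriminator counts how many of
    its RAMs were written during training.  Assumes input_bits is divisible
    by tuple_size."""
    def __init__(self, input_bits, tuple_size, n_classes, seed=0):
        rng = np.random.default_rng(seed)
        self.mapping = rng.permutation(input_bits).reshape(-1, tuple_size)
        self.rams = [[dict() for _ in self.mapping] for _ in range(n_classes)]

    def _addresses(self, bits):
        return [tuple(bits[idx]) for idx in self.mapping]

    def fit(self, X_bits, y):
        for bits, c in zip(X_bits, y):
            for ram, addr in zip(self.rams[c], self._addresses(bits)):
                ram[addr] = 1
        return self

    def predict(self, X_bits):
        out = []
        for bits in X_bits:
            addrs = self._addresses(bits)
            scores = [sum(ram.get(a, 0) for ram, a in zip(rams_c, addrs))
                      for rams_c in self.rams]
            out.append(int(np.argmax(scores)))
        return np.array(out)
```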
Gradient representations in ReLU networks as similarity functions
Bálint Daróczy, Dániel Rácz
https://doi.org/10.14428/esann/2021.ES2021-153
Abstract:
Feed-forward networks can be interpreted as mappings with linear decision surfaces at the level of the last layer. We investigate how the tangent space of the network can be exploited to refine the decision in the case of ReLU (Rectified Linear Unit) activations. We show that a simple Riemannian metric parametrized by the parameters of the network forms a similarity function at least as good as the original network, and we suggest a sparse metric to increase the similarity gap.
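A simple Euclidean instance of the idea above: represent each input by the gradient of a small ReLU network's scalar output with respect to its parameters, and compare two inputs by the cosine of these gradient representations. The Riemannian and sparse metrics of the paper are not reproduced.

```python
import numpy as np

def gradient_feature(x, W1, w2):
    """Gradient of f(x) = w2 . relu(W1 x) w.r.t. the parameters (W1, w2),
    used as a tangent-space representation of the input x."""
    pre = W1 @ x
    h = np.maximum(pre, 0.0)                  # hidden activations
    mask = (pre > 0).astype(float)
    grad_w2 = h                               # df / dw2
    grad_W1 = np.outer(w2 * mask, x)          # df / dW1
    return np.concatenate([grad_W1.ravel(), grad_w2])

def gradient_similarity(x1, x2, W1, w2):
    """Cosine similarity of the two inputs' gradient representations."""
    g1, g2 = gradient_feature(x1, W1, w2), gradient_feature(x2, W1, w2)
    return float(g1 @ g2 / (np.linalg.norm(g1) * np.linalg.norm(g2) + 1e-12))
```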