Papers by Madalina Drugan
Proceedings of the 11th International Conference on Agents and Artificial Intelligence, 2019
The online game Agar.io has become massively popular on the internet due to its intuitive game design and its ability to instantly match players with others around the world. The game has a continuous input and action space and allows diverse agents with complex strategies to compete against each other. In this paper we focus on the pellet-eating task in the game, in which an agent has to learn to optimize its navigation strategy to grow maximally in size within a specific time period. This work first investigates how different state representations affect the learning process of a Q-learning algorithm combined with artificial neural networks that represent the Q-function. The representations examined range from raw pixel values to handcrafted feature vision grids. Secondly, the effects of using different resolutions for the representations are examined. Finally, we compare the performance of different value-function network architectures: two convolutional deep Q-networks (DQN) of varying depth and one multilayer perceptron. The results show that handcrafted feature vision grids significantly outperform the direct use of raw pixel input. Furthermore, lower resolutions of 42 × 42 lead to better performance than larger resolutions of 84 × 84.
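As a rough illustration of the handcrafted vision-grid representation described above, the following Python sketch rasterizes nearby pellets into an agent-centred grid; the grid resolution, view radius, and function names are illustrative assumptions, not the paper's actual code.

```python
import numpy as np

def pellet_vision_grid(agent_pos, pellets, grid_res=42, view_radius=10.0):
    """Rasterize pellets into an agent-centred grid: each cell accumulates
    the pellet mass falling inside it within the square view window."""
    grid = np.zeros((grid_res, grid_res), dtype=np.float32)
    cell = (2.0 * view_radius) / grid_res
    for px, py, mass in pellets:          # pellets: iterable of (x, y, mass)
        dx, dy = px - agent_pos[0], py - agent_pos[1]
        if abs(dx) < view_radius and abs(dy) < view_radius:
            col = int((dx + view_radius) / cell)
            row = int((dy + view_radius) / cell)
            grid[row, col] += mass
    return grid  # fed to the Q-network in place of raw pixels
```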
Proceedings of the 10th International Conference on Agents and Artificial Intelligence, 2018
In this paper, we investigate which exploration method yields the best performance in the game Bomberman, in which the controlled agent has to kill opponents by placing bombs. The agent is represented by a multi-layer perceptron that learns to play the game with Q-learning. We introduce two novel exploration strategies, Error-Driven-ε and Interval-Q, which base their explorative behavior on the temporal-difference error of Q-learning. The learning capabilities of these exploration strategies are compared to five existing methods: Random-Walk, Greedy, ε-Greedy, Diminishing ε-Greedy, and Max-Boltzmann. The results show that the methods that combine exploration with exploitation perform much better than the Random-Walk and Greedy strategies, which only select exploration or exploitation actions. Furthermore, the results show that Max-Boltzmann exploration performs best overall among the different techniques. The Error-Driven-ε exploration strategy also performs very well, but suffers from unstable learning behavior.
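For reference, a minimal sketch of Max-Boltzmann action selection as it is usually formulated: act greedily with probability 1 − ε, otherwise sample from the Boltzmann (softmax) distribution over Q-values. The ε and temperature values here are illustrative assumptions.

```python
import numpy as np

def max_boltzmann_action(q_values, epsilon=0.1, temperature=1.0, rng=np.random):
    """Max-Boltzmann: exploit greedily with probability 1 - epsilon,
    otherwise sample an action from the softmax distribution over Q."""
    if rng.random() >= epsilon:
        return int(np.argmax(q_values))
    z = np.asarray(q_values, dtype=np.float64) / temperature
    z -= z.max()                          # subtract max for numerical stability
    p = np.exp(z) / np.exp(z).sum()
    return int(rng.choice(len(q_values), p=p))
```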
Proceedings of the 10th International Conference on Agents and Artificial Intelligence, 2018
In this paper we propose the use of vision grids as state representation to learn to play the game Tron using neural networks and reinforcement learning. This approach speeds up learning by significantly reducing the number of unique states. Furthermore, we introduce a novel opponent modelling technique, which is used to predict the opponent's next move. The learned model of the opponent is subsequently used in Monte-Carlo roll-outs, in which the game is simulated n steps ahead in order to determine the expected value of conducting a certain action. Finally, we compare the performance of two different activation functions in the multi-layer perceptron, namely the sigmoid and the exponential linear unit (ELU). The results show that the ELU activation function outperforms the sigmoid activation function in most cases. Furthermore, vision grids significantly increase learning speed and in most cases also increase the agent's performance compared to using the full grid as state representation. Finally, the opponent modelling technique allows the agent to learn a predictive model of the opponent's actions, which in combination with Monte-Carlo roll-outs significantly increases the agent's performance.
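The roll-out idea can be sketched as follows; the simulate_step, opponent_model, and q_net interfaces are hypothetical stand-ins for the paper's components, not its actual code.

```python
def rollout_value(state, action, opponent_model, q_net, n_steps, simulate_step):
    """Estimate the value of `action` by simulating n steps ahead, using the
    learned opponent model to predict the opponent's move at every step.
    (simulate_step, opponent_model, and q_net are assumed interfaces.)"""
    s = simulate_step(state, action, opponent_model.predict(state))
    for _ in range(n_steps - 1):
        if s.terminal:
            return s.reward               # game decided within the horizon
        own_action = q_net.best_action(s)
        s = simulate_step(s, own_action, opponent_model.predict(s))
    return q_net.value(s)                 # bootstrap from the network at the horizon
```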
Neurocomputing, 2017
Proceedings of the 9th International Conference on Agents and Artificial Intelligence, 2017
This paper focuses on cyber-security simulations in networks modeled as a Markov game with incomplete information and stochastic elements. The resulting game is an adversarial sequential decision-making problem played by two agents, the attacker and the defender. Each agent employs a reinforcement learning technique, such as neural networks, Monte Carlo learning, or Q-learning, and we pit these techniques against each other to examine their effectiveness against learning opponents. The results show that Monte Carlo learning with the Softmax exploration strategy is most effective both in performing the defender role and in learning attacking strategies.
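As a generic illustration of the Monte Carlo learning component (not the paper's exact implementation), an every-visit tabular update toward the observed return looks like the sketch below; Softmax action selection follows the same pattern as the Max-Boltzmann sketch above.

```python
from collections import defaultdict

def mc_episode_update(Q, counts, episode, gamma=0.95):
    """Every-visit Monte Carlo: move Q(s, a) toward the observed return G.
    `episode` is a list of (state, action, reward) tuples from one game."""
    G = 0.0
    for state, action, reward in reversed(episode):
        G = reward + gamma * G            # accumulate the discounted return
        counts[(state, action)] += 1
        Q[(state, action)] += (G - Q[(state, action)]) / counts[(state, action)]

# usage: Q = defaultdict(float); counts = defaultdict(int)
```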
2016 IEEE Symposium Series on Computational Intelligence (SSCI), 2016
2015 IEEE Symposium Series on Computational Intelligence, 2015
When reinforcement learning is applied to large state spaces, such as those occurring in playing board games, the use of a good function approximator to learn the value function is very important. In previous research, multilayer perceptrons have often been used quite successfully as function approximators for learning to play particular games with temporal difference learning. With the recent developments in deep learning, it is important to study whether using multiple hidden layers or particular network structures can help to improve learning the value function. In this paper, we compare five different multilayer perceptron structures for learning to play the game Tic-Tac-Toe 3D, both when training through self-play and when training against the same fixed opponent they are tested against. We compare three fully connected multilayer perceptrons with different numbers of hidden layers and/or hidden units, as well as two structured ones. The structured multilayer perceptrons have a first hidden layer that is only sparsely connected to the input layer, with units that correspond to the rows in Tic-Tac-Toe 3D. This allows them to more easily learn the contribution of specific patterns on the corresponding rows. One of the two structured multilayer perceptrons has a second hidden layer that is fully connected to the first, which allows the network to non-linearly integrate the information in these detected patterns. The results on Tic-Tac-Toe 3D show that the deep structured neural network with integrated pattern detectors performs strongest among the compared multilayer perceptrons against a fixed opponent, both when trained through self-play and when trained against this fixed opponent.
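A minimal sketch of the structured first layer, assuming the board is encoded as a flat input vector and the winning lines are enumerated beforehand (the line enumeration and layer sizes are assumptions, not taken from the paper):

```python
import numpy as np

def build_row_mask(rows, n_inputs):
    """Sparse connectivity mask: hidden unit k sees only the cells of row k.
    `rows` is a list of index tuples, one per winning line on the 3D board."""
    mask = np.zeros((len(rows), n_inputs), dtype=np.float32)
    for k, cells in enumerate(rows):
        mask[k, list(cells)] = 1.0
    return mask

def structured_forward(x, W1, mask, b1, W2, b2):
    """First hidden layer masked to row patterns, second fully connected;
    a linear value output would follow in a complete network."""
    h1 = np.tanh((W1 * mask) @ x + b1)    # per-row pattern detectors
    h2 = np.tanh(W2 @ h1 + b2)            # non-linear integration of patterns
    return h2
```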
Many real-world stochastic environments are inherently multi-objective environments with multiple, possibly conflicting objectives. Techniques from multi-objective optimization are imported into the multi-armed bandits (MAB) problem to obtain efficient exploration/exploitation mechanisms for reward vectors. We introduce the ε-approximate Pareto MAB algorithm, which uses the ε-dominance relation so that its upper confidence bound does not depend on the number of best arms, an important feature for environments with relatively many optimal arms. We show experimentally that the ε-approximate Pareto MAB algorithm outperforms the Pareto UCB1 algorithm on a multi-objective Bernoulli problem inspired by a real-world control application.
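A sketch of the ε-dominance relation under one common additive convention (the paper's exact definition and ε value may differ):

```python
import numpy as np

def eps_dominates(u, v, eps=0.05):
    """Additive epsilon-dominance: u eps-dominates v if u + eps is at least v
    in every objective and strictly greater in at least one."""
    u, v = np.asarray(u), np.asarray(v)
    return bool(np.all(u + eps >= v) and np.any(u + eps > v))

def eps_pareto_front(means, eps=0.05):
    """Indices of arms whose estimated mean reward vector is not eps-dominated."""
    return [i for i, m in enumerate(means)
            if not any(eps_dominates(o, m, eps)
                       for j, o in enumerate(means) if j != i)]
```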
Proteomics, 2010
Recently, we introduced a novel proteomics method employing a metalloendopeptidase with Lys-N specificity to produce proteolytic peptides. For a large proportion of the Lys-N proteolytic peptides, fragmentation spectra generated by electron transfer dissociation were found to be dominated by extensive series of c-type ions. Taking advantage of this unique spectral property, we developed an algorithm, LysNDeNovo, to facilitate de novo sequencing of these peptides. LysNDeNovo contains simple and naive heuristics intended as a proof of concept, namely that Lys-N peptide electron transfer dissociation spectra are well suited for de novo interpretation. A stringent "golden" dataset of peptides identified by conventional database search algorithms was used to validate the performance of LysNDeNovo. The results on this dataset indicate that LysNDeNovo was able to confidently identify a considerable proportion of the peptides (42%) without requiring any prior genome or protein sequences. Results of similar quantity and quality were also obtained on a much more extensive experimental dataset, illustrating the potential of these methods for higher-throughput de novo sequencing.
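The core idea of reading a sequence off a c-ion ladder can be sketched as follows: each consecutive mass difference in a complete ladder should match one residue mass. The mass table below is a small illustrative subset (note that leucine and isoleucine are isobaric) and the tolerance is an assumed parameter; this is a deliberately naive heuristic in the spirit of the proof of concept, not the LysNDeNovo implementation.

```python
# Monoisotopic residue masses (Da); a minimal subset for illustration.
RESIDUE_MASS = {'G': 57.02146, 'A': 71.03711, 'S': 87.03203, 'P': 97.05276,
                'V': 99.06841, 'L': 113.08406, 'K': 128.09496, 'E': 129.04259}

def ladder_to_sequence(c_ion_masses, tol=0.02):
    """Read a sequence off a complete c-ion ladder: each consecutive mass
    difference is matched against the residue mass table within `tol` Da."""
    seq = []
    for lo, hi in zip(c_ion_masses, c_ion_masses[1:]):
        delta = hi - lo
        best = min(RESIDUE_MASS, key=lambda r: abs(RESIDUE_MASS[r] - delta))
        if abs(RESIDUE_MASS[best] - delta) > tol:
            return None                   # gap in the ladder; cannot call residue
        seq.append(best)
    return ''.join(seq)
```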
Parallel Problem Solving from Nature, PPSN XI, 2010
Stochastic Pareto local search (SPLS) methods are local search algorithms for multi-objective combinatorial optimization problems that restart local search from points generated by a stochastic process. Examples of such stochastic processes are Brownian motion (or random processes) and those resulting from the use of mutation and recombination operators. We propose a path-guided mutation operator for SPLS in which an individual solution is mutated in the direction of the path to another individual solution in order to restart a Pareto local search (PLS). We study the exploration of the landscape of the bi-objective quadratic assignment problem (bQAP) using SPLS variants that restart the PLS from: i) uniform randomly generated solutions, ii) solutions generated from best-so-far local optimal solutions with uniform random mutation, and iii) solutions generated with path-guided mutation. Experiments on a bQAP with a large number of facilities and high correlation between the flow matrices show that using mutation, and especially path-guided mutation, is beneficial for the performance of SPLS. The performance of the SPLS variants is partially explained by their dynamical behavior, such as the probability of escaping local optima and the speed of improving the Pareto front.
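For permutation-encoded bQAP solutions, path-guided mutation can be sketched as walking a few swaps along the shortest swap path toward a guiding solution; the step budget is an assumed parameter, not taken from the paper.

```python
import random

def path_guided_mutation(x, guide, steps):
    """Mutate permutation `x` by applying `steps` swaps along a shortest
    swap path toward permutation `guide`."""
    x = list(x)
    pos = {v: i for i, v in enumerate(x)}  # value -> current index in x
    for _ in range(steps):
        differing = [i for i in range(len(x)) if x[i] != guide[i]]
        if not differing:
            break                          # reached the guide solution
        i = random.choice(differing)
        j = pos[guide[i]]                  # where the wanted value currently sits
        x[i], x[j] = x[j], x[i]            # each swap fixes position i
        pos[x[i]], pos[x[j]] = i, j
    return x
```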
Proceedings of the 13th annual conference on Genetic and evolutionary computation - GECCO '11, 2011
Generalized Adaptive Pursuit Algorithm for Genetic Pareto Local Search Algorithms. Mădălina M. Drugan, Utrecht University, Department of Information and Computing Sciences, PO Box 80.089, 3508 TB Utrecht, The Netherlands. madalina@cs.uu.nl
2013 IEEE Congress on Evolutionary Computation, 2013
In this work, we develop a new paradigm, called Meta-Evolutionary Algorithms, motivated by the challenging continuous problems encountered in the domain of satisfiability in fuzzy logics (SAT∞). In Meta-Evolutionary Algorithms, the individuals in a population are optimization algorithms themselves. Mutation at the meta-population level is handled by performing an optimization step in each optimization algorithm, and recombination at the meta-population level is handled by exchanging information between different algorithms. We analyse different recombination operators and show empirically that simple Meta-Evolutionary Algorithms are able to outperform CMA-ES on a set of SAT∞ benchmark problems.
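A minimal sketch of one Meta-EA generation, assuming a hypothetical optimizer interface with step(), best(), and inject() methods (the interface and the minimization convention are assumptions, not the paper's design):

```python
def meta_ea_step(optimizers, objective, rng):
    """One generation of a minimal Meta-EA sketch: 'mutation' advances each
    member optimizer by one of its own optimization steps; 'recombination'
    shares the best-so-far point between two randomly paired optimizers."""
    for opt in optimizers:                 # meta-level mutation
        opt.step(objective)
    a, b = rng.sample(optimizers, 2)       # meta-level recombination
    # assume minimization: the better optimizer donates its best point
    donor, receiver = (a, b) if objective(a.best()) < objective(b.best()) else (b, a)
    receiver.inject(donor.best())
```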
Lecture Notes in Computer Science, 2004
Markov chain Monte Carlo (MCMC) is a popular class of algorithms for sampling from a complex distribution. A key issue in the design of MCMC algorithms is improving the proposal mechanism and the mixing behaviour. This has led some authors to propose the use of a population of MCMC chains, while others go even further by integrating techniques from evolutionary computation (EC) into the MCMC framework. This merging of MCMC and EC leads to a class of algorithms we call Evolutionary Markov Chain Monte Carlo (EMCMC). In this paper we first survey existing EMCMC algorithms and categorise them into two classes: family-competitive EMCMC and population-driven EMCMC. Next, we introduce the Elitist Coupled Acceptance rule and the Fitness Ordered Tempering algorithm.
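As a generic illustration of population-level coupling in EMCMC, the following sketch shows a standard tempered exchange move between chains at adjacent temperature levels; it is not the Elitist Coupled Acceptance rule itself, whose details are in the paper.

```python
import math
import random

def exchange_move(states, logp, temps):
    """Population MCMC exchange: propose swapping the states of two adjacent
    temperature levels and accept with the Metropolis ratio for chains
    targeting pi_k(x) proportional to exp(logp(x) / temps[k])."""
    k = random.randrange(len(states) - 1)
    log_ratio = ((1.0 / temps[k] - 1.0 / temps[k + 1])
                 * (logp(states[k + 1]) - logp(states[k])))
    if math.log(random.random() + 1e-300) < log_ratio:  # guard against log(0)
        states[k], states[k + 1] = states[k + 1], states[k]
```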
Nature Methods, 2008
We introduce a method for sequencing peptides by mass spectrometry using a metalloendopeptidase that cleaves proteins at the amino side of lysine (Lys-N). When analyzed by electron transfer dissociation (ETD)-based mass spectrometric sequencing, Lys-N-digested peptides that contain a single lysine residue produce spectra dominated by c-type fragment ions, providing simple ladders for sequence determination. This method should be a valuable strategy for de novo sequencing and the analysis of post-translational modifications.