Dávid Pál

Home address

800 High School Way, Apt. 242
Mountain View, CA, 94041
USA

davidko.pal@gmail.com

cell phone: +1 (646) 206 4832
home phone: +1 (650) 386 1616

Work address

190 Mathilda Place
Sunnyvale, CA 94086
USA

dapal@uber.com

My research interests are machine learning, algorithms, game theory and combinatorics. Curriculum Vitae.

Since June 2025, I work at Uber as a machine learning engineer. Previously, I worked as a software engineer at Google, as a research scientist at Yahoo, as an applied researcher at Expedia, and as a machine learning engineer at Instacart and Coupang.

I received a PhD in Computer Science from University of Waterloo in Canada. My advisor was Shai Ben-David. I was a post-doctoral fellow at AICML at Department of Computing Science at University of Alberta, Canada under supervision of Csaba Szepesvári.

I did my undergrad at Faculty of Mathematics, Physics and Informatics at Comenius University in Bratislava, Slovakia. My undergrad thesis advisor was Martin Škoviera. At Comenius University, I also organized Internet Problem Solving Contest, and local math and programming contests and olympiads for high school students.

My brother Martin is also a computer scientist.

In 2023, I taught a machine learning course at NYU.

Publications

Jonathan Gu, Dávid Pál, Kevin Ryan

Auction for Double-Wide Ads

Summary:
We propose an auction for online advertising where each ad occupies either one square or two horizontally-adjacent squares of a grid of squares. Our primary application are ads for products shown on retail websites such as Instacart or Amazon where the products are naturally organized into a grid. We propose efficient algorithms for computing the optimal layout of the ads and pricing of the ads. The auction is a generalization of the generalized second-price (GSP) auction used by internet search engines (e.g. Google, Microsoft Bing, Yahoo!).

Bib info:
draft

Download:
[Local copy] [Arxiv]
Francesco Orabona, Dávid Pál

Parameter-free Stochastic Optimization of Variationally Coherent Functions

Summary:
We design an algorithm for first-order stochastic optimization of functions over ℝ^d with bounded gradients. The algorithm converges almost surely to the global minimizer of a variationally coherent function. The class of variationally coherent functions contains convex functions and non-convex functions (quasi-convex, pseudo-convex). In particular, the class contains all continuously differentiable strictly convex functions with a minimizer. If the function is convex, for the same algorithm, function values converge to the minimum value at a rate Õ(‖x^*‖T^1/2+ε).

Bib info:
submitted to COLT 2021

Download:
[COLT 2021 submission] [Arxiv]
Alina Beygelzimer, Dávid Pál, Balázs Szörényi, Devanathan Thiruvenkatachari,
Chen-Yu Wei, Chicheng Zhang

Bandit Multiclass Linear Classification: Efficient Algorithms for the Separable Case

Summary:
Online multiclass linear classification with bandit feedback is a bandit version of the standard online multiclass classification problem using linear predictor. The bandit feedback is only binary, whether or not the label was correct or not. In this paper we focus on linearly separable case. We define two notions of linear separability, weak and strong. Under the strong linear separability condition, we design an efficient algorithm that achieves a near-optimal mistake bound of O(K/γ²). Under the more challenging weak linear separability condition, we design an efficient algorithm with a mistake bound of 2^{O(min(K log²(1/γ),
sqrt(1/γ) log K))}. We also prove various other results (hardness, mistake lower bounds, analyze simple algorithm).

Bib info:
Proceedings of ICML 2019

Download:
[ICML 2019 Proceedings] [Arxiv] [Local copy] [Poster] [Slides] [GitHub]
Alexander Golovnev, Dávid Pál, Balázs Szörényi

The information-theoretic value of unlabeled data in semi-supervised learning

Summary:
We quantify the separation between the numbers of labeled examples required to learn in two settings: Settings with and without the knowledge of the distribution of the unlabeled data. More specifically, we prove a separation by Θ(log(n)) multiplicative factor for the class of projections over the Boolean hypercube of dimension n. We prove that there is no separation for the class of all functions on domain of any size. Learning with the knowledge of the distribution (a.k.a. fixed-distribution learning) can be viewed as an idealized scenario of semi-supervised learning where the number of unlabeled data points is so great that the unlabeled distribution is known exactly. For this reason, we call the separation the value of unlabeled data.

Bib info:
Proceedings of ICML 2019

Download:
[ICML 2019 Proceedings] [Arxiv] [Local copy] [Poster] [Slides] [GitHub]
Satyen Kale, Zohar Karnin, Tengyuan Liang, Dávid Pál

Adaptive Feature Selection: Computationally Efficient Online Sparse Linear Regression under RIP

Summary:
Online sparse linear regression is an online problem where an algorithm repeatedly chooses a subset of coordinates to observe in an adversarially chosen feature vector, makes a real-valued prediction, receives the true label, and incurs the squared loss. The goal is to design an online learning algorithm with sublinear regret to the best sparse linear predictor in hindsight. Without any assumptions, this problem is known to be computationally intractable. In this paper, we make the assumption that data matrix satisfies restricted isometry property, and show that this assumption leads to computationally efficient algorithms with sublinear regret for two variants of the problem. In the first variant, the true label is generated according to a sparse linear model with additive Gaussian noise. In the second, the true label is chosen adversarially.

Bib info:
Proceedings of ICML 2017

Download:
[Arxiv] [Local copy] [ICML 2017]
Francesco Orabona, Dávid Pál

Open Problem: Parameter-Free and Scale-Free Online Algorithms

Summary:
There are algorithms that are adaptive to the competitor. There are algorithms that are adaptive to scale of the loss vectors. Are there efficient algorithms for online linear optimization that are both? For learning with expert advice it is possible to construct such algorithm; it is somewhat inefficient, however. Is there a linear-time algorithm? When the decision set is the Hilbert space, no algorithm, efficient or not, is known. We offer $100 for an efficient algorithm for Hilbert space.

Bib info:
Proceedings of COLT 2016

Download:
[COLT 2016 Proceedings version] [Local copy] [slides] [GitHub]
Francesco Orabona, Dávid Pál

From Coin-Betting to Parameter-Free Online Learning

Summary:
We show how to derive algorithms for online linear optimization over a Hilbert space and learning with expert advice from coin-betting algorithms. Using adaptive Kelly betting based on Krichevsky-Trofimov estimator as the base betting algorithm, we derive new algorithms for online linear optimization over a Hilbert space and learning with expert advice. The algorithms are very simple. The algorithm for learning with expert advice has regret O(√T (1 + D(u|p))) where D(u|p) is the Kullback-Leibler divergence between any competitor u and algorithm's prior p. Regret of the algorithm for online linear optimization over Hilbert spaces matches the currently known best results.

Bib info:
Proceedings of NIPS 2016

Download:
[arXiv] [NIPS 2016 Proceedings] [GitHub] ICML 2016 AUTOML Workshop: [paper] [slides]
Francesco Orabona, Dávid Pál

Scale-Free Online Learning

Summary:
This is the journal version of the ALT 2015 paper. We design and analyze 3 algorithms for online linear optimization that are invariant to scaling of the loss vectors. Two of them are instances of Follow The Regularized Leader (FTRL), the third is an instance of Mirror Descent. We show regret bounds for all 3 algorithms. One of the FTRL-based algorithms works for any domain, bounded or unbouded. In contrast, we give examples demonstrating that the Mirror-Descent-based algorithm has at least linear regret if the Bregman divergence induced by the regularizer is unbounded; this happens, in particular, when the domain is unbounded.

Bib info:
Theorerical Computer Science

Download:
[journal version] [journal website] [arXiv] [COLT 2016 Impromptu talk] [GitHub] [sources]
Francesco Orabona, Dávid Pál

Optimal Non-Asymptotic Lower Bound on the Minimax Regret of Learning with Expert Advice

Summary:
We show a non-asymptotic optimal lower bounds on the expected value of maximum of independent Gaussians and the expected value of maximum of independent symmetric random walks. Both bounds are asymptotically optimal, i.e. they imply known limit theorems for these quantities. Simple application of the lower bound for symmetric random walks is a lower bound of the regret for the problem of learning with expert advice which is a classical online learning problem.

Bib info:
draft

Download:
[arXiv]
Satyen Kale, Chansoo Lee, Dávid Pál

Hardness of Online Sleeping Combinatorial Optimization Problems

Summary:
We show computational hardness of sleeping variants of six combinatorial sequential decision making problems (Online Shortest Paths, Online Minimum Spanning Tree, Online k-Subsets, Online k-Truncated Permutations, Online Minimum Cut, and Online Bipartite Matching). We show that sleeping variants of these problems are as hard as PAC learning DNF expressions.

Bib info:
accepted to NIPS 2016

Download:
[arXiv]
Francesco Orabona, Dávid Pál

Scale-Free Algorithms for Online Linear Optimization

Summary:
We design algorithms for online linear optimization that are invariant to scaling of the loss vectors. Scale invariance, as a design principle, almost instantly leads to better algorithms and cleaner regret analysis. We design and analyze two scale-free variants of Follow The Regularized Leader Algorithm. The two algorithms are simple, fully adaptive and work with any strongly convex regularizer.

Bib info:
Proceedings of ALT 2015

Download:
[PDF] [arXiv] Slides [1] [2] [3]
Gábor Bartók, Dávid Pál, Csaba Szepesvári, István Szita

Online Learning Lecture Notes

Summary:
In spring 2011, Csaba taught an online learning course with occasional help from Gábor and István and me. As we were teaching, we were continuously writing lecture notes. The focus of the course was no-regret non-stochastic learning (learning with expert advice, switching experts, follow the regularized leader algorithm, online linear regression), and brief explanation of online-to-batch conversion and multi-armed bandits. Our goal was to develop the theory from first principles.

Bib info:
lecture notes

Download:
[PDF]
Yasin Abbasi-Yadkori, Dávid Pál, Csaba Szepesvári

Online-to-Confidence-Set Conversions and Application to Sparse Stochastic Bandits

Summary:
We show how to convert a regret bound for an online algorithm for linear prediction under quadratic loss (e.g. online least squares, online gradient, p-norm algorithm) into a confidence set for the weight vector of the linear model under any sub-Gaussian noise. The lower the regret of the algorithm, the smaller the confidence set. As an application, we use these confidence sets to construct online algorithms for the linear stochastic bandit problem. Using an online linear prediction algorithm for sparse models, we are able to construct an algorithm for the sparse linear stochastic bandit problem i.e. problem where the underlying linear model is sparse.

Bib info:
Proceedings of AISTATS 2012

Download:
[PDF], [Slides], [Overview Talk]
Yasin Abbasi-Yadkori, Dávid Pál, Csaba Szepesvári

Improved Algorithms for Linear Stochastic Bandits

Summary:
We provide an upper bound for the tail probability of the norm of a vector-valued martingale. This bound immediately gives rise to a confidence set (ellipsoid) for online regularized linear least-squares estimator. We apply these confidence sets to the stochastic multi-armed bandit problem and the stochastic linear bandit problem. We show that the regret of the modified UCB algorithm is constant for any fixed confidence parameter! Furthermore, we improve the previously known bounds for the stochastic linear bandit problem by a poly-logarithmic factor.

Bib info:
Proceedings of NIPS 2011

Download:
[PDF], [NIPS 2011 Proceedings], [arXiv]
Gábor Bartók, Dávid Pál, Csaba Szepesvári

Minimax Regret of Finite Partial-Monitoring Games in Stochastic Environments

Summary:
We continue with our ALT 2010 paper by classification of minimax regret of all partial games with finite number of actions and outcomes. However, we restrict ourselves to stochastic environments, i.e. the environment chooses outcomes i.i.d. from a fixed but unknown probability distribution. We show that regret of any game falls into one of the four categories: zero, T^1/2, T^2/3 or T. We provide algorithms that achieve that regret within logarithmic factor. We believe that this result can be lifted to adversarial setting also; however that's left as a future work.

Bib info:
Proceedings of COLT 2011

Download:
[PDF]
Gábor Bartók, Dávid Pál, Csaba Szepesvári

Toward a Classification of Partial Monitoring Games

Summary:
Partial monitoring games are online learning problems where a decision maker repeatedly makes a decision, receives a feedback and suffers a loss. Multi-armed bandits and dynamic pricing are special cases. We give characterization of min-max regret of all finite partial monitoring games with two outcomes. We show that regret of any such game falls into one of the four categories: zero, T^1/2, T^2/3 or T. Our characterization is nice and simple combinatorial/geometrical condition on the loss matrix.

Bib info:
Proceedings of ALT 2010, Gábor received for this paper the best student paper award.

Download:
[PDF], [arXiv]
Dávid Pál, Barnabás Póczos, Csaba Szepesvári

Estimation of Rényi Entropy and Mutual Information Based on Generalized Nearest-Neighbor Graphs

Summary:
We consider simple and computationally efficient nonparametric estimators of Rényi entropy and mutual information based on an i.i.d. sample drawn from an unknown, absolutely continuous distribution over R^d. The estimators are calculated as the sum of p-th powers of the Euclidean lengths of the edges of the `generalized nearest-neighbor' graph of the sample and the empirical copula of the sample respectively. Under mild conditions we prove the almost sure consistency of the estimators. In addition, we derive high probability error bounds assuming that the density underlying the sample is Lipschitz continuous.

Bib info:
Proceedings of NIPS 2010

Download:
conference version: [PDF], extended version with proofs: [PDF], [arXiv], poster: [PDF], [NIPS 2010 Proceedings]
Shai Ben-David, Dávid Pál, Shai Shalev-Shwartz

Agnostic Online Learning

Summary:
We generalize the Littlestone's online learning model to the agnostic setting where no hypothesis makes zero error. Littlestone defined a combinatorial parameter of the hypothesis class, which we call Littlestone's dimension and which determines the worst-case number of prediction mistakes made by an online learning algorithm in the realizable setting. Point of our paper is that Littlestone's dimension characterizes learnability in the agnostic case as well. Namely, we give upper and lower bounds on the regret in terms of the Littlestone's dimension.

This is a similar story to what happened to the Valiant's PAC model. The key quantity there is Vapnik-Chervonekis dimension. Valiant's PAC model is the ``realizable case''. Our work can be paralleled to what Haussler and others did in 1992 when they generalized the PAC model to the agnostic setting and showed that Vapnik-Chervonekis dimension remains the key quantity characterizing learnability there as well.

Bib info:
Proceedings of COLT 2009

Download:
[PDF], [slides], [slides sources]
Ph.D. thesis - University of Waterloo

Contributions to Unsupervised and Semi-Supervised Learning

Summary:
My thesis is essentially a compilation of the two older papers on clustering stability and an part of the paper on semi-supervised learning. The formal gaps, especially in the first paper on clustering stability, were filled in and presentation was improved.

Download:
[PDF], [sources]
Tyler Lu, Dávid Pál, Martin Pál

Showing Relevant Ads via Lipschitz Context Multi-Armed Bandits

Summary:
We study context multi-armed bandit problems where the context comes from a metric space and the payoff satisfies a Lipschitz condition with respect to the metric. Abstractly, a context multi-armed bandit problem models a situation in which, in a sequence of independent trials, an online algorithm chooses an action based on a given context (side information) from a set of possible actions so as to maximize the total payoff of the chosen actions. The payoff depends on both the action chosen and the context. In contrast, context-free multi-armed bandit problems, studied previously, model situations where no side information is available and the payoff depends only on the action chosen.

For concreteness we focus in this paper on an application to Internet search advertisement where the task is to display ads to a user of a Internet search engine based on his search query so as to maximize the click-through rate of the ads displayed. We cast this problem as a context multi-armed bandit problem where queries and ads form metric spaces and the payoff function is Lipschitz with respect to both the metrics. We present an algorithm, give upper bound on its regret and show an (almost) matching lower bound on the regret of any algorithm.

Bib info:
Proceedings of AISTATS 2010

Download:
[PDF], [slides], [slides sources]
Shai Ben-David, Tyler Lu, Dávid Pál, Miroslava Sotáková

Learning Low-Density Separators

Summary:
We define a novel, basic, unsupervised learning problem - learning the lowest density homogeneous hyperplane separator of an unknown probability distribution. This task is relevant to several problems in machine learning, such as semi-supervised learning and clustering stability. We investigate the question of existence of a universally consistent algorithm for this problem. We propose two natural learning paradigms and prove that, on input unlabeled random samples generated by any member of a rich family of distributions, they are guaranteed to converge to the optimal separator for that distribution. We complement this result by showing that no learning algorithm for our task can achieve uniform learning rates (that are independent of the data generating distribution).

Bib info:
Proceedings of AISTATS 2009

Download:
[PDF] [arXiv]
Shai Ben-David, Tyler Lu, Dávid Pál

Does Unlabeled Data Provably Help? Worst-case Analysis of the Sample Complexity of Semi-Supervised Learning

Summary:
We mathematically study the potential benefits of the use of unlabeled data to decrease classification error. We propose a simple model of semi-supervised learning (SSL) in which the unlabeled data distribution is perfectly known to the learner. We compare this model with the standard PAC model (and its agnostic version) for supervised learning, in which the unlabeled distribution is uknown to the learner. Does there exists a supervised algorithm which for any unlabeled distribution needs in the worst case (over possible targets) at most by a constant factor more labeled samples than any SSL learner? It seems that the ERM algorithm is such a candidate. We prove, for some special cases, that this indeed the case.

Bib info:
Proceedings of COLT 2008

Note:
See also Tyler's Master's thesis.

Download:
[PDF] [sources]
Gagan Aggarwal, S. Muthukrishnan, Dávid Pál, Martin Pál

General Auction Mechanism for Search Advertising

Summary:
We propose a new auction mechanism for search advertising that generalizes both Generalized Second Price auction (which is currently, in 2008, used by Google, Yahoo! and MSN) and the famous Vickrey-Clarke-Groves mechanism adapted to the search auctions. Our mechanism allows each bidder to specify a subset of slots in which (s)he is interested, and the value and the maximum price of each slot. For the auctioneer (the search engine) it allows to specify for every slot-bidder pair a reserve (minimum) price. Our auction computes the bidder-optimal stable matching. The running time is O(nk³), where n is the number of bidders and k is the number of slots. We also prove that our auction mechanism is truth-revealing, that is, dominant strategy of each bidder is not to lie.

Bib info:
Proceedings of WWW 2009

Download:
[PDF]
Shai Ben-David, Dávid Pál, Hans Ulrich Simon

Stability of K-means Clusterings

Summary:
This is follow-up of the previous paper. We consider the clustering algorithm A which minimizes the k-means cost precisely. (Such algorithm is not realistic, since the optimization problem is known to be NP-hard. Nevertheless, the classical Lloyd's heuristic a.k.a. "the k-mean algorithm" tries to do this.) We analyze how stable is the clustering outputted by A with respect to a random draw of the sample points. Given a probability distribution, draw two i.i.d. samples of the same size. If with high probability, the clustering of the first sample does not differ from the clustering of the second sample too much, we say that A is stable on the probability distribution. For discrete probability distributions we show that instability happens if and only if the probability distribution has two or more clusterings with optimal k-means cost.

Bib info:
Proceedings of COLT 2007

Download:
[PDF] (extended version), [PDF] (short conference version)
Shai Ben-David, Dávid Pál, Ulrike von Luxburg

A Sober look at Clustering Stability

Summary:
We investigate how stable is the clustering outputted by a clustering algorithm with respect to a random draw of the sample points. Given a probability distribution, draw two i.i.d. samples of the same size. If with high probability, the clustering of the first sample does not differ from the clustering of the second sample too much, we say that algorithm is stable on the probability distribution. We show that for that for algorithms that optimize some cost function stability depends on the uniqueness of the optimal solution of the optimization problem for the "infinite" sample represented by the probability distribution.

Bib info:
Proceedings of COLT 2006, Best Student Paper Award

Download:
[PDF], [Slides in PDF], [Slides source files]
Dávid Pál, Martin Škoviera

Colouring Cubic Graphs by Small Steiner Triple Systems

Summary:
Steiner triple system is a collection of 3-element subsets (called triples) of a finite set of points, such that every two points appear together in exactly one triple. Given Steiner triple system S, one can ask whether it is possible to color the edges of a cubic graph with points of S, in such way that colors of the three edges at a vertex form a triple of S. We construct a small Steiner triple system (of order 21) which colors all simple cubic graphs.

Note:
This is the main result of my "Magister" Thesis at Comenius University under supervision of my advisor Martin Škoviera, 2004.

Bib info:
Graphs and Combinatorics, 2007

Download:
[PDF], [Postscript], [Source files]
Diploma thesis - Comenius University

Steiner Colorings of Cubic Graphs

Summary:
Steiner triple system is a collection of 3-element subsets (called triples) of a finite set of points, such that every two points appear together in exactly one triple. Given Steiner triple system S, one can ask whether it is possible to color the edges of a cubic graph with points of S, in such way that colors of the three edges at a vertex form a triple of S. The main result of my diploma thesis is that every simple cubic graph is colorable by a Steiner triple system of order 21 that is a product of the trivial Steiner triple and Fano plane. Certain cubic graphs with multiple edges are not colorable by any Steiner triple system. All the other ones are colorable by the Steiner triple system above.

Download:
[PDF], [Postscript], [Source files]

Last updated: April 28, 2026

Dávid Pál

Home address

Work address

Publications

Summary:

Bib info:

Download:

Summary:

Bib info:

Download:

Summary:

Bib info:

Download:

Summary:

Bib info:

Download:

Summary:

Bib info:

Download:

Summary:

Bib info:

Download:

Summary:

Bib info:

Download:

Summary:

Bib info:

Download:

Summary:

Bib info:

Download:

Summary:

Bib info:

Download:

Summary:

Bib info:

Download:

Summary:

Bib info:

Download:

Summary:

Bib info:

Download:

Summary:

Bib info:

Download:

Summary:

Bib info:

Download:

Summary:

Bib info:

Download:

Summary:

Bib info:

Download:

Summary:

Bib info:

Download:

Summary:

Download:

Summary:

Bib info:

Download:

Summary:

Bib info:

Download:

Summary:

Bib info:

Note:

Download:

Summary:

Bib info:

Download:

Summary:

Bib info:

Download:

Summary:

Bib info:

Download:

Summary: