Publications
http://spadro.eu/?q=publications
fr
Faster rates for policy learning
http://spadro.eu/?q=node/45
<div class="field field-name-body field-type-text-with-summary field-label-hidden"><div class="field-items"><div class="field-item even" property="content:encoded"><p>This article improves on the existing proven rates of regret decay in optimal policy estimation. We give a margin-free result showing that the regret decay for estimating a within-class optimal policy is second-order for empirical risk minimizers over Donsker classes, with regret decaying at a faster rate than the standard error of an efficient estimator of the value of an optimal policy. We also give a result from the classification literature that shows that faster regret decay is possible via plug-in estimation provided a margin condition holds. Four examples are considered.</p></div></div></div>
Fri, 21 Apr 2017 05:45:17 +0000
zenno
45 at http://spadro.eu
On the estimation of the mean of a random vector
http://spadro.eu/?q=node/44
<div class="field field-name-body field-type-text-with-summary field-label-hidden"><div class="field-items"><div class="field-item even" property="content:encoded"><p>We study the problem of estimating the mean of a multivariate distribution based on independent samples. The main result is the proof of existence of an estimator with a non-asymptotic sub-Gaussian performance for all distributions satisfying some mild moment assumptions.</p>
</div></div></div>
Mon, 13 Mar 2017 14:40:12 +0000
zenno
44 at http://spadro.eu
A Minkowski Theorem for Quasicrystals
http://spadro.eu/?q=node/43
<div class="field field-name-body field-type-text-with-summary field-label-hidden"><div class="field-items"><div class="field-item even" property="content:encoded"><p>The aim of this paper is to generalize Minkowski’s theorem. This theorem is usually stated for a centrally symmetric convex body and a lattice both included in R^n. In some situations, one may replace the lattice by a more general set for which a notion of density exists. In this paper, we prove a Minkowski theorem for quasicrystals, which bounds from below the frequency of differences appearing in the quasicrystal and belonging to a centrally symmetric convex body.</p></div></div></div>
Mon, 13 Mar 2017 14:38:06 +0000
zenno
43 at http://spadro.eu
Targeted sequential design for targeted learning inference of the optimal treatment rule and its mean reward
http://spadro.eu/?q=node/42
<div class="field field-name-body field-type-text-with-summary field-label-hidden"><div class="field-items"><div class="field-item even" property="content:encoded"><p>This article studies the targeted sequential inference of an optimal treatment rule (TR) and its mean reward in the non-exceptional case, i.e., assuming that there is no stratum of the baseline covariates where treatment is neither beneficial nor harmful, and under a companion margin assumption. Our pivotal estimator, whose definition hinges on the targeted minimum loss estimation (TMLE) principle, actually infers the mean reward under the current estimate of the optimal TR. This data-adaptive statistical parameter is worthy of interest on its own.</p></div></div></div>
Tue, 13 Dec 2016 01:13:49 +0000
zenno
42 at http://spadro.eu
Refined Lower Bounds for Adversarial Bandits
http://spadro.eu/?q=node/40
<div class="field field-name-body field-type-text-with-summary field-label-hidden"><div class="field-items"><div class="field-item even" property="content:encoded"><p>We provide new lower bounds on the regret that must be suffered by adversarial bandit algorithms. The new results show that recent upper bounds that either (a) hold with high probability or (b) depend on the total loss of the best arm or (c) depend on the quadratic variation of the losses, are close to tight. In addition, we prove two impossibility results. First, the existence of a single arm that is optimal in every round cannot improve the regret in the worst case. Second, the regret cannot scale with the effective range of the losses.</p></div></div></div>
Sat, 27 Aug 2016 08:05:58 +0000
zenno
40 at http://spadro.eu
Conditional quantile sequential estimation for stochastic codes
http://spadro.eu/?q=node/39
<div class="field field-name-body field-type-text-with-summary field-label-hidden"><div class="field-items"><div class="field-item even" property="content:encoded"><p>This paper is devoted to the estimation of a conditional quantile, more precisely the quantile of the output of a real stochastic code whose inputs are in R^d. For this purpose, we introduce a stochastic algorithm based on the Robbins-Monro algorithm and on k-nearest neighbors theory. We propose conditions on the code under which the algorithm converges and study the non-asymptotic rate of convergence of the mean squared error. Finally, we give optimal parameters of the algorithm to obtain the best rate of convergence.</p>
</div></div></div>
Mon, 22 Aug 2016 09:41:53 +0000
zenno
39 at http://spadro.eu
On Explore-Then-Commit Strategies
http://spadro.eu/?q=node/38
<div class="field field-name-body field-type-text-with-summary field-label-hidden"><div class="field-items"><div class="field-item even" property="content:encoded"><p>We study the problem of minimising regret in two-armed bandit problems with Gaussian rewards. Our objective is to use this simple setting to illustrate that strategies based on an exploration phase (up to a stopping time) followed by exploitation are necessarily suboptimal. The results hold regardless of whether or not the difference in means between the two arms is known.</p></div></div></div>
Mon, 22 Aug 2016 09:36:21 +0000
zenno
38 at http://spadro.eu
Explore First, Exploit Next: The True Shape of Regret in Bandit Problems
http://spadro.eu/?q=node/37
<div class="field field-name-body field-type-text-with-summary field-label-hidden"><div class="field-items"><div class="field-item even" property="content:encoded"><p>We revisit lower bounds on the regret in the case of multi-armed bandit problems. We obtain non-asymptotic, distribution-dependent bounds and provide straightforward proofs based only on well-known properties of Kullback-Leibler divergences. These bounds show in particular that in an initial phase the regret grows almost linearly, and that the well-known logarithmic growth of the regret only holds in a final phase. The proof techniques get to the essence of the information-theoretic arguments used and are stripped of all unnecessary complications.</p>
</div></div></div>
Mon, 22 Aug 2016 09:35:16 +0000
zenno
37 at http://spadro.eu
Maximin Action Identification: A New Bandit Framework for Games
http://spadro.eu/?q=node/36
<div class="field field-name-body field-type-text-with-summary field-label-hidden"><div class="field-items"><div class="field-item even" property="content:encoded"><p>We study an original problem of pure exploration in a strategic bandit model motivated by Monte Carlo Tree Search. It consists of identifying the best action in a game, when the player may sample random outcomes of sequentially chosen pairs of actions. We propose two strategies for the fixed-confidence setting: Maximin-LUCB, based on lower- and upper-confidence bounds; and Maximin-Racing, which operates by successively eliminating the sub-optimal actions. We discuss the sample complexity of both methods and compare their performance empirically.</p></div></div></div>
Mon, 22 Aug 2016 09:33:38 +0000
zenno
36 at http://spadro.eu
Optimal Best Arm Identification with Fixed Confidence
http://spadro.eu/?q=node/35
<div class="field field-name-body field-type-text-with-summary field-label-hidden"><div class="field-items"><div class="field-item even" property="content:encoded"><p>We provide a complete characterization of the complexity of best-arm identification in one-parameter bandit problems. We prove a new, tight lower bound on the sample complexity. We propose the 'Track-and-Stop' strategy, which is proved to be asymptotically optimal. It consists of a new sampling rule (which tracks the optimal proportions of arm draws highlighted by the lower bound) and a stopping rule named after Chernoff, for which we give a new analysis.</p>
</div></div></div>
Mon, 22 Aug 2016 09:33:03 +0000
zenno
35 at http://spadro.eu