Figures and Tables from this paper
- figure 1
- table 1
- table 2
- figure 4
Topics
- Gang Of Bandits
- Linear Bandit Algorithm
- Bandit Algorithms
- Recommendation Systems
- Contextual Bandits
- Network Structure
- Clusters
- Real-world Datasets
151 Citations
- Qingyun Wu, Huazheng Wang, Quanquan Gu, Hongning Wang
- 2016
Computer Science
SIGIR
This paper develops a collaborative contextual bandit algorithm, in which the adjacency graph among users is leveraged to share context and payoffs among neighboring users during online updates, and rigorously proves an improved upper regret bound.
- 105
- Highly Influenced
- PDF
- Meng Fang, D. Tao
- 2014
Computer Science, Mathematics
KDD
This paper formalizes the networked bandit problem and proposes an algorithm that considers not only the selected arm, but also the relationships between arms, in that it decides an arm depending on integrated confidence sets constructed from historical data.
- 27
- Bryce Bern, Dawson D'almeida, Will Knospe, Paul Reich
- 2020
Computer Science
This paper explores the process of replicating the experiments and results from A Gang of Bandits in order to validate the paper, and formalizes the recommendation-systems question as a multi-armed bandit problem.
- Highly Influenced
- PDF
- Xiaotong Cheng, Cheng Pan, S. Maghsudi
- 2023
Computer Science
ICML
This work proposes CLUB-HG, a novel algorithm that integrates a game-theoretic approach into clustering inference and discovers the underlying user clusters in contextual bandit algorithms.
- Highly Influenced
- PDF
- Sharan Vaswani
- 2018
Computer Science, Mathematics
This thesis goes beyond the well-studied multi-armed bandit model to consider structured bandit settings and their applications, and proposes a bootstrapping approach and establishes theoretical regret bounds for it.
- Highly Influenced
- Liu Yang, Bo Liu, Leyu Lin, Feng Xia, Kai Chen, Qiang Yang
- 2020
Computer Science
RecSys
The proposed ClexB policy for online recommender systems exploits knowledge transfer via explorable clustering, aiding inference about user interests and estimating user clusters more accurately and with less uncertainty.
- 16
- PDF
- M. Herbster, Stephen Pasteris, Fabio Vitale, M. Pontil
- 2021
Computer Science
NeurIPS
Two learning algorithms are presented, GABA-I and GABA-II, which exploit the network structure to bias towards functions of low Ψ values and highlight improvements of both algorithms over running independent standard MABs across users.
- 9
- PDF
- A. Carpentier, Michal Valko
- 2016
Computer Science
AISTATS
BARE is proposed, a bandit strategy for which a regret guarantee is proved that scales with the detectable dimension, a problem dependent quantity that is often much smaller than the number of nodes.
- 38
- PDF
- A. Ghosh, Abishek Sankararaman, K. Ramchandran
- 2022
Computer Science
ECML/PKDD
This paper seeks to theoretically understand the problem of minimizing regret in an N-user heterogeneous stochastic linear bandits framework.
- PDF
- Trong-The Nguyen, Hady W. Lauw
- 2014
Computer Science
CIKM
This work proposes an algorithm to divide the population of users into multiple clusters and to customize the bandits to each cluster; this clustering is dynamic, i.e., users can switch from one cluster to another as their preferences change.
- 52
- Highly Influenced
- PDF
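The dynamic-clustering idea in the entry above — grouping users by estimated preference vectors and re-clustering as those estimates drift — can be sketched as follows. This is a minimal illustration only: the function name, the distance threshold, and the greedy single-link merge rule are assumptions for exposition, not the authors' actual procedure.

```python
import numpy as np

def cluster_users(theta_hat, threshold=0.5):
    """Group users whose estimated preference vectors lie within
    `threshold` of each other (greedy single-link merging).

    theta_hat: array of shape (n_users, d), one estimate per user.
    Returns a cluster label per user; re-running this after each
    round of updates lets users switch clusters as estimates change.
    """
    n = len(theta_hat)
    cluster = list(range(n))          # start with singleton clusters
    for i in range(n):
        for j in range(i + 1, n):
            if np.linalg.norm(theta_hat[i] - theta_hat[j]) < threshold:
                old, new = cluster[j], cluster[i]
                cluster = [new if c == old else c for c in cluster]
    return cluster
```

A bandit built on top of this would pool reward statistics within each cluster when computing its confidence bounds, and recompute the grouping periodically so users follow their drifting preferences.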
...
25 References
- Balázs Szörényi, R. Busa-Fekete, István Hegedüs, Róbert Ormándi, Márk Jelasity, B. Kégl
- 2013
Computer Science
ICML
This work shows that the probability of playing a suboptimal arm at a peer in iteration t = Ω(log N) is proportional to 1/(Nt) where N denotes the number of peers participating in the network.
- 102
- PDF
- Aleksandrs Slivkins
- 2011
Mathematics, Computer Science
COLT
This work considers similarity information in the setting of contextual bandits, a natural extension of the basic MAB problem, and presents algorithms that are based on adaptive partitions, and take advantage of "benign" payoffs and context arrivals without sacrificing the worst-case performance.
- 438
- PDF
- S. Caron, B. Kveton, M. Lelarge, Smriti Bhagat
- 2012
Computer Science
UAI
This paper considers stochastic bandits with side observations, a model that accounts for both the exploration/exploitation dilemma and relationships between arms, and provides efficient algorithms based on upper confidence bounds that leverage this additional information and derive new bounds improving on standard regret guarantees.
- 105
- PDF
- Swapna Buccapatnam, A. Eryilmaz, N. Shroff
- 2013
Computer Science
52nd IEEE Conference on Decision and Control
This work reveals the significant gains that can be obtained even through static network-aware policies, and proposes a randomized policy that explores actions for each user at a rate that is a function of her network position.
- 38
- PDF
- S. Kar, H. Poor, Shuguang Cui
- 2011
Computer Science, Mathematics
IEEE Conference on Decision and Control and…
A collaborative and adaptive distributed allocation rule DA is proposed and is shown to achieve the lower bound on the expected average regret for a connected inter-bandit communication network.
- 32
- PDF
- Lihong Li, Wei Chu, J. Langford, R. Schapire
- 2010
Computer Science
WWW '10
This work models personalized recommendation of news articles as a contextual bandit problem, a principled approach in which a learning algorithm sequentially selects articles to serve users based on contextual information about the users and articles, while simultaneously adapting its article-selection strategy based on user-click feedback to maximize total user clicks.
- 2,662
- PDF
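The contextual bandit recipe in the entry above — per-arm linear reward models with an upper-confidence exploration bonus, in the style of LinUCB — can be sketched as follows. This is a minimal disjoint-model illustration; the class name, the `alpha` parameter, and the use of explicit matrix inversion are assumptions for clarity, not the paper's exact implementation.

```python
import numpy as np

class LinUCB:
    """Disjoint LinUCB sketch: one ridge-regression model per arm.

    For each arm a it keeps A_a = I + sum x x^T and b_a = sum r x,
    and scores arms by theta_a^T x + alpha * sqrt(x^T A_a^{-1} x).
    """
    def __init__(self, n_arms, d, alpha=1.0):
        self.alpha = alpha
        self.A = [np.eye(d) for _ in range(n_arms)]
        self.b = [np.zeros(d) for _ in range(n_arms)]

    def select(self, contexts):
        """contexts: array of shape (n_arms, d); returns the index of
        the arm with the highest upper confidence bound."""
        scores = []
        for a, x in enumerate(contexts):
            A_inv = np.linalg.inv(self.A[a])       # fine for small d
            theta = A_inv @ self.b[a]              # ridge estimate
            bonus = self.alpha * np.sqrt(x @ A_inv @ x)
            scores.append(theta @ x + bonus)
        return int(np.argmax(scores))

    def update(self, arm, x, reward):
        """Fold the observed (context, reward) pair into arm's model."""
        self.A[arm] += np.outer(x, x)
        self.b[arm] += reward * x
```

In a news-recommendation loop, `contexts` would hold one feature vector per candidate article (possibly combined with user features), `select` picks the article to show, and `update` is called with the observed click feedback.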
- Wei Chu, Lihong Li, L. Reyzin, R. Schapire
- 2011
Computer Science, Mathematics
AISTATS
An O(√(Td ln³(KT ln(T)/δ))) regret bound is proved that holds with probability 1 − δ for the simplest known upper confidence bound algorithm for this problem.
- 944
- Highly Influential
- PDF
- Yasin Abbasi-Yadkori, D. Pál, Csaba Szepesvari
- 2011
Computer Science, Mathematics
NIPS
A simple modification of Auer's UCB algorithm achieves constant regret with high probability and improves the regret bound by a logarithmic factor, while experiments show a far larger improvement in practice.
- 1,549
- PDF
- Varsha Dani, Thomas P. Hayes, S. Kakade
- 2008
Mathematics, Computer Science
COLT
A nearly complete characterization of the classical stochastic k-armed bandit problem in terms of both upper and lower bounds for the regret is given, and two variants of an algorithm based on the idea of “upper confidence bounds” are presented.
- 831
- PDF
- Shie Mannor, Ohad Shamir
- 2011
Computer Science, Mathematics
NIPS
Practical algorithms with provable regret guarantees are developed, which depend on non-trivial graph-theoretic properties of the information feedback structure and partially-matching lower bounds are provided.
- 205
- PDF
...