Figures and Tables from this paper
- figure 1
- table 1
- table 2
- figure 4
Topics
- Gang Of Bandits
- Linear Bandit Algorithm
- Bandit Algorithms
- Recommendation Systems
- Contextual Bandits
- Network Structure
- Clusters
- Real-world Datasets
151 Citations
- Qingyun Wu, Huazheng Wang, Quanquan Gu, Hongning Wang
- 2016
Computer Science
SIGIR
This paper develops a collaborative contextual bandit algorithm, in which the adjacency graph among users is leveraged to share context and payoffs among neighboring users during online updates, and rigorously proves an improved upper regret bound.
- 105
- Highly Influenced
- PDF
- Meng Fang, D. Tao
- 2014
Computer Science, Mathematics
KDD
This paper formalizes the networked bandit problem and proposes an algorithm that considers not only the selected arm, but also the relationships between arms, in that it decides an arm depending on integrated confidence sets constructed from historical data.
- 27
- Bryce Bern, Dawson D'almeida, Will Knospe, Paul Reich
- 2020
Computer Science
This paper explores the process of replicating the experiments and results from A Gang of Bandits in order to validate the paper, and formalizes the recommendation-systems question as a multi-armed bandit problem.
- Highly Influenced
- PDF
- Xiaotong Cheng, Cheng Pan, S. Maghsudi
- 2023
Computer Science
ICML
This work proposes CLUB-HG, a novel algorithm that integrates a game-theoretic approach into clustering inference and discovers the underlying user clusters in contextual bandit algorithms.
- Highly Influenced
- PDF
- Sharan Vaswani
- 2018
Computer Science, Mathematics
This thesis goes beyond the well-studied multi-armed bandit model to consider structured bandit settings and their applications, and proposes a bootstrapping approach and establishes theoretical regret bounds for it.
- Highly Influenced
- Liu Yang, Bo Liu, Leyu Lin, Feng Xia, Kai Chen, Qiang Yang
- 2020
Computer Science
RecSys
The proposed ClexB policy for online recommender systems exploits knowledge transfer via explorable clustering, aiding inference about user interests and estimating user clusters more accurately and with less uncertainty.
- 16
- PDF
- M. Herbster, Stephen Pasteris, Fabio Vitale, M. Pontil
- 2021
Computer Science
NeurIPS
Two learning algorithms are presented, GABA-I and GABA-II, which exploit the network structure to bias towards functions of low Ψ values and highlight improvements of both algorithms over running independent standard MABs across users.
- 9
- PDF
- A. Carpentier, Michal Valko
- 2016
Computer Science
AISTATS
BARE is proposed, a bandit strategy for which a regret guarantee is proved that scales with the detectable dimension, a problem dependent quantity that is often much smaller than the number of nodes.
- 38
- PDF
- A. Ghosh, Abishek Sankararaman, K. Ramchandran
- 2022
Computer Science
ECML/PKDD
This paper seeks to theoretically understand the problem of minimizing regret in an N-user heterogeneous stochastic linear bandits framework.
- PDF
- Trong-The Nguyen, Hady W. Lauw
- 2014
Computer Science
CIKM
This work proposes an algorithm to divide the population of users into multiple clusters and to customize the bandits to each cluster; this clustering is dynamic, i.e., users can switch from one cluster to another as their preferences change.
- 52
- Highly Influenced
- PDF
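The dynamic-clustering idea in the entry above — grouping users by estimated preference vectors and re-clustering as those estimates drift — can be sketched as follows. This is a minimal illustration only: the function name, the distance threshold, and the greedy single-link merge rule are assumptions for exposition, not the authors' actual procedure.

```python
import numpy as np

def cluster_users(theta_hat, threshold=0.5):
    """Group users whose estimated preference vectors lie within
    `threshold` of each other (greedy single-link merging).

    theta_hat: array of shape (n_users, d), one estimate per user.
    Returns a cluster label per user; re-running this after each
    round of updates lets users switch clusters as estimates change.
    """
    n = len(theta_hat)
    cluster = list(range(n))          # start with singleton clusters
    for i in range(n):
        for j in range(i + 1, n):
            if np.linalg.norm(theta_hat[i] - theta_hat[j]) < threshold:
                old, new = cluster[j], cluster[i]
                cluster = [new if c == old else c for c in cluster]
    return cluster
```

A bandit built on top of this would pool reward statistics within each cluster when computing its confidence bounds, and recompute the grouping periodically so users follow their drifting preferences.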
...
25 References
- Balázs Szörényi, R. Busa-Fekete, István Hegedüs, Róbert Ormándi, Márk Jelasity, B. Kégl
- 2013
Computer Science
ICML
This work shows that the probability of playing a suboptimal arm at a peer in iteration t = Ω(log N) is proportional to 1/(Nt) where N denotes the number of peers participating in the network.
- 102
- PDF
- Aleksandrs Slivkins
- 2011
Mathematics, Computer Science
COLT
This work considers similarity information in the setting of contextual bandits, a natural extension of the basic MAB problem, and presents algorithms that are based on adaptive partitions, and take advantage of "benign" payoffs and context arrivals without sacrificing the worst-case performance.
- 438
- PDF
- S. Caron, B. Kveton, M. Lelarge, Smriti Bhagat
- 2012
Computer Science
UAI
This paper considers stochastic bandits with side observations, a model that accounts for both the exploration/exploitation dilemma and relationships between arms, and provides efficient algorithms based on upper confidence bounds that leverage this additional information and derive new bounds improving on standard regret guarantees.
- 105
- PDF
- Swapna Buccapatnam, A. Eryilmaz, N. Shroff
- 2013
Computer Science
52nd IEEE Conference on Decision and Control
This work reveals the significant gains that can be obtained even through static network-aware policies, and proposes a randomized policy that explores actions for each user at a rate that is a function of her network position.
- 38
- PDF
- S. Kar, H. Poor, Shuguang Cui
- 2011
Computer Science, Mathematics
IEEE Conference on Decision and Control and…
A collaborative and adaptive distributed allocation rule DA is proposed and is shown to achieve the lower bound on the expected average regret for a connected inter-bandit communication network.
- 32
- PDF
- Lihong Li, Wei Chu, J. Langford, R. Schapire
- 2010
Computer Science
WWW '10
This work models personalized recommendation of news articles as a contextual bandit problem, a principled approach in which a learning algorithm sequentially selects articles to serve users based on contextual information about the users and articles, while simultaneously adapting its article-selection strategy based on user-click feedback to maximize total user clicks.
- 2,662
- PDF
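The contextual bandit recipe in the entry above — per-arm linear reward models with an upper-confidence exploration bonus, in the style of LinUCB — can be sketched as follows. This is a minimal disjoint-model illustration; the class name, the `alpha` parameter, and the use of explicit matrix inversion are assumptions for clarity, not the paper's exact implementation.

```python
import numpy as np

class LinUCB:
    """Disjoint LinUCB sketch: one ridge-regression model per arm.

    For each arm a it keeps A_a = I + sum x x^T and b_a = sum r x,
    and scores arms by theta_a^T x + alpha * sqrt(x^T A_a^{-1} x).
    """
    def __init__(self, n_arms, d, alpha=1.0):
        self.alpha = alpha
        self.A = [np.eye(d) for _ in range(n_arms)]
        self.b = [np.zeros(d) for _ in range(n_arms)]

    def select(self, contexts):
        """contexts: array of shape (n_arms, d); returns the index of
        the arm with the highest upper confidence bound."""
        scores = []
        for a, x in enumerate(contexts):
            A_inv = np.linalg.inv(self.A[a])       # fine for small d
            theta = A_inv @ self.b[a]              # ridge estimate
            bonus = self.alpha * np.sqrt(x @ A_inv @ x)
            scores.append(theta @ x + bonus)
        return int(np.argmax(scores))

    def update(self, arm, x, reward):
        """Fold the observed (context, reward) pair into arm's model."""
        self.A[arm] += np.outer(x, x)
        self.b[arm] += reward * x
```

In a news-recommendation loop, `contexts` would hold one feature vector per candidate article (possibly combined with user features), `select` picks the article to show, and `update` is called with the observed click feedback.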
- Wei Chu, Lihong Li, L. Reyzin, R. Schapire
- 2011
Computer Science, Mathematics
AISTATS
An O(√(Td ln³(KT ln(T)/δ))) regret bound is proved that holds with probability 1 − δ for the simplest known upper confidence bound algorithm for this problem.
- 944
- Highly Influential
- PDF
- Yasin Abbasi-Yadkori, D. Pál, Csaba Szepesvari
- 2011
Computer Science, Mathematics
NIPS
A simple modification of Auer's UCB algorithm achieves constant regret with high probability and improves the regret bound by a logarithmic factor, while experiments show a far larger improvement in practice.
- 1,549
- PDF
- Varsha Dani, Thomas P. Hayes, S. Kakade
- 2008
Mathematics, Computer Science
COLT
A nearly complete characterization of the classical stochastic k-armed bandit problem in terms of both upper and lower bounds for the regret is given, and two variants of an algorithm based on the idea of “upper confidence bounds” are presented.
- 831
- PDF
- Shie Mannor, Ohad Shamir
- 2011
Computer Science, Mathematics
NIPS
Practical algorithms with provable regret guarantees are developed, which depend on non-trivial graph-theoretic properties of the information feedback structure and partially-matching lower bounds are provided.
- 205
- PDF
...