# Bandit_simulations
Simulations and analysis of bandit algorithms for online learning.

This repo grew out of my interest in optimisation for online learning algorithms, which are heavily centred on bandit theory. As I understand it, there are two main types of bandit problems:
- __Multi-armed bandits:__ The arms are indistinguishable from one another except through their reward functions. The objective is to identify the arm with the highest expected reward via online learning, which is a classic explore-versus-exploit problem.
- __Contextual bandits:__ Bandits with features (i.e. context) that interact differently with different actions, so different contexts call for different actions to obtain the reward. This can be viewed as a classification problem: given the input features (the context), which "action" class yields the highest reward?
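
To make the explore-versus-exploit trade-off in the multi-armed case concrete, here is a minimal epsilon-greedy sketch on Bernoulli arms. It is not the code from this repo: the function name `run_epsilon_greedy`, the arm means, `epsilon`, and the step count are all illustrative assumptions.

```python
import random


def run_epsilon_greedy(true_means, epsilon=0.1, n_steps=5000, seed=0):
    """Simulate epsilon-greedy on Bernoulli arms.

    true_means are hypothetical success probabilities, one per arm.
    Returns (pull counts, estimated mean reward) per arm.
    """
    rng = random.Random(seed)
    n_arms = len(true_means)
    counts = [0] * n_arms
    values = [0.0] * n_arms  # running average reward per arm

    for _ in range(n_steps):
        if rng.random() < epsilon:
            # Explore: pick an arm uniformly at random.
            arm = rng.randrange(n_arms)
        else:
            # Exploit: pick the arm with the best current estimate.
            arm = max(range(n_arms), key=lambda a: values[a])

        # Draw a Bernoulli reward from the (unknown to the learner) true mean.
        reward = 1.0 if rng.random() < true_means[arm] else 0.0

        # Incremental update of the running average.
        counts[arm] += 1
        values[arm] += (reward - values[arm]) / counts[arm]

    return counts, values


# Five arms, four weak and one strong (illustrative values only).
counts, values = run_epsilon_greedy([0.1, 0.1, 0.1, 0.1, 0.9])
```

With a fixed `epsilon`, the learner keeps spending a constant fraction of pulls on random arms even after the best arm is clear, which is the main weakness that SoftMax, UCB, and Thompson Sampling try to address in different ways.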

The repo is split into a Python part and an R part:
- Python:
  - __Phase 1 (MAB analysis):__ Implementations of several multi-armed bandit algorithms for experimentation.
  - __Phase 2 (CB analysis):__ Implementations of contextual bandit algorithms, starting with LinUCB Disjoint and LinUCB Hybrid, based on [A Contextual-Bandit Approach to Personalized News Article Recommendation](https://arxiv.org/pdf/1003.0146.pdf).
  - __Phase 3 (CB analysis):__ Contextual bandit simulations using the `vowpal wabbit` package for online learning.
- R:
  - __Phase 4 (MAB & CB analysis):__ Simulations using the R package `contextual`, which offers a comprehensive ecosystem of algorithms and policies.
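
As a rough illustration of the disjoint LinUCB idea from Phase 2, the sketch below keeps one ridge-regression model per arm and picks the arm with the highest upper confidence bound. It is not the implementation in this repo: the class name `LinUCBDisjointArm`, the toy two-context environment, and the reward probabilities are assumptions. It also maintains the inverse of `A` with a Sherman-Morrison rank-one update to stay dependency-free, whereas the paper's pseudocode inverts `A` directly.

```python
import math
import random


class LinUCBDisjointArm:
    """One arm of disjoint LinUCB: ridge regression plus a confidence bonus."""

    def __init__(self, d, alpha):
        self.alpha = alpha
        # A starts as the d x d identity, so its inverse is the identity too.
        self.A_inv = [[1.0 if i == j else 0.0 for j in range(d)] for i in range(d)]
        self.b = [0.0] * d

    def _A_inv_times(self, v):
        return [sum(row[j] * v[j] for j in range(len(v))) for row in self.A_inv]

    def ucb(self, x):
        theta = self._A_inv_times(self.b)  # theta = A^{-1} b
        A_inv_x = self._A_inv_times(x)
        mean = sum(t * xi for t, xi in zip(theta, x))
        bonus = self.alpha * math.sqrt(sum(xi * v for xi, v in zip(x, A_inv_x)))
        return mean + bonus

    def update(self, x, reward):
        # Sherman-Morrison: inverse of (A + x x^T) from the inverse of A.
        A_inv_x = self._A_inv_times(x)
        denom = 1.0 + sum(xi * v for xi, v in zip(x, A_inv_x))
        for i in range(len(x)):
            for j in range(len(x)):
                self.A_inv[i][j] -= A_inv_x[i] * A_inv_x[j] / denom
        self.b = [bi + reward * xi for bi, xi in zip(self.b, x)]


def simulate(n_steps=2000, alpha=1.0, seed=0):
    """Toy environment (hypothetical): arm 0 suits context 0, arm 1 suits context 1."""
    rng = random.Random(seed)
    contexts = [[1.0, 0.0], [0.0, 1.0]]
    reward_prob = [[0.8, 0.2], [0.2, 0.8]]  # reward_prob[arm][context index]
    arms = [LinUCBDisjointArm(d=2, alpha=alpha) for _ in range(2)]
    total = 0.0
    for _ in range(n_steps):
        c = rng.randrange(2)
        x = contexts[c]
        chosen = max(range(2), key=lambda a: arms[a].ucb(x))
        reward = 1.0 if rng.random() < reward_prob[chosen][c] else 0.0
        arms[chosen].update(x, reward)  # only the chosen arm learns
        total += reward
    return total / n_steps


avg_reward = simulate()
```

In this toy setup the best achievable average reward is 0.8, so `avg_reward` gives a quick sanity check on whether the policy has learned the context-to-arm mapping. Larger `alpha` values widen the confidence bonus and push the policy toward more exploration.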
## Analysis and Code Implementation
__Phase 1 MAB analysis includes:__
- [Epsilon Greedy](https://github.com/kfoofw/bandit_simulations/blob/master/python/multiarmed_bandits/analysis/eps-greedy.md)
- [SoftMax](https://github.com/kfoofw/bandit_simulations/blob/master/python/multiarmed_bandits/analysis/softmax.md)
- [UCB](https://github.com/kfoofw/bandit_simulations/blob/master/python/multiarmed_bandits/analysis/ucb.md)
- [Thompson Sampling](https://github.com/kfoofw/bandit_simulations/blob/master/python/multiarmed_bandits/analysis/ts.md)
__Phase 2 CB analysis (Currently ongoing):__
- [LinUCB Disjoint Implementation and Analysis with a Dataset](https://github.com/kfoofw/bandit_simulations/blob/master/python/contextual_bandits/analysis/linUCB%20disjoint%20implementation%20and%20analysis.md)
- [LinUCB Hybrid Implementation and Analysis with a MovieLens Dataset for Recommender Systems](https://github.com/kfoofw/bandit_simulations/blob/master/python/contextual_bandits/analysis/linUCB%20hybrid%20implementation%20and%20analysis.md)
## Special Mention
A portion of the MAB code is based on the book ["Bandit Algorithms for Website Optimization"](https://www.oreilly.com/library/view/bandit-algorithms-for/9781449341565/) by John Myles White.

Microsoft's `vowpal wabbit` package for Python can be found in this [GitHub repo](https://github.com/VowpalWabbit/vowpal_wabbit).

The R package `contextual` can be found in this [GitHub repo](https://github.com/Nth-iteration-labs/contextual).