Reinforcement Learning for Joint Optimization of Multiple Rewards

Agarwal, Mridul; Aggarwal, Vaneet

arXiv:1909.02940 (cs)

[Submitted on 6 Sep 2019 (v1), last revised 9 Jan 2023 (this version, v4)]

Abstract: Finding optimal policies that maximize the long-term rewards of Markov Decision Processes requires dynamic programming and backward induction to solve the Bellman optimality equation. However, many real-world problems require optimizing an objective that is non-linear in the cumulative rewards, to which dynamic programming cannot be applied directly. For example, in a resource allocation problem, one of the objectives is to maximize long-term fairness among the users. We observe that when an agent aims to optimize some function of the sum of rewards, the problem loses its Markov nature. This paper formalizes and addresses the problem of optimizing a non-linear function of the long-term average of rewards. We propose model-based and model-free algorithms to learn the policy, where the model-based policy is shown to achieve a regret of $\tilde{O}\left(LKDS\sqrt{\frac{A}{T}}\right)$ for $K$ objectives combined with a concave $L$-Lipschitz function. Further, using fairness in cellular base-station scheduling and queueing system scheduling as examples, the proposed algorithm is shown to significantly outperform conventional RL approaches.
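The gap between a linear objective and a concave function of the long-term average rewards can be seen in a toy sketch. The example below is not the paper's algorithm. It assumes a hypothetical single-state scheduler with $K = 2$ users, made-up service rates, and proportional fairness (the sum of log average rewards) as one possible concave objective. All names (`RATES`, `average_rewards`, `best_mixing_probability`) are illustrative.

```python
import math

# Hypothetical single-state scheduling problem with K = 2 users (assumed values).
# Action 0 serves user 0 at rate 1.0; action 1 serves user 1 at rate 0.5.
RATES = [(1.0, 0.0), (0.0, 0.5)]


def average_rewards(p):
    """Long-term average reward per user when action 0 is chosen with probability p."""
    return tuple(p * r0 + (1 - p) * r1 for r0, r1 in zip(*RATES))


def linear_objective(avg):
    """Conventional objective: plain sum of the users' average rewards."""
    return sum(avg)


def proportional_fairness(avg):
    """Concave objective: sum of logs, -inf if any user is starved."""
    return sum(math.log(x) if x > 0 else -math.inf for x in avg)


def best_mixing_probability(objective, grid_size=101):
    """Grid search over the probability of choosing action 0."""
    grid = [i / (grid_size - 1) for i in range(grid_size)]
    return max(grid, key=lambda p: objective(average_rewards(p)))


if __name__ == "__main__":
    print("linear objective, best p:", best_mixing_probability(linear_objective))
    print("fairness objective, best p:", best_mixing_probability(proportional_fairness))
```

Under these assumed rates, the linear objective is maximized by always serving the faster user, while the fairness objective is maximized by a randomized policy that splits time between the users. No deterministic stationary policy achieves the fair optimum here, which hints at why standard Bellman-style methods do not apply directly.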

Comments:	Accepted JMLR, Jul 2022
Subjects:	Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computer Science and Game Theory (cs.GT); Information Theory (cs.IT); Multiagent Systems (cs.MA); Machine Learning (stat.ML)
Cite as:	arXiv:1909.02940 [cs.LG]
	(or arXiv:1909.02940v4 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.1909.02940
Journal reference:	JMLR, 2022

Submission history

From: Vaneet Aggarwal
[v1] Fri, 6 Sep 2019 14:48:07 UTC (484 KB)
[v2] Thu, 28 Nov 2019 20:42:51 UTC (744 KB)
[v3] Fri, 19 Mar 2021 05:10:01 UTC (339 KB)
[v4] Mon, 9 Jan 2023 13:31:39 UTC (228 KB)
