I Made A Fundraiser For My Upcoming Book
Markov Chains and You!
This is the introductory article for a new series I’ll be trying out. I love asking ChatGPT about stuff I don’t understand or want to understand better, because it’s gotten to the point where it’s pretty damn accurate for most topics, and I pretty much get to ask as many stupid questions as I want. But then I got to thinking: maybe other people can learn from this too. So I get to learn about things I’m curious about, and I also get a free article topic for my readers? Sign me up!
Towards Optimizing with Large Language Models
In this study, we evaluate the optimization capabilities of Large Language Models (LLMs) across diverse mathematical and combinatorial optimization tasks, where each task is described in natural language. These tasks require the LLM to iteratively generate and evaluate solutions through interactive prompting, where each optimization step involves generating new solutions based on past results and then passing them to subsequent iterations. We demonstrate that LLMs can perform various optimization algorithms and act as effective black-box optimizers, capable of intelligently optimizing unknown functions. We also introduce three simple yet informative metrics to evaluate optimization performance, applicable across diverse tasks and less sensitive to test sample variations. Our findings reveal that LLMs excel at optimizing small-scale problems with limited data, and that their performance is significantly affected by the problem dimension and the magnitude of the values involved, highlighting the need for further research in LLM optimization.
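To make the setup concrete, here is a minimal sketch of that iterative-prompting loop: keep a history of (solution, score) pairs, format them into a prompt, and ask for a new candidate. The objective function, the prompt wording, and the propose_with_llm stub are my own illustrations (the stub just perturbs the best point so the script runs offline), not the paper's actual protocol.

```python
import random

def objective(x):
    # Unknown black-box function; the optimizer only sees it through scores.
    return -(x[0] - 3.0) ** 2 - (x[1] + 1.0) ** 2

def format_prompt(history):
    # Natural-language task description plus past (solution, score) pairs,
    # mirroring the interactive-prompting setup described above.
    lines = ["Propose a new 2D point that increases the score.",
             "Previous attempts (point -> score):"]
    for x, s in history:
        lines.append(f"  {x} -> {s:.3f}")
    lines.append("Reply with two comma-separated numbers.")
    return "\n".join(lines)

def propose_with_llm(prompt, history):
    # Stand-in for an actual LLM call: perturb the best point seen so far.
    # In a real run, `prompt` would be sent to a model and its reply parsed.
    best_x, _ = max(history, key=lambda p: p[1])
    return [v + random.gauss(0, 0.5) for v in best_x]

def optimize(steps=30):
    history = [([0.0, 0.0], objective([0.0, 0.0]))]
    for _ in range(steps):
        prompt = format_prompt(history)
        x = propose_with_llm(prompt, history)
        history.append((x, objective(x)))
    return max(history, key=lambda p: p[1])

if __name__ == "__main__":
    best_x, best_score = optimize()
    print("best point:", best_x, "score:", round(best_score, 3))
```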
Do LLM Agents Have Regret? A Case Study in Online Learning and Games
Large language models (LLMs) have been increasingly employed for (interactive) decision-making, via the development of LLM-based autonomous agents. Despite their emerging successes, the performance of LLM agents in decision-making has not been fully investigated through quantitative metrics, especially in the multi-agent setting where they interact with each other, a typical scenario in real-world LLM-agent applications. To better understand the limits of LLM agents in these interactive environments, we propose to study their interactions in benchmark decision-making settings from online learning and game theory, through the performance metric of regret. We first empirically study the no-regret behaviors of LLMs in canonical online learning problems, as well as the emergence of equilibria when LLM agents interact through playing repeated games. We then provide some theoretical insights into the sublinear regret growth observed in these cases, under certain assumptions on the supervised pre-training and the rationality model of the human decision-makers who generate the data. Notably, we also identify (simple) cases where advanced LLMs such as GPT-4 fail to achieve regret sublinear in time. To further promote no-regret behaviors, we propose a novel unsupervised training loss, the regret-loss, which, in contrast to the supervised pre-training loss, does not require labels of (optimal) actions. Finally, we establish a statistical generalization guarantee for regret-loss minimization and, more importantly, an optimization guarantee that minimizing this loss can automatically lead to known no-regret learning algorithms when single-layer self-attention models are used. Our further experiments demonstrate the effectiveness of the regret-loss, especially in addressing the above “regrettable” cases.
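For readers unfamiliar with the metric, here is a small illustration of what regret means operationally, together with multiplicative weights (Hedge) as a classical no-regret baseline. This is my own sketch of the standard definitions, not code from the paper, and the loss matrix is synthetic.

```python
import numpy as np

def regret(loss_matrix, actions):
    """Regret of an action sequence on a loss matrix of shape (T, n_actions):
    cumulative loss incurred minus the loss of the best fixed action in
    hindsight. Sublinear growth in T is the 'no-regret' property above."""
    T = len(actions)
    incurred = loss_matrix[np.arange(T), actions].sum()
    best_fixed = loss_matrix.sum(axis=0).min()
    return incurred - best_fixed

def hedge(loss_matrix, eta=0.1):
    """Multiplicative weights (Hedge), a classical no-regret algorithm,
    included only as a baseline one could compare LLM agents against."""
    T, n = loss_matrix.shape
    w = np.ones(n)
    actions = []
    rng = np.random.default_rng(0)
    for t in range(T):
        p = w / w.sum()
        actions.append(rng.choice(n, p=p))
        w *= np.exp(-eta * loss_matrix[t])
    return actions

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    losses = rng.random((500, 5))   # synthetic losses for 5 actions over 500 rounds
    acts = hedge(losses)
    print("regret:", round(regret(losses, acts), 2))
```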
Automated Market Making and Loss-Versus-Rebalancing
We consider the market microstructure of automated market makers (AMMs) from the perspective of liquidity providers (LPs). Our central contribution is a “Black-Scholes formula for AMMs”. We identify the main adverse selection cost incurred by LPs, which we call “loss-versus-rebalancing” (LVR, pronounced “lever”). LVR captures costs incurred by AMM LPs due to stale prices that are picked off by better-informed arbitrageurs. We derive closed-form expressions for LVR applicable to all automated market makers. Our model is quantitatively realistic, matching actual LP returns empirically, and shows how CFMM protocols can be redesigned to reduce or eliminate LVR.
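As a rough illustration of how an LP might use such a closed form, the sketch below applies the σ²/8 · V expression that I understand to be the constant-product (x·y = k) special case of the paper's result; the constant, the pool parameters, and the volatility figure are assumptions on my part, so treat the numbers as order-of-magnitude only.

```python
def cpmm_pool_value(reserve_x, price):
    # Constant-product pool (x * y = k) with the risky asset at `price`:
    # the two legs hold equal value, so total value is twice the risky leg.
    return 2 * reserve_x * price

def lvr_rate_cpmm(pool_value, sigma_annual):
    """Approximate loss-versus-rebalancing for a constant-product AMM,
    using the sigma^2 / 8 * V closed form (my reading of the paper's
    constant-product special case; treat the constant as an assumption)."""
    return (sigma_annual ** 2 / 8.0) * pool_value

if __name__ == "__main__":
    V = cpmm_pool_value(reserve_x=100.0, price=2_000.0)   # e.g. 100 ETH at $2,000
    annual_lvr = lvr_rate_cpmm(V, sigma_annual=0.8)        # 80% annualized volatility
    print(f"pool value: ${V:,.0f}")
    print(f"approx. LVR: ${annual_lvr:,.0f} per year (${annual_lvr / 365:,.0f} per day)")
```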
On Fairness of Low-Rank Adaptation of Large Models
Low-rank adaptation of large models, particularly LoRA, has gained traction due to its computational efficiency. This efficiency, contrasted with the prohibitive costs of full-model fine-tuning, means that practitioners often turn to LoRA without a complete understanding of its ramifications. In this study, we focus on fairness and ask whether LoRA has an unexamined impact on utility, calibration, and resistance to membership inference across different subgroups (e.g., genders, races, religions) compared to a full-model fine-tuning baseline. We present extensive experiments across vision and language domains and across classification and generation tasks using ViT-Base, Swin-v2-Large, Llama-2 7B, and Mistral 7B. Intriguingly, experiments suggest that while one can isolate cases where LoRA exacerbates model bias across subgroups, the pattern is inconsistent—in many cases, LoRA has equivalent or even improved fairness compared to the base model or its full fine-tuning baseline. We also examine the complications of evaluating fine-tuning fairness relating to task design and model token bias, calling for more careful fairness evaluations in future work.
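To show the kind of per-subgroup breakdown such an evaluation involves, here is a small sketch that computes utility (accuracy) and calibration (expected calibration error) per subgroup on synthetic predictions. It is my own illustration of the metrics named above, not the authors' evaluation code, and the group labels are placeholders.

```python
import numpy as np

def expected_calibration_error(probs, labels, n_bins=10):
    # Standard ECE: bin predictions by confidence and average |accuracy - confidence|.
    conf = probs.max(axis=1)
    pred = probs.argmax(axis=1)
    ece = 0.0
    for lo in np.linspace(0, 1, n_bins, endpoint=False):
        mask = (conf >= lo) & (conf < lo + 1 / n_bins)
        if mask.any():
            ece += mask.mean() * abs((pred[mask] == labels[mask]).mean() - conf[mask].mean())
    return ece

def subgroup_report(probs, labels, groups):
    """Utility (accuracy) and calibration (ECE) per subgroup: the kind of
    breakdown one would compare between LoRA and full fine-tuning."""
    report = {}
    for g in np.unique(groups):
        m = groups == g
        acc = (probs[m].argmax(axis=1) == labels[m]).mean()
        report[g] = {"n": int(m.sum()), "accuracy": float(acc),
                     "ece": float(expected_calibration_error(probs[m], labels[m]))}
    return report

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n, k = 1_000, 3
    logits = rng.normal(size=(n, k))
    probs = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)
    labels = rng.integers(0, k, size=n)
    groups = rng.choice(["group_a", "group_b"], size=n)   # placeholder subgroup labels
    for g, stats in subgroup_report(probs, labels, groups).items():
        print(g, stats)
```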