Learning to Play Atari in a World of Tokens
Plus, more links to make you a little bit smarter today.
Stress Testing My New Art Practice App
A Deep Dive on Networking
This past week, I’ve had to refresh my networking and cybersecurity knowledge. I figure this makes for a perfect opportunity to do an explainer post on each topic. This week, we’ll be starting with networking…
Learning Decision Trees and Forests with Algorithmic Recourse
This paper proposes a new algorithm for learning accurate tree-based models while ensuring the existence of recourse actions. Algorithmic Recourse (AR) aims to provide a recourse action for altering the undesired prediction result given by a model. Typical AR methods provide a reasonable action by solving an optimization task that minimizes the required effort among executable actions. In practice, however, such actions do not always exist for models optimized only for predictive performance. To alleviate this issue, we formulate the task of learning an accurate classification tree under the constraint of ensuring the existence of reasonable actions for as many instances as possible. We then propose an efficient top-down greedy algorithm that leverages adversarial training techniques, and we show that it can also be applied to random forests, a popular framework for learning tree ensembles. Experimental results demonstrate that our method provides reasonable actions to more instances than the baselines without significantly degrading accuracy or computational efficiency.
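To get a feel for what "providing a recourse action" on a tree actually means, here is a rough, hypothetical sketch (my own, not the paper's algorithm): fit a plain scikit-learn tree, enumerate the axis-aligned regions of its leaves, and pick the smallest L1 change that moves an instance into a region predicted as the desired class. The paper's contribution is different, baking this guarantee into training, but the sketch shows the object being guaranteed.

```python
# Hypothetical sketch: finding a minimum-effort recourse action on a fitted
# sklearn decision tree (not the paper's training algorithm, which instead
# bakes recourse guarantees into a top-down greedy learner).
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

def leaf_boxes(tree, n_features):
    """Enumerate (leaf_id, lower, upper) axis-aligned boxes, one per leaf."""
    boxes = []
    def recurse(node, lower, upper):
        if tree.children_left[node] == -1:  # leaf node
            boxes.append((node, lower.copy(), upper.copy()))
            return
        f, t = tree.feature[node], tree.threshold[node]
        left_upper = upper.copy(); left_upper[f] = min(upper[f], t)
        right_lower = lower.copy(); right_lower[f] = max(lower[f], t)
        recurse(tree.children_left[node], lower, left_upper)
        recurse(tree.children_right[node], right_lower, upper)
    recurse(0, np.full(n_features, -np.inf), np.full(n_features, np.inf))
    return boxes

def cheapest_recourse(clf, x, desired_class=1):
    """Return the minimum-L1-effort point accepted by a desired-class leaf."""
    tree = clf.tree_
    best_cost, best_action = np.inf, None
    for leaf, lower, upper in leaf_boxes(tree, x.shape[0]):
        if np.argmax(tree.value[leaf]) != desired_class:
            continue
        # Project x onto the leaf's box: the cheapest point inside it.
        x_new = np.clip(x, lower + 1e-6, upper - 1e-6)
        cost = np.abs(x_new - x).sum()
        if cost < best_cost:
            best_cost, best_action = cost, x_new
    return best_action, best_cost

X, y = make_classification(n_samples=200, n_features=5, random_state=0)
clf = DecisionTreeClassifier(max_depth=4, random_state=0).fit(X, y)
x = X[y == 0][0]                      # an instance with the undesired label
action, effort = cheapest_recourse(clf, x)
print("recourse exists:", action is not None, "effort:", effort)
```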
Learning to Play Atari in a World of Tokens
Model-based reinforcement learning agents that use transformers have shown improved sample efficiency thanks to their ability to model extended context, resulting in more accurate world models. For complex reasoning and planning tasks, however, these methods rely primarily on continuous representations, which complicates modeling discrete properties of the real world, such as disjoint object classes between which interpolation is not plausible. In this work, we introduce discrete abstract representations for transformer-based learning (DART), a sample-efficient method that uses discrete representations both for modeling the world and for learning behavior. We incorporate a transformer decoder for auto-regressive world modeling and a transformer encoder for learning behavior by attending to task-relevant cues in the discrete representation of the world model. To handle partial observability, we aggregate information from past time steps as memory tokens. DART outperforms previous state-of-the-art methods that do not use look-ahead search on the Atari 100k sample-efficiency benchmark, with a median human-normalized score of 0.790, and beats humans in 9 out of 26 games. We release our code at this https URL.
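To make the "world of tokens" idea concrete, here is a minimal, hypothetical sketch (not the released DART code) of a causal transformer world model over discrete tokens: observations are assumed to be pre-quantized into codebook indices (e.g. by a VQ-style encoder), actions get their own token ids, and the model predicts the next observation token plus a reward. All sizes and interfaces below are assumptions.

```python
# Minimal sketch of the discrete-token world-model idea (not the official DART
# code): observations are assumed to be quantized into codebook indices; a
# causal transformer then predicts the next token and a reward at every step.
import torch
import torch.nn as nn

class TokenWorldModel(nn.Module):
    def __init__(self, vocab_size=512, n_actions=18, d_model=256,
                 n_layers=4, n_heads=4):
        super().__init__()
        # One shared embedding table for observation tokens and action tokens.
        self.tok_emb = nn.Embedding(vocab_size + n_actions, d_model)
        self.pos_emb = nn.Parameter(torch.zeros(1, 1024, d_model))
        layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.backbone = nn.TransformerEncoder(layer, n_layers)
        self.next_token = nn.Linear(d_model, vocab_size)   # next obs token
        self.reward = nn.Linear(d_model, 1)                # reward head

    def forward(self, token_seq):
        # token_seq: (batch, time) interleaved observation and action tokens.
        B, T = token_seq.shape
        h = self.tok_emb(token_seq) + self.pos_emb[:, :T]
        causal = torch.triu(torch.full((T, T), float("-inf")), diagonal=1)
        h = self.backbone(h, mask=causal)
        return self.next_token(h), self.reward(h)

# Assumed layout: 16 observation tokens plus 1 action token per step.
model = TokenWorldModel()
fake_seq = torch.randint(0, 512, (2, 17 * 4))      # 4 steps of a rollout
logits, reward = model(fake_seq)
print(logits.shape, reward.shape)                  # (2, 68, 512), (2, 68, 1)
```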
Virtual avatar generation models as world navigators
We introduce SABR-CLIMB, a novel video model simulating human movement in rock climbing environments using a virtual avatar. Our diffusion transformer predicts the sample instead of noise in each diffusion step and ingests entire videos to output complete motion sequences. By leveraging a large proprietary dataset, NAV-22M, and substantial computational resources, we showcase a proof of concept for a system to train general-purpose virtual avatars for complex tasks in robotics, sports, and healthcare.
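The one concrete technical detail in the abstract is the parameterization: the diffusion model regresses the clean sample rather than the added noise. Below is a generic, hypothetical training step illustrating that choice; none of SABR-CLIMB's architecture, data, or schedule is public, so `model` and the shapes are just assumptions.

```python
# Sketch of "predict the sample instead of the noise": a generic DDPM-style
# training step where the network regresses x0 directly, rather than epsilon.
import torch
import torch.nn.functional as F

def training_step(model, x0, alphas_cumprod):
    # x0: clean motion/video batch, e.g. (B, T, D) flattened pose sequences.
    B = x0.shape[0]
    t = torch.randint(0, alphas_cumprod.shape[0], (B,))
    a_bar = alphas_cumprod[t].view(B, *([1] * (x0.dim() - 1)))
    noise = torch.randn_like(x0)
    # Standard forward diffusion: x_t = sqrt(a_bar) x0 + sqrt(1 - a_bar) eps.
    x_t = a_bar.sqrt() * x0 + (1 - a_bar).sqrt() * noise
    pred_x0 = model(x_t, t)            # x0-prediction parameterization
    return F.mse_loss(pred_x0, x0)     # vs. F.mse_loss(pred_eps, noise)

# Dummy denoiser so the sketch runs end to end.
dummy = lambda x_t, t: x_t
alphas_cumprod = torch.linspace(0.9999, 0.01, 1000).cumprod(dim=0)
loss = training_step(dummy, torch.randn(4, 32, 16), alphas_cumprod)
print(loss.item())
```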
Decision Machines: Congruent Decision Trees
A decision tree recursively partitions the input space into regions and derives axis-aligned decision boundaries from data. Despite their simplicity and interpretability, decision trees lack a parameterized representation, which makes them prone to overfitting and makes it difficult to find the optimal structure. We propose Decision Machines, which embed Boolean tests into a binary vector space and represent the tree structure as matrices, enabling an interleaved traversal of decision trees through matrix computation. Furthermore, we explore the congruence between decision trees and attention mechanisms, opening new avenues for optimizing decision trees and potentially enhancing their predictive power.
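My reading of the core trick, sketched below (a reconstruction of the idea, not the paper's exact encoding): evaluate every node's Boolean test at once as a vector, then use a signed leaf-by-test matrix to pick the single leaf whose path agrees with all of its tests. Checking the result against scikit-learn's own predictions keeps the sketch honest.

```python
# Illustrative sketch of evaluating a decision tree via matrix computation
# (my own reconstruction, not the paper's exact encoding): each internal node
# contributes one Boolean test; a signed leaf-by-test matrix selects the
# unique leaf whose path agrees with every test outcome.
import numpy as np
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

def tree_to_matrices(tree):
    internal = np.where(tree.children_left != -1)[0]
    leaves = np.where(tree.children_left == -1)[0]
    col = {n: j for j, n in enumerate(internal)}
    row = {n: i for i, n in enumerate(leaves)}
    A = np.zeros((len(leaves), len(internal)))   # +1 = go left, -1 = go right

    def recurse(node, signs):
        if tree.children_left[node] == -1:
            for n, s in signs:
                A[row[node], col[n]] = s
            return
        recurse(tree.children_left[node], signs + [(node, +1)])
        recurse(tree.children_right[node], signs + [(node, -1)])

    recurse(0, [])
    n_plus = (A > 0).sum(axis=1)                 # max attainable score per leaf
    leaf_class = tree.value[leaves].argmax(axis=-1).ravel()
    feats, thresh = tree.feature[internal], tree.threshold[internal]
    return A, n_plus, leaf_class, feats, thresh

X, y = load_iris(return_X_y=True)
clf = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X, y)
A, n_plus, leaf_class, feats, thresh = tree_to_matrices(clf.tree_)

b = (X[:, feats] <= thresh).astype(float)        # all Boolean tests at once
scores = b @ A.T - n_plus                        # 0 only at the reached leaf
preds = leaf_class[scores.argmax(axis=1)]
print("matches sklearn:", np.array_equal(preds, clf.predict(X)))
```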
SUBER: An RL Environment with Simulated Human Behavior for Recommender Systems
Reinforcement learning (RL) has gained popularity in the realm of recommender systems due to its ability to optimize long-term rewards and guide users in discovering relevant content. However, the successful implementation of RL in recommender systems is challenging because of several factors, including the limited availability of online data for training on-policy methods. This scarcity requires expensive human interaction for online model training. Furthermore, the development of effective evaluation frameworks that accurately reflect the quality of models remains a fundamental challenge in recommender systems. To address these challenges, we propose a comprehensive framework for synthetic environments that simulate human behavior by harnessing the capabilities of large language models (LLMs). We complement our framework with in-depth ablation studies and demonstrate its effectiveness with experiments on movie and book recommendations. Using LLMs as synthetic users, this work introduces a modular and novel framework to train RL-based recommender systems. The software, including the RL environment, is publicly available on GitHub.
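The gist is easiest to see as an environment whose reward function is a prompted language model. Below is a toy, hypothetical sketch (not the released SUBER code): `query_llm` stands in for whatever model call a real implementation would use, and a dummy random "LLM" keeps the example runnable without any API key.

```python
# Toy sketch of the "LLM as synthetic user" idea (not the SUBER codebase):
# an RL environment with the usual reset/step interface whose reward comes
# from prompting a language model to rate the recommended item.
import random

class LLMUserEnv:
    def __init__(self, catalog, persona, query_llm, max_steps=10):
        self.catalog = catalog            # e.g. a list of movie titles
        self.persona = persona            # natural-language user description
        self.query_llm = query_llm        # hypothetical fn(prompt) -> rating
        self.max_steps = max_steps

    def reset(self):
        self.history, self.t = [], 0
        return {"persona": self.persona, "history": []}

    def step(self, action):
        item = self.catalog[action]
        prompt = (f"You are: {self.persona}\nPreviously watched: {self.history}\n"
                  f"Rate '{item}' from 1 to 5. Answer with a number only.")
        rating = float(self.query_llm(prompt))   # the LLM plays the user
        self.history.append(item)
        self.t += 1
        obs = {"persona": self.persona, "history": list(self.history)}
        return obs, rating, self.t >= self.max_steps, {}

# A dummy "LLM" so the sketch runs without an API key.
env = LLMUserEnv(["Alien", "Amélie", "Heat"], "loves sci-fi, dislikes romance",
                 query_llm=lambda prompt: str(random.randint(1, 5)))
obs = env.reset()
obs, reward, done, info = env.step(0)
print(reward, done)
```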