Training a Generally Curious Agent

Plus, more links to make you a little bit smarter today.

Jul 07, 2025

Making a new video every week until I make $5 million - Week 12

Last time I had to study for a technical interview, I ended up getting two full blog articles out of it. Seeing how successful it was, I’ve decided to do it again – this time on software architecture!

Visual guide to SSH tunneling and port forwarding

To make it quick, I wish I had known about port forwarding and tunneling earlier. With this blog post, I try to understand it better myself and share some experiences and tips with you.

Tutorial on Using Machine Learning and Deep Learning Models for Mental Illness Detection

Social media has become an important source for understanding mental health, providing researchers with a way to detect conditions like depression from user-generated posts. This tutorial provides practical guidance to address common challenges in applying machine learning and deep learning methods for mental health detection on these platforms. It focuses on strategies for working with diverse datasets, improving text preprocessing, and addressing issues such as imbalanced data and model evaluation. Real-world examples and step-by-step instructions demonstrate how to apply these techniques effectively, with an emphasis on transparency, reproducibility, and ethical considerations. By sharing these approaches, this tutorial aims to help researchers build more reliable and widely applicable models for mental health research, contributing to better tools for early detection and intervention.

Trends and Reversion in Financial Markets on Time Scales from Minutes to Decades

We empirically analyze the reversion of financial market trends with time horizons ranging from minutes to decades. The analysis covers equities, interest rates, currencies and commodities and combines 14 years of futures tick data, 30 years of daily futures prices, 330 years of monthly asset prices, and yearly financial data since medieval times.
Across asset classes, we find that markets are in a trending regime on time scales that range from a few hours to a few years, while they are in a reversion regime on shorter and longer time scales. In the trending regime, weak trends tend to persist, which can be explained by herding behavior of investors. However, in this regime trends tend to revert before they become strong enough to be statistically significant, which can be interpreted as a return of asset prices to their intrinsic value. In the reversion regime, we find the opposite pattern: weak trends tend to revert, while those trends that become statistically significant tend to persist.
Our results provide a set of empirical tests of theoretical models of financial markets. We interpret them in the light of a recently proposed lattice gas model, where the lattice represents the social network of traders, the gas molecules represent the shares of financial assets, and efficient markets correspond to the critical point. If this model is accurate, the lattice gas must be near this critical point on time scales from 1 hour to a few days, with a correlation time of a few years.

Training a Generally Curious Agent

Efficient exploration is essential for intelligent systems interacting with their environment, but existing language models often fall short in scenarios that require strategic information gathering. In this paper, we present Paprika, a fine-tuning approach that enables language models to develop general decision-making capabilities that are not confined to particular environments. By training on synthetic interaction data from different tasks that require diverse strategies, Paprika teaches models to explore and adapt their behavior on a new task based on environment feedback in-context without more gradient updates. Experimental results show that models fine-tuned with Paprika can effectively transfer their learned decision-making capabilities to entirely unseen tasks without additional training. Unlike traditional training, our approach's primary bottleneck lies in sampling useful interaction data instead of model updates. To improve sample efficiency, we propose a curriculum learning strategy that prioritizes sampling trajectories from tasks with high learning potential. These results suggest a promising path towards AI systems that can autonomously solve novel sequential decision-making problems that require interactions with the external world.

The Astukari Newsletter

Discussion about this post

The Astukari Newsletter

Training a Generally Curious Agent

Plus, more links to make you a little bit smarter today.

Making a new video every week until I make $5 million - Week 12

5 Software Architectures Everyone Should Know

Visual guide to SSH tunneling and port forwarding

Tutorial on Using Machine Learning and Deep Learning Models for Mental Illness Detection

Trends and Reversion in Financial Markets on Time Scales from Minutes to Decades

Training a Generally Curious Agent

Discussion about this post