Want to know the best way to gain attention on Steam? Add a demo.

Plus, more links to make you a little bit smarter today.

Jun 30, 2025

Making a new video every week until I make $5 million - Week 11

What SDLC Means (And Why It Matters for Software Devs)

We’ve talked a lot about software development on this website, primarily with regard to specific tools or techniques. But we’ve never talked about the process of making software itself. How does that work?

Want to know the best way to gain attention on Steam? Add a demo.

'Valve was saying, in not so subtle terms, that everyone needs to make a demo.'

WebGames: Challenging General-Purpose Web-Browsing AI Agents

We introduce WebGames, a comprehensive benchmark suite designed to evaluate general-purpose web-browsing AI agents through a collection of 50+ interactive challenges. These challenges are specifically crafted to be straightforward for humans while systematically testing the limitations of current AI systems across fundamental browser interactions, advanced input processing, cognitive tasks, workflow automation, and interactive entertainment. Our framework eliminates external dependencies through a hermetic testing environment, ensuring reproducible evaluation with verifiable ground-truth solutions. We evaluate leading vision-language models including GPT-4o, Claude Computer-Use, Gemini-1.5-Pro, and Qwen2-VL against human performance. Results reveal a substantial capability gap, with the best AI system achieving only 43.1% success rate compared to human performance of 95.7%, highlighting fundamental limitations in current AI systems' ability to handle common web interaction patterns that humans find intuitive. The benchmark is publicly available at this http URL, offering a lightweight, client-side implementation that facilitates rapid evaluation cycles. Through its modular architecture and standardized challenge specifications, WebGames provides a robust foundation for measuring progress in development of more capable web-browsing agents.

WebLLM: A High-Performance In-Browser LLM Inference Engine

Advancements in large language models (LLMs) have unlocked remarkable capabilities. While deploying these models typically requires server-grade GPUs and cloud-based inference, the recent emergence of smaller open-source models and increasingly powerful consumer devices have made on-device deployment practical. The web browser as a platform for on-device deployment is universally accessible, provides a natural agentic environment, and conveniently abstracts out the different backends from diverse device vendors. To address this opportunity, we introduce WebLLM, an open-source JavaScript framework that enables high-performance LLM inference entirely within web browsers. WebLLM provides an OpenAI-style API for seamless integration into web applications, and leverages WebGPU for efficient local GPU acceleration and WebAssembly for performant CPU computation. With machine learning compilers MLC-LLM and Apache TVM, WebLLM leverages optimized WebGPU kernels, overcoming the absence of performant WebGPU kernel libraries. Evaluations show that WebLLM can retain up to 80% native performance on the same device, with room to further close the gap. WebLLM paves the way for universally accessible, privacy-preserving, personalized, and locally powered LLM applications in web browsers. The code is available at: this https URL.

What do I think about Lua after shipping a project with 60,000 lines of code?

Hi there! This is Oleg from Luden.io. We decided to have a deep and meaningful conversation about the Lua programming language with Ivan Trusov, lead programmer of the video game Craftomation 101. The game contains ~60,000 lines of Lua code and is made with the Defold game engine.

What Is Analog Computing?

You don’t need 0s and 1s to perform computations, and in some cases it’s better to avoid them.

The Astukari Newsletter
