A finance simulation built on model diversity

A new build from the Hugging Face community turns a woodland market sandbox into a more interactive finance drama, with each character powered by a different small language model. The project, called Thousand Token Wood version 2, moves beyond the earlier toy simulation by putting the player in the role of a shadowy financier who can lend money, trade on information, bribe rivals, and manipulate alliances.

The creator describes the game as a second field report from the Build Small Hackathon and frames it as an experiment in how far small models can go when the surrounding system is carefully engineered. Rather than relying on one model across every agent, the new version assigns different labs' models to different creatures in the simulation, creating variation in how each character behaves.

Four labs, four model behaviors

Version 2 uses four models, all under the 32B-parameter limit: OpenAI's gpt-oss-20b, OpenBMB's MiniCPM3-4B, NVIDIA's Nemotron-Mini-4B, and a fine-tuned 0.5B Qwen model. The developer argues that the point is not simply to show off multiple models, but to make the simulated market feel more distinct because each participant reasons and responds differently.

That heterogeneity, according to the write-up, also exposed a practical lesson. The biggest challenges were not in the models themselves, but in how they were served. The project hit a shared deployment issue tied to vLLM, which required the CUDA toolkit's compiler (nvcc) to be present. Once the models were moved to a CUDA development image, which includes that compiler, the setup worked for every model.

The post also notes a handful of model-specific configuration quirks. gpt-oss-20b runs in its native MXFP4 format and fits on a 24GB L4 GPU. MiniCPM3 required a trust flag for remote code, while Nemotron loaded more cleanly. In each case, the differences were handled with small configuration changes rather than major rewrites.
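As a rough illustration of what "small configuration changes" can look like, the sketch below keeps one config record per model and builds a `vllm serve` command from it. The creature names, the `ServeConfig` class, and the exact Hugging Face repo ids are assumptions for illustration and are not taken from the post; `--trust-remote-code` and `--port` are standard vLLM options.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ServeConfig:
    model_id: str                    # Hugging Face repo id (assumed, not from the post)
    trust_remote_code: bool = False  # needed when a repo ships custom model code


# Hypothetical creature-to-model mapping; the names are illustrative only.
CONFIGS = {
    "owl": ServeConfig("openai/gpt-oss-20b"),
    "fox": ServeConfig("openbmb/MiniCPM3-4B", trust_remote_code=True),
    "badger": ServeConfig("nvidia/Nemotron-Mini-4B-Instruct"),
}


def vllm_command(cfg: ServeConfig, port: int) -> list[str]:
    """Build a `vllm serve` command line for one model."""
    cmd = ["vllm", "serve", cfg.model_id, "--port", str(port)]
    if cfg.trust_remote_code:
        cmd.append("--trust-remote-code")
    return cmd


if __name__ == "__main__":
    for port, (creature, cfg) in enumerate(CONFIGS.items(), start=8000):
        print(creature, " ".join(vllm_command(cfg, port)))
```

Keeping the per-model differences in data rather than in code paths is one way to make a quirk like the remote-code flag a one-line change.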

Hidden tips and market surveillance

The biggest gameplay change is the insider-tip system. The player can whisper information that may be either true or false. Acting on a real tip can lead to profits, but repeated suspicious gains increase the risk of drawing the magistrate's attention. Once the player crosses a threshold, the investigation can lead to fines, frozen assets, or exile.
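A minimal sketch of that threshold mechanic might look like the following. The `Watchlist` class, the threshold value, and the rule that suspicion cools off after unremarkable trades are all assumptions made for illustration; the post does not give the actual scoring rules.

```python
from dataclasses import dataclass

SUSPICION_THRESHOLD = 3.0  # assumed value, not from the post


@dataclass
class Watchlist:
    suspicion: float = 0.0

    def record_trade(self, profit: float, followed_tip: bool) -> bool:
        """Update suspicion after a trade; return True if an investigation opens."""
        if followed_tip and profit > 0:
            # A gain that follows a whispered tip looks suspicious.
            self.suspicion += 1.0
        else:
            # Assumed cool-off: ordinary trades slowly lower suspicion.
            self.suspicion = max(0.0, self.suspicion - 0.25)
        return self.suspicion >= SUSPICION_THRESHOLD


if __name__ == "__main__":
    magistrate = Watchlist()
    for profit in (12.0, 30.0, 8.0):
        investigated = magistrate.record_trade(profit, followed_tip=True)
        print(profit, magistrate.suspicion, investigated)
```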

Because the creatures in the simulation should not know whether a tip is genuine, the project treats secrecy as a security problem rather than a prompt-writing problem. The truth value is stored outside the model prompts, stripped from the public event record, and checked by tests that scan the full prompt history for banned tokens. The author emphasizes that secret information should be assumed leaky unless the system can prove it is not.
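The pattern described above can be sketched in a few lines: the truth value lives in a server-side store, the public event omits it, and a test scans every prompt for banned markers. The class names, field names, and the banned-token list here are hypothetical; the post does not publish its actual identifiers.

```python
from dataclasses import dataclass

# Hypothetical markers that must never appear in any prompt.
BANNED_TOKENS = ("is_true", "tip_truth", "genuine=", "fake=")


@dataclass
class Tip:
    tip_id: str
    text: str      # what creatures are allowed to see
    is_true: bool  # hidden; must never reach a prompt


class TipVault:
    """Keeps truth values outside the prompts, keyed by tip id."""

    def __init__(self) -> None:
        self._truth: dict[str, bool] = {}

    def register(self, tip: Tip) -> dict:
        self._truth[tip.tip_id] = tip.is_true
        # The public event carries only fields that are safe to show.
        return {"tip_id": tip.tip_id, "text": tip.text}

    def resolve(self, tip_id: str) -> bool:
        return self._truth[tip_id]


def find_leaks(prompt_history: list[str]) -> list[tuple[int, str]]:
    """Return (prompt index, banned token) pairs for every hit."""
    hits = []
    for i, prompt in enumerate(prompt_history):
        lowered = prompt.lower()
        for token in BANNED_TOKENS:
            if token in lowered:
                hits.append((i, token))
    return hits
```

A test suite can then assert that `find_leaks` returns an empty list over the full prompt history of a run, which turns "the model should not know" into a property that is checked rather than hoped for.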

Memory and relationships shape the economy

Version 2 also adds persistent relationships between characters and with the player. Each creature carries a sentiment score that shifts based on events such as loans, betrayals, alliances, or market manipulation. A hostile creature may refuse to borrow or demand worse terms, while allies may coordinate more closely and avoid undercutting each other.

To keep the system manageable for small models, the game does not feed full history back into prompts. Instead, each agent receives a bounded summary that compresses its strongest relationships into a short description. That approach keeps prompts from ballooning while still allowing behavior to change over time.
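One way to build such a bounded summary is to rank relationships by strength and keep only the top few. The sketch below assumes integer sentiment scores on a roughly -100 to 100 scale and invents its own labels and cutoffs; none of these specifics come from the post.

```python
def label(score: int) -> str:
    """Map a sentiment score to a short word (cutoffs are assumed)."""
    if score >= 50:
        return "ally"
    if score > 0:
        return "friendly"
    if score <= -50:
        return "hostile"
    if score < 0:
        return "wary"
    return "neutral"


def summarize_relationships(sentiment: dict[str, int], top_k: int = 3) -> str:
    """Compress the strongest relationships into one short line."""
    # Strongest means largest absolute score, whether positive or negative.
    strongest = sorted(sentiment.items(), key=lambda kv: abs(kv[1]), reverse=True)
    parts = [f"{name}: {label(score)}" for name, score in strongest[:top_k]]
    return "; ".join(parts) or "no strong feelings yet"


if __name__ == "__main__":
    scores = {"fox": -80, "owl": 60, "badger": -10, "hare": 5}
    print(summarize_relationships(scores, top_k=2))
```

Because the output length depends on `top_k` rather than on how long the game has run, the prompt cost of memory stays flat over time.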

The author says this combination of bounded memory, hidden information, and model diversity makes the simulation feel more alive without depending on larger models. A sample run reportedly produced valid offers from all four models and no leaks of hidden tip data across the scanned prompts. Its outcomes ranged from profitable insider positioning to regulatory scrutiny and, after a loan default or margin call, eventual exile.

The broader takeaway, according to the post, is that small models can serve as reliable format generators when wrapped in structure, parsing, and guardrails. The result is not just a technical demo, but a playable market story driven by constrained AI agents.
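As a hedged illustration of what that wrapping can involve, the sketch below pulls a JSON object out of free-form model text and rejects anything that fails validation. The offer schema (`action`, `amount`), the allowed actions, and the amount cap are assumptions for this example, not the game's actual format.

```python
import json
import re
from typing import Optional

ALLOWED_ACTIONS = {"buy", "sell", "lend", "pass"}  # assumed action set


def parse_offer(raw: str, max_amount: int = 1000) -> Optional[dict]:
    """Extract and validate one offer from raw model text; None if unusable."""
    # Small models often wrap JSON in chatter, so search for the object itself.
    match = re.search(r"\{.*\}", raw, flags=re.DOTALL)
    if match is None:
        return None
    try:
        data = json.loads(match.group(0))
    except json.JSONDecodeError:
        return None
    if not isinstance(data, dict):
        return None
    action = data.get("action")
    amount = data.get("amount")
    if action not in ALLOWED_ACTIONS:
        return None
    # bool is a subclass of int in Python, so exclude it explicitly.
    if isinstance(amount, bool) or not isinstance(amount, int):
        return None
    if not 0 <= amount <= max_amount:
        return None
    return {"action": action, "amount": amount}
```

Returning `None` instead of raising lets the game loop fall back to a safe default, such as skipping the creature's turn, whenever a model produces unusable output.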