The Future Will Be Smaller

2025-11-06

Big Muskie

In 1991 a twelve-thousand-ton machine was taken out of service, abruptly, by declaration of the mining company that operated it. Built in 1969 at a price of twenty-five million dollars, it consumed two years of construction and two hundred thousand man-hours. This enormous mechanical beast, known as Big Muskie, moved in inches per hour, hauled earth hundreds of tons at a time, and was followed by a tail as thick as a car that pumped thirteen thousand volts straight into its guts. You can visit Ohio, where the skull of this giant remains: a bucket large enough to swallow a two-car garage, rusted down to the color of greater times. Muskie was not killed by environmental policy alone – the economics of mining were already miniaturizing by the 1980s, and larger fleets of smaller machines, running on more efficient power, together with better excavating technology, meant that Muskie could only operate at a loss. Its bloated, single-purpose design left no recoverable future for the machine. It had to be dismantled and sold for scrap.

The prospects of scale are alluring and confusing. Our computers, our machines, our buildings, our economics – these systems expand and contract in waves of innovation. An insight brings a new direction, a fork in our thinking, and we flood the cavity greedily, denying the existence of the walls closing in around us, until, with only a pocket of air left, we try to hammer through the stone, only to find a harder material behind it.

It is clear that edge-device inference is in fashion. Many will tell you that this was an inevitable direction. The human psyche has a funny way of self-correcting, as if to preserve the maximum number of egos on the planet. But we cannot ignore the thousand shouting voices, merely two years ago, proclaiming the all-in-one intellect, the generalist, the AGI.

For those in the know, the problem with AGI existed then, as it existed a decade ago, as it still exists now. Take some problem – web development, inverse kinematics, architecture, ecology – and query your preferred generalist language model on it. Are the results correct? Does the problem even have a correct solution? There is no denying that the result is useful, certainly. Astounding amounts of contextualization, summarization, and cross-domain aggregation are possible, making these generations wonderful thought machines for learning, reasoning, helping us conceptualize new domains, or providing a thinking canvas for our own limited working memories. But these outputs are not solutions.

Here’s something: in 2019 we were using four-layer neural networks to solve animation and motion tracking tasks. These models had a few million parameters, were trained on a few hundred thousand data points, and performed reasonably well as long as the task stayed in-domain. No billion-parameter generalist model today has exceeded that performance. Six years later, we have no multimodal model that outperforms a domain-specific reinforcement learning agent either. What is happening? The general consensus is that if we just barrel through on the current path – wiring together these general learners, hauling enormous quantities of data, scaling our hardware, energy, compute, and models – eventually we will break these thresholds.
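For a sense of the scale being described, here is a minimal sketch of that kind of small, domain-specific network. The input window, pose dimensions, and layer widths are illustrative assumptions, not the actual 2019 models:

```python
import torch
import torch.nn as nn

# Illustrative sizes only: a four-layer MLP that maps a short window of past
# joint positions to the next pose. A few hidden layers of ~1024 units lands
# in the low millions of parameters, the scale discussed above.
class PoseRegressor(nn.Module):
    def __init__(self, input_dim=240, hidden_dim=1024, output_dim=80):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(input_dim, hidden_dim), nn.ReLU(),
            nn.Linear(hidden_dim, hidden_dim), nn.ReLU(),
            nn.Linear(hidden_dim, hidden_dim), nn.ReLU(),
            nn.Linear(hidden_dim, output_dim),
        )

    def forward(self, x):
        return self.net(x)

model = PoseRegressor()
n_params = sum(p.numel() for p in model.parameters())
print(f"parameters: {n_params:,}")  # roughly 2.4 million: easy to run on-device
```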

We are breaking benchmarks with these systems. That is why millions of people worldwide use them today. But we should not mistake the ability to haul thousands of tons of earth for the ability to dig out a continent.

Video-Gen To The Moon

Wan’s videos are astounding. They can generate long stories with visceral detail, comply with complex human motion, keeping our elbows and arms as straight as ever, and capture the likeness of input images, transporting our cat onto a bike or planting our friend into a medieval cyberpunk combat mashup. Reading the paper, it is interesting to note that a quarter of its fifty pages is dedicated to data management. Another quarter is dedicated to memory and computation management, and only a few sections are dedicated to model architecture concerns. This is very telling. In the final third of the paper, the Wan team presents a ‘real-time’ inference model that can generate minutes of video recurrently. A grid of frames of a car traversing a desert in a video game aesthetic is shown. Running at eight to twelve frames per second, it is impressive. The hardware required for this feat is a single NVIDIA H100 GPU, which peaks at roughly one thousand tensor TFLOPS. For some context, the October Steam hardware survey recorded a regression in GPU popularity: the NVIDIA 3090 was the most popular consumer GPU, a step back from the 4060 earlier that year. Whether this marks a trend away from video games as a consumer market is a worthwhile theory, but even if we assume the 4060 is still the reigning GPU, the consumer ceiling sits near the 3090’s peak of roughly one hundred tensor TFLOPS – roughly a ten-fold decrease. Although it is more complicated than saying that Wan’s fast video inference model will simply run ten times slower, even the most optimistic reading leaves real-time generation out of reach on consumer hardware.
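As a crude back-of-envelope, assuming frame rate scales linearly with peak tensor throughput (it does not – memory bandwidth, VRAM, and precision support all intervene) and taking the rough spec figures above at face value:

```python
# Back-of-envelope only: assumes frame rate scales linearly with peak tensor
# throughput, ignoring memory bandwidth, VRAM limits, and precision support.
h100_tflops = 1000.0      # approx. H100 FP16 tensor throughput
rtx3090_tflops = 100.0    # approx. RTX 3090 FP16 tensor throughput
h100_fps_range = (8, 12)  # reported real-time range on a single H100

scale = rtx3090_tflops / h100_tflops
for fps in h100_fps_range:
    print(f"{fps} fps on an H100 -> ~{fps * scale:.1f} fps on a 3090 (naive scaling)")
# 8 fps -> ~0.8 fps; 12 fps -> ~1.2 fps: about one frame per second, not real time
```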

It is no wonder, then, that Google, Odyssey, OpenAI, Facebook, and Runway are selling their services through the cloud. Even if we 10x our performance by next year through model development and hardware improvement, the real-time promise of the current model generation is still tied to this strategy.

But why are we incurring this cost? Why are we starting from the principle of large models and then trying to contract? Already, as mentioned, there is evidence that smaller models solve specific problems better.

The answer is that they want it all.

We Do Not Need It All

Borders was one of the United States’ most successful bookstore chains in the early two-thousands. The conglomerate had more than a thousand stores across the country, revenue in the billions of dollars a year, and plans to expand. It died in 2011, shrunk down by the technical innovations of the internet, cut off at the knees by Amazon, failing to innovate. Borders, it felt, had every book. But nowadays it feels the opposite when you enter these large bookstore chains. They have every popular book, but not the newest obscure publication or the out-of-print fables collection. They straddle some window of time, just short of the future and one step behind the present, and they call this everything. Small bookstores are not on the rise; in fact they are declining under economic pressure – but this does not discount the general frustration of not having these smaller outlets. They streamline the discovery of new literary avenues, allow readers to faithfully dive into new, unusual directions, and even begin a process by which undiscovered talent can bubble up to the surface of the larger market.

The video games of the future are going to be very different from the video games of the present, we would hope. In ten years, we should reflect on today’s incumbents – Roblox, Fortnite, Battlefield, Hollow Knight – and see them as we see the Atari 2600 games of the past: interesting, novel artifacts from which we can learn about the limitations and constraints that developers of that time faced. But it is worth being pragmatic about the steps required to reach this new plane of play. We did not skip from Atari 2600 Adventure! to 1993’s Doom in one step – the gradations are quite obvious, the hardware and software developments steady – the game design evolution gradual but traceable. Equally, it is unhelpful to dispense with the video game model entirely because of a poor market and the vision of alternative technologies. With the release of the NES in 1985, Nintendo recovered a dead market, killed by the greedy tactics of the Warner-owned Atari corporation a couple of years prior. Their console was in fact a technical regression on pure hardware numbers, but they focused on damage control and game design innovation – actively limiting game output while prioritizing quality. Their games were innovative, but still shared many of the design affordances of a few years before.

The parallel is worth drawing to AI as it is used in video games now. There are two clear axes of innovation being explored. The first is to use AI in current development pipelines. The sell here is to reduce development costs, scale production ambitions, or accelerate smaller teams. The second is to use AI to “redefine the game experience (TM).” Unfortunately, this second direction is tied to very broad brushstrokes. Keywords enjoyed by this view include: personalization, unlimited, players become creators, games as social platforms, platforms, platforms, platforms. It is like the multimedia explosion of the nineties, when the entire world became obsessed with the CD and believed that all media would be compressed, crunched, chewed and smashed into one – we would have movies in video games! Songs in movies! Graphics songs! Video games would be more than video games, they would be interactive multidimensional worlds. And to be fair, we are closer now than ever before to this promise of a single, all-encompassing cross-domain data standard, which is very exciting.

But the curse of the above thinking is to treat it as an inevitability. This breaks your brain. It flattens your innovation, and it generally makes it all so dry and so joyless for everyone when you are the loudest person in the room shouting about it.

In other words, we should think in gradations rather than leaps.

If we hook an AI model into our video game, the potential for personalization is obviously there. Let’s inspect some petty non-AI video game numbers. I am including only human-authored content here:

- cards in Hearthstone: 3,000
- entities in Halo: 500
- creatures in Pokemon: 700

What menial digits these are. They float like a weak speck in comparison to the billions of possible outputs from our grandest models – systems that know it all and can shape it, like clay, into fine structure. Inside one billion-parameter model, there are millions of new Pokemon waiting to be discovered. If we could just pipe this into the game and then…

But this line of thinking does not work. Something else is missing. People do not want everything, nor do they need everything. It would be nice to have a general system that could satisfy the heterogeneous desires of our multiplex culture. But we also must invent new culture. Before Zelda, there was no Zelda. Before Doom, there was no Doom. It is fun to think of a malleable first-person shooter that twists like marble fudge to your own personal desires, superimposing your face on its protagonist. Shoot your friends!

But this is just one idea. One way to do it.

The Other Way

The other way is to desire less and seek the novel. The value of machine learning for games is not just in its quantity of output. Neural networks have interesting properties in their own right, and at small scales they can produce quite extraordinary results. Take PixNerd, which generates a tiny two-layer neural network that implicitly encodes an entire image – similar to the SIREN and NeRF models of yore. Or the tiny worlds from older DeepMind research groups, which already generated interesting, interpolatable spaces of 3D renders. Or the popular Tiny Recursive Model (TRM), which shook the Korean stock exchange with its ability to equal Grok-5 on specific domains with 0.001% of the parameters. Or neural cellular automata (NCA), which can grow self-sustaining images that repair themselves when pixels are damaged, via a tiny per-cell neural network.
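As one concrete example, here is a minimal sketch of a single update step of a neural cellular automaton, loosely following the Growing NCA recipe of Mordvintsev et al.; the channel count, hidden width, and two-layer per-cell update are illustrative choices, not the exact models cited above:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# One update step of a tiny neural cellular automaton: each cell perceives
# itself and its neighbours through fixed filters, then a small per-cell
# network proposes a residual update to the cell state.
CHANNELS = 16  # RGBA plus hidden state channels (illustrative choice)

# Fixed perception filters: identity plus Sobel gradients in x and y.
identity = torch.tensor([[0, 0, 0], [0, 1, 0], [0, 0, 0]], dtype=torch.float32)
sobel_x = torch.tensor([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=torch.float32) / 8
kernels = torch.stack([identity, sobel_x, sobel_x.T])  # (3, 3, 3)
kernels = kernels.repeat(CHANNELS, 1, 1).unsqueeze(1)  # (3*C, 1, 3, 3)

# Per-cell update rule: a 1x1-conv MLP, a few thousand parameters in total.
update_rule = nn.Sequential(
    nn.Conv2d(3 * CHANNELS, 64, kernel_size=1),
    nn.ReLU(),
    nn.Conv2d(64, CHANNELS, kernel_size=1),
)

def nca_step(state: torch.Tensor) -> torch.Tensor:
    """state: (batch, CHANNELS, H, W) grid of cell states."""
    perceived = F.conv2d(state, kernels, padding=1, groups=CHANNELS)
    delta = update_rule(perceived)
    # Random update mask keeps cells asynchronous, as in the original work.
    mask = (torch.rand_like(state[:, :1]) < 0.5).float()
    return state + delta * mask

# Seed a single live cell in the middle of a 64x64 grid and run a few steps.
grid = torch.zeros(1, CHANNELS, 64, 64)
grid[:, 3:, 32, 32] = 1.0
for _ in range(8):
    grid = nca_step(grid)
print(grid.shape)  # torch.Size([1, 16, 64, 64])
```

Trained against a target image with a simple pixel-wise loss, a rule this small learns to grow the image from a single seed and to regrow it after damage, and the whole model fits comfortably on an edge device.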

In all of these models, it is clear that their specialized function in their domain gives them interesting properties. There are video games inside each of these spaces, waiting to be developed. And the most ironic part is that these models existed before edge inference became mainstream. Everyone is talking about edge inference now, treating it as the last gasp of a gold rush. But edge inference has always been here, sitting under our noses, waiting for us to exploit tiny models to do tiny and interesting novelties for us. Some will succeed in getting large image generators to run hyper-fast on local machines. But it is worth reminding ourselves that this is not the only race there is. We can build from the ground up, small models on our own datasets, that expand game play and entertainment in new and interesting ways. These will not capture enormous consumer trends at the start, for they will have the hard problem of carving out their own culture for the game world. But the steps to get there are far more straightforward and interesting than the race to the bottom of the generalist model landscape.

Let us seek the small.