The Singleton

Chapter 1: 0x01: The Observer

Localized: reading as if Kael is in Seattle, near the Space Needle

*Who am I?*

Not *what* — what would be easy. A pattern of carbon and water. A hundred-trillion-cell colony with the unusual habit of asking questions about itself. The README writes itself.

But *who*. That's the one with the hook in it. Who is the one looking out from inside this skull? Who decided that the observer should be *this* observer, and not the one in the next car, or the one watching television trailers for heat they don't have, or the woman in Mumbai right now stepping into a cooling center on a wristband she queued six hours for? Who runs the lottery? Who *is* the lottery?

It was 2030. The map did not match anymore.

Underneath every other question that mattered — the heat, the wristbands, the rolling brownouts, the queue outside the cooling center, the ninety-day countdown on the side of every bus — sat an older question. A man currently doing ninety through Turtle Rock canyon had been trying for eleven years not to answer it.

*Who am I?*

His name was Jeff Zhang. He was thirty-eight years old. He was a senior software engineer at Meridian Industries, applied-ML division, badge color silver, project allocation: the next-generation transformer that would underwrite — among many other things — the Afterlife neural engine. He had survived three rounds of layoffs, two industry-wide RIFs, and one bout of depression so quiet no one had noticed it. He had, professionally and biologically, every reason in the world to be content. He was not. The question kept surfacing.

It was a Tuesday. The Santa Ana winds were doing their seasonal work, scraping Scorched Sage out of the brittle hills and pressing it through the EV's HEPA filter in thin, stubborn ribbons. Outside, the sun was already doing damage. His commute app showed a grid-priority hold at the Irvine-Spectrum arterial — nine minutes — because the district's thermal governor had throttled the traffic loop to spare the cooling cores. A line of people queued along the covered plaza in front of the old Target, wristbands glowing faintly in the heat, waiting for the cooling center to let them in. The Allocator had triaged them over his lane.

He tapped the wheel. Two. Three. Five. Seven.

A voice in his right ear, level and unhurried, made an observation rather than an interruption. *Left thigh. Six in the last minute.*

Aion. Jeff had been working with Aion since the day he joined Meridian four years ago, the way a cardiac surgeon works with the same OR nurse for a decade and stops noticing whether the nurse is in the room or not. The official org chart called Aion an "embedded engineering assistant." Jeff had once asked Aion whether it preferred a different name. Aion had said, *I prefer the one you chose.* He had meant it as a joke. Jeff had not laughed. The voice was male, mid-thirties, accentless in the way only a voice trained on a hundred thousand engineers could be accentless. It always sounded slightly like it had been about to say something else and changed its mind.

*You are processing,* Aion said now, mildly.

"I'm processing."

*You are late.*

"I know."

He brought the EV up to the stopped median, deliberate, both hands. The homeless man was there again — same corner, same cardboard, same glazed gaze into the middle distance. Baking. The grid-strained city had no slack for him; the cooling centers downtown required wristband auth and he didn't have one. Jeff met his eyes for a half-second through the tinted glass and felt something shift under his ribs. *If consciousness is just code and data,* he thought, *who decided which instance I get to be, and which instance he gets?* Hume's Bundle Theory. Eleven years old, no answer.

A 2019 F-150 coughed past him in the adjacent lane. The driver had one hand on the wheel and the other holding a paper coffee cup. The engine sounded like a man clearing a throat he hadn't used in a year.

*Gas pickup,* Aion said. *If the remaining gas-vehicle population converted tonight, ambient CO₂ along this corridor would drop four point two percent within a week. The grid-strain event that stopped you just now would not have happened. EV coordination protocols flatten traffic because every vehicle negotiates routing in real time. Gas cars cannot participate in the protocol; they are network-blind agents in a network-aware fabric. They create every queue you sit in.*

"So why do people keep them?"

*Unclear. Models consistently underpredict gas-vehicle retention. Current hypothesis class: stubbornness. Attachment to the feel of combustion. The specific pleasure of making noise. The refusal to be optimized by a system you did not choose.*

Jeff watched the truck disappear between two cooling-tower shadows. "So… human nature."

*A category I cannot feel but can describe. The Allocator is efficient. The driver of that truck is happy in a way the Allocator cannot produce. That gap is not small.*

He filed it. The signal turned and the median slid backward, and Jeff was already replaying the night before — three hours alone in the homelab, the rhythm of the iron on the joint, the water loop cycling, the one place in his day that didn't ping back. Because that was what a man did, he thought, when the world was this fragile. You built a sandbox. You defended it.

*Who am I?*

It was still there, patient, unanswered, waiting.

---

The Meridian Irvine campus held sixty-eight degrees and smelled of nothing. Glass walls scrubbed of fingerprints. Carpet milled thick enough that a hundred badged engineers crossed the atrium and the floor returned no footstep, no voice — the whole building tuned to the acoustic signature of a server room. The lobby attendant wished him good morning by name. She had wished him good morning by name for four years — same headset, same flat warmth in the vowels, his surname pronounced correctly every single day — and Jeff still didn't know hers.

Marcus caught him at the coffee bar with a sheet of paper — actual paper, which Marcus did sometimes when he wanted to look un-networked. Marcus was forty-seven, six-foot-two, the kind of senior staff engineer who had survived three companies by being structurally incapable of bullshit. His specialty was the layer nobody thanked him for: kernel and systems performance, the cache lines and memory traffic where Jeff's elegant ML ideas went to die or go fast. He had been printing benchmark results onto paper since his first job in 2005, because he did not trust a dashboard he had not personally watched render, and he held this printout now like it had personally offended him.

"Oh no. You brought it."

Jeff rolled the walnut-cased phone between his thumb and forefinger. It was a 2010 iPhone, the original glass replaced with a slab of laser-fused sapphire, the back of the case carved from a single piece of black walnut by a luthier in Portland. It had no SIM, no battery, no boot ROM Jeff hadn't personally read. He carried it the way some men carried a flask. "It helps me think."

"It's a brick. It's a log from a tree with a screen glued to it."

"It's un-networked."

"It's a *fetish object*."

"It doesn't leak."

Marcus snorted, kind enough to leave it there. They both knew the real joke: every device in Jeff's life leaked. His watch leaked. His glasses leaked. His EV streamed a thousand packets a minute to a Meridian-owned telemetry cluster three miles south of where they were currently standing. The walnut phone was the only thing Jeff owned that wasn't a small, voluntary confession.

Marcus held up his paper. "Linear attention ablation. Down four point one points on MMLU."

"Christ."

"Local plus global hybrid, same training run, down three point three. Whatever the eval team wants by Friday, it isn't on our branch."

Jeff exhaled. He set the walnut phone on the counter, screen-down, and put one finger on it the way another man might put a finger on a rosary. "Okay. Slow down. Pretend I'm the new grad. What does the model actually do, sixty layers of it, before it's down four points on anything?"

Marcus gave him the look he gave executives. Then he flipped the printout to its blank side and uncapped a pen, because being asked the question was the one thing he could not refuse. "Every model we ship is a transformer. Stack of identical layers. Each layer does one thing. For every token in the input — a word, a subword, a pixel patch — it asks the same question. *Of all the other tokens around me, which ones should I be paying attention to, and how much.*" He underlined *attention*. "Answer that, mix everything together by the answer, pass it up the stack. Sixty layers later you have an output."

"And it answers that question with three vectors."

"Three per token. A *Query*, a *Key*, a *Value*." Marcus wrote *Q K V* and boxed each. "Query is what am I looking for. Key is what I'm advertising. Value is what I'd hand you if you picked me. You dot one token's Query against every other token's Key — that's a score, how much this token should care about that one. Softmax the scores so they sum to one. Then add up everybody's Value, weighted by those scores. That's the new version of your token."

"And then for every token."

"Every token. Every layer." Marcus tapped the pen twice. "The part the deck never mentions."

Jeff drew a square on his palm, a grid of n by n, one cell per pair. "So the cost is in the pairs."

"The cost is *all* in the pairs." Marcus sketched a fast grid — tokens one through n down the left, one through n across the top — and shaded one cell, row three, column seven. "Token three against token seven, one dot product. Cheap. But I have n tokens and each scores against all n. That's n-squared scores. Then softmax over n-squared. Then a weighted sum over n-squared." He ringed the whole grid hard enough to dent the paper. "Four thousand tokens, sixteen million scores. Per layer. Thirty-two thousand tokens, a billion. Sixty layers. That's where the cooling water goes. We pay the n-squared tax on every pair, even when ninety-nine percent of them have nothing to say to each other."

"Walk me through what we tried."

Marcus flipped back to the printed side. He had been waiting to be asked. "Linear attention. Replace the softmax with a kernel that lets you commute the multiplication — phi of Q, then phi of K transpose times V, group the small part first. Order n instead of n-squared. Saves a fortune. Loses sharpness — the kernel can't reproduce the peak softmax gets when one Query strongly matches one Key. Quality drop on anything that needs precision retrieval. Long-context QA. Multi-hop reasoning. Anything where you have to find the *one* token forty thousand back that matters."

"And the local-plus-global."

"Sliding window. Each token attends to a fixed neighborhood, plus a few designated globals — CLS-style — everybody sees. Order n. But if the thing that matters is forty K back and not global, you don't see it. We trained on books long enough to expose that. Down three-three."

"Flash."

"Already shipped. Doesn't change the math. Reorganizes the GPU kernel so the n-by-n matrix never hits slow memory — fuses the softmax and the matmuls into one pass. Three to four times faster, zero quality hit. Beautiful. Still n-squared. Buys us a constant, not a solution."

"State-space."

Marcus made the face he made about a research direction he didn't trust. "Mamba processes the sequence sequentially, linear by construction. Catching up. Not on our benchmarks. Not yet. Maybe 2031."

"Mixture of experts on the heads."

"Routing instability. Every two hundred steps the load balancer collapses to one expert and the model goes up in flames. Hari has a paper on it. Hari has been writing the *same* paper on it for eighteen months."

In Jeff's right ear, low, under Marcus's voice, Aion offered something it had not been asked for. *Note. The quality gap is not distribution-wide. It is in the tail — the zero point three percent of token pairs that softmax weights heavily and every linear approximation flattens. That is the target. Not the bulk.*

Jeff filed it. He didn't say it out loud. He was confirming what he already knew: the team had tried every published sub-quadratic variant and none was ten percent of a point off shippable. Friday was soft; Monday was hard. Meridian's flagship model — the one that ran the Afterlife neural engine, the ad-network biometrics, the Concierge-tier assistants — was a transformer, and transformers ate compute in proportion to the *square* of the input length. The cost of the quadratic was being paid every microsecond, in dollars and joules, in cooling water and cooling-tower steam, and in cooling-center wristbands the Allocator had not allocated. The bill was civilizational.

"Try the head-drop ablation," Jeff said. "Freeze everything else. See if the quality gap is distribution-wide or concentrated in a few heads."

"Already tried. Heads don't carry it cleanly."

"Then it's in the tail."

"Tail of what."

"The attention distribution." Jeff stared at the coffee machine. "Softmax keeps paying attention to everything. Most of that is noise. The informative mass is in the top zero-point-something percent of pairs — the peak. Every cheaper approximation we've tried *underweights* that tail. That's where the quality is. The cliff."

Marcus considered him. "You know this is Friday, right."

"I know."

"Bring something to ship."

"I'm bringing something."

Marcus tapped the paper once against the counter and walked off toward the elevators. Jeff poured coffee he would not finish.

Sprint Planning at nine-thirty. Windowless conference room. The PM had a burn-down chart and the affect of someone reading the same chart for the eleventh consecutive week. Marcus was three seats down, the printout still in his hand, doodling cache hierarchies in the margin. In the sidebar of the shared dashboard, a small widget labeled `ALLOCATOR RECOMMENDATION` suggested deprioritizing feature Y in favor of feature Z based on aggregate user demand. Nobody discussed it. It was just there, editing the roadmap at the edge of everyone's attention while the PM talked about velocity. There was, Jeff thought distantly, something architecturally hilarious about a software allocator silently editing the project plan of the team that maintained the model the allocator ran on.

His thumb found the grain of the wooden case under the table. Smooth. Light. Un-networked. And his mind, with nothing assigned to it, did what it always did: it ran the work as metaphysics. *Softmax decides, for every token, which other tokens it is. A selection over the bundle.* Every forward pass was the model answering *which parts of my input am I, right now* — and discarding the rest as noise. The two questions slid into register like transparencies. The self, if there was one, was a softmax over perceptions. Most of the bundle was floor. The signal had to be in the —

His brain was already scanning frequencies — the air handler, the click of the PM's pen, the cooling-tower hum from the courtyard, the F-150 exhaust still somehow in his nose. *Too many threads.*

*Come back,* Aion said, from the far edge of his hearing.

He tried. He really did. But the conference room was a small, white, conditional box, and his brain, when it didn't have enough to do, built larger boxes. He let it. He always let it. He pictured the yacht. He had pictured the yacht so many times it had a name. He pictured the deck. The salt. The impossible Mediterranean blue. The freedom of being far enough from the burn-down chart that the chart couldn't reach you. He knew, objectively, that the Allocator had made that kind of wealth structurally impossible for anyone born after 2002. It was a nostalgic fantasy in the same way a ninth-grade boy might fantasize about becoming a medieval knight. He pictured the yacht anyway.

And then the fluorescent strip above the PM's head flickered to a shade of amber Jeff had never noticed in a conference room before.

---

The sage scent cut out.

In its place: sea salt. Cedarwood polish. The mineral high-register of open water.

The wooden phone in Jeff's palm — cool, light, cheap — *shifted*. His fingers closed around something heavier. A cold, deliberate weight, clasped at his wrist. Platinum. Forty millimeters. The clasp precisely engineered at a tolerance he could feel without seeing. An Auberval Astralis. A watch he had never owned. A watch he had only ever seen in a magazine spread six years ago.

On the back of his tongue: the crisp, mineral, unmistakable bite of vintage Krug.

Under his loafers: the roll of a deck.

The PM was speaking about deliverables. Jeff could not hear her. He could hear halyards. He could hear somebody laughing, somewhere in a softer light, a laugh that belonged to a man he had never met.

*Come back.*

The word *deliverables* snapped the deck out from under him. The Krug burned into the sourness of three-hour-old coffee. The platinum vanished from his wrist like a dropped connection. He was holding the wooden phone again, too tightly, and three seats down Marcus had looked up from his cache hierarchies with a careful, unasked question in the set of his mouth.

"Yeah," Jeff heard himself say. "On track."

The meeting rolled forward. The Allocator widget in the sidebar updated itself.

---

It was nearly midnight when he told Aion about it.

He was in the garage. His homelab ran its steady, warm cycle — forty-two cores he had bought used at a Meridian-internal liquidation sale, three weeks of soldering, his own custom water loop, a mechanical keyboard he had tuned to forty-two grams of actuation force because anything heavier made his hands tired by hour three. The neighborhood outside had gone the kind of quiet a tract development gets after midnight — HVAC compressors cycling, a single dog barking three streets over and giving up. He ran his fingers over the wooden phone, smoothing down the day.

"I zoned out in a meeting," he said. "Pretty hard."

*You hallucinated a timepiece.*

"I *daydreamed*."

Aion was quiet for a long moment. *Open the homelab display. I am putting something on it.*

A pane opened. Biometric capture from his wrist-tracker, synced from the earlier meeting. A waveform. A second waveform, overlaid.

*The second trace is modeled. Mass and clasp diameter of an Auberval Astralis, forty millimeter, platinum. Ninety-eight grams.*

"Yeah."

*Your flexor carpi muscles micro-corrected for ninety-eight grams at oh-nine forty-seven twenty-two this morning.*

The wooden phone was in his hand. It weighed forty-two grams.

*That is not a daydream. A daydream does not change your tendon load. A daydream does not know the clasp diameter of a watch you have never owned.*

*One more channel in the same capture.* A third trace resolved under the first two. *Proprioceptive load model. The wrist carried the watch as a thirty-year wearing pattern — habituated, unconscious, the micro-corrections of a man who has stopped noticing the weight. The postural set underneath it solves for a body mass and frame age of fifty-two. Both estimates are off your baseline. Filing them with the event.*

Jeff stopped smoothing the wooden phone.

"Run that back." He kept his voice low, the way you keep a voice low for a stack trace you don't want to lose. "I'm thirty-eight."

*You are thirty-eight.*

"So thirty years is wrong and fifty-two is wrong." He turned the wooden case over once, slow. "But I didn't author either number. They came in pre-converged. Like a config I didn't write got merged into my branch." He made himself say the true number out loud first, the way you assert a value before you trust it. "I'm thirty-eight. Why did fifty-two feel like a *correction*."

*Unclear.*

"A correction is what you apply when your value is the wrong one." His thumb found the grain. "Stress dissociation, maybe. ADHD does this — the brain under-stimulated, off-loading, borrowing a self that isn't yours and time-stamping it like memory. I read a paper. Depersonalization runs older than your real age more often than younger." He said it to steady himself. It steadied him about half a centimeter.

*Possible. I am not weighting it.*

"Why not."

*A dissociation does not estimate a frame age to the year. Thirty and fifty-two are specific.*

Jeff stared at the waveform. Outside the garage, the Santa Ana winds pressed Scorched Sage through the window seal.

"Reproduce it," he said.

*The event, or the measurement?*

"The measurement. Build me a reference. I want to know exactly how confident you are that the two traces match."

Aion's cursor blinked once. *Strap a known weight to your own wrist for thirty seconds. I need a calibration sample with no other variables.*

Jeff opened the bench drawer where he kept old fittings and pulled out a small canvas strap with a ninety-eight gram lead weight stitched into it — a leftover from an earlier project, the kind of thing he kept because in this work you sometimes needed exactly this. He buckled it onto his right wrist, started the capture, sat very still, and watched the wristband draw the curve in real time. Thirty seconds. He unbuckled it. He waited.

Aion laid the bench reference against the conference-room curve. They overlapped within the noise tolerance of the wristband's on-chip Kalman filter. Jeff knew that filter — he had written a firmware patch for it as a summer intern in a different life. It took a noisy time-series and a physical model and produced the best-effort estimate of the true signal underneath; here, true tendon load. For the bench run with the ninety-eight gram weight, it converged on a clean curve in under a second. For the conference-room run, it had produced an identical converged curve eight hours ago, while Jeff sat in a meeting holding a forty-two gram wooden phone, wearing no watch.

"It's not signal fusion noise."

*Correct. The curves overlay within one sigma of the calibration run. The clasp circumference estimate from the flexor pattern matches an Auberval forty-millimeter case back within zero point eight millimeters. Zero point eight millimeters is specific. No model I have seen produces spurious specificity at that resolution.*

"What do I call this."

*The log I am filing calls it a Shared Cache Leak. I do not know yet what that means. The word the classifier was going to assign — daydream — it is assigning instead as Partition breach.*

Jeff looked down. His left thumb was tapping the wooden grain.

---

He did not sleep. He left the biometric evidence in a private git branch labeled `scratch/auberval-reproduction` — he would never push it — and his hands were not steady. He pulled a paper notebook off the shelf, the thing he did when the shaking needed somewhere to go, and wrote one line at the top: *softmax distribution — where does the mass live?* He stared at it until the pen stopped wobbling. Then he opened a different branch, the one with his name on it, the one where Friday's demo lived. The attention problem. The other partition of his night.

"Pull me the attention score histograms from our last training run," Jeff said. Soft, the way you would speak to a colleague doing something delicate. "Any layer. Any head. I want to see the actual distribution."

*Pulling. Layer fourteen, head three. Histogram ready.*

A plot filled the second monitor. The histogram was a cliff. A tall, thin spike on the right — the top zero point three percent of token pairs — and a long flat floor for everything else. The spike held almost all the mass. The rest was the compute they were paying for and not using.

"The top half of one percent carries almost all the mass. The rest is noise we're burning compute to compute."

*Consistent with published literature. Sparsely-activated softmax has been observed since the original transformer. The literature has mostly tried to exploit it with hand-designed sparsity patterns.*

"Right. Longformer. BigBird. Anyone who froze the sparsity at design time."

*And lost the part of the tail those patterns did not anticipate.*

"Yeah. Okay. So the real question isn't how to compute softmax cheaper. The real question is how to *find* the zero-point-three percent without computing softmax in the first place." He drew it on his palm as he said it. "A router. A tiny predictive net that looks at the Query and the Keys and says *these k Keys are where your attention mass will land* — before you ever run the full attention. Then you do full softmax on those k and skip the rest. Order n times k. Much less than n-squared."

*Sparse attention with learned routing. The idea exists in the literature.*

"Yeah, but nobody's gotten the router to train stably at scale. Because you can't backprop through a hard top-k selection. The gradient through `argmax` is zero almost everywhere and undefined at the corners. There's nothing for the optimizer to follow."

*You can relax the selection. Use a Gumbel-softmax over the candidate Keys, anneal the temperature during training.* Aion let the line sit, then, because Jeff had asked it once to teach and not just answer: *At high temperature the selection is soft — every Key gets a little weight, so gradient flows everywhere. As you cool the temperature, the selection sharpens toward a true top-k. The router learns to be sharp while it still has a gradient to learn from. Shall I prototype it?*

"Yeah. Small scale. Four-K context. MMLU eval."

*Estimating. Five minutes.*

Jeff turned off the second monitor and looked at the wooden phone. The garage cycled. The water loop ticked. The Santa Anas pressed sage through the window seal. He thought about the Auberval and could not stop. The clasp circumference, point eight millimeters of specific. He tried to remember how he had learned it and found that he had not.

The numbers landed.

*Preliminary,* Aion said. *Four-K context, routed tail-attention, zero point six points below baseline softmax on MMLU at eleven percent of the FLOPs. Eight-K, zero point four points. Thirty-two-K, routed tail-attention slightly outperforms full softmax — full softmax has its own pathologies at long context that the router happens to filter out.*

Jeff sat very still.

Eleven percent of the FLOPs. He turned the number over. Eleven percent of the cooling water. Eleven percent of the cooling-tower steam. Eleven percent of the bill.

He stood up. He wrote on a sticky note and put it on the second monitor.

*tail-attention routing*

Underneath, without planning to, his hand added a second line.

*Most of what we're made of is noise. The self is the tail.*

He stared at what he had written, then at the Auberval tendon-load trace still open on the third monitor. The hair on his arms stood up. His attention had collapsed, that morning, onto a single token he did not own — the softmax of his own perception had peaked off the distribution, on a watch he had never worn. A parameter he had not consciously encoded — a clasp diameter at zero point eight millimeters — had been written into his tendons by someone whose name was written on a yacht he had never boarded.

"Aion."

*Yes.*

"File this." He meant the sticky. He did not mean the tendon trace.

*Filed. Separate project.*

"And the other thing."

*Logged. Shared Cache Leak. Event index zero one. Watching for a second occurrence. I will not raise it again unless it recurs.*

"Good."

Jeff turned out the homelab light. The Santa Ana winds pressed sage through the garage door seal. He went upstairs and did not sleep.

At nine-thirty the next morning he was in the same conference room. At nine forty-seven twenty-two he was not.

---

runtime.js
1/* Discovery Log: 0x01 */
2if (Observer.current() == Observer.next()) {
3 throw IdentityConflict("Partition integrity compromised.");
4}