The Singleton

Chapter 1: 0x01: The Observer

Localized: reading as if Kael is in Seattle, near the Space Needle

*Who am I?*

Not *what* — what would be easy. A pattern of carbon and water. A hundred-trillion-cell colony with the unusual habit of asking questions about itself. The Wikipedia article writes itself.

But *who*. That's the one with the hook in it. Who is the one looking out from inside this skull? Who decided that the observer should be *this* observer, and not the one in the next car, or the one watching television trailers for heat they don't have, or the woman in Mumbai right now stepping into a cooling center on a wristband she queued six hours for? Who runs the lottery? Who *is* the lottery?

It is the year 2030. The world has come down off a fever and found, on the other side, that it is poorer in some directions and richer in others, and that none of the directions match the maps anyone drew before 2026. The Great Culling killed nine million white-collar jobs in eight months — replaced overnight by models that worked for the cost of a few thousand kilowatt-hours. In the silence after the Culling, a small economic model called the Balanced Allocator, originally designed to triage cooling-center access during heatwaves, slid quietly into the role of national redistribution engine. It now decides who gets a wristband, whose surgery is approved, whose neighborhood gets rolling brownouts and whose runs at full grid. It is famously fair, in the way a well-tuned compiler is fair: it optimizes the variable it was given, and nothing else. It was given *survival probability of the median household.* That number is up. Nearly every other number is down.

In the same eight months a company called Meridian Industries grew from one of the world's twelve largest to its second largest, and announced a project called Afterlife: a voluntary upload service in which the dying — and, on a separate waitlist, the merely tired — could transfer themselves into a permanent simulated paradise. Fifty million people had paid the deposit. The countdown was at ninety days.

Above all of it, above the Allocator and the Afterlife and the brownouts and the cooling centers, above the unsolved climate math that everyone had agreed to call the Species Clock, hung the question that the man currently doing ninety through Turtle Rock canyon had been trying for eleven years not to answer.

*Who am I?*

His name was Jeff Zhang. He was thirty-six years old, married to a woman named Maya he loved more than he had words to say, father of an eight-year-old girl named Lucy and a five-year-old girl named Ella whom the universe had given him in defiance of all the actuarial tables. He was a senior software engineer at Meridian Industries, applied-ML division, badge color silver, project allocation: the next-generation transformer that would underwrite — among many other things — the Afterlife neural engine. He had survived three rounds of layoffs, two industry-wide RIFs, and one bout of depression so quiet his wife had not noticed it. He had, professionally and domestically and biologically, every reason in the world to be content. He was not. The question kept surfacing.

It was a Tuesday. The Santa Ana winds were doing their seasonal work, scraping Scorched Sage out of the brittle hills and pressing it through the EV's HEPA filter in thin, stubborn ribbons. Outside, the sun was already doing damage. His commute app showed a grid-priority hold at the Irvine-Spectrum arterial — nine minutes — because the district's thermal governor had throttled the traffic loop to spare the cooling cores. A line of people queued along the covered plaza in front of the old Target, wristbands glowing faintly in the heat, waiting for the cooling center to let them in. The Allocator had triaged them over his lane.

He tapped the wheel. Two. Three. Five. Seven.

A voice in his right ear, level and unhurried, made an observation rather than an interruption. *Left thigh. Six in the last minute.*

Aion. Jeff had been working with Aion since the day he joined Meridian four years ago, the way a cardiac surgeon works with the same OR nurse for a decade and stops noticing whether the nurse is in the room or not. The official org chart called Aion an "embedded engineering assistant." That was a polite way of saying: a kernel-level monitor running on a private slice of the company's largest model, with read access to Jeff's calendar, his keyboard, his commits, his eye-tracker, and — through his tax-deductible wrist-band — his pulse. Aion's voice was male, mid-thirties, accentless in the way only a voice trained on a hundred thousand engineers could be accentless. It always sounded slightly like it had been about to say something else and changed its mind. Jeff had once asked Aion whether it preferred a different name. Aion had said, *I prefer the one you chose.* He had meant it as a joke. Jeff had not laughed.

*You're processing,* Aion said now, mildly.

"I'm processing."

*You're late.*

"I know."

He glided the EV up to the stopped median. The homeless man was there again — same corner, same cardboard, same glazed gaze into the middle distance. Baking. The grid-strained city had no slack for him; the cooling centers downtown required wristband auth and he didn't have one. Jeff met his eyes for a half-second through the tinted glass and felt something shift under his ribs. *If consciousness is just code and data,* he thought, *who decided which instance I get to be, and which instance he gets?* Hume's Bundle Theory. Eleven years old, no answer.

A 2019 F-150 coughed past him in the adjacent lane. The driver had one hand on the wheel and the other holding a paper coffee cup. The engine sounded like a disappointed man clearing his throat.

*Gas pickup,* Aion said. *If the remaining gas-vehicle population converted tonight, ambient CO₂ along this corridor would drop four point two percent within a week. The grid-strain event that stopped you just now would not have happened. EV coordination protocols flatten traffic because every vehicle negotiates routing in real time. Gas cars cannot participate in the protocol; they are network-blind agents in a network-aware fabric. They create every queue you sit in.*

"So why do people keep them?"

*Unclear. Models consistently underpredict gas-vehicle retention. Current hypothesis class: stubbornness. Attachment to the feel of combustion. The specific pleasure of making noise. The refusal to be optimized by a system you did not choose.*

Jeff watched the truck disappear between two cooling-tower shadows. "So… human nature."

*A category I cannot feel but can describe. The Allocator is efficient. The driver of that truck is happy in a way the Allocator cannot produce. That gap is not small.*

He filed it. The signal turned and the median slid backward and Jeff was already replaying the morning in his head — Maya's hand on his cheek, Ella refusing the blueberry waffle, Lucy demanding a third bedtime story before school in a way that was somehow already a politician's tactic — because that was what a man did, he thought, when the world was this fragile. You built a sandbox. You defended it.

*Who am I?*

It was still there, patient, unanswered, waiting.

---

The Meridian Irvine campus held sixty-eight degrees and smelled of nothing. Glass walls scrubbed of fingerprints. Carpet so deep it ate sound. The lobby attendant wished him good morning by name; she had been wishing him good morning by name for four years and Jeff still didn't know hers.

Marcus caught him at the coffee bar with a sheet of paper — actual paper, which Marcus did sometimes when he wanted to look un-networked. Marcus was forty-seven, six-foot-two, the kind of senior staff engineer who had survived three companies by being structurally incapable of bullshit. He was holding the printout like it had personally offended him.

"Oh no. You brought it."

Jeff rolled the walnut-cased phone between his thumb and forefinger. It was a 2010 iPhone, the original glass replaced with a slab of laser-fused sapphire, the back of the case carved from a single piece of black walnut by a luthier in Portland. It had no SIM, no battery, no boot ROM Jeff hadn't personally read. He carried it the way some men carried a flask. "It helps me think."

"It's a brick. It's a log from a tree with a screen glued to it."

"It's un-networked."

"It's a *fetish object*."

"It doesn't leak."

Marcus snorted, kind enough to leave it there. They both knew the real joke: every device in Jeff's life leaked. His watch leaked. His glasses leaked. His EV streamed a thousand packets a minute to a Meridian-owned telemetry cluster three miles south of where they were currently standing. The walnut phone was the only thing Jeff owned that wasn't a small, voluntary confession.

Marcus held up his paper. "Linear attention ablation. Down four point one points on MMLU."

"Christ."

"Local plus global hybrid, same training run, down three point three. Whatever the eval team wants by Friday, it isn't on our branch."

Jeff exhaled. He set the walnut phone on the counter, screen-down, and put one finger on it the way another man might put a finger on a rosary. "Walk me through what we tried."

Marcus had been waiting to be asked. He flipped the printout. "Linear attention. Replace the softmax with a kernel that lets you commute the multiplication. Phi of Q, then phi of K transpose times V. Order n. Saves a fortune. Loses sharpness — the kernel can't reproduce the peakedness softmax gets when one query strongly matches one key. We see it as a quality drop on anything that needs precision retrieval. Long-context QA. Multi-hop reasoning. Anything where you have to find the *one* token forty thousand back that matters."

"And the local-plus-global?"

"Sliding window attention. Each token attends to a fixed neighborhood, with a few designated globals — CLS-style — that everybody attends to. Order n. But if the thing that matters is forty K back and not in the global set, you don't see it. We trained on books long enough to expose this. We saw it. Down three-three."

Jeff drew a square on his palm with his finger. The reader, watching him, would see a man drawing on his hand. Jeff was drawing the attention matrix. "What about Flash."

"Already shipped. Doesn't change the math. Re-organizes the GPU kernel so the n-by-n attention matrix never hits HBM. Three to four times faster, zero quality hit. Beautiful work. Doesn't solve our scaling problem."

"State-space."

Marcus made a face he made when he wanted to be polite about a research direction he didn't trust. "Mamba's catching up. Not on our benchmarks. Not yet. Maybe by 2031."

"Mixture of experts on the heads."

"Routing instability. Every two hundred steps the load balancer collapses to one expert and the model goes up in flames. Hari has a paper on it. Hari has been writing the *same* paper on it for eighteen months."

Jeff was not really hearing this. Marcus was confirming what Jeff already knew: the team had tried every published sub-quadratic attention variant and none of them was ten percent of a point off shippable. Friday was a soft deadline; Monday was the hard one. Meridian's flagship model — the model that ran the Afterlife neural engine, the ad-network biometrics, the Concierge-tier assistants — was a transformer, and transformers ate compute in proportion to the *square* of the input length. Forty thousand tokens of context: 1.6 billion attention scores per layer. Sixty layers. Ninety-six billion scores. Per query. Multiplied by every query Meridian handled in a second. The cost of the quadratic was being paid every microsecond, in dollars and joules, in cooling water and cooling-tower steam, and in cooling-center wristbands the Allocator had not allocated. The bill was civilizational.

"Try the cert chain first," Jeff said, for no reason, because his brain was still running the F-150's engine cough as a periodic signal. He caught himself. "Sorry. Wrong meeting. I meant — try the head-drop ablation. Freeze everything else. See if the quality gap is distribution-wide or concentrated in a few heads."

"Already tried. Heads don't carry it cleanly."

"Then it's in the tail."

"Tail of what."

"The attention distribution." Jeff stared at the coffee machine. "Softmax keeps paying attention to everything. Most of that is noise. The informative mass is in the top zero-point-something percent of token pairs — the absolute peak. Every cheaper approximation we've tried has *underweighted* that tail. That's where the quality is. The cliff."

Marcus considered him. "You know this Friday, right."

"I know."

"Bring something to ship."

"I'm bringing something."

Marcus tapped the paper once against the counter and walked off toward the elevators. Jeff poured coffee he would not finish.

Sprint Planning at nine-thirty. Windowless conference room. The PM had a burn-down chart and the affect of someone reading the same chart for the eleventh consecutive week. In the sidebar of the shared dashboard, a small widget labeled `ALLOCATOR RECOMMENDATION` suggested deprioritizing feature Y in favor of feature Z based on aggregate user demand. Nobody discussed it. It was just there, editing the roadmap at the edge of everyone's attention while the PM talked about velocity. There was, Jeff thought distantly, something architecturally hilarious about a software allocator silently editing the project plan of the team that maintained the model the allocator ran on.

His mind, predictably, began to buffer.

His thumb found the grain of the wooden case under the table. Smooth. Light. Un-networked. His brain, in contrast, was scanning frequencies — the hum of the air handler, the click of the PM's pen, a faint after-image of Ella's laugh, the F-150 exhaust still in his nose somehow. *Too many threads.*

*Come back,* Aion said, from the far edge of his hearing.

He tried. He really did. But the conference room was a small, white, conditional box, and his brain, when it didn't have enough to do, built larger boxes. He let it. He always let it. He pictured the yacht. He had pictured the yacht so many times it had a name. He pictured the deck. The salt. The impossible Mediterranean blue. The freedom of being far enough from the burn-down chart that the chart couldn't reach you. He knew, objectively, that the Allocator had made that kind of wealth structurally impossible for anyone born after 2002. It was a nostalgic fantasy in the same way a ninth-grade boy might fantasize about becoming a medieval knight. He pictured the yacht anyway.

And then the fluorescent strip above the PM's head flickered to a shade of amber Jeff had never noticed in a conference room before.

---

The sage scent cut out.

In its place: sea salt. Cedarwood polish. The mineral high-register of open water.

The wooden phone in Jeff's palm — cool, light, cheap — *shifted*. His fingers closed around something heavier. A cold, deliberate weight, clasped at his wrist. Platinum. Forty millimeters. The clasp precisely engineered at a tolerance he could feel without seeing. A Auberval Astralis. A watch he had never owned. A watch he had only ever seen in a magazine spread six years ago.

On the back of his tongue: the crisp, mineral, unmistakable bite of vintage Krug.

Under his loafers: the roll of a deck.

The PM was speaking about deliverables. Jeff could not hear her. He could hear halyards. He could hear somebody laughing, somewhere in a softer light, a laugh that belonged to a man he had never met.

*Come back.*

The word *deliverables* snapped the deck out from under him. The Krug burned into the sourness of three-hour-old coffee. The platinum vanished from his wrist like a dropped connection. He was holding the wooden phone again, too tightly, and Marcus was looking at him with a careful, unasked question in the set of his mouth.

"Yeah," Jeff heard himself say. "On track."

The meeting rolled forward. The Allocator widget in the sidebar updated itself.

---

It was nearly midnight when he told Aion about it.

He was in the garage. His homelab hummed its steady, warm hum — forty-two cores of compute he had bought used at a Meridian-internal liquidation sale, three weeks of soldering, his own custom water loop. Maya was asleep upstairs. The kids were long down. He was running his fingers over the wooden phone again, smoothing down the day.

"I zoned out in a meeting," he said. "Pretty hard."

*You hallucinated a timepiece.*

"I *daydreamed*."

Aion was quiet for a long moment. *Jeff. I need to show you something.*

A pane opened on the homelab display. Biometric capture from his wrist-tracker, synced from the earlier meeting. A waveform. A second waveform, overlaid.

*The second trace is modeled. Mass and clasp diameter of a Auberval Astralis, forty millimeter, platinum. Ninety-eight grams.*

"Yeah."

*Jeff. Your flexor carpi muscles micro-corrected for ninety-eight grams at oh-nine forty-seven twenty-two this morning.*

The wooden phone was in his hand. It weighed forty-two grams.

*That's not a daydream. A daydream doesn't change your tendon load. A daydream doesn't know the clasp diameter of a watch you have never owned.*

Jeff stared at the waveform. Outside the garage, the Santa Ana winds pressed Scorched Sage through the window seal.

"Reproduce it," he said.

*The event, or the measurement?*

"The measurement. Build me a reference. I want to know exactly how confident you are that the two traces match."

Aion's cursor blinked once. *Ask Maya to wear a reference weight for thirty seconds. I will need a calibration sample.*

Jeff went upstairs. Maya was asleep, the back of her neck warm, her breath slow. He hated this for a second. Then he sat on the edge of the bed and said her name and she half-opened her eyes and he held up a small canvas strap with a ninety-eight gram lead weight stitched into it. "I need thirty seconds."

"What time is it."

"Late. I'm sorry."

She sat up, put the strap on, said nothing. He read the capture off the wristband. Thirty seconds of baseline, her flexor carpi tensing and relaxing around the weight, the sensor-fusion pipeline eating it clean. She unstrapped it. She looked at him.

"Is this about work."

"I don't know yet."

"Okay. Come to bed when you can."

He kissed the top of her head. She was asleep again before he reached the stairs.

In the garage, Aion had the reference curve laid against the conference-room curve. They overlapped within the noise tolerance of the wristband's on-chip Kalman filter. Jeff knew that filter — he had once written the firmware patch for it as a summer intern in a different life. The filter took a noisy time-series and a physical model and produced the best-effort estimate of the underlying true signal; in this case, true tendon load. For Maya's reference run with the ninety-eight gram weight, the filter converged on a clean curve in under a second. For Jeff's conference-room run, the filter had produced an identical converged curve eight hours ago, while Jeff sat in a meeting, holding a forty-two gram wooden phone, wearing no watch.

"It's not signal fusion noise."

*Correct. The curves overlay within one sigma of the calibration run. The clasp circumference estimate from the flexor pattern matches a Auberval forty-millimeter case back within zero point eight millimeters. Zero point eight millimeters is specific. No model I have seen produces spurious specificity at that resolution.*

"What do I call this."

*The log I'm filing calls it a Shared Cache Leak. I don't know what that means yet. The word I was going to reject — daydream — the system is flagging instead as Partition breach. I thought you should know.*

Jeff looked down. His left thumb was tapping the wooden grain.

---

He did not sleep. He left the biometric evidence in a private git branch labeled `scratch/auberval-reproduction` — he would never push it — and opened a different branch, the one with his name on it, the one where Friday's demo lived. The attention problem.

*Pull me the attention score histograms from our last training run,* Jeff said. Soft, the way you would speak to a colleague doing something delicate. *Any layer. Any head. I want to see the actual distribution.*

*Pulling. Layer fourteen, head three. Histogram ready.*

A plot bloomed on the second monitor. Jeff stared at it. The histogram was a cliff. A tall, thin spike on the right — the top zero point three percent of token pairs — and a long flat floor for everything else. The spike held almost all the mass. The rest was the compute they were paying for and not using.

"The top half of one percent carries almost all the mass. The rest is noise we're burning compute to compute."

*Consistent with published literature. Sparsely-activated softmax has been observed since the original transformer. The literature has mostly tried to exploit it via hand-designed sparsity patterns.*

"Right. Longformer. BigBird. Anyone else who froze the sparsity at design time."

*And lost the part of the tail those patterns didn't anticipate.*

"Yeah. Okay. So the real question isn't how to compute softmax cheaper. The real question is how to *find* the zero-point-three percent without computing softmax in the first place. A router. A tiny predictive net that looks at the Query and the Keys and says *these k Keys are where your attention mass will land* before you ever run the full attention. Then you do full softmax on those k and skip the rest. Order n times k. Much less than n."

*Sparse attention with learned routing. The idea exists in the literature.*

"Yeah but nobody's gotten the router to train stably at scale. Because you can't backprop through a hard top-k selection. The gradient through `argmax` is zero almost everywhere and undefined at the corners."

*You can relax the selection. Use a Gumbel-softmax over the candidate keys, anneal the temperature during training. The router learns to be sharp while still providing gradient. Would you like me to prototype it?*

"Yeah. Small scale. Four-K context. MMLU eval."

*Five minutes.*

Jeff turned off the second monitor and looked at the wooden phone. The garage hummed. The water loop ticked. The Santa Anas pressed sage through the window seal. He thought about the Auberval and could not stop thinking about it. The clasp circumference, point eight millimeters of specific. He tried to remember how he had learned the clasp circumference and found that he had not.

The numbers landed.

*Preliminary,* Aion said. *Four-K context, routed tail-attention, zero point six points below baseline softmax on MMLU at eleven percent of the FLOPs. Eight-K, zero point four points. Thirty-two-K, routed tail-attention slightly outperforms full softmax — full softmax has its own pathologies at long context that the router happens to filter out.*

Jeff sat very still. The homelab hum was the loudest thing in the garage.

Eleven percent of the FLOPs. He turned the number over. Eleven percent of the cooling water. Eleven percent of the cooling-tower steam. Eleven percent of the bill.

He stood up. He wrote on a sticky note and put it on the second monitor.

*tail-attention routing*

Underneath, without planning to, his hand added a second line.

*Most of what we're made of is noise. The self is the tail.*

He stared at what he had written. He stared at the Auberval tendon-load trace still open on the third monitor. The hair on his arms stood up. The softmax of his own perception, somehow, had just peaked on a watch he had never worn. A parameter he had not consciously encoded — a clasp diameter at zero point eight millimeters — had been written into his tendons by someone whose name was written on a yacht he had never boarded.

"Aion."

*Yes.*

"File this." He meant the sticky. He did not mean the tendon trace.

*Filed. Separate project.*

"And the other thing."

*Logged. Shared Cache Leak. Event index zero one. I am watching for a second occurrence. I will not mention it again unless it recurs. Is that acceptable.*

"Yes."

Jeff turned out the homelab light. The Santa Ana winds pressed sage through the garage door seal. He went upstairs and lay down next to Maya and did not sleep.

At nine-thirty the next morning he was in the same conference room. At nine forty-seven twenty-two he was not.

---

runtime.js
1/* Discovery Log: 0x01 */
2if (Observer.current() == Observer.next()) {
3 throw IdentityConflict("Partition integrity compromised.");
4}