Cortex: the scaffolding around the model

Towards high-agency: building a personal AI assistant that remembers, compounds, and gets more done.

June 1, 2026 · 18 min read

#ai #productivity #claude #personal-infrastructure #system-design

For about a month now I’ve been running a personal AI setup I call Cortex, and it’s changed how much I get done. Across work and personal life both, I move faster, I keep more going at once, and I still go deeper on each thing, without the old cost in hours or attention. It’s one of the first concrete results of what now looks less like a bet and more like evidence: the real gains increasingly go to whoever rebuilds how they work around AI and is willing to build that scaffolding themselves. I’m writing it down in the hope it gives you something to build your own from.

Almost none of that came from a smarter model. Claude, for instance, already handled most of what I asked of it. What I was after was a higher level of agency, and a higher level of context and memory. In practice that meant building a substrate layer, one that for now sits on top of Claude Code and gives me a few things the raw model and the built-in scaffolding don’t: one place I work from, memory and abstraction that carry across sessions, context that loads minimally per task, my recurring moves captured as reusable skills and scripts, and a system that gets a little sharper the more I use it.

Still, the goal here wasn’t to build a new product or a full-blown framework. There are tons of those already, some overkill, most more mature than anything I’d write. What I wanted was something cut to fit me and my workflows, reshaping those workflows in the process, through a lot of critical thinking and brainstorming, until I had a usable MVP that paid off from day one. After that, the point was to make the system able to learn and improve itself the more I used it.

As you might have guessed, it’s not rocket science, and I borrowed freely from what a lot of high-agency people are building. Andrej Karpathy’s LLM-wiki sketch, Daniel Miessler’s PAI and fabric, Garry Tan’s GBrain, my own tinkering with Hermes and OpenClaw, and the wider personal-knowledge-management tradition going back to Vannevar Bush’s Memex all left fingerprints on it, to name a few. So treat what follows as a description you can borrow from and adapt to your own work.

The problem

Before Cortex, the same frictions kept throttling me, and I wasn’t happy about it. I was far slower than I wanted to be, I could only really hold one or two threads at a time, and when I multitasked heavily I rarely got to go as deep as the work deserved. Agents were already all over my workflows by then, but mostly as tireless, insatiable, competent rubber-duck peers. I kept catching actions I ran all the time and abstracting them into skills, and flows I repeated often and crystallizing them into scripts. Every time, I’d feed in all the context they needed, then iterate, layering on my own vision and style. It still didn’t satisfy me, for two reasons. The glue between everything was missing. And it was still an AI-enhanced version of my old way of working, while what I wanted was to restructure the whole thing: how I plan, how I structure, how I parallelize, questioning all of it, and working out where to put myself and where to put a set of agents to get what I was actually after.

That missing glue showed up as the same handful of concrete frictions, again and again:

Every session started cold, so I’d paste the same preamble back in, watch it drift, and burn tokens re-explaining myself.
Reading and watching material piled up faster than I could process it (papers, videos, blog posts, talks, bookmarks). I’d read something once and lose it. No compounding.
Output started from zero every time. Blog posts, conference submissions, project iterations. The same ideas re-explained, the same research re-located, the same diagrams re-drawn.
Decisions didn’t propagate. Something I worked out in one session got re-debated in the next, because nothing wrote it down where the next session would look.
Tools sprawled: Claude Code, the web app, a notes app or three, ad-hoc scripts. Each was fine on its own, but nothing connected them.
And with several parallel workstreams running, a recurring question I couldn’t answer at a glance: what should I actually work on right now?

So the real work lives around the model, in the system that feeds it context and catches its output. Personalized scaffolding over raw capability is the leitmotif here; we all know it already :)

Five principles

Every design choice in Cortex comes back to five ideas.

Code Before Prompts. If bash can do something deterministically, bash does it, so the model reasons while the code executes. Turning a YouTube URL into a markdown transcript is a shell script, while deciding whether that transcript is worth my time is a skill the model runs. A script is deterministic and never hangs on the model’s mood, so pushing every mechanical step down to cheap, predictable bash reserves the expensive model calls for the places that genuinely need judgment. It saves tokens and time, and the whole thing is easier to reason about because the deterministic parts behave the same way every run.

One nitpick: I don’t really write these scripts, and they aren’t written for me to read. An agent drafts and tests each one, I validate it, and from then on the audience is the next agent that has to run and maintain it. That changes how they look. They can run long, with deeply nested helpers and verbose names, because they’re optimized for one reader: an agent that loads the whole file, understands its contract, and changes it safely. And they’re documented for exactly that reader, each script leading with what it does, its inputs and outputs, and how it fails, in plain comments an agent can lean on.

Unix philosophy. Everything is built as small, reusable pieces that do one thing and pipe into each other, text in and text out. The pieces are deliberately heterogeneous: a skill the model runs and a plain CLI tool sit at the same level and compose the same way, so the output of one becomes the input of the next. Logs stay append-only with deterministic prefixes I can grep. Skills can call scripts, and scripts never call skills, which keeps the flow pointing one way.

Scaffolding > model. A weaker model in a good system beats a stronger model in a bad one, so Cortex is the system and the model stays swappable. v2 will make that literally true, letting me switch freely between frontier models and drop down to local runners for privacy-sensitive work.

Minimal viable context. Each session loads only the CLAUDE.md, MEMORY.md, and TODO.md for the directory I’m in, because loading everything is expensive and incoherent. The whole setup is a tree of scaffolding, and wherever I spawn a session, exactly what that spot needs gets loaded at spawn time.
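
A minimal sketch of that per-directory loading, written in Python for readability rather than in Cortex's own bash. The three file names come from the text above; the function name `context_for`, the `## <file>` separators, and the choice to read only the current directory (without walking up the tree) are assumptions made for illustration, not a description of how the real harness loads context.

```python
from pathlib import Path

# File names are taken from the post; everything else is a hypothetical sketch.
CONTEXT_FILES = ("CLAUDE.md", "MEMORY.md", "TODO.md")


def context_for(directory: Path) -> str:
    """Concatenate only the context files that exist in `directory`.

    Any other file in the directory is ignored, which is the whole point:
    the session starts with what this spot needs and nothing more.
    """
    parts = []
    for name in CONTEXT_FILES:
        path = directory / name
        if path.is_file():
            body = path.read_text(encoding="utf-8").strip()
            parts.append(f"## {name}\n{body}")
    return "\n\n".join(parts)
```

Calling `context_for(Path.cwd())` from inside a project directory would return that project's rules, state, and open tasks as one string, and an empty string in a directory that has none of the three files.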

Compound knowledge. Every interaction should leave the system a little smarter: a query that produced a good answer becomes a wiki page, a decision worth keeping becomes a line in MEMORY.md, and a pattern I keep repeating becomes a skill.

In practice each one pulls against a temptation: Minimal viable context fights the urge to load just one more file, while Code Before Prompts fights the urge to let the model wing something a script should own. They’re principles, not commandments, and I break them when the situation earns it.

The architecture

The top level is a handful of opinionated directories:

~/cortex/
├── CLAUDE.md      ← system rules, loaded at every session entry
├── config/        ← reference docs, templates, cheatsheet
│   ├── reference/      ← architecture decisions, canonical notes
│   └── templates/      ← project / wiki / persona scaffolds
├── skills/        ← [git repo] reusable prompt patterns (daily, wiki, work, meta, personas)
├── scripts/       ← [git repo] deterministic bash (ingest, statusline, dashboard, ...)
├── knowledge/     ← LLM wikis; each subdir is its own git repo
│   └── <wiki-name>/    ← raw/ + wiki/ + index.md + log.md + CLAUDE.md
├── projects/      ← active workstreams; each subdir is its own git repo
│   └── <project-name>/ ← CLAUDE.md + MEMORY.md + TODO.md + log.md + drafts/
├── trackers/      ← [git repo] recurring commitments (learning, planning, ...)
├── posts/         ← [git repo] long-form writing
│   └── <slug>/         ← post.md + assets/
├── inbox/         ← unversioned landing zone
│   ├── clipper/        ← web-clipper dropoff
│   ├── papers-to-process/
│   ├── video-transcripts/
│   └── project-ideas/
└── archive/       ← closed projects (cold storage)

A few structural moves do most of the work here, and they compound.

Each project and each wiki is its own git repo. A wiki is a standalone, vertical knowledge base that exists to be reused: it can ground a research project, a personal deep-dive, or the next conference submission. Projects and wikis relate many-to-many, so a project draws on whatever mix of wikis it needs, and a wiki feeds whatever projects call for it. Per-directory isolation keeps that clean, so I can share one wiki publicly without exposing a private project, and archiving something is just moving a directory out. Nothing else breaks.

CLAUDE.md auto-loads at session start. No “so, tell me about your setup” dance every time. Who I am, how the place is laid out, and the rules I work by are all in context before I type the first word.

MEMORY.md is rewritten, never appended. It always reflects the current state. When something changes, I rewrite the relevant section, and the old version stays in the git log where history belongs. From what I’ve seen across many projects, this is the rule most people get wrong: they append as things accumulate, and the file slowly becomes a journal the model has to re-read just to figure out what’s true now. When I settle something I’ll want to revisit, I leave a one-line decision in a Key Decisions section, with a why and a how-to-apply, so the reasoning stays grep-able instead of fading into a half-memory.
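
To make rewrite-not-append concrete, here is a small sketch in Python. It assumes MEMORY.md is organized under `## ` section headings, which the post does not specify; the function name `rewrite_section` is likewise hypothetical. The old body is discarded on purpose, since the post leaves history to git.

```python
def rewrite_section(memory: str, heading: str, new_body: str) -> str:
    """Replace the body under `## heading`; add the section if it is missing.

    Assumes (hypothetically) that MEMORY.md uses `## ` headings per section.
    Nothing is appended to an existing section: its old body is dropped.
    """
    lines = memory.splitlines()
    target = f"## {heading}"
    out, i, replaced = [], 0, False
    while i < len(lines):
        if lines[i].strip() == target:
            out.extend([target, new_body.strip(), ""])
            replaced = True
            i += 1
            # Skip the old body up to the next section heading.
            while i < len(lines) and not lines[i].startswith("## "):
                i += 1
        else:
            out.append(lines[i])
            i += 1
    if not replaced:
        if out and out[-1] != "":
            out.append("")
        out.extend([target, new_body.strip(), ""])
    return "\n".join(out).rstrip("\n") + "\n"
```

Running it twice with the same body yields the same file, which is the property an appended journal lacks.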

log.md is append-only, through a script. Every entry gets a timestamp and one of a few types (session, decision, milestone, blocker), and I never edit it by hand. That’s where the chronology lives, so MEMORY.md doesn’t have to carry it.
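
A sketch of what such an append script could look like, in Python instead of bash. The four entry types are the ones named above; the exact prefix layout `[YYYY-MM-DD HH:MM] [type]`, the UTC default, and the function names are assumptions, since the post does not show the real format.

```python
from datetime import datetime, timezone
from pathlib import Path

ENTRY_TYPES = {"session", "decision", "milestone", "blocker"}  # from the post


def format_entry(kind: str, message: str, when: datetime) -> str:
    """Build one log line with an assumed `[date time] [type]` prefix."""
    if kind not in ENTRY_TYPES:
        raise ValueError(f"unknown entry type: {kind!r}")
    one_line = " ".join(message.split())  # keep every entry on a single line
    return f"[{when:%Y-%m-%d %H:%M}] [{kind}] {one_line}"


def append_entry(log_path: Path, kind: str, message: str, when=None) -> str:
    """Append a single entry; the file is opened in append mode only."""
    when = when or datetime.now(timezone.utc)
    line = format_entry(kind, message, when)
    with log_path.open("a", encoding="utf-8") as fh:
        fh.write(line + "\n")
    return line
```

Rejecting unknown types at write time is what keeps the prefixes deterministic enough to search later.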

Those last two split memory into a current-state file and a history file on purpose, and that maps onto how context moves through the system. I think of context in three temperatures:

Level	What	Where	Refreshed
Hot	the session’s working context	the context window, auto-loaded at entry	every session
Warm	active project and tracker state	MEMORY.md + TODO.md on disk	when I touch the project
Cold	compiled knowledge and finished work	wiki pages, archived projects	on ingest, rarely after

What it looks like in practice

These are a few of the loops, not the headline features, and the examples stay deliberately generic. The point is that one set of scaffolding quietly powers a lot of small, different things, and they feed each other.

I stopped bookmarking. I hand a link or a folder to the triage_source skill; it reads my current goals first, gives a personal skip / skim / read verdict, and suggests where a keeper belongs, a wiki page or a reference pinned to a project.
Wikis that compile themselves. A wiki can start from a folder of notes I wrote years ago, well before any of this, or get bootstrapped from scratch for a brand-new topic. Either way I drop sources in its raw/, run ingest, and it writes the summary and concept pages, links them, and lets the next source extend them instead of duplicating. Ask a question later and it answers from the compiled pages, with citations, so I’m working off grounded footing rather than half-remembered reading.
A knowledge base that audits itself. A lint pass hunts contradictions, orphan pages, and stale claims, so the wiki degrades loudly instead of silently.
Ingest almost anything in one line. Small scripts turn a PDF, a YouTube video, a blog post, or a whole repo into clean markdown the system can hold, so “keep this” costs one command.
A whole-inbox sweep. When the backlog piles up, one pass triages everything, drops what’s already been ingested, and routes the rest to a wiki or a project.
A status line that knows where I am. A couple of dynamic lines that shift by context, wiki, tracker, or project: live counts, an inbox pulse, a nudge when a MEMORY.md is bloating.
A council of personas. I keep personas for the angles I reach for again and again, a skeptic, a security reviewer, a career lens, and they turn out sharper than narrow single-purpose skills. For a murky call I convene a few at once and read the disagreement; I can even set them to red-team each other, so the answer that comes back has already survived its own opposition.
Thinking tools on tap. Before committing to a plan I can decompose it to first principles, or red-team it for failure modes, as a structured pass rather than a vibe check.
Hand work to a delegate. For anything parallelizable, I spin a persona and a task into its own terminal tab and keep working; the result is waiting when I get back.
Months of history, one grep away. Every log line starts with a dated, typed prefix, so the whole timeline is queryable with a single command, no database involved.
Sessions that write themselves down. Closing one rewrites the relevant MEMORY, appends a log line, and hands me a commit message, so the next session opens already informed.
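
The "one grep away" loop depends only on every log line sharing a predictable prefix. The sketch below shows the same filter in Python, assuming a hypothetical prefix of the form `[YYYY-MM-DD HH:MM] [type]`; the post does not spell out the real layout, and `query` is an illustrative name.

```python
import re

# Assumed prefix layout: "[2026-05-01 09:30] [decision] text..."
PREFIX = re.compile(r"^\[(\d{4}-\d{2}-\d{2}) \d{2}:\d{2}\] \[(\w+)\] (.*)$")


def query(lines, kind=None, since=None):
    """Return log lines matching an entry type and/or a minimum ISO date."""
    hits = []
    for raw in lines:
        line = raw.rstrip("\n")
        match = PREFIX.match(line)
        if not match:
            continue  # not a well-formed entry, ignore it
        date, entry_kind, _text = match.groups()
        if kind and entry_kind != kind:
            continue
        if since and date < since:  # ISO dates compare correctly as strings
            continue
        hits.append(line)
    return hits
```

Under that assumed format, `grep '\[decision\]' log.md` would answer the type-only version of the same question from the shell, which is why no database is needed.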

The loops above compose. A few are common enough that I run them as named workflows: triaging a whole source backlog in one ranked pass, sweeping the inbox (web clips, saved links, dropped files) end to end, or deep-diving a new topic from one command to a finished wiki, where skills go find the authoritative sources, pull them into raw/, ingest and lint them, and leave me a corpus I can query and draft from.

Staying oriented

For me the hardest question is usually just what to do next, so a slice of Cortex is aimed squarely at it. TODOs pile up everywhere, scattered across every project, tracker, and wiki, so a primitive sweeps them all into one ranked view, with a coverage check so nothing quietly falls off the radar, and that aggregate is what the rest of the system reads from. Every project and tracker carries one status emoji, urgent, stalled, doing, or continuous, that I read at a glance. /status recomputes a live picture from the files on demand, from a one-line pulse up to a synthesized cross-project briefing, generated fresh every time. The weekly-plan ritual reconciles what actually happened, then forces a real choice: at most three priorities for the week, each checked against my longer-term goals and weighed against the hours I actually have. And when I just want to start, a time-block pass turns those hours into a schedule, or in commit mode names the single next task and its first step, so deciding costs nothing.
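
The TODO sweep described above can be sketched roughly as follows. Several things here are assumptions rather than facts about Cortex: that open items are markdown checkboxes (`- [ ]`), that priority is an optional `(p1)` to `(p9)` tag, that ranking means sorting by that tag, and that the coverage check means flagging directories with no TODO.md at all. The name `sweep` is hypothetical.

```python
import re
from pathlib import Path

# Assumed item syntax: "- [ ] (p1) text" with the priority tag optional.
OPEN_ITEM = re.compile(r"^\s*- \[ \] (?:\(p(\d)\) )?(.+)$")


def sweep(root: Path):
    """Collect open TODO items under `root` and report uncovered directories.

    Returns (items, uncovered): items are (priority, area, text) tuples sorted
    by priority, uncovered lists subdirectories that have no TODO.md.
    """
    items, uncovered = [], []
    for area in sorted(p for p in root.iterdir() if p.is_dir()):
        todo = area / "TODO.md"
        if not todo.is_file():
            uncovered.append(area.name)  # coverage check: nothing to read here
            continue
        for line in todo.read_text(encoding="utf-8").splitlines():
            match = OPEN_ITEM.match(line)
            if match:
                priority = int(match.group(1)) if match.group(1) else 9
                items.append((priority, area.name, match.group(2).strip()))
    return sorted(items), uncovered
```

Pointed at a directory such as `projects/`, it would return one ranked list across every project plus the names of any project that has silently lost its TODO file.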

How it changed the way I work

A month in, the clearest change is in how I work, and it shows up the same way across every kind of task.

I move faster, because the context a task needs is already on disk, where the system loads it without me re-explaining anything. I run more in parallel, because each project and each wiki carries its own state, so I can drop a thread and pick up another without paying to reload it. And I go a bit deeper, because what I learn compounds in one place, in the wiki pages and MEMORY.md files on disk, so every pass at a problem starts further along than the last one did. Faster, wider, and deeper at the same time, which used to feel like three settings I had to trade against each other.

It also pulled the rest of my life into view. I’m a productivity nerd, so planning and prioritizing was already something I did, and did decently, but having every thread, work and personal alike, visible in one place changed the math. I can look at the whole board and pick the next thing with some confidence, rather than trusting my memory to surface it.

The model is the same one I had in April. Everything that changed sits in the scaffolding around it, quietly doing the compounding work I used to do by hand, or more honestly, used to drop.

Knowing when to stop building

The most useful decision I made was declaring the build phase over.

Once a system feels almost done, polishing it is seductive. A decay pass that retires stale pages, typed links between wiki entries, a deterministic context-loading hook. That’s anticipatory engineering, solving problems I haven’t actually hit yet. So at some point I wrote down a transition: the build phase is finished, and from here I use it. After that line, every structural improvement goes to a backlog, and it graduates to active work only when I explicitly ask or when the same friction has surfaced across several real sessions. Cortex enforces this on itself, with a meta-project whose backlog holds all the tempting improvements I’ve decided not to make yet.

One thing I underestimated: there’s a phase between “built” and “in steady use” that shapes the schema more than the build ever did. Call it ingestion, the stretch where I pulled scattered material (old notes, exported workspaces, repo clones, folders of unread papers) into the system. Real content stress-tests a schema in ways no up-front decision can. Several conventions only appeared here, not during the build: an archive/ for finished work (the question “where does this go when it’s done?” had no answer until something was actually done), the small status vocabulary I read at a glance, the rule that keeps strategic tracker notes separate from operational project notes.

What v1 leaves out

This is a description of v1, and v1 has edges I built in on purpose, some of which I now feel as lock-in.

It’s tied to a single agentic harness, every session runs through one model, and everything Cortex touches flows to the same vendor by default. No private lane. Those were deliberate choices for velocity, and the price, honestly, is vendor lock-in, a trade I’d make again to ship. For now that’s fine, because nothing I run through Cortex is especially privacy-sensitive. The sensitive stuff is exactly where I’ve started experimenting: the same scaffolding, pointed at local runners, for the tasks I’d rather keep on my machine. It’s early, a real work in progress, and not mature enough to call part of v1.

v2 is already designed, sitting in a backlog behind explicit triggers. It decouples Cortex from any one harness, adds a privacy lane (per-skill rules about what may leave the machine, with sensitive work routed to a local model), and hardens a chunk of context-loading from a soft instruction into a deterministic hook that fires at spawn time. There’s a longer-horizon thread too, about memory dynamics. Garry Tan’s GBrain has a piece it calls the dream cycle, where the system revisits and reconsolidates its own memory on a schedule. The consolidate-prune-link discipline I run by hand is exactly what agent-memory engines like Supermemory, Letta, and Mem0 automate. Even though I stayed stubbornly files-only, the decay and consolidation ideas owe a lot to them, and the file taxonomy here is already a memory hierarchy.

There’s a more ambitious idea I’m playing with, and I want to put it out there even while it’s half-built. Picture the local model as the orchestrator for privacy-sensitive tasks: it handles whatever is simple on its own, and when it needs more horsepower it runs the prompt through a local privacy filter and a round of anonymization first, sends the scrubbed version to a frontier model, then de-anonymizes the answer before acting on it. The local runner gets to borrow the frontier model’s intelligence without ever handing it anything sensitive. I’m experimenting with it now; it’ll probably land as v1.2, or fold straight into v2.
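
The scrub, send, restore round trip can be illustrated with a toy sketch. It is deliberately naive: it only masks email addresses with a regex, whereas a real privacy filter would need to cover far more. All names here (`anonymize`, `deanonymize`, `ask_frontier_safely`, the `<EMAIL_n>` tokens) are hypothetical, and `call_frontier` is a stand-in callable supplied by the caller, not a real API.

```python
import re

# Toy detector: email addresses only. A real filter would cover much more.
EMAIL = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")


def anonymize(text: str):
    """Swap each distinct email for a stable placeholder token."""
    mapping = {}  # token -> original value

    def swap(match):
        value = match.group(0)
        for token, original in mapping.items():
            if original == value:
                return token  # same value, same token
        token = f"<EMAIL_{len(mapping) + 1}>"
        mapping[token] = value
        return token

    return EMAIL.sub(swap, text), mapping


def deanonymize(text: str, mapping: dict) -> str:
    """Put the original values back into the model's answer."""
    for token, original in mapping.items():
        text = text.replace(token, original)
    return text


def ask_frontier_safely(prompt: str, call_frontier) -> str:
    """Scrub locally, send only the scrubbed prompt, restore locally."""
    scrubbed, mapping = anonymize(prompt)
    answer = call_frontier(scrubbed)
    return deanonymize(answer, mapping)
```

The mapping never leaves the machine, so the remote model sees only placeholders while the local side still acts on the real values.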

Taking the parts that fit

Most of Cortex is bash and markdown, so the structure is forkable. What’s worth copying is the specs and the principles, along with the technical conventions. The content itself is personal and won’t transfer.

If you want to try it, fork the top-level directory shape, the per-project CLAUDE / MEMORY / TODO / log convention, the three-layer wiki (raw/ to wiki/ to a CLAUDE.md schema), and the script-versus-skill boundary. Build your own skills, scripts, and trackers; you can hand those specs to your favorite agentic harness and have it grill you with design questions. Then you’ll have a working MVP you can start working in, letting it improve as you use it.

Now, three warnings I learned the hard way:

Don’t try to stand up a complete Cortex on day one. Start with the layout and a wiki or two, and grow from real friction.
Don’t pre-build skills for patterns you haven’t repeated yet.
Don’t skip the rewrite-not-append discipline, since it’s the least intuitive piece and the one that carries the most weight.

I’m planning to share a stripped-down, anonymized version of this scaffolding, since for the time being there’s too much personal stuff embedded in it. It’s the kind of thing you could hand to your own agent and say “build me this.” Until then, the shape above is enough to start from.

Closing

I built Cortex so my sessions would stop forgetting. Yours will look different, and that’s the point! Most of the value turned out to be in writing things down in the right place.

This is the personal build, and I’m leaning on the same scaffolding in places I can’t show here (employer’s NDAs), where the lesson holds just as well: most of the leverage is in the scaffolding around the model, and in actually sitting down to build it yourself.

If you’re building something in this direction, I’d genuinely like to compare notes. How are you handling memory? How are you handing off to others?