I’m building Audiqa, a music library management system designed to bring large collections into shape and keep them coherent as they evolve over time. At its core is a decision engine. Its job is to deal with duplicate files, conflicting track information, uneven quality, and the question of which version should become the library’s reference copy.
In practice, Audiqa inspects a collection, identifies likely conflicts, and proposes what should happen next. But proposals are not the same thing as changes. The library should only be updated once those decisions are settled. That separation sounded clean. Elegant, even. The kind of thing you draw on a whiteboard and feel good about.
I was three hours into redesigning the system when the spec stopped making sense. The import workflow described a sequence of steps: scan the files, work through the decision process, then apply changes to the library. But two adjacent steps depended on each other in a way that could not work — step 8 needed data that would not exist until step 9 produced it. A sequencing bug. Fix the step order, pat yourself on the back, move on.
Except it wasn’t a sequencing bug. The spec assumed the import process runs unattended: scan files, resolve duplicates, choose preferred metadata, update the library. One continuous flow, no human involved.
But that is not how it should work. People who care enough about their music libraries to use a tool like Audiqa are not casual listeners. These are the people who have spent years grooming their collections — fixing tags, choosing preferred editions, preserving distinctions other software would happily flatten away. So when the system finds a conflict — one copy is tagged as the original CD release and another as the 1999 remaster — that is not just a data mismatch. It is a judgement call sitting on top of someone else’s accumulated care.
Audiqa should propose the most sensible resolution, but proposals are not permission to rewrite a collection that may have taken years to shape. Maybe those versions should stay distinct. Maybe the remaster really is the better default. Maybe the user deliberately kept both because they care about the difference. The point is not just that a human has to be involved. It is that the user has already invested judgement into the library, and Audiqa does not get to just waltz in and overrule that.
That single realisation did not just change one step. It invalidated several connected design decisions that had quietly inherited the same assumption. Five sections of a 1,600-line spec, all resting on one belief I had never written down.
ADRs are great until they’re not
Once I understood that the problem was not really about step ordering, I had to ask where design decisions actually live. Most teams write down important technical decisions somewhere: a spec, a design note, a short record explaining what was chosen and why. In software, one common name for those records is Architecture Decision Records, or ADRs. If you are not writing them, start. They are genuinely good.
But they have a blind spot. They capture what I decided, not what I believed was true when I decided it. Every decision rests on assumptions. Some are obvious enough that nobody bothers writing them down. Others are so embedded in your mental model that you do not even notice they are there — until they break and take half the spec with them.
Some assumptions are stable technical facts. For example, part of Audiqa’s move detection relies on the low-level rule that a file’s internal ID only makes sense within a single storage system, not across different disks or network drives.
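That rule is easy to state precisely. Here is a minimal sketch in Python using the POSIX stat fields: an inode number (`st_ino`) is only unique within one filesystem, so it must always be paired with the device ID (`st_dev`). The helper name is mine, not Audiqa's.

```python
import os

def same_file(path_a: str, path_b: str) -> bool:
    """Check whether two paths point at the same underlying file.

    st_ino is only meaningful within a single filesystem, so it is
    paired with st_dev. Comparing inode numbers across different
    disks or network mounts tells you nothing.
    """
    a, b = os.stat(path_a), os.stat(path_b)
    return (a.st_dev, a.st_ino) == (b.st_dev, b.st_ino)
```

Two hard links on the same disk compare equal; two unrelated files that happen to share an inode number on different disks do not.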
Others are guesses about user behaviour. One of my decisions was about what Audiqa should do when someone changes its rules later and wants earlier decisions reconsidered. If a user has already gone through a batch of duplicates, kept the version they want, and cleaned up the rest, should Audiqa only reconsider the library as it exists now? Or should it try to reconstruct earlier decisions from the full history? My original design assumed that, by that point, most people would already have removed the duplicates they did not want. That assumption shaped how much of the earlier decision process needed to be revisited.
Both are assumptions. One is bedrock; the other is a guess about user behaviour. The written decision record treats them identically: neither is recorded.
Now find everything else it broke
When you learn something new, the hard part is not updating one decision. It is finding every other decision that was built on the same foundation.
A single assumption about user behaviour affected more than one design choice in my ADRs. If that assumption broke, multiple decisions needed revisiting, but the records did not link to each other through the shared assumption. I would have had to re-read every note and mentally reconstruct which ones depended on the invalidated belief. That is not engineering. That is homework. It is tractable at 8 ADRs. It is not tractable at 80.
So write the bloody things down
The fix was to make assumptions referenceable — not just a sentence buried inside a decision note, but their own entries that multiple decisions can point to.
I built this in Anytype¹ with three kinds of entries:
Decision Record — the decision note itself: what was decided, what alternatives were considered, and why. It links to the assumptions it depends on.
Assumption — a statement the decision depends on and that could later turn out to be wrong. It carries a status and links back to the decisions that rely on it.
Learning — a new insight that may challenge an assumption. It links to the assumption it affects and the decisions that may need to be revisited.
A learning challenges an assumption. An assumption is relied on by one or more decision records. When a learning arrives, you link it to the assumption it challenges. The assumption’s status changes. The connected decisions light up. You do not have to go hunting for them.
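The structure is small enough to sketch. The names and fields below are illustrative, not Anytype's schema; the point is the two-way links, so that a single learning surfaces every decision resting on the assumption it challenges.

```python
from dataclasses import dataclass, field

@dataclass
class Assumption:
    text: str
    status: str = "holds"  # "holds" | "challenged" | "invalidated"
    relied_on_by: list = field(default_factory=list)  # back-links to decisions

@dataclass
class DecisionRecord:
    title: str
    depends_on: list  # the Assumptions this decision rests on

@dataclass
class Learning:
    insight: str
    challenges: Assumption

def register(decision: DecisionRecord) -> DecisionRecord:
    # Maintain the back-links so lookups work in both directions.
    for assumption in decision.depends_on:
        assumption.relied_on_by.append(decision)
    return decision

def apply_learning(learning: Learning) -> list:
    # Flip the assumption's status and return every decision that
    # now needs a second look -- no re-reading, no hunting.
    learning.challenges.status = "challenged"
    return learning.challenges.relied_on_by
```

With the import-workflow example, one learning lights up every affected decision at once:

```python
unattended = Assumption("Import runs unattended, with no human in the loop")
register(DecisionRecord("Step 8/9 ordering", [unattended]))
register(DecisionRecord("Auto-apply preferred metadata", [unattended]))

affected = apply_learning(
    Learning("There is a review boundary mid-import", unattended)
)
# both decisions come back for review
```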
Maybe the question was wrong
In practice, the structure changed the conversation quickly. I started with the step-sequencing contradiction, explored a few scenarios, and wrote down what each one revealed. Some confirmed assumptions I had never stated. Others contradicted what the spec had quietly assumed. The important learning was not “move step 8 below step 9.” It was “this workflow contains a review boundary I never modelled explicitly.” Once that became visible, the contradiction stopped being a sequencing problem and became an assumptions problem.
That points at something bigger. There is a difference between checking whether a decision seems reasonable and checking whether the picture behind it was right in the first place. Most design reviews stay with the decision itself. Is this the right sequence? Is this the right trade-off? Does this seem sensible? Those are useful questions. But sometimes the more important question sits underneath them: what had I assumed about the user, the workflow, or the boundaries of the system that made this decision seem sensible at all?
That was the real shift here. At first, I was asking which step should happen first. The deeper question was whether I was even modelling the workflow correctly. Was this really one continuous automated process, or was there a review boundary in the middle that the design had failed to account for? Once I asked that second question, the contradiction stopped looking like a sequencing issue. The steps were not simply in the wrong order. The underlying picture was wrong.
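The corrected picture is easy to express once the boundary is explicit. This is a sketch, not Audiqa's real API: proposing and applying are separate phases, and nothing crosses from one to the other without an approval flag set by a human.

```python
from dataclasses import dataclass

@dataclass
class Proposal:
    conflict: str         # e.g. "original CD release vs. 1999 remaster"
    suggestion: str       # what the engine thinks should happen
    approved: bool = False  # only a human review sets this

def propose(conflicts: list) -> list:
    # Phase 1: inspect and suggest. The library is never touched here.
    return [Proposal(c, f"keep preferred copy for {c!r}") for c in conflicts]

def apply(proposals: list) -> list:
    # Phase 2: runs only after the review boundary. Anything the
    # user did not approve is left exactly as they shaped it.
    return [p.suggestion for p in proposals if p.approved]
```

The review boundary lives between the two calls: `propose` returns objects for a person to inspect, and `apply` acts only on the ones they approved.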
Organisational theory has a name for this — double-loop learning² — which sounds like something a management consultant charges you to explain over lunch. But the idea is dead simple: instead of only asking whether the answer inside the current model is correct, you also ask whether the model itself is right.
That does not magically find everything. Once a system grows beyond a bare-bones weekend script held together by three folders, one YAML file, and optimism, missing something becomes inevitable. There will always be assumptions I failed to write down, consequences I did not spot, and links I only notice in hindsight.
The point is not perfect coverage. The point is to rely a little less on memory and luck. This structure gives me a better chance of finding the decisions most likely to be affected when something important shifts, instead of re-reading every decision note and hoping I spot the damage before it spots me.
For Audiqa, that shift matters for more than just design hygiene. Making assumptions explicit is not just a way to produce cleaner architecture. It is also a way to build software that does not treat the user like an idiot. Not everyone wants software to silently “fix” a music library they may have spent years shaping. Audiqa should not only be good at making decisions. It should be good at knowing when to shut up and let the person who actually lives with the library have the final say.
1. Anytype is a local-first knowledge tool with typed objects and relations. The same pattern works in Notion (with databases and relations), Obsidian (with Dataview queries), or even a flat directory of markdown files with consistent YAML frontmatter. The tool matters less than the structure: assumptions as linkable objects, not inline prose.
2. Chris Argyris coined the term in the 1970s. Single-loop learning adjusts actions within existing assumptions. Double-loop learning questions the assumptions themselves. Most teams operate almost exclusively in single-loop mode because questioning assumptions feels slower — until an unexamined assumption invalidates a month of design work.