I kept losing ideas in the exact window where they were still alive.
Not the big dramatic ones. The smaller, more useful ones: a sentence that made a feature click, a tiny architecture improvement, a thing I wanted to search later, a half-formed thought that was only half-formed because I had not said it out loud yet.
Notes apps were technically available, but they had too much ceremony. Unlock the phone. Find the app. Open a new note. Decide whether this belongs in work notes or personal notes or some doomed inbox. By the time the cursor was blinking, the thought had already been polished into something less true, or worse, disappeared entirely. And since I was trying to reduce my phone usage, reaching for the phone was not a great option anyway.
So I built jot around a deliberately boring interaction:
Press Command Shift J, speak, stop speaking, saved.
That is the product.
The shape mattered more than the stack
The first version was not really about transcription quality. It was about reducing the number of decisions between having a thought and capturing it.
Most voice note tools still ask for a final little act of discipline. You press record, talk, then press stop, then maybe name the note, then maybe choose where it goes. That last bit is small, but it is exactly where weak ideas die. Jot listens for silence and saves automatically. Four seconds of quiet means the thought is done enough.
That one decision shaped the whole app:
- it had to live in the Mac menu bar
- it had to open from anywhere
- it had to start recording immediately
- it had to save without asking for a title
- it had to treat raw thoughts as valid, not as drafts that needed cleanup first
Once that was clear, the stack was just the toolbelt.
Audio is weird
Microphone permission is simple until it is not.
In a browser, the permission model is familiar: ask for the mic, get a stream, record. In a desktop WebView, you are suddenly negotiating with macOS privacy settings, app entitlements, the WebView runtime, and whatever assumptions your framework makes about secure contexts.
Once recording worked, silence detection became the next weird edge. Four seconds felt natural on my earbuds. It felt slightly too long on a built-in MacBook microphone in a quiet room. Background noise could keep a recording alive. A long thinking pause could save too early.
I ended up treating silence detection less like a perfect algorithm and more like a tweakable dial. The default should feel good, but the threshold needs to be configurable because rooms, microphones, and people are not consistent.
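To make the "tweakable dial" idea concrete, here is a minimal sketch of a silence detector with a configurable threshold. It is not jot's actual code: the names `SilenceConfig`, `SilenceDetector`, and `rmsThreshold` are hypothetical, the 0.01 level is an illustrative guess, and the rule that the timer only starts after some speech has been heard is my assumption. Only the four-second default comes from the behavior described above.

```typescript
// Hypothetical sketch: names and the 0.01 level are assumptions, not jot's real code.
interface SilenceConfig {
  rmsThreshold: number; // frames quieter than this count as silence (samples in [-1, 1])
  silenceMs: number;    // how long the quiet must last before the note is saved
}

const DEFAULT_SILENCE: SilenceConfig = { rmsThreshold: 0.01, silenceMs: 4000 };

// Root-mean-square level of one frame of audio samples.
function rms(frame: Float32Array): number {
  if (frame.length === 0) return 0;
  let sum = 0;
  for (let i = 0; i < frame.length; i++) {
    sum += frame[i] * frame[i];
  }
  return Math.sqrt(sum / frame.length);
}

class SilenceDetector {
  private config: SilenceConfig;
  private quietMs = 0;
  private heardSpeech = false;

  constructor(config: SilenceConfig = DEFAULT_SILENCE) {
    this.config = config;
  }

  // Feed one frame and its duration; returns true once the recording should be saved.
  push(frame: Float32Array, frameMs: number): boolean {
    if (rms(frame) >= this.config.rmsThreshold) {
      // Any loud frame resets the countdown, which is also how
      // background noise can keep a recording alive.
      this.heardSpeech = true;
      this.quietMs = 0;
      return false;
    }
    // Assumption: do not auto-save before anything has been said.
    if (!this.heardSpeech) return false;
    this.quietMs += frameMs;
    return this.quietMs >= this.config.silenceMs;
  }
}
```

With this shape, a noisy room is handled by raising `rmsThreshold`, and a speaker who pauses to think is handled by raising `silenceMs`. Neither requires touching the detection logic.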
Making sense of my ideas
The transcript is useful because it captures the thought before it gets edited by self-consciousness. But raw speech is messy. It rambles. It repeats. It has the shape of thinking, not the shape of a note.
That is why jot has an enrich action.
After a note is saved, it can be classified and rewritten into a more useful structure:
- idea: core insight, why it matters, angles to explore, next steps
- task: goal, definition of done, subtasks, blockers
- remember: fact, why it matters, related context
- other: cleaned-up readable version
The important part is that enrichment is optional. Capture stays fast. Organization happens later, when there is more patience available.
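The category structures above map naturally onto a small lookup table. The sketch below only covers the templating side: given a category, it produces the empty section skeleton that a rewritten note would fill in. The names `JotCategory`, `ENRICH_SECTIONS`, and `enrichSkeleton` are hypothetical, the choice of `##` headings is an assumption, and the classification and rewriting steps themselves are deliberately left out.

```typescript
// Hypothetical sketch: mirrors the category list above, not jot's real code.
type JotCategory = "idea" | "task" | "remember" | "other";

const ENRICH_SECTIONS: Record<JotCategory, string[]> = {
  idea: ["Core insight", "Why it matters", "Angles to explore", "Next steps"],
  task: ["Goal", "Definition of done", "Subtasks", "Blockers"],
  remember: ["Fact", "Why it matters", "Related context"],
  other: ["Cleaned-up version"],
};

// Build an empty markdown skeleton with one heading per section.
function enrichSkeleton(category: JotCategory): string {
  return ENRICH_SECTIONS[category]
    .map((heading) => `## ${heading}\n`)
    .join("\n");
}
```

Keeping the templates as plain data means a new category is one more entry in the table, and nothing in the capture path needs to know about it.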
Obsidian sync
I did not want jot to become another private database of forgotten thoughts. The notes needed to land somewhere I already look.
So jots sync into an Obsidian vault as markdown files with YAML frontmatter. The folder structure mirrors the categories:
Jots/
  Inbox/
  Ideas/
  Tasks/
  Remember/
  Other/
Each file gets a UUID, title, timestamp, transcription engine, and tags. Because the folder is just markdown, it can be committed to a private git remote, searched, edited, linked, and moved around like any other note.
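As a rough illustration of what one of those files could look like, here is a sketch that renders a note to markdown with YAML frontmatter and picks its folder. The frontmatter key names (`id`, `title`, `created`, `engine`, `tags`), the `Jot` interface, and the idea of naming the file after the UUID are all assumptions on my part. The source only says which fields exist and which folders mirror the categories.

```typescript
// Hypothetical sketch: field and key names are assumptions, not jot's real format.
interface Jot {
  id: string;       // UUID
  title: string;
  createdAt: Date;
  engine: string;   // name of the transcription engine
  tags: string[];
  category: "inbox" | "idea" | "task" | "remember" | "other";
  body: string;
}

const FOLDERS: Record<Jot["category"], string> = {
  inbox: "Inbox",
  idea: "Ideas",
  task: "Tasks",
  remember: "Remember",
  other: "Other",
};

// Render a jot as markdown with a YAML frontmatter header.
// JSON.stringify yields double-quoted strings, which are valid YAML scalars.
function toMarkdown(jot: Jot): string {
  const frontmatter = [
    "---",
    `id: ${jot.id}`,
    `title: ${JSON.stringify(jot.title)}`,
    `created: ${jot.createdAt.toISOString()}`,
    `engine: ${JSON.stringify(jot.engine)}`,
    `tags: [${jot.tags.map((t) => JSON.stringify(t)).join(", ")}]`,
    "---",
  ].join("\n");
  return `${frontmatter}\n\n${jot.body.trim()}\n`;
}

// Vault-relative path; using the UUID as the file name is an assumption.
function vaultPath(jot: Jot): string {
  return `Jots/${FOLDERS[jot.category]}/${jot.id}.md`;
}
```

Because the output is just a string and a relative path, the same functions work whether the file is written straight into the vault or staged for a git commit.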
That constraint made the app better. It forced jot to be a capture tool, not a knowledge-management empire.
Fin
Jot does not want to be the place where every note is organized forever. It wants to be the fastest path from thought to durable text.
The tool works. I use it. That is still my favorite benchmark.
If I have a thought worth keeping, I press Command Shift J, say the messy version, and let the app catch it before I can talk myself out of it.
You can see the project page at jot.