Collaborative editing with AI is really, really hard

So you have a rich text editor. You have an AI. You want the AI to suggest changes in the rich text editor. You probably have some image in your head of the AI suggesting changes as you type, live, as if it was another collaborator.


We had that idea, too. It’s much harder than it looks. We couldn’t find anyone who actually does it.

ChatGPT Canvas simply halts the editor while the AI is at work, and they gave an excellent talk about how hard the general area is. Cursor live-edits the file, but it works exclusively for plaintext. Notion AI live-edits your workspace, but edits seem to rewrite the entire block, and blow away your changes if you were typing at the same time. Google Docs and Coda AI both write a suggestion and then ask you to click a button to insert that suggestion.

So: is live collaborative editing with an agent possible? And how ill-advised is it, actually? Come, learn from our pain.

Caveat emptor. I'm pretty sure no one has this completely figured out. If you're working on something like this and want to trade notes, we'd love to hear from you! Via Twitter (I am @hausdorff_space, DMs open), or in the text editor-focused Discord server we co-admin, Text Editors Hate You Too.

What we did

Our goal for the “AI agents” feature in Moment is to allow the user to edit rich text documents while the AI is at work, as if it was just another live collaborator among potentially many other human live collaborators. You can roughly see how the feature works here:

That's the vision, anyway. But while it is not 100% complete, we’ve seen enough to advocate for it. Roughly, the approach is this:

We serialize the document to plain-old Markdown. ~Everything worth keeping in Google Docs/Notion/whatever can be expressed in plain-old Markdown. Yes, really. What .docx is to Word, .md is to us.

The user edits a document via the in-browser rich text editor. Each edit eventually causes the entire document to be serialized to disk as a .md file.

The AI edits the .md files themselves, directly. Agents like claude, amp, and copilot were built to edit text files. It is best to simply let them do this. They struggle with .patch files and JSON-serialized OT or CRDT ops.

The AI’s “suggested changes” are constructed by inspecting the diff. As the AI changes the Markdown, we project the diff “up” into the in-browser EditorState. We do this by using the diff to construct a Transaction that transforms your EditorState into the AI’s EditorState. There is some subtlety here, but as you will see, this is easier than it sounds.

We believe these decisions make it possible to have live collaborative editing with an agent. The rest of this post is about the details required to make them work.

1. We serialize the document to plain-old Markdown

Ok, so, look, this is controversial, but you want to represent your document data as Markdown text. You do. LLMs are string-in, string-out machines, and they are trained extensively on Markdown. You can try to fine-tune the models, or add a bunch of scaffolding to allow the model to output a potentially-invalid JSON string that can be hydrated to a ProseMirror Transaction object (or whatever). OpenAI tried this kind of thing and then said they backed away from it.

So: yes, you can do that. OR, you can have the winds of the entire RL budget of opus-4.5 blowing in your sails, and just use a text format like Markdown.

Yes, I know all the retorts, and they are all wrong.

“You can’t represent all the features of Notion/GDocs/Word/whatever.” Yes you can. Split panes? <section>. Highlighting and font size? Normal marks, just a <span>. Comments? Normal marks + store-aside comment text. Empty lines and trailing whitespace? Same as HTML: <br> and &#x20;. Rendering options, pub date, authors? Frontmatter. Tables? Have you heard of <table>? The only thing you can’t represent in Markdown is how big of a quitter you are.
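For instance, here is a made-up document exercising a few of those tricks (frontmatter, a highlight span, a hard break, a table), and it is still plain Markdown:

```markdown
---
title: Q3 Roadmap
authors: [alice, bob]
published: 2025-06-01
---

# Q3 Roadmap

This line ends with a hard break.<br>
This sentence has a <span style="background: yellow">highlighted</span> phrase.

| Quarter | Goal      |
| ------- | --------- |
| Q3      | Ship v2.0 |
```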

“Markdown is a bad file format.” Absolutely incorrect. Markdown is a terrible file format. But it’s what’s out there, and what the models are good at. If the best technology won we’d all be running Solaris or something.

“.md ↔ EditorState is a lossy conversion.” Yeah it is. You’re going to have to handle weirdo edge cases like when you make the space after a word **bold ** by accident. You’re going to have to figure out how to split marks that stagger each other. And you know what? It’s going to be worth it.
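The stray-space case, for example, can be patched up at serialization time. A hypothetical fix-up (not our actual implementation): CommonMark emphasis can't close right after whitespace, so move the whitespace outside the delimiters.

```typescript
// Hypothetical fix-up for one lossy edge case: a strong mark covering
// trailing whitespace ("**bold **") is not valid CommonMark emphasis.
// Before writing the .md, move the whitespace outside the delimiters.
function fixTrailingSpaceInStrong(md: string): string {
  return md.replace(/\*\*([^*\n]+?)(\s+)\*\*/g, "**$1**$2");
}
```

In a real pipeline you'd shrink the mark before serializing rather than regex the output, but the shape of the problem is the same.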

“It’s just hard.” Yeah well this is computers buddy, we don’t get paid by the hug.

Recommendations

So, every document is stored (e.g., in a database, or in .md files) as Markdown text. For the in-browser rich text editor, we use ProseMirror. (More on ProseMirror later; you should definitely use it.) To transform a .md file into a ProseMirror EditorState object, we use remark, and in particular, @handlewithcare/remark-prosemirror.

We use a handful of community-standard Markdown extensions, mostly related to GitHub-Flavored Markdown, e.g., micromark-extension-gfm-task-list-item, which produces checklist TODO items. We have a bunch of customization for features specific to us.

Seriously, not .patch files or CRDT ops. Maybe someday the models will get really good at things like producing accurate line counts in diffs, but for now, they are not, and they are sure to regularly generate nonsense that fails to apply to your document, or even basically-corrupts it. claude, amp, and copilot mostly work by having the models generate regex-replace operations, which they are far better at.
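That replacement style is also easy to validate before touching the document. A minimal sketch of the idea (a hypothetical helper, not any agent's actual implementation):

```typescript
// Apply one agent-style edit: replace `oldText` with `newText`, but only
// if `oldText` occurs exactly once. Missing or ambiguous matches are
// rejected up front instead of corrupting the document.
function applyStringEdit(doc: string, oldText: string, newText: string): string {
  const first = doc.indexOf(oldText);
  if (first === -1) {
    throw new Error("edit failed: old text not found");
  }
  if (doc.indexOf(oldText, first + 1) !== -1) {
    throw new Error("edit failed: old text is ambiguous");
  }
  return doc.slice(0, first) + newText + doc.slice(first + oldText.length);
}
```

Compare that failure mode with a malformed .patch hunk, which can silently apply in the wrong place.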

2. The user edits a document via the in-browser rich text editor

In order for users to see suggested changes from the AI, they have to edit using the Moment Desktop App. These suggested changes look like this:


Users can edit the .md files manually, but of course, they won’t see any of this.

Recommendations

We’ll talk about how we actually generate the “suggested changes” transactions in the next couple sections.

For the actual editor view, we use React and @handlewithcare/react-prosemirror. It avoids the pervasive state-tearing issues that are readily noticeable in hand-rolled React/ProseMirror integrations, as well as in TipTap.

Yes, ProseMirror. I know there is a lot of ProseMirror baggage out there. I've got my own. It is also the only real option to do something this complex. If you have used ProseMirror with React before and it seemed like it simply wasn’t working, seriously consider giving react-prosemirror a shot. If you attempt a raw integration or use TipTap, I (unfortunately) pretty much guarantee that for nontrivial usage, you will run into state-tearing issues and they will drive you to the very edge of your sanity.

3. The AI edits the Markdown itself, directly

In our model, users can open an embedded terminal and run claude, amp, or copilot directly. These TUIs edit the .md files directly, and when the Moment Desktop App detects the change, it projects the change “up” into the EditorState. In the next section, we’ll talk about how, but for now you can see copilot at work in a Moment document (I've tasked it with making the Gettysburg Address more exciting):


Recommendations

This is perhaps peculiar to our product, but if you’re using an embedded terminal to run claude (especially; amp and copilot do better), as of January 2025, it is VERY IMPORTANT that you run xterm.js v6 (NOT v5), and NOT ghostty-web:

claude appears to extensively use DEC mode 2026 (synchronized output).

xterm.js v5 does not support DEC mode 2026.

xterm.js v6 does support it, but is in pre-release right now.

claude, amp, and copilot make use of mouse click events, and as of writing, ghostty-web does not support that, except for clicking links (as I understand it).

4. Construct “AI-suggested changes” by inspecting the diff

In our product, we have to watch the filesystem for changes to .md files. This has many complications, so let’s set that aside and talk only about the problem at hand: when the Markdown text changes, we need to compute the differences, and project them “up” into the EditorState as suggested changes.


So you have your ProseMirror EditorState object, and the AI’s ProseMirror EditorState object. The overall idea is to find a list of transactional Steps that transforms yours into theirs. The simplest solution is to do something like the following.

The Extremely Naïve Solution

Our first cut at this will produce some pretty wacky diffs, but it actually works pretty well like 60% of the time.

Pause the in-browser editor while the AI is at work. (Bad; we will remove this constraint later.)

When the AI is finished, take their .md file and “hydrate” it into a ProseMirror EditorState object.

Compare your EditorState and the AI’s EditorState block-by-block.

For any block that is different (ProseMirror blocks are value-comparable), create a transaction Step that replaces that whole block.

Once you have a sequence of steps, use transformToSuggestionTransaction from @handlewithcare/prosemirror-suggest-changes to turn the steps into a “suggested change.”

Apply the set of steps to your EditorState! In our app, we render these “suggestions” with red and green backgrounds (deletion and insertion, respectively) and a little button that lets you accept the changes.

Our prototype code looked like this (nb., there are probably bugs here!):
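In spirit, the comparison loop is something like this self-contained sketch, which uses simplified stand-ins (a toy Block type and plain step descriptors) for ProseMirror's value-comparable nodes and ReplaceStep:

```typescript
// Stand-ins: `Block` models a ProseMirror block node (value-comparable),
// and `ReplaceBlockStep` models a whole-block ReplaceStep.
interface Block {
  type: string;
  text: string;
}

interface ReplaceBlockStep {
  index: number;              // which block to replace
  replacement: Block | null;  // null = delete the block outright
}

const blockEq = (a: Block | undefined, b: Block | undefined): boolean =>
  !!a && !!b && a.type === b.type && a.text === b.text;

// Walk both documents block-by-block; emit a whole-block replacement for
// every pair that differs. Extra AI blocks append; extra user blocks delete.
function diffBlocks(ours: Block[], theirs: Block[]): ReplaceBlockStep[] {
  const steps: ReplaceBlockStep[] = [];
  const len = Math.max(ours.length, theirs.length);
  for (let i = 0; i < len; i++) {
    if (!blockEq(ours[i], theirs[i])) {
      steps.push({ index: i, replacement: theirs[i] ?? null });
    }
  }
  return steps;
}
```

Real ProseMirror blocks compare with Node.eq, and real Steps carry document positions rather than block indices, but the shape is the same.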

Our production implementation of createStepForBlockChange is reasonably complicated, but the simplest thing to do here is to just always replace the block, which is what we do when the types differ.

The last thing we need to do is turn the Steps into a suggested change, using transformToSuggestionTransaction. We add a nice meta tag to indicate it was an “external” change, for bookkeeping!

The Less Naïve Solution

The previous regimen does work, but has two deep flaws (as well as a bunch of more minor flaws we'll just ignore):

It does not handle one-after-another successive edits from AI agents super well.

We have to pause the editor while the AI is working.

These flaws are certainly fixable, with more effort:

Do not change what the AI sees. If you’re on a filesystem, don’t persist to disk; if your agents run in the cloud, let them work in peace. You can still let users edit the document, but those writes should not make it back to the AI until it's totally done.

Once the agent is done, compute the suggested changes, and “merge them together”. You can use an off-the-shelf collab library like @stepwisehq/prosemirror-collab-commit to do this, or you can do a three-way merge yourself.

Unfortunately, the last step is a “draw the rest of the owl” moment. For us, this is deeply entwined in our collab layer, so we don’t have helpful code to share here.

The idea of prosemirror-collab-commit is to keep track of changes to the editor, and then “automatically rebase” incoming changes. You’d have to use their sendableCommit(aiEditorState) to generate the changes, and then receiveCommitTransaction to apply them to userEditorState.

Another approach is to just store the user’s changes, and apply the incoming AI changes “through” the StepMap.
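In miniature, “mapping through” looks like this (a toy stand-in for ProseMirror's StepMap; assume each user edit replaced oldLen characters at pos with newLen characters):

```typescript
// A user edit: `oldLen` characters at `pos` were replaced by `newLen` characters.
type Edit = { pos: number; oldLen: number; newLen: number };

// Map a position from the AI's (stale) coordinates through the user's
// concurrent edits, in the spirit of ProseMirror's StepMap.map().
function mapThrough(pos: number, edits: Edit[]): number {
  let mapped = pos;
  for (const e of edits) {
    if (mapped <= e.pos) continue;         // before the edit: unchanged
    if (mapped >= e.pos + e.oldLen) {
      mapped += e.newLen - e.oldLen;       // after the edit: shift by the size delta
    } else {
      mapped = e.pos + e.newLen;           // inside the replaced range: snap to its end
    }
  }
  return mapped;
}
```

ProseMirror's real StepMap also tracks deletion bias and association, which matters when the AI and user touch adjacent positions.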

Other considerations

The main one is performance. We have previously written about the lengths we’ve gone to so that the app runs at 60fps, even if you have many collaborators.

There is not an easy way to move the “compute the EditorState diff” work off the render thread. We could have introduced significant complexity by adding another JavaScript process to manage these events, but we judged this to be not worth it. So, the diffing does block the render thread. Empirically, though, users don’t seem to type enough while the AI is editing to notice.

An interlude about filesystems

This is more of an aside, but in Moment’s case, claude, amp, and copilot will deal exclusively with .md files, on your local disk. The Moment Desktop App’s job is to watch for changes to those files and “propagate them up” into the EditorState, as described above. But hidden inside is a nice, miniature coordination problem.

Consider:

Every time a user edits the document in the app, we need to call write_text_file.

How do we know whether we’re overwriting an AI agent’s recent change?

You can read the file and check the contents haven’t changed, but there’s a split second between your read and your write where the agent can write to the file, and then you’d overwrite its change.

You can write your version of the file to scratch, and then right before you call write_text_file, do a read, and then do an OS rename, but you still have the time-of-check to time-of-use (TOCTOU) race.

And you can’t solve this, fundamentally, without having the AI coordinate with the desktop app.

Our solution to this is kind of lame: we check whether the user is running claude, amp, or copilot in the embedded terminal, and halt write_text_file calls until the agent has exited.
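The gating itself is simple. A sketch with hypothetical names (the real app hooks these into the embedded terminal's lifecycle):

```typescript
// Buffer document saves while an agent TUI is running in the embedded
// terminal, and flush the most recent save once it exits.
class WriteGate {
  private agentRunning = false;
  private pending: string | null = null;

  constructor(private writeTextFile: (contents: string) => void) {}

  agentStarted() {
    this.agentRunning = true;
  }

  agentExited() {
    this.agentRunning = false;
    if (this.pending !== null) {
      this.writeTextFile(this.pending);  // flush only the last buffered save
      this.pending = null;
    }
  }

  save(contents: string) {
    if (this.agentRunning) {
      this.pending = contents;           // hold the write until the agent is done
    } else {
      this.writeTextFile(contents);
    }
  }
}
```

Because the document is serialized whole, only the latest pending version matters; intermediate saves can be dropped.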

Conclusions

There are a lot of details missing. We do co-admin a community Discord server called Text Editors Hate You Too if you want to discuss, or have questions.