How I Built a Zotero Plugin with an AI Pair Programmer
I had an itch. My Zotero library, a collection of academic papers I’ve gathered over the years, was full of dead links. It felt messy, and I desperately wanted to clean it up. But the thought of checking each attachment manually was daunting. “How hard could it be,” I thought, “to just build a plugin to do it for me?”
As it turns out, with an AI pair programmer, it’s surprisingly achievable.
In just two days, I went from knowing next to nothing about Zotero plugin development—I’d never seriously coded in TypeScript or built a browser extension before—to having a working plugin that validates attachments, fetches metadata, processes preprints, and even downloads missing files.
The result of this two-day sprint is a plugin I call Zotadata, and you can check it out on GitHub.
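To give a sense of the kind of code the AI was writing for me, here is a minimal sketch of the core dead-link check, assuming the standard Zotero plugin environment (the global Zotero object, getSelectedItems(), getAttachments(), and Zotero.HTTP.request()). It’s an illustration of the approach, not the actual Zotadata source.

```typescript
// Illustrative sketch only: collect the link attachments of the selected items
// and flag URLs that no longer resolve. Error handling is deliberately naive.
declare const Zotero: any; // Provided by the Zotero plugin environment

async function findDeadLinks(): Promise<string[]> {
  const deadUrls: string[] = [];
  const items = Zotero.getActiveZoteroPane().getSelectedItems();

  for (const item of items) {
    if (!item.isRegularItem()) continue;

    for (const attachmentID of item.getAttachments()) {
      const attachment = Zotero.Items.get(attachmentID);
      const url = attachment.getField("url");
      if (!url) continue;

      try {
        // A HEAD request is enough to tell whether the link still resolves;
        // Zotero.HTTP.request rejects on network errors and unexpected statuses.
        await Zotero.HTTP.request("HEAD", url);
      } catch (e) {
        deadUrls.push(url);
      }
    }
  }
  return deadUrls;
}
```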
I call this “Vibe Driven Development.” It’s a new way of working where you, the developer, act as the architect and the project lead, while the AI does the heavy lifting. You don’t have to get your hands dirty with every line of code, but you absolutely need to be the boss who knows what smells right and what’s a dead end.
The Art of Prompting is Real
The term “prompt engineering” gets thrown around a lot, but it’s a real skill. The big AI providers like Anthropic and Google feed their models complex “system prompts” that act as a constitution—guiding their behavior, setting rules, and establishing a persona. For instance, Claude’s system prompt reminds it to be helpful and harmless, and to be cautious about giving advice on sensitive topics like law or finance. These foundational instructions are what separate a useful assistant from a chaotic text generator.
This highlights a crucial truth: for an LLM, context is everything. To get the results you want, you have to be a good coach.
Before writing a single line of code, I sat down with the AI and hammered out a roadmap. This might seem like overkill, but it’s essential. Just like humans, AI models don’t have perfect memory, especially in long conversations. A clear roadmap helps refresh the AI’s “memory” and keeps it on track. It also prevents the model from getting confused by previous, unrelated tangents. I found that starting a new, clean chat session for each major feature, armed with our roadmap, was the most effective way to keep the AI focused.
You’re the Expert, the AI is the Intern
As smart as these models are, they aren’t the source of truth. They can and will make mistakes. Your domain knowledge is the critical feedback loop that keeps the project from going off the rails.
For example, when I tasked the AI with implementing the “Fetch Metadata” feature, its first instinct was to suggest a list of free public APIs. It was a plausible, even clever, solution. But I knew from my own experience that Zotero has a built-in, high-quality translator for fetching metadata via DOIs and ISBNs.
This was a classic case of the AI finding a “good” solution, but not the best one. I course-corrected, instructing it to prioritize Zotero’s native functions and only use the other APIs as a fallback. The result was a feature that was more robust, efficient, and better integrated. You are the senior developer in this relationship; the AI is a brilliant but inexperienced intern. You have to provide the guidance and critical feedback.
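To make that priority order concrete, here is a sketch of what I asked for: try Zotero’s own identifier lookup first, and only then hit a public API such as CrossRef. The Zotero.Translate.Search usage follows Zotero’s documented identifier-lookup pattern, but treat the whole thing as an approximation rather than Zotadata’s actual implementation.

```typescript
// Sketch of "native lookup first, public API as fallback" for a DOI.
declare const Zotero: any; // Provided by the Zotero plugin environment

async function fetchMetadataByDOI(doi: string): Promise<any | null> {
  // 1. Prefer Zotero's built-in translator infrastructure.
  try {
    const search = new Zotero.Translate.Search();
    search.setIdentifier({ DOI: doi });
    const translators = await search.getTranslators();
    if (translators.length) {
      search.setTranslator(translators);
      const [item] = await search.translate();
      if (item) return item;
    }
  } catch (e) {
    // Fall through to the public API below.
  }

  // 2. Fall back to CrossRef's public REST API.
  const resp = await fetch(`https://api.crossref.org/works/${encodeURIComponent(doi)}`);
  if (!resp.ok) return null;
  const data = await resp.json();
  return data.message; // Raw CrossRef record; mapping it onto a Zotero item is omitted here.
}
```

The point is the ordering, not the details: the native path hands back a fully formed Zotero item, while the fallback only gets you raw metadata that you still have to map yourself.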
This is why having a tight feedback loop is non-negotiable. It’s not just one thing; it’s a multi-layered system for catching errors and keeping the AI on track.
- Pragmatic Feedback: Manual First, Automated When Necessary. Ideally, we should all be building projects with robust, automated test suites from day one. I had the AI generate unit tests and verbose logging (see the small logging sketch after this list), which provided a solid foundation. But if I’m being honest, I got a bit lazy. For a project of this scale, the manual feedback loop was often faster and, frankly, good enough. My primary cycle became a rapid build -> install -> test -> repeat. I’d install the plugin in Zotero, try to break it, and the moment I saw something odd, I’d describe it to the AI. This hands-on approach worked well here, but I’m fully aware that for a more complex project, a more disciplined, test-driven framework would be essential to keep things from spiraling out of control.
- Structural Feedback: Small, Atomic Commits. Working with an AI can feel like moving at lightning speed, which makes it easy to create a tangled mess. To counter this, I was disciplined about my workflow. I broke down every feature into the smallest possible tasks. After the AI completed a task and I verified it worked, I immediately made a commit. This protected my work and created clean, isolated changes. If the AI went completely off the rails on the next task, I could always roll back to a known good state without losing much progress.
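For what it’s worth, the “verbose logging” was nothing fancy: a thin wrapper along the lines of the hypothetical helper below, built around Zotero.debug() so every plugin message is easy to find in Zotero’s debug output and paste straight back into the chat.

```typescript
// Hypothetical logging helper; the prefix makes plugin messages easy to spot
// in Zotero's debug output (Help -> Debug Output Logging).
declare const Zotero: any; // Provided by the Zotero plugin environment

const LOG_PREFIX = "[Zotadata]";

function log(message: string, data?: unknown): void {
  const suffix = data !== undefined ? ` ${JSON.stringify(data)}` : "";
  Zotero.debug(`${LOG_PREFIX} ${message}${suffix}`);
}

// Usage: log("Attachment check failed", { url: "https://example.org/paper.pdf", status: 404 });
```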
In this “vibe driven” style of coding, you’re the driver, but your hands are off the wheel most of the time. Your feedback loop—the tests, the logs, the manual checks, and the git history—is your navigation system, your dashboard, and your emergency brake, all rolled into one. It’s what lets you drive fast without crashing.
The Sci-Hub Surprise: A Lesson in AI Psychology
One of the most fascinating interactions was when I wanted to add a feature to download missing papers. These models are quite sensitive to intellectual property concerns, so when I initially made a vague request about “downloading papers from Sci-Hub,” the model politely declined.
Fair enough. But then I re-phrased my request. I asked it to “use the Sci-Hub API to download files for academic papers given a DOI.” This time, it worked perfectly.
The model’s safety training likely triggered on the general concept of “downloading papers,” which could be interpreted as piracy. But the more specific, technical instruction to “use an API” was seen as a standard coding task. It’s a subtle but powerful lesson in how to navigate the ethical guardrails of these systems.
Choosing Your AI Teammate
For this project, I primarily relied on Claude Sonnet 4 and Gemini 2.5 Pro. Both are incredibly powerful, but I found they had different strengths.
In my experience, Claude felt a bit more robust for the back-and-forth of debugging. At one point, Gemini got stuck in a repetitive loop trying to fix a bug. I switched over to Claude with the same context, and it identified the root cause and implemented a fix almost immediately. However, both models were instrumental in getting the project done. Your mileage may vary, and it’s great to have options.
The Future is Fun
I had a blast building this plugin. I could multitask, watch YouTube, or just let my mind wander while my AI partner churned out code.
I don’t think AI will replace experienced engineers anytime soon. Instead, it’s going to supercharge them. The more experience you have, the better you are at guiding the AI, spotting its mistakes, and leveraging its strengths. It changes the job from being a bricklayer to being an architect. And honestly, it’s a lot more fun. Let’s go get the vibe!