simulating readers cause I'm not a user researcher
October 28, 2025
Recently, I built an AI-powered testing tool that simulates different user personas navigating documentation with a defined goal to measure success rates, identify navigation issues, and help find actionable insights for doc teams. I managed to do this in 7.5 hours of screen time, and at no point did I abuse Claude so heavily that I hit my session limits.
My entire thesis was that documentation works great for some users, terrible for others, and we never test this systematically. I've never worked on a doc team with unfettered access to real users. When I did work somewhere with a user research team, they didn't care about docs. We do all kinds of testing on APIs but not the docs that explain them (unless you're using something like Doc Detective to contract test the docs). I decided that I wanted to use AI personas with realistic behaviors to navigate docs, attempt to get information on real tasks, and provide specific feedback on what's missing or structured stupidly.
What I ended up with was pretty cool for the amount of work that it took:
- Total development time: 7.5 hours (Friday night: 3 hours, Saturday: 4.5 hours split up over the day)
- Lines of code: ~1,500
- Uses: Playwright, Anthropic Claude API, Commander.js, dotenv
- Documentation: Comprehensive README with examples, probably the most comprehensive readme I've ever written TBH
- Status: Immediately usable, extensible if you aren't afraid of code
It was built entirely using Claude.ai and VS Code. I didn't use any tools like Copilot or Cursor (not even autocomplete). Just conversational coding and fixes because I really wanted to understand what I was building, and I didn't want Claude to do it for me.
Okay, let's talk more about how I built it.
the opening prompt
I didn't know whether my goal was achievable or not, but I started with this prompt:
I want to build a tool that reads through the docs and tells you whether the reader met a goal. It needs to simulate realistic user journeys through the docs:
- Nervous beginner who rapidly cycles through docs looking for something that will help them understand the basics of something
- Impatient expert who skips to API reference or ctrl + f everything
- Troubleshooter searching for error messages
- Someone who likes to read the docs start to finish before starting a task
The tool should measure whether each persona can meet their stated goal. It should generate a report showing which user types your docs serve well vs. poorly. It should use Claude API to simulate realistic reading patterns and decision making.
Claude responded with its typical "great idea" nonsense, but then laid out a solid technical approach. It said to define personas with specific goals and behaviors, use the LLM to simulate navigation decisions page-by-page, and measure success rates along with path efficiency. It also suggested tracking metrics like time-to-success, dead ends hit, and friction points encountered.
Claude's answer felt like enough of a rough shape to go on, so I started building. I didn't build it exactly like Claude suggested, but its initial suggestion was a solid starting point.
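For context, this is roughly the loop Claude was describing. It's a simplified sketch where fetchPage and askClaude are placeholders for the Playwright and Claude API pieces covered in the build log below, not code from the finished tool:

```js
// Simplified sketch of the suggested architecture. fetchPage and askClaude are placeholders.
async function runJourney(persona, goal, startUrl, { fetchPage, askClaude, maxSteps = 15 }) {
  const journey = { persona: persona.name, goal, steps: [], success: false };
  let url = startUrl;

  for (let step = 0; step < maxSteps; step++) {
    const page = await fetchPage(url);                      // content + links for the current page
    const decision = await askClaude(persona, goal, page);  // follow a link, declare success, or give up

    journey.steps.push({ url, action: decision.action, reasoning: decision.reasoning });
    if (decision.action === 'goal_met') { journey.success = true; break; }
    if (decision.action === 'give_up' || !decision.nextUrl) break;
    url = decision.nextUrl;
  }

  return journey; // success rates, path efficiency, and friction points come from aggregating these
}
```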
timeline: what got built when and what I learned
Like I said, I built this in small chunks of time between Friday night and Saturday evening. I don't have a ton of unbroken time to focus on side projects. In the next few sections, I'll break down how I spent my time and what came of it.
Friday night
My work on Friday night lasted about 3 hours, from ~7pm to 10pm. I go to bed at 10 so I had a hard stop, ha.
hour 1: foundation
- First, I created a Node.js project with Playwright and the Anthropic SDK.
- I built a super basic crawler to extract page content.
- I got the first page loading and content extraction working (imperfect, but definitely a proof-of-concept).
- I created the first persona (Confused Beginner) -- this stuff all lives in personaBehaviors.js now, but at the time it was hardcoded in the main orchestration file.
- I created rudimentary goal definitions. They now live in goals.json, but were initially part of the main orchestration file.
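The "super basic crawler" from that first hour was essentially this. A minimal sketch assuming Playwright's bundled Chromium and a best-guess content selector (the selector handling gets much smarter in hour 6):

```js
// Hour-1-style content extraction sketch (assumes playwright is installed).
const { chromium } = require('playwright');

async function extractPage(url) {
  const browser = await chromium.launch();
  const page = await browser.newPage();
  await page.goto(url, { waitUntil: 'domcontentloaded' });

  // Best-effort guess at where the doc content lives; falls back to <body>.
  const content = await page.evaluate(() => {
    const el = document.querySelector('article, main') || document.body;
    return el.innerText;
  });

  // Grab link text + hrefs so a persona can decide where to go next.
  const links = await page.$$eval('a[href]', (as) =>
    as.map((a) => ({ text: a.innerText.trim(), href: a.href }))
  );

  await browser.close();
  return { url, content, links };
}
```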
hour 2: AI integration
- I integrated the Claude API for decision-making.
- I built the main decision logic (decideNextStep.js) to analyze content and choose actions using the Claude API -- there's a rough sketch of that call after this list.
- I debugged a LOT of JSON parsing and prompt engineering.
- I got the first AI navigation decision working (again, not perfect, but a compelling proof-of-concept).
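The decision logic in decideNextStep.js boils down to one Claude call per page that returns a JSON decision. Here's a rough sketch with the Anthropic SDK -- the prompt wording and model name are illustrative, and pulling clean JSON out of the response is exactly the part I spent so long debugging:

```js
const Anthropic = require('@anthropic-ai/sdk');
const client = new Anthropic(); // reads ANTHROPIC_API_KEY from the environment

async function decideNextStep(persona, goal, pageContent, links) {
  const response = await client.messages.create({
    model: 'claude-sonnet-4-5', // illustrative; use whatever model you like
    max_tokens: 500,
    messages: [{
      role: 'user',
      content: `You are simulating this reader: ${persona.behavior}
Their goal: ${goal}

Page content:
${pageContent}

Available links:
${links.map((l, i) => `${i}: ${l.text}`).join('\n')}

Respond with ONLY JSON: {"action": "follow_link" | "goal_met" | "give_up", "linkIndex": number | null, "reasoning": "..."}`,
    }],
  });

  // The model sometimes wraps the JSON in prose, so pull out the first {...} block before parsing.
  const match = response.content[0].text.match(/\{[\s\S]*\}/);
  if (!match) throw new Error('No JSON found in model response');
  return JSON.parse(match[0]);
}
```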
hour 3: journey system
- I built the initial journey tracking system (this is now in journey.js, and it orchestrates the whole tool).
- I implemented link extraction and filtering (this is now done by linkExtractor.js -- sketched after this list).
- I ran the first successful end-to-end test, confirming the concept works. I took some notes on things I wanted to work on the next day.
- Went to bed, but couldn't stop thinking about it.
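The link extraction and filtering that became linkExtractor.js started out as something like this sketch: keep same-site doc links, drop same-page anchors, duplicates, and obviously-not-docs pages:

```js
// Rough sketch of link filtering: keep in-site doc links, drop obvious noise.
function filterLinks(links, currentUrl) {
  const origin = new URL(currentUrl).origin;
  const seen = new Set();

  return links.filter(({ text, href }) => {
    if (!text || !href) return false;
    let url;
    try { url = new URL(href, currentUrl); } catch { return false; }

    if (url.origin !== origin) return false;                                     // stay on the docs site
    if (url.hash && url.pathname === new URL(currentUrl).pathname) return false; // same-page anchors
    if (/\b(login|signup|pricing|blog|careers)\b/i.test(url.pathname)) return false; // probably not docs

    const key = url.origin + url.pathname;
    if (seen.has(key)) return false; // dedupe
    seen.add(key);
    return true;
  });
}
```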
Saturday
Saturday, I spent about 4.5 hours messing with this project. I woke up, worked on it for about two hours, went to run some errands, and wrapped up in the afternoon. I put some finishing touches on it in the evening.
The first thing I actually did was refactor the whole thing into modular files. I initially hadn't been sure that this project would work so I actually just had everything in a few files. I know, For SHAME. Once it was refactored, I spun up a quick little CLI that uses Commander, and from here on out, the CLI evolved with the rest of the system.
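The CLI layer stays thin. Here's a sketch of the Commander setup -- the flags are representative of the kind of options you'd want (start URL, persona, goal), not necessarily the exact ones my tool exposes:

```js
#!/usr/bin/env node
// Sketch of a Commander-based CLI for kicking off a simulated journey.
const { program } = require('commander');
require('dotenv').config(); // loads ANTHROPIC_API_KEY from .env

program
  .name('doc-persona-test')
  .description('Simulate personas navigating docs toward a goal')
  .requiredOption('--url <url>', 'documentation start URL')
  .option('--persona <name>', 'persona to simulate', 'confused-beginner')
  .option('--goal <id>', 'goal id from goals.json')
  .option('--max-steps <n>', 'maximum pages to visit', '15')
  .action(async (opts) => {
    console.log(`Running ${opts.persona} against ${opts.url} (goal: ${opts.goal})`);
    // ...hand off to the journey orchestrator here
  });

program.parseAsync();
```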
hour 4: dialing in the persona system
The personas are pretty much the most important part of this tool, but I was having a really hard time making them work the way I wanted to. This was the darkest hour of the build, not to be dramatic or anything.
- I added 3 more personas (Efficient Developer, Methodical Learner, Desperate Debugger) -- I'm still not sure the desperate debugger is a worthwhile persona for this kind of testing cause there's a 100% chance they'd just use the site search with their error message.
- I built the link prioritization algorithms (each persona ranks links differently -- for example, beginners like tutorials and experts love references).
- I implemented loop detection so the run would quit if the "reader" hit the same link three times without solving the task (don't want to waste those tokens, you know).
- I tested that the personas showed distinctly different navigation patterns -- it wasn't perfect though and I really struggled a bit spiritually here because personas are hard. This part of the project made me want to quit.
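The loop detection is the least glamorous piece of hour 4, but it's what keeps a confused persona from burning tokens on the same three pages forever. Roughly:

```js
// Sketch of loop detection: bail out if the reader keeps revisiting the same page.
function isStuckInLoop(visitedUrls, maxRepeats = 3) {
  const counts = {};
  for (const url of visitedUrls) {
    counts[url] = (counts[url] || 0) + 1;
    if (counts[url] >= maxRepeats) return true;
  }
  return false;
}

// In the journey loop:
//   if (isStuckInLoop(journey.steps.map((s) => s.url))) break;
```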
hour 5: content strategy innovation plus a breakthrough moment
- I realized personas shouldn't just prefer different content -- they should also read it differently.
- I built progressive content disclosure for beginners (preview → full content if uncertain) -- there's a sketch of this after the list. This was partially to save tokens, but also to keep things fast. My idea here was that the intro paragraph should give the reader enough to go on to decide whether or not the rest of the page is going to help them.
- I then implemented keyword extraction and targeted content for experts (simulates Ctrl+F because developers don't read :P).
- I created full-always mode for methodical learners.
- I also added token usage tracking because I'm doing this on my own dime and I'm stingy with the tokens.
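Here's the gist of the progressive disclosure mentioned above. It assumes a decideNextStep-style helper (like the one from hour 2) plus an extra "uncertain" action in the prompt -- the wiring is a sketch of the idea, not a paste from the tool:

```js
// Sketch of progressive disclosure for the beginner persona: try a 1500-char preview first,
// and only spend tokens on the fuller content if the model says it's uncertain.
async function readLikeABeginner(persona, goal, page) {
  const preview = page.content.slice(0, 1500);
  let decision = await decideNextStep(persona, goal, preview, page.links);

  if (decision.action === 'uncertain') {
    // The intro wasn't enough to judge the page -- load the full 5000 chars and ask again.
    const fullContent = page.content.slice(0, 5000);
    decision = await decideNextStep(persona, goal, fullContent, page.links);
  }
  return decision;
}
```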
hour 6: site configuration framework
I had hit a wall with the various ways different doc sites bury the content in elements -- every site has a different HTML structure.
- Stripe uses <article id="content">
- Anthropic uses <div id="content">
- Some sites use <main>, others use <div class="docs-content">
- Navigation menus are in <nav>, <aside class="sidebar">, <div class="left-nav">, etc.
I knew I couldn't hardcode selectors for every single site on the planet, but I also couldn't rely on universal extraction alone. I decided to go with configuration over code here, and made a lightweight and extensible way to figure out and tell the tool where to look for the doc content.
The universal content extractor was doing its best with semantic HTML (<article>, <main>), but I wanted my tool to be extensible and not brittle, so I did more work:
- I built the site configuration system (sites.json) -- an example entry follows this list.
- I created an inspector tool to analyze any documentation site (inspector.js).
- I bulked up the universal content extraction with smart fallbacks.
- I was finally able to test successfully on multiple different doc sites.
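An entry in sites.json ends up looking roughly like this. The Stripe and Anthropic content selectors are the ones mentioned above; the key names, nav selectors, and fallback list are illustrative of the idea rather than my exact schema:

```json
{
  "stripe": {
    "baseUrl": "https://docs.stripe.com",
    "contentSelector": "article#content",
    "navSelector": "nav"
  },
  "anthropic": {
    "baseUrl": "https://docs.anthropic.com",
    "contentSelector": "div#content",
    "navSelector": "aside.sidebar"
  },
  "_default": {
    "contentSelectors": ["article", "main", "div.docs-content", "body"],
    "navSelectors": ["nav", "aside.sidebar", "div.left-nav"]
  }
}
```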
hour 7: feedback generation
A critical feature for making the tool useful is the feedback it provides. I needed to know what to fix or improve whether a test failed or succeeded. I deferred to Claude here for the main design decisions. I knew what I wanted but didn't have strong opinions about how it needed to be done.
- I added failure feedback mechanisms to help me understand why a reader failed to meet its goal.
- I added success feedback that evaluates how well the content style matches the reader.
- I added an aggregate reporting system for grouped test runs.
- I did the final implementation of the token cost estimation.
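The token cost estimation is the least interesting code in the project: count input and output tokens from each API response and multiply by per-token rates. A sketch, with the rates left as placeholders since they depend on which Claude model you run:

```js
// Sketch of token cost tracking. Rates are placeholders -- set them to the published
// per-million-token prices for whatever Claude model you're using.
const RATES = { inputPerMTok: 3.0, outputPerMTok: 15.0 };

const usage = { inputTokens: 0, outputTokens: 0 };

function trackUsage(response) {
  // The Anthropic SDK reports usage on each message response.
  usage.inputTokens += response.usage.input_tokens;
  usage.outputTokens += response.usage.output_tokens;
}

function estimatedCostUSD() {
  return (
    (usage.inputTokens / 1_000_000) * RATES.inputPerMTok +
    (usage.outputTokens / 1_000_000) * RATES.outputPerMTok
  );
}
```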
hour 8: added polish
- I mostly just added a spit-shine to the CLI.
- I added URL validation and security fixes.
- I created .env.example and a proper .gitignore because I realized that I'd committed my dang node_modules.
- I wrapped up and polished the comprehensive README.
ship it
And that's that. It was a pretty fun project, and I learned a lot about site crawling and the Anthropic APIs. It's also usable and has already given me some cool insights at work -- stuff I wouldn't have noticed without a user research team watching real humans struggle to navigate the docs.
what makes this special to me
There's a lot of stuff that makes this project kinda special, I think. But that might be because I'm proud of it? IDK. Here are the parts that make me feel most clever:
1. persona-specific reading modes
Different users don't just prefer different content formats; they also consume content in fundamentally different ways. I am proud of the three distinct reading modes I came up with:
- Progressive disclosure (Beginners): Start with 1500-char preview, load full content (5000 chars) only if Claude indicates uncertainty. Saves tokens while providing context when needed.
- Keyword search (Experts): Extract 1000-char sections around goal-related keywords, simulating Ctrl+F behavior. Experts scan for specific info, not sequential reading.
- Full-always (Methodical): Always load maximum content (5000 chars). Reflects actual thorough reading behavior.
These reading modes lead to 50% token savings for beginners, more realistic behavior simulation, and better (more realistic) success rates.
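The keyword-search mode is basically simulated Ctrl+F. Here's a sketch of the idea: pick out goal-related keywords, then hand Claude only the ~1000-character windows around each hit instead of the whole page (a sketch of the approach, not a paste from the tool):

```js
// Sketch of the expert "Ctrl+F" reading mode: extract windows of text around keyword hits.
function extractKeywordSections(content, keywords, windowSize = 1000) {
  const sections = [];
  const lower = content.toLowerCase();

  for (const keyword of keywords) {
    const idx = lower.indexOf(keyword.toLowerCase());
    if (idx === -1) continue;

    const start = Math.max(0, idx - windowSize / 2);
    sections.push(content.slice(start, start + windowSize));
  }

  // If nothing matched, fall back to the top of the page.
  return sections.length ? sections.join('\n---\n') : content.slice(0, windowSize);
}
```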
2. feedback system
Success doesn't mean good UX. Accidentally finding an answer in a format that's difficult for a reader to use is still a documentation problem. Beginners are wired to prefer the hand-holdy resources like tutorials, and experts are wired to prefer things like API references and implementation guides with code examples.
Success Feedback: Even when tests succeed, the tool evaluates whether the content type matched persona preferences:
- "Perfect" - Ideal format for this persona
- "Acceptable" - Found answer but would prefer different format
- "Poor" - Answer exists but format is frustrating
I can use this information to fill gaps in content strategy for different kinds of readers.
Failure Feedback: When tests fail, Claude analyzes the complete journey to identify:
- What navigation problem occurred
- What specific content is missing
- Actionable recommendation for doc teams
- User impact assessment
The tool provides actionable insights regardless of outcome. I get recommendations for specific fixes, not just "this failed, good luck loser".
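To give a feel for it, here's roughly the shape of result a single run produces. The field names are illustrative rather than my exact schema, but the categories are the ones described above:

```json
{
  "persona": "Confused Beginner",
  "goal": "Set up authentication for the API",
  "success": true,
  "contentMatch": "Acceptable",
  "feedback": {
    "navigationProblem": "Answer was only reachable via the API reference, not the getting-started guide",
    "missingContent": "A beginner-oriented auth tutorial with a copy-pasteable example",
    "recommendation": "Add an authentication section to the quickstart and link it from the nav",
    "userImpact": "Beginners eventually find the answer, but in a format that assumes prior API knowledge"
  }
}
```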
3. smart link prioritization by persona
Each persona reorders links based on their preferences:
- Beginners: Heavily favor tutorials, guides, "getting started" (+50 score), penalize API references (-20)
- Experts: Heavily favor API references (+100), penalize tutorials (-30), simulate jumping to reference docs
- Methodical Learners: Don't reorder; they respect document structure and read sequentially
- Debuggers: Favor troubleshooting, error pages, FAQs (+60), search for error-related content
This behavior is designed to mimic how real users actually navigate docs.
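The scoring above translates into something like this. A minimal sketch using the weights from the list; the real keyword lists in personaBehaviors.js are longer and fuzzier than these regexes:

```js
// Sketch of persona-specific link scoring using the weights described above.
const LINK_WEIGHTS = {
  'Confused Beginner':   [{ pattern: /tutorial|guide|getting started/i, score: 50 },
                          { pattern: /api reference/i, score: -20 }],
  'Efficient Developer': [{ pattern: /api reference/i, score: 100 },
                          { pattern: /tutorial/i, score: -30 }],
  'Desperate Debugger':  [{ pattern: /troubleshoot|error|faq/i, score: 60 }],
};

function prioritizeLinks(personaName, links) {
  const weights = LINK_WEIGHTS[personaName];
  if (!weights) return links; // methodical learners keep the document's order

  return links
    .map((link) => ({
      ...link,
      score: weights.reduce((sum, w) => sum + (w.pattern.test(link.text) ? w.score : 0), 0),
    }))
    .sort((a, b) => b.score - a.score);
}
```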
conclusion
This is the most ambitious/complete single thing I've ever built. It's not perfect but it works really well for what I need it for. It's also well-documented and you can extend it for whatever your use case is.
Be sure to check it out on GitHub; the README is comprehensive.