Under the hood

How the analysis works — no AI involved*

Every number, pattern, and observation Scribbly surfaces is produced by deterministic text analysis: counting, measuring, and comparing. There is no machine learning, no language model, and no external server involved. This page explains exactly what the code does to your manuscript text and how each insight is produced.

AI Disclosure*

Scribbly uses absolutely no AI in any part of the app, but Claude Code was used to write Scribbly's code. The goal was to create an app that is open-source, free forever, and gives detailed analysis to authors who want to improve their manuscripts without tools that keep embedding generative AI into their workflows.

With that said, we want to be clear up front about how AI was used to make this app, so you can decide for yourself whether it is a tool you want in your writing process.

Because AI was used to build the app, there are bugs, and it is not optimized. But for authors who want a free text-analysis tool that runs locally and doesn't rely on generative AI, we hope Scribbly can be as helpful to you as it is to us.

On this page

1. How text becomes data
2. Tension scoring
3. Sentence rhythm
4. Dialogue ratio
5. Character presence
6. Character co-occurrence graph
7. Thread & act detection
8. Repetition detection
9. Plot anchors & story architecture
10. Semantic map & annotation matching
11. Version comparison
12. Where the computation happens

How text becomes data

When you open a manuscript, Scribbly extracts its full text and splits it into chapters — first by looking for headings that match the pattern Chapter N or Chapter [Roman numeral], then by triple line breaks or scene-break markers (* * *, ---) if no headings are found. Each chapter becomes an object with a title and a text body.
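A minimal sketch of that splitting order follows. It assumes each heading sits on its own line; the names splitChapters, HEADING_RE and BREAK_RE, the exact regexes, and the placeholder "Section N" titles are illustrative, not Scribbly's actual code.

```javascript
// Hypothetical sketch: split manuscript text into { title, text } chapter objects.
// Assumes a heading sits on its own line, e.g. "Chapter 3" or "Chapter XII".
const HEADING_RE = /^[ \t]*chapter[ \t]+(?:\d+|[ivxlc]+)\b.*$/gim;
// Fallback separators: three line breaks, or a "* * *" / "---" scene-break line.
const BREAK_RE = /\n\s*\n\s*\n|\n\s*(?:\*\s*\*\s*\*|-{3,})\s*\n/;

function splitChapters(text) {
  const headings = [...text.matchAll(HEADING_RE)];
  if (headings.length > 0) {
    // Each chapter body runs from the end of its heading to the next heading.
    return headings.map((m, i) => {
      const bodyStart = m.index + m[0].length;
      const bodyEnd = i + 1 < headings.length ? headings[i + 1].index : text.length;
      return { title: m[0].trim(), text: text.slice(bodyStart, bodyEnd).trim() };
    });
  }
  // No headings found: split on triple line breaks or scene-break markers instead.
  return text
    .split(BREAK_RE)
    .map((part) => part.trim())
    .filter((part) => part.length > 0)
    .map((part, i) => ({ title: `Section ${i + 1}`, text: part })); // placeholder titles
}
```

In this sketch, any text before the first heading (front matter, epigraphs) is dropped.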

A shared tokenizer strips possessives ('s and ’s) and extracts every alphabetic token alongside its position in the text as a fraction from 0 to 1. This positional value is used throughout the analysis to place events proportionally across the manuscript rather than by word count or page number.
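A minimal tokenizer sketch, assuming position is measured as a character offset divided by the text length (the page does not say whether characters or words are used); tokenize is a hypothetical name. A plain regex cannot tell a possessive from a contraction, so "it's" is also reduced to "it".

```javascript
// Hypothetical sketch of a shared tokenizer: strip possessive 's / ’s, then return
// every alphabetic token with its position as a 0–1 fraction of the text.
function tokenize(text) {
  const cleaned = text.replace(/['’]s\b/g, "");
  const tokens = [];
  // \p{L} matches letters in any script, so accented names stay whole.
  for (const m of cleaned.matchAll(/\p{L}+/gu)) {
    tokens.push({ word: m[0], pos: m.index / cleaned.length });
  }
  return tokens;
}
```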

Everything runs in your browser

All of the functions described on this page are executed in JavaScript inside your browser tab. No text is sent to any server. The computation uses only standard browser APIs — regular expressions, array operations, and basic arithmetic.

Tension scoring

Tension is not detected by reading for meaning. It's a composite of five measurable prose signals, each computed independently and then weighted together. The idea is that high-tension writing tends to produce physically different text — shorter, more erratic sentences; more punctuation; shorter paragraphs — regardless of subject matter.

30% weight

Sentence length variance

The statistical variance in word count across all sentences in the chapter. A mix of very short and very long sentences — jagged rhythm — scores higher than uniform prose. Computed as the mean squared deviation from average sentence length.
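A sketch of that calculation, given the word count of each sentence. The name sentenceLengthVariance is hypothetical, and population variance (dividing by the number of sentences) is assumed.

```javascript
// Mean squared deviation of sentence lengths (in words) from their average.
function sentenceLengthVariance(lengths) {
  if (lengths.length === 0) return 0;
  const mean = lengths.reduce((sum, n) => sum + n, 0) / lengths.length;
  return lengths.reduce((sum, n) => sum + (n - mean) ** 2, 0) / lengths.length;
}
```

For example, uniform lengths [12, 12, 12] give 0, while a jagged pair [2, 20] gives 81.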

25% weight

Short sentence clustering

Counts sentences under 6 words and rewards runs of them over scattered instances. A run of three short sentences scores 1+2+3=6; three isolated short sentences score 1+1+1=3. This captures the "staccato pursuit" rhythm that readers feel as urgency.
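The run-based scoring can be sketched as follows, assuming sentence lengths in words are already available; the function name and the threshold parameter are hypothetical.

```javascript
// Each short sentence (fewer than `threshold` words) scores its position within
// the current run of consecutive short sentences, so runs outscore scattered hits.
function shortSentenceClusterScore(lengths, threshold = 6) {
  let score = 0;
  let run = 0;
  for (const n of lengths) {
    if (n < threshold) {
      run += 1;
      score += run;
    } else {
      run = 0; // a longer sentence breaks the run
    }
  }
  return score;
}
```

With lengths [3, 2, 4] (one run of three) the score is 6; with [3, 12, 2, 15, 4] (three isolated short sentences) it is 3.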

25% weight

Punctuation intensity

Counts exclamation marks, question marks, dashes (em-dash —, double hyphen --, and en-dash –), and ellipses. Exclamation and question marks count at full weight, dashes at 0.7×, and ellipses at 0.5×. The weighted total is divided by the chapter's word count to normalise for chapter length.
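A sketch of the weighted count, assuming both "..." and the single "…" character count as ellipses and that the caller supplies the word count; punctuationIntensity is a hypothetical name.

```javascript
// Weighted punctuation count divided by word count.
// Weights: ! and ? = 1, dashes (—, –, --) = 0.7, ellipses (... or …) = 0.5.
function punctuationIntensity(text, wordCount) {
  if (wordCount === 0) return 0;
  const count = (re) => (text.match(re) || []).length;
  const bangsAndQuestions = count(/[!?]/g);
  const dashes = count(/—|–|--/g);
  const ellipses = count(/\.{3}|…/g);
  return (bangsAndQuestions + 0.7 * dashes + 0.5 * ellipses) / wordCount;
}
```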

20% weight

Paragraph brevity

The chapter is split on double line breaks. The average paragraph word count is computed, then inverted: 1 − (avg / 300), clamped at 0. Short, punchy paragraphs score near 1; long, flowing paragraphs score near 0, and paragraphs averaging 300 words or more score exactly 0. Chapters built from one-sentence paragraphs land close to full marks: an average of 15 words per paragraph gives 1 − 15/300 = 0.95.
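A sketch of the brevity signal, assuming words are counted by splitting on whitespace and that an empty chapter scores 0; paragraphBrevity is a hypothetical name.

```javascript
// Split on blank lines, average the words per paragraph, then invert:
// 1 - avg / 300, clamped at 0 (short paragraphs approach 1).
function paragraphBrevity(text) {
  const paragraphs = text.split(/\n\s*\n/).map((p) => p.trim()).filter(Boolean);
  if (paragraphs.length === 0) return 0;
  const wordCounts = paragraphs.map((p) => p.split(/\s+/).length);
  const avg = wordCounts.reduce((a, b) => a + b, 0) / paragraphs.length;
  return Math.max(0, 1 - avg / 300);
}
```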

5% weight

Conflict keyword frequency

A hand-curated list of ~130 conflict and intensity words (fight, betrayal, desperate, scream, death, etc.) is checked against the chapter tokens. The number of matches is divided by the chapter's word count. This signal is intentionally a small bonus: keyword matching alone is a poor proxy for tension in literary fiction, which is why it contributes only 5%.
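A sketch of the keyword rate. CONFLICT_WORDS here is a five-word stand-in for the real ~130-word list, and the case-insensitive comparison is an assumption.

```javascript
// Hypothetical stand-in for the curated conflict/intensity word list.
const CONFLICT_WORDS = new Set(["fight", "betrayal", "desperate", "scream", "death"]);

// Conflict keyword hits divided by the number of tokens in the chapter.
function conflictKeywordRate(tokens) {
  if (tokens.length === 0) return 0;
  const hits = tokens.filter((t) => CONFLICT_WORDS.has(t.toLowerCase())).length;
  return hits / tokens.length;
}
```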

Each raw score except paragraph brevity (already on a 0–1 scale) is normalised to 0–1 against its manuscript-wide maximum before weighting, so what you see in the timeline reflects relative differences across your chapters, not an absolute scale. A chapter only scores "high" if it has more of these signals than your other chapters do.

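```javascript
// Final tension score for each chapter. Each max* value is the manuscript-wide
// maximum of that signal; brevity is already on a 0–1 scale.
const score =
  (variance / maxVariance) * 0.30 +
  (shortCluster / maxCluster) * 0.25 +
  (punctuation / maxPunct) * 0.25 +
  brevity * 0.20 +
  (keywords / maxKeywords) * 0.05;
```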
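Applied across a whole manuscript, the normalise-then-weight step might look like the sketch below. It assumes each chapter's raw signals are already computed and that a signal whose maximum is 0 contributes nothing; tensionScores is a hypothetical name. Note that the listed weights add up to 1.05, so a chapter that tops every signal scores 1.05 rather than 1.

```javascript
// Weights as listed above (they total 1.05).
const WEIGHTS = { variance: 0.3, shortCluster: 0.25, punctuation: 0.25, brevity: 0.2, keywords: 0.05 };
const NORMALISED = ["variance", "shortCluster", "punctuation", "keywords"];

// chapters: array of raw signals { variance, shortCluster, punctuation, brevity, keywords }.
function tensionScores(chapters) {
  // Manuscript-wide maximum of each signal that needs normalising.
  const max = {};
  for (const key of NORMALISED) max[key] = Math.max(0, ...chapters.map((c) => c[key]));
  return chapters.map((c) => {
    let score = c.brevity * WEIGHTS.brevity; // brevity is already 0–1
    for (const key of NORMALISED) {
      if (max[key] > 0) score += (c[key] / max[key]) * WEIGHTS[key];
    }
    return score;
  });
}
```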

Sentence rhythm

For each chapter, Scribbly splits the text into sentences using a regex that matches text followed by ., !, or ?. It then computes the average sentence length in words for that chapter.

The manuscript-wide mean and standard deviation of average sentence lengths are then calculated. Any chapter whose average sentence length deviates more than 1.2 standard deviations from the manuscript mean is flagged as a rhythm outlier — a chapter that reads noticeably differently from the rest of the book. This can flag a chapter with unexpectedly long, meandering prose in an otherwise punchy manuscript, or an unusually clipped chapter in a slow-burn literary novel.
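A sketch of the outlier check. It assumes population standard deviation and a sentence regex that ignores any trailing text without closing punctuation; both function names are hypothetical.

```javascript
// Average words per sentence; a sentence is text ending in . ! or ?
function averageSentenceLength(text) {
  const sentences = text.match(/[^.!?]+[.!?]+/g) || [];
  const lengths = sentences
    .map((s) => s.trim().split(/\s+/).filter(Boolean).length)
    .filter((n) => n > 0);
  if (lengths.length === 0) return 0;
  return lengths.reduce((a, b) => a + b, 0) / lengths.length;
}

// Flag chapters whose average deviates more than `threshold` standard deviations
// from the manuscript mean of chapter averages.
function rhythmOutliers(chapterTexts, threshold = 1.2) {
  const avgs = chapterTexts.map((t) => averageSentenceLength(t));
  if (avgs.length === 0) return [];
  const mean = avgs.reduce((a, b) => a + b, 0) / avgs.length;
  const sd = Math.sqrt(avgs.reduce((sum, a) => sum + (a - mean) ** 2, 0) / avgs.length);
  if (sd === 0) return []; // every chapter has the same average
  return avgs
    .map((avg, index) => ({ index, avg, z: (avg - mean) / sd }))
    .filter((c) => Math.abs(c.z) > threshold);
}
```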

Dialogue ratio

Scribbly counts characters inside quotation marks (both straight " " and curly “ ”) using a regex capped at 600 characters per match to avoid swallowing adjacent scenes. The total character count inside quotes is divided by the total character count of the chapter to produce a ratio between 0 and 1.

A chapter with a dialogue ratio above 0.55 is classified as dialogue-heavy. This threshold is used in the Manuscript Timeline and in cross-signal observations — for example, when a high-tension chapter has almost no dialogue, Scribbly surfaces that as a structural note worth examining.
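A sketch of the ratio and the dialogue-heavy check, assuming the quote marks themselves are included in the quoted character count; QUOTE_RE, dialogueRatio and isDialogueHeavy are hypothetical names. A quoted span longer than 600 characters cannot match as a whole, which limits how far an unclosed quote can reach.

```javascript
// Straight "..." or curly “...” quotes, with at most 600 characters between the marks.
const QUOTE_RE = /"[^"]{0,600}"|“[^”]{0,600}”/g;

// Characters inside quotes divided by the chapter's total character count.
function dialogueRatio(text) {
  if (text.length === 0) return 0;
  let quoted = 0;
  for (const m of text.matchAll(QUOTE_RE)) quoted += m[0].length;
  return quoted / text.length;
}

const isDialogueHeavy = (text) => dialogueRatio(text) > 0.55;
```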

Character presence

Character tracking works entirely from the names and aliases you define in Manuscript Setup. Scribbly does not attempt to infer characters automatically — you tell it who matters, and it counts them.

For each defined character, every name and alias is searched across the full manuscript text using a case-insensitive word-boundary regex. Each match records its positional location (0–1 across the full text). From the full list of positions, Scribbly computes:

First appearance — the lowest position value (earliest in text)
Last appearance — the highest position value (latest in text)
Peak activity — the centre of a 10%-wide sliding window that contains the most matches
Total count — total number of name/alias matches across the manuscript

A character whose last appearance is before 60% of the manuscript is flagged as a possible dropped thread. A character who first appears after 45% of the manuscript but has 15 or more mentions is flagged as potentially needing earlier seeding. These are observations, not prescriptions — a late-arriving character may be intentional.
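A sketch of the per-character summary. It assumes positions are character offsets divided by text length and that the 10% peak window slides in 1% steps (the step size is not stated above); characterPresence and escapeRe are hypothetical names.

```javascript
// Escape regex metacharacters so aliases like "Dr. Vasquez" match literally.
const escapeRe = (s) => s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");

function characterPresence(text, aliases) {
  // Case-insensitive, word-boundary match on any defined name or alias.
  const pattern = new RegExp(`\\b(?:${aliases.map(escapeRe).join("|")})\\b`, "gi");
  const positions = [...text.matchAll(pattern)].map((m) => m.index / text.length);
  if (positions.length === 0) return null;

  // Peak activity: centre of the 10%-wide window holding the most matches.
  let peak = 0.05;
  let best = -1;
  for (let i = 0; i <= 90; i++) {
    const start = i / 100;
    const hits = positions.filter((p) => p >= start && p < start + 0.1).length;
    if (hits > best) {
      best = hits;
      peak = start + 0.05;
    }
  }

  const first = Math.min(...positions);
  const last = Math.max(...positions);
  return {
    first,
    last,
    peak,
    total: positions.length,
    possiblyDropped: last < 0.6,
    needsEarlierSeeding: first > 0.45 && positions.length >= 15,
  };
}
```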

Why aliases matter

If your protagonist is referred to as "Lena," "Dr Vasquez," and "she" across different chapters, only the first two are trackable by name; pronouns are too ambiguous to attribute reliably. Defining all aliases in Manuscript Setup ensures every named reference is captured. Any name you leave undefined is not counted, so the character will be undercounted.

Character co-occurrence graph

The connection graph is built by scanning the manuscript in sliding windows of 500 words, stepping forward 100 words at a time. Within each window, Scribbly identifies all capitalised proper nouns that appear at least 5 times in the manuscript and in at least 3 distinct chapters (a filter intended to screen out sentence-starting common words like "Instead" or "Maybe" that happen to be capitalised).

When two character names appear in the same 500-word window, their co-occurrence counter is incremented. After the full scan, character pairs with at least 2 co-occurrences are included as edges in the graph. Node size encodes total appearance count; edge weight encodes shared-scene frequency.
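A sketch of the window scan over a list of word tokens, assuming the candidate names have already passed the 5-mention, 3-chapter filter; coOccurrenceEdges and its parameters are hypothetical.

```javascript
// Slide a window over the tokens, count each pair of names found together,
// and keep pairs seen in at least `minCount` windows.
function coOccurrenceEdges(words, names, windowSize = 500, step = 100, minCount = 2) {
  const nameSet = new Set(names);
  const counts = new Map();
  for (let start = 0; start < words.length; start += step) {
    const window = words.slice(start, start + windowSize);
    const present = [...new Set(window.filter((w) => nameSet.has(w)))].sort();
    for (let i = 0; i < present.length; i++) {
      for (let j = i + 1; j < present.length; j++) {
        const key = `${present[i]}|${present[j]}`;
        counts.set(key, (counts.get(key) || 0) + 1);
      }
    }
    if (start + windowSize >= words.length) break; // last window reached the end
  }
  return [...counts]
    .filter(([, n]) => n >= minCount)
    .map(([key, weight]) => {
      const [a, b] = key.split("|");
      return { a, b, weight };
    });
}
```

Because 500-word windows advance 100 words at a time, two names that share one passage are usually counted in several consecutive windows, so the edge weight measures overlapping windows rather than distinct scenes.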

A thick line between two characters means they are present in the same passages frequently — it does not necessarily mean they interact directly, only that their names appear close together in the text. Thinner lines indicate more incidental proximity.

Thread & act detection

The Thread Tracker operates in two modes. Any word or phrase you add manually is tracked by exact word-boundary regex across every chapter, and its positions are reported as a sparkline. Scribbly additionally runs an automated scan for threads that may have structural problems.

In the automated scan, the manuscript is divided into three equal act segments. Scribbly counts how often each significant word appears in each act — using capitalised proper nouns and lowercase words of 4+ characters, after filtering stop words. Only terms that appear at least 12 times total and in at least 3 distinct chapters are examined. Four distribution patterns are flagged:

| Pattern | What it means | Severity |
|---|---|---|
| Unresolved | Heavy in Acts 1–2, nearly absent in Act 3. Appears in the early and middle parts of the book but drops off before the end, so it may lack payoff. | Warn: flag for review |
| Underdeveloped | Barely present in Acts 1–2, concentrated in Act 3. Something important to the ending that wasn't seeded early enough. | Warn: flag for review |
| Dropped | Heavy in Act 1 only. Prominent at the start but quietly disappears; it may have been introduced and then abandoned. | Warn: flag for review |
| Isolated | Concentrated in Act 2 only. Could be an intentional subplot, or a thread that never connects to the larger arc. | Info: worth examining |

A term is only flagged as unresolved if Acts 1+2 contain more than 4× what Act 3 contains, and Act 3 represents less than 15% of total mentions. The ratios are calibrated to avoid flagging gradual fade-outs, which are often intentional.
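A sketch of the per-act count and the unresolved rule, assuming acts are split by a term's 0–1 positions. The thresholds for the other three patterns are not given above, so only the unresolved check is shown; the function names are hypothetical.

```javascript
// Split a term's 0–1 positions into three equal acts and count mentions in each.
function actCounts(positions) {
  const acts = [0, 0, 0];
  for (const p of positions) acts[Math.min(2, Math.floor(p * 3))] += 1;
  return acts;
}

// "Unresolved": Acts 1+2 hold more than 4x Act 3's mentions, and Act 3 holds under
// 15% of the total. Terms must first clear the 12-mention, 3-chapter eligibility bar.
function isUnresolved(positions, distinctChapters) {
  if (positions.length < 12 || distinctChapters < 3) return false;
  const [act1, act2, act3] = actCounts(positions);
  const total = act1 + act2 + act3;
  return act1 + act2 > 4 * act3 && act3 / total < 0.15;
}
```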

Repetition detection

Scribbly scans the full manuscript text in a sliding window of 500 words, stepping 50 words at a time. Inside each window, it counts how often each word of 4+ characters (excluding stop words) appears. Any word that appears 4 or more times within a single 500-word stretch is logged as a potential repetition, along with the position in the manuscript where it occurs.

The result is a list of up to 50 terms, sorted by how many windows flagged them. A word at the top of this list was overused in many windows, often in several different stretches of the manuscript, which makes it a pattern worth reading closely; because consecutive windows overlap by 450 words, a single dense cluster can also be flagged several times. A word flagged in only one window is a localised repetition: potentially a stylistic choice or an isolated editing oversight.
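A sketch of the repetition scan. STOP_WORDS is a short stand-in list, words are lowercased before counting (an assumption), and repetitionFlags and its parameters are hypothetical names.

```javascript
// Hypothetical stand-in for a real stop-word list.
const STOP_WORDS = new Set(["that", "with", "this", "have", "from", "were", "they", "what"]);

// Flag words of 4+ letters used `minHits`+ times inside any window; return up to
// `limit` terms sorted by how many windows flagged them.
function repetitionFlags(words, windowSize = 500, step = 50, minHits = 4, limit = 50) {
  const flagged = new Map(); // word -> { windows, positions }
  for (let start = 0; start < words.length; start += step) {
    const counts = new Map();
    for (const w of words.slice(start, start + windowSize)) {
      const lw = w.toLowerCase();
      if (lw.length >= 4 && !STOP_WORDS.has(lw)) counts.set(lw, (counts.get(lw) || 0) + 1);
    }
    for (const [w, n] of counts) {
      if (n < minHits) continue;
      const entry = flagged.get(w) || { windows: 0, positions: [] };
      entry.windows += 1;
      entry.positions.push(start / words.length); // window start as a 0–1 position
      flagged.set(w, entry);
    }
    if (start + windowSize >= words.length) break;
  }
  return [...flagged]
    .map(([word, e]) => ({ word, ...e }))
    .sort((x, y) => y.windows - x.windows)
    .slice(0, limit);
}
```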

This detection runs across both the current draft and any comparison draft when version comparison is active, making it possible to see whether a word you overused in Draft 1 is still flagged in Draft 2.

Plot anchors & story architecture

Plot anchors — Rising Action, Midpoint, Climax, Falling Action, Resolution — are detected from the tension scores Scribbly has already computed, not assigned by genre convention or keyword matching.

Climax — the chapter with the highest tension score across the whole manuscript.
Midpoint — the chapter nearest the 50% mark with the highest tension score in the 35%–65% window.
Rising action — the start of the longest unbroken upward trend in tension before the climax chapter.
Falling action — the first chapter after the climax where tension drops below 70% of the climax score and stays there.
Resolution — the last stretch of chapters where tension is consistently below 60% of the manuscript average.
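A sketch of four of the five anchors, given an array of per-chapter tension scores. It assumes chapter i sits at position i / (n − 1) and reads the midpoint rule as the highest-scoring chapter in the 35%–65% window, with ties going to the chapter nearest 50%. Resolution is omitted, and plotAnchors is a hypothetical name.

```javascript
function plotAnchors(scores) {
  const n = scores.length;
  const pos = (i) => (n > 1 ? i / (n - 1) : 0); // chapter index -> 0–1 position
  const climax = scores.indexOf(Math.max(...scores));

  // Midpoint: highest score inside the 35%–65% window; ties go to the chapter nearest 50%.
  let midpoint = -1;
  for (let i = 0; i < n; i++) {
    if (pos(i) < 0.35 || pos(i) > 0.65) continue;
    if (midpoint === -1 || scores[i] > scores[midpoint] ||
        (scores[i] === scores[midpoint] && Math.abs(pos(i) - 0.5) < Math.abs(pos(midpoint) - 0.5))) {
      midpoint = i;
    }
  }

  // Rising action: start of the longest strictly increasing run ending at or before the climax.
  let risingStart = 0;
  let bestLen = 0;
  let runStart = 0;
  for (let i = 1; i <= climax; i++) {
    if (scores[i] <= scores[i - 1]) runStart = i; // trend broken; a new run starts here
    if (i - runStart > bestLen) {
      bestLen = i - runStart;
      risingStart = runStart;
    }
  }

  // Falling action: first chapter after the climax from which every score stays below 70% of it.
  let fallingAction = -1;
  for (let i = climax + 1; i < n; i++) {
    if (scores.slice(i).every((s) => s < 0.7 * scores[climax])) {
      fallingAction = i;
      break;
    }
  }

  return { climax, midpoint, risingStart, fallingAction };
}
```

For scores [0.4, 0.3, 0.2, 0.3, 0.5, 0.6, 0.8, 0.9, 0.3, 0.2], this returns climax 7, midpoint 5, rising action starting at 2, and falling action at 8.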

The Story Architecture observations are derived from these positions. A "flat Act 2" alert fires when Act 2's average tension score is more than 30% below both Act 1 and Act 3. Climax placement warnings fire if the highest-tension chapter arrives before 60% or after 92% of the manuscript — thresholds chosen because they represent genuine outliers in conventional three-act structure, not stylistic preferences.
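A sketch of the flat-Act-2 and climax-placement checks. It assumes acts are three roughly equal runs of chapters and that chapter i sits at position i / (n − 1); architectureAlerts and the alert labels are hypothetical.

```javascript
function architectureAlerts(scores) {
  const n = scores.length;
  const mean = (xs) => xs.reduce((a, b) => a + b, 0) / xs.length;
  const third = Math.round(n / 3);
  const act1 = mean(scores.slice(0, third));
  const act2 = mean(scores.slice(third, 2 * third));
  const act3 = mean(scores.slice(2 * third));
  const alerts = [];
  // "More than 30% below both" means Act 2 < 70% of Act 1 and < 70% of Act 3.
  if (act2 < 0.7 * act1 && act2 < 0.7 * act3) alerts.push("flat-act-2");
  const climaxIndex = scores.indexOf(Math.max(...scores));
  const climaxPos = n > 1 ? climaxIndex / (n - 1) : 0;
  if (climaxPos < 0.6) alerts.push("early-climax");
  if (climaxPos > 0.92) alerts.push("late-climax");
  return alerts;
}
```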

Tension thresholds are percentile-based and relative to your manuscript. "High tension" means top 25% of your chapters. "Quiet" means bottom 33% of your chapters. The same chapter might register as high tension in a slow literary novel and low tension in a thriller, because the scale is always your book, not a universal standard.
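A sketch of relative labelling, assuming the cut-offs are applied by rank (with counts rounded to the nearest chapter) rather than by interpolated percentiles; tensionLabels and the label strings are hypothetical.

```javascript
// Top 25% of chapters by score = "high", bottom 33% = "quiet", the rest = "mid".
function tensionLabels(scores) {
  const n = scores.length;
  const order = scores.map((_, i) => i).sort((a, b) => scores[b] - scores[a]); // highest first
  const labels = new Array(n).fill("mid");
  const highCount = Math.round(n * 0.25);
  const quietCount = Math.round(n * 0.33);
  order.slice(0, highCount).forEach((i) => (labels[i] = "high"));
  order.slice(n - quietCount).forEach((i) => (labels[i] = "quiet"));
  return labels;
}
```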

Semantic map & annotation matching

The Semantic Map is where Scribbly comes closest to something that might feel like AI — but the mechanism is straightforward keyword matching, not neural embeddings or vector search.

Each cluster in the map (whether from Scribbly's built-in set or custom clusters you define) has a list of keywords. When you highlight a passage, Scribbly checks whether any of the cluster keywords appear in that passage's text. The match strength is determined by:

Keyword frequency in the passage

How many of the cluster's keywords appear in the highlighted text. A passage containing hollow, drained, and faded scores higher against an "Emotional Burnout" cluster than one containing only tired.

Keyword density

Matches are normalised by the highlight's word count, so that a two-sentence highlight with three keyword hits scores higher than a paragraph-long highlight with the same three hits.

Cluster size in the graph

A cluster node grows in proportion to how many of your annotations match it. Clusters with many annotation overlaps appear larger. Clusters with few matches appear smaller, or are hidden entirely if they fall below the minimum threshold.

Edges between clusters

An edge between two cluster nodes indicates that the same annotation matched both — i.e. the passage contained keywords from both clusters. Thicker edges mean more annotations where both clusters matched simultaneously.
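A sketch of keyword-based matching for highlighted passages. It assumes single-word, case-insensitive keywords and measures density as distinct keyword hits per word in the passage; matchClusters and clusterEdges are hypothetical names.

```javascript
// clusters: { clusterName: [keywords] }. Returns every cluster with at least one hit.
function matchClusters(passage, clusters) {
  const words = passage.toLowerCase().match(/\p{L}+/gu) || [];
  const wordSet = new Set(words);
  const matches = [];
  for (const [name, keywords] of Object.entries(clusters)) {
    const hits = keywords.filter((k) => wordSet.has(k.toLowerCase())).length;
    if (hits > 0) matches.push({ name, hits, density: hits / words.length });
  }
  return matches;
}

// Edge weights: +1 for every annotation whose passage matched both clusters.
function clusterEdges(annotationMatches) {
  const edges = new Map();
  for (const matches of annotationMatches) {
    const names = matches.map((m) => m.name).sort();
    for (let i = 0; i < names.length; i++) {
      for (let j = i + 1; j < names.length; j++) {
        const key = `${names[i]}|${names[j]}`;
        edges.set(key, (edges.get(key) || 0) + 1);
      }
    }
  }
  return edges;
}
```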

Because the map is entirely keyword-driven, the quality of the map depends on the quality of your clusters. Scribbly's built-in clusters cover common literary and craft concerns, but adding custom clusters with vocabulary specific to your manuscript — character names, recurring objects, setting terms — will make the map significantly more specific and useful.

What about embeddings?

Scribbly's codebase does include an embeddings worker that can optionally compute vector representations of highlighted passages using a lightweight on-device model. This is a local computation — no text leaves your browser. It is used only as a supplementary signal for annotation matching in the Map view when available; the core keyword matching described above always runs regardless.

Version comparison

When you link two manuscripts as parent and revision, Scribbly runs the full analysis pipeline on both independently and then compares the two result sets. No text diffing is performed: there is no line-by-line comparison of which words changed. What is compared is the analysis output.

Tension curve overlay — both tension score arrays are plotted together. If your revision reduced tension in Act 2, the curves will visibly diverge there.
New alerts since last version — repetition flags, thread drop flags, and cross-signal observations that appear in the current analysis but not in the previous one.
Resolved alerts — flags that appeared in the previous analysis but no longer appear in the current one, confirming that a structural fix landed.
Structural shift detection — if the climax chapter index, act score averages, or quiet-run lengths change significantly between versions, this is reported.
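New and resolved alerts amount to set differences between two analyses, sketched below. It assumes each alert has a stable string ID (such as "repetition:glass") and uses a moved climax chapter as one example of a structural shift; compareAlerts and climaxMoved are hypothetical names.

```javascript
// Compare two analyses' alert lists, not the manuscript text.
function compareAlerts(previousAlerts, currentAlerts) {
  const prev = new Set(previousAlerts);
  const curr = new Set(currentAlerts);
  return {
    newAlerts: [...curr].filter((a) => !prev.has(a)),
    resolvedAlerts: [...prev].filter((a) => !curr.has(a)),
  };
}

// One structural-shift check: did the highest-tension chapter change between versions?
function climaxMoved(previousScores, currentScores) {
  const climax = (scores) => scores.indexOf(Math.max(...scores));
  return climax(previousScores) !== climax(currentScores);
}
```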

The comparison is available on demand via the "Compare versions" button in the Dashboard header — it only appears when the currently open manuscript has a linked parent version with a saved analysis.

Where the computation happens

Every function described on this page — tokenization, tension scoring, dialogue counting, character tracking, thread detection, repetition scanning, plot anchor detection, cluster matching — runs as JavaScript in your browser tab. The results are stored in your browser's IndexedDB alongside your highlights and notes.

Nothing is sent to a server. Scribbly has no backend. There is no analytics pipeline, no model training loop, no telemetry. When you close the tab, the only record of your manuscript's analysis is the one saved in your browser's local storage — and that is fully exportable and deletable by you.

The optional embeddings worker, if it runs, also operates locally. It loads a small model from a bundled file, computes vector representations in a Web Worker off the main thread, and stores results in IndexedDB. It does not communicate with any external endpoint.

Running Scribbly offline

Because everything runs in the browser, Scribbly can be served from your local filesystem. Clone the repository, open index.html directly, and the full analysis pipeline runs without any internet connection.
