Introducing v10: A smoother, more honest doodle engine

We just shipped v10 of the Thinking Line doodle engine — the part of our pipeline that turns a still hand-drawn whiteboard image into the animated, drawing-itself video you actually watch.

If you generated a video before today, you probably noticed two specific kinds of jank:

Letters that briefly fattened into a 12-pixel outline while being drawn, then snapped back to their true thickness.
Source images peppered with rectangular boxes around every label, which the doodle renderer then dutifully traced as ugly frames.

v10 fixes both, and it makes a third, subtler change to how pacing feels. This post walks through what we changed and why.

What was broken in v8

v8's doodle renderer drew each glyph in three different states:

Completed contours were filled with cv2.drawContours(..., thickness=-1) — the letter at its true shape.
The partially-drawn active contour was rendered as a 12-px polyline outline traced along the contour boundary.
The moment the contour finished, the renderer switched back to a fill.

The result: every letter visibly inflated to a chunky outline mid-stroke, then collapsed to its real shape. Once you notice it, it's all you can see. It made the videos feel like prototypes.

The second problem was the prompts feeding our image generator. Phrases like "diagram", "infographic", and "technical drawing" — perfectly innocent on their own — kept producing images with flowchart-style labeled boxes. Our doodle engine traces every closed shape, so those boxes ended up as crisp rectangles surrounding every concept label. The opposite of a hand-drawn whiteboard.

How v10 draws each glyph

The fix for the thickness jump turned out to be conceptually simple. Instead of switching rendering modes between "outline" and "fill", v10 does both at once:

For every glyph, we pre-compute a filled mask — what the glyph should look like when it's done.
During the reveal, we sweep a thick brush along the contour. As the brush moves, we AND that sweep with the pre-computed fill mask.

Because every revealed pixel always passes through the true filled mask, the glyph is always at its real thickness. No 12-pixel inflation, no snap-back. The brush controls when a pixel becomes visible; the fill mask controls what that pixel looks like when it does.
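To make the idea concrete, here is a minimal NumPy sketch of a masked reveal. It is an illustration under stated assumptions, not the engine's actual code: the names disc_stamp and reveal_frames, the toy bar-shaped glyph, and the round brush are all hypothetical, and a real pipeline would more likely build the fill mask with cv2.drawContours(..., thickness=-1) as described above.

```python
import numpy as np

def disc_stamp(shape, center, radius):
    """Boolean mask of a filled disc: one dab of the (hypothetical) round brush."""
    yy, xx = np.ogrid[:shape[0], :shape[1]]
    cy, cx = center
    return (yy - cy) ** 2 + (xx - cx) ** 2 <= radius ** 2

def reveal_frames(fill_mask, path, brush_radius, n_frames):
    """Yield one boolean frame per step: (brush sweep so far) AND fill_mask."""
    swept = np.zeros_like(fill_mask, dtype=bool)
    # Split the path into n_frames roughly equal runs of points.
    bounds = np.linspace(0, len(path), n_frames + 1).astype(int)
    for start, stop in zip(bounds[:-1], bounds[1:]):
        for point in path[start:stop]:
            swept |= disc_stamp(fill_mask.shape, point, brush_radius)
        # The brush decides WHEN a pixel appears; the mask decides WHAT appears.
        yield swept & fill_mask

# Toy glyph: a 3-px-thick horizontal bar standing in for a real filled contour.
fill_mask = np.zeros((20, 40), dtype=bool)
fill_mask[9:12, 5:35] = True
path = [(10, x) for x in range(5, 35)]  # centre line of the bar, as (row, col)

frames = list(reveal_frames(fill_mask, path, brush_radius=6, n_frames=5))
```

The brush here is twice as wide as the stroke, yet no frame ever shows a pixel outside the fill mask, so the bar never looks thicker than 3 px at any point in the reveal.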

Uniform pacing across glyph sizes

Once we'd fixed thickness, a different problem became obvious: small dots and short letters were drawing way too fast, while large icons crawled.

One naive fix, giving every glyph the same time slice, looks wrong because the brush then moves at radically different speeds across the screen. The opposite naive fix, giving every glyph time proportional to its contour length, leaves tiny glyphs with a one-frame slice while big letters get all the time in the world.

v10 uses a compressed weighting:

Each glyph's time slice is proportional to the square root of its contour length.
The drawing brush is sized to each glyph's height (clip(height × 0.45, 6, 56)).

Together these two settings keep the visible reveal rate roughly constant across glyph sizes: small glyphs get a small brush moving slowly, and big glyphs get a wider brush. Their durations end up far closer together than their raw contour lengths would suggest, and the pen never feels like it's racing or slogging.
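The two rules above can be sketched in a few lines of Python. This is an illustration only: the function names, the example contour lengths, and the 7.5-second segment are assumptions, and only the square-root weighting and the clip(height × 0.45, 6, 56) formula come from the description above.

```python
import math

def brush_size(glyph_height):
    """Brush size per the quoted rule: clip(height * 0.45, 6, 56)."""
    return min(max(glyph_height * 0.45, 6.0), 56.0)

def time_slices(contour_lengths, total_seconds):
    """Split total_seconds across glyphs in proportion to sqrt(contour length)."""
    weights = [math.sqrt(length) for length in contour_lengths]
    total = sum(weights)
    return [total_seconds * w / total for w in weights]

# Hypothetical contour lengths for a dot, a letter, and a large icon.
lengths = [25.0, 400.0, 2500.0]
slices = time_slices(lengths, total_seconds=7.5)

# Hypothetical glyph heights: tiny, medium, and very tall.
brushes = [brush_size(h) for h in (10, 40, 200)]
```

With these numbers the weights are 5, 20, and 50, so the slices come out to 0.5 s, 2 s, and 5 s. A 100× difference in contour length becomes a 10× difference in time; a purely proportional split would have left the dot with well under a tenth of a second. The brush sizes clamp to 6 for the tiny glyph and 56 for the very tall one, with 18 in between.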

Box-free image prompts

The doodle engine is only as good as the image it's tracing. v10 rewrote the prompts going to our image model to deliberately avoid the language that produces boxed labels.

Removed from positive prompts: "diagram", "infographic", "technical drawing", "vector style".

Added to positive prompts: "loose floating icons connected by curved arrows", "labels written next to icons (not inside any shape)", "open whitespace composition".

Added to negative prompts: boxes, rectangles, frames, panels, callout boxes, labeled containers, flowchart shapes, table cells, borders, dividers.

The manifest LLM that writes our image prompts also got new rules: every image_prompt must describe a hand-drawn whiteboard with loose floating icons, never a flowchart or framed sections.
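As a rough sketch of how these lists could be applied in code, the snippet below strips the removed terms from a subject line and appends the new style phrases. Everything structural here is an assumption for illustration: build_prompts, the constant names, and the "prompt" / "negative_prompt" keys are hypothetical and do not describe our actual pipeline or any particular image model's API. Only the term lists themselves come from this post.

```python
import re

# Terms removed from positive prompts, as listed above.
BANNED_TERMS = ["diagram", "infographic", "technical drawing", "vector style"]

# Phrases added to positive prompts, as listed above.
STYLE_PHRASES = [
    "loose floating icons connected by curved arrows",
    "labels written next to icons (not inside any shape)",
    "open whitespace composition",
]

# Terms added to negative prompts, as listed above.
NEGATIVE_TERMS = [
    "boxes", "rectangles", "frames", "panels", "callout boxes",
    "labeled containers", "flowchart shapes", "table cells",
    "borders", "dividers",
]

def build_prompts(subject):
    """Return a hypothetical prompt pair with box-inducing terms stripped."""
    cleaned = subject
    for term in BANNED_TERMS:
        # Whole-word, case-insensitive match; also catches a simple plural "s".
        pattern = rf"\b{re.escape(term)}s?\b"
        cleaned = re.sub(pattern, "", cleaned, flags=re.IGNORECASE)
    cleaned = re.sub(r"\s{2,}", " ", cleaned).strip(" ,")
    return {
        "prompt": ", ".join([cleaned] + STYLE_PHRASES),
        "negative_prompt": ", ".join(NEGATIVE_TERMS),
    }

prompts = build_prompts("Infographic diagram of how a neural network learns")
```

A plain word filter like this is deliberately blunt. It cannot catch paraphrases such as "chart with labeled sections", which is one reason to also give the prompt-writing LLM explicit rules.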

Side-by-side

The same topic — "How a neural network learns" — rendered with v8 and v10:

Aspect	v8	v10
Stroke thickness during reveal	Inflates to 12-px outline, then snaps	Stays at true thickness throughout
Time per small glyph	Drawn at high pen-speed	Drawn at uniform visible speed
Time per large glyph	Same fixed slice as a dot	Proportional to √length, capped
Source image style	Boxed flowchart labels	Loose floating icons + arrows

Compatibility

v10 is now the default generator for the usual video mode on Thinking Line. v8 stays on disk as an ImportError fallback: if v10 fails to import in production, v8 picks up automatically. The frontend doesn't need to change: when you ask for the usual video mode, you get v10.
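For readers unfamiliar with the pattern, an ImportError fallback generally looks like the sketch below. The helper load_renderer and the module names are hypothetical; a standard-library module stands in for the fallback so the snippet runs anywhere.

```python
import importlib

def load_renderer(preferred, fallback):
    """Import `preferred`; use `fallback` only if that import itself fails."""
    try:
        return importlib.import_module(preferred)
    except ImportError:
        # ModuleNotFoundError is a subclass of ImportError, so a missing
        # module lands here too.
        return importlib.import_module(fallback)

# Hypothetical names: the first module does not exist, so the fallback loads.
renderer = load_renderer("doodle_engine_v10_not_installed", "json")
```

Note the scope of this pattern: it only covers failures at import time. An exception raised while a video is actually rendering would need separate handling.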

What's next

The renderer is one piece of a larger pipeline we're still tuning. A few directions on our radar:

Hybrid scene modes — letting each beat pick between a doodle reveal, a slow camera move on a static image, or a small 3D scene with floating labels. Generalizable across topics, useful for intros and key-concept beats.
Smarter pacing per beat — using the audio narration's word-level timing to align brush strokes with the words being spoken, not just to fill the segment's duration.
Captions and overlays — synced to the same word-level timing.

If you have feedback on v10 — strokes that still feel uneven, scenes that still come out boxed, anything — please send it our way. The engine improves on real videos, not synthetic ones.

— The Thinking Line team
