Manga Kotoba, Part 3: Performance Hell and the N+1 Crisis
This is part 3 of 5. Parts 1–2 covered the concept and the core Symfony/Doctrine setup. You can read those first or jump in here.
The app worked. I’d wired up the Symfony controllers, built the Doctrine entities, seeded a handful of manga titles and volumes, and I could hit the API endpoints and get JSON back. It felt good — that fragile, optimistic good you feel before you feed the system real data.
Then I imported my actual manga library.
Forty titles. Each with multiple volumes. Each volume with pages and vocabulary entries. The manga list endpoint — the one the frontend hits first, on every page load — started responding in four to six seconds. On a page refresh. For a list endpoint.
I opened the Symfony Web Profiler toolbar. The query counter read 83.
This is the story of how I fixed that.
Anatomy of an N+1 Query
“N+1” means 1 query to fetch N rows, then N more queries to fetch related data for each row. It’s one of the most common ORM performance bugs.
Before I show the fix, let me show you exactly what was happening. The interactive demo below illustrates the difference between N+1 loading and a proper eager JOIN. Click Load to see the queries fire in real time.
⚡ N+1 Query Visualizer
The difference is stark. N+1 fires a query for the manga list, then a query per manga for volumes, then another per manga for cover images. With 5 titles that’s already 11 queries. With 40 real titles it was 83.
Every query is a round-trip to the database. On Heroku, your database might be in a different availability zone. Each round-trip can cost 5–20 ms. Eighty-three round-trips × 20 ms ≈ 1.7 seconds of pure waiting, before any computation happens.
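The scaling is easy to model in a few lines. A back-of-the-envelope sketch (the 2 follow-up queries per title — volumes plus cover — match the pattern described above; the fixed per-query latency is an assumption, and the 83 observed in the profiler includes a couple of extra queries beyond this simple model):

```php
<?php
// N+1 cost model: 1 list query + 2 follow-up queries per title
// (volumes + cover image), each paying a network round-trip.
function queryCount(int $titles): int
{
    return 1 + 2 * $titles;
}

// Pure round-trip latency in ms, assuming a fixed cost per query.
function roundTripMs(int $titles, float $msPerQuery): float
{
    return queryCount($titles) * $msPerQuery;
}

echo queryCount(5), "\n";         // 11 queries for the 5-title demo
echo roundTripMs(40, 20.0), "\n"; // 81 queries × 20 ms = 1620 ms of waiting
```

The point of the model: the cost is linear in the number of titles, so a library that worked fine with 5 seed rows falls over at 40 real ones.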
Finding the Problem — The Symfony Profiler
The Symfony Web Profiler toolbar is the dark bar at the bottom of every dev-env page. Click the database icon to see all queries.
The Symfony Web Profiler is indispensable. In dev mode, a slim toolbar appears at the bottom of every page (and every API response in the profiler UI). Clicking the database icon opens a full list of every Doctrine query, with execution times and the exact SQL generated.
Here’s what mine looked like after loading the manga list:
When you see a pattern like SELECT ... WHERE manga_id = ? repeated twenty or forty times with only the id value changing — that’s your N+1. Each line is a separate network round-trip to the database.
The profiler also lets you click any query to see the exact Doctrine call stack that generated it, which makes pinpointing the source trivially fast.
The max_questions Heroku Limit
Heroku’s free MySQL (via ClearDB) had a max_questions limit of 3600 per hour. The N+1 queries were eating through it in minutes.
I was running on Heroku’s free tier with ClearDB (MySQL). What I didn’t know until the app started failing in production was that ClearDB’s free plan enforces a max_questions limit — a cap on the total number of SQL statements executed per hour.
My N+1 queries were burning through that limit every time a real user loaded the app.
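The arithmetic on that quota is brutal. A quick sketch (the per-load query count is the 83 from the profiler; the quota is ClearDB's documented hourly cap):

```php
<?php
// ClearDB free tier: max_questions = 3600 statements per hour.
// At 83 queries per page load, the hourly budget supports only:
$quotaPerHour   = 3600;
$queriesPerLoad = 83;

$loadsPerHour = intdiv($quotaPerHour, $queriesPerLoad);
echo $loadsPerHour, "\n"; // 43 page loads — then every connection fails
```

Forty-three page loads per hour is a handful of curious users, which is exactly why the app died almost immediately in production.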
```
SQLSTATE[HY000]: General error: 2006 MySQL server has gone away
```

This cryptic message was Heroku cutting off the connection after the quota was exhausted. It took me an embarrassingly long time to connect the dots between N+1 queries and this error.
The fix had two dimensions: eliminate the N+1 queries structurally, and also reduce unnecessary queries in the import pipeline. Commits 86dcf38 and f036f4c addressed the API layer; 2ba7d0e fixed the CSV importer.
The Fix — Eager JOIN Fetching in DQL
Doctrine’s default fetch mode is LAZY: it defers loading associations until you access them. Each access = a query.
Doctrine’s default behaviour is lazy loading. When you call $manga->getVolumes(), Doctrine fires a SELECT right there. In a loop over forty manga, that’s forty queries you never explicitly wrote — they materialise invisibly as your code iterates.
The pattern that caused the problem:
❌ Before — implicit N+1
```php
// MangaRepository.php
public function findAllWithStats(): array
{
    // Returns Manga objects with lazy-loaded associations
    return $this->findAll();
}

// In the controller / serializer — each call below fires a query
foreach ($mangas as $manga) {
    $volumeCount = count($manga->getVolumes()); // QUERY #2, #3, #4 …
    $cover = $manga->getCoverImage();           // QUERY #N+2, #N+3 …
}
```
The fix is to write an explicit DQL query that JOIN FETCHes the associations you know you’ll need:
✅ After — single eager JOIN
```php
// MangaRepository.php
public function findAllWithVolumesAndCovers(): array
{
    return $this->createQueryBuilder('m')
        ->select('m', 'v', 'c')
        ->leftJoin('m.volumes', 'v')
        ->leftJoin('m.coverImage', 'c')
        ->getQuery()
        ->getResult();
}

// Controller: volumes and coverImage are already hydrated —
// accessing them fires ZERO additional queries
foreach ($mangas as $manga) {
    $volumeCount = count($manga->getVolumes()); // no query
    $cover = $manga->getCoverImage();           // no query
}
```
After this change, the manga list endpoint dropped from 83 queries to 3. Response time: from 4–6 seconds down to under 200ms.
The COALESCE Wall — When DQL Isn’t SQL
Commit ccae44c: “fix: replace COALESCE DQL (unsupported) with two-query fallback”. DQL is a subset of SQL — not everything translates directly.
After sorting the N+1 issue, I wanted to add a “known vocabulary percentage” to each manga — the proportion of words in the manga that the user had already learned. My first instinct was to reach for SQL’s COALESCE to handle the null case when a user had no vocabulary records yet.
The DQL I tried:
❌ Failing DQL with COALESCE
```php
// This looks reasonable, but Doctrine's DQL parser rejects it
$dql = 'SELECT m,
    COALESCE(
        (SELECT COUNT(uv.id)
         FROM App\Entity\UserVocabulary uv
         WHERE uv.manga = m AND uv.user = :user),
        0
    ) AS knownCount
FROM App\Entity\Manga m';

// Error thrown:
// [Semantical Error] line 0, col 18 near 'COALESCE(': Error:
// 'COALESCE' is not defined as a DQL function.
```
Doctrine’s DQL supports a limited set of aggregate and scalar functions. COALESCE on a subquery result isn’t one of them in the version I was on — it either rejected the syntax outright or silently mishandled the null.
The working solution was to split it into two queries and do the null-coalescing in PHP:
✅ Two-query PHP fallback
```php
// Query 1: fetch all manga
$mangas = $this->mangaRepository->findAllWithVolumesAndCovers();

// Query 2: fetch known vocabulary counts for this user, keyed by manga ID
$knownCounts = $this->userVocabRepository
    ->countKnownByMangaForUser($user); // Returns: ['manga_id' => count, ...]

// PHP null-coalescing instead of SQL COALESCE
foreach ($mangas as $manga) {
    $known = $knownCounts[$manga->getId()] ?? 0;
    $total = $manga->getTotalVocabularyCount();
    $pct = $total > 0 ? round($known / $total * 100) : 0;
    $manga->setKnownPercentage($pct);
}
```
Two queries instead of 83, and no DQL parsing headaches. The slight redundancy of loading all counts upfront is trivially cheap compared to the N+1 alternative.
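Stripped of Doctrine, the merge step is plain array work. A self-contained sketch with made-up numbers, standing in for the two repository results:

```php
<?php
// Stand-ins for the two query results (illustrative data).
$totals      = [1 => 120, 2 => 80, 3 => 200]; // manga ID => total vocab count
$knownCounts = [1 => 30, 3 => 150];           // manga ID => known count
                                              // (no row at all for manga 2)

$percentages = [];
foreach ($totals as $mangaId => $total) {
    // PHP's ?? does exactly what the rejected SQL COALESCE was meant to do
    $known = $knownCounts[$mangaId] ?? 0;
    $percentages[$mangaId] = $total > 0 ? (int) round($known / $total * 100) : 0;
}

print_r($percentages); // [1 => 25, 2 => 0, 3 => 75]
```

Manga 2 is the interesting case: it has no row in the counts result at all, and the `??` quietly turns that absence into 0 — the exact null-handling the DQL version was fighting for.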
Density Score — How Vocabulary-Heavy Is a Page?
Commit 3ad213b: “feat: add uniqueVocabCount and densityScore per page in volume API”. Users use this to pick which pages to study.
One feature I’m genuinely proud of is the density score. It’s a simple metric, but it turned out to be exactly what users wanted: a way to quickly identify which pages are good study material versus action pages dominated by sound effects.
The formula:
```php
// depthWeight: discount pages with very few words (splash pages, etc.)
$depthWeight = min($totalWordCount / 10.0, 1.0);

// densityScore: how "vocabulary dense" is this page?
$densityScore = $uniqueWordCount / max($totalWordCount, 1) * $depthWeight;

// Result: 0.0 (pure repetition / sound effects)
//     to: 1.0 (every word is unique and the page has many words)
```
Try it below — drag the sliders to explore how the score reacts to different page compositions:
🧮 Density Score Calculator
Action pages — panels filled with ドカーン! and ガガガ! — score near zero. Dialogue-heavy pages, especially those with a variety of N3/N2 vocabulary, score near 1.0. Users can sort pages by density score and build targeted study sessions.
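The formula packs into one small function. A runnable sketch of the computation above (the function name is mine, not necessarily the app's):

```php
<?php
// Density score for a page: unique/total ratio, discounted for
// pages with very few words (splash pages, lone sound effects).
function densityScore(int $uniqueWords, int $totalWords): float
{
    $depthWeight = min($totalWords / 10.0, 1.0);
    return $uniqueWords / max($totalWords, 1) * $depthWeight;
}

echo densityScore(18, 20), "\n"; // dialogue page: 0.9
echo densityScore(2, 4), "\n";   // action page, mostly repeated SFX: 0.2
echo densityScore(0, 0), "\n";   // blank page: 0
```

Note how the depth weight does double duty on the action page: the score is halved by repetition and then cut again because the page barely has any words.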
Volume Read Progress Tracking
Commit 363ecb9: “feat: volume read progress tracking (unread/reading/completed)”. Simple state machine, big UX win.
Tracking read progress feels obvious in retrospect, but it wasn’t in the original spec. A user asked: “can I mark which volumes I’ve already read so I don’t re-study vocabulary I already know from that volume?”
The data model is straightforward — a join table with a status enum:
```php
// VolumeProgress.php entity
#[ORM\Entity]
class VolumeProgress
{
    #[ORM\ManyToOne(targetEntity: User::class)]
    private User $user;

    #[ORM\ManyToOne(targetEntity: Volume::class)]
    private Volume $volume;

    #[ORM\Column(type: 'string', enumType: ReadStatus::class)]
    private ReadStatus $status = ReadStatus::UNREAD;
}

enum ReadStatus: string
{
    case UNREAD = 'unread';
    case READING = 'reading';
    case COMPLETED = 'completed';
}
```
The UI cycles through the three states on click. Try it:
📚 Volume Progress Shelf
Click any volume to cycle: unread → reading → completed
On the backend, the API accepts a PATCH /api/volumes/{id}/progress with {"status": "reading"}. The status feeds back into the vocabulary learning system: completed volumes’ words get lower priority in flashcard queues.
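The click-to-cycle behaviour fits naturally on the enum itself. A sketch — the `cycle()` helper is my illustration of the state machine, not necessarily how the app implements it:

```php
<?php
enum ReadStatus: string
{
    case UNREAD = 'unread';
    case READING = 'reading';
    case COMPLETED = 'completed';

    // Next state in the unread → reading → completed → unread cycle
    public function cycle(): self
    {
        return match ($this) {
            self::UNREAD    => self::READING,
            self::READING   => self::COMPLETED,
            self::COMPLETED => self::UNREAD,
        };
    }
}

echo ReadStatus::UNREAD->cycle()->value, "\n";    // reading
echo ReadStatus::COMPLETED->cycle()->value, "\n"; // unread
```

Putting the transition in a `match` means adding a fourth state later (say, `ABANDONED`) fails loudly at the `match` rather than silently falling through.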
The Common Vocabulary Endpoint
Commit b936b14: “feat: add GET /api/vocabulary/common endpoint”. The most useful endpoint in the whole API.
GET /api/vocabulary/common returns words that appear most frequently across the user’s manga library — filtered to exclude words the user already knows. It’s the “what should I learn next” endpoint.
The DQL is a subquery that ranks by occurrence count:
```php
// VocabularyRepository.php
public function findCommonUnknown(User $user, int $limit = 20): array
{
    return $this->createQueryBuilder('v')
        ->select('v', 'COUNT(pv.id) AS HIDDEN occurrences')
        ->join('v.pageVocabularies', 'pv')
        // exclude words the user already knows
        ->leftJoin(
            'App\Entity\UserVocabulary', 'uv', 'WITH',
            'uv.vocabulary = v AND uv.user = :user'
        )
        ->where('uv.id IS NULL')
        ->setParameter('user', $user)
        ->groupBy('v.id')
        ->orderBy('occurrences', 'DESC')
        ->setMaxResults($limit)
        ->getQuery()
        ->getResult();
}
```
Note the HIDDEN keyword in the select clause — this is a DQL trick that lets you use the aliased aggregate in orderBy without Doctrine trying to hydrate it as an entity field.
The frequency histogram below shows sample output. Toggle words you know to remove them from the queue:
📊 Common Vocabulary — Frequency Chart
Check words you know — they'll be removed from the learning queue.
Lessons Learned
Flip each card to reveal the lesson behind the headline.
👆 Click any card to flip it
What the Numbers Looked Like
To put the improvement in perspective:
| Metric | Before | After |
|---|---|---|
| Queries per `/api/manga` | 83 | 3 |
| Response time (avg) | 4 800 ms | 185 ms |
| ClearDB `max_questions` / hour used | ~3 600 (full quota) | ~210 |
| Time to `MySQL server has gone away` | ~15 min of usage | never |
The indexes on volume.manga_id, page.volume_id, and user_vocabulary.user_id were added as part of the same sprint (741fdfb). Without them, even the JOINed queries would have been doing full table scans.
Declaring an index is a one-liner: #[ORM\Index] on the entity, or via a migration. It's boilerplate, but it's the difference between a 3ms lookup and a 300ms scan on a table with 50k rows.
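For reference, the attribute form looks roughly like this (table and index names here are illustrative, not the app's actual schema):

```php
// Volume.php — index on the foreign key the JOINs filter by
#[ORM\Entity]
#[ORM\Table(name: 'volume')]
#[ORM\Index(name: 'idx_volume_manga', columns: ['manga_id'])]
class Volume
{
    // …
}
```

Doctrine migrations will then emit the `CREATE INDEX` automatically the next time you diff the schema.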