Manga Kotoba, Part 3: Performance Hell and the N+1 Crisis
This is part 3 of 5. Parts 1–2 covered the concept and the core Symfony/Doctrine setup. You can read those first or jump in here.
The app worked. I’d wired up the Symfony controllers, built the Doctrine entities, seeded a handful of manga titles and volumes, and I could hit the API endpoints and get JSON back. It felt good — that fragile, optimistic good you feel before you feed the system real data.
Then I imported my actual manga library.
Forty titles. Each with multiple volumes. Each volume with pages and vocabulary entries. The manga list endpoint — the one the frontend hits first, on every page load — started responding in four to six seconds. On a page refresh. For a list endpoint.
I opened the Symfony Web Profiler toolbar. The query counter read 83.
This is the story of how I fixed that.
Anatomy of an N+1 Query
“N+1” means 1 query to fetch N rows, then N more queries to fetch related data for each row. It’s one of the most common ORM performance bugs.
Before I show the fix, let me show you exactly what was happening. The interactive demo below illustrates the difference between N+1 loading and a proper eager JOIN. Click Load to see the queries fire in real time.
⚡ N+1 Query Visualizer
The difference is stark. N+1 fires a query for the manga list, then a query per manga for volumes, then another per manga for cover images. With 5 titles that’s already 11 queries. With 40 real titles it was 83.
Every query is a round-trip to the database. On Heroku, your database might be in a different availability zone. Each round-trip can cost 5–20 ms. Eighty-three round-trips × 20 ms ≈ 1.7 seconds of pure waiting, before any computation happens.
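The scaling is easy to model in a few lines. A back-of-the-envelope sketch (the 2 follow-up queries per title — volumes plus cover — match the pattern described above; the fixed per-query latency is an assumption, and the 83 observed in the profiler includes a couple of extra queries beyond this simple model):

```php
<?php
// N+1 cost model: 1 list query + 2 follow-up queries per title
// (volumes + cover image), each paying a network round-trip.
function queryCount(int $titles): int
{
    return 1 + 2 * $titles;
}

// Pure round-trip latency in ms, assuming a fixed cost per query.
function roundTripMs(int $titles, float $msPerQuery): float
{
    return queryCount($titles) * $msPerQuery;
}

echo queryCount(5), "\n";         // 11 queries for the 5-title demo
echo roundTripMs(40, 20.0), "\n"; // 81 queries × 20 ms = 1620 ms of waiting
```

The point of the model: the cost is linear in the number of titles, so a library that worked fine with 5 seed rows falls over at 40 real ones.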
Finding the Problem — The Symfony Profiler
The Symfony Web Profiler toolbar is the dark bar at the bottom of every dev-env page. Click the database icon to see all queries.
The Symfony Web Profiler is indispensable. In dev mode, a slim toolbar appears at the bottom of every page (and every API response in the profiler UI). Clicking the database icon opens a full list of every Doctrine query, with execution times and the exact SQL generated.
Here’s what mine looked like after loading the manga list:
When you see a pattern like SELECT ... WHERE manga_id = ? repeated twenty or forty times with only the id value changing — that’s your N+1. Each line is a separate network round-trip to the database.
The profiler also lets you click any query to see the exact Doctrine call stack that generated it, which makes pinpointing the source trivially fast.
The max_questions Heroku Limit
Heroku’s free MySQL (via ClearDB) had a max_questions limit of 3600 per hour. The N+1 queries were eating through it in minutes.
I was running on Heroku’s free tier with ClearDB (MySQL). What I didn’t know until the app started failing in production was that ClearDB’s free plan enforces a max_questions limit — a cap on the total number of SQL statements executed per hour.
My N+1 queries were burning through that limit every time a real user loaded the app.
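The arithmetic on that quota is brutal. A quick sketch (the per-load query count is the 83 from the profiler; the quota is ClearDB's documented hourly cap):

```php
<?php
// ClearDB free tier: max_questions = 3600 statements per hour.
// At 83 queries per page load, the hourly budget supports only:
$quotaPerHour   = 3600;
$queriesPerLoad = 83;

$loadsPerHour = intdiv($quotaPerHour, $queriesPerLoad);
echo $loadsPerHour, "\n"; // 43 page loads — then every connection fails
```

Forty-three page loads per hour is a handful of curious users, which is exactly why the app died almost immediately in production.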
```
SQLSTATE[HY000]: General error: 2006 MySQL server has gone away
```

This cryptic message was Heroku cutting off the connection after the quota was exhausted. It took me an embarrassingly long time to connect the dots between N+1 queries and this error.
The fix had two dimensions: eliminate the N+1 queries structurally, and also reduce unnecessary queries in the import pipeline. Commits 86dcf38 and f036f4c addressed the API layer; 2ba7d0e fixed the CSV importer.
The Fix — Eager JOIN Fetching in DQL
Doctrine’s default fetch mode is LAZY: it defers loading associations until you access them. Each access = a query.
Doctrine’s default behaviour is lazy loading. When you call $manga->getVolumes(), Doctrine fires a SELECT right there. In a loop over forty manga, that’s forty queries you never explicitly wrote — they materialise invisibly as your code iterates.
The pattern that caused the problem:
❌ Before — implicit N+1
```php
// MangaRepository.php
public function findAllWithStats(): array
{
    // Returns Manga objects with lazy-loaded associations
    return $this->findAll();
}

// In the controller / serializer — each call below fires a query
foreach ($mangas as $manga) {
    $volumeCount = count($manga->getVolumes()); // QUERY #2, #3, #4 …
    $cover = $manga->getCoverImage();           // QUERY #N+2, #N+3 …
}
```
The fix is to write an explicit DQL query that JOIN FETCHes the associations you know you’ll need:
✅ After — single eager JOIN
```php
// MangaRepository.php
public function findAllWithVolumesAndCovers(): array
{
    return $this->createQueryBuilder('m')
        ->select('m', 'v', 'c')
        ->leftJoin('m.volumes', 'v')
        ->leftJoin('m.coverImage', 'c')
        ->getQuery()
        ->getResult();
}

// Controller: volumes and coverImage are already hydrated —
// accessing them fires ZERO additional queries
foreach ($mangas as $manga) {
    $volumeCount = count($manga->getVolumes()); // no query
    $cover = $manga->getCoverImage();           // no query
}
```
After this change, the manga list endpoint dropped from 83 queries to 3. Response time: from 4–6 seconds down to under 200ms.
The COALESCE Wall — When DQL Isn’t SQL
Commit ccae44c: “fix: replace COALESCE DQL (unsupported) with two-query fallback”. DQL is a subset of SQL — not everything translates directly.
After sorting the N+1 issue, I wanted to add a “known vocabulary percentage” to each manga — the proportion of words in the manga that the user had already learned. My first instinct was to reach for SQL’s COALESCE to handle the null case when a user had no vocabulary records yet.
The DQL I tried:
❌ Failing DQL with COALESCE
```php
// This looks reasonable, but Doctrine's DQL parser rejects it
$dql = 'SELECT m,
    COALESCE(
        (SELECT COUNT(uv.id)
         FROM App\Entity\UserVocabulary uv
         WHERE uv.manga = m AND uv.user = :user),
        0
    ) AS knownCount
FROM App\Entity\Manga m';

// Error thrown:
// [Semantical Error] line 0, col 18 near 'COALESCE(': Error:
// 'COALESCE' is not defined as a DQL function.
```
Doctrine’s DQL supports a limited set of aggregate and scalar functions. COALESCE on a subquery result isn’t one of them in the version I was on — it either rejected the syntax outright or silently mishandled the null.
The working solution was to split it into two queries and do the null-coalescing in PHP:
✅ Two-query PHP fallback
```php
// Query 1: fetch all manga
$mangas = $this->mangaRepository->findAllWithVolumesAndCovers();

// Query 2: fetch known vocabulary counts for this user, keyed by manga ID
$knownCounts = $this->userVocabRepository
    ->countKnownByMangaForUser($user); // Returns: ['manga_id' => count, ...]

// PHP null-coalescing instead of SQL COALESCE
foreach ($mangas as $manga) {
    $known = $knownCounts[$manga->getId()] ?? 0;
    $total = $manga->getTotalVocabularyCount();
    $pct = $total > 0 ? round($known / $total * 100) : 0;
    $manga->setKnownPercentage($pct);
}
```
Two queries instead of 83, and no DQL parsing headaches. The slight redundancy of loading all counts upfront is trivially cheap compared to the N+1 alternative.
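Stripped of Doctrine, the merge step is plain array work. A self-contained sketch with made-up numbers, standing in for the two repository results:

```php
<?php
// Stand-ins for the two query results (illustrative data).
$totals      = [1 => 120, 2 => 80, 3 => 200]; // manga ID => total vocab count
$knownCounts = [1 => 30, 3 => 150];           // manga ID => known count
                                              // (no row at all for manga 2)

$percentages = [];
foreach ($totals as $mangaId => $total) {
    // PHP's ?? does exactly what the rejected SQL COALESCE was meant to do
    $known = $knownCounts[$mangaId] ?? 0;
    $percentages[$mangaId] = $total > 0 ? (int) round($known / $total * 100) : 0;
}

print_r($percentages); // [1 => 25, 2 => 0, 3 => 75]
```

Manga 2 is the interesting case: it has no row in the counts result at all, and the `??` quietly turns that absence into 0 — the exact null-handling the DQL version was fighting for.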
Density Score — How Vocabulary-Heavy Is a Page?
Commit 3ad213b: “feat: add uniqueVocabCount and densityScore per page in volume API”. Users use this to pick which pages to study.
One feature I’m genuinely proud of is the density score. It’s a simple metric, but it turned out to be exactly what users wanted: a way to quickly identify which pages are good study material versus action pages dominated by sound effects.
The formula:
```php
// depthWeight: discount pages with very few words (splash pages, etc.)
$depthWeight = min($totalWordCount / 10.0, 1.0);

// densityScore: how "vocabulary dense" is this page?
$densityScore = $uniqueWordCount / max($totalWordCount, 1) * $depthWeight;

// Result: 0.0 (pure repetition / sound effects)
//     to: 1.0 (every word is unique and the page has many words)
```
Try it below — drag the sliders to explore how the score reacts to different page compositions:
🧮 Density Score Calculator
Action pages — panels filled with ドカーン! and ガガガ! — score near zero. Dialogue-heavy pages, especially those with a variety of N3/N2 vocabulary, score near 1.0. Users can sort pages by density score and build targeted study sessions.
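The formula packs into one small function. A runnable sketch of the computation above (the function name is mine, not necessarily the app's):

```php
<?php
// Density score for a page: unique/total ratio, discounted for
// pages with very few words (splash pages, lone sound effects).
function densityScore(int $uniqueWords, int $totalWords): float
{
    $depthWeight = min($totalWords / 10.0, 1.0);
    return $uniqueWords / max($totalWords, 1) * $depthWeight;
}

echo densityScore(18, 20), "\n"; // dialogue page: 0.9
echo densityScore(2, 4), "\n";   // action page, mostly repeated SFX: 0.2
echo densityScore(0, 0), "\n";   // blank page: 0
```

Note how the depth weight does double duty on the action page: the score is halved by repetition and then cut again because the page barely has any words.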
Volume Read Progress Tracking
Commit 363ecb9: “feat: volume read progress tracking (unread/reading/completed)”. Simple state machine, big UX win.
Tracking read progress feels obvious in retrospect, but it wasn’t in the original spec. A user asked: “can I mark which volumes I’ve already read so I don’t re-study vocabulary I already know from that volume?”
The data model is straightforward — a join table with a status enum:
```php
// VolumeProgress.php entity
#[ORM\Entity]
class VolumeProgress
{
    #[ORM\ManyToOne(targetEntity: User::class)]
    private User $user;

    #[ORM\ManyToOne(targetEntity: Volume::class)]
    private Volume $volume;

    #[ORM\Column(type: 'string', enumType: ReadStatus::class)]
    private ReadStatus $status = ReadStatus::UNREAD;
}

enum ReadStatus: string
{
    case UNREAD = 'unread';
    case READING = 'reading';
    case COMPLETED = 'completed';
}
```
The UI cycles through the three states on click. Try it:
📚 Volume Progress Shelf
Click any volume to cycle: unread → reading → completed
On the backend, the API accepts a PATCH /api/volumes/{id}/progress with {"status": "reading"}. The status feeds back into the vocabulary learning system: completed volumes’ words get lower priority in flashcard queues.
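The click-to-cycle behaviour fits naturally on the enum itself. A sketch — the `cycle()` helper is my illustration of the state machine, not necessarily how the app implements it:

```php
<?php
enum ReadStatus: string
{
    case UNREAD = 'unread';
    case READING = 'reading';
    case COMPLETED = 'completed';

    // Next state in the unread → reading → completed → unread cycle
    public function cycle(): self
    {
        return match ($this) {
            self::UNREAD    => self::READING,
            self::READING   => self::COMPLETED,
            self::COMPLETED => self::UNREAD,
        };
    }
}

echo ReadStatus::UNREAD->cycle()->value, "\n";    // reading
echo ReadStatus::COMPLETED->cycle()->value, "\n"; // unread
```

Putting the transition in a `match` means adding a fourth state later (say, `ABANDONED`) fails loudly at the `match` rather than silently falling through.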
The Common Vocabulary Endpoint
Commit b936b14: “feat: add GET /api/vocabulary/common endpoint”. The most useful endpoint in the whole API.
GET /api/vocabulary/common returns words that appear most frequently across the user’s manga library — filtered to exclude words the user already knows. It’s the “what should I learn next” endpoint.
The DQL is a subquery that ranks by occurrence count:
```php
// VocabularyRepository.php
public function findCommonUnknown(User $user, int $limit = 20): array
{
    return $this->createQueryBuilder('v')
        ->select('v', 'COUNT(pv.id) AS HIDDEN occurrences')
        ->join('v.pageVocabularies', 'pv')
        // exclude words the user already knows
        ->leftJoin(
            'App\Entity\UserVocabulary', 'uv', 'WITH',
            'uv.vocabulary = v AND uv.user = :user'
        )
        ->where('uv.id IS NULL')
        ->setParameter('user', $user)
        ->groupBy('v.id')
        ->orderBy('occurrences', 'DESC')
        ->setMaxResults($limit)
        ->getQuery()
        ->getResult();
}
```
Note the HIDDEN keyword in the select clause — this is a DQL trick that lets you use the aliased aggregate in orderBy without Doctrine trying to hydrate it as an entity field.
The frequency histogram below shows sample output. Toggle words you know to remove them from the queue:
📊 Common Vocabulary — Frequency Chart
Check words you know — they'll be removed from the learning queue.
Lessons Learned
Flip each card to reveal the lesson behind the headline.
👆 Click any card to flip it
What the Numbers Looked Like
To put the improvement in perspective:
| Metric | Before | After |
|---|---|---|
| Queries per `/api/manga` | 83 | 3 |
| Response time (avg) | 4 800 ms | 185 ms |
| ClearDB `max_questions` / hour used | ~3 600 (full quota) | ~210 |
| Time to `MySQL server has gone away` | ~15 min of usage | never |
The indexes on volume.manga_id, page.volume_id, and user_vocabulary.user_id were added as part of the same sprint (741fdfb). Without them, even the JOINed queries would have been doing full table scans.
Declaring an index is a one-liner: #[ORM\Index] on the entity, or via a migration. It's boilerplate, but it's the difference between a 3ms lookup and a 300ms scan on a table with 50k rows.
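For reference, the attribute form looks roughly like this (table and index names here are illustrative, not the app's actual schema):

```php
// Volume.php — index on the foreign key the JOINs filter by
#[ORM\Entity]
#[ORM\Table(name: 'volume')]
#[ORM\Index(name: 'idx_volume_manga', columns: ['manga_id'])]
class Volume
{
    // …
}
```

Doctrine migrations will then emit the `CREATE INDEX` automatically the next time you diff the schema.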