Alex Ng - Blog

The time I brought down production

Updated: 4/2024 | ~2 min read

How a quest for better SEO caused me great, completely unrelated pain



This website is built with Next.js, hosted on Vercel, and uses Cloudflare for DNS (Domain Name System). The abbreviated request path looks like this:

  1. The browser sends the initial request for the website
  2. Cloudflare resolves the domain and forwards the request to Vercel
  3. Vercel routes the request to the Next.js server


SEO (Search Engine Optimization) is crucial for a website because it determines how easily the site can be found. It can make or break a website: with no SEO, the site is harder to find than a needle in a haystack, and with poor SEO, it reaches the wrong audience, which renders the site useless.

In short, SEO is essential for this website, so I set out to improve it.

The Incident

11:46 pm:

While I was improving this website's SEO, I came across some SEO-checking websites online. From the generated reports, I discovered that some internal links were not redirecting correctly. Specifically, the redirects rewrote internal links to a different host, causing a mismatch in host path.

Okay, no big deal. I just have to update the Cloudflare redirect rules to redirect all traffic to a single host.

11:55 pm: The site goes down.

The Investigation

12:14 am: I realize the website is down and start investigating.

The site is unreachable and all requests are timing out. What is going on? The Vercel deployment is still online and the preview builds of production are still accessible. However, something is amiss: the logs do not show any requests timing out. Perhaps it is an issue at the transport layer of the OSI model?

Checking the network logs in my browser reveals the issue: an infinite loop! When visiting the website, users are redirected from one host to the other and back again, over and over, before timing out shortly after. Why is this happening?

In an epic blunder, I still had Vercel redirecting all traffic in the opposite direction, thus causing an infinite loop of redirection mayhem.
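To make the failure mode concrete, here is a minimal sketch of two redirect layers that each send traffic to the other. The hostnames `a.example` and `b.example` are placeholders (the real hosts are not shown in this post), and `edgeRule`, `originRule`, and `followRedirects` are hypothetical names that only stand in for the Cloudflare and Vercel rules. Neither platform's actual configuration looks like this.

```typescript
// A rule returns the redirect target for a URL, or null if it does not apply.
type RedirectRule = (url: string) => string | null;

// Stand-in for the Cloudflare rule: everything on host A goes to host B.
// The hostnames are placeholders, not the real ones.
const edgeRule: RedirectRule = (url) =>
  url.startsWith("https://a.example/")
    ? url.replace("https://a.example/", "https://b.example/")
    : null;

// Stand-in for the leftover Vercel rule: everything on host B goes to host A.
const originRule: RedirectRule = (url) =>
  url.startsWith("https://b.example/")
    ? url.replace("https://b.example/", "https://a.example/")
    : null;

type TraceResult = { chain: string[]; status: "ok" | "loop" | "too-many-hops" };

// Follow redirects the way a browser would, stopping as soon as a URL repeats.
function followRedirects(
  start: string,
  rules: RedirectRule[],
  maxHops = 10,
): TraceResult {
  const chain = [start];
  const seen = new Set([start]);
  let current = start;

  for (let hop = 0; hop < maxHops; hop++) {
    // The first rule that matches decides where this hop goes.
    let next: string | null = null;
    for (const rule of rules) {
      next = rule(current);
      if (next !== null) break;
    }
    if (next === null) return { chain, status: "ok" }; // no rule applies: final URL

    chain.push(next);
    if (seen.has(next)) return { chain, status: "loop" }; // already visited
    seen.add(next);
    current = next;
  }
  return { chain, status: "too-many-hops" };
}

const trace = followRedirects("https://a.example/blog", [edgeRule, originRule]);
console.log(trace.status, trace.chain.join(" -> "));
```

With both rules active, the chain returns to its starting URL after two hops. With only one of them, the trace ends at a final URL, which is how the setup should have behaved.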

What a predicament!

12:24 am: Production is rolled back.

12:25 am: The site is back up.


This 30-minute outage was avoidable. I should have reconfirmed routing rules before pushing to production.
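
One way to reconfirm routing rules is to walk the redirect chain of a URL before (and right after) a change and fail loudly if it loops. The following is only a sketch, not something this site actually runs: `checkRedirects`, `httpProbe`, and the `Probe` type are hypothetical names, and the default probe assumes a runtime with a global `fetch` that exposes the `Location` header under `redirect: "manual"` (for example Node 18 or later).

```typescript
// One hop of a redirect chain: the status code and the Location header, if any.
type Hop = { status: number; location: string | null };
type Probe = (url: string) => Promise<Hop>;

// Default probe. Assumes a global fetch (for example Node 18+).
// redirect: "manual" stops fetch from following the redirect itself.
const httpProbe: Probe = async (url) => {
  const res = await fetch(url, { method: "HEAD", redirect: "manual" });
  return { status: res.status, location: res.headers.get("location") };
};

// Returns the chain of URLs visited, or throws on a loop or an overly long chain.
async function checkRedirects(
  start: string,
  probe: Probe = httpProbe,
  maxHops = 10,
): Promise<string[]> {
  const chain = [start];
  let current = start;

  for (let hop = 0; hop < maxHops; hop++) {
    const { status, location } = await probe(current);

    // Anything that is not a 3xx with a Location header counts as the final response.
    if (status < 300 || status >= 400 || location === null) return chain;

    // Location may be relative, so resolve it against the current URL.
    const next = new URL(location, current).toString();
    if (chain.includes(next)) {
      throw new Error(`Redirect loop: ${[...chain, next].join(" -> ")}`);
    }
    chain.push(next);
    current = next;
  }
  throw new Error(`More than ${maxHops} redirects starting from ${start}`);
}
```

The probe is passed in as a parameter so the check can be exercised with a fake response source, without touching the real domain. Pointed at production right after a rule change, a check like this would have flagged the loop within seconds instead of after half an hour.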