Alex Ng - Blog
The time I brought down production
How a quest for better SEO caused me great, completely unrelated pain
Background
This website is built with Next.js, hosted on Vercel, and uses Cloudflare as its DNS (Domain Name System) provider. The abbreviated fetch trace looks like this:
- Initial website fetch request
- Cloudflare understands and forwards the request to Vercel
- Vercel understands and forwards the request to the NextJS server
Preface
SEO (Search Engine Optimization) is crucial for a website because it boosts findability. It can make or break a site: with no SEO, the site is harder to find than a needle in a haystack, and with poor SEO, it reaches the wrong audience, which renders it useless.
In short, SEO is essential for this website, so I set out to improve it.
The Incident
11:46 pm:
While I was improving this website's SEO, I came across some SEO-checking websites online. Their reports showed that some internal links were not redirecting correctly: internal links to https://ngjx.org were being rewritten to https://www.ngjx.org, causing a host mismatch.
Okay, no big deal: I just have to update the Cloudflare redirect rules to redirect all www.ngjx.org traffic to ngjx.org.
11:55 pm: The site goes down.
The Investigation
12:14 am: I realize the website is down and start investigating.
The site is unreachable and all requests are timing out. What is going on? The Vercel deployment is still online, and the preview builds of production are still accessible. However, something is amiss: the logs do not show any requests timing out. Perhaps it is an issue with the transport layer of the OSI model?
Checking the network logs in my browser reveals the issue: an infinite loop! When visiting the website, users are redirected to ngjx.org, then to www.ngjx.org, over and over before timing out shortly after. Why is this happening?
In an epic blunder, I still had Vercel redirecting all ngjx.org traffic to www.ngjx.org. Combined with the new Cloudflare rule sending www.ngjx.org back to ngjx.org, this caused an infinite loop of redirection mayhem.
What a predicament!
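To make the loop concrete, here is a minimal sketch that models the two conflicting rules as plain data and walks them until a host repeats. The `RedirectRule` type and `traceRedirects` function are hypothetical names made up for illustration; this is not how Cloudflare or Vercel represent their rules, just a simplified picture of what happened.

```typescript
// Hypothetical, simplified model of a host-level redirect rule.
type RedirectRule = { layer: string; fromHost: string; toHost: string };

// The two rules that were live at the same time during the incident.
const rules: RedirectRule[] = [
  { layer: "Cloudflare", fromHost: "www.ngjx.org", toHost: "ngjx.org" },
  { layer: "Vercel", fromHost: "ngjx.org", toHost: "www.ngjx.org" },
];

// Follow the rules from a starting host and report whether any host repeats.
function traceRedirects(
  startHost: string,
  activeRules: RedirectRule[],
): { path: string[]; loops: boolean } {
  const path = [startHost];
  const seen = new Set<string>([startHost]);
  let host = startHost;

  // Terminates: each pass either returns or adds a new host to `seen`,
  // and the set of hosts named in the rules is finite.
  while (true) {
    const rule = activeRules.find((r) => r.fromHost === host);
    if (!rule) return { path, loops: false }; // no rule matches: request is served
    host = rule.toHost;
    path.push(host);
    if (seen.has(host)) return { path, loops: true }; // back where we started
    seen.add(host);
  }
}

const result = traceRedirects("ngjx.org", rules);
console.log(result.path.join(" -> "), result.loops ? "(loop)" : "(ok)");
// prints "ngjx.org -> www.ngjx.org -> ngjx.org (loop)"
```

Removing either rule from the array makes the trace end after at most one hop, which is roughly what the rollback achieved.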
12:24 am: Production is rolled back.
12:25 am: The site is back up.
Takeaways
This 30-minute outage was avoidable. I should have reconfirmed routing rules before pushing to production.
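One way to reconfirm routing rules is a small script that follows redirects one hop at a time and fails loudly on a loop. The sketch below is something I could have run right after changing the rules, not what I actually did. It assumes Node 18+ (global `fetch`, where `redirect: "manual"` exposes the `Location` header), and `checkRedirects`, `fetchHop` and `HopResolver` are hypothetical names.

```typescript
// Resolves one hop: returns the redirect target, or null if the response is not a redirect.
type HopResolver = (url: string) => Promise<string | null>;

// Default resolver (assumes Node 18+): ask for headers only and do not auto-follow redirects.
const fetchHop: HopResolver = async (url) => {
  const res = await fetch(url, { method: "HEAD", redirect: "manual" });
  const location = res.headers.get("location");
  if (res.status >= 300 && res.status < 400 && location) {
    // Location may be relative, so resolve it against the current URL.
    return new URL(location, url).toString();
  }
  return null;
};

// Follow redirects from startUrl; return the chain, or throw on a loop or too many hops.
async function checkRedirects(
  startUrl: string,
  resolveHop: HopResolver = fetchHop,
  maxHops = 10,
): Promise<string[]> {
  const chain = [startUrl];
  let current = startUrl;
  for (let hop = 0; hop < maxHops; hop++) {
    const next = await resolveHop(current);
    if (next === null) return chain; // final destination reached
    if (chain.includes(next)) {
      throw new Error(`Redirect loop: ${[...chain, next].join(" -> ")}`);
    }
    chain.push(next);
    current = next;
  }
  throw new Error(`More than ${maxHops} redirects from ${startUrl}`);
}
```

Calling `await checkRedirects("https://ngjx.org")` and `await checkRedirects("https://www.ngjx.org")` after a rule change should either return a short chain or throw with the full loop in the error message. The resolver is passed in as a parameter so the loop detection can be exercised without touching the network.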