Crawl · 4xx / 5xx · CSV · Free

Broken Link Checker: find every dead link on a page

A pro-grade broken link checker that crawls a page recursively (depth 1–3), extracts every <a>, <img>, <script>, <link> and <iframe>, and reports each link's result: 200 OK, 301, 404, 500, DNS failure, timeout, or soft 404. It also offers sitemap mode, a custom URL list, User-Agent spoofing, retry-with-backoff, skip patterns, source-page tracking and CSV/JSON export, all running client-side through a rotating CORS-proxy chain.

Use cases

Why run a broken link checker

SEO hygiene

Pages with a high density of dead links look unmaintained and can slip in Google's rankings. Audit your blog, docs and knowledge base, and fix the 404s before the ranking drops.

Sitemap-driven audit

Paste your sitemap.xml (or sitemap index) and check every <loc> URL in one run — perfect for post-deploy and post-migration QA.

Hot-linked assets

Catch dead <img>, <script>, <link> CSS and <iframe> resources — not just <a> links. They quietly break your pages.

Soft-404 detection

Many affiliate / category links return 200 OK but render "page not found" — we scan the body for the tell-tale phrases and flag them.

Recursive site crawl

Depth 1–3 follows internal links from your starting page so a single click audits the whole section, not just the home page.

Bot vs browser

Spoof Googlebot, Bingbot, mobile or your own UA via the User-Agent picker. Reveals cloaking, bot-only redirects and broken UA-gated paths.

Step by step

How to check for broken links — pro workflow

1. Pick a mode

Crawl a page: paste a URL and we extract <a>, <img>, <script>, <link> and <iframe> references. Sitemap URL: drop in a sitemap.xml (sitemap-index files are followed automatically). Paste list: up to 500 URLs.

2. Tune depth & scope

Depth 0 checks only the listed URLs. Depth 1 also checks every link on the starting page. Depth 2 and 3 follow internal links recursively. Pair with Scope = Internal only for a site audit.

3. Pick resource types

Untick types you don't care about. Auditing only outbound links? Untick img, script, link. Auditing visual breakage? Keep everything ticked.

4. Add skip patterns

One regex per line. Common examples: ^https?://(facebook|twitter|x|linkedin)\.com/ to skip social links, or \.(zip|exe|dmg|iso)$ to skip downloads.

5. Set retries & UA

Two retries with exponential backoff handle most flaky 5xx responses and timeouts. Pick the Googlebot UA to see what Google sees; many sites serve a different page to bots.

6. Filter & export

Click any summary cell (OK / Redirect / 4xx / 5xx / Error / Soft 404) to filter. Bad rows sort to the top automatically. Export CSV or JSON; both include source page, depth, attempts, anchor text and resource type.
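
For readers who want to reproduce the summary buckets and the export from step 6 in their own scripts, here is a rough Python sketch. The bucket names come from the summary cells above; the `LinkResult` fields, the `bucket` and `to_csv` helpers and the column order are illustrative assumptions, not the tool's actual code.

```python
import csv
import io
from dataclasses import asdict, dataclass, fields
from typing import Optional


@dataclass
class LinkResult:
    # Hypothetical row shape; the field names are assumptions for illustration.
    url: str
    status: Optional[int]      # None stands for a DNS failure, timeout, etc.
    source_page: str
    depth: int
    attempts: int
    anchor_text: str
    resource_type: str
    soft_404: bool = False


def bucket(r: LinkResult) -> str:
    """Map a result onto one of the summary cells."""
    if r.status is None:
        return "Error"
    if r.soft_404:
        return "Soft 404"
    if 200 <= r.status < 300:
        return "OK"
    if 300 <= r.status < 400:
        return "Redirect"
    if 400 <= r.status < 500:
        return "4xx"
    if 500 <= r.status < 600:
        return "5xx"
    return "Error"


def to_csv(results: list[LinkResult]) -> str:
    """Serialise results with bad rows first (OK and Redirect rows sort last)."""
    ordered = sorted(results, key=lambda r: bucket(r) in ("OK", "Redirect"))
    out = io.StringIO()
    columns = [f.name for f in fields(LinkResult)] + ["bucket"]
    writer = csv.DictWriter(out, fieldnames=columns)
    writer.writeheader()
    for r in ordered:
        writer.writerow({**asdict(r), "bucket": bucket(r)})
    return out.getvalue()
```

Because Python's sort is stable, rows keep their crawl order inside the "bad" and "good" groups.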

About

About broken links, link rot and SEO

A broken link is any URL on your page that no longer resolves to a working resource — whether it's an <a href>, an <img src>, a <script src> or a CSS <link href>. The destination might return 404 Not Found (page deleted), 500 Server Error (server down), a DNS failure (domain expired), a timeout, or a soft 404 (HTTP 200 but the body actually says "page not found"). Visitors see a dead-end; Google sees an unmaintained site and demotes it.

How this checker works

Every request goes through a rotating chain of 5 public CORS proxies (search.hcodx.com → killcors.com → cors.lol → allorigins.win → codetabs.com). If one is rate-limited or down, the next takes over. Each link is retried with exponential backoff on 5xx responses and timeouts (250 ms → 500 ms → 1 s → 2 s, with jitter), so transient failures don't show up as false positives.
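
The proxy fallback can be pictured with a small Python sketch. The hostnames are the ones listed above, but how each proxy expects the target URL to be passed is not described here, so `fetch_via` is a hypothetical caller-supplied function and `ProxyUnavailable` is an invented exception. The sketch walks the chain in a fixed order; a real rotation might start from a different proxy each time.

```python
from typing import Callable, Optional

# Order taken from the chain described above.
PROXIES = [
    "search.hcodx.com",
    "killcors.com",
    "cors.lol",
    "allorigins.win",
    "codetabs.com",
]


class ProxyUnavailable(Exception):
    """Raised by fetch_via when a proxy is down or rate-limited (hypothetical)."""


def fetch_with_fallback(
    url: str,
    fetch_via: Callable[[str, str], int],
    proxies: list[str] = PROXIES,
) -> tuple[Optional[int], Optional[str]]:
    """Try each proxy in order and return (status, proxy_used).

    fetch_via(proxy, url) is assumed to return the target's HTTP status
    or raise ProxyUnavailable. (None, None) means every proxy failed.
    """
    for proxy in proxies:
        try:
            return fetch_via(proxy, url), proxy
        except ProxyUnavailable:
            continue  # this proxy failed, so the next one takes over
    return None, None
```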

In Crawl a page mode we fetch the page HTML, parse it with the browser's DOMParser, and harvest every <a>, <img>, <link>, <script>, <iframe> and <source> reference. Relative URLs are resolved against the page (respecting any <base href> tag). At depth ≥ 1 the crawler queues internal pages and follows them recursively, building a BFS frontier that's deduplicated by URL.
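
The tool itself runs in the browser with DOMParser; the following Python sketch only illustrates the same idea using the standard library. The names `LinkExtractor`, `extract_links` and `crawl` are made up for this example, `get_html` is a caller-supplied fetch function, and the depth rule (only pages at a level below `max_depth` are parsed) is an assumption about how depth is counted.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlsplit

# Which attribute carries the URL for each harvested tag.
URL_ATTR = {"a": "href", "img": "src", "script": "src",
            "link": "href", "iframe": "src", "source": "src"}


class LinkExtractor(HTMLParser):
    def __init__(self, page_url: str):
        super().__init__()
        self.base = page_url      # replaced if a <base href> is seen
        self.found = []           # (tag, raw URL) pairs in document order

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "base" and attrs.get("href"):
            self.base = urljoin(self.base, attrs["href"])
        elif tag in URL_ATTR and attrs.get(URL_ATTR[tag]):
            self.found.append((tag, attrs[URL_ATTR[tag]]))


def extract_links(html: str, page_url: str) -> list[tuple[str, str]]:
    """Return de-duplicated (tag, absolute http(s) URL) pairs."""
    parser = LinkExtractor(page_url)
    parser.feed(html)
    seen, out = set(), []
    for tag, raw in parser.found:
        absolute = urldefrag(urljoin(parser.base, raw)).url  # drop #fragment
        if absolute.startswith(("http://", "https://")) and absolute not in seen:
            seen.add(absolute)
            out.append((tag, absolute))
    return out


def crawl(start: str, get_html, max_depth: int) -> list[tuple[str, str, str]]:
    """Breadth-first crawl; returns (source_page, tag, url) for every link."""
    host = urlsplit(start).netloc
    seen = {start}                      # frontier de-duplicated by URL
    queue = deque([(start, 0)])
    links = []
    while queue:
        page, depth = queue.popleft()
        for tag, url in extract_links(get_html(page), page):
            links.append((page, tag, url))
            internal = tag == "a" and urlsplit(url).netloc == host
            if internal and depth + 1 < max_depth and url not in seen:
                seen.add(url)
                queue.append((url, depth + 1))
    return links
```

Keeping the source page next to every link is what makes source-page tracking possible in the results.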

In Sitemap URL mode we pull the XML, follow sitemap-index entries (up to 2 levels deep), and queue every <loc> URL. Capped at 1000 URLs per run to keep your browser responsive.
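
A minimal Python sketch of the sitemap walk, assuming standard sitemaps.org XML. `collect_sitemap_urls` and `fetch_xml` are hypothetical names (the fetch function is supplied by the caller), and the reading of "2 levels deep" as two nested index levels is an assumption.

```python
import xml.etree.ElementTree as ET

MAX_URLS = 1000        # per-run cap mentioned above
MAX_INDEX_DEPTH = 2    # how many nested sitemap-index levels to follow


def _local(tag: str) -> str:
    """Strip the XML namespace: '{ns}loc' -> 'loc'."""
    return tag.rsplit("}", 1)[-1]


def collect_sitemap_urls(sitemap_url, fetch_xml, depth=0, out=None):
    """Return page URLs from a sitemap or sitemap index, capped at MAX_URLS.

    fetch_xml(url) is a caller-supplied function returning the XML text.
    """
    out = [] if out is None else out
    if len(out) >= MAX_URLS:
        return out
    root = ET.fromstring(fetch_xml(sitemap_url))

    # Each <url> or <sitemap> entry carries its address in a <loc> child.
    locs = []
    for entry in root:
        for child in entry:
            text = (child.text or "").strip()
            if _local(child.tag) == "loc" and text:
                locs.append(text)

    if _local(root.tag) == "sitemapindex":
        if depth < MAX_INDEX_DEPTH:
            for child_url in locs:
                collect_sitemap_urls(child_url, fetch_xml, depth + 1, out)
    else:
        out.extend(locs[: MAX_URLS - len(out)])
    return out
```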

In Paste list mode the crawl step is skipped — we check each URL in your list directly.

Soft 404 detection

Sites often serve a friendly "page not found" template with HTTP 200 status (instead of a real 404). Search engines penalise this because users land on a dead page they thought worked. We fetch the body of every successful 2xx HTML response and scan the first 4 KB for tell-tale strings — "page not found", "doesn't exist", <title>404</title>, "the requested URL was not found", etc. Hits get a soft 404 badge in the results.
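
As a rough illustration, the check could look like the Python sketch below. The phrases are the ones quoted above; the exact pattern list, the content-type test and the `is_soft_404` name are assumptions, and a production list would likely be longer.

```python
import re

SCAN_BYTES = 4096  # "first 4 KB" from the description above

# Phrases quoted in the text above, matched case-insensitively.
SOFT_404_PATTERNS = [
    re.compile(r"page not found", re.I),
    re.compile(r"doesn[’']t exist", re.I),
    re.compile(r"<title>\s*404\s*</title>", re.I),
    re.compile(r"the requested url was not found", re.I),
]


def is_soft_404(status: int, content_type: str, body: bytes) -> bool:
    """True when a 2xx HTML response looks like a 'not found' page."""
    if not (200 <= status < 300):
        return False                      # real errors are reported as such
    if "html" not in content_type.lower():
        return False                      # only HTML bodies are scanned
    head = body[:SCAN_BYTES].decode("utf-8", errors="replace")
    return any(p.search(head) for p in SOFT_404_PATTERNS)
```

Scanning only the head of the body keeps the check cheap, at the cost of missing a "not found" message that appears further down the page.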

User-Agent spoofing

Some sites cloak — they serve Googlebot one page and humans another. Others block non-browser UAs with 403. Switch the UA dropdown to Googlebot, Bingbot, iPhone Safari or any preset to see the exact response the bot sees. The UA hint is forwarded as an X-Proxy-User-Agent header so the proxy can replay it server-side.
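
To make the header mechanics concrete, here is a small Python sketch of how a preset could be turned into the forwarded header. The two UA strings are the publicly documented Googlebot and Bingbot values; the preset keys and the `proxy_headers` helper are illustrative assumptions.

```python
from typing import Optional

# Publicly documented crawler UA strings; the preset keys are illustrative.
UA_PRESETS = {
    "googlebot": "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)",
    "bingbot": "Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)",
}


def proxy_headers(preset: Optional[str] = None,
                  custom_ua: Optional[str] = None) -> dict[str, str]:
    """Build the extra request headers that carry the UA hint to the proxy.

    A custom UA wins over a preset; with neither, no hint is sent.
    """
    ua = custom_ua or UA_PRESETS.get(preset or "")
    return {"X-Proxy-User-Agent": ua} if ua else {}
```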

Why retries matter

A naive checker without retries reports spurious 5xx errors and timeouts on almost every run, because many of those failures are transient and succeed on the second try. We retry up to 3 times with exponential backoff plus random jitter, following the same pattern AWS, Google and Microsoft recommend for resilient HTTP clients. The result column shows a ×N badge for any link that required more than one attempt.
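
A minimal Python sketch of retry with exponential backoff and jitter, assuming a base delay of 250 ms doubling up to a 2 s cap. The 20% jitter size, the function names and the injected `fetch` and `sleep` callables are assumptions made for the example.

```python
import random
import time
from typing import Callable, Optional

RETRYABLE = range(500, 600)   # 5xx responses are retried; timeouts too


def backoff_delay(attempt: int, base: float = 0.25, cap: float = 2.0) -> float:
    """Delay in seconds before retry number `attempt` (1-based).

    Doubles from `base` up to `cap`, plus up to 20% random jitter.
    """
    delay = min(cap, base * 2 ** (attempt - 1))
    return delay + random.uniform(0, delay * 0.2)


def check_with_retries(
    url: str,
    fetch: Callable[[str], int],
    max_retries: int = 3,
    sleep: Callable[[float], None] = time.sleep,
) -> tuple[Optional[int], int]:
    """Return (final_status, attempts); status None means it kept timing out.

    fetch(url) is assumed to return an HTTP status or raise TimeoutError.
    """
    status: Optional[int] = None
    for attempt in range(1, max_retries + 2):    # first try plus retries
        try:
            status = fetch(url)
        except TimeoutError:
            status = None
        if status is not None and status not in RETRYABLE:
            return status, attempt               # definitive answer
        if attempt <= max_retries:
            sleep(backoff_delay(attempt))
    return status, max_retries + 1
```

The returned attempt count is what a ×N badge would display.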

Skip patterns

Each line in the Skip-patterns textarea is treated as a JavaScript regex tested against the full URL. Useful patterns: ^https?://(facebook|twitter|x|linkedin|instagram)\.com/ to skip social (they block bots anyway), \.(zip|exe|dmg|iso|mp4)$ to skip heavy downloads, /(login|signin|account)/ to skip gated pages.
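
The filter can be sketched in a few lines of Python. The tool uses JavaScript regexes, but the example patterns above behave the same way in Python's `re` module. The helper names are made up, and silently ignoring an invalid line is an assumed behaviour.

```python
import re


def compile_skip_patterns(textarea: str) -> list[re.Pattern]:
    """Compile one regex per non-empty line."""
    patterns = []
    for line in textarea.splitlines():
        line = line.strip()
        if not line:
            continue
        try:
            patterns.append(re.compile(line))
        except re.error:
            pass  # assumed behaviour: ignore lines that are not valid regexes
    return patterns


def should_skip(url: str, patterns: list[re.Pattern]) -> bool:
    """True when any pattern matches anywhere in the full URL."""
    return any(p.search(url) for p in patterns)
```

One thing worth noticing when writing patterns: the social pattern as written is anchored to the bare domain, so it matches https://x.com/... but not https://www.facebook.com/.... Adding an optional (www\.)? after the scheme would cover both.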

Link rot is real

Studies repeatedly show that ~25 % of URLs on the open web disappear every five years. A blog post you publish today will have 5–10 % dead outbound links within 18 months unless someone keeps it healthy. Run this checker quarterly on your top pages.

SEO impact

Google has stated for years that links are part of how it evaluates a page, both incoming (PageRank) and outgoing (relevance signals). A page with a high ratio of dead outbound links looks unmaintained to visitors and crawlers alike, which can hurt how it ranks. Fixing dead links is one of the easiest wins in technical SEO.

Privacy & rate limits

The URLs you check pass through public CORS proxies and we don't log them. Per-proxy rate limits vary — that's why we rotate. For enterprise-scale crawling (10,000+ pages) use a dedicated tool like Screaming Frog (desktop), Sitebulb, or Ahrefs Site Audit.

FAQ

Broken Link Checker — frequently asked questions

A broken link checker fetches every <a href> on a page (or each URL in a list) and reports the HTTP status — 200 OK, 301 redirect, 404 not found, 500 error, DNS failure or timeout. Useful for SEO, content audits and migration QA.

Crawler mode handles ~500 links per page. Paste-list mode handles up to 500 URLs per run. Checks fan out 2–8 in parallel for speed; large lists finish in under a minute.
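
The parallel fan-out can be approximated in Python with a thread pool, as in this sketch. `check_all` and `check_one` are hypothetical names, and clamping the worker count to the 2 to 8 range is an assumption based on the figures above.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Optional


def check_all(
    urls: list[str],
    check_one: Callable[[str], Optional[int]],
    workers: int = 4,
) -> dict[str, Optional[int]]:
    """Check URLs concurrently and return {url: status}.

    check_one(url) is a caller-supplied function that performs one check.
    """
    workers = max(2, min(8, workers))          # keep concurrency within 2 to 8
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map yields results in the same order as the input URLs.
        return dict(zip(urls, pool.map(check_one, urls)))
```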

Some sites block bot traffic, return 403 to non-browser User-Agents, or require cookies/JS to render a real page. The check still reflects reality — if our proxy or Googlebot can't fetch it cleanly, that link is fragile from an SEO standpoint.

Redirects are followed, up to 5 hops by default. The result shows the final HTTP status and a 3xx badge if redirects were involved. Use the Redirect Checker to see the full chain.
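
A hop-limited redirect follower can be sketched in Python like this. `resolve` and `fetch_once` are hypothetical names; `fetch_once` is assumed to make a single request without following redirects and to return the status plus the Location header, if any.

```python
from typing import Callable, Optional
from urllib.parse import urljoin

MAX_HOPS = 5
REDIRECT_CODES = {301, 302, 303, 307, 308}


def resolve(
    url: str,
    fetch_once: Callable[[str], tuple[int, Optional[str]]],
    max_hops: int = MAX_HOPS,
) -> tuple[int, str, int]:
    """Follow redirects and return (final_status, final_url, hops).

    If the hop limit is reached while still redirecting, the last 3xx
    status is returned as the final status.
    """
    hops = 0
    while True:
        status, location = fetch_once(url)
        if status not in REDIRECT_CODES or not location or hops >= max_hops:
            return status, url, hops
        url = urljoin(url, location)   # Location may be relative
        hops += 1
```

A hop count above zero is what a 3xx badge would be based on, and the hop limit also stops redirect loops.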

Results can be exported: the CSV and JSON files include every link with its status code, anchor text, source page and timing.

This tool is not a full-site crawler. It checks a starting page (following internal links up to depth 3), the URLs in a sitemap, or a list you paste. For full-site crawling use a dedicated tool like Screaming Frog, Sitebulb or Ahrefs Site Audit.
