Fixing false 404 headers on external pages that include wp-blog-header.php
I recently noticed, thanks to my broken link checker, that all my normal site pages were returning HTTP/1.1 404 Not Found even though they load just fine in the browsers I test with. Some older browsers, however, will see only the 404 error, and crawlers like Googlebot will fail to index that content.
I first suspected WP-Super-Cache, because the pages returned HTTP 200 when served from cache. However, nonexistent pages exhibited the same behavior:
A real page served uncached:
…and then cached:
A fake page served uncached:
…and then cached:
The problem persisted with caching disabled, and after some digging I found out why. WordPress runs on the root of my domain, but the normal pages, which are powered by the old Cooltrainer CMS, also include the WordPress core so they can pull in new posts (for the front page) and generate the Archives and Categories menus. Since WordPress doesn't know about those pages, it treats each request as one for a nonexistent page and sends the 404 header.
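A broken link checker catches this because it reads the status line, not the rendered page. Below is a minimal sketch of the same check in Python, assuming only the standard library; fetch_status() and the script name check_status.py are hypothetical. It sends one GET request, does not follow redirects, and reports the status code next to the body size, so a page that renders completely but carries a 404 stands out.

```python
from __future__ import annotations

import http.client
import sys
from urllib.parse import urlsplit


def fetch_status(url: str, timeout: float = 10.0) -> tuple[int, str, int]:
    """Send one GET request and return (status code, reason phrase, body size).

    Redirects are not followed, so the status is exactly what the server sent
    for this URL, even when the body is a complete, normal-looking page.
    """
    parts = urlsplit(url)
    conn_cls = (http.client.HTTPSConnection if parts.scheme == "https"
                else http.client.HTTPConnection)
    # netloc may be "host" or "host:port"; HTTPConnection accepts both.
    conn = conn_cls(parts.netloc, timeout=timeout)
    try:
        path = parts.path or "/"
        if parts.query:
            path += "?" + parts.query
        conn.request("GET", path)
        resp = conn.getresponse()
        body = resp.read()
        return resp.status, resp.reason, len(body)
    finally:
        conn.close()


if __name__ == "__main__":
    # Usage: python check_status.py URL [URL ...]
    for url in sys.argv[1:]:
        code, reason, size = fetch_status(url)
        # A 404 next to a full-size body is the symptom described above.
        print(f"{code} {reason:<12} {size:>8} bytes  {url}")
```

Running it against a real page and a made-up URL, once uncached and once cached, reproduces the four-way comparison above without a browser in the way.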
The fix is simpler than you may imagine. Instead of using wp-blog-header directly, like so:
…we deconstruct it into its component function calls and skip the one that causes the false 404.
WordPress processes each request through an instance of the WP class, so we can call that object's methods one at a time. I use the following:
From the documentation:
init() — set up the current user.
parse_request() — Parse request to find correct WordPress query.
query_posts() — Set up the Loop based on the query variables.
register_globals() — Set up the WordPress Globals.
send_headers() — Sets the X-Pingback header, 404 status (if 404), Content-type. If showing a feed, it will also send last-modified, etag, and 304 status if needed.
And obviously, we’re leaving out handle_404().
handle_404() — Issue a 404 if a request doesn’t match any posts and doesn’t match any object (e.g. an existing-but-empty category, tag, author) and a 404 was not already issued, and if the request was not a search or the homepage. Otherwise, issue a 200.
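To see why a page outside WordPress lands in the 404 branch, here is a simplified Python model of the handle_404() rule exactly as the documentation above words it. It is not WordPress's PHP source: the Request fields and handle_404_status() are hypothetical stand-ins for the documented conditions, and the real method may check further conditions that the summary omits.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Request:
    """Hypothetical summary of a request after parse_request() and query_posts()."""
    matched_posts: int = 0        # posts returned by the main query
    matched_object: bool = False  # e.g. an existing-but-empty category, tag or author
    already_404: bool = False     # a 404 status was already issued
    is_search: bool = False
    is_home: bool = False


def handle_404_status(req: Request) -> Optional[int]:
    """Status the documented handle_404() rule would send, or None for no change."""
    if req.already_404:
        return None  # keep the earlier 404; nothing new is sent
    no_match = req.matched_posts == 0 and not req.matched_object
    if no_match and not req.is_search and not req.is_home:
        return 404
    return 200


# A page from the old CMS: WordPress finds no posts or objects for its URL,
# and it is neither a search nor the homepage, so the rule yields 404.
cms_page = Request()
print(handle_404_status(cms_page))  # 404
```

Skipping handle_404() means this decision never runs, so unless another step sets a status, PHP's default of 200 goes out with the page.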