Fixing false 404 headers on external pages that include wp-blog-header.php

2010-06-20 14:06

I recently noticed, thanks to my broken link checker, that all my normal site pages were returning HTTP/1.1 404 Not Found even though they load just fine in the browsers I test. Some older browsers, however, will show only the 404 error, and crawlers like Googlebot will not index content served with that status.

I first suspected WP Super Cache, because the pages returned HTTP 200 when served from cache. However, nonexistent pages showed exactly the same pattern, 404 when uncached and 200 when cached:

A real page served uncached:

curl -I http://cooltrainer.org/about/
  HTTP/1.1 404 Not Found
  Date: Sun, 20 Jun 2010 05:02:22 GMT
  Server: Apache
  X-Powered-By: PHP/5.3.1
  Vary: Cookie,Accept-Encoding
  X-Pingback: http://cooltrainer.org/xmlrpc.php
  Expires: Wed, 11 Jan 1984 05:00:00 GMT
  Cache-Control: no-cache, must-revalidate, max-age=0
  Pragma: no-cache
  Last-Modified: Sun, 20 Jun 2010 05:02:24 GMT
  Content-Type: text/html; charset=UTF-8

…and then cached:

curl -I http://cooltrainer.org/collectionviewer/
  HTTP/1.1 200 OK
  Date: Sun, 20 Jun 2010 05:02:27 GMT
  Server: Apache
  X-Powered-By: PHP/5.3.1
  Vary: Accept-Encoding,Cookie
  Cache-Control: max-age=300, must-revalidate
  WP-Cache: Served supercache file from PHP
  Content-Type: text/html; charset=UTF-8

A fake page served uncached:

curl -I http://cooltrainer.org/somepagethatdoesntexist/
  HTTP/1.1 404 Not Found
  Date: Sun, 20 Jun 2010 05:12:59 GMT
  Server: Apache
  X-Powered-By: PHP/5.3.1
  Vary: Cookie,Accept-Encoding
  X-Pingback: http://cooltrainer.org/xmlrpc.php
  Expires: Wed, 11 Jan 1984 05:00:00 GMT
  Cache-Control: no-cache, must-revalidate, max-age=0
  Pragma: no-cache
  Last-Modified: Sun, 20 Jun 2010 05:13:00 GMT
  Content-Type: text/html; charset=UTF-8

…and then cached:

curl -I http://cooltrainer.org/somepagethatdoesntexist/
  HTTP/1.1 200 OK
  Date: Sun, 20 Jun 2010 05:13:01 GMT
  Server: Apache
  X-Powered-By: PHP/5.3.1
  Vary: Accept-Encoding,Cookie
  Cache-Control: max-age=300, must-revalidate
  WP-Cache: Served supercache file from PHP
  Content-Type: text/html; charset=UTF-8

The problem persisted with caching disabled, and I discovered why after some digging. WordPress runs at the root of my domain, but normal pages powered by the old Cooltrainer CMS also include the WordPress core to get at new posts (for the front page) and to generate the menus for Archives and Categories. Since WordPress doesn’t know about those pages, it treats their URLs as nonexistent and sends the 404 header.

The fix is simpler than you might imagine. Instead of including wp-blog-header.php directly, like so:

  require_once("diary/wp-blog-header.php");

…we break it down into its component method calls and leave out the one that causes the trouble.

WordPress handles each request through its WP class, available as $wp once wp-config.php has been loaded. I use the following:

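  require_once("diary/wp-config.php");  // loads WordPress and the global $wp
  $wp->init();              // set up the current user
  $wp->parse_request();     // map the request to a WordPress query
  $wp->query_posts();       // set up the Loop from the query variables
  $wp->register_globals();  // set up the WordPress globals
  $wp->send_headers();      // X-Pingback, Content-Type, 404 status if flagged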

From the documentation:

  • init() — set up the current user.
  • parse_request() — Parse request to find correct WordPress query.
  • query_posts() — Set up the Loop based on the query variables.
  • register_globals() — Set up the WordPress Globals.
  • send_headers() — Sets the X-Pingback header, 404 status (if 404), Content-type. If showing a feed, it will also send last-modified, etag, and 304 status if needed.

And obviously, we’re leaving out handle_404().

  • handle_404() — Issue a 404 if a request doesn’t match any posts and doesn’t match any object (e.g. an existing-but-empty category, tag, author) and a 404 was not already issued, and if the request was not a search or the homepage. Otherwise, issue a 200.

Now our pages work as expected: real pages return 200 and nonexistent ones still return 404, cached or not.

Real, uncached:

curl -I http://cooltrainer.org/collectionviewer/
  HTTP/1.1 200 OK
  Date: Sun, 20 Jun 2010 05:41:24 GMT
  Server: Apache
  X-Powered-By: PHP/5.3.1
  Vary: Cookie,Accept-Encoding
  X-Pingback: http://cooltrainer.org/xmlrpc.php
  Content-Type: text/html; charset=UTF-8

Real, cached:

curl -I http://cooltrainer.org/collectionviewer/
  HTTP/1.1 200 OK
  Date: Sun, 20 Jun 2010 05:41:26 GMT
  Server: Apache
  X-Powered-By: PHP/5.3.1
  Vary: Accept-Encoding,Cookie
  Cache-Control: max-age=300, must-revalidate
  WP-Cache: Served supercache file from PHP
  Content-Type: text/html; charset=UTF-8

Fake, uncached:

curl -I http://cooltrainer.org/somepagethatdoesntexist/
  HTTP/1.1 404 Not Found
  Date: Sun, 20 Jun 2010 05:41:15 GMT
  Server: Apache
  X-Powered-By: PHP/5.3.1
  Vary: Cookie,Accept-Encoding
  X-Pingback: http://cooltrainer.org/xmlrpc.php
  Content-Type: text/html; charset=UTF-8

Fake, cached:

curl -I http://cooltrainer.org/somepagethatdoesntexist/
  HTTP/1.1 404 Not Found
  Date: Sun, 20 Jun 2010 05:41:18 GMT
  Server: Apache
  X-Powered-By: PHP/5.3.1
  Vary: Accept-Encoding,Cookie
  Cache-Control: max-age=300, must-revalidate
  WP-Cache: Served supercache file from PHP
  Content-Type: text/html; charset=UTF-8
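
If you want to repeat these checks without reading through every header dump, something like the following might help. It is only a rough sketch: summarize_headers is a name I made up, and treating "has a WP-Cache header" as "served from cache" is an assumption based on the responses above, not anything WP Super Cache promises.

```shell
#!/bin/sh
# summarize_headers (hypothetical helper): read one raw HTTP response header
# block on stdin and print "<status> cached" or "<status> uncached".
# Assumption: a response counts as cached when it carries a WP-Cache header.
summarize_headers() {
  awk '
    { sub(/\r$/, "") }                          # strip the CR of CRLF line endings
    NR == 1 { status = $2 }                     # second field of the status line
    tolower($0) ~ /^wp-cache:/ { cached = 1 }   # header names are case-insensitive
    END { print status, (cached ? "cached" : "uncached") }
  '
}

# Usage against a live site (-s silences the progress meter, -I sends HEAD):
#   curl -sI http://cooltrainer.org/about/ | summarize_headers
```

Fed the headers from the very first example in this post, it should report a 404 from an uncached response, which is exactly the symptom that started all this.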