2010-06-20

Fixing false 404 headers on external pages including wp-blog-header.php

Filed under: News,Tech — Nicole Reid @ 14:14

I recently noticed, thanks to my broken link checker, that all my normal site pages were returning HTTP/1.1 404 Not Found even though they loaded just fine in the browsers I tested. Some older browsers, however, will show only the 404 error, and crawlers like Googlebot will fail to index that content.

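I first suspected WP-Super-Cache, because the same pages returned HTTP 200 once they were served from cache. However, nonexistent pages showed exactly the same pattern, 404 uncached and 200 cached, so the cache was only masking whatever status PHP had sent: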

A real page served uncached:

[nicole@Emi#nicole]curl -I http://cooltrainer.org/collectionviewer/
HTTP/1.1 404 Not Found
Date: Sun, 20 Jun 2010 05:02:22 GMT
Server: Apache
X-Powered-By: PHP/5.3.1
Vary: Cookie,Accept-Encoding
X-Pingback: http://cooltrainer.org/xmlrpc.php
Expires: Wed, 11 Jan 1984 05:00:00 GMT
Cache-Control: no-cache, must-revalidate, max-age=0
Pragma: no-cache
Last-Modified: Sun, 20 Jun 2010 05:02:24 GMT
Content-Type: text/html; charset=UTF-8

…and then cached:

[nicole@Emi#nicole]curl -I http://cooltrainer.org/collectionviewer/
HTTP/1.1 200 OK
Date: Sun, 20 Jun 2010 05:02:27 GMT
Server: Apache
X-Powered-By: PHP/5.3.1
Vary: Accept-Encoding,Cookie
Cache-Control: max-age=300, must-revalidate
WP-Cache: Served supercache file from PHP
Content-Type: text/html; charset=UTF-8

A fake page served uncached:

[nicole@Emi#nicole]curl -I http://cooltrainer.org/somepagethatdoesntexist/
HTTP/1.1 404 Not Found
Date: Sun, 20 Jun 2010 05:12:59 GMT
Server: Apache
X-Powered-By: PHP/5.3.1
Vary: Cookie,Accept-Encoding
X-Pingback: http://cooltrainer.org/xmlrpc.php
Expires: Wed, 11 Jan 1984 05:00:00 GMT
Cache-Control: no-cache, must-revalidate, max-age=0
Pragma: no-cache
Last-Modified: Sun, 20 Jun 2010 05:13:00 GMT
Content-Type: text/html; charset=UTF-8

…and then cached:

[nicole@Emi#nicole]curl -I http://cooltrainer.org/somepagethatdoesntexist/
HTTP/1.1 200 OK
Date: Sun, 20 Jun 2010 05:13:01 GMT
Server: Apache
X-Powered-By: PHP/5.3.1
Vary: Accept-Encoding,Cookie
Cache-Control: max-age=300, must-revalidate
WP-Cache: Served supercache file from PHP
Content-Type: text/html; charset=UTF-8
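
A minimal sketch of how the uncached/cached comparison above could be scripted. The function names status_from_headers and compare_statuses are hypothetical, and the sketch assumes that the first request is answered by PHP and the second from the cache, which depends on the state of the cache when it runs.

```shell
#!/bin/sh
# Print the numeric status code from raw response headers read on stdin.
# The status line is the first line, e.g. "HTTP/1.1 404 Not Found".
status_from_headers() {
    awk 'NR == 1 { print $2 }'
}

# Request the headers for a URL twice and print both status codes.
# Assumption: the first request is uncached and the second is cached,
# so differing codes suggest the cache is masking the real status.
compare_statuses() {
    url=$1
    first=$(curl -sI "$url" | status_from_headers)
    second=$(curl -sI "$url" | status_from_headers)
    printf '%s first=%s second=%s\n' "$url" "$first" "$second"
}
```

Calling compare_statuses http://cooltrainer.org/collectionviewer/ would print both codes on one line, which makes a mismatch easy to spot.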

The problem persisted with caching disabled, and after some digging I found out why. WordPress handles / on my domain, but the regular pages powered by the old Cooltrainer CMS also include the WordPress core, both to fetch new posts for the front page and to generate the Archives and Categories menus. Since WordPress doesn’t know about those pages, it treats them as nonexistent and sends the 404 header.

The fix is simpler than you may imagine. Instead of including wp-blog-header.php directly, like so:

require_once("diary/wp-blog-header.php");

…we break it down into the individual method calls it makes, leaving out the one that causes the problem.

Loading wp-config.php bootstraps WordPress and creates the global $wp object, an instance of the WP class. I then call its methods myself:

require_once("diary/wp-config.php");
$wp->init();
$wp->parse_request();
$wp->query_posts();
$wp->register_globals();
$wp->send_headers();

From the documentation:

  • init() — set up the current user.
  • parse_request() — Parse request to find correct WordPress query.
  • query_posts() — Set up the Loop based on the query variables.
  • register_globals() — Set up the WordPress Globals.
  • send_headers() — Sets the X-Pingback header, 404 status (if 404), Content-type. If showing a feed, it will also send last-modified, etag, and 304 status if needed.

The one method we deliberately leave out is handle_404():

  • handle_404() — Issue a 404 if a request doesn’t match any posts and doesn’t match any object (e.g. an existing-but-empty category, tag, author) and a 404 was not already issued, and if the request was not a search or the homepage. Otherwise, issue a 200.

Now, our pages work as expected.

Real, uncached:

[nicole@Emi#nicole]curl -I http://cooltrainer.org/collectionviewer/
HTTP/1.1 200 OK
Date: Sun, 20 Jun 2010 05:41:24 GMT
Server: Apache
X-Powered-By: PHP/5.3.1
Vary: Cookie,Accept-Encoding
X-Pingback: http://cooltrainer.org/xmlrpc.php
Content-Type: text/html; charset=UTF-8

Real, cached:

[nicole@Emi#nicole]curl -I http://cooltrainer.org/collectionviewer/
HTTP/1.1 200 OK
Date: Sun, 20 Jun 2010 05:41:26 GMT
Server: Apache
X-Powered-By: PHP/5.3.1
Vary: Accept-Encoding,Cookie
Cache-Control: max-age=300, must-revalidate
WP-Cache: Served supercache file from PHP
Content-Type: text/html; charset=UTF-8

Fake, uncached:

[nicole@Emi#nicole]curl -I http://cooltrainer.org/somepagethatdoesntexist/
HTTP/1.1 404 Not Found
Date: Sun, 20 Jun 2010 05:41:15 GMT
Server: Apache
X-Powered-By: PHP/5.3.1
Vary: Cookie,Accept-Encoding
X-Pingback: http://cooltrainer.org/xmlrpc.php
Content-Type: text/html; charset=UTF-8

Fake, cached:

[nicole@Emi#nicole]curl -I http://cooltrainer.org/somepagethatdoesntexist/
HTTP/1.1 404 Not Found
Date: Sun, 20 Jun 2010 05:41:18 GMT
Server: Apache
X-Powered-By: PHP/5.3.1
Vary: Accept-Encoding,Cookie
Cache-Control: max-age=300, must-revalidate
WP-Cache: Served supercache file from PHP
Content-Type: text/html; charset=UTF-8
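
To keep an eye on this after future changes, the expected status of each page could be asserted rather than read by eye. This is only a sketch: assert_status is a hypothetical helper name, and it checks nothing beyond the code on the first header line.

```shell
#!/bin/sh
# Succeed only if the status line of the headers on stdin carries the
# expected code; otherwise report the mismatch on stderr and fail.
assert_status() {
    expected=$1
    actual=$(awk 'NR == 1 { print $2 }')
    if [ "$actual" = "$expected" ]; then
        printf 'ok: %s\n' "$actual"
    else
        printf 'FAIL: expected %s, got %s\n' "$expected" "$actual" >&2
        return 1
    fi
}
```

For example, curl -sI http://cooltrainer.org/collectionviewer/ | assert_status 200 checks a real page, and the same pipe with 404 checks a nonexistent one.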
