Taming wp_list_pages

Ever since pages got added to WordPress in version 1.5, wp_list_pages has been the way to get a list of pages. In the pre-widgets era, it used to be coded into pretty much every theme’s sidebar or header to generate the site’s menu.

Nowadays wp_list_pages is at the heart of the Pages widget, and it’s perfect for blog-type sites with a modest number of pages. But if you’re using WordPress as a CMS, and most of your content is written as pages, using wp_list_pages can cause serious performance issues.

The problem

I’m currently working with a client who’s migrating large numbers of pages into a multisite WordPress setup. One of the sites has close to 2000 pages. We were using CSS to selectively hide non-relevant parts of the pages menu, but any page where (our modified version of) the Pages widget was active would render really slowly. It didn’t help that my client still uses Internet Explorer 7, but the main problem is that a nested list of 2000 pages is simply a whole lot of HTML.

Using arguments

There are a number of function arguments that can help “trim” wp_list_pages’s output. The most important one is probably “depth”, which limits the level of nesting: pages nested deeper than the specified level are simply not shown. That’s impractical for a site menu, because those deeper pages can’t be reached from the menu at all. Another option is to use “child_of” to show only the pages nested under a certain page. With a bit of tinkering, this would allow you to contextually show the current page’s children, but that’s still not what you want for a site menu.

Using CSS

The common solution is CSS. wp_list_pages adds a clever set of CSS classes (current_page_item, current_page_parent and current_page_ancestor) that identify the currently selected page and its ancestors. It’s easy to set up a bunch of CSS rules that hide all other child pages, so that you end up with a neat folding menu. There is a problem with this, however: it’s not very bandwidth-efficient to send tons of HTML to a browser and then tell it to hide most of it. In my case, with 2000 pages and IE7, things would all but grind to a halt. It took tens of seconds for the browser to go through all the items and hide them, and hover effects used in the list took seconds to update. Essentially, things were broken.

A stopgap solution

I decided to try filtering out unneeded HTML elements from wp_list_pages’s output before sending it to the browser. This turned out to be surprisingly easy. It doesn’t make sense for me to post the whole widget, because parts of it are very client-specific, but essentially I added a method called “filterHtml” and ran the page list through it. This is part of the widget’s ‘widget’ method, which renders its output.

$out = wp_list_pages( $args ); // $args needs 'echo' => 0, so the list is returned rather than printed
$out = $this->filterHtml( $out );

filterHtml

The function itself uses DOMDocument, and its logic is surprisingly simple. It loops through all the list items in the HTML and checks whether each one has a parent list item element. If it does, it checks that parent for the CSS classes WordPress adds to identify the currently selected item and its ancestors. If one of these classes is present, or if there is no parent list item at all (a root element), the item is kept. If not, the ul that contains the item is flagged with a ‘tobedeleted’ attribute.

/*
 * Function that filters out parts of the page tree that should be hidden anyway. This helps
 * performance on slow browsers when there are a lot of pages in the site.
 */
function filterHtml( $html ){

	// nothing to filter (e.g. a site without pages); DOMDocument::loadHTML()
	// rejects empty input
	if( trim( $html ) === '' ){
		return $html;
	}

	// create a DOM Document object for the page list's HTML
	$dom = new DOMDocument();
	$dom->loadHTML( $html );

	// define the classes that we'll look for in order to determine visibility
	$parentClasses = array( 'current_page_ancestor', 'current_page_parent', 'current_page_item' );

	// find all list items (the li's contain the class attribute)
	$pagelinks = $dom->getElementsByTagName('li');

	// loop through the list items (each one wraps a page's a tag)
	foreach( $pagelinks as $pagelink ){

		// get the parent li item
		$parentli = $this->findParentNode( $pagelink, 'li' );

		// if parent li found, look into it, if not, this is a root node, keep it.
		if( $parentli ){

			// compare classes with our parentClasses array to see if these nodes need to be visible
			$classes = $parentli->getAttribute('class');
			$keep = false;
			foreach( $parentClasses as $parentClass ){
				if( strpos( $classes, $parentClass ) !== false ){
					$keep = true;
				}
			}

		} else {
			$keep = true;
		}

		// if keep not true, schedule for deletion by adding an attribute to the parent ul element
		if( !$keep ){
			$parentul = $this->findParentNode( $pagelink, 'ul' );
			if( $parentul ){
				$parentul->setAttribute( 'tobedeleted', 'true' );
			}
		}
	}

	// remove the marked elements by recursively going through the entire DOM
	$this->recursiveDelete( $dom );

	// return the processed content, but first strip the doctype and the
	// html/body wrapper tags that DOMDocument::saveHTML() adds
	// Thanks: http://nl.php.net/manual/en/domdocument.savehtml.php#85165
	$out = str_replace( array( '<html>', '</html>', '<body>', '</body>' ), '', $dom->saveHTML() );
	return preg_replace( '/^<!DOCTYPE.+?>/', '', $out );

}

recursiveDelete

Towards the end of the function, the recursiveDelete method is called, which removes all elements that are flagged. I tried doing this inside filterHtml’s loop, but ran into problems with child nodes of already deleted parents. It turned out to be safer to first flag nodes and then delete them in one sweep.

/*
 * Recursive function to delete nodes from the DOM
 */
function recursiveDelete( $node ){
	// if node is an element marked for deletion, remove it; its children go with it,
	// so there's no need to look inside
	if( $node instanceof DOMElement && $node->hasAttribute('tobedeleted') ){
		$node->parentNode->removeChild($node);
		return;
	}
	// otherwise, fire this function for all child nodes
	if( $node->hasChildNodes() ){
		// copy the (live) child list first: removing a child while looping over
		// $node->childNodes directly can cause siblings to be skipped
		foreach( iterator_to_array( $node->childNodes ) as $child ){
			$this->recursiveDelete( $child );
		}
	}
}

findParentNode

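There’s one more ‘helper’ function called ‘findParentNode’ that does exactly what its name suggests: it walks up the document tree from a node and returns the closest ancestor of a given type, or null if there is none. The filter function uses it to find the li and ul that enclose each page’s list item.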

/*
 * Function to find the closest node of a certain type, searching upward in the document tree
 */
function findParentNode( $n, $nodetype ){
	if( $n instanceof DOMElement ){
		$node = $n->parentNode;
		// check for null first: the document node's parent is null
		while( $node !== null && $node->nodeName != $nodetype ){
			$node = $node->parentNode;
		}
		return $node;
	} else {
		return null;
	}
}
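If you’d rather let libxml do the walking, the same upward search can be sketched with DOMXPath. This is an alternative technique, not what the widget uses, and mytheme_find_parent_xpath is a hypothetical name. The expression ancestor::li[1] picks the nearest enclosing li, because ancestor is a reverse axis and position 1 is therefore the closest match.

```php
<?php
// Hypothetical alternative to findParentNode using DOMXPath.
// "ancestor::li[1]" selects the nearest enclosing <li>: ancestor is a reverse
// axis, so position 1 is the closest ancestor with that name.
function mytheme_find_parent_xpath( DOMNode $node, string $nodetype ): ?DOMNode {
	$xpath  = new DOMXPath( $node->ownerDocument );
	$result = $xpath->query( 'ancestor::' . $nodetype . '[1]', $node );
	// query() returns false for an invalid expression, otherwise a (possibly empty) list
	if ( $result === false || $result->length === 0 ) {
		return null;
	}
	return $result->item( 0 );
}

// Usage on a tiny two-level list (hand-written markup)
$dom = new DOMDocument();
$dom->loadHTML( '<ul><li class="page_item"><a href="/about">About</a>'
	. '<ul class="children"><li class="page_item"><a href="/about/team">Team</a></li></ul></li></ul>' );
$items = $dom->getElementsByTagName( 'li' );
$about = $items->item( 0 );
$team  = $items->item( 1 );

$parent = mytheme_find_parent_xpath( $team, 'li' );
var_dump( $parent !== null && $parent->isSameNode( $about ) ); // bool(true)
var_dump( mytheme_find_parent_xpath( $about, 'li' ) );          // NULL
```

For thousands of list items it would make sense to create the DOMXPath object once and pass it in, rather than constructing one per call as this sketch does.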

These functions were all written as methods of a class, so if you’re planning to use them in functions.php or in some other context outside a class, prefix the function names to make them unique, and replace the $this-> calls with direct calls to the renamed functions.
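For use outside a class, a sketch of the same three functions with a placeholder mytheme_ prefix might look like this. Besides the renaming, it folds in a few defensive tweaks: an early return for empty input, a null check when no parent ul is found, and iterating over a copy of each child list while deleting. The sample markup at the bottom is a hand-written imitation of wp_list_pages output as the Pages widget requests it (title_li empty), wrapped in a ul so it parses as a complete list; it isn’t real WordPress output.

```php
<?php
// Standalone sketch of filterHtml and its helpers as plain functions, e.g. for
// functions.php. "mytheme_" is a placeholder prefix; use one unique to your theme.

// Walk up from $n and return the closest ancestor named $nodetype, or null.
function mytheme_find_parent_node( $n, $nodetype ) {
	if ( ! ( $n instanceof DOMElement ) ) {
		return null;
	}
	$node = $n->parentNode;
	while ( $node !== null && $node->nodeName !== $nodetype ) {
		$node = $node->parentNode;
	}
	return $node;
}

// Remove every element flagged with a tobedeleted attribute (with its subtree).
function mytheme_recursive_delete( $node ) {
	if ( $node instanceof DOMElement && $node->hasAttribute( 'tobedeleted' ) ) {
		$node->parentNode->removeChild( $node );
		return;
	}
	if ( $node->hasChildNodes() ) {
		// iterate over a copy: childNodes is live and changes as nodes are removed
		foreach ( iterator_to_array( $node->childNodes ) as $child ) {
			mytheme_recursive_delete( $child );
		}
	}
}

// Keep top-level items and children of the current page or its ancestors;
// fold away every other nested list.
function mytheme_filter_html( $html ) {
	if ( trim( $html ) === '' ) {
		return $html; // DOMDocument::loadHTML() rejects empty input
	}
	$dom = new DOMDocument();
	$dom->loadHTML( $html );

	$parent_classes = array( 'current_page_ancestor', 'current_page_parent', 'current_page_item' );

	foreach ( $dom->getElementsByTagName( 'li' ) as $pagelink ) {
		$parentli = mytheme_find_parent_node( $pagelink, 'li' );
		$keep     = ( $parentli === null ); // root items are always kept
		if ( $parentli !== null ) {
			$classes = $parentli->getAttribute( 'class' );
			foreach ( $parent_classes as $parent_class ) {
				if ( strpos( $classes, $parent_class ) !== false ) {
					$keep = true;
				}
			}
		}
		if ( ! $keep ) {
			$parentul = mytheme_find_parent_node( $pagelink, 'ul' );
			if ( $parentul !== null ) {
				$parentul->setAttribute( 'tobedeleted', 'true' );
			}
		}
	}

	mytheme_recursive_delete( $dom );

	// strip the doctype and html/body wrapper that saveHTML() adds
	$out = str_replace( array( '<html>', '</html>', '<body>', '</body>' ), '', $dom->saveHTML() );
	return preg_replace( '/^<!DOCTYPE.+?>/', '', $out );
}

// Usage: hand-written imitation of wp_list_pages output while viewing "Team"
$sample = '<ul>'
	. '<li class="page_item current_page_ancestor current_page_parent"><a href="/about">About</a>'
	. '<ul class="children">'
	. '<li class="page_item current_page_item"><a href="/about/team">Team</a>'
	. '<ul class="children"><li class="page_item"><a href="/about/team/roy">Roy</a></li></ul></li>'
	. '<li class="page_item"><a href="/about/history">History</a>'
	. '<ul class="children"><li class="page_item"><a href="/about/history/1999">1999</a></li></ul></li>'
	. '</ul></li>'
	. '<li class="page_item"><a href="/contact">Contact</a>'
	. '<ul class="children"><li class="page_item"><a href="/contact/form">Form</a></li></ul></li>'
	. '</ul>';

echo mytheme_filter_html( $sample );
```

While viewing “Team”, the output keeps About, Team, Roy, History and Contact, and drops the nested lists under History and Contact.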

Results

I ran some tests this week, and I’m quite happy with the results. In the worst-case scenario, it stripped a whopping 717 kilobytes of HTML from the page (756 KB down to 39 KB). That’s roughly 95% of the HTML generated by wp_list_pages, and it only took around 0.2 seconds to process. IE7 came back to life for my client, and it had absolutely no impact on what the end user was seeing. If you’re using page caching, the 0.2 seconds probably aren’t going to be a problem, since the filtering only happens when the cache is updated; most users will get a cached version. But even without caching, the reduction in HTML might outweigh the added processing time.

Better solution needed

While the code I posted here works, it’s not a very elegant solution. Generating a long list and then shortening it makes little sense. Doing it server-side helps browsers cope and can seriously reduce page sizes. But ideally, this should be done by wp_list_pages itself, or perhaps by a new template function. WordPress already adds CSS classes to the right elements, so in theory it should be easy to (optionally) have wp_list_pages return only those. Something like this perhaps?

$out = wp_list_pages( 'filter_selected=true' );

Please note that I’m not saying wp_list_pages is broken. It does exactly what it was designed to do: list all the pages. Its job was never to generate folding, context-dependent menus. But people are using it to do just that, because there’s no viable alternative. And with large volumes of content, wp_list_pages in its current form is not a perfect solution.
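To make the “generate less instead of stripping more” idea a bit more concrete, here is a sketch of the selection step such an option might perform, working on page records instead of HTML. Everything here is hypothetical: the function name, and the assumption that pages arrive as arrays with ID and post_parent keys (mirroring the WP_Post fields of the same name). It applies the same rule as filterHtml: top-level pages, plus the children of the current page and of each of its ancestors.

```php
<?php
/**
 * Hypothetical sketch: pick the pages a folding menu needs before any HTML is built.
 * $pages is a list of ['ID' => int, 'post_parent' => int] records (an assumed shape
 * mirroring the WP_Post fields); $current_id is the page being viewed (0 for none).
 * Returns the IDs to show, in the order the pages were given.
 */
function mytheme_visible_page_ids( array $pages, int $current_id ): array {
	// map each page ID to its parent ID (0 means top level)
	$parent_of = array();
	foreach ( $pages as $page ) {
		$parent_of[ (int) $page['ID'] ] = (int) $page['post_parent'];
	}

	// walk up from the current page to collect the "open" branch:
	// the current page plus all of its ancestors (guarding against cycles)
	$open = array();
	$id   = $current_id;
	while ( $id !== 0 && isset( $parent_of[ $id ] ) && ! isset( $open[ $id ] ) ) {
		$open[ $id ] = true;
		$id          = $parent_of[ $id ];
	}

	// keep top-level pages and every page whose parent is on the open branch
	$visible = array();
	foreach ( $pages as $page ) {
		$parent = (int) $page['post_parent'];
		if ( $parent === 0 || isset( $open[ $parent ] ) ) {
			$visible[] = (int) $page['ID'];
		}
	}
	return $visible;
}

// Usage: About > Team > Roy, About > History > 1999, Contact > Form
$pages = array(
	array( 'ID' => 1, 'post_parent' => 0 ), // About
	array( 'ID' => 2, 'post_parent' => 1 ), // Team
	array( 'ID' => 3, 'post_parent' => 2 ), // Roy
	array( 'ID' => 4, 'post_parent' => 1 ), // History
	array( 'ID' => 5, 'post_parent' => 4 ), // 1999
	array( 'ID' => 6, 'post_parent' => 0 ), // Contact
	array( 'ID' => 7, 'post_parent' => 6 ), // Form
);
print_r( mytheme_visible_page_ids( $pages, 2 ) ); // IDs 1, 2, 3, 4 and 6
```

The point of the sketch is that the folding decision only needs each page’s parent, so in principle it could be made before a single li is generated.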

(Image by David Lee King, released under Creative Commons. Thanks.)

EDIT: Marko Heijnen (fellow Dutchman and WordPress core contributor) has created a Walker solution that aims to do the same thing, but in a more WordPress-like way. I can’t currently get it to do exactly what I need, but there’s definite potential. It seems to work flawlessly in my test setup.

3 Comments

  1. Not sure if this is a solution for you, but I have recently implemented a solution for a client which uses a meta key value to filter pages from the wp_list_pages function: http://vanderwijk.nl/wordpress/paginas-verbergen-in-het-wp_list_pages-navigatie-menu/

    Comment by Johan — October 24, 2012 @ 3:24 pm

    • Thanks Johan. I checked your solution, and I actually considered your approach. In my case however, I wanted to avoid editors having to manually select pages to show, and I needed the whole pages structure to be available through the menu. So the weeding out needed to happen, selectively, based on the page the user is currently viewing.

      Comment by Roy — October 30, 2012 @ 3:56 pm

  2. Exactly what I was looking for. Thanks very much. I believe I’m going to try Marko’s walker solution first and see how it goes. I’ll report back if I run into any problems.

    Comment by Nathan — December 8, 2012 @ 8:24 pm