Drupal 6 patterns: static caching

18th Jun 2010 // By Chris // Drupal Planet

Drupal 6 is database-heavy, so it is our duty as responsible developers to reduce the load on the database as far as possible. One way to do this is to make sure the same query never runs twice during a single page request, and we can achieve this by reusing a code pattern found throughout core.

This topic has probably been covered many times elsewhere, since the pattern is not exclusive to Drupal and is used heavily throughout PHP (and possibly other languages), but it is important enough to reiterate here.

The basic premise is this: you have some database heavy lifting to do. Perhaps you have a complicated query that involves a bunch of joins and is not quick, but must be run. You write a function that can perform this query for you:

  function mymodule_get_foo() {
    // This is not a complicated query, but yours might be much more significant.
    return db_result(db_query("SELECT foo FROM {mymodule_foos} ORDER BY created DESC LIMIT 1"));
  }

Perhaps initially, you call that function just once in your code. But then you add more code, and you find that you're calling that function many times. Each time the function runs, the database query runs, causing an unnecessary delay: you knew the result the first time around, and nothing has changed, so the result will be the same, whether you repeat the query or not.

  function mymodule_do_fooing() {
    $foos = array();

    // We run this 1000 times, so we run 1000 expensive database queries, yet the result
    // will be the same each time.
    for ($i = 0; $i < 1000; $i++) {
      $foos[] = mymodule_get_foo();
    }
  }

The solution is to declare the variable with the static keyword. A static variable keeps its value between calls to the function, so the next time the function runs during the same page request, the value saved last time is still there.

  function mymodule_get_foo() {
    // "Remember" the value of the variable from last time we set it.
    static $foo;

    // If it's never been set before, we need to set it for the first and only time.
    if (!isset($foo)) {
      // This is not a complicated query, but yours might be much more significant.
      $foo = db_result(db_query("SELECT foo FROM {mymodule_foos} ORDER BY created DESC LIMIT 1"));
    }

    // Return either the "remembered" value or the one we just got from the database.
    return $foo;
  }
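To see the effect outside Drupal, here is a minimal, self-contained sketch. It assumes a hypothetical stand-in, mymodule_expensive_query(), in place of db_result(db_query(...)), and counts how often that stand-in runs in a hypothetical global, $mymodule_query_count. Calling the cached function 1000 times should run the stand-in query only once.

```php
<?php
// Hypothetical counter: how many times the "query" has actually run.
$mymodule_query_count = 0;

// Hypothetical stand-in for db_result(db_query(...)); not a Drupal API.
function mymodule_expensive_query() {
  global $mymodule_query_count;
  $mymodule_query_count++;
  return 'latest foo';
}

function mymodule_get_foo() {
  // Keeps its value between calls for the rest of this request.
  static $foo;

  // Only hit the (fake) database the first time.
  if (!isset($foo)) {
    $foo = mymodule_expensive_query();
  }
  return $foo;
}

$foos = array();
for ($i = 0; $i < 1000; $i++) {
  $foos[] = mymodule_get_foo();
}

echo count($foos) . " calls, " . $mymodule_query_count . " query\n";
// prints "1000 calls, 1 query"
```

The loop is the same shape as mymodule_do_fooing() above; only the first call pays for the query.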

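This only helps when the function runs several times within the same page request, because PHP discards static variables at the end of each request. If one user requests a page and the value is statically cached, then a second user requests the same page later, the statically cached value is no longer available and the query runs again. (Caching across requests is a job for Drupal's cache API, such as cache_get() and cache_set(), rather than static variables.)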

I made a big assumption earlier on when I said that we know nothing has changed in the database between the first and second times that we wanted the data. You must make sure this is the case, or you'll cause yourself a huge headache when you can't figure out why your database queries aren't working (they are, but because you've cached the result, it looks like they aren't). An example is when you load a node, change the node, then load it again.

  function mymodule_nodefiddler() {
    // Get the node for the first time. Inside node_load() it becomes statically cached.
    $node = node_load(13);

    // Manually update node 13.
    db_query(
      "UPDATE {node} SET title = '%s' WHERE nid = %d",
      'I am fiddling with you',
      13
    );

    // Because we've already loaded node 13, the statically cached version is used
    // instead of going back to the database for a fresh copy! Our node object now
    // has the wrong details!
    $node = node_load(13);
  }
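The stale read happens because the cache is keyed by node ID. The sketch below imitates that idea in plain PHP; it is a simplified illustration, not core's actual node_load() code. The "database" is a hypothetical global array, $mymodule_fake_nodes, and mymodule_load_item() is a hypothetical loader with a static array keyed by ID and a $reset flag that empties it.

```php
<?php
// Hypothetical in-memory "node table" standing in for the database.
$mymodule_fake_nodes = array(13 => array('title' => 'Original title'));

// Hypothetical loader with a per-ID static cache, loosely modelled on the
// idea behind node_load(); a simplified sketch, not core's code.
function mymodule_load_item($id, $reset = FALSE) {
  global $mymodule_fake_nodes;
  static $items = array();

  // Throw away everything cached so far when asked to.
  if ($reset) {
    $items = array();
  }

  // Cache miss: "query" the fake table and remember the result for this ID.
  if (!isset($items[$id])) {
    $items[$id] = (object) $mymodule_fake_nodes[$id];
  }

  return $items[$id];
}

$first = mymodule_load_item(13);

// Simulate the UPDATE query from the example above.
$mymodule_fake_nodes[13]['title'] = 'I am fiddling with you';

$stale = mymodule_load_item(13);        // Cached object: still 'Original title'.
$fresh = mymodule_load_item(13, TRUE);  // Reset: reads the updated title.

echo $stale->title . "\n" . $fresh->title . "\n";
```

Without the reset, the second load returns the very same cached object as the first; with it, the loader goes back to the data source.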

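In this situation, it'd be tempting to remove the static caching altogether, but suppose we wanted to load the node 12 times, then change it, and load it once more? Core solves this with a reset option: in Drupal 6, node_load() takes a third argument, $reset, so calling node_load(13, NULL, TRUE) in the example above would discard the cached copy and fetch the node fresh. We can give our own static caching function the same option, a new parameter that clears the cache and gets fresh results, like this: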

  function mymodule_get_foo($reset = FALSE) {
    // "Remember" the value of the variable from last time we set it.
    static $foo;

    // If it's never been set before, or if we've been asked to reset,
    // we need to run the database query.
    if (!isset($foo) || $reset) {
      // This is not a complicated query, but yours might be much more significant.
      $foo = db_result(db_query("SELECT foo FROM {mymodule_foos} ORDER BY created DESC LIMIT 1"));
    }

    // Return either the "remembered" value or the one we just got from the database.
    return $foo;
  }
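One caveat with the isset() check: isset() returns FALSE for a variable holding NULL, so if the value you cache can legitimately be NULL, the function will go back to the database on every call. A minimal sketch of one way round this, assuming a hypothetical stand-in query (mymodule_expensive_query()) that returns NULL and a hypothetical call counter, is to track "already loaded" in a separate static flag.

```php
<?php
// Hypothetical counter of how many times the "query" actually runs.
$mymodule_query_count = 0;

// Hypothetical stand-in for a query whose result can legitimately be NULL.
function mymodule_expensive_query() {
  global $mymodule_query_count;
  $mymodule_query_count++;
  return NULL;
}

function mymodule_get_foo($reset = FALSE) {
  static $foo;
  // Separate flag, because isset($foo) is FALSE when $foo holds NULL.
  static $loaded = FALSE;

  if (!$loaded || $reset) {
    $foo = mymodule_expensive_query();
    $loaded = TRUE;
  }

  return $foo;
}

for ($i = 0; $i < 10; $i++) {
  mymodule_get_foo();
}
$after_calls = $mymodule_query_count;   // 1: the NULL result was cached.

mymodule_get_foo(TRUE);                 // The reset forces a fresh query.
echo $after_calls . ' then ' . $mymodule_query_count . "\n";
```

The $reset parameter works exactly as in the function above; the only change is what decides whether the cache is "empty".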

When developing your own modules, and even some themes, it's worth considering this static caching approach, with or without a reset option depending on what you use it for. It is a solid code pattern that helps you do your bit to reduce page execution time and system strain, and when used properly, it is a very powerful tool.

About The Author

Chris

Chris Cohen is a seasoned web developer who began in Perl, before moving on to Java, C# and currently PHP and Drupal. He regularly attends DrupalCon and is "Certified to Rock" (certifiedtorock.com).


Comments

dalin

"Drupal 6 is database-heavy"

I disagree with this statement. There may be certain contributed modules that can be described this way. But the majority of contrib, and Core itself, have their load weighted to PHP. And this is by design. It's much easier to scale a web server than a DB server.

But with that said, Drupal 7 does bring some nice improvements in DB performance and scalability.

Chris

It's interesting you say that, because I watched webchick at DrupalCon Paris in 2009, and she pointed out that the number and complexity of queries had gone up in Drupal 7 core compared with Drupal 6, making it perform worse out of the box. At the time this was a fairly early build of D7, but the code was frozen very shortly afterwards, so no new features would have been introduced. This means that unless the performance degradation was attributable to bugs, it will still perform worse.

I also agree with what you say about contributed modules. It's impossible to generalise, because modules differ vastly in the quality of their code, but I have personally seen some that perform awfully, with plenty of unnecessary database interaction. I've seen plenty of cases where this static caching approach could have been used!
Larry Garfield

The complexity of DB queries has gone up because we're now doing multi-load operations, but that reduces the overall number. On the other hand, Fields in Core is a fairly inefficient database model (by design, because it's much simpler), so that results in more queries.

I suspect the overall performance drag in Drupal 7 is due more to the revisions in the theme and rendering layer, which got dramatically more complex. I still need to do some profiling though before we can verify that.

catch

Fields in core shouldn't result in a higher total number of queries - field_attach_load() is cached, so it's only a single cache_get() for most requests. In node listings there will also be fewer queries overall due to multiple load (this goes for both cache miss and cache hit scenarios).

Where it will be a performance hit is the extra joins required compared to the D6 storage model to generate listing pages. That's only going to be resolvable by using something like David Strauss' materialized views to denormalize, or mongodb field storage. I'm not sure how much worse this is than D6 to be honest - can't think of many queries involving CCK which were actually indexed on D6 sites I built, and mongo field storage is a great solution to this overall problem - just not in widespread use yet, but no more out of reach than materialized views for sites that actually need it.

The overall performance hit in D7 is pretty much 100% in PHP, but not all the blame can be put on the rendering and theme layer - part of the reason it looks like that is because operations which previously happened in page callbacks now happen in the rendering process (returning an array is less work than returning a string). A fair bit is down to fields in core, especially on more realistic sites than you get with the default profile.

Nikos Stylianou

Thanks for the handy tip.
