GDC: Game Optimization through Experimentation

How do we know if a game design is any good?  How do we know if a particular feature makes a game better, makes it worse, or has no effect at all?

Until fairly recently, A/B testing was a practice most of us associated with market research teams adjusting features or advertisements to increase a company’s return on investment.  However, in the last few years, particularly with the rise of social games that live and die by their ARPU (average revenue per user), the idea of empirically testing how particular game features influence metrics of “success” has become increasingly relevant for designers as well.  Of course, traditionalists might prefer to rely on intuition when designing their games – indeed, an ongoing discussion I’ve witnessed both at GDC and at GDC Online last October has been not just how but whether data should be used to inform designs.  However, for designers trying to maximize return on investment, particularly those operating on a very limited budget, user data (such as number of levels completed, amount of time spent playing, retention rates, and money spent) can offer invaluable insight into what “works” and what does not.

Unfortunately, as Erik Andersen (PhD student at the University of Washington's Center for Game Science) pointed out in his session last week, much of the A/B testing done by studios focuses on relatively small variables (such as the color of a particular button).  Even worse – at least for us academically minded folk – the results of industry A/B testing are not usually made publicly available.  This is problematic: withholding results may be rational from a competitive standpoint, but it forces every party to independently reinvent the wheel and hinders the development of shared theories of engagement.

To this end, Andersen and colleagues have brought A/B testing into the lab.  With large samples (101k players), higher-order manipulations (the presence or absence of features common across many games), and comparisons across types of games (simple and complex), they are attempting to produce generalizable A/B findings that can contribute to a shared, academic body of knowledge about what improves a game.
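
To picture the setup, here is a minimal sketch (in Python) of how players might be deterministically assigned to a feature condition and have their engagement metrics logged.  The function names, condition labels, and fields are all hypothetical; the talk did not describe the Center's actual instrumentation.

    import hashlib

    # Hypothetical conditions for one experiment: the game is identical
    # except that secondary objectives (coins/rings) are removed.
    CONDITIONS = ["secondary_objectives_on", "secondary_objectives_off"]

    def assign_condition(player_id: str, experiment: str) -> str:
        """Hash the player ID so each player always sees the same variant."""
        digest = hashlib.sha256(f"{experiment}:{player_id}".encode()).hexdigest()
        return CONDITIONS[int(digest, 16) % len(CONDITIONS)]

    def log_engagement(player_id, levels_completed, seconds_played, returned):
        """Record the behavioral metrics these studies rely on."""
        return {
            "player_id": player_id,
            "condition": assign_condition(player_id, "secondary_objectives"),
            "levels_completed": levels_completed,  # levels finished so far
            "seconds_played": seconds_played,      # total play time
            "returned": returned,                  # came back for another session?
        }

    print(log_engagement("player-42", 7, 1860.0, True))

Deterministic hashing (rather than a fresh coin flip each session) keeps a returning player in the same condition across visits, which matters when the outcome metrics include return visits and total play time.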

Andersen presented findings from a series of studies comparing the relative payoff of particular features in the games Refraction and Hello Worlds (both relatively simple) and Foldit (relatively more complex), all created as projects within the Center for Game Science:

  • Aesthetic Features – One study examined the impact of particular aesthetic features (music and sound effects, level-completion animations).  Researchers found that, across games, the presence of the audio features led to no differences in behavioral metrics of engagement (total play time, return visits, etc.), whereas removing the animations generally hurt player engagement.
  • Secondary Objectives – Normal versions of all three games include secondary objectives, such as acquiring coins or rings, in addition to the primary goal of level completion.  With these objectives removed, players actually tended to complete more levels in both of the simpler games (Refraction and Hello Worlds).  Andersen noted that while this effect is desirable, it becomes a problem if secondary objectives in your design prevent players from accessing later content.  Additionally, removing secondary objectives influenced total time spent playing, with the effect moderated by player type: players who played for long durations now played even longer, while those who played for shorter periods (four times as many individuals) now played even less.  In the end, these differences cancelled each other out.
  • Tutorials – All three games contained tutorials, which can be a fairly expensive investment for developers.  Interestingly, the effect of tutorials on measures of player engagement was moderated by the complexity of the game: while tutorials made no difference for Refraction or Hello Worlds, players completed 75% more levels and played 29% longer when provided a tutorial for Foldit.  A follow-up study compared “context-sensitive” tutorials (those that display information as and when it is needed) against no-context, “manual”-style tutorials, finding that context-sensitive tutorials led to a 40% increase in the number of levels played and a 16% increase in play duration for Foldit (and, again, no difference for the simpler games).  A sketch of the kind of between-condition comparison behind such numbers appears just after this list.
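
To make the underlying analysis concrete, here is a minimal sketch of a between-condition comparison of the sort these findings rest on.  All of the numbers below are fabricated for illustration, and the talk did not specify which statistical tests the researchers used; Welch's t-test is simply one reasonable choice for comparing a metric between two randomly assigned groups.

    import numpy as np
    from scipy import stats

    # Toy per-player "levels completed" counts for a Foldit-style game,
    # split by whether the player received a tutorial.  Fabricated data.
    rng = np.random.default_rng(0)
    tutorial = rng.poisson(lam=7.0, size=500)      # hypothetical tutorial group
    no_tutorial = rng.poisson(lam=4.0, size=500)   # hypothetical control group

    # Welch's t-test: do mean levels completed differ between conditions?
    t_stat, p_value = stats.ttest_ind(tutorial, no_tutorial, equal_var=False)

    lift = tutorial.mean() / no_tutorial.mean() - 1.0
    print(f"mean levels (tutorial)    = {tutorial.mean():.2f}")
    print(f"mean levels (no tutorial) = {no_tutorial.mean():.2f}")
    print(f"relative lift             = {lift:+.0%}")
    print(f"Welch t = {t_stat:.2f}, p = {p_value:.3g}")

With samples in the tens of thousands, even small differences will reach statistical significance, which is one reason effect sizes (the 75% and 40% figures above) are more informative than p-values in reports like these.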

Andersen was quick to note that the games examined are fairly limited in scope compared to the wide variety of games available to players, but reiterated that, for these titles, the observed effects held across games and across thousands of players.  In turn, he suggested that future work will need to look at more games to test the generalizability of these findings.

I found these studies to be interesting, in light of some of the counter-intuitive findings they yielded (e.g., no effect for audio) as well as some of the moderating variables identified (e.g., game complexity).  Granted, this form of “academic” A/B testing is young and needs to address certain issues of rigor (for example, no subjective reports were collected to corroborate the levels of engagement inferred from behavioral measures), but this work nonetheless represents a solid step towards making A/B testing a more scientifically minded enterprise.  By bringing industry questions and metrics into a lab setting, analyzing data without the bias of financial pressures, and then sharing results publicly, game design can better draw upon the strengths of iterative theory development and hypothesis testing that characterize any science.

GDC: Game Optimization through Experimentation by Jim Cummings, unless otherwise expressly stated, is licensed under a Creative Commons Attribution-ShareAlike 3.0 Unported License.
