
Interactions and Improvisational Testing

Brian Marick

Summary

I describe an interaction bug that would be very hard to design a test to find. You almost have to stumble across it in a flash of inspiration. I argue that there's a need for what I call here "improvisational testing" - testing that's planned to actively encourage imaginative leaps. I'm making a deliberate analogy to jazz improvisation (dangerously so - I know little about it): practiced, building on structure, but not exhaustively planned.

I think improvisational testing may be a particular style of exploratory testing, which is why I use the separate term.


Here's a security bug described to me in 1980. A particular system had six-character passwords, each character chosen from a 26-character alphabet. Passwords were not encrypted, but they were stored in an inaccessible place. There was a particular command to ask the system questions like "Is 'aaaaaa' the password for marick?" This command worked in the obvious way: it checked whether the first character was right, then whether the second character was right, and so on through the characters of the proposed password, stopping as soon as it found a character that was wrong.
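Just to make the mechanism concrete, here's a sketch of what that checking loop might have looked like. The function and its names are my own invention, not the original system's; the point is the early exit:

    def password_matches(candidate, stored):
        # Compare character by character, giving up at the first mismatch.
        # How far this loop gets depends on how many leading characters of
        # the candidate are correct, and that is what the exploit measures.
        if len(candidate) != len(stored):
            return False
        for guess_char, real_char in zip(candidate, stored):
            if guess_char != real_char:
                return False    # stop as soon as a character is wrong
        return True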

There were 26 x 26 x 26 x 26 x 26 x 26 = 308,915,776 possible passwords. Were passwords randomly chosen, someone trying to discover one would, on average, have to make half that number of attempts. (Passwords aren't randomly chosen, of course, but that's a different issue.)

This bug is about being able to find a password in at most 156 attempts (26 + 26 + 26 + 26 + 26 + 26) - altogether more tractable.
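Worked out as a small Python calculation (the names are mine, just to show the numbers):

    ALPHABET_SIZE = 26
    PASSWORD_LENGTH = 6

    possible_passwords = ALPHABET_SIZE ** PASSWORD_LENGTH      # 308,915,776
    average_blind_attempts = possible_passwords // 2           # 154,457,888
    worst_case_with_the_bug = ALPHABET_SIZE * PASSWORD_LENGTH  # 156

    print(possible_passwords, average_blind_attempts, worst_case_with_the_bug)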

How was this done?

This particular operating system provided virtual memory. In such a system, a program may think it's accessing much more memory than is available in RAM. The virtual memory is what the program thinks it has (for "virtual", read "imaginary"); the physical memory is what it actually has. The operating system supports this sleight of hand by dividing the virtual memory into pages of some fixed size. We can represent a program's virtual memory space like this:

The darker areas are pages of virtual memory that are actually present in physical memory. The rest of them are temporarily stored on disk. The sum total of the dark pages is the amount of physical memory available. (There are all kinds of optimizations, but that's the basic idea.)

If the program tries to use some virtual memory that isn't dark, the operating system "swaps out" one of the dark pages to disk, then reads the needed data into physical memory, essentially darkening that virtual memory page. The total number of dark pages remains constant. Here's an example:

This picture differs from the previous one in that the program has tried to use page 3. To accommodate that, the operating system has swapped out page 2 and read in page 3.
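If it helps to see the bookkeeping, here's a toy model of the swapping just described. Everything in it - the class, the fixed pool of frames, the oldest-first eviction - is an invention for this sketch; real paging systems are far more elaborate:

    class ToyPagedMemory:
        """A fixed pool of physical frames shared among many virtual pages."""

        def __init__(self, physical_frames):
            self.physical_frames = physical_frames
            self.resident = []                   # the "dark" pages: currently in RAM

        def touch(self, page):
            """Use a virtual page, swapping something out first if RAM is full."""
            if page in self.resident:
                return None                      # already resident, nothing to do
            evicted = None
            if len(self.resident) == self.physical_frames:
                evicted = self.resident.pop(0)   # swap out some resident page
            self.resident.append(page)           # swap the needed page in
            return evicted

    memory = ToyPagedMemory(physical_frames=2)
    memory.touch(1)
    memory.touch(2)
    print(memory.touch(3))     # RAM was full, so a resident page was swapped out
    print(memory.resident)     # the number of resident pages stays constant: 2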

What on earth does this have to do with guessing passwords? Nothing, unless you're far more clever than I'll ever be - and unless you know about a couple of other features this operating system provided.

Virtual memory systems work best if they can somehow predict how programs are going to use memory. If they're successful, they can avoid swapping out pages that are just going to have to be swapped back in right away. They can even start reading data from disk into physical memory before the program will need it. But prediction is hard to do. Any predictive strategy will be exactly wrong for some program. So the designers of this operating system decided to give programmers control (if they wanted it). A programmer could use OS commands to:

  1. explicitly swap pages in and out of physical memory.

  2. ask to be notified when a particular page was swapped in or out.

This allows you to crack passwords. Here's how:

  1. Start with a candidate "aaaaaa". Arrange for it to straddle a page boundary (easy to do). The first character is on page N; the rest are on page N+1.

  2. Make sure page N is in physical memory. Make sure page N+1 is not.

  3. Ask to be notified if page N+1 is swapped in.

  4. Ask the system if "aaaaaa" is the password for the user you want to pretend to be.

Here's what this looks like:

There are now three cases of interest:

  1. The system says "aaaaaa" is the right password. You win.

  2. The system tells you the candidate was wrong - but it does not tell you that page N+1 was swapped in. That means the system discovered the candidate is wrong before it needed to check the second character. You know something more specific than that the candidate is wrong. You know that the first character is not "a". So you can try the same procedure again with "baaaaa".

  3. The system first tells you that page N+1 has been swapped in, then tells you the candidate is wrong. From this, you know that the first character of the true password is "a". If the first character were not "a", the system would have replied "wrong password" before it ever needed to swap in page N+1. You can now repeat the query. This time, you shift the candidate so that the first two characters are on page N, the remaining four on page N+1.

On average, it will take you 13 tries to discover the first character, 13 to discover the second, and so on.
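Here's the whole procedure as a small Python simulation. None of this is the original system's interface: the oracle function below stands in for the password-checking command plus the paging machinery, and its second return value plays the role of the "page N+1 was swapped in" notification.

    import string

    def make_oracle(secret):
        """Stand-in for the vulnerable system: an early-exit comparison plus a
        flag saying whether the check ever read a character at or past the
        page boundary (that is, whether page N+1 would have been swapped in)."""
        def oracle(candidate, boundary):
            swapped_in = False
            for i, (guess_char, real_char) in enumerate(zip(candidate, secret)):
                if i >= boundary:
                    swapped_in = True           # first read from page N+1
                if guess_char != real_char:
                    return False, swapped_in    # early exit at the first mismatch
            return True, swapped_in
        return oracle

    def recover_password(oracle, length=6, alphabet=string.ascii_lowercase):
        """Recover the password one position at a time, at most 26 guesses each,
        by watching whether the check crossed the page boundary."""
        known = ""
        for position in range(length):
            # The known characters plus the current guess sit on page N;
            # the rest of the candidate sits on page N+1.
            boundary = position + 1
            for letter in alphabet:
                candidate = known + letter + alphabet[0] * (length - position - 1)
                matched, swapped_in = oracle(candidate, boundary)
                if matched:
                    return candidate            # case 1: the whole password is right
                if swapped_in:                  # case 3: the check crossed the boundary,
                    known += letter             # so the guessed character must be right
                    break
                # case 2: wrong before the boundary; try the next letter
        return known

    print(recover_password(make_oracle("qwerty")))    # prints: qwerty

On the real system the notification came from the operating system, of course; the simulation only mirrors the logic of the three cases above.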

Isn't that neat? It makes me feel like the final movement of Mozart's 41st symphony and Lakatos's Proofs and Refutations do: how wonderful it is that people can put things together so cleverly!

Moreover, this is a perfect example of how interaction bugs work. The people designing the virtual memory system were surely not thinking at all about passwords. Worrying about the data on the pages being swapped in and out was not part of their job. The people designing the password system had no reason to think about virtual memory. What possible relevance could it have? There was probably no overlap between the two groups, and it's reasonable to guess that the designs were done at different times. So the opportunity for a chance "Aha!" moment among the designers was pretty small. The very "divide and conquer" approach that makes large-scale systems tractable works against it.

The opportunity for an "Aha!" moment among the users was probably equally small. But there are many more users than designers, so there's a greater chance that someone will have the insight.

In the security testing world, people sometimes use "tiger teams" - small groups of intensely experienced people who try to break an entire system. I think of this as a way of emulating the collective cleverness of a zillion normal people.

Things are a bit different in the "functional testing" world, where we do not assume an actively malicious user. Nevertheless, users are adept at unconsciously triggering interactions. For example, users are impatient. While waiting for one lengthy operation to finish, they might get bored and start a second operation, one that unexpectedly interacts with the first. You say that the program doesn't allow users to do two things at once? Ever started another copy of a program while the first copy is churning away at something?

Too often we testers emulate the "feature focus" of the designers. A telephone handset's tester might exercise the Phone Book feature over and over and over again, by entering phone numbers, deleting phone numbers, trying various boundary and error cases, and so forth. Will she think to check what happens if a phone call arrives while she's fiddling with the Phone Book? Maybe. Maybe not. How can we increase the chance a tester will have an "Aha!" moment? That she'll make a connection that no one's made before?

One way is to run tests that emulate user tasks. Some testers should explicitly be given a task focus, not a feature focus. Their job is to do whatever users do in order to accomplish whatever it is that users try to accomplish.

But I have a feeling that's not enough. How carefully should those tests be designed? My sense of my own testing is that designing task-based tests up front impoverishes my imagination. If I carefully work out in advance how I'm going to step through a task, I see fewer opportunities. There's less chance that I'll say, "Hmm... what if I make this kind of error right now?" or "What if I take this side track and try to come back here later?" While I do include predetermined user errors and odd actions in task-based test designs, there's something about actually using the program freely that spurs new ideas, especially if I pretend to be a particular kind of user (rather than Brian Marick, a tester).

I find that annoying. By nature I like to plan, not wing it. And it's certainly true that wandering through the program poses the risk that I'll miss things. Where have I been? Where have I not been? What have I not done? Still, to do the right job, I have to let go - just for a little while. Improvisational testing seems to be necessary.

So the question before us is: how can we get better at it? ImprovisationalTesting is a good topic for the forum. Other methods of ExploitingInteractions perhaps deserve pages of their own. I also solicit writeups for the Techniques section of the Testing Craft site.


Related Testing Craft Pages


There is discussion in the Wiki Forum at page ImprovisationalTesting.
There is discussion in the Wiki Forum at page ExploitingInteractions.
(The Forum is explained in its FrontPage.)


Summarized Discussion

In this spot, the author of this page will occasionally summarize the discussion in the Forum.