Calculating Throughput and Response Time

[This article is a repost, kept for study purposes. Original source: http://javidjamae.com/2005/06/20/calculating-throughput-and-response-time/]
20 June 2005

In software, response time measures, from the client’s perspective, the total time a system takes to process a request (including latency). The response time of a single request is not always representative of a system’s typical response time, so to get a good measure one usually calculates the average response time over many requests. Response time is usually measured in units of “seconds / request” or “seconds / transaction”. (Note: Don’t confuse response time with latency.)

Throughput is the measure of the number of messages that a system can process in a given amount of time. In software, throughput is usually measured in “requests / second” or “transactions / second”.
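
To make these definitions concrete, here is a minimal sketch (mine, not from the original post) that computes both measures from a made-up list of (start, end) request timestamps:

```python
# Hypothetical request log: (start, end) timestamps in seconds.
requests = [(0.0, 0.4), (0.5, 1.1), (1.2, 1.5), (2.0, 2.9)]

# Average response time: mean per-request duration (seconds / request).
avg_response_time = sum(end - start for start, end in requests) / len(requests)

# Throughput: completed requests divided by the measurement window
# (requests / second).
window = max(end for _, end in requests) - min(start for start, _ in requests)
throughput = len(requests) / window

print(f"response time: {avg_response_time:.2f} s/request")  # 0.55 s/request
print(f"throughput:    {throughput:.2f} requests/s")        # 1.38 requests/s
```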

When I first started doing performance analysis, I naively assumed that throughput and response time were linearly related and were thus reciprocals of one another. Though there are conditions under which these two measurements are inversely proportional, it is definitely not a given.

Let’s look at a real-life example: consider a checkout lane in a grocery store. Let’s assume that the cashier always takes 2 minutes to check out a customer. Let’s also assume that there is no line and that a new customer walks up to the cashier at the exact moment the previous customer finishes checking out, with absolutely no delay between checkouts. If we have 10 such customers, we would calculate response time and throughput as follows.

To calculate response time, we sum up the total checkout time for all customers and divide by the number of customers:

Response time = 20 minutes / 10 checkouts = 2 minutes / checkout

To calculate latency, we calculate the average wait time in line:

Latency = 0 minutes / 10 checkouts = 0 minutes / checkout

We can also measure the rate at which things occur:

People that got in line / minute

    * This is the queue input rate
    * The first person got in line at time 0, the last person got in line at time 18 min
    * 10 people / 18 minutes = .56 people got in line / minute

People that got to the register / minute

    * This is the queue output rate and it is also the system input rate
    * The first person got to the register at time 0, the last person got to the register at time 18 min
    * 10 people / 18 minutes = .56 people started checking out / minute

People that finished checking out / minute

    * This is the system output rate
    * The first person finished checking out at time 2 min, the last person finished at time 20 min
    * 10 people / 18 minutes = .56 completed checkouts / minute

People that the cashier checked out / minute

    * This is the processing rate
    * The first person started checking out at time 0 min, the last person finished checking out at 20 min
    * 10 people / 20 minutes = .5 checkouts / minute

As you can see, there are many different rates that we can measure. People use the word throughput to refer to all of these different rates, but generally when we talk about throughput in software we are referring to the processing rate (people that the cashier checked out / minute). Depending on how we are measuring, either the system input rate or the queue input rate is also known as the system load. Accordingly, the term load testing describes a test where we send many requests into a system and observe its non-functional behavior.

Based on this input, it looks like throughput and response time are inversely proportional:

Throughput = .5 checkouts / minute
Response Time = 2 minutes / checkout

or…

Throughput = 1 / Response Time [NOT ALWAYS TRUE]

This is because we have no latency, and because our system was given exactly the conditions that let it carry a load with no customer wait time and no cashier idle time.
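
We can check these numbers with a short simulation. The sketch below is mine, not the original author’s, and the simulate_checkout helper is a hypothetical name; it models a single-lane FIFO checkout and reproduces the scenario above:

```python
def simulate_checkout(arrivals, service_time=2.0):
    """Single-lane FIFO checkout; times in minutes.
    Returns (avg response time, avg latency, throughput)."""
    cashier_free_at = 0.0
    starts, finishes, waits = [], [], []
    for arrive in sorted(arrivals):
        start = max(arrive, cashier_free_at)   # wait while the cashier is busy
        finish = start + service_time
        cashier_free_at = finish
        starts.append(start)
        finishes.append(finish)
        waits.append(start - arrive)           # latency: time spent in line
    n = len(arrivals)
    avg_latency = sum(waits) / n
    avg_response = avg_latency + service_time  # time in line + time at register
    # Processing rate: first checkout start to last checkout finish.
    throughput = n / (max(finishes) - min(starts))
    return avg_response, avg_latency, throughput

# Scenario above: a new customer arrives exactly as the previous one finishes.
arrivals = [2.0 * i for i in range(10)]        # t = 0, 2, ..., 18 minutes
print(simulate_checkout(arrivals))             # -> (2.0, 0.0, 0.5)
```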

Let’s vary our example a little. What if 10 people used this same checkout lane, but each person arrived in line 1 minute after the previous person finished checking out? The cashier just twiddles his thumbs, waiting 1 minute for each new customer. The cashier is still capable of checking out a customer in 2 minutes, so the average response time is still 2 minutes / customer, but the throughput of people coming out of the checkout lane is not the same.

Response time = 20 minutes / 10 checkouts = 2 minutes / checkout
Latency = 0 minutes / 10 checkouts = 0 minutes / checkout
Throughput = 10 checkouts / 29 minutes = .34 checkouts / minute
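
Reusing the simulate_checkout sketch from above, each customer now arrives 1 minute after the previous one finishes, so arrivals fall at t = 0, 3, 6, …, 27 minutes:

```python
# Arrivals spaced 3 minutes apart: 2 minutes of service + 1 idle minute.
arrivals = [3.0 * i for i in range(10)]
print(simulate_checkout(arrivals))  # -> (2.0, 0.0, ~0.34): same response time, lower throughput
```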

Let’s also consider the opposite. What if 10 people used the checkout line at nearly the same time? In other words, what if there was a line of 9 people behind a customer who is being checked out? From a customer’s perspective, the checkout time (or response time) is the amount of time from when they get in line until they are done checking out, and the latency is how long it takes them to get to the cashier from the time they get in line. The first person to get to the checkout lane wouldn’t wait at all. The first person in line (not the one currently being checked out) would wait 2 minutes to start checking out, the second person would wait 4 minutes, and so on, until the last person, who would wait 18 minutes to start being checked out.

Response time = 110 minutes / 10 checkouts = 11 minutes / checkout
Latency = 90 minutes / 10 checkouts = 9 minutes / checkout
Throughput = 10 checkouts / 20 minutes = .5 checkouts / minute
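
Again with the same sketch, this time with all 10 customers arriving at once:

```python
# Everyone arrives at t = 0, so everyone after the first waits in line.
arrivals = [0.0] * 10
print(simulate_checkout(arrivals))  # -> (11.0, 9.0, 0.5): same throughput, worse response time
```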

From a customer’s perspective, the average customer checkout time is greater, even though the clerk is still working at the same speed and is able to push 10 people through the line in 20 minutes. The checkout lane is saturated at the point when the queue input rate exceeds the queue output rate. As you can see, the rate at which customers are getting in line makes all the difference. The term degradation is often used to describe a system whose response time increases as load increases. In our grocery example, our system starts degrading when more than one customer gets in line every two minutes.

In this grocery example, people can form a line. In a software system, the line (or queue) will be on either the sender side or the receiver side, depending on whether the system is synchronous or asynchronous. If a receiver blocks all messages until it is done executing its current request, then the system is synchronous and the queue is on the sender’s side. If the receiver accepts messages as fast as possible, and uses a separate execution thread to execute requests, then the receiver must have a queue and the system is said to be asynchronous. You could have a queue on both the sender and the receiver, but this is usually superfluous. See: Synchronous vs. Asynchronous Systems.
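
As a rough illustration of that distinction (a sketch with hypothetical handler and function names, not a reference implementation): in the synchronous style the sender blocks until processing finishes, while in the asynchronous style the receiver owns a queue that a separate worker thread drains:

```python
import queue
import threading

def handle(msg):                      # stands in for the real request handler
    print("processed", msg)

# Synchronous: the caller blocks until processing is done, so any
# backlog forms on the sender's side.
def send_sync(msg):
    handle(msg)

# Asynchronous: the receiver owns the queue and a worker thread.
inbox = queue.Queue()

def worker():
    while True:
        handle(inbox.get())           # drain requests on a separate thread
        inbox.task_done()

threading.Thread(target=worker, daemon=True).start()

def send_async(msg):
    inbox.put(msg)                    # returns immediately; request is queued

send_sync("order-1")
send_async("order-2")
inbox.join()                          # wait for queued work before exiting
```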

Software load is usually measured in requests per second. For example, you may describe the load on a system as “10 requests per second”. In a real-world scenario, the load will change as a function of time. In a grocery store, more customers try to check out at peak shopping hours. In the stock market, the most volume is traded in the first and last 15 minutes the market is open. A Web page will see different load depending on the day of the week and the time of day. Thus, if you are designing a test of your system, you want to determine its behavior under different types of load.

In order to improve the performance of our grocery store, we can make it multi-threaded by adding more lanes. This concurrency helps in two ways:

    * It can minimize the response time that each customer experiences by reducing wait times in line
    * It can increase throughput

Let’s say we have 10 lanes and 10 customers who reach the checkout area at the exact same time; each customer goes to a different lane, and each checkout takes 2 minutes.

Response time = 20 minutes / 10 checkouts = 2 minutes / checkout
Latency = 0 minutes / 10 checkouts = 0 minutes / checkout
Throughput = 10 checkouts / 2 minutes = 5 checkouts / minute
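
Extending the earlier single-lane sketch to multiple lanes (again, a hypothetical simulate_multilane helper of mine) reproduces these numbers; each arriving customer takes whichever cashier frees up first:

```python
import heapq

def simulate_multilane(arrivals, lanes, service_time=2.0):
    """Multi-lane FIFO checkout; times in minutes."""
    free_at = [0.0] * lanes                     # when each cashier is next free
    heapq.heapify(free_at)
    starts, finishes, waits = [], [], []
    for arrive in sorted(arrivals):
        start = max(arrive, heapq.heappop(free_at))  # earliest-free cashier
        finish = start + service_time
        heapq.heappush(free_at, finish)
        starts.append(start)
        finishes.append(finish)
        waits.append(start - arrive)
    n = len(arrivals)
    return (sum(waits) / n + service_time,      # avg response time
            sum(waits) / n,                     # avg latency
            n / (max(finishes) - min(starts)))  # processing rate

print(simulate_multilane([0.0] * 10, lanes=10))  # -> (2.0, 0.0, 5.0)
```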

With a single lane, our response time for this same load was 11 minutes / checkout, but with multiple lanes, our response time is 2 minutes / checkout, the best that our system can provide. Increasing the number of lanes (or threads) increased our throughput and allowed us to maintain optimum response time.

But, of course, nothing is free. In this case, we’ve increased the number of active employee resources to optimize our performance, but we must pay for those resources. In software, we have to worry about system resources. We can spawn off multiple threads, but we have to be careful how much CPU and memory each execution thread is utilizing.

Technorati Tags: performance, response time, throughput
