Classic Testing Mistakes (Part 2)

Published: 2007-9-17 14:58


Translated by: 米全喜    Source: reposted from the web

        An alternative approach to capture/replay is scripting tests. (Most GUI capture/replay tools also allow scripting.) Some member of the testing team writes a "test API" (application programmer interface) that lets other members of the team express their tests in less GUI-dependent terms. Whereas a captured test might look like this:

  text $main.accountField "12"
  click $main.OK
  menu $operations
  menu $withdraw
  click $withdrawDialog.all
  ...

  a script might look like this:

  select-account 12
  withdraw all
  ...

        The script commands are subroutines that perform the appropriate mouse clicks and key presses. If the API is well-designed, most GUI changes will require changes only to the implementation of functions like withdraw, not to all the tests that use them. Please note that well-designed test APIs are as hard to write as any other good API. That is, they're hard, and you shouldn't expect to get it right the first time.

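
The subroutine idea can be sketched in a few lines of Python. Everything here is invented for illustration: a real test API would call the GUI tool's primitives, where this sketch just records the actions it would perform.

```python
# Hypothetical GUI primitives. A real test API would drive actual
# mouse clicks and key presses; this sketch just records each action.
log = []

def text(widget, value):
    log.append(f"text {widget} {value}")

def click(widget):
    log.append(f"click {widget}")

def menu(item):
    log.append(f"menu {item}")

# The test API: each command hides the GUI details behind a name
# drawn from the problem domain.
def select_account(account_id):
    text("$main.accountField", account_id)
    click("$main.OK")

def withdraw(amount):
    menu("$operations")
    menu("$withdraw")
    click(f"$withdrawDialog.{amount}")

# A scripted test now reads like the abstract version above.
select_account("12")
withdraw("all")
```

If the Withdraw dialog moves to a different menu, only the body of `withdraw` changes; every test that calls it stays the same.
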
        In a variant of this approach, the tests are data-driven. The tester provides a table describing key values. Some tool reads the table and converts it to the appropriate mouse clicks. The table is even less vulnerable to GUI changes because the sequence of operations has been abstracted away. It's also likely to be more understandable, especially to domain experts who are not programmers. See [Pettichord96] for an example of data-driven automated testing.

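
A data-driven harness might look like this sketch. The table format and the driver are hypothetical; the point is that the table names only key values, and the driver alone knows the sequence of GUI operations.

```python
import csv
import io

# Hypothetical test table: each row names only the key values.
table = io.StringIO("""account,action,amount
12,withdraw,all
34,withdraw,50
""")

executed = []

def run_row(row):
    # A real driver would translate the row into the appropriate
    # mouse clicks; this sketch just records what it was asked to do.
    executed.append((row["account"], row["action"], row["amount"]))

for row in csv.DictReader(table):
    run_row(row)
```

Because the table carries no operation sequence, a GUI rearrangement touches only `run_row`, and domain experts can add rows without reading any code.
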
        Note that these more abstract tests (whether scripted or data-driven) do not necessarily test the user interface thoroughly. If the Withdraw dialog can be reached via several routes (toolbar, menu item, hotkey), you don't know whether each route has been tried. You need a separate (most likely manual) effort to ensure that all the GUI components are connected correctly.

        Whatever approach you take, don't fall into the trap of expecting regression tests to find a high proportion of new bugs. Regression tests discover that new or changed code breaks what used to work. While that happens more often than any of us would like, most bugs are in the product's new or intentionally changed behavior. Those bugs have to be caught by new tests.

  Code coverage

        GUI capture/replay testing is appealing because it's a quick fix for a difficult problem. Another class of tool has the same kind of attraction.

        The difficult problem is that it's so hard to know if you're doing a good job testing. You only really find out once the product has shipped. Understandably, this makes managers uncomfortable. Sometimes you find them embracing code coverage with the devotion that only simple numbers can inspire. Testers sometimes also become enamored of coverage, though their romance tends to be less fervent and ends sooner.

        What is code coverage? It is any of a number of measures of how thoroughly code is exercised. One common measure counts how many statements have been executed by any test. The appeal of such coverage is twofold:

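
Statement coverage can be demonstrated with a toy tracer built on Python's `sys.settrace`. Real coverage tools are far more sophisticated; the `buy` function and its discount rule are invented for the example.

```python
import sys

def buy(quantity, price):
    total = quantity * price
    if total > 100:
        total = total * 0.9  # bulk discount
    return total

covered = set()  # line numbers of 'buy' executed by any test

def tracer(frame, event, arg):
    # Record each line executed inside 'buy'.
    if event == "line" and frame.f_code.co_name == "buy":
        covered.add(frame.f_lineno)
    return tracer

sys.settrace(tracer)
buy(2, 10)  # a small order: the discount line never runs
sys.settrace(None)

# 'covered' now holds three of the four lines; the discount line
# stays uncovered until some test crosses the total > 100 boundary.
```
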
        1. If you've never exercised a line of code, you surely can't have found any of its bugs. So you should design tests to exercise every line of code. 

        2. Test suites are often too big, so you should throw out any test that doesn't add value. A test that adds no new coverage adds no value. 

        Only the first sentences in (1) and (2) are true. I'll illustrate with this picture, where the irregular splotches indicate bugs:

        If you write only the tests needed to satisfy coverage, you'll find bugs. You're guaranteed to find the code that always fails, no matter how it's executed. But most bugs depend on how a line of code is executed. For example, code with an off-by-one error fails only when you exercise a boundary. Code with a divide-by-zero error fails only if you divide by zero. Coverage-adequate tests will find some of these bugs, by sheer dumb luck, but not enough of them. To find enough bugs, you have to write additional tests that "redundantly" execute the code.

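
The divide-by-zero case is easy to make concrete (`average` is an invented example). One test executes every statement, so coverage is satisfied, yet the bug survives because no test supplies the fatal input:

```python
def average(values):
    # The divide-by-zero bug hides here: an empty list makes
    # len(values) zero, but only a test with that input finds it.
    return sum(values) / len(values)

# This one test achieves 100% statement coverage and finds nothing:
assert average([2, 4, 6]) == 4

# Only average([]) -- a test that adds no new coverage -- would
# expose the ZeroDivisionError.
```
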
        For the same reason, removing tests from a regression test suite just because they don't add coverage is dangerous. The point is not to cover the code; it's to have tests that can discover enough of the bugs that are likely to be caused when the code is changed. Unless the tests are ineptly designed, removing tests will just remove power. If they are ineptly designed, using coverage converts a big and lousy test suite to a small and lousy test suite. That's progress, I suppose, but it's addressing the wrong problem.

        A grave danger of code coverage is that it is concrete, objective, and easy to measure. Many managers today are using coverage as a performance goal for testers. Unfortunately, a cardinal rule of management applies here: "Tell me how a person is evaluated, and I'll tell you how he behaves." If a person is evaluated by how much coverage is achieved in a given time (or in how little time it takes to reach a particular coverage goal), that person will tend to write tests to achieve high coverage in the fastest way possible. Unfortunately, that means shortchanging careful test design that targets bugs, and it certainly means avoiding in-depth, repetitive testing of "already covered" code. 

        Using coverage as a test design technique works only when the testers are both designing poor tests and testing redundantly. They'd be better off at least targeting their poor tests at new areas of code. In more normal situations, coverage as a guide to design only decreases the value of the tests or puts testers under unproductive pressure to meet unhelpful goals.

        Coverage does play a role in testing, not as a guide to test design, but as a rough evaluation of it. After you've run your tests, ask what their coverage is. If certain areas of the code have no or low coverage, you're sure to have tested them shallowly. If that wasn't intentional, you should improve the tests by rethinking their design. Coverage has told you where your tests are weak, but it's up to you to understand how.

        You might not entirely ignore coverage. You might glance at the uncovered lines of code (possibly assisted by the programmer) to discover the kinds of tests you omitted. For example, you might scan the code to determine that you undertested a dialog box's error handling. Having done that, you step back and think of all the user errors the dialog box should handle, not how to provoke the error checks on line 343, 354, and 399. By rethinking design, you'll not only execute those lines, you might also discover that several other error checks are entirely missing. (Coverage can't tell you how well you would have exercised needed code that was left out of the program.)

        There are types of coverage that point more directly to design mistakes than statement coverage does (branch coverage, for example). However, none - and not all of them put together - are so accurate that they can be used as test design techniques.

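
The difference is easy to see with an invented example. The single test below achieves 100% statement coverage of `fee`, but the untaken false branch hides a crash that branch coverage would have flagged:

```python
def fee(amount):
    if amount > 100:
        rate = 0.02
    return amount * rate  # fails when amount <= 100: rate is unbound

# Every statement runs, so statement coverage is complete...
assert fee(200) == 4.0

# ...but the path where the 'if' is false was never taken.
# fee(50) raises UnboundLocalError (a NameError subclass).
```
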
        One final note: Romances with coverage don't seem to end with the former devotee wanting to be "just good friends". When, at the end of a year's use of coverage, it has not solved the testing problem, I find testing groups abandoning coverage entirely. That's a shame. When I test, I spend somewhat less than 5% of my time looking at coverage results, rethinking my test design, and writing some new tests to correct my mistakes. It's time well spent.
