Data Driven Testing (contains 7 DDT examples from commercial projects)

    Introduction

    Data Driven Testing (DDT) is an approach to the architecture of automated tests (unit, integration; most often it is applied to backend testing) in which a test receives a set of inputs together with an expected result or expected state, and compares the actual result produced by running those inputs against the expectation. This comparison is the assert of such a test. In addition, the input parameters may include run options or flags that affect the test's logic.
    Often the covered system or method is very complex, and when it is impossible to specify the expected values explicitly, we can speak instead about expected output states. Sometimes you need some ingenuity to decide what the input description and the output description should be. For example:

    • The database state can be part of the input description, i.e. the DDT functionality will include database setUp. This state can be taken from an SQL dump, or can be generated programmatically in Java (the second method is easier to maintain). See examples 4 and 6.
    • The database state can also be part of the expected output state. But sometimes everything is much more complicated, and we need to verify not only the state but also the sequence of calls, events, and even the context after each call within the system. In this case it may be a good idea to build a call tree during test execution, i.e. to log the desired things in a desired format (e.g., XML or JSON); a call tree verified earlier then becomes the expected data. By the way, the expected call tree can be written out during the first test run, manually checked for correctness, and then used by the tests. See examples 3 and 7.

    The advantages of using DDT

    A particular advantage of well-designed DDT is the ability to enter the input values and the expected result in a form suitable for every role on the project - from the manual tester to the (test) manager, and even the product owner (such cases arose in the author's practice). Accordingly, when you are able to give the manual testers full-time work increasing the test coverage and the data sets, the cost of testing goes down. In general, a convenient and understandable format lets you see more clearly what was covered and what was not; it is, in fact, test documentation. For example, it can be an XLS file with an understandable structure (although a properties file is often enough). See example 1.

    Alternatively, the creator of such a test can give his colleagues a workflow that makes it easy to prepare the expected values. In other words, it is desirable that the creation of data sets does not depend on developers and test automation engineers; anyone on the project should be able to prepare them.

    Also, the project manager can easily monitor development progress: he does not have to delve into the essence of the algorithms, or even into the quality of the code, because the DDT tests are enough to move forward.

    Surprisingly enough, the use of DDT also allows less qualified engineers to work on a project: good coverage of the difficult areas with DDT immediately shows whether the functionality operates correctly, so, thanks to Continuous Integration and test runs locally or on a Dev environment, an engineer is able to fix arising problems independently.

    The nuances

    Note that, with a certain approach, Robot Framework tests can be very DDT-oriented and bring a lot of benefit.
    The simplest form of DDT is parameterized JUnit tests, where Data Providers supply the data sets to the test.
    DDT is not a panacea, but a competent implementation in the right places helps a lot.

    How to know when you need to create DDT

    • You can create DDT once you have seen how some component of the system can be described in terms of input parameters and an output state or result. In the author's experience, developers need some time to start noticing where DDT can be introduced in a project. Moreover, without an explicit reminder or a push from the outside, people very often continue to rely on manual testing, or write a lot of similar unit tests, as happened on the project of example 4.
    • If you have been in situations where DDT became the salvation of a project because of accumulated internal complexity, when a small change could bring down the entire system. See example 6.
    • If the input parameters are a complex set of data, when you need to carefully check the interdependence of inputs, perform validation with lots of erroneous options, or when the number of resulting system states is unlimited. In particular:
      • Your own DSL (domain-specific language) is used. See example 3.
      • The covered system is extensible via plug-ins or scripts. See example 4.
      • The database contains complex calculations or complex relationships. See example 6.
    • DDT can be used to stabilize a system in which hard-to-fix bugs may arise, and to organize competent regression testing (see examples 3 and 5).
    • Also, if the unit tests contain copy-paste at the test method level, it is worth thinking about extracting it into DDT (the amount of test code is reduced significantly).

    How to create DDT

    Most often it is enough to wrap the main test in a loop, arrange the comparison of the output data with the expected data, and implement reporting - by logging, or by other means.
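    As a minimal sketch of that loop-plus-comparison (not tied to any of the projects in this article; the runSystemUnderTest method is a deliberately trivial stand-in for the covered functionality):

```java
// A minimal DDT harness: loop over data sets, compare actual vs. expected, report.
public class SimpleDdtHarness {

    // Hypothetical system under test: here, trimming and lower-casing a string.
    static String runSystemUnderTest(String input) {
        return input.trim().toLowerCase();
    }

    // Each data set is {input, expectedResult}. In a real project these would
    // come from an external source (properties file, Excel, a Data Provider).
    static final String[][] DATA_SETS = {
            {"  Hello  ", "hello"},
            {"WORLD", "world"},
            {"", ""}
    };

    // Runs every data set, logs each mismatch, and returns the failure count.
    static int runAll() {
        int failures = 0;
        for (String[] set : DATA_SETS) {
            String actual = runSystemUnderTest(set[0]);
            if (!actual.equals(set[1])) {
                System.err.println("FAILED: input=<" + set[0] + "> expected=<"
                        + set[1] + "> actual=<" + actual + ">");
                failures++;
            }
        }
        return failures;
    }

    public static void main(String[] args) {
        int failures = runAll();
        System.out.println(failures == 0 ? "All data sets passed"
                                         : failures + " data set(s) failed");
    }
}
```

    The same shape carries over to JUnit's Parameterized runner or TestNG's @DataProvider; only the plumbing changes.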

    But in non-trivial cases it may be necessary to introduce a dedicated architecture in order to create DDT. See examples 3 and 7.

    Examples

    1. The simplest example, which the author uses to recruit interns and trainees

    The task: to convert a number from digital format to string format. For example, 134345 would be "one hundred thirty-four thousand three hundred forty-five". * Consider declension - the difference in word endings (in Russian).

    • The algorithm should work for arbitrarily large numbers; accordingly, the names of the number ranks - a thousand, a million, a billion, etc. - should be taken from a reference source, for example a text file.
    • Be sure to create a Data Driven Test which proves that your algorithm works correctly (I, as a user, should be able to enter multiple data sets: 1. the number, 2. the correct expected result; the test itself checks all the sets and reports what is wrong). Use JUnit.
    • If possible, apply OOP.

    You can view how one of our trainees implemented such a test here: https://github.com/Dubouski/NumbersToWords/ - note that he decided to put the data sets for testing into Excel, so anyone, including a manual tester, is able to test his algorithm.
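    For illustration only (the linked repository is a full solution), here is a compact English-only sketch of the task that handles numbers up to 999,999 and runs its data sets through a DDT loop:

```java
public class NumbersToWordsDdt {

    static final String[] ONES = {"zero","one","two","three","four","five","six",
            "seven","eight","nine","ten","eleven","twelve","thirteen","fourteen",
            "fifteen","sixteen","seventeen","eighteen","nineteen"};
    static final String[] TENS = {"","","twenty","thirty","forty","fifty",
            "sixty","seventy","eighty","ninety"};

    // Converts 1..999 to words; returns "" for 0.
    static String threeDigits(int n) {
        StringBuilder sb = new StringBuilder();
        if (n >= 100) {
            sb.append(ONES[n / 100]).append(" hundred");
            n %= 100;
            if (n > 0) sb.append(" ");
        }
        if (n >= 20) {
            sb.append(TENS[n / 10]);
            if (n % 10 > 0) sb.append("-").append(ONES[n % 10]);
        } else if (n > 0) {
            sb.append(ONES[n]);
        }
        return sb.toString();
    }

    // Converts 0..999999 to English words.
    static String toWords(int n) {
        if (n == 0) return "zero";
        StringBuilder sb = new StringBuilder();
        if (n >= 1000) {
            sb.append(threeDigits(n / 1000)).append(" thousand");
            n %= 1000;
            if (n > 0) sb.append(" ");
        }
        sb.append(threeDigits(n));
        return sb.toString();
    }

    // The DDT part: data sets of {number, expected words}. In the real exercise
    // these come from an external file (Excel, properties) editable by anyone.
    static final Object[][] DATA_SETS = {
            {134345, "one hundred thirty-four thousand three hundred forty-five"},
            {0, "zero"},
            {21, "twenty-one"},
            {1000, "one thousand"}
    };

    public static void main(String[] args) {
        for (Object[] set : DATA_SETS) {
            String actual = toWords((Integer) set[0]);
            if (!actual.equals(set[1])) {
                System.err.println("FAILED: " + set[0] + " -> " + actual
                        + ", expected " + set[1]);
            }
        }
    }
}
```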


    2. An example the author often shows in lectures

    Task: to implement an HTML parser from scratch, that is, to turn a string into a DOM model.

    The first engineer will go the following way. He will need some time to examine the specification. Then he will take time to think about the architecture. Then he will make prototypes and test them somehow. All this time - days, weeks - the manager and the team will not be able to check his actual status.

    The second engineer will write the first unit test on the very first day, verifying the simplest case: an empty string, an empty <html/> tag, or something even simpler. And every day he will add new state verifications, expanding his code. It is reasonable to use DDT here and let the whole team, managers and testers, create different versions of HTML along with the expected results (for example, the tree of DOM objects, which can be written out on the first run of the algorithm and verified manually). With this approach the cases accumulate, changes to the algorithm and logic do not break the previous data sets, and it becomes possible to see clearly what has already been implemented. Moreover, the sets of input data and expected values can be grouped into folders and subfolders, which effectively turns them into documentation for the parser.

    3. The approach used in testing the XML2Selenium automated testing platform

    XML2Selenium is a system based on plug-ins and their interaction. Plug-ins generate events and subscribe to the events of other plug-ins, i.e. the complex interaction is hidden from the user.

    XML2Selenium runs tests written in XML format and can test a Web UI application (inside we use Selenium/WebDriver).

    Certainly, any change in the system core could break the interaction logic or kill backward compatibility (when new plug-ins appeared); moreover, the core required the work of senior engineers, so the cost of a failure was high.

    We applied the following approach. A special JVM parameter was introduced that asks the core to store the whole call and event tree for the test, including registration, initialization, the entire life cycle of the plug-ins, and the context passed between plug-ins after each atomic action of the test. Thus we obtained a generated file that contained a full sample of the system's behaviour. Here is an example of such file:

    Once a developer or tester had manually verified that the tree for a given test was correct, it was moved to the set of expected values; thus, on the Jenkins Continuous Integration server, the stored tree was compared against the actual one for each test run on the master branch. This stabilized the system state enormously.
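    The record-then-verify idea can be sketched in a few lines of plain Java. The file name and the call-tree content below are hypothetical (XML2Selenium's actual mechanism is internal to the platform); the point is the pattern: if no expected file exists yet, record the actual tree for manual review, otherwise compare against it.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.*;

public class GoldenMasterSketch {

    // Returns true if the actual tree matches the stored expected tree.
    // On the first run (no expected file yet) it records the tree and returns
    // true, so a human can review the file before it becomes the expectation.
    static boolean verifyCallTree(Path expectedFile, String actualTree) {
        try {
            if (!Files.exists(expectedFile)) {
                Files.write(expectedFile, actualTree.getBytes("UTF-8"));
                System.out.println("Recorded new expected tree: " + expectedFile);
                return true;
            }
            String expected = new String(Files.readAllBytes(expectedFile), "UTF-8");
            return expected.equals(actualTree);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Convenience helper: a fresh expected-file path in a temp directory.
    static Path tempExpectedFile() {
        try {
            return Files.createTempDirectory("ddt").resolve("login-test.calltree.xml");
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        Path expected = tempExpectedFile();
        String tree = "<calls><plugin name='browser'><event name='open'/></plugin></calls>";
        System.out.println("first run:  " + verifyCallTree(expected, tree));          // records
        System.out.println("second run: " + verifyCallTree(expected, tree));          // matches
        System.out.println("changed:    " + verifyCallTree(expected, tree + "<x/>")); // mismatch
    }
}
```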

    4. The approach used on one of the projects for data upload testing

    In this case the project required testing the correctness of a data upload component. The upload was nontrivial, depended on the state of the database, and bugs popped up in it now and then. One could say that the team did not control the stability of this component; it was necessary to radically increase it.

    The solution was to use DDT. A database dump in XML format was the input parameter. A previously verified upload file, known to be correct, was the expected value. Thus the testing came down to creating different versions of the database, verifying the expected files, and building up such test sets.

    This provided adequate regression testing on the project. Every time a bug arose in the upload, another data set covering the situation was added. The sets accumulated, their existence did not allow new bugs to appear, and after a while the component stabilized.

    As I remember, we implemented generation of the desired database state from Java, in order not to depend on the dumps and not to update them constantly as the data schema changed.

    5. Real project - testing error handling in a complex grammar

    The essence of the project was the following. The client part received from the server an XML with a full description not only of the UI that had to be rendered, but also of the behavior, which was transmitted in binary format inside the same XML. That is, the behavior could be anything; there was an obvious plug-in system.

    This XML had a rather complicated syntax, and our team was independent of the backend developers. We had to protect ourselves from improper XML, and to start constructing the UI only when we were completely sure the XML was correct.

    For this purpose DDT was used. We created a huge number of variants of the input XML, including irregular and incorrect ones, and checked that the necessary exceptions with the right messages were shown.


    Thus, the input parameter is the XML, and the expected state is the type of exception, its message, or even part of the stack trace. Several hundred test sets were created, and every time something in the format changed, or new parameters, new exceptions, or bugs appeared, the tests were extended. By the end of the project it was the most stable part of the system.
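    The pattern "input document -> expected exception type and message" can be sketched as follows. The validate method here is a deliberately trivial stand-in, not the project's real grammar checker; only the data-set shape and the try/catch comparison reflect the approach described above.

```java
public class ExceptionDdtSketch {

    // Trivial stand-in validator; the real project parsed a complex XML grammar.
    static void validate(String xml) {
        if (xml == null || xml.trim().isEmpty())
            throw new IllegalArgumentException("empty document");
        if (!xml.trim().startsWith("<"))
            throw new IllegalStateException("not an XML document");
    }

    // Data sets: {input, expected exception class or null, expected message or null}.
    static final Object[][] DATA_SETS = {
            {"", IllegalArgumentException.class, "empty document"},
            {"plain text", IllegalStateException.class, "not an XML document"},
            {"<ui/>", null, null} // valid input: no exception expected
    };

    // Runs one data set; returns true if the behavior matched the expectation.
    static boolean runSet(Object[] set) {
        try {
            validate((String) set[0]);
            return set[1] == null; // passed: correct only if none was expected
        } catch (RuntimeException e) {
            return e.getClass() == set[1] && e.getMessage().equals(set[2]);
        }
    }

    public static void main(String[] args) {
        for (Object[] set : DATA_SETS) {
            System.out.println((runSet(set) ? "PASS: " : "FAIL: ") + "<" + set[0] + ">");
        }
    }
}
```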

    In this image you can see part of the Excel file, which specifies the input parameter and the expected behavior - in our case, information about the exceptions (type and message).

    6. Testing complex data loading. Isolating the logic of stored procedures

    On one of the projects, data in CSV format (a very complex internal format) was loaded into a database. We had inherited legacy code, and the entire loading logic lived in stored procedures amounting to about ten thousand lines. The task was to stabilize the loading component and provide regression testing.

    As in example 4 with the data upload, we applied DDT and used as inputs 1) the state of the database (we used a dump in SQL format) and 2) the file to be loaded. As the expected value we used an XLS file that represented the content of the relevant tables of the database after loading.
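    Stripped of the database machinery, the shape of such a test can be sketched in memory: tables as maps, the load function as a stand-in for the stored procedures, and the expected table content compared after loading. Everything below (the "users" table, the CSV format "id,name") is hypothetical, for illustration only.

```java
import java.util.*;

public class DbLoadDdtSketch {

    // Stand-in for the loading logic: appends CSV lines "id,name" to the
    // "users" table. The real project executed stored procedures instead.
    static Map<String, List<String>> load(Map<String, List<String>> db, List<String> csv) {
        Map<String, List<String>> result = new HashMap<>(db);
        List<String> users = new ArrayList<>(result.getOrDefault("users", Collections.emptyList()));
        users.addAll(csv);
        result.put("users", users);
        return result;
    }

    public static void main(String[] args) {
        // Input 1: initial database state (in the project, an SQL dump).
        Map<String, List<String>> dump = new HashMap<>();
        dump.put("users", Arrays.asList("1,alice"));

        // Input 2: the file to load.
        List<String> csv = Arrays.asList("2,bob", "3,carol");

        // Expected value: table content after loading (in the project, an XLS file).
        Map<String, List<String>> expected = new HashMap<>();
        expected.put("users", Arrays.asList("1,alice", "2,bob", "3,carol"));

        Map<String, List<String>> actual = load(dump, csv);
        System.out.println(actual.equals(expected) ? "PASS" : "FAIL: " + actual);
    }
}
```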

    All the bugs, which occurred often, were turned into DDT data sets.

    Soon we found that our fixes no longer reintroduced the previous bugs and that the number of bugs decreased, while control over them grew stronger, since it was now possible to reproduce them automatically, quickly and efficiently, by adding new sets.

    On this project, due to lack of time, raw dumps were used (the data were not specially prepared but taken directly from the system while reproducing bugs), so supporting such a system will potentially require extra expenses in the future.

    7. The most complex example of DDT from the author's practice

    When creating a SOA platform (that is, a framework for SOA projects), the task was to test the behavior of the system and the system bus, reproducing all possible situations that may occur during the lifecycle of a SOA project.

    To achieve this goal, a framework was created that could deploy the system from its description. In reality these were different virtual servers connected to each other through our SOA platform. An application was deployed on each server; it connected to the system bus and could run the scripts it received. We stored the desired behavior and the desired scenario in these transferred scripts.

    Thus, the input data were 1) the file describing the system topology and 2) a set of scripts for each node of this topology, which defined what happened on each node. The expected value was a tree (a kind of log) of message processing in such a system.