We frequently see test automation efforts fail to provide adequate quality assurance because of inadequate test data. Low-quality tests often produce false positives or false negatives, leading to low confidence in test results and significant time spent on analysis and maintenance. One of the primary causes of subpar tests is a lack of attention to test datasets: uncontrolled changes to test data, identifiers hardcoded in test code, and test cases skipped because the required production data is unavailable all contribute to unreliable tests and low confidence in their results. To solve these problems, teams need to invest in robust test data management techniques and tools.

Since the inception of Grid Dynamics in 2006, we have never employed manual test engineers; instead, we have focused exclusively on test automation. We quickly became familiar with the "flaky tests" problem, and began solving it successfully with test data management and service virtualization. For more than 10 years we have helped multi-billion-dollar enterprises, with technology departments consisting of thousands of developers and testers, solve these problems in a lean and agile way. We bring our expertise and blueprints to analyze application portfolios, choose early candidates for adoption, and scale test data management techniques.

Experts in test data management

Synthetic data generation

Synthetic data generation is the most predictable approach to test data management. The datasets are created in a reliable and controlled fashion when tests are implemented, and are changed only when needed. The synthetic data is loaded into the system under test on demand during every test execution, simplifying management of the environment.

An extra benefit of synthetic data is that quality engineers can model corner cases that may not appear in the production data.
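As a minimal sketch of this idea, the generator below produces a reproducible synthetic dataset and deliberately appends corner cases (unusual characters, zero values, field-length and integer limits). The entity, field names, and specific corner cases are illustrative assumptions, not a prescribed schema:

```python
import random
from dataclasses import dataclass


@dataclass
class Customer:
    """Hypothetical entity used by the system under test."""
    customer_id: int
    name: str
    loyalty_points: int
    email: str


def generate_customers(count: int, seed: int = 42) -> list[Customer]:
    """Generate a deterministic synthetic dataset.

    A fixed seed makes every test run produce the same data,
    which is what makes synthetic data predictable.
    """
    rng = random.Random(seed)
    customers = [
        Customer(i, f"Customer {i}", rng.randint(0, 10_000), f"user{i}@example.com")
        for i in range(1, count + 1)
    ]
    # Deliberately modeled corner cases that rarely appear in production data:
    customers.append(
        Customer(count + 1, "O'Brien-Smith é", 0, "edge+case@example.com")
    )  # tricky name characters, zero loyalty points
    customers.append(
        Customer(count + 2, "X" * 255, 2**31 - 1, "max@example.com")
    )  # field-length and integer-overflow boundaries
    return customers


dataset = generate_customers(10)
```

Because the data is generated rather than sampled, the corner cases are always present, so the tests that depend on them never get skipped for lack of data.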

Production data curation

Testing with production data has its own benefits, since it may uncover defects that a synthetic dataset misses. Using production data for testing requires significant curation: tokenizing or obfuscating sensitive data, reducing the size of the dataset, and periodically refreshing it in the test environments.
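One common curation technique is deterministic tokenization: each sensitive value is replaced by a stable, irreversible token, so the same input always maps to the same token and referential integrity across tables survives the obfuscation. The sketch below assumes hypothetical field names and a simple salted-hash scheme, not any specific product's API:

```python
import hashlib

# Assumption: the salt is a per-environment secret; in practice it should
# live outside the codebase (e.g. in a secrets manager).
SALT = b"per-environment-secret"


def tokenize(value: str) -> str:
    """Replace a sensitive value with a stable, irreversible token.

    Determinism (same input -> same token) preserves joins between
    curated tables; hashing with a salt prevents trivial reversal.
    """
    digest = hashlib.sha256(SALT + value.encode("utf-8")).hexdigest()
    return f"tok_{digest[:16]}"


def curate_record(record: dict) -> dict:
    """Obfuscate sensitive fields while leaving the rest intact."""
    sensitive = {"email", "ssn", "phone"}  # illustrative field list
    return {k: tokenize(v) if k in sensitive else v for k, v in record.items()}


curated = curate_record({"order_id": 1001, "email": "jane@corp.com", "total": 59.90})
```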

Additionally, test cases need to avoid using hardcoded identifiers to find the right combinations of data items. Instead, each test case should implement a mechanism to find data items that satisfy its requirements.
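A minimal sketch of such a lookup mechanism, with hypothetical record fields: instead of asserting against a hardcoded order ID, the test describes the data it needs and searches the dataset for a match.

```python
def find_order(orders: list[dict], **requirements):
    """Return the first order matching all requirements, or None."""
    for order in orders:
        if all(order.get(key) == value for key, value in requirements.items()):
            return order
    return None


orders = [
    {"id": 7, "status": "SHIPPED", "payment": "CARD"},
    {"id": 9, "status": "RETURNED", "payment": "PAYPAL"},
]

# The test asks for "a returned PayPal order" rather than assuming id == 9:
order = find_order(orders, status="RETURNED", payment="PAYPAL")
```

When the dataset is refreshed and the identifiers change, a test written this way keeps working as long as some record still satisfies its requirements.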

Unified data interface

To simplify working with both synthetic and production data, and to enable the same tests to run against both datasets, a unified data retrieval interface should be implemented. We call this interface a "data pool". Through this interface, a test requests the data items it needs and receives them back. When the system under test works with synthetic data, the interface retrieves the synthetic data or generates it on the fly.

When it works with a production dataset, the interface will query production data and gracefully fail when data is not found, clearly stating that failure is not due to a defect, but due to the absence of data.
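The behavior described above can be sketched as follows; the class and method names are illustrative assumptions. The pool first searches its (curated production) records, falls back to a synthetic factory when one is configured, and otherwise raises a distinct error so that a missing-data failure is never mistaken for a product defect:

```python
class DataNotFoundError(Exception):
    """Signals absent test data, not a defect in the system under test."""


class DataPool:
    """Unified data retrieval interface over synthetic and production data."""

    def __init__(self, records: list[dict], synthetic_factory=None):
        self.records = records                      # curated production data, if any
        self.synthetic_factory = synthetic_factory  # on-the-fly generator, if any

    def get(self, **requirements) -> dict:
        # 1. Look for an existing record satisfying the requirements.
        for record in self.records:
            if all(record.get(k) == v for k, v in requirements.items()):
                return record
        # 2. In synthetic mode, generate a matching item on the fly.
        if self.synthetic_factory is not None:
            return self.synthetic_factory(**requirements)
        # 3. In production mode, fail gracefully with a clear reason.
        raise DataNotFoundError(f"no data item satisfies {requirements}")


# Synthetic mode: missing items are generated on demand.
synthetic_pool = DataPool([], synthetic_factory=lambda **req: {"id": 1, **req})
item = synthetic_pool.get(status="ACTIVE")

# Production mode: hits are returned, misses raise DataNotFoundError.
production_pool = DataPool([{"id": 5, "status": "ACTIVE"}])
```

Because both modes sit behind the same `get()` call, a test written once can run unchanged against either dataset.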

How it works
