Provision and Mask Your Test Data With the Right Tool

It goes without saying that quality assurance (QA) is a critical component in any software development organization. QA establishes the quality standards and tactics for the validation and verification of software products, whether you are running an Agile or waterfall methodology. See Info-Tech’s Build a Strong Foundation for Quality blueprint for more information on QA practices. The insights generated from QA activities are critical for making informed, data-driven decisions about product deployment, enhancements, and road mapping.

Automation, containers, behavior-driven development (BDD), and experience-based testing are just a few of the tactics the industry promotes to overcome common QA challenges. While these approaches are helpful, they do not directly address the foundations of effective QA:

  • Clear, justified, and prioritized test strategy, plan, and cases.
  • Production-like, high-fidelity environments to host the testing of your application system.
  • Production-like, high-fidelity data to be used in the testing of your application system.

Many of today’s testing tactics primarily focus on test design (e.g. BDD), test execution (e.g. automation), and test environment management (e.g. cloud and containers), assuming the right test data is available. However, this is not always the case.

What are the test data management (TDM) challenges?

The accuracy and relevance of your test data can make or break the success of your deployed product. Sufficient time and effort should be dedicated to ensuring the data you are preparing supports the test cases you want to execute. Unfortunately, we still see test data management (TDM) as a significant challenge that undermines the speed, rigor, and stability of test automation.

  • 56% of organizations cited the lack of appropriate test environments and data as a challenge in applying testing to Agile development (World Quality Report, 2019-20).
  • 48% of organizations cited test data and environment availability and stability as a main challenge in achieving the desired level of test automation (World Quality Report, 2019-20).
  • 52% of respondents named their testing teams’ dependence on database administrators for the data they need as a top test data issue, while 49% cited data spread across multiple databases and 47% cited the need to maintain the right test dataset versions for different test versions (Continuous Testing Report, 2020).

Many test data challenges can be traced to the unavailability of on-demand test data due to costs, process handoffs (e.g. laborious sign-offs and wait times), functional silos (e.g. separate test, data, and operations teams), and application system complexity (e.g. lack of holistic data management governance and distributed systems). These challenges are further exacerbated by the fact that many organizations still generate new test data manually despite the maturity of today’s toolsets. In fact, 69% of organizations use spreadsheets to manually generate new test data and 59% create data manually with every test run (Continuous Testing Report, 2020).

TDM solutions can help overcome some of these challenges by automating the manual and error-prone tasks to generate and refresh your test data and by centrally managing your datasets for on-demand access.

What are TDM solutions and how can they help?

TDM solutions create and manage non-production data that reliably mimics or resembles your production data, so that automation tools and testers can rigorously and accurately verify and validate your application systems. Today’s TDM solutions share several common elements:

  • Dataset Provisioning – Provision test datasets from multiple, heterogeneous data sources through non-disruptive synchronization and cloning methods. Maintain a usable representation of production data based on test requirements.
  • Virtual Database – Create lightweight virtual database copies that are version controlled, easily accessible, secured, compliant with regulations (e.g. GDPR), and can be rolled back and edited.
  • Masking – Replace personal and sensitive data with fictitious yet realistic values using predefined or custom profiling expressions with custom or out-of-the-box masking algorithms.
  • Governance – Secure the visibility, modifiability, and consumption of test data with role-based access control. Specify when virtual data should be refreshed with the latest production data, branched into copies, rolled back, or shared with other teams and tools, and automate these operations to support continuous integration (CI) and continuous delivery (CD) pipelines.
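
The masking element can be illustrated with a small Python sketch. It is not any vendor’s algorithm: it assumes a hypothetical secret key (MASKING_KEY) and two small hypothetical name pools, and uses a keyed hash (HMAC-SHA256 from Python’s standard library) so that substitution is deterministic. The same customer therefore masks to the same fictitious value in every table, which keeps joins between masked tables intact.

```python
import hashlib
import hmac
from typing import Any, Callable, Dict

# Hypothetical key for illustration only; in practice, load it from a secrets
# manager so masked values cannot be reproduced by anyone without the key.
MASKING_KEY = b"replace-with-a-secret-key"

# Hypothetical pools of fictitious but realistic replacement values.
FIRST_NAMES = ["Alex", "Jordan", "Sam", "Taylor", "Morgan", "Casey"]
LAST_NAMES = ["Smith", "Lee", "Patel", "Garcia", "Nguyen", "Brown"]


def _keyed_hash(value: str) -> bytes:
    """HMAC-SHA256 of the original value under the masking key."""
    return hmac.new(MASKING_KEY, value.encode("utf-8"), hashlib.sha256).digest()


def mask_name(name: str) -> str:
    """Deterministically replace a real name with a fictitious one."""
    digest = _keyed_hash(name)
    first = FIRST_NAMES[digest[0] % len(FIRST_NAMES)]
    last = LAST_NAMES[digest[1] % len(LAST_NAMES)]
    return f"{first} {last}"


def mask_email(email: str) -> str:
    """Replace an email address with a fictitious one on a reserved test domain."""
    token = _keyed_hash(email.lower()).hex()[:12]
    return f"user_{token}@example.com"


def mask_row(row: Dict[str, Any], rules: Dict[str, Callable[[Any], Any]]) -> Dict[str, Any]:
    """Return a copy of `row` with each column named in `rules` masked."""
    return {col: rules[col](val) if col in rules else val for col, val in row.items()}


# Usage: mask the name and email columns and leave the key column untouched.
rules = {"name": mask_name, "email": mask_email}
masked = mask_row({"id": 7, "name": "Jane Doe", "email": "jane@corp.com"}, rules)
```

Because the name pools are small, two different people can map to the same fictitious name. That is acceptable for most tests, but a test that depends on unique names would need larger pools or a different rule.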

What are the implementation considerations?

Much like any other tool, you will need to evaluate how your TDM solution fits into your overall testing and product delivery environment and strategy:

  • Test Management (TM) Tool Integration – TM tools actively manage, coordinate, and monitor your testing activities and orchestrate the various tasks needed for test automation, provided they have good test data. A key benefit of TM tools is that they provide traceability of test cases, scripts, plans, and defects and trends found within testing back to the original requirements and requests that motivated and justified the testing initiative. See our Choose the Right QA Tools to Validate and Verify Product Value and Stability note for more information.
  • CI/CD Pipelines – Test data management is an integral part of your product delivery pipeline, which can include a number of automation tools that require high-fidelity, high-quality test data, such as test automation, build automation, and continuous integration and deployment automation. Therefore, test data management tools must be configured to provide on-demand access to up-to-date test data in a consumable format. This requires a close look at how your automation tools are orchestrated, what data they require, and how they trigger automated pulls of test data.
  • Distributed Data Sources – Production data can reside in multiple sources within your distributed application environment. Ensure your target data sources are discoverable and accessible so that data can be consolidated. Use sound metadata analysis and data management practices to track data model changes in each source and accommodate them whenever your test data is refreshed.
  • Data Security and Compliance – Data must be protected from unauthorized access, and test data is no different. Most TDM vendors comply with the General Data Protection Regulation (GDPR) and personal information protection standards, such as the Personal Information Protection and Electronic Documents Act (PIPEDA). However, vendors do not all approach data security and privacy the same way (e.g. dynamic masking through an API versus a GUI to mask data in CSV files), nor do all of them meet every industry-specific compliance requirement, such as the Health Insurance Portability and Accountability Act (HIPAA), the Gramm-Leach-Bliley Act (GLBA), the Safe Harbor requirements, and the Payment Card Industry Data Security Standard (PCI DSS).
  • Data Governance and Collaboration – Test data is a cross-functional asset. Provisioning, subsetting, masking, and managing it requires collaboration among operations, security, developers, testers, and database administrators (DBAs) so that it meets your data quality and management standards, regardless of where the data comes from and who uses it. The challenge is implementing just enough oversight and discipline without significantly impeding teams or taking away their ability to pull and manipulate data as they see fit. Even though TDM may not require the same rigor and control as formal data management practices, some key data management principles can be leveraged to ensure proper data quality, ownership, and approvals. For example, test data owners must actively track the value and relevance of their data (i.e. the test data lifecycle) to determine when test datasets should be tweaked, refreshed, or retired. See our Create a Plan for Establishing a Business-Aligned Data Management Practice blueprint for more details.
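
Tracking data model changes across distributed sources can start with something as simple as diffing schema snapshots before each test data refresh. The sketch below assumes each snapshot is a hypothetical dictionary mapping table names to column types; in relational databases that support it, such a snapshot could be assembled from the information_schema.columns view. It only reports drift; deciding how to accommodate each change remains a data management decision.

```python
from typing import Dict, List

# A schema snapshot: table name -> {column name -> declared type}.
Schema = Dict[str, Dict[str, str]]


def schema_drift(previous: Schema, current: Schema) -> List[str]:
    """List human-readable differences between two schema snapshots."""
    changes: List[str] = []
    # Whole tables that disappeared or appeared since the last refresh.
    for table in sorted(previous.keys() - current.keys()):
        changes.append(f"table removed: {table}")
    for table in sorted(current.keys() - previous.keys()):
        changes.append(f"table added: {table}")
    # Column-level changes in tables present in both snapshots.
    for table in sorted(previous.keys() & current.keys()):
        old_cols, new_cols = previous[table], current[table]
        for col in sorted(old_cols.keys() - new_cols.keys()):
            changes.append(f"column removed: {table}.{col}")
        for col in sorted(new_cols.keys() - old_cols.keys()):
            changes.append(f"column added: {table}.{col}")
        for col in sorted(old_cols.keys() & new_cols.keys()):
            if old_cols[col] != new_cols[col]:
                changes.append(
                    f"type changed: {table}.{col} {old_cols[col]} -> {new_cols[col]}"
                )
    return changes
```

For example, if a source widens customers.id from INTEGER to BIGINT, adds a phone column, drops a legacy table, and adds an orders table, the function returns one line per change, so a refresh job could pause and alert the test data owner instead of silently loading mismatched data.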

Do not implement your TDM solution in isolation: local improvements can reduce the efficiency and performance of your entire delivery pipeline.

Who are the players in the space?

While a single vendor platform for all QA and testing activities may be ideal, the test data management features offered by many TM vendors (e.g. Microsoft’s Azure DevOps, Parasoft’s Automated Software Testing Tool Suite, and Sauce Labs’ Continuous Testing Cloud Platform) are limited to access management and basic data editing and virtualization. In some cases, these basic capabilities are enough to meet testing requirements. However, some vendors offer complementary products within their portfolio to fill the TDM gap (such as Micro Focus’ Data Express and Broadcom’s Test Data Manager) or offer out-of-the-box integration with solutions specializing in TDM (such as Informatica’s Test Data Management).

Notable TDM vendors include:

What are the key features of TDM solutions?

Each TDM vendor has its own approach to gathering, masking, and preparing test data, and its own out-of-the-box compatibility with specific industry standards and regulations. However, they share a common set of table stakes features:

  • Data Discovery, Profiling, and Analysis – Find, view, analyze, and observe data mined from specified environments and data sources based on defined filters, data profiles, and data requirements.
  • Data Provisioning – Access, clone, and virtualize production data from one or more targeted data sources and store it in a centralized test database that is accessible to teams and tools. Data models and metadata are also stored centrally to avoid repeated effort.
  • Self-Service Portal – Allow on-demand access to the test database. Teams can store, augment, branch, version, and roll back test datasets as they see fit.
  • Data Masking – Provide a comprehensive set of masking and transformation techniques so that test data is consumable and satisfies industry regulations and standards, such as the General Data Protection Regulation (GDPR). This feature includes obfuscating sensitive data (e.g. encryption) or replacing it with altered, fictitious data.
  • Audit Logging – Keep a complete record of all changes made to entities within the TDM solution. Easily view and filter logs, specify logging levels, and set retention policies.
  • Synthetic Data Generation – Create test datasets that carry the characteristics of production data but none of its sensitive data. The tool offers out-of-the-box test data generation rules that can be executed repeatedly on demand.
  • Test Data Reporting – Aggregate visual reporting on the state, use, and changes of test data through real-time dashboards with customizable templates.
  • Test Management Integration – Enable real-time integration with third-party TM, environment management, build and deployment management, and configuration management solutions through out-of-the-box plugins or customizable REST APIs.
  • Governance – Define, implement, and uphold data classifications, privacy rules, and role-based access rights to govern the TDM process and enforce organizational policies and standards.
  • Cloud and Virtual Data Storage – Host the test database in a cloud or virtual environment that complies with data protection and privacy regulations.
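
Rule-based synthetic data generation can be sketched in a few lines of Python. The Customer record, name pools, age range, and credit limit tiers below are hypothetical rules chosen for illustration; a TDM tool would typically derive such rules by profiling production data. Seeding a private random generator is what makes the rules repeatable: the same seed produces the same dataset on every run, and no production values are ever involved.

```python
import random
from dataclasses import dataclass
from datetime import date, timedelta
from typing import List

# Hypothetical generation rules; a TDM tool would typically derive these
# from a profile of production data rather than hard-coding them.
FIRST_NAMES = ["Avery", "Blake", "Dana", "Emery", "Harper", "Quinn"]
LAST_NAMES = ["Adams", "Baker", "Chen", "Diaz", "Evans", "Khan"]
CREDIT_LIMITS = [500, 1000, 2500, 5000, 10000]
REFERENCE_DATE = date(2020, 1, 1)  # fixed so generated ages do not drift between runs
MIN_AGE_DAYS, MAX_AGE_DAYS = 18 * 365, 90 * 365


@dataclass(frozen=True)
class Customer:
    customer_id: int
    name: str
    birth_date: date
    credit_limit: int


def generate_customers(count: int, seed: int = 42) -> List[Customer]:
    """Generate `count` fictitious customers; the same seed yields the same data."""
    rng = random.Random(seed)  # private, seeded generator for repeatable output
    customers = []
    for customer_id in range(1, count + 1):
        age_days = rng.randint(MIN_AGE_DAYS, MAX_AGE_DAYS)
        customers.append(
            Customer(
                customer_id=customer_id,
                name=f"{rng.choice(FIRST_NAMES)} {rng.choice(LAST_NAMES)}",
                birth_date=REFERENCE_DATE - timedelta(days=age_days),
                credit_limit=rng.choice(CREDIT_LIMITS),
            )
        )
    return customers
```

Calling generate_customers(1000, seed=7) in two separate test runs yields identical datasets, so a failing test can be reproduced exactly; changing the seed gives a fresh but equally valid dataset.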

TDM vendors try to differentiate themselves through unique features, in addition to positioning their table stakes features as best in class:

  • Automated Data Comparison – Compare and contrast dataset tables from multiple sources so they can be consolidated and assembled into a usable test dataset. Teams can analyze data used to test an application by comparing the results before and after the application is executed.
  • Capability to Browse and Edit Datasets – Truncate, delete, and insert data into existing test datasets to select and prepare the appropriate subset with the data volume and quality required for testing.
  • Big Data Support – Support the management and processing of test data generated from big data utilities and tools (e.g. Hadoop).
  • Test Data Designer and Modeler – Define and customize data sources, data models, data management services, and data masking policies and rules for your desired test datasets.
  • Out-of-the-Box Support for Commercial-Off-the-Shelf (COTS) Applications – Access production data stored within COTS applications and within systems managed and hosted by third-party vendors.
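
Automated data comparison, such as checking a table before and after the application under test runs, comes down to matching rows on a key and reporting what differs. This minimal Python sketch assumes rows are already loaded into memory as dictionaries with a primary key column whose name the caller supplies; real tools run comparable logic against the databases themselves and at much larger volumes.

```python
from typing import Any, Dict, List, Tuple

Row = Dict[str, Any]


def compare_tables(before: List[Row], after: List[Row], key: str) -> Dict[str, Any]:
    """Compare two snapshots of a table, matching rows on the `key` column.

    Returns the keys of added and removed rows, plus, for each row present in
    both snapshots, the columns whose values differ as (before, after) pairs.
    A column missing from one side is reported with the value None.
    """
    old = {row[key]: row for row in before}
    new = {row[key]: row for row in after}
    changed: Dict[Any, Dict[str, Tuple[Any, Any]]] = {}
    for k in old.keys() & new.keys():
        columns = old[k].keys() | new[k].keys()
        diffs = {
            col: (old[k].get(col), new[k].get(col))
            for col in columns
            if old[k].get(col) != new[k].get(col)
        }
        if diffs:
            changed[k] = diffs
    # Sorting assumes the key values are mutually comparable (e.g. all ints).
    return {
        "added": sorted(new.keys() - old.keys()),
        "removed": sorted(old.keys() - new.keys()),
        "changed": changed,
    }
```

If an order table holds orders 1 to 3 before a test run and orders 1, 2, and 4 afterwards, with order 2 moving from "open" to "shipped", the result lists order 4 as added, order 3 as removed, and the status change on order 2, which a test can then check against the expected behavior.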

Our Take

Test data management (TDM) solutions streamline test management workflows by removing the often manual, time-consuming, and laborious tasks of test data provisioning. While it may seem that TDM solutions are nice-to-have compared to other development, testing, and deployment priorities, the impacts of an automated TDM can be significant, especially when the system under test is distributed, large, and diverse. So, consider the following factors when deciding if a TDM solution is valuable in your organization:

  • The degree of fidelity and preparation your tests require for the test data to be compliant and consumable.
  • The complexity of your system under test.
  • The fragmentation, distribution, accessibility, and quality of your production data.
  • How holistically and consistently your data structures, quality standards, and governance policies are defined and enforced.
  • The mandatory regulations regarding the use of production data.
  • The tool’s out-of-the-box integration with other product delivery and test management tools.

Remember, TDM does not diminish the importance of good data management and test management principles, or the need to continuously improve your QA practices.

Want to Know More?