Validate output of captured big data sample and record results


To validate the output of big data samples, it is important to design, formulate and select suitable test scenarios and to record detailed information in test cases.

Test cases and test scenarios

A test scenario identifies what needs to be tested and defines the specific functional requirements to be tested. 43 It is examined through a collective set of test cases. 44

In the context of testing big data samples, some examples of test scenarios would be to:

  • verify the tables/columns in the data model
  • verify the relationships between the tables
  • verify report visualisation functionality.

A test case is a documented set of actions for testing a particular test scenario. It defines in detail:

  • what needs to be tested
  • how to perform the test
  • what results are expected from the test.

In the context of testing big data samples, some examples of test cases would include:

  • Check data type and format of specific columns
  • Check for duplicate records in a particular table
  • Check for missing values in a specific table/column.

How to design and write test cases

Organisations may have their own test case template that they would prefer to use. If so, it is important to use the specific template to design and document the test case scenarios and test case details.

Refer to the 43 Test Case Templates to get an idea about various types of test case templates that can be used. Notice that all test case templates have the following common elements.

A typical test case document would include the following components. 45

  • Test case ID – This is a unique number assigned to each test case. It helps to keep track of the test cases and to avoid testing the same things over and over again. The test case ID can include a short acronym to help easily identify the name of the sample dataset model being tested or the organisational department the dataset relates to.
    • An example of a test case ID generated for the Customer Relations (CR) department would have the following format:
      • <Department Acronym - Capitalised first two letters>_<Test case number>
      • Example: CR_001
  • Test scenario and description – This field should clearly state the high-level objective of the test. For example: “Verify Customer table.” 
  • Test steps and method – This field should clearly outline the steps that need to be taken to test the scenario and the method for testing. It may also include any prerequisites about the data loaded into the system and mention any recommended programs/tools to be used for the test. For example:
    1. Check Customer table fields/columns
      • Method: Run query in DAX Studio
    2. Check the Customer table for duplicate records
      • Method: Run query in DAX Studio
  • Test data and queries – This is an optional field and may depend on what is needed to carry out the test. If the test requires specific data, a query or a test script as input to the system, then this needs to be clearly documented in this field.
  • Expected results – This is an important field that explains what results should be expected from the test in an ideal situation. Ensure that this field contains precise information about the expected results.
  • Results received – At the stage where you are still designing and formulating test cases, this field is left blank. It will only be completed by the tester during the actual implementation of the tests.

    Note: You will learn more about how to complete this field during the implementation of test cases.

  • Test Results (Pass/Fail) - At the stage where you are still designing and formulating test cases, this field is left blank. It will only be completed by the tester after carrying out the test and comparing the expected results with the results received. Note: You will learn more about how to complete this field during the implementation of test cases.
  • Comments - At the stage where you are still designing and formulating test cases, this field is left blank. It will only be completed by the tester after carrying out the test to indicate any issues encountered or to note down any important observations that occurred during the test. Note: You will learn more about how to complete this field during the implementation of test cases.

A basic example of a test case document format is as follows:

The template columns are: Test Case ID, Test Description, Test steps and method, Test data/query (test scripts), Expected Results, Actual Result, Test Result (Pass/Fail) and Additional Comments. At the design stage, the last three columns are left blank. The entry for test case CR_001 looks like this:

Test Case ID: CR_001
Test Description: Verify Customer table

Test step 1: Check Customer table fields/columns
  Method: Run query in DAX Studio
  Test data/query:
    EVALUATE Customer
  Expected Results: the Customer ID, Customer Name, Customer Address, Customer Phone and Customer DOB columns are returned.

Test step 2: Check for duplicate records
  Method: Run query in DAX Studio
  Test data/query:
    EVALUATE
    SUMMARIZECOLUMNS(
        "Total rows", COUNTROWS(Customer),
        "Distinct rows", COUNTROWS(DISTINCT(Customer))
    )
  Expected Results: the same value is displayed for both "Total rows" and "Distinct rows".

When writing test scenarios and test cases, it is important to consider the requirements and data quality standards of the business. The business requirements would specify the desired output of the big data sample, which needs to be documented in the test case scenarios and test cases.

Complete the following Learning Activity.

Learning activity

All target output table fields should be checked against the following data quality standards to ensure the data represented in the report visualisations is accurate. Consider the following extract of a company’s data quality requirements.

Column validations

Check 1 - Check the following datatype and data format of the target table columns.

  • Columns that include date information should have data type as 'Date' only and formatted as 'dd/mm/yyyy'.
  • Columns that include numerical information that does not require decimal numbers should have data type as ‘Whole number’.
  • Columns that contain both numbers and letters and special characters such as hyphens (-) should have the Data type and Format both set as Text.

Method to validate the above: Select the specific column in Power BI Desktop > Data view; then check the Data type and Format details under Column Tools.

Check 2 - Check for data correctness by validating a specific data value against the result returned (e.g. by filtering the Customer Name column for a sample customer name such as John Smith).

  • The result should return the correct details of the specific record

Method to validate the above: Use DAX Studio to run the following query to return a specific row from the relevant target table.

EVALUATE FILTER(Customer, Customer[Customer Name] = "John Smith")

Table validations

Check 1 - Check if the table includes the correct fields/columns.

EVALUATE Customer

Check 2 - Check the target output table for duplicate values.

EVALUATE
SUMMARIZECOLUMNS(
    "Total rows", COUNTROWS(Customer),
    "Distinct rows", COUNTROWS(DISTINCT(Customer))
)

If "Total rows" and "Distinct rows" return:

  • the same value, then there are no duplicates in the table.
  • different values (most often, a smaller value for "Distinct rows" than for "Total rows"), then there are duplicates in the target table (a follow-up query sketch is shown below).
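If duplicates are detected, a follow-up query along the following lines can help locate them. This is a sketch only, not part of the documented checks, and it assumes that duplicated records share the same Customer ID value:

// List Customer ID values that appear in more than one row of the Customer table
EVALUATE
FILTER(
    ADDCOLUMNS(
        VALUES(Customer[Customer ID]),
        "Row count", CALCULATE(COUNTROWS(Customer))
    ),
    [Row count] > 1
)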

The sample dataset that needs to be validated is shown here:

Customer
Customer_ID | Customer_Name | Customer_Address | Customer_Phone | Customer_DOB
AB-10015 | Aaron Bergman | Concord, North Carolina | (819) 867-9684 | 01/05/95
BM-11140 | Becky Martin | Philadelphia, Pennsylvania | (763) 376-3282 | 09/12/77
CS-12460 | Chuck Sachs | Houston, Texas | (861) 495-8389 | 12/05/82
DC-12850 | Dan Campbell | Naperville, Illinois | (649) 964-7933 | 10/11/79
DK-13375 | Dennis Kane | Melbourne, Florida | (355) 440-9156 | 02/06/96
DP-13000 | Darren Powers | Jackson, Michigan | (824) 236-2899 | 12/01/78
IM-15055 | Ionia McGrath | Phoenix, Arizona | (716) 599-465 | 06/03/88
JC-15340 | Jasper Cacioppo | Columbus, Ohio | (416) 584-3938 | 27/10/92
ML-17410 | Maris LaWare | Dallas, Texas | (954) 817-897 | 05/02/80
PW-19240 | Pierre Wener | Chicago, Illinois | (722) 786-7569 | 12/01/79
SC-20020 | Sam Craven | South Carolina | (921) 954-4652 | 12/10/92
VM-21835 | Vivian Mathis | Rochester, Minnesota | (949) 483-4462 | 25/12/95
XP-21865 | Xylona Preis | Santa Clara, California | (786) 293-8278 | 30/10/95
ZC-21910 | Zuschuss Carroll | Portland, Oregon | (528) 665-9954 | 06/03/81
CC-12610 | Corey Catlett | Cincinnati, Ohio | (464) 225-9423 | 16/10/90
FA-14230 | Frank Atkinson | San Francisco, California | (538) 880-8941 | 26/03/78

Based on the sample dataset and the business requirements, design and formulate test scenarios and test cases to validate the output.

Take some time to document your test scenarios and test cases using the following template: Test case template (.xlsx file)

Check your answer against the following sample.

The sample answer columns are: Test Case ID, Test Description, Test steps and method, Test data/query (test scripts) and Expected Results.

Test Case ID: CR_001
Test Description: Verify Customer table

Test step 1: Check Customer table fields/columns
  Method: Run query in DAX Studio
  Test data/query:
    EVALUATE Customer
  Expected Results: the Customer ID, Customer Name, Customer Address, Customer Phone and Customer DOB columns are returned.

Test step 2: Check for duplicate records
  Method: Run query in DAX Studio
  Test data/query:
    EVALUATE
    SUMMARIZECOLUMNS(
        "Total rows", COUNTROWS(Customer),
        "Distinct rows", COUNTROWS(DISTINCT(Customer))
    )
  Expected Results: the same value is displayed for both "Total rows" and "Distinct rows".

Test Case ID: CR_002
Test Description: Verify Customer ID for Customer table

Test step 1: Check the data type and data format of the target Customer ID column
  Method: Select the Customer ID column in Power BI Desktop > Data view; then check the Data type and Format details under Column tools.
  Expected Results: Text

Test step 2: Validate Customer ID data correctness
  Method: Run query in DAX Studio
  Test data/query:
    EVALUATE
    FILTER(Customer, Customer[Customer Name] = "John Smith")
  Expected Results: Customer initials should not be displayed in the ID; only the number should be displayed.

Test step 3: Check for missing values or blank cells
  Method: Run query in DAX Studio
  Test data/query:
    DEFINE
        MEASURE Customer[EmptyID] =
            CALCULATE(COUNTROWS(Customer), Customer[Customer ID] == BLANK())
    EVALUATE
        SUMMARIZECOLUMNS(
            Customer[Customer ID],
            "Total rows", COUNTROWS(Customer),
            "Customers with blank Customer ID", [EmptyID]
        )
  Expected Results: no values are returned for customers with a blank Customer ID in the test output.

Test Case ID: CR_003
Test Description: Verify Customer DOB for Customer table

Test step 1: Check the data type of the target Customer DOB field
  Method: Select the Customer DOB column in Power BI Desktop > Data view; then check the Data type details under Column tools.
  Expected Results: Date

Test step 2: Check the data format of the target Customer DOB field
  Method: Select the Customer DOB column in Power BI Desktop > Data view; then check the Format details under Column tools.
  Expected Results: DD/MM/YYYY

Test step 3: Check for missing values or blank cells
  Method: Run query in DAX Studio
  Test data/query:
    DEFINE
        MEASURE Customer[EmptyDOB] =
            CALCULATE(COUNTROWS(Customer), Customer[Customer DOB] == BLANK())
    EVALUATE
        SUMMARIZECOLUMNS(
            Customer[Customer DOB],
            "Total rows", COUNTROWS(Customer),
            "Customers with blank Customer DOB", [EmptyDOB]
        )
  Expected Results: no values are returned for customers with a blank Customer DOB in the test output.

Now you will learn how to implement the selected test cases that you have designed and formulated previously.

Common testing tools

There are a variety of testing tools that can be used to run test scripts in order to validate the output from the sample dataset. Organisations may have their preferred testing tools that can be used to implement the documented test case scenarios and test cases.

It is important to choose a testing tool that integrates well with the data analytics platform into which the sample dataset was loaded.

As we are using Power BI Desktop as our main data analytics platform, we will focus on DAX Studio, a testing tool that is supported by Power BI Desktop.

Refer to the following article from Microsoft, External tools in Power BI Desktop, which details a variety of other external tools that can be used with Power BI Desktop.

Following organisational procedures

When implementing the selected test case scenarios and test cases, it is important that you follow any relevant organisational procedures that may outline details of:

  • what tools should be used for carrying out tests
  • the process to follow when testing
  • how to document test results
  • how to report any issues encountered during the testing process.

Implementing selected test scenarios and test cases

When testing big data samples, each test case can be performed as a manual validation check or run as a test script.

A test script is a line-by-line description of all the actions and data needed to properly perform a test. The script includes detailed explanations of the steps needed to achieve the specific goal within the program, as well as descriptions of the results that are expected for each step. 43

As the testing process is already documented in the test case document, during implementation, you are required to:

  • follow the steps recorded in the test case document
  • conduct the tests by performing manual checks or by running test scripts
  • record the actual test output
  • compare the actual test results with the expected test results in the test case document
  • document any other issues or observations noted during the test
  • record the test result outcome for each test case as either pass/fail.

The following three fields in the test case document should be used to document the test results. 45

  • Results received – Results of the test are received once the tester goes through all the steps documented in the test case document. The results received could be positive or negative; they either confirm that the intended results were achieved or show an entirely different result.
  • Test Results (Pass/Fail) - This field is completed by the tester after receiving test results and by comparing the expected results and the results received to indicate whether the final outcome of the test is a ‘Pass’ or a ‘Fail’. Usually, this field can be colour coded in the test case template (e.g. Pass results in green, Fail results in red) to clearly indicate the final test result.
  • Comments – This is an optional field that can be completed by the tester after determining the final test result/outcome. This field is used to indicate any issues encountered during the test. Especially if the test resulted in a ‘Fail’, there might be other important observations that occurred during the test that may help identify why the test failed.

The following example shows how the implemented test case results are documented.

The completed columns are: Test Case ID, Test Description, Test steps and method, Test data/query (test scripts), Expected Results, Actual result, Test results and Comments and notes.

Test Case ID: CR_001
Test Description: Verify Customer table

Test step 1: Check Customer table fields/columns
  Method: Run query in DAX Studio
  Test data/query:
    EVALUATE Customer
  Expected Results: the Customer ID, Customer Name, Customer Address, Customer Phone and Customer DOB columns are returned.
  Actual result: (not recorded)
  Test results: Pass
  Comments and notes: (none)

Test step 2: Check for duplicate records
  Method: Run query in DAX Studio
  Test data/query:
    EVALUATE
    SUMMARIZECOLUMNS(
        "Total Rows", COUNTROWS(Customer),
        "Distinct Rows", COUNTROWS(DISTINCT(Customer))
    )
  Expected Results: the same value is displayed for both "Total Rows" and "Distinct Rows".
  Actual result: the "Distinct Rows" value is smaller than the "Total Rows" value displayed.
  Test results: Fail
  Comments and notes: duplicates exist.

Test Case ID: CR_002
Test Description: Verify Customer ID for Customer table

Test step 1: Check the data type and data format of the target Customer ID column
  Method: Select the Customer ID column in Power BI Desktop > Data view; then check the Data type and Format details under Column tools.
  Expected Results: Text
  Actual result: Text
  Test results: Pass
  Comments and notes: (none)

Test step 2: Validate Customer ID data correctness
  Method: Run query in DAX Studio
  Test data/query:
    EVALUATE
    FILTER(Customer, Customer[Customer Name] = "John Smith")
  Expected Results: Customer initials should not be displayed in the ID; only the number should be displayed.
  Actual result: Customer initials displayed.
  Test results: Fail
  Comments and notes: (none)

Test step 3: Check for missing values or blank cells
  Method: Run query in DAX Studio
  Test data/query:
    DEFINE
        MEASURE Customer[EmptyID] =
            CALCULATE(COUNTROWS(Customer), Customer[Customer ID] == BLANK())
    EVALUATE
        SUMMARIZECOLUMNS(
            Customer[Customer ID],
            "Total rows", COUNTROWS(Customer),
            "Customers with blank Customer ID", [EmptyID]
        )
  Expected Results: no values are returned for customers with a blank Customer ID in the test output.
  Actual result: (not recorded)
  Test results: (not recorded)
  Comments and notes: (none)

Test Case ID: CR_003
Test Description: Verify Customer DOB for Customer table

Test step 1: Check the data type of the target Customer DOB field
  Method: Select the Customer DOB column in Power BI Desktop > Data view; then check the Data type details under Column tools.
  Expected Results: Date
  Actual result: Date/Time
  Test results: Fail
  Comments and notes: (none)

Test step 2: Check the data format of the target Customer DOB field
  Method: Select the Customer DOB column in Power BI Desktop > Data view; then check the Format details under Column tools.
  Expected Results: DD/MM/YYYY
  Actual result: DD/MM/YYYY HH:MM:SS
  Test results: Fail
  Comments and notes: (none)

Test step 3: Check for missing values or blank cells
  Method: Run query in DAX Studio
  Test data/query:
    DEFINE
        MEASURE Customer[EmptyDOB] =
            CALCULATE(COUNTROWS(Customer), Customer[Customer DOB] == BLANK())
    EVALUATE
        SUMMARIZECOLUMNS(
            Customer[Customer DOB],
            "Total rows", COUNTROWS(Customer),
            "Customers with blank Customer DOB", [EmptyDOB]
        )
  Expected Results: no values are returned for customers with a blank Customer DOB in the test output.
  Actual result: (not recorded)
  Test results: (not recorded)
  Comments and notes: (none)

What is sub-standard data?

Sub-standard data refers to data that does not meet the quality standards set by an organisation.

Following are some of the types of sub-standard data that can be found in a sample dataset:

  • Records of products that have been phased out.
  • Categories of items that are no longer used in the business.
  • Data fields that are of no value for the current analysis task.
  • Missing data - records that have blank fields that cannot be used for analysis (e.g. specific customer records with no contact details recorded).
  • Columns expected to contain only positive numbers that have negative values (e.g. sales figures with minus values).
  • Date columns that indicate an end date earlier than the start date (a query sketch for this check follows the list).
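As an illustration of the last point, a date-order check could be run in DAX Studio along the following lines. This is a sketch only; the 'Orders' table and its [Order Date] and [Ship Date] columns are assumed names used for illustration, not fields from the sample dataset in this topic:

// Return any rows where the end date precedes the start date (table and column names are assumptions)
EVALUATE
FILTER(Orders, Orders[Ship Date] < Orders[Order Date])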

Isolate sub-standard data

When it is found that sub-standard data exists in the sample dataset, it is important to isolate or separate that data from the rest of the data in the sample dataset.

This can be done by creating calculated tables using the DAX functions FILTER and EXCEPT. The process is as follows:

  • Use the FILTER() function to create a new table ('Table 2') with the data that needs to be isolated/separated from the original sample data table ('Table 1').
  • Use the EXCEPT() function to create a new table ('Table 3') that includes all the data from the original sample table ('Table 1') except the isolated sub-standard data ('Table 2'). A minimal DAX sketch of this process follows the formula below.
New sample data table (Table 3) = Original sample data table (Table 1) – Sub-standard data (Table 2)
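The following sketch writes this process as two calculated tables. It assumes a 'Sales' sample table in which rows with negative Profit values are the sub-standard data to be isolated (as in the practical activity later in this topic); the table and column names are illustrative only:

-- Table 2: the sub-standard rows isolated from the original sample table
Substandard Sales = FILTER(Sales, Sales[Profit] < 0)

-- Table 3: the new sample table, i.e. everything in 'Sales' except the isolated rows
Sales Clean = EXCEPT(Sales, 'Substandard Sales')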

Watch the following video to understand how to use the EXCEPT() function.

Refer to the DAX guide for more details on how to use the EXCEPT() function.

Watch the following video demonstration that goes through the step-by-step process of isolating sub-standard data in a sample dataset.

Correct data acquisition paths

It is important to understand that the data segregations and aggregations implemented in your Power BI report are all requesting data from the original sample dataset, which includes sub-standard data.

However, once you have isolated or separated the sub-standard data from the original sample dataset, the report visuals need to request data from the new sample dataset, which does not contain any sub-standard data.

This is done to prevent further importation of sub-standard data into the testing system. How you correct data acquisition paths depends on the particular issues you find with these paths.

Following are some of the ways you can correct the data acquisition paths of your data model:

  • Correct target output table queries to retrieve data from the new sample table (see the sketch after this list).
  • Correct visualisations to request data from the new sample table.
  • Once the data acquisition paths are corrected, refresh all data model visuals.
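For example, a target output table that was originally defined over the sample table can be repointed by editing its query to reference the new table instead. The sketch below is illustrative only; 'Sales Clean', the Region and Sales columns and the output table name are assumptions that carry on from the isolation sketch above, not names from the source material:

-- Before: the target output table acquires data from the original sample table (which includes sub-standard data)
-- Sales by Region = SUMMARIZECOLUMNS(Sales[Region], "Total Sales", SUM(Sales[Sales]))

-- After: the same target output table corrected to acquire data from the cleaned sample table
Sales by Region =
SUMMARIZECOLUMNS(
    'Sales Clean'[Region],
    "Total Sales", SUM('Sales Clean'[Sales])
)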

Watch the following video demonstration that goes step-by-step through the process of correcting acquisition paths.


Generate results of validation activity

It is important to run through the previously formulated test case scenarios and test cases each time there’s a change in the sample dataset.

Isolating sub-standard data and correcting data acquisition paths in the data model introduced significant changes to the sample dataset. Therefore, a validation check should be performed, and the test results should be documented to identify any new issues in the sample dataset.

The results of the validation activity should include:

  • the actual results and the final test outcome (Pass/Fail) of the new test run, recorded against the test case scenario document
  • the date when the validation was performed
  • any additional information or issues related to the validation activity, recorded in the comments/notes section.

Associated supporting evidence

These may include:

  • copies of test scripts run during the validation check
  • screen capture images of any issues detected during the validation
  • copies of performance data recorded as part of the validation checks
  • completed test run document with final test outcome clearly recorded as Pass/Fail.

Follow organisational policies and procedures

Organisations may require all generated validation activity and associated supporting evidence to be stored in a secure location. Organisations may have their recommended folder structure and version control requirements when saving documents and files, which you would need to consider.

Consider the following extract from a sample policy document regarding the storage of evidence related to sample data validation activities.

All relevant information related to the validation activities must be stored using the following folder structure.

Phase 1 – Data validation (folder) - This folder should contain copies of:

  • relevant sample data files (CSVs, Excel Worksheets)
  • supporting evidence of sample data validation activities such as: 
    • Power BI data validation files 
    • DAX query files.

Phase 2 – MapReduce validation (folder) - This folder should contain copies of:

  • sample data files (where any identified anomalies have been resolved) 
  • supporting evidence of sample data validation activities such as: 
    • TestCase_template (Excel files, with completed Source to Target Mapping information) 
    • Power BI test report files (.pbix files) 
    • Power BI performance data (JSON files).

Phase 3 – Output validation (folder) - This folder should contain copies of:

  • supporting evidence of sample data validation activities such as: 
    • TestCase_template (Excel files with completed Test Case and Validation information with recorded Test run results.) 
    • Power BI test report files (.pbix files)

Follow legislative requirements

Depending on the type of organisation or industry sector, or type of data you are working with, there are various legislative requirements that need to be considered when storing sensitive data.

Due to legislative requirements, organisational policies may require you to ensure that:

  • the validation results documented do not contain any personally identifiable data of the organisation’s customers
  • the test results containing customer data are stored in a secure, encrypted location where information is accessible only within the organisation internally.

To check your understanding, answer the following questions.

Topic summary

Congratulations on completing your learning for this topic, Validate output of captured big data sample and record results.

In this topic, you have learnt how to:

  • design, formulate and select test scenarios and test cases 
  • implement selected test cases and test scenarios using common testing tools and according to organisational procedures 
  • isolate sub-standard data 
  • correct data acquisition paths 
  • generate and store results of the validation activities and associated supporting evidence according to organisational policies, procedures, and legislative requirements.

Check your learning

The final activity for this topic is a set of practical activities that will help you prepare for your formal assessment.

Practical activities:

You should continue to do the following tasks using the same Power BI data model you created at the end of the last topic.

Task 1: Design, formulate and select suitable test scenarios and test cases

You have been provided with the following: 

  • Dataset: Sales 2020-2021 (.xlsx file) – use the validated sample dataset from the activity you’ve completed at the end of the previous topic
  • Test case template (.xlsx file)

You should also refer to the Source to Target Mapping you have completed at the end of the previous topic.

Document each test case and test scenario clearly, in a logical sequence using the template provided.

Task 2: Implement the test scenarios and test cases

Implement the test scenarios and test cases according to the following procedure.

Recommended tools

  • The recommended tools to use for test implementation are DAX Studio and Power BI Desktop.

Test case implementation process 

  • Follow the series of test cases and test case scenarios as documented in the Test case template > Test Cases tab.
  • Use DAX Studio to write and execute test queries and to preview data outputs.
  • Use Power BI Desktop views (report view, table view etc.) to validate visualisations and table data.

Documenting results 

  • During test case implementation, the actual results and the final test results (Pass/Fail) should be clearly documented using the Test case template > Validation tab.
  • Use the comments/notes section to record any additional information or issues related to the test case implementation.
  • Use clear, specific and industry-related terminology when representing test results.

Task 3: Isolate sub-standard data

Isolate sales records that have minus (-) values for Profit from the sample dataset.

Ensure that you create a new table to contain the sub-standard data. Then, create a new, revised sample dataset table that is free of any sub-standard data.

Task 4: Correct data acquisition paths

Duplicate the previously created report view that contains all visualisations from the activity you’ve completed at the end of the previous topic and do the following:

  • Correct all the target output table queries to retrieve data from the new sample table created in task 3.
  • Ensure that the visualisations in the new report reflect the data from the corrected target output tables.

What’s next?

Next, we will dive into the more practical tasks related to optimising big data sample results and documentation.
