Validate output of captured big data sample and record results


To validate the output of big data samples, it is important to design, formulate and select suitable test scenarios and to record detailed information in test cases.

Test cases and test scenarios

A test scenario identifies what needs to be tested and defines the specific functional requirements to be tested. 43 It is examined through a collective set of test cases. 44

In the context of testing big data samples, some examples of test scenarios would be to:

  • verify the tables/columns in the data model
  • verify the relationships between the tables
  • verify report visualisation functionality.

A test case is a documented set of actions for testing a particular test scenario. It defines in detail:

  • what needs to be tested
  • how to perform the test
  • what results are expected from the test.

In the context of testing big data samples, some examples of test cases would include:

  • Check data type and format of specific columns
  • Check for duplicate records in a particular table
  • Check for missing values in a specific table/column.

How to design and write test cases

Organisations may have their own test case template that they would prefer to use. If so, it is important to use the specific template to design and document the test case scenarios and test case details.

Refer to the 43 Test Case Templates to get an idea about various types of test case templates that can be used. Notice that all test case templates have the following common elements.

A typical test case document would include the following components. 45

  • Test case ID – This is a unique number assigned to each test case. It helps to keep track of the test cases and to avoid testing the same things over and over again. The test case ID can include a short acronym to help easily identify the name of the sample dataset model being tested or the organisational department the dataset relates to.
    • An example of a test case ID generated for the Customer Relations (CR) department would have the following format:
      • <Department Acronym - Capitalised first two letters>_<Test case number>
      • Example: CR_001
  • Test scenario and description – This field should clearly state the high-level objective of the test. For example: “Verify Customer table.” 
  • Test steps and method – This field should clearly outline the steps that need to be taken to test the scenario and the method for testing. It may also include any prerequisites about the data loaded into the system and mention any recommended programs/tools to be used for the test. For example:
    1. Check Customer table fields/columns
      • Method: Run query in DAX Studio
    2. Check the Customer table for duplicate records
      • Method: Run query in DAX Studio
  • Test data and queries – This is an optional field and may depend on what is needed to carry out the test. If the test requires specific data, a query or a test script as input to the system, then this needs to be clearly documented in this field.
  • Expected results – This is an important field that explains what results should be expected from the test in an ideal situation. Ensure that this field contains precise information about the expected results.
  • Results received – At the stage where you are still designing and formulating test cases, this field is left blank. It will only be completed by the tester during the actual implementation of the tests.

    Note: You will learn more about how to complete this field during the implementation of test cases.

  • Test Results (Pass/Fail) - At the stage where you are still designing and formulating test cases, this field is left blank. It will only be completed by the tester after carrying out the test and comparing the expected results with the results received. Note: You will learn more about how to complete this field during the implementation of test cases.
  • Comments - At the stage where you are still designing and formulating test cases, this field is left blank. It will only be completed by the tester after carrying out the test to indicate any issues encountered or to note down any important observations that occurred during the test. Note: You will learn more about how to complete this field during the implementation of test cases.

A basic example of a test case document format is as follows:

The template columns are: Test Case ID, Test Description, Test steps and method, Test data/query (test scripts), Expected Results, Actual Result, Test Result (Pass/Fail) and Additional Comments. At the design stage, the last three columns are left blank. The entry for test case CR_001 looks like this:

Test Case ID: CR_001
Test Description: Verify Customer table

Test step 1: Check Customer table fields/columns
  Method: Run query in DAX Studio
  Test data/query:
    EVALUATE Customer
  Expected Results: the Customer ID, Customer Name, Customer Address, Customer Phone and Customer DOB columns are returned.

Test step 2: Check for duplicate records
  Method: Run query in DAX Studio
  Test data/query:
    EVALUATE
    SUMMARIZECOLUMNS(
        "Total rows", COUNTROWS(Customer),
        "Distinct rows", COUNTROWS(DISTINCT(Customer))
    )
  Expected Results: the same value is displayed for both "Total rows" and "Distinct rows".

When writing test scenarios and test cases, it is important to consider the requirements and data quality standards of the business. The business requirements would specify the desired output of the big data sample, which needs to be documented in the test case scenarios and test cases.

Complete the following Learning Activity.

Learning activity

All target output table fields should be checked against the following data quality standards to ensure the data represented in the report visualisations is accurate. Consider the following extract of a company’s data quality requirements.

Column validations

Check 1 - Check the following datatype and data format of the target table columns.

  • Columns that include date information should have data type as 'Date' only and formatted as 'dd/mm/yyyy'.
  • Columns that include numerical information that does not require decimal numbers should have data type as ‘Whole number’.
  • Columns that contain both numbers and letters and special characters such as hyphens (-) should have the Data type and Format both set as Text.

Method to validate the above: Select the specific column in Power BI Desktop > Data view; then check the Data type and Format details under Column Tools.

Check 2 - Check for data correctness by validating a specific data value against the result returned (e.g. by filtering the Customer Name column for a sample customer name such as John Smith).

  • The result should return the correct details of the specific record

Method to validate the above: Use DAX Studio to run the following query to return a specific row from the relevant target table.

EVALUATE FILTER(Customer, Customer[Customer Name] = "John Smith")

Table validations

Check 1 - Check if the table includes the correct fields/columns.

EVALUATE Customer

Check 2 - Check the target output table for duplicate values.

EVALUATE
SUMMARIZECOLUMNS(
    "Total rows", COUNTROWS(Customer),
    "Distinct rows", COUNTROWS(DISTINCT(Customer))
)

If "Total rows" and "Distinct rows" return:

  • the same value, then there are no duplicates in the table.
  • different values (most often, a smaller value for "Distinct rows" than for "Total rows"), then there are duplicates in the target table (a follow-up query sketch is shown below).
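If duplicates are detected, a follow-up query along the following lines can help locate them. This is a sketch only, not part of the documented checks, and it assumes that duplicated records share the same Customer ID value:

// List Customer ID values that appear in more than one row of the Customer table
EVALUATE
FILTER(
    ADDCOLUMNS(
        VALUES(Customer[Customer ID]),
        "Row count", CALCULATE(COUNTROWS(Customer))
    ),
    [Row count] > 1
)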

The sample dataset that needs to be validated is shown here:

Customer
Customer_ID | Customer_Name | Customer_Address | Customer_Phone | Customer_DOB
AB-10015 | Aaron Bergman | Concord, North Carolina | (819) 867-9684 | 01/05/95
BM-11140 | Becky Martin | Philadelphia, Pennsylvania | (763) 376-3282 | 09/12/77
CS-12460 | Chuck Sachs | Houston, Texas | (861) 495-8389 | 12/05/82
DC-12850 | Dan Campbell | Naperville, Illinois | (649) 964-7933 | 10/11/79
DK-13375 | Dennis Kane | Melbourne, Florida | (355) 440-9156 | 02/06/96
DP-13000 | Darren Powers | Jackson, Michigan | (824) 236-2899 | 12/01/78
IM-15055 | Ionia McGrath | Phoenix, Arizona | (716) 599-465 | 06/03/88
JC-15340 | Jasper Cacioppo | Columbus, Ohio | (416) 584-3938 | 27/10/92
ML-17410 | Maris LaWare | Dallas, Texas | (954) 817-897 | 05/02/80
PW-19240 | Pierre Wener | Chicago, Illinois | (722) 786-7569 | 12/01/79
SC-20020 | Sam Craven | South Carolina | (921) 954-4652 | 12/10/92
VM-21835 | Vivian Mathis | Rochester, Minnesota | (949) 483-4462 | 25/12/95
XP-21865 | Xylona Preis | Santa Clara, California | (786) 293-8278 | 30/10/95
ZC-21910 | Zuschuss Carroll | Portland, Oregon | (528) 665-9954 | 06/03/81
CC-12610 | Corey Catlett | Cincinnati, Ohio | (464) 225-9423 | 16/10/90
FA-14230 | Frank Atkinson | San Francisco, California | (538) 880-8941 | 26/03/78

Based on the sample dataset and the business requirements, design and formulate test scenarios and test cases to validate the output.

Take some time to document your test scenarios and test cases using the following template: Test case template (.xlsx file)

Check your answer against the following sample.

The sample answer columns are: Test Case ID, Test Description, Test steps and method, Test data/query (test scripts) and Expected Results.

Test Case ID: CR_001
Test Description: Verify Customer table

Test step 1: Check Customer table fields/columns
  Method: Run query in DAX Studio
  Test data/query:
    EVALUATE Customer
  Expected Results: the Customer ID, Customer Name, Customer Address, Customer Phone and Customer DOB columns are returned.

Test step 2: Check for duplicate records
  Method: Run query in DAX Studio
  Test data/query:
    EVALUATE
    SUMMARIZECOLUMNS(
        "Total rows", COUNTROWS(Customer),
        "Distinct rows", COUNTROWS(DISTINCT(Customer))
    )
  Expected Results: the same value is displayed for both "Total rows" and "Distinct rows".

Test Case ID: CR_002
Test Description: Verify Customer ID for Customer table

Test step 1: Check the data type and data format of the target Customer ID column
  Method: Select the Customer ID column in Power BI Desktop > Data view; then check the Data type and Format details under Column tools.
  Expected Results: Text

Test step 2: Validate Customer ID data correctness
  Method: Run query in DAX Studio
  Test data/query:
    EVALUATE
    FILTER(Customer, Customer[Customer Name] = "John Smith")
  Expected Results: Customer initials should not be displayed in the ID; only the number should be displayed.

Test step 3: Check for missing values or blank cells
  Method: Run query in DAX Studio
  Test data/query:
    DEFINE
        MEASURE Customer[EmptyID] =
            CALCULATE(COUNTROWS(Customer), Customer[Customer ID] == BLANK())
    EVALUATE
        SUMMARIZECOLUMNS(
            Customer[Customer ID],
            "Total rows", COUNTROWS(Customer),
            "Customers with blank Customer ID", [EmptyID]
        )
  Expected Results: no values are returned for customers with a blank Customer ID in the test output.

Test Case ID: CR_003
Test Description: Verify Customer DOB for Customer table

Test step 1: Check the data type of the target Customer DOB field
  Method: Select the Customer DOB column in Power BI Desktop > Data view; then check the Data type details under Column tools.
  Expected Results: Date

Test step 2: Check the data format of the target Customer DOB field
  Method: Select the Customer DOB column in Power BI Desktop > Data view; then check the Format details under Column tools.
  Expected Results: DD/MM/YYYY

Test step 3: Check for missing values or blank cells
  Method: Run query in DAX Studio
  Test data/query:
    DEFINE
        MEASURE Customer[EmptyDOB] =
            CALCULATE(COUNTROWS(Customer), Customer[Customer DOB] == BLANK())
    EVALUATE
        SUMMARIZECOLUMNS(
            Customer[Customer DOB],
            "Total rows", COUNTROWS(Customer),
            "Customers with blank Customer DOB", [EmptyDOB]
        )
  Expected Results: no values are returned for customers with a blank Customer DOB in the test output.

Now you will learn how to implement the selected test cases that you have designed and formulated previously.

Common testing tools

There are a variety of testing tools that can be used to run test scripts in order to validate the output from the sample dataset. Organisations may have their preferred testing tools that can be used to implement the documented test case scenarios and test cases.

It is important to choose a testing tool that integrates well with the data analytics platform into which the sample dataset was loaded.

As we are using Power BI Desktop as our main data analytics platform, we will focus on DAX Studio, a testing tool that is supported by Power BI Desktop.

Refer to the following article from Microsoft, External tools in Power BI Desktop, which details a variety of other external tools that can be used with Power BI Desktop.

Following organisational procedures

When implementing the selected test case scenarios and test cases, it is important that you follow any relevant organisational procedures that may outline details of:

  • what tools should be used for carrying out tests
  • the process to follow when testing
  • how to document test results
  • how to report any issues encountered during the testing process.

Implementing selected test scenarios and test cases

When testing big data samples, each test case can be performed as a manual validation check or run as a test script.

A test script is a line-by-line description of all the actions and data needed to properly perform a test. The script includes detailed explanations of the steps needed to achieve the specific goal within the program, as well as descriptions of the results that are expected for each step. 43

As the testing process is already documented in the test case document, during implementation, you are required to:

  • follow the steps recorded in the test case document
  • conduct the tests by performing manual checks or by running test scripts
  • record the actual test output
  • compare the actual test results with the expected test results in the test case document
  • document any other issues or observations noted during the test
  • record the test result outcome for each test case as either pass/fail.

The following three fields in the test case document should be used to document the test results. 45

  • Results received – Results of the test are received once the tester goes through all the steps documented in the test case document. The results received could be positive or negative; they either confirm that the intended results were achieved or show an entirely different result.
  • Test Results (Pass/Fail) - This field is completed by the tester after receiving test results and by comparing the expected results and the results received to indicate whether the final outcome of the test is a ‘Pass’ or a ‘Fail’. Usually, this field can be colour coded in the test case template (e.g. Pass results in green, Fail results in red) to clearly indicate the final test result.
  • Comments – This is an optional field that can be completed by the tester after determining the final test result/outcome. This field is used to indicate any issues encountered during the test. Especially if the test resulted in a ‘Fail’, there might be other important observations that occurred during the test that may help identify why the test failed.

The following example shows how the implemented test case results are documented.

The completed columns are: Test Case ID, Test Description, Test steps and method, Test data/query (test scripts), Expected Results, Actual result, Test results and Comments and notes.

Test Case ID: CR_001
Test Description: Verify Customer table

Test step 1: Check Customer table fields/columns
  Method: Run query in DAX Studio
  Test data/query:
    EVALUATE Customer
  Expected Results: the Customer ID, Customer Name, Customer Address, Customer Phone and Customer DOB columns are returned.
  Actual result: (not recorded)
  Test results: Pass
  Comments and notes: (none)

Test step 2: Check for duplicate records
  Method: Run query in DAX Studio
  Test data/query:
    EVALUATE
    SUMMARIZECOLUMNS(
        "Total Rows", COUNTROWS(Customer),
        "Distinct Rows", COUNTROWS(DISTINCT(Customer))
    )
  Expected Results: the same value is displayed for both "Total Rows" and "Distinct Rows".
  Actual result: the "Distinct Rows" value is smaller than the "Total Rows" value displayed.
  Test results: Fail
  Comments and notes: duplicates exist.

Test Case ID: CR_002
Test Description: Verify Customer ID for Customer table

Test step 1: Check the data type and data format of the target Customer ID column
  Method: Select the Customer ID column in Power BI Desktop > Data view; then check the Data type and Format details under Column tools.
  Expected Results: Text
  Actual result: Text
  Test results: Pass
  Comments and notes: (none)

Test step 2: Validate Customer ID data correctness
  Method: Run query in DAX Studio
  Test data/query:
    EVALUATE
    FILTER(Customer, Customer[Customer Name] = "John Smith")
  Expected Results: Customer initials should not be displayed in the ID; only the number should be displayed.
  Actual result: Customer initials displayed.
  Test results: Fail
  Comments and notes: (none)

Test step 3: Check for missing values or blank cells
  Method: Run query in DAX Studio
  Test data/query:
    DEFINE
        MEASURE Customer[EmptyID] =
            CALCULATE(COUNTROWS(Customer), Customer[Customer ID] == BLANK())
    EVALUATE
        SUMMARIZECOLUMNS(
            Customer[Customer ID],
            "Total rows", COUNTROWS(Customer),
            "Customers with blank Customer ID", [EmptyID]
        )
  Expected Results: no values are returned for customers with a blank Customer ID in the test output.
  Actual result: (not recorded)
  Test results: (not recorded)
  Comments and notes: (none)

Test Case ID: CR_003
Test Description: Verify Customer DOB for Customer table

Test step 1: Check the data type of the target Customer DOB field
  Method: Select the Customer DOB column in Power BI Desktop > Data view; then check the Data type details under Column tools.
  Expected Results: Date
  Actual result: Date/Time
  Test results: Fail
  Comments and notes: (none)

Test step 2: Check the data format of the target Customer DOB field
  Method: Select the Customer DOB column in Power BI Desktop > Data view; then check the Format details under Column tools.
  Expected Results: DD/MM/YYYY
  Actual result: DD/MM/YYYY HH:MM:SS
  Test results: Fail
  Comments and notes: (none)

Test step 3: Check for missing values or blank cells
  Method: Run query in DAX Studio
  Test data/query:
    DEFINE
        MEASURE Customer[EmptyDOB] =
            CALCULATE(COUNTROWS(Customer), Customer[Customer DOB] == BLANK())
    EVALUATE
        SUMMARIZECOLUMNS(
            Customer[Customer DOB],
            "Total rows", COUNTROWS(Customer),
            "Customers with blank Customer DOB", [EmptyDOB]
        )
  Expected Results: no values are returned for customers with a blank Customer DOB in the test output.
  Actual result: (not recorded)
  Test results: (not recorded)
  Comments and notes: (none)

What is sub-standard data?

Sub-standard data refers to data that does not meet the quality standards set by an organisation.

Following are some of the types of sub-standard data that can be found in a sample dataset:

  • Records of products that have been phased out.
  • Categories of items that are no longer used in the business.
  • Data fields that are of no value for the current analysis task.
  • Missing data - records that have blank fields that cannot be used for analysis (e.g. specific customer records with no contact details recorded).
  • Columns expected to contain only positive numbers that have negative values (e.g. sales figures with minus values).
  • Date columns that indicate an end date earlier than the start date (a query sketch for this check follows the list).
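As an illustration of the last point, a date-order check could be run in DAX Studio along the following lines. This is a sketch only; the 'Orders' table and its [Order Date] and [Ship Date] columns are assumed names used for illustration, not fields from the sample dataset in this topic:

// Return any rows where the end date precedes the start date (table and column names are assumptions)
EVALUATE
FILTER(Orders, Orders[Ship Date] < Orders[Order Date])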

Isolate sub-standard data

When it is found that sub-standard data exists in the sample dataset, it is important to isolate or separate that data from the rest of the data in the sample dataset.

This can be done by creating calculated tables using the DAX functions FILTER and EXCEPT. The process is as follows:

  • Use the FILTER() function to create a new table ('Table 2') with the data that needs to be isolated/separated from the original sample data table ('Table 1').
  • Use the EXCEPT() function to create a new table ('Table 3') that includes all the data from the original sample table ('Table 1') except the isolated sub-standard data ('Table 2'). A minimal DAX sketch of this process follows the formula below.
New sample data table (Table 3) = Original sample data table (Table 1) – Sub-standard data (Table 2)
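The following sketch writes this process as two calculated tables. It assumes a 'Sales' sample table in which rows with negative Profit values are the sub-standard data to be isolated (as in the practical activity later in this topic); the table and column names are illustrative only:

-- Table 2: the sub-standard rows isolated from the original sample table
Substandard Sales = FILTER(Sales, Sales[Profit] < 0)

-- Table 3: the new sample table, i.e. everything in 'Sales' except the isolated rows
Sales Clean = EXCEPT(Sales, 'Substandard Sales')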

Watch the following video to understand how to use the EXCEPT() function.

Refer to the DAX guide for more details on how to use the EXCEPT() function.

Watch the following video demonstration that goes through the step-by-step process of isolating sub-standard data in a sample dataset.

Correct data acquisition paths

It is important to understand that the data segregations and aggregations implemented in your Power BI report are all requesting data from the original sample dataset, which includes sub-standard data.

However, once you have isolated or separated the sub-standard data from the original sample dataset, the report visuals need to request data from the new sample dataset, which does not contain any sub-standard data.

This is done to prevent further importation of sub-standard data into the testing system. How you correct data acquisition paths depends on the particular issues you find with these paths.

Following are some of the ways you can correct the data acquisition paths of your data model:

  • Correct target output table queries to retrieve data from the new sample table (see the sketch after this list).
  • Correct visualisations to request data from the new sample table.
  • Once the data acquisition paths are corrected, refresh all data model visuals.
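For example, a target output table that was originally defined over the sample table can be repointed by editing its query to reference the new table instead. The sketch below is illustrative only; 'Sales Clean', the Region and Sales columns and the output table name are assumptions that carry on from the isolation sketch above, not names from the source material:

-- Before: the target output table acquires data from the original sample table (which includes sub-standard data)
-- Sales by Region = SUMMARIZECOLUMNS(Sales[Region], "Total Sales", SUM(Sales[Sales]))

-- After: the same target output table corrected to acquire data from the cleaned sample table
Sales by Region =
SUMMARIZECOLUMNS(
    'Sales Clean'[Region],
    "Total Sales", SUM('Sales Clean'[Sales])
)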

Watch the following video demonstration that goes step-by-step through the process of correcting acquisition paths.


Generate results of validation activity

It is important to run through the previously formulated test case scenarios and test cases each time there’s a change in the sample dataset.

Isolating sub-standard data and correcting data acquisition paths in the data model introduced significant changes to the sample dataset. Therefore, a validation check should be performed, and the test results should be documented to identify any new issues in the sample dataset.

The results of the validation activity should include:

  • the actual results and the final test outcome (Pass/Fail) of the new test run, recorded against the test case scenario document
  • the date when the validation was performed
  • any additional information or issues related to the validation activity, recorded in the comments/notes section.

Associated supporting evidence

These may include:

  • copies of test scripts run during the validation check
  • screen capture images of any issues detected during the validation
  • copies of performance data recorded as part of the validation checks
  • completed test run document with final test outcome clearly recorded as Pass/Fail.

Follow organisational policies and procedures

Organisations may require all generated validation activity and associated supporting evidence to be stored in a secure location. Organisations may have their recommended folder structure and version control requirements when saving documents and files, which you would need to consider.

Consider the following extract from a sample policy document regarding the storage of evidence related to sample data validation activities.

All relevant information related to the validation activities must be stored using the following folder structure.

Phase 1 – Data validation (folder) - This folder should contain copies of:

  • relevant sample data files (CSVs, Excel Worksheets)
  • supporting evidence of sample data validation activities such as: 
    • Power BI data validation files 
    • DAX query files.

Phase 2 – MapReduce validation (folder) - This folder should contain copies of:

  • sample data files (where any identified anomalies have been resolved) 
  • supporting evidence of sample data validation activities such as: 
    • TestCase_template (Excel files, with completed Source to Target Mapping information) 
    • Power BI test report files (.pbix files) 
    • Power BI performance data (JSON files).

Phase 3 – Output validation (folder) - This folder should contain copies of:

  • supporting evidence of sample data validation activities such as: 
    • TestCase_template (Excel files with completed Test Case and Validation information with recorded Test run results.) 
    • Power BI test report files (.pbix files)

Follow legislative requirements

Depending on the type of organisation or industry sector, or type of data you are working with, there are various legislative requirements that need to be considered when storing sensitive data.

Due to legislative requirements, organisational policies may require you to ensure that:

  • the validation results documented do not contain any personally identifiable data of the organisation’s customers
  • the test results containing customer data are stored in a secure, encrypted location where information is accessible only within the organisation internally.

To check your understanding, answer the following questions.

Topic summary

Congratulations on completing your learning for this topic, Validate output of captured big data sample and record results.

In this topic, you have learnt how to:

  • design, formulate and select test scenarios and test cases 
  • implement selected test cases and test scenarios using common testing tools and according to organisational procedures 
  • isolate sub-standard data 
  • correct data acquisition paths 
  • generate and store results of the validation activities and associated supporting evidence according to organisational policies, procedures, and legislative requirements.

Check your learning

The final activity for this topic is a set of practical activities that will help you prepare for your formal assessment.

Practical activities:

You should continue to do the following tasks using the same Power BI data model you created at the end of the last topic.

Task 1: Design, formulate and select suitable test scenarios and test cases

You have been provided with the following: 

  • Dataset: Sales 2020-2021 (.xlsx file) – use the validated sample dataset from the activity you’ve completed at the end of the previous topic
  • Test case template (.xlsx file)

You should also refer to the Source to Target Mapping you have completed at the end of the previous topic.

Document each test case and test scenario clearly, in a logical sequence using the template provided.

Task 2: Implement the test scenarios and test cases

Implement the test scenarios and test cases according to the following procedure.

Recommended tools

  • The recommended tools to use for test implementation are DAX Studio and Power BI Desktop.

Test case implementation process 

  • Follow the series of test cases and test case scenarios as documented in the Test case template > Test Cases tab.
  • Use DAX Studio to write and execute test queries and to preview data outputs.
  • Use Power BI Desktop views (report view, table view etc.) to validate visualisations and table data.

Documenting results 

  • During test case implementation, the actual results and the final test results (Pass/Fail) should be clearly documented using the Test case template > Validation tab.
  • Use the comments/notes section to record any additional information or issues related to the test case implementation.
  • Use clear, specific and industry-related terminology when representing test results.

Task 3: Isolate sub-standard data

Isolate sales records that have minus (-) values for Profit from the sample dataset.

Ensure that you create a new table to contain the sub-standard data. Then, create a new, revised sample dataset table that is free of any sub-standard data.

Task 4: Correct data acquisition paths

Duplicate the previously created report view that contains all visualisations from the activity you’ve completed at the end of the previous topic and do the following:

  • Correct all the target output table queries to retrieve data from the new sample table created in task 3.
  • Ensure that the visualisations in the new report reflect the data from the corrected target output tables.

What’s next?

Next, we will dive into the more practical tasks related to optimising big data sample results and documentation.
