This is Adam Walsh from ACE Finance.
In this module you will learn the skills and knowledge that enable Adam to perform his role, namely how and when to:
- validate the assembled or obtained big data samples
- validate the big data sample process and the business logic
- validate the output of captured big data samples and record the results
- optimise the results of the big data sample and the accompanying documentation.
We will be following Adam in the performance of these tasks. Before we begin, let us find out more about his role and the current project he is involved in.
Adam’s role
Adam is the Chief Data Officer (CDO) at ACE Finance. He is responsible for validating and testing samples of the various types of data processed by the organisation, so that management can use the data to make better business decisions.
Watch the following video to understand the importance of a Chief Data Officer (CDO) in an organisation.
Adam’s current project
ACE Finance processes various types of data within its internal systems on a daily basis. Management currently relies on reports generated manually by each department and has no insight into the large amount of data stored in the internal database systems.
Often, they have noticed discrepancies in the various business-related figures reported by different departments. Therefore, the management has recognised the need to implement a business intelligence reporting system that can directly obtain data from their internal systems.
Therefore, a new project team (including testers and analysts) has been formed, led by Adam, to carry out tasks such as:
- assembling or obtaining raw big data from various sources
- processing the assembled big data
- testing and validating big data so that it can be used more broadly within the organisation.
Let us begin by asking Adam the following four questions about his role.
According to Gartner’s popular definition, big data is described as:
Big data is high-volume, high-velocity and/or high-variety information assets that demand cost-effective, innovative forms of information processing that enable enhanced insight, decision making, and process automation.1
Companies receive huge amounts of data from their daily interactions and processes. This massive amount of data is produced by different sources such as social media platforms, weblogs, Internet of Things (IoT) devices (such as the sensors found in smart devices) and many more. Traditional database management systems are not able to handle this vast amount of data.
The wealth of information, patterns and insights held within big data can only be discovered by processing it using various strategies and specialised tools. Therefore, enterprises are using big data to:
- increase revenue
- find new efficiencies
- improve products and services
- better understand customer behaviour.
There are three defining properties that help us understand how to measure big data. They are commonly known as the three Vs.
- Volume – the amount of data
- Velocity – the speed of data capture and processing
- Variety – the use of different data types and sources
Note: There are two other Vs that help us further understand big data (veracity: the reliability of data, and value: the usefulness of data). Together, these are alternatively known as the five Vs of big data.
Watch the following video that explains the usefulness of big data in business.
The value of big data lies in our ability to extract insights and make better decisions.2
Big data plays a vital role in businesses today. The insights gained from big data empower businesses to take action.
Better use of data helps an organisation do better business. Therefore, data is considered an invaluable asset in any organisation. For example, the insights gained by analysing big data can help a business to:
- gain competitive advantage
- drive supply chain efficiencies
- enable data-driven decision making
- implement better strategies to meet business goals
- minimise losses to the organisation and increase its revenue
- improve what they know about customers’ wants and needs
- optimise business processes by identifying inefficiencies and opportunities for improvement in current business practices.
However, organisations should be cautious when handling and using data for different purposes. This is because there are various data protection and privacy laws and regulations that businesses and industries need to comply with.
Watch the following video to understand how different industry sectors can benefit from data enablement.
As big data deals with large volumes of complex data that is difficult to process, it needs to be tested to ensure that the data can be used effectively in the business.
The data integration stage of any data analytic project involves the gathering of data from different sources into a single destination or analytic platform where various tests and validation checks must be conducted. This testing and checking ensures that the data has the following properties:
- comprehensive – so that it contains all required information for analysis later on
- accurate, trustworthy and reliable – so that it can be relied upon when making business decisions
- current – so that it contains up-to-date information
- meets data quality requirements – so that the data is free of any errors or abnormalities
- compliant with legislative requirements – so that no privacy or legislative laws and regulations are breached.
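These validation checks can be sketched in code. The following is a minimal, illustrative Python example; the record fields, the "amounts must be positive" business rule and the currency threshold are all hypothetical and are not drawn from ACE Finance's actual systems. It shows how a tester might check a small data sample for completeness, data quality, accuracy and currency:

```python
from datetime import date, timedelta

# Hypothetical sample rows obtained from an internal system (made-up data)
sample = [
    {"transaction_id": 1001, "amount": 250.00,  "recorded_on": date(2024, 5, 1)},
    {"transaction_id": 1002, "amount": -40.00,  "recorded_on": date(2024, 5, 2)},
    {"transaction_id": 1003, "amount": 1200.50, "recorded_on": date(2024, 5, 3)},
    {"transaction_id": 1004, "amount": None,    "recorded_on": date(2023, 1, 15)},
]

def validate_sample(rows, required_fields, max_age_days=365):
    """Run basic completeness, quality, accuracy and currency checks."""
    report = {"missing_fields": 0, "rows_with_nulls": 0,
              "negative_amounts": 0, "stale_rows": 0}
    latest = max(r["recorded_on"] for r in rows)
    cutoff = latest - timedelta(days=max_age_days)
    for r in rows:
        # Comprehensive: every required field is present in the record
        if any(f not in r for f in required_fields):
            report["missing_fields"] += 1
        # Data quality: no missing values in the record
        if any(v is None for v in r.values()):
            report["rows_with_nulls"] += 1
        # Accuracy: amounts should be positive (a hypothetical business rule)
        if r["amount"] is not None and r["amount"] < 0:
            report["negative_amounts"] += 1
        # Currency: records should be reasonably recent
        if r["recorded_on"] < cutoff:
            report["stale_rows"] += 1
    return report

print(validate_sample(sample, ["transaction_id", "amount", "recorded_on"]))
# → {'missing_fields': 0, 'rows_with_nulls': 1, 'negative_amounts': 1, 'stale_rows': 1}
```

In practice, a big data testing tool runs checks like these at scale and across many sources, but the underlying logic is the same: each check maps to one of the required data properties listed above.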
There are various platforms and solutions available today that can automate big data testing. For example, watch the following video, which outlines one such big data testing solution. As you watch, pay close attention to the risks of bad-quality data, the challenges in data quality analysis and the importance of using the right solutions and techniques in the testing process.
There is a wide variety of tools and platforms available today that can be used to carry out big data testing. The choice of tools and platforms typically differs from organisation to organisation, and some big data testing tools offer more advanced features than others.
However, for the purpose of testing big data sources, the selected tool or platform should have the functionality to:
- obtain data from a variety of big data sources
- import data into a common platform for processing
- validate the data from the source for accuracy and correctness
- perform transformations to the data
- conduct performance tests
- detect any anomalies in the data
- clean the data and validate data quality.
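To illustrate the anomaly-detection capability in the list above, here is a simple, hypothetical sketch in Python that flags values lying unusually far (in standard deviations) from the mean. The figures are invented, and real big data testing tools use far more sophisticated techniques, but the principle is similar:

```python
import statistics

def find_anomalies(values, threshold=2.0):
    """Flag values more than `threshold` standard deviations from the mean.

    A simple z-score check: an illustrative stand-in for the anomaly
    detection features of dedicated big data testing tools.
    """
    mean = statistics.fmean(values)
    stdev = statistics.stdev(values)
    return [v for v in values if abs(v - mean) / stdev > threshold]

# Hypothetical daily transaction totals, with one clearly abnormal figure
daily_totals = [10200, 9800, 10050, 9900, 10100, 98000, 10150]
print(find_anomalies(daily_totals))
# → [98000]
```

Records flagged this way would then be investigated and either corrected or excluded during the data cleaning step.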
Insights from the real-world
Irrespective of the product or platform used, all big data sample testing tasks should be done in a test environment. Once the testing is successfully completed and all business requirements have been met, the solution is then implemented in a production environment.
If your organisation prefers a specific tool or platform, it is important that you become familiar with it and learn how to use it effectively for big data testing.
The following table outlines some examples of the tools and platforms used for big data testing and analysis, which are categorised based on their main functional capabilities. 3
| Category | Main functional capabilities |
| --- | --- |
| Tools for data extraction, transformation and loading (ETL) | Capturing data, performing validations and testing |
| Visualisation | Reporting, analytics, dashboards and KPIs |
| Platforms | Storage and processing of big data sources |
At ACE Finance we have decided to use Microsoft Power BI and DAX Studio as our preferred tools. We have chosen Power BI Desktop because it is a powerful tool that can connect to data sources, build data models and create visualisations.
Note: For the purpose of this module, you will be introduced to Microsoft Power BI and DAX Studio. Power BI Desktop is a single tool with multiple capabilities, and it integrates well with DAX Studio to perform data validations. Using these tools will enable you to learn the hands-on skills required for this module. You will use them in a later topic to perform various tasks related to big data sample testing.
Insights from the real-world
In the industry, Power BI is predominantly used as a visualisation and sharing tool. There are more enterprise-grade tools and products that provide specialised data movement, data storage, data processing and ETL capabilities, and these can be used in conjunction with Power BI.
For more information on other analytic tools and their popularity in the industry, refer to Gartner’s Magic Quadrant for Analytics and Business Intelligence Platforms.
Microsoft Power BI
Watch the following video as an introduction to Microsoft Power BI.
You can download the freely available Microsoft Power BI Desktop from Downloads | Microsoft Power BI.
Additional resources
- Get started with Power BI Desktop - Power BI | Microsoft Docs
- External tools in Power BI Desktop - Power BI | Microsoft Docs
Power BI terminology and basic concepts
The following are common terms used when working with Power BI for data testing and visualisations.
- Workspaces
- Dashboards
- Reports
- Datasets
- Visualisations
Watch the following video to learn more about the terminologies and basic concepts in Power BI.
Refer to the following resources to familiarise yourself with the basic terminology and concepts for using Power BI.
- The Power BI service - basic concepts for beginners - Power BI | Microsoft Docs
- Glossary for Power BI business users - Power BI | Microsoft Docs
DAX Studio
DAX Studio is a tool that works alongside Power BI Desktop and provides additional functionality and features, such as:
- writing DAX queries and scripts
- performing diagnosis on data model and queries
- performance tuning
- detailed statistics of performance data
- analysis.
Refer to the DAX Studio official website to find out more about this tool.
Using Power BI Desktop and DAX Studio on a macOS computer
Note: You will be required to perform a variety of tasks using Power BI Desktop and DAX Studio from Topic 3 onwards in this module. If your personal computer runs the Windows operating system (e.g. Windows 10), you can ignore the following set-up guidelines, as they will not apply to you.
However, if your personal computer runs macOS, you will not be able to install these applications directly. As an alternative, you will be provided with additional guidelines on how to set up a virtual Windows environment on your macOS computer so that you can follow along with the activities in this module.
- Step 1: Download the VM - Windows 10 (.zip) file to a location on your macOS computer from this link.
- Step 2: Download the Virtual Machine instructions for macOS users document and read through the instructions for setting up the virtual environment.
- Step 3: Watch the following video demonstration of how to complete these steps using the files you have downloaded.
Now that we have met Adam and discussed aspects of his role at ACE Finance, let us dive further into the fundamental concepts behind testing and validating big data samples.