Interpret big data sources and summaries

Submitted by sylvia.wong@up… on Sat, 03/06/2021 - 15:56

Interpreting big data sources and summaries involves the use of appropriate tools/platforms and applying the mathematical and statistical concepts covered in the previous topics.

In this topic you will learn how to:

use Power BI Desktop platform to access data sources and summaries
apply insight analysis and descriptive statistics
identify insights related to operational decision-making requirements.

Foundation skills

In this topic, you will develop the following foundation skills.

Technology: Analysts must be able to use appropriate technology platforms to use big data for operational decision-making. In this topic you will be introduced to Microsoft Power BI Desktop, which is a single tool and a technology platform with multiple capabilities. Therefore, you are required to be familiar with the basic concepts and terminology related to this technology platform. The practical activities in this topic will guide you through the basic steps on how to use the tool/platform efficiently to perform the required analysis tasks.
Numeracy: Analysts need to use mathematical and statistical concepts when analysing big data and are required at times to record numerical data in reports. The practical activities in this topic will help you learn how mathematical formulas are used when writing queries and DAX statements in Power BI Desktop.
Planning and organising: Analysts must be able to efficiently and logically sequence the stages of the big data analysis process. There can be many steps, processes and stages specific tasks carried out using big data for operational decision-making. The practical activities in this topic will help you develop good planning and organising skills that will enable you to perform these tasks efficiently, using the correct logical sequence.
Writing: Analysts must have the ability to use clear, specific and industry-related terminology to clearly represent key outcomes and recommendations related to big data analysis in reports and other forms of written communication with stakeholders.

You will put these foundation skills into practice by performing the practical tasks in this topic.

Sub Topics

You will use Power BI Desktop to carry out the demonstration tasks in this module. This tool would enable you to learn the hands-on skills required for this module.

Installing Power BI Desktop

Do the following to prepare your computer with the required software before proceeding any further in the module content.

Install Power BI Desktop software.
Refer to the article Introduction to Power BI | Microsoft Learn

The following video will demonstrate the installation steps in detail.

Accessing big data sources and summaries

Power BI Desktop has three views: Report, Data and Model. To access data summaries and data sources as required for this unit, you will mainly access the ‘Report view’ and ‘Data view’.

The Power BI 'Report' view - can be used to view big data summaries.
The Power BI 'Data' view - can be used to access data source details such as tables, measures and other data that is associated with the Power BI reports.

The following short video will demonstrate how you can navigate from one view to another in Power BI desktop.

Microsoft Power BI is used with permission from Microsoft. Power BI is a trademark of the Microsoft group of companies.

Refer to the article What is Power BI Desktop? from Microsoft Learn for more information on the three views.

Making data fields accessible in the report view

Sometimes data columns in the ‘Data’ view may be hidden from the ‘Report’ view. To ensure the required fields from the ‘Data’ view are visible and accessible from the report view, you will need to unhide the required tables/columns either from the ‘Data view’ or the ‘Model’ view.

The following video demonstrates how hidden data tables and fields can be made visible and accessible in the ‘Report’ view of a Power BI Desktop file.

Microsoft Power BI is used with permission from Microsoft. Power BI is a trademark of the Microsoft group of companies.

Practical activity 1: Use Power BI to access data sources and summaries

Download the zipped file Demo 1 - Access data sources and summaries_v1.zip which contains the Power BI work file required for this activity.
Unzip the downloaded file to find the Power BI work file inside of it.
- Note: In a Windows computer, you can right-click the file and select 'Extract All...' to unzip the file.
Open the Power BI work file 'Demo 1 – Access data sources and summaries.pbix' using the Power BI Desktop application and do the following tasks.

Note: If you are unsure of the steps required to perform any of the tasks in this activity, you can expand the ‘Hint’ section to see the detailed steps. A screenshot is provided at the end under ‘Checkpoint’ so that you can check your work against the desired end result.

Task 1 - Access big data summaries

Navigate to the ‘Report’ view.
View the XYZ Manufacturing Financials Overview report page.
Customise this page to only show data from the year 2021.

Hint:

Use the year filter to select the required date range

Task 2 - Access big data sources

Using the same Power BI work file, do the following:

Navigate to the ‘Data’ view.
View the data source tables and columns available in the Power BI work file.
Make the following fields accessible in the ‘Report’ view of the file.
1. 'Key Measures'[Gross Profit]
2. ‘Key Measures’[Total Manufacturing Cost]
3. ‘Key Measures’[Total Sales]
4. Date[Year]
5. Date[Month]
6. Financials[Product ID]
7. Financials[Profit]

Checkpoint

Once all tasks are completed correctly, you should get the following result, as displayed in the screenshot.

Screenshot of accessing data sources and summaries using Power BI Desktop © Microsoft.

Man working on multiple data for data analysis

Applying descriptive statistics

Descriptive statistics are explanations and information components that can be used to describe data sets to ensure that they are able to be understood. Descriptive statistics can be applied to support operational decision-making by using the information as evidence that can be used to:

form a position
select an option
determine an answer to a question.

What is univariate analysis?

Univariate analysis is the simplest form of analysing the data. The data has only one variable. No dealings with cause or relationships among different variables. ³¹

Earlier in this module, you learnt basic statistical concepts that help to describe the central tendency (mean, median, mode), dispersion (range, variance, standard deviation) and shape (skewness, kurtosis) of a single numerical variable. Keeping these basic concepts in mind, analysts must perform univariate analysis of the key numerical variables used in an analysis.

Some important facts to remember when evaluating statistics are as follows. ¹⁷

Examine both the mean and the median and identify if they are similar or whether they are very different.
If only the mean is presented, then one cannot determine whether the data are skewed or symmetrical or whether the median might be a better measure of central tendency than the mean.
In addition, one should look to see whether the standard deviation or interquartile range of a very skewed set of data has been included in the statistics provided. Without these, one cannot determine the amount of variation that exists in the data.

The following video will take you through the process of analysing the statistical measures of numerical variables. Pay close attention to how the descriptive statistics obtained for each numerical variable tell a story about its central tendency, dispersion and shape.

Note: This demonstration uses Microsoft Excel to perform basic demonstrations. The same concepts will later be used to carry out demonstration tasks in Power BI Desktop.

Practical activity 2: Creating statistical measures in Power BI Desktop

The tasks in this activity will help you practice what you learnt so far in this topic and help you prepare for the demonstration tasks in your formal assessment. You will specifically learn how to:

create a report page using a given template and customise report elements
create statistical measures required for the univariate analysis
record numerical data in the report.

Use the same Power BI work from practical activity 1, 'Demo 1 - Access data sources and summaries' (where you accessed the financials dataset) and save it as ‘Demo2-Univariate analysis’. In this activity, we will conduct a univariate analysis of the Financials[Profit] data.

Refer to the Statistical functions (DAX) - DAX for more information on the statistical functions used in this exercise.

Note: Additional steps on how to perform the tasks are provided in the ‘Hint’ sections. Screenshots are provided within the 'Hint' sections and under the 'Checkpoint' sections so that you can check your work against the desired end result.

Task 1 - Customise a new report page for the univariate analysis

In the Power BI Desktop work file, duplicate the ‘XYZ Manufacturing Report template’ and rename it as ‘Profit: Univariate Analysis Report’.

Step 1 – Create a new report page using XYZ Manufacturing’s standard report template.

Hint:

To do this, right-click on the ‘XYZ Manufacturing Report template’ > Select ‘Duplicate Page’.

The following screenshot further demonstrates how this task can be done.

Screenshot of duplicating the report template to create a new report page using Power BI Desktop © Microsoft

Step 2 – Rename the new report page as ‘Profit: Univariate Analysis Report’.

Hint:

Double-click on the duplicate report page name (at the bottom) and start typing in the new name. Once done, press the ‘Enter’ key.

The following screenshot further demonstrates how this task can be done.

Screenshot of renaming the new report page using Power BI Desktop © Microsoft

Step 3 – Customise the ‘Title’ of the report page as ‘PROFIT – UNIVARIATE ANALYSIS’.

Hint:

First, select the report element that contains the title (e.g. the black rectangle at the top)
Then, expand the ‘Format’ pane (at the right side of the Power BI Desktop application) and do the following.
- Under the ‘Shape’ tab, expand the ‘Style’ options and scroll down to the ‘Text’ section.
- Type in the new title name in the input text box.

The following screenshot further demonstrates how this task can be done.

Screenshot of customising the report title element using Power BI Desktop © Microsoft

Step 4 – Customise the ‘Sub-title’ of the report page to state the goal of the analysis as “GOAL: TO ANALYSE THE DISTRIBUTION OF THE ‘PROFIT’ FIELD”.

Hint:

Follow the same process in Step 3.

The following screenshot further demonstrates how this task can be done.

Screenshot of customising the report sub-title element using Power BI Desktop © Microsoft

Task 2 - Create the statistical measures required for the univariate analysis

Refer to the Tutorial: Create your own measures in Power BI Desktop - Power BI | Microsoft Learn for more step-by-step instructions on creating measures in Power BI Desktop.

Do the following tasks in the ‘Data’ view of the Power BI work file.

Step 1: Create a new empty table called ‘Statistics’

Hint:

Go to the ‘Table tools’ tab.
Click on ‘New table’
Type in ‘Statistics =’ in the formula bar.
Press the ‘Enter’ key or click on the icon to apply the formula.

Within the ‘Statistics’ table, do tasks 2 to 11.

Step 2: Create a new measure to calculate the arithmetic mean of the ‘Profit’ field called ‘Profit Average’.

Hint:

Profit Average = AVERAGE(Financials[Profit])
For more information on the function used, refer to AVERAGE function (DAX)

Step 3: Create a new measure to calculate the median of the ‘Profit’ field called ‘Profit Median’.

Hint:

Profit Median = Median(Financials[Profit])
For more information on the function used, refer to MEDIAN function (DAX)

Step 4: Create a new measure to calculate the mode of the ‘Profit’ field called ‘Profit Mode’ using the following sample DAX formula.

Note: Ensure that you replace the TableName[X] with the fully qualified column name of the 'Profit' field in the DAX statement provided.

= MINX( 
     TOPN (1, 
          ADDCOLUMNS( 
               VALUES(TableName[X]),"Frequency", CALCULATE( COUNT (TableName[X]))
                    ), 
                    [Frequency], 0 
          ), 
          TableName[X] 
      )

Hint:

The following screenshot further demonstrates how the standard protocols of writing DAX statements are used to create this measure.
Notice that several DAX functions (e.g. MINX, TOPN, ADDCOLUMNS, CALCULATE, COUNT and VALUES) are used to calculate the mode.

Screenshot the DAX measure to calculate the mode of the ‘Proift’ field using Power BI Desktop © Microsoft

Step 5: Create a new measure to calculate the standard deviation of the ‘Profit’ field called ‘Profit Standard Deviation’.

Hint:

Profit Standard Deviation = STDEV.P(Financials[Profit])
For more information on the function used, refer to STDEV.P function (DAX)

Step 6: Create a new measure to obtain the minimum value of the ‘Profit’ field called ‘Profit Min’.

Hint:

Profit Min = MINA(Financials[Profit])

For more information on the function used refer to MINA function (DAX)

Step 7: Create a new measure to obtain the maximum value of the ‘Profit’ field called ‘Profit Max’.

Hint:

Profit Max = MAXA(Financials[Profit])

For more information on the function used refer to MAXA function (DAX)

Step 8: Create a new measure to calculate the range of the ‘Profit’ field called ‘Profit Range’.

Hint:

Profit Range = [Profit Max] – [Profit Min]

Step 9: Create a new measure to calculate the first quartile of the ‘Profit’ field called ‘Profit Q1’.

Hint:

Profit Q1 = PERCENTILE.INC(Financials[Profit], 0.25)

For more information on the function used refer to PERCENTILE.INC function (DAX)

Step 10: Create a new measure to calculate the third quartile of the ‘Profit’ field called ‘Profit Q3’.

Hint:

Profit Q3 = PERCENTILE.INC(Financials[Profit], 0.75)

Step 11: Create a new measure to calculate the interquartile range (IQR) of the ‘Profit’ field called ‘Profit IQR’.

Hint:

Profit IQR = [Profit Q3] - [Profit Q1]

Task 3 - Record numerical data in the report using visuals

Step 1: Add a ‘Multi-row Card’ visual to the report page to visualise the ten (10) statistical measures created previously in task 2.

Place this visual on the top row within the body of the report.

Note: The report has placeholders (rectangular shape elements) that indicate the placement of visuals and their approximate size. You may wish to delete these placeholder elements from the report page or place the visuals on top of these elements.

Hint:

Click on an empty space on the report page canvas (Note: If a report element is selected, the ‘visualisations’ pane will not be visible).
Expand the ‘Visualisations’ pane and select the ‘Multi-row card’ visual. This visual will then be added to the report page.
Place and resize the visual according to the placement guidelines.

Step 2: Add measures to the visual so that the required numerical information can be displayed in the report.

Hint:

Select the ‘Multi-row card’ visual added to the report
Then, from the 'Data' pane, expand the 'Statistics' table.
Then, select each of the ten (10) statistical measures to add them to the 'Multi-row card' visual in the following order.
1. Profit Average
2. Profit Median
3. Profit Mode
4. Profit Standard Deviation
5. Profit Min
6. Profit Max
7. Profit Range
8. Profit Q1
9. Profit Q3
10. Profit IQR

Step 3: Customise the 'Multi-row card' visual according to the following specifications.

Specifications:

Customise the title of this visual to ‘Descriptive Statistics of Profit’ according to the following standard guidelines.

Font type ‘Segoe UI’
Font size ‘14’
Font-style ‘Bold’
Text colour “#0A7411”

Also, increase the font size of the values displayed to ‘14’ and change the background colour of the visual to ‘#E8EBE7’.

Hint:

Select the visual and go to the ‘Visualizations’ pane and select the ‘Format your visual’ icon .
Select the ‘General’ tab > ‘Title’ > Turn ‘On’. Then, expand the ‘Title’ section and input the title text accordingly.
Format the ‘Title’ using the font type ‘Segoe UI’, font size ‘14’, font style ‘Bold’ and text colour as “#0A7411”.
To increase the size of the callout values displayed on the visual, go to the ‘Visual’ tab > expand ‘Callout values’ > Increase the font size to ‘16’, and format font style as ‘Bold’.
To increase the size of the category labels, go to the ‘Visual’tab > expand ‘Category labels’ > Increase font size to ‘14’.
To change the background colour of the visual, go to the 'General' tab > expand 'Effects' > 'Background' > set the colour to '#E8EBE7'.

Checkpoint

Once these steps are done correctly, you should get the following result, as shown in the screenshot.

Screenshot of using visuals record numerical data using Power BI Desktop © Microsoft

Visualising distributions using frequency tables and histograms

To understand the basics of what a histogram is, what it consists of and why it is used to visualise statistical measures, watch the following video demonstration. Pay close attention to the concepts of ‘frequency counts’ and ‘bins’ and how they are used to represent and group numerical data.

Note: This demonstration uses Microsoft Excel to perform basic demonstrations. The same concepts will later be used to carry out demonstration tasks in Power BI Desktop.

Binning in Power BI Desktop

Bin sizes can be set for numerical fields in Power BI Desktop. Bins can be created for calculated columns, but not for measures. It is important to allocate the right bin size in order to visualise the data properly. To learn more refer to the article Use grouping and binning in Power BI Desktop from Microsoft Learn. Watch the following video that demonstrates how to create groupings and bins in Microsoft Power BI Desktop.

Creating a histogram and adding a distribution curve in Power BI

The following video will demonstrate how to create a histogram and how a distribution curve can be added. Pay close attention to how the NORM.DIST function is used in the demonstration.

Practical activity 3: Visualise the distribution of data

The tasks in this activity will help you practice what you learnt and will help you prepare for the demonstration tasks in your formal assessment. You will specifically learn how to create a frequency distribution table and a histogram to visualise the distribution of the data. Continue to work on the same Power BI work file from practical activity 2, ‘Demo2-Univariate analysis’.

Task 1: Create a frequency distribution table

Step 1: Within the ‘Financials’ table, create bins to group the data in the 'Profit' field using an appropriate bin size.

Hint:

From the 'Data' pane, expand the 'Financials' table to find the 'Profit' field.
Right-click on the ‘Profit’ field and select ‘New group’
Adjust the bin size to a suitable value (e.g. 10)

Notice the name of the new field created - Profit(bins). You will be using this new (bins) group when creating the distribution curve in the next task.

Step 2: Within the 'Statistics' table, create a new measure to calculate the count of the ‘Profit’ field called ‘Profit Frequency'.

Hint:

Right-click on the ‘Statistics’ table > Select ‘New measure’
Type in the formula bar: Profit Frequency = COUNT(Financials[Profit])
For more information on the function used, refer to COUNT function (DAX)

Step 3: Add a 'Table' to the report to create a frequency distribution table.

Place this visual at the bottom-right corner body of the report and resize according to the placement guidelines.

Note: The report has placeholders (rectangular shape elements) that indicate the placement of visuals and their approximate size. You may wish to delete these placeholder elements from the report page or place the visuals on top of these elements.

Hint:

Click on an empty space on the canvas of the report page (Note: If a report element is selected, the ‘visualisations’ pane will not be visible.)
Expand the ‘Visualisations’ pane and select the ‘Table’ visual. This visual will then be added to the report page.
For more information on the function used refer to COUNT function (DAX)

Step 4: Add the ‘Profit(bins)’ and ‘Profit Frequency’ measures to the visual to display as columns on the table.

Hint:

Select the table in the report.
From the ‘Data’ pane, expand the ‘Statistics’ table and select the ‘Profit Frequency’ measure as an input field of the table.
Expand the ‘Financials’ table and select the ‘Profit(bins)’ field as an input field of the table.

Step 5: Customise the 'Table' according to the following specifications.

Table specifications:

Customise the title of this visual to ‘Frequency Distribution of Profit’ according to the following standard guidelines:

Font type ‘Segoe UI’,
Font size ‘14’,
Font style ‘Bold’
Text colour “#0A7411”

Also, change the style pre-set of the table to ‘Bold header’, increase the font size of the column headers to ‘14’ with emphasis and change the background colour of the visual to ‘#E8EBE7’.

Hint:

Select the visual and go to the ‘Visualizations’ pane. Then select the ‘Format your visual’ icon .
Select the ‘General’ tab > ‘Title’ > Turn ‘On’. Then, expand the ‘Title’ section and input the title text accordingly.
Format the ‘Title’ using the font type ‘Segoe UI’, font size ‘14’, font style ‘Bold’ and text colour as “#0A7411”.
To increase the size of the column headers, go to the ‘Visual’ tab > expand ‘Column headers’ > Increase font size to ‘14’, and format style as ‘Bold’.
To change the style pre-set of the table, go to the 'Visual' tab > expand ‘Style presets’ > select the ‘Bold header’ option
To change the background colour of the visual, go to the 'General' tab, and expand 'Effects' and select the background colour to '#E8EBE7'.

Checkpoint

Once these steps are done correctly, you should get the following result, as shown in the screenshot.

Screenshot the frequency distribution table created using Power BI Desktop © Microsoft

Task 2: Create a histogram

Step 1: Within the ‘Financials’ table, create a new calculated column called ‘Profit Distribution Curve’ using the NORM.DIST function for the ‘Profit’ field.

Refer to the Tutorial: Create calculated columns in Power BI Desktop - Power BI | Microsoft Learn for more information on how to create calculated columns.

Note: When using the NORM.DIST calculation formula, you will need to type in the exact values obtained for the associated arithmetic mean and standard deviation values for the ‘Profit’ field as displayed in the ‘Multi-row card’ visual in the report.

For more information on how to use this function, refer to NORM.S.DIST function (DAX).

Hint:

Right-click on the ‘Financials’ table > Select ‘New column’.
Type the following in the formula bar:

Profit Distribution Curve = NORM.DIST(‘Financials’[Profit (bins)], 9, 136.37, FALSE())

Step 2: Add a ‘Line and clustered column chart’ visual to display the distribution of the ‘Profit’ field according to the following specifications.

Specifications:

Column Y-axis: ‘Profit Frequency’
Line Y-axis: Average of ‘Profit Distribution Curve’
X-axis: ‘Profit[bins]’

Place this visual below the multi-row card visual, left of the frequency distribution table.

Ensure that the ‘Zoom Slider’ is turned on to help zoom in to further investigate the shape of the distribution.

Hint:

Drag the fields ‘Profit Frequency’, ‘Profit[bins]’ and ‘Profit Distribution Curve’ to the appropriate visual input as specified.
However, the ‘Profit Distribution Curve’ as an input to the ‘Line Y-axis’ will display as a sum value at first. You must change it to an average value by clicking on the down arrow next to the field by selecting ‘Average’.
To turn on the zoom slider, select the visual > go to the ‘Visualisations’ pane > select the ‘Format your visual’ icon . Then, under the ‘Visual’ tab, turn 'On' the ‘Zoom slider’.

Step 3: Customise the title of this visual to '‘Distribution of Profit (Histogram)’ according to the standard guidelines as applied to other visuals in the report:

Hint:

Option 1: Follow similar steps as Task 1 > Step 5 of this practical activity.
Option 2: Use the 'Format Painter' to apply formatting. To do this:
- select a visual in the report with the required formatting already configured.
- then, click on the 'Format Painter' icon on the 'Home' tab, 'Clipboard' section of the Power BI Desktop menu bar.
- Once the mouse pointer changes to a paintbrush symbol, click on the visual to be formatted. This will then apply the same formatting configurations. However, you will need to input title text separately for each visual.

Checkpoint

Once these steps are done correctly, you should get the following result, as shown in the screenshot.

Applying insight analysis

Insight analysis refers to the analysis of the big data insights that have been collected and that will be applied to support operational decision-making. The insights from the big data sets should be used to assess and inform on problems that exist so that operational decision-making can be maintained.

When insights are collected from the big data sets, these will need to be analysed using appropriate tools so that they can be better understood.

Once insights are understood, these will need to be used to justify and substantiate decision that applies within the organisation.

Using smart narratives to record insights into the analysis

Smart narratives are used to provide brief explanations of the overall results and key outcomes of a Power BI report. They are a type of visual that can be added to a report in the form of text to outline summaries of trends, key outcomes, explanations and context.
To learn how to create smart narrative summaries in Power BI, refer to Smart narratives tutorial - Power BI.

The following video further demonstrates how to create smart narratives and customise them as required using dynamic values.

Practical activity 4: Displaying statistical analysis insights

The tasks in this activity will help you practice what you learnt so far in this topic and will help you prepare for the demonstration tasks in your formal assessment. You will specifically learn how to generate a report of the key outcomes of the analysis using the smart narratives feature in Power BI Desktop.

Continue to work on the same Power BI work file from practical activity 3, ‘Demo2-Univariate analysis’.

Task 1 - Create required statistical analysis measures

Create a new measure to calculate ‘Pearson’s Median Skewness’ for the ‘Profit’ field called ‘Profit Median Skewness’.

Note: Use “Pearson’s Median Skewness” equation to calculate the skewness.

$$\text{Pearson's Median Skewness} = \frac{\text{3(Mean-Median)}}{\text{Standard Deviation}}$$

Hint:

Profit Median Skewness = DIVIDE(3*([Profit Average]-[Profit Median]),[Profit Standard Deviation], 0)

Task 2 - Record numerical data in the report

Within the ‘Statistics’ table, create a new calculated measure to determine the shape of the ‘Profit’ data called ‘Shape of the Profit data’ using the following DAX formula.

Note: You must customise the DAX formula by replacing the 'X' with the variable name 'Profit'.

Shape of the (X variable) data =
VAR SK = [X Median Skewness] // Name of the measure that calculates the skewness of the variable
VAR _1 = SK > 0
VAR _2 = SK < 0
RETURN (
IF (_1, "Positively Skewed",
IF (_2, "Negatively Skewed", "No Skew; Data is normally distributed")) )

Hint:

Shape of the Profit data =
VAR SK = [Profit Median Skewness]
VAR _1 = SK > 0
VAR _2 = SK < 0
RETURN (
IF (_1, "Positively Skewed",
IF (_2, "Negatively Skewed", "No Skew; Data is normally distributed") ) )

Task 3 - Generate a report of the key outcomes of the analysis

Step 1: Add a 'Smart narrative' visual to the report.

Place this at the top-right corner within the body of the report.

Hint:

Click on an empty space on the canvas of the report page (Note: If a report element is selected, the ‘visualisations’ pane will not be visible).
Expand the ‘Visualisations’ pane and select the ‘Smart Narrative’ icon. This visual will then be added to the report page.

Step 2: Customise the title of this visual to ‘Statistical Analysis of Profit’ and apply standard formatting.

Also, set the background colour of the visual to '#E8EBE7'.

Hint:

Option 1: Follow similar steps as Practical activity 3 > Task 1 > Step 5.
Option 2: Use the 'Format Painter' to apply formatting. Detailed steps are outlined in Practical activity 3 > Task 2 > Step 3

Step 3: Customise the summary text ‘Profit’ data to only include information related to the statistical analysis, such as the following.

Shape of the data: Indicate whether it is symmetrical, positively skewed or negatively skewed. Add here a dynamic value to display the relevant information from the previously created measures ‘Shape of the Profit data’ as appropriate.
Central tendency: Indicate where the data is centred in terms of mean, median and mode? Are there any observations that are radically different from the rest which may influence the arithmetic mean?
Spread of the data: How broadly is the data dispersed? Range (Max value – Min value) and IQR. Are there huge differences between the range and the IQR, indicating that extreme values may be present in the data?

Customise the text in the smart narrative visual to be font type ‘Segoe UI’ and size 12.

Hint:

The smart narrative can be customised as follows:

Click on ‘+Value’ to add dynamic values to the smart narrative.
In the following example, the dynamic values added are within <> symbol. This will generate dynamic text on the narrative automatically.

An example of summary text (smart narrative) is shown below.

Shape of the data: <Shape of the Profit data>
Central tendency: Data is not centred. Mean (Average) > Median > Mode. Extreme values present in the data can have an impact on the average.
Spread of the data: Measures are dispersed with a range of <Profit Range> and IQR of <Profit IQR>. This indicates that there are extreme values within the data.

CHECKPOINT

Once these steps are done correctly, you should get the following result, as shown in the screenshot.

Screenshot of the Profit univariate analysis report page created using Power BI Desktop © Microsoft

Practical activity 5 - Putting it all together

Note: This activity will help you prepare for the demonstration tasks in your formal assessment. The analysis in this task will form the basis for the type of analysis we will discuss next in this topic. Therefore, it is recommended that you attempt this activity in preparation for the next section.

Continue to work on the same Power BI work file from the previous practical activity, and rename the file as ‘Demo3 -Cost Univariate Analysis’.

Tasks: Conduct a univariate analysis for the ‘Manufacturing Cost’ field of the ‘Financials’ dataset by:
a.   making the Financials[Manufacturing Cost] field accessible from the report view of Power BI
b.   creating the necessary statistical measures for the analysis
c.   recording numerical data in the report
d.   creating a frequency table and histogram to visualise the distribution of the data
e.   generating a report of the key outcomes of the analysis.

CHECKPOINT

Once these steps are done correctly, you should get the following result as shown in the screenshot.

Screenshot of the Cost univariate analysis report page created using Power BI Desktop © Microsoft

Findings from the statistical analysis refer to the information components that have been discovered, such as:

outcomes
trends
patterns
links
relationships
existence of extreme values in the data (outliers).

These findings will then need to be used to identify the insights that relate to the identified operational decision-making requirements. For example, let us explore how outliers or extreme values, when found in a dataset, could impact operational decision-making.

The effects of extreme values on operational decision making

Finding outliers in a dataset means that the result from any further analysis using this data may be impacted and may even provide inaccurate outcomes or skewed results.
To ensure that the operational decision-making does not have any effects from extreme values, it is important to implement a method to isolate the outliers from the rest of the data in the analysis.

“Outliers are values that seem excessively different from most of the other values. Such values may or may not be errors, but all outliers require review.” ¹⁷

When outliers (or extreme values) are present in the key numerical variables analysed, it will influence the mean value of those variables. Outliers will pull the value of the mean toward the extreme values. Therefore, identifying outliers is important as some analysis methods are sensitive to outliers and produce very different results when outliers are included in the analysis. ¹⁷

The following video discusses the effects of outliers on the distribution of data.

When analysts need to decide to remove or keep outliers, the following flow chart can help to make a decision. ³²

A diagram showing outlier decision making flowchart

Finding outliers and isolating incorrect results

Outliers can be found for numerical variables that do not have a defined range of possible values. Although there is no standard for defining outliers, statistical measures such as standard deviation or the interquartile range can be used to find outliers in numerical variables.

When performing the practical activity of conducting a univariate analysis, you learnt that measures such as ‘standard deviation’ and ‘interquartile range’ can be easily computed using analytical tools. Therefore, spotting outliers can also be done using analytical tools.

Using Z-scores to find outliers

When extreme data points in a numerical variable are converted to standardised values (Z-scores), this will help identify how many standard deviations away they are from the mean. Generally, values with a z score greater than three (3) or less than minus three (-3) are often determined to be outliers. ³³

A calculated column can be created to compute the z-score values in Power BI Desktop using the following standard formula.

$$\textit{Z-Score} = \frac{(X_i-\mu )}{ \sigma}$$

Isolating outliers from visuals in the analysis

In situations where outliers are genuinely present in numerical data (not due to errors in data entry), organisations would want to visualise their analytical results with/without outliers. Therefore, analysts need to know how to implement this.

Analysts can implement the following general steps to isolate outliers from an analysis using their preferred analytical tool.

Step 1 – Create a calculated column to compute the standardised (z-score) values. Note: The formula will need to use previously calculated average and standard deviation figures of the data.

For example, a conditional column using an if condition to tag the values in each row of data in the previously calculated ‘Z-Score Cost’ column will look like the following:

Z-Score Cost = DIVIDE(([Cost]- X), Y )

Step 2 – Create a conditional column to categorise the z-score values as ‘Outliers’ and ‘Non-outliers’. For example, a conditional column using if condition to tag the values in each row of data in the previously calculated ‘Z-Score Cost’ column will look like the following:

Cost Outliers/Non-outliers = If (
                              ( [Z-Score Cost] > 3 || [Z-Score Cost] < -3 ),
                              "Outliers", "Non-outliers"
                              )

Step 3 – Filter visuals in the analysis to show results with or without outliers.

For example, this can be done by using a slicer visual in the report, where the option is provided to the user to select only outliers, only non-outliers or include both in the analysis.

Practical activity 6: Isolating outliers from the analysis

Let’s put into practice what you’ve learnt so far in this topic. This activity will help you prepare for the demonstration tasks in your formal assessment. You will specifically learn how to:

create a calculated column to compute the z-score values
create a conditional column to categorise outlier and non-outlier values
filter the outliers from the visuals in a report.

Note: If you are unsure of the steps required to perform any of the tasks in this activity, you can expand the ‘Hint’ sections to see the detailed steps (Note: If similar tasks were performed in previous practical activities, these tasks would not include hints.). Screenshots are provided under the ‘Checkpoint’ sections so that you can check your work against the desired end result.

Use the same Power BI work file you created in the last practical activity (where you have performed the univariate analysis) and save it as ‘Demo4-Isolating outliers’.

Task 1 - Create a calculated column to record z-score values

Create a calculated column called ‘Profit Z-scores’ in the ‘Financials’ table that calculates standardised values based on the formula:

$$\textit{Profit Z-Scores} = \frac{([Profit]-\ Profit \ Average )}{ \ Profit \ Standard \ Deviation}$$

To do this, you must go to the ‘Data’ view in Power BI Desktop. You must enter the actual values for average and standard deviation as calculated in the previous univariate analysis.

Hint:

From the ‘Data’ view in Power BI Desktop, right-click on the ‘Financials’ table and select ‘New Column’.
Enter ‘New column name’ as ‘Profit Z-scores’.
Enter the formula as = DIVIDE(([Profit] – 9), 136.37).

CHECKPOINT

The following screenshot demonstrates how the standard protocols of writing DAX statements are used to create this calculated column.

Screenshot of the Profit Z-scores calculated column DAX statement in Power BI Desktop © Microsoft

Task 2 - Create a conditional column to categorise outlier and non-outlier values

Add a conditional column called ‘Profit Outliers/Non-outliers’ to isolate the outliers in the ‘Profit Z-scores’ column.

Note: If the ‘Profit Z-scores’ values are greater than three (3) and less than minus three (-3), the output should be ‘YES’ to indicate these values are outliers. Otherwise, the output should be ‘NO’ to indicate those values are non-outliers.

Hint:

From the ‘Data’ view in Power BI Desktop, right-click on the ‘Financials’ table and select ‘New Column’.
Enter ‘New column name’ as ‘Profit Outliers/Non-outliers’ Write a conditional statement using the following as a guideline:
= If( ([Z-Score] > 3) || ([Z-Score] < -3), "Outliers", "Non-outliers" )

CHECKPOINT

The following screenshot demonstrates how the standard protocols of writing DAX statements are used to create this conditional column.

Screenshot of the conditional column created in the ‘Financials’ data table in Power BI Desktop © Microsoft

Task 3 - Filter the outliers from the visuals in the dashboard

Step 1: Add a new ‘Slicer’ visual to the ‘XYZ Manufacturing Financials Overview’ page.

Step 2: Position the slicer visual within the ‘Filters’ section on the left side of the report.

Step 3: Select the ‘Profit Outliers/Non-outliers’ column as the slicer input field.

Step 4: Customise the ‘Slicer’ visual title as ‘Display Profit:’ and apply the standard formatting style.

Hint:

Click on another slicer visual that is already formatted according to XYZ Manufacturing’s standard style.
Then, click on the ‘Format Painter’ option under the ‘Home’ tab in Power BI Desktop.
Then click on the new slicer visual where you want to apply the formatting.

Step 5: Ensure the slicer shows the ‘Select all’ option. (hint: Visual > Slicer settings > Selection

Hint:

Visual > Slicer settings > Selection.

Task 6: Filter the dashboard page to display the analysis without any outliers.

Hint:

Click on the ‘Outliers’ option in the slicer visual to de-select or exclude the outliers from the data.

CHECKPOINT

Once these steps are done correctly, you should get the following result, as shown in the screenshot.

Screenshot of using filters to exclude outliers from the dashboard page using Power BI Desktop © Microsoft

Practical activity 7: Try it yourself.

This activity will help you prepare for the demonstration tasks in your formal assessment. The analysis in this task will form the basis for the type of analysis we will discuss next in this topic. Therefore, it is recommended that you attempt this activity in preparation for the next section.

Continue to work on the same Power BI work file from the previous practical activity and save it as ‘Demo5 -Isolating Cost Outliers’.

Isolate the outliers of the ‘Manufacturing Cost’ field of the ‘Financials’ dataset by doing the following tasks.

Task 1: Create a calculated column called ‘Cost Z-scores’ in the ‘Financials’ table that calculates standardised values for the values in the ‘Manufacturing Cost’ field.
Task 2: Adding a conditional column called ‘Cost Outliers/Non-outliers’ to isolate the outliers in the ‘Cost Z-scores’ column.
Task 3: Filter out the cost outliers from the visuals in the ‘XYZ Manufacturing Financials Overview’.

CHECKPOINT

Once these steps are done correctly, you should get the results as shown in the following screenshots.

1 of 3

Screenshots were taken using Power BI Desktop © Microsoft

Group of researchers working on diagrams and charts

Identifying insights

Identifying insights refers to gaining a deeper understanding of an organisation's dataset by analysing information on a particular problem or decision-making requirement. This level of understanding helps organisations make better decisions than by relying on gut instinct.

The term 'actionable data insights' refer to insights that are specific and relevant enough that they lead to actions that can increase efficiency, revenue and profits.
Data insights will provide information about what the data is saying or what insights into the factors related to the decision-making can be identified.

Relating insights to operational decision-making requirements

Once findings have been assessed and the insights have been identified, it is important to acknowledge the relevance of those findings to operational decision-making requirements.

The identified operational decision-making requirements refer to the items that are within the decision-making scope as part of the work task or project.

Using the Pareto (80/20 Rule) analysis to identify insights

What is the Pareto principle (or 80/20 rule)?

The generic situation in which 80% of some output comes from 20% of some input is referred to as the Pareto principle. Therefore, the Pareto principle is also commonly known as the “80-20 rule”. ¹⁹
When applying the Pareto principle to any characteristic of interest, analysts need to know how to sort the data and calculate the cumulative percentage. To learn more about the Pareto principle and how it works, refer to the article The 80-20 Rule (aka Pareto Principle): What it is, How it works. ²²

The following video explains in simple terms and examples how the Pareto analysis is used as a common method to solve problems and in support of making operational decisions in businesses.

Creating a Pareto Chart in Power BI

Refer to the article What Is Pareto Analysis? How to Create a Pareto Chart and Example (investopedia.com) to learn more about the pareto analysis.

Pareto diagram

The following video will demonstrate how a Pareto Chart is created in Power BI Desktop using DAX formulas.

Testing the Pareto rule

The following video discusses how the Pareto Rule can be tested using a sample dataset. Pay close attention to the various DAX statements used and how they are formulated to produce the required results in Power BI visualisations.

Creating smart narrative summaries from visuals

In Power BI, smart narratives can be used to auto-generate summary text of the data displayed in report visuals. The following video demonstrates various methods that can be used to generate and further edit smart narrative summaries.

For more information, refer to the article Create smart narrative summaries by Microsoft Learn.

Practical activity 8: Creating a basic Pareto Analysis dashboard

The tasks in this activity will help you practice what you learnt so far about the Pareto principle and will help you prepare for the demonstration tasks in your formal assessment.

You will specifically learn how to:

create a report page using a given templates and customise report elements
include required filters in the analysis
create a numeric range parameter
create the required calculated measures for the analysis
record numerical data in the dashboard
visualise data using a Pareto chart, table and a scatter chart
generate summaries to report on analysis outcomes
use parameters in the dashboard to identify insights that relate to operational decision-making requirements.

Use the same Power BI work file you created in the last practical activity and save it as ‘Demo6-Pareto Analysis’.

Note: If you are unsure of the steps required to perform any of the tasks in this activity, you can expand the ‘Hint’ sections to see the detailed steps. If similar tasks were performed in previous practical activities, these tasks would not include hints. Screenshots are provided under the ‘Checkpoint’ sections so that you can check your work against the desired end result.

Scenario:

The management at XYZ Manufacturing wants to identify the portion of products with the greatest contribution to the total manufacturing costs so that necessary cost optimisation strategies can be applied for next year’s production focused on the selected products.

Let us use the Pareto rule to see whether the Top N percentage (approx. 20%) of products cause 80% of the total manufacturing costs.

Task 1 - Create a new report page for the Pareto Analysis

Step 1 – Create a new report page using XYZ Manufacturing’s standard dashboard template.

Step 2 – Rename the new dashboard page as ‘Top N Product Costs: Pareto Analysis’.

Step 3 – Customise the ‘Title’ of the report page as ‘TOP N Product Costs | PARETO ANALYSIS’

Step 4 – Customise the ‘Sub-title’ of the report page to state the goal of the analysis as “GOAL: TO IDENTIFY THE TOP N PRODUCTS THAT CAUSES 80% OF THE MANUFACTURING COSTS”.

Task 2 – Include the required filters for the analysis

From the previous ‘XYZ Manufacturing Financials Overview’ page, copy and paste the following filters to this dashboard page.

‘Select year: ‘ – containing values from the input field ‘Year’.
'Select Cost data: ‘ – containing values from the input field ‘Cost Outliers/Non-Outliers’.

Task 3 – Create a numeric range parameter

Procedure: From Power BI Desktop, navigate to the ‘Modelling’ tab> ‘New parameter’ > Select ‘Numeric range’ and customise according to the following specifications.

Specifications:

Name of the parameter: TopN%
Data type: Decimal Number
Min: 0.005
Max: 1.0001
Increment: 0.001
Default: 0
Select the option to ‘Add slicer to the page’.

Field formatting and placement: Configure the TopN% field as a ‘Percentage’ format with one decimal point. The correct format should be visible in the ‘Slicer’ visual. Place this visual in the ‘Filters’ section of the dashboard.

Slicer format specifications:

Apply the following format specifications to the parameter slicer so that it stands out from the rest of the filters.

Visual > Slicer header: Off
Visual > Values > Values> Font: Segoe UI, Size: 14, Style: Bold, Color: #F0E199
Visual > Slider: On
General > Title> On
- Text: Top N %
- Font: Segoe UI
- Size: 14
- Style: Bold
- Text color: #FFFFFF
General > Effects > Background > Color: #0A5619, Transparency: 0%

Task 4 – Create the required calculated measures for the analysis

Backview of a man seriously analyzing data on computer

Create the following five (5) measures within the ‘Key Measures’ table using the DAX statements provided. Apply correct programming protocols and techniques when typing in the DAX statements into Power BI’s formula bar.

Measure 1

Cost($) from TopN% Products = 
VAR ProductPercent = [Distinct Products] * [TopN% Value] // Calculates the percentage of products using the Top N parameter selection
RETURN
 CALCULATE([Total Manufacturing Cost],
    FILTER(VALUES('Financials'[Product ID]),
        RANKX(VALUES('Financials'[Product ID] ), [Total Manufacturing Cost], , DESC ) <= ProductPercent ) )

Note: Change the format of this measure to 'Currency'.

Hint:

Create a new measure in the ‘Key Measures’ table.
Copy and paste this code into the DAX formula bar.
Once the measure is created, select it from the ‘Data’ pane and go to the ‘Measure tools’ tab from the top menu.
Then, select ‘Format’ as ‘Currency’ from the drop-down options.

Measure 2

Cost(%) from TopN% Products = DIVIDE([Cost($) from TopN% Products], [Total Manufacturing Cost])

Note: Change the format of this measure to 'Percentage'.

Measure 3

Product-Cost Cumulative Percentage = 
VAR ProductCost = [Total Manufacturing Cost]
VAR AllFinancials = CALCULATE([Total Manufacturing Cost], ALLSELECTED(Financials))
RETURN
DIVIDE (
SUMX (
    FILTER(
        SUMMARIZE( ALLSELECTED(Financials), 'Financials'[Product ID],
            "Cost", [Total Manufacturing Cost]),
                [Cost] >= ProductCost),
                    [Cost]),
    AllFinancials, 0)

Note: Change the format of this measure to 'Percentage'.

Measure 4

Product Rank = RANKX (ALLSELECTED('Financials'[Product ID]), [Product-Cost Cumulative Percentage], , ASC)

Measure 5

No. of Selected Products from TopN% = [Distinct Products] * [TopN% Value]

Note: Change the format of this measure to 'Whole number'.

Task 5 – Record numerical data in the dashboard

Use ‘Card’ visuals to display the following key measures relevant to the analysis and format according to the following specifications.

[Total Manufacturing Costs] – change the name to display on the visual as ‘Total Costs’.
[Cost($) from TopN% Products] – change the name to display on visual as ‘Cost ($) of Top N’.
[Cost(%) from TopN% Products] – change the name to display on visual as ‘Cost (%) of Top N’.
[No. of Selected Products from TopN%] - change the name to display on visual as ‘Top N Products’.

Visual formatting specifications:

Format each of these card visuals as follows:

Visual >Category label: Off
General > Title: On
- Title text: Display as specified in ‘Name to display on visual’ column.
- Font: Segoe UI, Size: 22
General > Effects > Background colour: #CCCCCC

Important: Ensure that the numerical values displayed on the ‘Card’ visuals are in their correct formats (e.g. currency, percentage or whole number).

Task 6 – Visualise sorted data in a table

Create a summary table using the following specifications.

Table specifications:

Table Title: Summary Table
Table columns:
- ‘Financials’[Product ID]
- ‘Key Measures’[Total Manufacturing Costs]
  - Note: Change the name of this field for this visual as ‘Total Cost’
- ‘Key Measures’[Product-Cost Cumulative Percentage]
  - Note: Change the name of this field for this visual as ‘Product-Cost Cumulative %’
- ‘Key Measures’[Product Rank]
  - Note: Change the name of this field for this visual as ‘Ranking’
Table format: Set the visual > Style presents to ‘Bold header’.
Sort the table by the ‘Total Cost’ column in descending order.

CHECKPOINT

The following screenshot demonstrates how the summary table is configured to display data from the required fields.

Screenshot of summary table created using Power BI Desktop © Microsoft

Task 7 – Create the Pareto chart

Create a Pareto chart to showcase the ‘Product’ impact on ‘Profit’, using the necessary visualisations, data fields and previously created calculated measures according to the following specifications.

Chart specifications:

Chart type: Line and stacked column chart
X-axis: ‘Financials’[Product ID]
Column y-axis: ‘Key Measures’ [Total Manufacturing Costs]
- Note: Change the name of this field for this visual to ‘Total Cost’
Line y-axis: ‘Key Measures’ [Product-Cost Cumulative Percentage]
- Note: Change the name of this field for this visual to ‘Product Cost. Cumulative %’
Column legend: ‘Financials’[Cost Outliers/Non-outliers]
- Note: Change the name of this field for this visual to ‘Cost data’
Chart title: ‘Pareto Chart (Product impact on Revenue)’
Data labels: 'On'

CHECKPOINT

The following screenshot demonstrates how the Pareto chart is configured and formatted to display data from the required fields.

Screenshot of Pareto Chart created using Power BI Desktop © Microsoft

Task 8 - Create a scatter chart

Create a ‘Scatter’ chart to show the relationship between [Product Rank] vs. [Product-Cost Cumulative Percentage] according to the following specifications.

Chart Specifications:

Chart type: Scatter chart
Values: ‘Product’[Product ID]
X-axis: ‘Key Measures’[Product Rank]
Y-axis: ‘Key Measures’[Product-Cost Cumulative Percentage] o Note: Change the name of this field for this visual as ‘Product-Cost. Cumulative %’
Legend: ‘Financials’[Outliers/Non-outliers] o Note: Change the name of this field for this visual as ‘Cost data’
Chart title: ‘Scatter Chart (Product Rank vs. Product-Cost Cumulative %)’

CHECKPOINT

The following screenshot demonstrates how the Scatter chart is configured and formatted to display data from the required fields.

Screenshot of Scatter Chart created using Power BI Desktop © Microsoft

Task 9 - Contextualise the report.

Report context: The management wants to only focus on data from the last year (assume that this is 2021 in the given dataset).

Task: Use the ‘Select year:’ slicer to select only the required year.

Task 10 - Generate summaries from the chart visuals

Generate a summary (smart narrative) of the data presented in the ‘Pareto chart’ visual for the ‘Cost’ data. To do this:

Step 1: Right-click on the ‘Pareto chart’ visual and select ‘Summarise’.
Step 2: Place this visual below the multi-row card visual, above the ‘Pareto chart’.

You can also use the scatter chart to generate a summary. Then, copy and paste the auto-generated text into the first smart narrative visual to combine this information.

Format Specifications: Change the background colour of the smart narrative visual to ‘#E8EBE7’. (Hint: General > Effects). Apply standard formatting for the titles of visuals.

Hint:

Option 1: Follow similar steps as Practical activity 3 > Task 1 > Step 5.
Option 2: Use the 'Format Painter' to apply standard formatting to the visual from other visuals in the report. . Detailed steps are outlined in Practical activity 3 > Task 2 > Step 3

CHECKPOINT

The dashboard with its default settings should look like the following. We will further learn to use this dashboard to conduct a Pareto Analysis and relate it to operational decision making requirements in the next section of this topic

Screenshot of the basic Pareto Analysis Dashboard created using Power BI Desktop © Microsoft

Identifying key insights and providing recommendations

Man working on data dashboard on computer

Ethical considerations when reporting analysis results

Ethical considerations arise when deciding which results to include in a report. Presentations of key outcomes and results of the analysis should report results in a fair, objective and neutral manner.

“Unethical behaviour occurs when one selectively fails to report pertinent findings that are detrimental to the support of a particular position”. ¹⁷

Therefore, it is important that analysts should document both good and bad results. In situations where the outliers detected are genuine and are not due to errors, analysts should develop reports to present the analytic results with outliers as well as without outliers.

Let us implement this using the ‘Pareto Analysis’ dashboard created in the previous practical.

Writing summaries of key insights

When writing a summary based on a completed analysis report or dashboard, the general process is as follows.

First, describe the context of the analysis.
Then, outline the numerical data that supports the operational decision-making.
- State what percentage of causes (e.g. products, customers) resulted in what percentage of the total value (e.g. sales, profit, costs, margin).
- State outcomes with the effects of outliers and without the effects of outliers.

Use data storytelling to provide recommendations

When providing recommendations, it is important to apply logical thinking and consider aspects of the business operations, which will greatly differ in each analysis.

Analysts basically need to tell a story using the data whilst applying context and relating to the problem.

When writing a summary of the recommendations based on analysis outcomes, the general process is as follows.

First, briefly outline the operational decision-making requirement.
Then, state your recommendation providing any numerical figures relevant to the analysis.
Finally, provide logical reasoning behind the recommendation (may include a comparison of numerical figures to convince the reader).

Practical activity 9: Using the Pareto Rule to support operational decision-making

Let us now test the Pareto rule using the 'Pareto Analysis Dashboard' created in the previous practical.

Task 1 – Adjust the parameters in the dashboard

To test the Pareto rule, we need to use the numeric range parameter ‘Top N %’ to vary the selection of the top portion of products and evaluate the ‘Cost(%) of Top N’ value displayed in the card visual.

Open the Power BI Desktop file from the previous practical activity 'Demo6 -Pareto Analysis', and do the following.

Increase the ‘Top N%’ to approximately 20% and evaluate the value displayed in the ‘Cost(%) of Top N’ card visual.
Increase the ‘Top N%’ value until the ‘Cost(%) of Top N’ displays a value close to 80%.

The following screenshot shows the analysis parameter selection.

Screenshot of the Pareto Analysis Dashboard contextualised for 2021, including all Cost data in Power BI Desktop © Microsoft

As you can see from the analysis, for all 2021 data from the 'Financials' dataset, the top 32.2% of products (that is a total of 103 items) are contributing to 80.05% of the total manufacturing costs.

Industry insights

The analysis will generally help identify products of interest from a grouping of products, customers and so on that relate to a large part of the value (e.g. revenue, profit, costs) of the whole.

Task 2 – Change the context of the report (e.g. without outliers)

Let us evaluate this further by doing the following:

De-select ‘Outliers’ from the ‘Select Cost data:’ slicer to exclude the outliers from the cost data.
Adjust the ‘Top N%’ value until the ‘Cost(%) of Top N’ displays a value close to 80%.

The following screenshot shows the new analysis parameter selection.

Screenshot of the Pareto Analysis Dashboard contextualised for 2021, excluding outliers from the Cost data in Power BI Desktop © Microsoft

The revised analysis shows that after excluding any outliers in the data, the top 34.8% of products (that accounts for 110 items) are contributing to the 80.11% of total manufacturing costs.

Industry insights

The 80/20 rule is not a hard-and-fast rule. Therefore, the outcome when analysed may not always be 80/20 exactly, as you can see in the above analysis. It is the concept behind the rule that matters.

Task 3 – Another approach (e.g. including outliers whilst ensuring its impact on the decision is minimal)

Further to this observation, keeping the revised Top N as 34.8% let us, re-select the outliers into the analysis to see what effect it would have.

Screenshot of the Pareto Analysis Dashboard contextualised for 2021 to include all Cost data with minimal outlier impact using Power BI Desktop © Microsoft

This revised analysis will then indicate, whilst ensuring the effects of the outliers are minimal if we still need to include all data in the analysis, the analysis represents the following information.

Including outliers in the data whilst maintaining the top N as 34.8% (which accounts for 111 products) shows that the Top 111 products contribute to 82.24 % of total manufacturing costs.

Industry insights

Whilst the analysis may point to certain outcomes, the recommendations made regarding the analysis may depend greatly on other considerations and what the specific analysis is trying to determine.

Task 4 – Write a summary of key insights from the analysis to support operational decision-making

Based on the completed ‘Top N Product Costs: Pareto Analysis’ dashboard page write a paragraph summarising the key findings of the analysis.

Hint:

Remember to follow the general process when writing summaries of key insights which is:

First, describe the context of the analysis.
Then, outline the numerical data that supports the operational decision making.
- State what percentage of causes (e.g. products, customers) resulted in what percentage of the total value (e.g. sales, profit, costs, margin).
- State outcomes with the effects of outliers and without the effects of outliers.

Answer:

Based on the 2021 data (without considering the effects of outliers), a total of 103 products (32.2% Top N) contributed to 80.05% of the total manufacturing costs, which is equivalent to $169,650.

When outliers are excluded from the analysis, a total of 110 products (34.8% Top N)) contributed to 80.11% of the total manufacturing costs which is equivalent to a value of $145,740.

However, keeping the Top N % at 34.8% and including the outliers in the data shows that there are 111 top N products that contribute to 82.24% of the total manufacturing costs, equivalent to a value of $174,290.

Task 5 - Highlighting key insights and figures related to operational decision-making

When providing recommendations, it is important to highlight the decision points and important numerical figures using the visuals in the dashboard. For example, let’s highlight the 80% mark of the product-cost cumulative value and the top N number of products in the scatter chart using constant lines.

Step 1: Add a ‘Y-axis’ constant line to indicate the 80% mark of the cumulative product-cost value in the chart.

Hint:

Select the ‘Further analysis’ icon (i.e. the third main icon/tab) in the ‘Visualisations’ pane.
Expand the ‘Y-Axis Constant Line’ > click on ‘Add line’. In this new line added, configure the following:
- Value: 0.8, Color : #DE6A73, Transparency: 0%
- Data label: turned ‘On', Colour: #DE6A73

The following screenshot demonstrates how the Y-axis constant line is added to the chart and its format options.

Screenshot of the Y-axis constant line added to the chart using Power BI Desktop © Microsoft

Step 2: Add an ‘X-axis’ constant line to indicate the range of items to be selected for production cost optimisation against the ‘Product Rank’ in the scatter chart.

Hint:

Expand the ‘X-Axis Constant Line’ > click on ‘Add line’. In this new line added, configure the following: -

Value: 110, Color: #0A7411, Transparency: 20%
Data label: turned ‘On’, Colour: #0A7411

The following screenshot demonstrates how the X-axis constant line is added to the chart and its format options.

Screenshot of the X-axis constant line added to the chart using Power BI Desktop © Microsoft

CHECKPOINT

Once these steps are done correctly, you should get the following result as shown in the screenshot.

Screenshot of the completed Pareto Analysis Dashboard using Power BI Desktop © Microsoft

Task 6 – Outline recommendations

Write a paragraph outlining your recommendations on the operational decision based on the insights from the analysis.

Hint:

Remember to follow the general process when providing recommendations for operational decision-making.

First, briefly outline the operational decision-making requirement.
Then, state your recommendation providing any numerical figures relevant to the analysis.
Finally, provide logical reasoning behind the recommendation (may include a comparison of numerical figures to convince the reader).

Answer:

To decide on a percentage of products (or a selection of products) to apply cost reduction strategies in manufacturing, consider the following recommendations.

It is not advisable to select only the 103 products (Top 32.2%) as this result is impacted by the effects of outliers in the dataset.
Selecting 110 items (the top 34.8%) of products is recommended as these products have the most contribution to manufacturing costs and the analysis is not affected by outliers.
However, if all data (including outliers) is preferred to be included in the analysis, it is advisable to select 111 items (still maintaining the top N percentage as 34.8%) as the effect from outliers in the data is then minimal.