Sandbox: Tutorials
ABOUT THIS DOCUMENT: An in-depth, step-by-step guide on how to run various types of analysis on Sandbox
This section goes over how to run different types of analysis. Such tutorials include: pulling and analyzing data to further understand products and customers. Users can learn how to analyze customers’ interaction, group them into cohorts and observe their behaviors, and create a flexible report. Users can also analyze numerical --both discrete and continuous-- data to understand performance (e.g. year over year, latest weekday and previous weekday, etc)
Workbook Analysis
Find Latest Weekday and Previous Weekday
Current Weekday
Returns the Date of the current or most recent weekday (Monday through Friday). If the input date is a Monday, the function returns the input date. If the input date is a Sunday, the function returns the date of the Friday immediately previous.
DateAdd(“day”, (If(Weekday([DATE]) = 7, -1, Weekday([DATE]) = 1, -2, 0)), [DATE])
Previous Weekday
Returns the Date of the previous weekday (Monday through Friday). If the input date is a Monday, the function returns the date of the previous Friday.
DateAdd(“day”, (If(Weekday([DATE]) = 1, -2, Weekday([DATE]) = 2, -3, -1), [DATE])
Explanation
To the current or previous weekday, we use 3 functions: DateAdd, If, and Weekday. The full function to find the current weekday is:
DateAdd(“day”, (If(Weekday([DATE]) = 7, -1, Weekday([DATE]) = 1, -2, 0)), [DATE])
That’s a doozy of an equation. Let’s unpack it.
DateAdd
DateAdd adds time to a date. The set up for the function is DateAdd(<unit>,<amount>,<date>). Our Unit is “day”, the amount is add is our If statement, and the date column is [Date].
DateAdd(“day”, (If Weekday([DATE]) = 7, -1, Weekday([DATE]) = 1, -2, 0), [DATE])
If and Weekday
If returns a value based on whether logical conditions are met. The setup for the function is condition value pairs, which can be followed by an else statement. Our function has 2 condition values pairs, with an else statement. The full setup is:
If(<condition 1>, <value 1>, <condition 2>, <value 2>,<else>)
If(Weekday([DATE]) = 7, -1, Weekday([DATE]) = 1, -2, 0))
Both conditions use the Weekday function. Weekday returns the day of week for an input date, represented as the numbers 1 through 7, with 1 being Sunday and 7 being Saturday.
Condition 1 is Weekday([DATE]) = 7 which says to check if the day of week is 7 (aka Saturday). The value paired with condition 1 is -1. The condition value pair then says “If the Date is a Saturday, return -1”. This value feeds back into the DateAdd function, so that it subtracts one day from the input Date. Every time the date in the [DATE] column is a Saturday, the Friday before is returned.
Condition 2 does the same thing as Condition 1, but for Sundays. The condition value pair states Weekday([DATE]) = 1, -2 which says “if the Date is a a Sunday, return -2”. Subtracting two days from a Sunday gets you back to Friday.
The else condition at the end of the If statement handles all other cases. The If function returns the value 0 for all dates that aren’t a Saturday or Sunday. This then feeds the 0 into the DateAdd function, making no change to the date.
Calculating a Moving Average
A Moving Average is a function that is often used to filter out the noise of random price fluctuations and get a better idea of the trend. A moving average takes into account the value of the current row of data, as well as the surrounding rows. For example, if you have weekly sales data, a 4-week moving average would take into account the current week and the previous 3 weeks of data. In Sigma, we can set this up very easily.
Moving Averages are a type of Window Function. Windows functions acts on the data within the groupings they are placed in, and are affected by the order of the rows.
The first step to setting up a Moving Average is setting up your groupings and the sorting for your workbook. For an accurate Moving Average, the workbook must be uniquely sorted. Multicolumn Sorting Criteria and Groupings with no duplicate values are both ways to ensure a unique sort order. Once your data is grouped and sorted, you can accurately apply a Moving Average.
When Calculating a Moving Average:
- Create your Groups
- Create a unique sort order
- Enter your Moving Average formula
Looking again at the example of a four week moving average of weekly sales data, let’s work through the three steps. We want to see a four week moving average broken down by store and department.
Our first step is to group the data by Store Name, then create a second level grouping by Department. We then group by the Date column, and choose to truncate date by week. Then drag the Line Item Total column to the date group to automatically sum the Line Item Totals by week. Last, sort the Week of Date column ascending.
If you can't group by a column to ensure a unique sort order, you can click the gear icon to set up a more complex sort that takes into account multiple columns.
Now that the set up for the window function is done, we can calculate our Moving Average. Create a new column and enter in MovingAvg([Sum Line of Item Total], 3). This tells the function to look at the column [Sum of Line Item Total] (which shows our weekly sales) and average the current row with the three above it. This computes a four week moving average. If we wanted to sort our weekly data Descending, we could use MovingAvg([Sum of Line Item Total], 0, 3) to average the current row with the three rows below it. The function is confined by the groupings, and will only use the rows in its own group in calculating the averages. The result is a department level four week moving average for every department in every store.
Calculate Time Between Sales
We’re going to look at how to find the time between two orders. We’re starting with a List of Customer IDs, and a list of Invoice Dates. We’ll end with the data on the Workbook below.
Create a new Workbook and add a table element with your order information.
Time Between First and Most Recent Order
- Group by [Customer ID]
- Group by [Invoice Date]
- Calculate the date of the first invoice. The Min function is an aggregate that returns the lowest value of a column. The lowest value date is the earliest date in a list. To find the Min, add a new column next to [Customer ID] and use the formula Min([Invoice Date]). Rename the new column ‘First Order’.
- Calculate the date of the latest invoice. The Max function is an aggregate that returns the highest value of a column. The highest value date is the date furthest forward in time. In a list of invoices, dates that are all in the past, Max returns the most recent date. To find the Max, add a new column next to [Customer ID] and use the formula Max([Invoice Date]) Rename the new column ‘Latest Order’.
- Find the number of days between orders. Using the function DateDiff, we can find the difference between two dates. Create a new column next to [Customer ID] and enter the formula DateDiff(“day”, [First Order], [Latest Order]). This indicates that we want the results to be reported in days, that [First Order] is the first date and [Latest Order] is the second date, and that we want to know the difference between them. The result is the number of days between the first order and the latest order.
TIP: Max and Min are both aggregate functions, which means that the results depend on the grouping of the table. To see the data in different groupings, you can create a child element in your workbook. The parent table will be used as a data source for the child element, and you can use different groupings without affecting the calculations in the parent table.
Time Between First and Second Order
- Group by [Customer ID]
- Group by [Invoice Date]
- Sort [Invoice Date] Ascending, so that the first invoice date is on top and the latest on the bottom.
- Calculate the date of the first invoice. The Min function is an aggregate that returns the lowest value of a column. The lowest value date is going to be the earliest date in a list. To find the Min, add a new column next to [Customer ID] and type Min([Invoice Date]). Rename the new column ‘First Order’.
- Calculate the date of the second invoice. To find the date of the second invoice, we’ll use Nth which is a window function that returns the value of the Nth row of a group or column. We have our table grouped by the invoice date, which is sorted ascending. Therefore, we know that the date of the second order will always be in the second row. Create a new column next to [Customer ID] and enter the formula Nth([Invoice Date], 2). This will return the second row of [Invoice Date] for all of the groups defined by [Customer ID].
- Find the number of days between orders. Using the function DateDiff, we can find the difference between two dates. Create a new column next to [Customer ID] and enter the formula DateDiff(“day”, [First Order], [Second Order]). This indicates that we want the results to be reported in days, that [First Order] is the first date and [Second Order] is the second date, and that we want to know the difference between them. The result is the number of days between the first and second order.
TIP: The results of the functions Min and Nth depend on the grouping and sorting of the table. To see the data in different groupings, you can create a child element. The parent table will be used as a data source for the child element, and you can use different groupings without affecting the calculations in the parent table.
Calculate Week Over Week Numbers
Once you calculate your weekly numbers, calculating week over week numbers becomes very simple when using the Lag or Lead functions. Lag and Lead are Window Functions that shift all of the rows in a column. The key to using Lag and Lead properly is to ensure that your data is sorted in a unique order, with no “ties” in the sort order. You can click the gear icon in the column menu to set up multi column sort criteria to get a unique sort order.
Calculate Week over Week numbers using Lag
- Group your data by your date column. Open the column menu and choose Truncate Date > Week.
- In the left panel, drag the column you would like to sum up into the group defined by [Week of Date]. For example, if your sales numbers were in the column [Sales Amount] you would drag [Sales Amount] under the [Week of Date] column in the left panel. Sigma will automatically apply the formula Sum to [Sales Amount]. Rename the column "Weekly Sales".
OR
Create a column next to [Week of Date] in the table. Enter the formula Sum([Sales Amount]) in the formula bar. Rename your column "Weekly Sales".
- Sort weekly sales data by date in Ascending order, ensuring a unique sort order.
- Once you have your data sorted, create a new column next to you weekly sales column and enter in [Weekly Sales]/ Lag ([Weekly Sales])
- Change the column format to percent.
Calculate Week over Week numbers using Lead
- Group your data by your date column. Open the column menu and choose Truncate Date > Week.
- In the left panel, drag the column you would like to sum up into the group defined by [Week of Date]. For example, if your sales numbers were in the column [Sales Amount] you would drag [Sales Amount] under the [Week of Date] column in the left panel. Sigma will automatically apply the formula Sum to [Sales Amount]. Rename the column "Weekly Sales".
OR
Create a column next to [Week of Date] in the table. Enter the formula Sum([Sales Amount]) in the formula bar. Rename your column "Weekly Sales".
- Sort weekly sales data by date in Descending order, ensuring a unique sort order.
- Once you have your data sorted, create a new column next to you weekly sales column and enter in [Weekly Sales]/ Lead ([Weekly Sales])
- Change the column format to percent.
Create Flexible Reports with Parameters
Parameters are a powerful way to give viewers control of how data is calculated on a workbook. Parameters replace constant values in equations, allowing viewers to dynamically update inputs. They can be changed on the fly within a workbook.
What-if or user-input analysis lets users explore different scenarios, and parameters make setting it up easy. By changing the value of the parameter, multiple visualizations can be updated showing how different inputs affect the outcome. You can set up a parameter for expected increase in sales and see how changing the expectation changes the sales forecast. A parameter with the assumed cost can be quickly updated as prices fluctuate. Multiple parameters can be used in the same equation, letting you set up flexible scenarios. The possibilities are endless.
A workbook showing segments or cohorts also benefits from parameters. Let users choose which segment to focus one, and all the charts on the workbook can update to show that particular segment of the total population.
The usefulness of parameters goes beyond these specific examples though. They can be used any time you want to be able to dynamically update values that affect multiple equations in a spreadsheet. Parameters can be text, number, or dates, and can be open input or a defined list.
Parameters are defined in a control element. Once defined, they can be used in formulas just like column names are.
Once you get comfortable with parameters, you think of dozens of ways to use them.
Finding User Sessions
The concept of a “user session” is central to many product’s analytics. The number of actions someone takes, how long someone is spending in the product, what actions lead to others. It’s all based around the concept of “user sessions”.
But what is a user session?
The answer depends. Some applications will define the end of a user session as the application being closed. Chat apps may have a user opening and closing the application several times but still have it all be in one user session. Lack of actions may define user sessions, even if the application is technically open the whole time.
Some tools will automatically define a user session for you, but there are cases that you’ll want to use raw event data to define your user sessions.
What is a user session? A period of uninterrupted activity from a single user.
This may seem like a straightforward definition, but it actually has a lot of variables. What is a user? Is it someone logged in, anyone on a page?
What is uninterrupted activity? If someone has 5 minutes between activity, is that a new session? 10 minutes?If someone is inactive for 5 minutes, and then goes to a new page is that different than if they take action on the same page?
Answers differ from product to product, and may differ for the same product depending on what answers you are looking for. That’s why there is benefit in being able to define a user session manually, and have the logic of that easily exposed.
Define a User Session
We are going to define our users based on a unique identifier, in this case their email. Using Sigma parameters, we’ll set up a variable to determine the time between actions that will count as a new sessions.
Create a new parameter. Name it [Minimum Time Between Sessions (m)] and set the value at 10. You'll be able to easily update this number at any time.
Find the start of a user session
To do this analysis, your data needs two things: a unique user identifier, and time stamps for actions.
We’ll be looking at the gaps between actions for each user. When the gap is more than what we set as the minimum time between sessions, the next set of actions will count as a new session.
First, set up the data for analysis.
- Organize your data based on user. Create a new grouping defined by your unique user identifier, like user number or email.
- Organize the Timestamp data. Sort [Timestamp] ascending, which will put the first user events at the top of the list, and the most recent events at the bottom.
Now that our data is organized, we can start analyzing it.
Calculate the Time Between Each of the Events
Create a new Column to calculate the time between events. Call this [Time Between Events (m)]. Using the function DateDiff and the Window Function Lag we can calculate the time between the current event and the even right before.
DateDiff(“minute”, Lag([Timestamp]), [Timestamp])
Minute indicates we want the difference between the two dates to be reported in minutes. Lag([Timestamp]) puts the previous row’s Timestamp as the first date in the DateDiff function. [Timestamp] puts the current row’s timestamp as the ending date in the DateDiff function. This will give us the time in minutes between the current row’s timestamp and the previous row’s timestamp. The value for the first row of every group will be null because the Lag function is constrained to the windows defined in the User grouping.
Identify the Start of Each Session
Create a new column named [Start Session]. We will use the If function to determine which values to populated the new column with.
First we set up the logic that defines what counts as a new session. We’ll use our parameter [Minimum Time Between Sessions (m)] in the logic to determine if a timestamp is the beginning of a session. If the time between events is more than [Minimum Time Between Sessions (m)], we count that timestamp as the first of a new session.
The way we calculated the time between sessions means that the very first value for “time between sessions” will be null for each group. If the value is null, we print the value for [Timestamp].
If([Time Between Events (m)] > [Minimum Time Between Sessions (m)], [Timestamp], IsNull([Time Between Events (m)]) = True, [Timestamp])
This function will create a column where there is a value contained in every row that has a Timestamp that is the beginning of a user sessions, and a null value in all the other rows.
Map Each Event to Its Session
Now that we have a column with the timestamp values that start each session, we need to map the values to every event in the session.
Using FillDown we can easily print the the start of session time to every event in the user session. FillDown replaces a column’s null values with the closest prior non-null value. Create a new column [Start of Session].
FillDown([Start Session])
You can now group by [Start of Session] to organize your data into user sessions.
To assign each session a session number, use the RowNumber function. Create a new column [User Session Number] on the level that is grouped by [Start of Session].
RowNumber([Start of Session])
Calculate Average Session Length
Now that we have the start time of each session, we can calculate several different metrics.
To calculate the length of a session, we will find the difference between the first and the last timestamp in each session. Create a new column [Length of Sessions (s)] under the grouping defined by [Start of Session]. We’ll calculate the length in seconds.
DateDiff(“second”, [Start of Session], Max([Timestamp]))
Now that we know the length of each session, we can aggregate the [Length of Sessions (s)] column to find the average length for each user. Create a new column under the grouping defined by [User Email].
Avg([Length of Session (s)])
Once you create this Workbook table, you can build all types of analysis on top of it. You can create aggregates to count types of actions in a session. You can create cohorts of users based on the number of user sessions. Plus answers to any other questions you have.
Perform a Cohort Analysis
In this example, we’ll divide customers into cohorts based on the time to second sale. We’ll start with a list of customers, time of first order, and time to second sale, and we’ll end with the breakdown of cohorts shown below. You can follow the same steps to create cohorts based on any numeric metric.
- Determine your cohorts. You can use column details to get a high level overview of how your data is spread out. That can help you choose your initial cohorts. We are going to divide our customers into three cohorts: One week, One Month, Over a Month.
- Translate your categories into numbers that correspond to your data. Using the number of days between sales, we have three ranges that define our cohorts.
One Week: <=7
One Month: <=30
Over a Month: 30<
- Create your bins using the BinRange function. BinRange assigns data to user-defined bins. Each number in the BinRange function represents the inclusive lower limit of the bin. Read more about using BinRange to assign your data to categories.
Create a column called ‘Cohort Bins’ and use the function BinRange([First to Second Order], 8, 31). This creates three bins: Bin 1 is everything less than 8, Bin 2 is the number 8 and everything between 8 and 31, Bin 3 is everything 31 and greater. The outputs will be the number 1, 2 or 3.
This is a good time to check that the cohorts are being sorted the way you expect. You can spot check in the data, and use Column Details to check how the bins are distributed.
- If you like, you can now name your cohort bins using the Choose function. This helps keep track of what the bin numbers mean. With Choose, the value returned depends on the number inputted. When the input value is 1, Choose returns the first provided value, and so on. We will use the column ‘Cohort Bins’ as the input values, and provide a list of names that correspond to our bins.
- Create a new column named ‘Time to Sale’ and use the formula Choose([Cohort Bins], “One Week”, “One Month”, “Over a Month”). Check again to make sure the names of your bins are lining up to your cohorts correctly.
- Now that you have your bins, you can group your worksheet to see your cohort analysis. To see stats for your cohorts by month, group by month, and then by your cohorts. Use the column menu to select Truncate Date, then Month to create a column of dates that only includes the month and year data. Create a Group By with your new column. Next, create a Group By with [Cohort Bins] and [Time to Sale] as the keys.
- Let’s perform some cohort analysis! To count the number of customers in each cohort for each month, create a new column called ‘Count’ and use the formula Count().
To see the percentage in each cohort each month, we need to calculate the number in each cohort divided by the total customers. To find the Total number of customers each month, create a new column on the month level named ‘Total Monthly Customers’ and enter the formula Count(). Create a new column under the cohort level called ‘% Cohort’ and enter in the formula [Count] / [Total Monthly Customers]. Use the column menu, and change the format to Percent.
Once you have your cohorts assigned, you can group your data in several ways to get different views of your cohorts. You can assign several cohorts to the data, and try different analysis to reveal different trends in the data. The possibilities are unlimited!
Split Name from Domain on Emails
In this example, we'll walk you through a common use case of separating a name from a domain for emails. This is helpful for any company that is working with user data. This functionality is similar to 'Split text to columns' in Google Sheets or 'Left' and 'Right' functions in Excel. There are two main uses for splitting emails:
- Split for user name to run analysis on each user by name.
- Split for domain to get company name, you may then run analysis by each Organization. This is a very common use case for Account-Based Marketing, assigning domains to match accounts is made easy through this.
3. Select the drop-down menu on your email column. Click on 'Split Column' to open up the UI that separates the name from the email.
4. Split the email by "@". Select 'Custom' for your delimiter, type in the @ symbol, and make sure there are two split columns.
Once you have these split you can then group by user name or split the domain again to pull organization name.
Year Over Year Numbers by Month
Year over year data viewed by month can give us insights on how our business is doing while taking into account seasonality. In Sigma, you can calculate your data in just a few steps!
Group by Month
- Isolate the month portion of your dates. Create a new column next to your date column, and use the formula DatePart("month", [Date]) where [Date] is the name of your date column. This will give you a column of data with the month represented by the numbers 1 through 12. We'll call this column [Month of Date].
- Create a new grouping defined by [Month of Date]. You can do this in the left panel by dragging the column up to Groupings, or by clicking the + next to the Groupings heading.
- Sort the [Month of Date] column by ascending order.
Group by Year
- Isolate the year portion of your dates. Create a new column next to your date column, and use the formula DatePart("year", [Date]) where [Date] is the name of your date column. This will give you a column of data with just the year. We'll call this column [Year of Date].
- Create a new grouping defined by [Year of Date] underneath the grouping defined by [Month of Date]. You can do this in the left panel by dragging the column up to Groupings, or by clicking the + next to the Groupings heading.
- Sort the [Year of Date] by descending order, so the most recent year is on top.
Calculate your Monthly Sales
- In the left panel, add the column with your sales data to the Grouping defined by [Year of Date]. Sigma will automatically apply the aggregate Sum. This will give you all of the sales that happened in a specific month in that specific year.
- Rename your column to "Monthly Sales".
Calculate the Year over Year Data for the Month
- Create a new column in the [Year of Date] grouping.
- Use the formula Lead to find the previous month's sales. Use the formula Lead([Monthly Sales]). Rename the column to "Previous Month".
- Create a new column in the [Year of Date] grouping.
- Use the new [Previous Month] column to calculate the percent change. Use the formula ([Monthly Sales] - [Previous Month]) / [Previous Month]
- Change the column format to percent.
Your workbook table now shows your year over year data by month. The values are calculated via window functions, which means that it is affected by how the table is grouped and sorted. If you would like to see the year over year values in a different configuration, you should create a child element which will use your year over year table as a source.
Creating a child table turns all of the calculated values into static values, allowing you to rearrange the table without interrupting the window functions.
Comments
Please sign in to leave a comment.