Data Governance with MFour Studio

Understanding MFour’s Behavior Data Lifecycle from Capture to Consumption.

The Importance of Data Governance.

Data governance encompasses the activities that ensure data is secure, private, accurate, available, and usable. It includes the responsibilities of MFour’s data stewards, their processes, and the technology supporting them throughout the data lifecycle.

Why It's Important

Data governance ensures users engage with accurate, high-quality information in the right context, without having to wrangle it into a consumption-ready form.

Architecture Matters

The key to good data governance is being able to trust the end product while having transparency into, and lineage for, the rules and their enforcement.  MFour has invested heavily in a platform and architecture that enable this.  Tenets of the architecture include immutability (delete nothing), lineage (see what changed, where, and why) and product discretion.

This is done by creating data zones (Bronze, Silver, Gold) that preserve the data and allow it to be curated both synchronously and asynchronously while maintaining compliance.

A Bit About the Zones

Bronze - This layer represents where data is landed.  Whether it's structured, semi-structured or unstructured, the data is landed and forever kept in its original state.

Silver - This layer is where we start to normalize the data.  In addition to mapping all data to a common structure, we apply data quality rules that represent the minimum standard for presentation.  As an example, if a location record in Bronze has a departure time that was recorded before the arrival time, we would reject that record.
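
The departure-before-arrival rule could be sketched roughly as follows. This is a minimal illustration, not MFour's actual pipeline: the LocationRecord fields and the apply_silver_rule function are hypothetical names chosen for the example.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class LocationRecord:
    # Hypothetical shape of a location record; the field names are assumptions.
    record_id: str
    arrival: datetime
    departure: datetime


def apply_silver_rule(bronze_records):
    """Split records into those that pass the rule and those that fail it.

    The Bronze input list is only read, never modified, so the original
    records stay exactly as they were landed.
    """
    accepted, rejected = [], []
    for record in bronze_records:
        if record.departure < record.arrival:
            rejected.append(record)  # departure recorded before arrival
        else:
            accepted.append(record)
    return accepted, rejected


# Made-up sample data for illustration only.
bronze = [
    LocationRecord("a", datetime(2023, 1, 1, 9, 0), datetime(2023, 1, 1, 9, 30)),
    LocationRecord("b", datetime(2023, 1, 1, 10, 0), datetime(2023, 1, 1, 9, 45)),
]
silver, rejected = apply_silver_rule(bronze)
```

Here record "a" passes into Silver and record "b" is rejected, while both remain untouched in Bronze.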

Gold - This is the customer interaction layer.  Gold allows the data management team to flex data into a structure that fits the product use case, since different products may require different business rules.  

The same data can be different across product tiers and still be correct if the product context differs.  

Historically this created two problems.  1) The technology did not allow for it: everyone relied on a centralized data warehouse, where you could not physically store two versions of the same record.  2) Even if you were able to store the record more than once, it was hard to surface why two versions existed and to provide insight into the data lineage.

Because everything flows linearly from Silver, you are able to create any number of product views while maintaining data lineage.
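
One way to picture several product views flowing from the same Silver data is sketched below. Everything here is an assumption made for illustration: the build_gold_view function, the dwell_minutes field, the product names, and the shape of the lineage entry are hypothetical and do not describe MFour's implementation.

```python
def build_gold_view(silver_rows, product, rule, rule_name):
    """Derive a product-specific view from Silver without altering Silver."""
    view = []
    for row in silver_rows:
        if rule(row):
            # Copy the row and note which Silver row and rule produced it.
            view.append({
                **row,
                "product": product,
                "lineage": {
                    "source": "silver",
                    "source_id": row["id"],
                    "rule": rule_name,
                },
            })
    return view


# Made-up Silver rows for illustration only.
silver_rows = [
    {"id": 1, "dwell_minutes": 3},
    {"id": 2, "dwell_minutes": 12},
]

# Two hypothetical products with different business rules.
all_visits = build_gold_view(
    silver_rows, "product_a", lambda r: True, "all_visits"
)
long_visits = build_gold_view(
    silver_rows, "product_b", lambda r: r["dwell_minutes"] >= 5, "dwell_at_least_5_min"
)
```

The two views disagree about which visits count, yet each is correct for its product, and every Gold row records the Silver row and rule it came from.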

The Rules

A very important word we use is “quarantine”.  When we quarantine data, we isolate it but don’t delete it.  Data is quarantined rather than deleted because it may hold future value and be required later.

What we restrict:

Location Traffic - We quarantine location events that fit under the following conditions:

  • Malformed Records - Records generated at the device level that may contain incomplete or conflicting data points
  • Low Confidence - Statistical methods are applied to rule out interactions likely to be the result of “walk bys” or “near proximity” interactions
  • Traffic Specific to a Gig Economy Worker - Combining our app+web data, we are able to restrict visits to locations that likely result from Gig Economy activity
  • Employee/Work - Statistical methods are applied to rule out frequent interactions with locations likely driven by work or business patterns
  • Homes+Residents - Removed from data set 

Apps - We restrict Apps shown that fall under these categories:

  • Not Safe for Work - language, pornography 
  • Market Research - sites such as SurveysontheGo.com, and other consumer panel sites.
  • Micro Income - apps that promote compensation or rewards for certain behaviors
  • Gig Economy - consumer behaviors done on behalf of another consumer via a service like Instacart, Grubhub or Uber.

Websites - We restrict Websites shown that fall under these categories:

  • Not Safe for Work - language, pornography 
  • Market Research - sites such as SurveysontheGo.com, and other consumer panel sites.
  • Micro Income - websites that promote compensation or rewards for certain behaviors
  • Gig Economy - consumer behaviors done on behalf of another consumer via a service like Instacart, Grubhub or Uber.
  • App Redirects - Sites used as proxies or for cookie purposes, which often have an outsized visitation record without user awareness
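
The category restrictions above amount to a lookup against a set of restricted categories. The sketch below assumes each app or website event already carries a category label; the category keys, the split_by_category function, and the sample events are hypothetical, and how categories are actually assigned is not covered here.

```python
# Hypothetical category keys mirroring the lists above.
RESTRICTED_CATEGORIES = frozenset({
    "not_safe_for_work",
    "market_research",
    "micro_income",
    "gig_economy",
    "app_redirect",  # applies to websites only in the lists above
})


def split_by_category(events, restricted=RESTRICTED_CATEGORIES):
    """Separate events that can be shown from those that are held back."""
    shown, held_back = [], []
    for event in events:
        if event["category"] in restricted:
            held_back.append(event)  # isolated, not deleted
        else:
            shown.append(event)
    return shown, held_back


# Made-up events for illustration only.
events = [
    {"name": "example-news.test", "category": "news"},
    {"name": "SurveysontheGo.com", "category": "market_research"},
]
shown, held_back = split_by_category(events)
```

Keeping the held-back events in their own list, rather than dropping them, matches the isolate-but-don't-delete idea behind quarantine.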

Other Data Governance Efforts

App Naming 

We keep a table that maps each “captured app name” to a “preferred app name”.  For various reasons, the same app may have different app names within the same ecosystem (iOS or Android) and different app names across ecosystems (iOS and Android).

Every 90 days, we review the top 2,500 iOS apps and the top 2,500 Android apps (by event volume) to ensure naming consistency.  This ensures that when you look at total traffic for YouTube, you are seeing the correct total.

Example uncleansed app names

youtube

youtube: watch, listen, stream
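
A mapping table of this kind could be applied as in the sketch below. The PREFERRED_APP_NAME dictionary stands in for the real table, and the function name and event counts are made up for illustration.

```python
from collections import Counter

# Hypothetical mapping table: captured app name -> preferred app name.
PREFERRED_APP_NAME = {
    "youtube": "YouTube",
    "youtube: watch, listen, stream": "YouTube",
}


def total_events_by_app(events):
    """Sum event counts under each app's preferred name."""
    totals = Counter()
    for captured_name, count in events:
        # Names missing from the table are kept as captured.
        preferred = PREFERRED_APP_NAME.get(captured_name, captured_name)
        totals[preferred] += count
    return totals


# Made-up event counts for illustration only.
events = [
    ("youtube", 120),
    ("youtube: watch, listen, stream", 80),
    ("some other app", 5),
]
totals = total_events_by_app(events)
```

Both captured names roll up into a single YouTube total, which is the consistency the 90-day review is meant to protect.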

California Consumer Privacy Act and other Privacy Requirements

MFour Studio data is consistently kept in compliance with state and federal privacy laws. 
