Data Governance - Notes
Data Governance:
Data Governance is the set of rules, processes, and accountability around data.
Data Governance Goals:
The organization should use data in a routine way.
Harmonized Data Sources
Access for the Right Roles
No Extra Access
Ownership of Data
Who is responsible for Data Accuracy
Who is responsible for the data being managed.
Who is responsible for Regular Updates on the data.
Data Governance vs Data Management:
Data Governance outlines the overall structure that should exist. It has rules, processes and accountability.
It is more about what should happen and how it should happen.
It is also about the use of data and making it useful. The aim is to make sure quality data is accessible to the right people in an efficient way throughout the organization.
It means access to systems for certain people, and an understanding of “Who has access? Why do they have access? What are they doing with the information that exists?”
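The "who, why, what" questions above can be sketched as an explicit grant table plus an access log. This is only a minimal illustration: the role names, dataset names, `GRANTS` table, and `request_access` helper are all hypothetical and do not describe any specific tool.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

# Hypothetical grants: (role, dataset) -> the reason the access exists ("why").
GRANTS = {
    ("sales_analyst", "order_data"): "Builds weekly sales reports",
    ("support_agent", "customer_data"): "Looks up shipping addresses",
}

@dataclass
class AccessLog:
    """Records every access attempt ("who" did "what", and whether it was allowed)."""
    entries: list = field(default_factory=list)

    def record(self, role, dataset, action, allowed):
        self.entries.append({
            "when": datetime.now(timezone.utc).isoformat(),
            "role": role,
            "dataset": dataset,
            "action": action,
            "allowed": allowed,
        })

def request_access(role, dataset, action, log):
    """Allow access only if an explicit grant exists, and log every attempt."""
    allowed = (role, dataset) in GRANTS
    log.record(role, dataset, action, allowed)
    return allowed

log = AccessLog()
ok = request_access("sales_analyst", "order_data", "read", log)
denied = request_access("sales_analyst", "customer_data", "read", log)
```

Because nothing is allowed without an entry in the grant table, "No Extra Access" is the default, and the log answers what each role actually did.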
Who is involved?
Data Owner or Data Sponsor: These are the people that have ultimate decision-making ability about the data and have ultimate accountability for the data being correct and up-to-date.
In some organizations, there can be multiple data owners or sponsors for different data sets.
In larger organizations, there can be data stewards, subject matter experts and data champions. These are the people that are working more with the data. They will really understand the content of the data.
Larger organizations typically have a data governance committee. This group is ultimately responsible for all the decisions that are made. The committee resolves conflicts between different groups. It also helps decide which implementations need to be standardized: how data is used, stored, or accessed across the organization.
The committee acts as a central resource to make sure that data of different types isn't implemented in a lot of different ways. Different data may be stored in one single system, but access to all of it is not granted automatically to everyone who reaches it from different applications.
Determining the Scope:
First, scope the most important and immediate data need that must be part of Data Governance.
Once the first one is done (e.g., client data), move on to the next one.
Document the Available Data:
What information is in the data?
Where does it come from?
What are the multiple sources for the same organization?
Who owns the data?
Who is the expert in it?
How often is it updated?
Who monitors it?
Who accesses the data?
What is it used for?
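The questions above can be captured as one record per data set, so that unanswered questions are easy to spot. This is a rough sketch only: the `CatalogEntry` class, its field names, and the sample values are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class CatalogEntry:
    """One documented data set; each field answers one of the questions above."""
    name: str
    contents: str          # What information is in the data?
    sources: list          # Where does it come from?
    owner: str             # Who owns the data?
    expert: str            # Who is the expert in it?
    update_frequency: str  # How often is it updated?
    monitor: str           # Who monitors it?
    accessed_by: list      # Who accesses the data?
    used_for: str          # What is it used for?

    def missing_answers(self):
        """Return the names of fields that are still empty."""
        return [key for key, value in vars(self).items() if not value]

# Hypothetical example values.
entry = CatalogEntry(
    name="Client data",
    contents="Customer name, customer ID, shipping and billing address",
    sources=["CRM", "Order system"],
    owner="Head of Sales",
    expert="",
    update_frequency="Daily",
    monitor="",
    accessed_by=["sales_analyst"],
    used_for="Order fulfilment and reporting",
)
gaps = entry.missing_answers()
```

Here `gaps` lists the questions that still need an answer before the data set counts as documented.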
Start by exploring what is available from multiple sources for the same data.
Sales Data = Client Data + Order Data + Product Data
Data Mapping:
Data Mapping tells us how the data in one source maps to the data in another source, so that combining them gives a complete picture.
Order Data:
Order Number, Order Date, Customer Name, Customer ID, Quantity, Product Number, Price.
Customer Data:
Customer Name, Customer ID, Shipping Address, Billing Address
Other Relationship:
Customer Data:
Customer Name, Customer ID, Shipping Address, Billing Address
Order Data:
Order Number, Order Date, Customer Name, Customer ID, Quantity, Product Number, Price.
Product Data:
Product Number, Size, Color, Price.
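The relationships above can be sketched as a lookup on the shared keys: Customer ID links orders to customers, and Product Number links orders to products. This is a simplified illustration with made-up records. The snake_case field names and the `build_sales_rows` helper are assumptions, and the order record is trimmed to keep the example short.

```python
# Hypothetical sample records, keyed by the fields that link the sources.
customers = {
    "C001": {"customer_name": "Ada Example", "shipping_address": "1 Sample St"},
}
products = {
    "P100": {"size": "M", "color": "Blue", "price": 20.0},
}
orders = [
    {"order_number": "O-1", "order_date": "01/15/2024", "customer_id": "C001",
     "product_number": "P100", "quantity": 2},
]

def build_sales_rows(orders, customers, products):
    """Combine each order with its customer and product details."""
    rows = []
    for order in orders:
        # Customer ID maps Order Data to Customer Data.
        customer = customers.get(order["customer_id"], {})
        # Product Number maps Order Data to Product Data.
        product = products.get(order["product_number"], {})
        rows.append({**order, **customer, **product})
    return rows

sales = build_sales_rows(orders, customers, products)
```

Each row in `sales` now carries order, customer, and product fields together, which is the "complete picture" that mapping is meant to give.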
Metadata:
It means information about the data.
What format should it take by default? What type of information is contained in it?
Example:
Order Data: Order Date
Metadata will tell us what format it is in (MM/DD/YYYY).
It also contains a short description.
Ex: Date the order was made by the customer, or date on which the order was received.
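As a small illustration of the Order Date example, metadata can be stored next to the field and used to check values against the stated format. The `ORDER_DATE_METADATA` record and `matches_format` helper are hypothetical, and the `strptime_format` entry is an assumed translation of MM/DD/YYYY into Python's format codes.

```python
from datetime import datetime

# Hypothetical metadata record for the Order Date field.
ORDER_DATE_METADATA = {
    "field": "Order Date",
    "format": "MM/DD/YYYY",
    "strptime_format": "%m/%d/%Y",  # assumed Python equivalent of MM/DD/YYYY
    "description": "Date on which the customer placed the order",
}

def matches_format(value, metadata):
    """Return True if value parses under the format named in the metadata."""
    try:
        datetime.strptime(value, metadata["strptime_format"])
        return True
    except ValueError:
        return False

valid = matches_format("03/27/2024", ORDER_DATE_METADATA)
invalid = matches_format("2024-03-27", ORDER_DATE_METADATA)
```

Note that `strptime` also accepts single-digit months and days such as `3/7/2024`, so a stricter check would need an additional pattern test.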
Data Scraping:
Data Scraping is finding missing information and automatically pulling it into the field where it is missing.
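A minimal sketch of that idea, assuming the missing value can be looked up in another source by a shared key. The `fill_missing` helper and the sample records are hypothetical.

```python
def fill_missing(records, reference, key, field_name):
    """Return copies of records with empty field_name values filled from reference."""
    filled = []
    for record in records:
        updated = dict(record)  # copy, so the original records stay unchanged
        if not updated.get(field_name):
            match = reference.get(updated.get(key))
            if match and match.get(field_name):
                updated[field_name] = match[field_name]
        filled.append(updated)
    return filled

# Hypothetical reference source and order records with a missing customer name.
customer_reference = {"C001": {"customer_name": "Ada Example"}}
orders = [
    {"order_number": "O-1", "customer_id": "C001", "customer_name": ""},
    {"order_number": "O-2", "customer_id": "C999", "customer_name": ""},
]
result = fill_missing(orders, customer_reference, "customer_id", "customer_name")
```

The first order gets its customer name from the reference source. The second stays empty because no matching customer exists, which leaves the gap visible instead of guessing a value.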
Data Quality:
Data Integrity: Sub Topic of Data Quality.
It is not the overall quality of the data. It is about “How stable is our data?”, “How frequently is the data updated?”, “Can we trust it?”, “How is it updated?”, “How can we know it is not corrupted?”
Data Integrity = Accuracy + Validity + Consistency of Data - across the lifecycle.
It is also about having checks in place to make sure everything is continuously functioning as expected.
All of these lead to a set of rules, processes and policies that are applied across the business to make sure data is being used in a good way throughout the organization.
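The kind of checks described here might look like the following sketch, with one simple test each for validity, consistency, and accuracy. The `check_order` helper, the field names, and the specific rules are assumptions chosen for illustration. In particular, comparing the order price to the product record is only a stand-in for an accuracy check.

```python
from datetime import datetime

def check_order(order, customers, products):
    """Return a list of integrity problems found in one order record."""
    problems = []
    # Validity: the date must parse as MM/DD/YYYY and quantity must be positive.
    try:
        datetime.strptime(order["order_date"], "%m/%d/%Y")
    except ValueError:
        problems.append("invalid order_date")
    if order["quantity"] <= 0:
        problems.append("non-positive quantity")
    # Consistency: the order must reference a known customer and product.
    if order["customer_id"] not in customers:
        problems.append("unknown customer_id")
    product = products.get(order["product_number"])
    if product is None:
        problems.append("unknown product_number")
    # Accuracy (stand-in): the order price should match the product record.
    elif order["price"] != product["price"]:
        problems.append("price mismatch")
    return problems

# Hypothetical sample data.
customers = {"C001": {"customer_name": "Ada Example"}}
products = {"P100": {"price": 20.0}}
good = {"order_date": "01/15/2024", "quantity": 2, "customer_id": "C001",
        "product_number": "P100", "price": 20.0}
bad = {"order_date": "2024-01-15", "quantity": 0, "customer_id": "C999",
       "product_number": "P100", "price": 25.0}

good_problems = check_order(good, customers, products)
bad_problems = check_order(bad, customers, products)
```

Running checks like these on a schedule, rather than once, is what lets them cover the data across its lifecycle.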
Data Management:
Data Management is all about implementing those rules. Its purpose is to make sure that governance is being followed.