Title: Creating a Data Governance Program
Presenter: Mike Chapple, University of Notre Dame
This was one of the sessions EDUCAUSE chose to webcast. It focuses primarily on the past year's work, but also covers some things done over the last 5 to 10 years.
It All Starts with a Story…
One day, the President was wondering…how many students do we have?
Naturally, a lot of potential answers depending on who you ask.
SLIDE: how Notre Dame views data governance, using a building to illustrate
Access to Data (Roof)
- Quality & Consistency (current focus)
- Policies & Standards (current focus)
- Security & Privacy
- Compliance
- Retention & Archiving
Technology (Foundation)
Data Driven Decision Making (D3M) = Business Intelligence (as it’s known everywhere else)
- Definitions need to be agreed upon (e.g., what is a student?)
SLIDE: Governance Model
- Executive Sponsors (EVP, CIO)
- Campus Data Steward
- Unit Data Stewards
- Coordinating Committees (Info Governance Committee, D3M Steering Committee)
SLIDE: Domain Objectives
- Data Steward(s) appointment
- Data definitions and business rules
- Data quality guidelines and monitoring process
- Regulatory compliance plan (if applicable)
SLIDE: Building Data Dictionary
- Term, e.g. “Active Student”
- Definition: PLAIN ENGLISH DEFINITION
- Source System, e.g. Banner
- Source Detail, e.g. the SQL query that spells out the gory details of how you get the data
SLIDE: Data Definition Components
- Definition
- Source System / Detail
- Possible Values
- Data Steward
- Data Availability
- Classification
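The components listed on the two slides above could be captured in a simple record type. This is only a sketch in Python, not Notre Dame's actual tooling: the class name, field names, and the `missing_components` helper are all assumptions, and the source detail is a placeholder rather than a real Banner query.

```python
from dataclasses import dataclass, field, asdict


@dataclass
class DataDefinition:
    """One data dictionary entry (hypothetical structure)."""
    term: str
    definition: str
    source_system: str
    source_detail: str
    possible_values: list = field(default_factory=list)
    data_steward: str = ""
    data_availability: str = ""
    classification: str = ""

    def missing_components(self):
        """Return the names of components that have not been filled in yet."""
        return [name for name, value in asdict(self).items() if not value]


# Example entry; only the term and source system come from the slides.
entry = DataDefinition(
    term="Active Student",
    definition="PLAIN ENGLISH DEFINITION",
    source_system="Banner",
    source_detail="SQL query goes here (placeholder)",
)

missing = entry.missing_components()
print(missing)
```

A check like this is one way to see at a glance which terms still need a steward or a classification before they are published.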
SLIDE: Start with Executive Support
This is pretty much an admonition; it really helps. At Notre Dame, responsibility for this campus function landed with IT.
SLIDE: Identify and Involve Stakeholders
Each item to be defined takes a meeting…it’s very time-consuming because you need to have representation from each area. Data is owned by the university, not specific departments!
Notre Dame uses a “RACI” matrix for each defined term
R – responsible (office)
A – accountable (who keeps the group on-track)
C – consult (you have a seat at the table)
I – inform (people who need to know)
The matrix is distributed to all stakeholders so they can fill it in with their preferences.
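A RACI matrix for one term can be represented as a mapping from stakeholder to role code. The sketch below is illustrative only: the stakeholder assignments are made up, and the rules it checks (at least one R, exactly one A) follow a common RACI convention rather than anything stated in the talk.

```python
# Role codes as described on the slide.
RACI_ROLES = {"R": "responsible", "A": "accountable", "C": "consult", "I": "inform"}


def validate_raci(assignments):
    """Return a list of problems found in one term's RACI assignments.

    `assignments` maps a stakeholder name to a single role code.
    The "at least one R, exactly one A" rule is an assumed convention.
    """
    problems = []
    for stakeholder, code in assignments.items():
        if code not in RACI_ROLES:
            problems.append(f"{stakeholder}: unknown code {code!r}")
    codes = list(assignments.values())
    if codes.count("R") == 0:
        problems.append("no responsible office")
    if codes.count("A") != 1:
        problems.append("expected exactly one accountable party")
    return problems


# Hypothetical assignments for the term "Active Student".
active_student = {
    "Registrar": "R",
    "Campus Data Steward": "A",
    "Institutional Research": "C",
    "Financial Aid": "I",
}

print(validate_raci(active_student))                 # no problems expected
print(validate_raci({"Registrar": "C", "IT": "X"}))  # three problems expected
```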
SLIDE: Reconcile Differences Visually
ND had two competing student numbers: “Registrar Count” and “IR Count”
IR count = Externally reportable enrolled student
“Registrar Students” includes some folks like students on leave, zero credit students, etc.
Use a stacked bar, starting with externally reportable enrolled students, adding registrar student populations on top of that.
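The stacked-bar idea amounts to a running total: each population sits on top of the ones before it, so the first segment ends at the IR count and the last ends at the Registrar count. A minimal sketch follows; the counts are invented placeholders, not Notre Dame figures, and the function name is an assumption.

```python
def stack_segments(populations):
    """Turn (label, count) pairs into (label, bottom, top) bar segments."""
    segments = []
    bottom = 0
    for label, count in populations:
        segments.append((label, bottom, bottom + count))
        bottom += count
    return segments


# Placeholder counts for illustration only.
populations = [
    ("Externally reportable enrolled students", 12000),
    ("Students on leave", 150),
    ("Zero-credit students", 90),
]

segments = stack_segments(populations)
ir_count = segments[0][2]          # top of the first segment
registrar_count = segments[-1][2]  # top of the whole stack

for label, bottom, top in segments:
    print(f"{label}: {bottom} to {top}")
```

The `bottom` values are what a plotting library would need to draw each segment on top of the previous one.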
SLIDE: Give the group a Starting Point
- Start with a draft
- Counting matters! Definitions help address this possible problem.
- Don’t use Jargon!
Security and Privacy
Risk-based security program
- Highly Sensitive (SSNs, CCs, Driver’s Licenses, Bank Accounts, ePHI)
- Sensitive (Everything else)
- Internal Information (info that would cause minimal damage if disclosed)
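The three tiers can be read as a lookup with a default: anything not explicitly listed as Highly Sensitive or Internal falls into Sensitive. The sketch below assumes that reading. The element names are hypothetical identifiers, and the contents of the internal set in particular are an invented example, since the talk did not list any.

```python
# Element names are hypothetical; the highly sensitive categories come from the slide.
HIGHLY_SENSITIVE = {"ssn", "credit_card", "drivers_license", "bank_account", "ephi"}
# Assumed example of minimal-damage information (not from the talk).
INTERNAL = {"office_phone_extension"}


def classify(element):
    """Return the classification tier for a data element name."""
    name = element.strip().lower()
    if name in HIGHLY_SENSITIVE:
        return "Highly Sensitive"
    if name in INTERNAL:
        return "Internal Information"
    # "Everything else" on the slide: Sensitive is the default tier.
    return "Sensitive"


for element in ["SSN", "gpa", "office_phone_extension"]:
    print(element, "->", classify(element))
```

Defaulting to Sensitive means a new, unreviewed element is never treated as Internal by accident.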
Compliance
We have to be responsive to new legal realities, since our campuses are like small cities and any law passed probably affects some area on our campus.
All data must be auditable.
- 75% of orgs have at least one person dedicated to IT compliance
- 76% of orgs have a corporate executive-level compliance office or council
Build compliance plans
- Document everything with respect to regulations, e.g. HIPAA
- Everything should be substantiated
Questions
With so many stakeholders, how did you address and resolve differences in data definitions? We didn’t really have very many of those disagreements, because each area was represented in each set of meetings, and there was a solid bond among the reps from each area.
What do you do with data NOT in the data warehouse? You just have to find some way to “chunk the work out.” The output of the program must be pristine, so naturally priorities must be set.
Did ND work with IU, since most of this is the same? No.
What tools are you using to manage metadata? Google Docs for now, great for getting started, but it’s not conducive to long-term maintenance. We’re actually building our own graph database. This tool will ultimately expose this data for other tools.
Any principle for prioritization? Steering committees prioritize based on BI needs of the institution.
Is there an ongoing need for a campus data steward versus a department data steward? In some areas, the data is general or applies to many different populations. Campus steward plays an important coordination role.
Do you consider your work the beginning of a master data management program? Yes!
Do you see shadow systems as being a problem? We’re not really far enough along to have experienced this problem yet. Data is not widely available yet. We refer to this phase as “taking it from the team to the enterprise.”
This is for administrative data, yes? Yes, it does NOT include research data.