MI School Data
May 2012
MI School Data – Functionality Overview
• District/School
  o Summary
  o Quick Facts
  o Openings/Closings
  o School data file
• Assessment and Accountability
  o Dashboard and Report Card
  o MEAP, MME, MI-Access, and ACT College Readiness Indicator (ACT scores)
  o Students not tested report
  o Assessment revised cut scores
• Student
  o Graduation/Dropout
  o Non-resident Report
  o Student Count
• Staffing/Financial
  o Educator Effectiveness
    o Effectiveness Ratings (Principals only 2010/11)
    o Evaluation Factors
• Postsecondary
  o Reports by High School/District
    o Enrollment/Credit Accumulation
    o Remedial Coursework
MI School Data – Current Work
• Earliest Priorities:
  o Migration of Data for Student Success (D4SS) Dynamic Inquiries
  o Additional dashboard metrics (Best Practices)
    o K-3 Pupil Teacher Ratio, General Fund Balance, Salaries, Days of Instruction
  o Additional displays/reports from MSLDS data sources:
    o Pupil Attendance, Retention in Grade, Pupil Mobility
  o Usability improvements
    o “Front Page,” Location Selection, “Sticky Settings”
  o User Administration Improvements
• Early Childhood
  o More stakeholder discussion required
• Additional K-12
  o Finance - Source: FID
  o Staffing - Source: REP
  o Special Education public reporting and data portrait queries
  o Top to Bottom Listing of Schools
• Postsecondary
  o Enrollment, Credit Accumulation, & Remediation - User interface
    o By High School
    o By Institution of Higher Education
  o Requirements initiated for additional reports
  o More stakeholder discussion required
• Workforce Reports
  o Workforce supply/demand study
MI School Data – Improved Location Set
Improved Location Set Sort Order
Multiple Parameter Display
A New Home Page?
CEPI Data Quality Overview
May 2012
CEPI Data Quality – Overview
“YOUR DATA ARE NOT NECESSARILY WRONG!”
The goal of our data quality process is finding ANOMALIES, not ERRORS
An ERROR is: “a deviation from accuracy or correctness”
An ANOMALY is: “an odd, peculiar or strange condition, situation, quality, etc.”
(definitions from Dictionary.com)
CEPI Data Quality – Applications
• CEPI has several data collection applications
  o The Michigan Student Data System (MSDS)
  o Graduation and Dropout Application (GAD)
  o Title I Supplemental Education Services (SES)
  o The Financial Information Database (FID)
  o The Educational Entity Master (EEM)
  o The Registry of Educational Personnel (REP)
  o The School Infrastructure Database (SID)
• We will be focusing primarily on the last three databases (REP, SID and EEM)
CEPI Data Quality – Collection Windows
• Data are submitted for each of our CEPI Applications during Collection Windows (except the EEM, which is always open for updates)
• REP has two collections per year
  o The End-of-Year (EOY) REP collection is open from April 1 through June 30
  o The Fall REP collection is open from September 1 through the first business day in December
• The SID collection is once a year from April 1 through June 30
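The collection windows above can be expressed as a small date check. This is a minimal sketch, not CEPI's implementation: the function names are hypothetical, and "first business day" is assumed to mean the first Monday-to-Friday date, ignoring holidays.

```python
from datetime import date, timedelta

def first_business_day_of_december(year):
    """First Monday-to-Friday date in December (assumption: holidays ignored)."""
    day = date(year, 12, 1)
    while day.weekday() >= 5:  # 5 = Saturday, 6 = Sunday
        day += timedelta(days=1)
    return day

def open_collections(today):
    """Return the names of the collections that are open on `today`."""
    year = today.year
    windows = {
        "REP EOY": (date(year, 4, 1), date(year, 6, 30)),
        "REP Fall": (date(year, 9, 1), first_business_day_of_december(year)),
        "SID": (date(year, 4, 1), date(year, 6, 30)),
    }
    names = [name for name, (start, end) in windows.items()
             if start <= today <= end]
    return names + ["EEM"]  # the EEM is always open for updates
```

For example, `open_collections(date(2012, 5, 15))` lists the EOY REP and SID collections alongside the EEM.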
CEPI Data Quality – Process
• The data quality process is similar across the applications in the School Data Quality unit
• Data Quality runs are completed at three points in the collection
  o Before the collection opens (pre)
  o During the collection (mid)
  o After the collection closes (post)
• Started by checking 10-20 items in EOY 2007
• Expanded to over 300 in the REP collection alone for Fall 2011
CEPI Data Quality – PRE collection
• Analyzes data from the PRIOR collection
• Prior collection data cannot be modified in the current collection window
• Identifies data elements that can be improved upon in the current collection
• Each district’s authorized users are informed of the findings via e-mail shortly after the collection period opens
• Identifies issues in the data structure and tables of the new collection cycle before they are an issue for the districts
CEPI Data Quality – MID collection
• Snapshot of data submissions taken with about one month left in the collection window
• Identifies anomalies in the current collection
• Each district’s authorized users are informed of the findings via e-mail with time to modify the data before the end of the collection window
• Identifies issues in the data structure and tables periodically throughout the collection period
CEPI Data Quality – POST collection
• Snapshot of data submissions taken immediately after the close of the collection
• Identifies anomalies in the current collection now completed
• Analysis is completed in about a week
• Each district’s authorized users are informed of the findings via e-mail
• Data cleansing period takes place allowing the authorized users to modify their data prior to it being used for reporting
CEPI Data Quality – What are we looking for?
• System edit violations or table integrity issues
• Data values that are anomalies
  o Values outside of the expected range, but that might not be ERRORS
  o Values that don’t match other data
  o Interactions with other data collections
  o Issues arising out of the whole of the collection
  o Comparisons to prior submissions
CEPI Data Quality – System Edits
• The system validates each record as it is processed
• Ensure required fields are submitted
• Ensure that the dependencies with other fields are followed
• Most of these system edits are also built into the data quality process
• Issues errors and warnings
  o Errors prevent the record from being saved
  o Warnings allow the record to be saved, but the data may need to be modified
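The difference between errors and warnings can be sketched as a per-record validator. This is an illustration only: the field names (`last_name`, `birth_date`, `status`, `termination_date`, `fte`) and the rules themselves are assumptions, not the actual system edits.

```python
def validate_record(record, required=("last_name", "birth_date", "status")):
    """Return (errors, warnings) for one record. All rules here are hypothetical."""
    errors, warnings = [], []
    # Required fields must be submitted.
    for field in required:
        if not record.get(field):
            errors.append(f"missing required field: {field}")
    # Dependency between fields: a terminated record needs a termination date.
    if record.get("status") == "terminated" and not record.get("termination_date"):
        errors.append("terminated status requires termination_date")
    # A warning does not block the save, but the value may need to be modified.
    if (record.get("fte") or 0) > 1.0:
        warnings.append("FTE above 1.0; verify")
    return errors, warnings

def can_save(record):
    """A record is saved only when it raises no errors; warnings are allowed."""
    errors, _ = validate_record(record)
    return not errors
```

Each record is judged on its own here, which is the limitation the next slide describes.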
CEPI Data Quality – System Edits
• There are limitations to what the system can validate
  o Cannot look at the submission as a whole
  o Cannot look at the prior year’s submission
  o Cannot have exceptions to the rules
  o Cannot be as flexible as the data quality process
• Several of the items in the Data Quality process have been turned into new system edits
SID DATA QUALITY
School Infrastructure Database
SID Data Quality – Basics
• Mostly looking for outliers
• Issues with Shared Space Entities
• Dual Enrollment data in high schools and only in high schools
• System Edit Checks
SID Data Quality – Scatter Plots
Examine scatter plots of the raw number submitted and the "rate" per student reported
SID Data Quality – Scatter Plots
• Identify “outliers” based on different factors
  o Too high of a number
    o A building with 4500 incidents of bullying
  o Too high of a rate
    o A building with 300 students and 450 incidents of truancy
  o Some incident types will flag any value reported as an outlier
    o Homicides
    o Drive-by shootings
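The three outlier factors can be sketched in code as follows. The thresholds, the dictionary keys, and the incident-type labels are assumptions for illustration; the real cutoffs come from the lines drawn on the scatter plots. The sketch also assumes every building reports a positive student count.

```python
def flag_outliers(buildings, max_count, max_rate,
                  always_flag=("homicide", "drive-by shooting")):
    """Return (building name, reasons) for each building flagged as an outlier.

    Each building is a dict with hypothetical keys: name, incident_type,
    incidents, students (students must be greater than zero).
    """
    flagged = []
    for b in buildings:
        rate = b["incidents"] / b["students"]  # incidents per student
        reasons = []
        if b["incident_type"] in always_flag and b["incidents"] > 0:
            reasons.append("any value reported")
        if b["incidents"] > max_count:
            reasons.append("count too high")
        if rate > max_rate:
            reasons.append("rate too high")
        if reasons:
            flagged.append((b["name"], reasons))
    return flagged
```

With `max_rate=1.0`, the building with 300 students and 450 truancy incidents is flagged on rate alone (1.5 incidents per student).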
SID Data Quality – Robbery Plot
[Figure: robbery scatter plot]
• These are the lines indicating the outliers
• This line indicates the minimum we want to flag as an anomaly
• The five circled points are what have been identified as outliers and feedback will be sent on them
REP DATA QUALITY
Registry of Educational Personnel
REP Data Quality – Starting out
• Started looking at data using Excel and Access
• Focused on rules that could not be built into the Application
• Started with a dozen checks in EOY 2007
• Grew to 25 checks in Fall of 2007
• Continues to grow each collection
• Examples:
  o Suffixes in First or Middle Name
  o No Title IX Coordinator Submitted
  o Too many classes taught by a single teacher
REP Data Quality – Name Issues
• Data Quality Checks built on name fields:
  o Titles in name fields
    o First Name of “Dr. Timothy”
    o Last name of “Smith, DDS”
  o Name changes
  o Incorrectly submitted Suffixes
  o First names incorporating “To the Estate of”
  o Names of “Test Data” and other artificial names used for testing purposes
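Several of the name checks above lend themselves to simple pattern matching. The sketch below is illustrative only: the lists of titles and suffixes are assumptions and far from complete, and the real checks were built in other tools.

```python
import re

# Hypothetical, incomplete lists of titles and suffixes.
TITLE_PATTERN = re.compile(r"\b(dr|mr|mrs|ms|dds|phd|md)\b\.?", re.IGNORECASE)
SUFFIX_PATTERN = re.compile(r"\b(jr|sr|ii|iii|iv)\b\.?$", re.IGNORECASE)

def name_issues(first_name, last_name):
    """Return the name anomalies found in one staff record."""
    issues = []
    if TITLE_PATTERN.search(first_name) or TITLE_PATTERN.search(last_name):
        issues.append("title in name field")
    if SUFFIX_PATTERN.search(first_name.strip()):
        issues.append("suffix in first name")
    if "to the estate of" in first_name.lower():
        issues.append("estate wording in first name")
    if f"{first_name} {last_name}".strip().lower() == "test data":
        issues.append("test record")
    return issues
```

Both slide examples are caught: `name_issues("Dr. Timothy", "Jones")` and `name_issues("Ann", "Smith, DDS")` each report a title in a name field.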
REP Data Quality – Date Issues
• Data Quality Checks built on date fields:
  o Teachers that are too young
  o Staff members that are too old
  o Staff members that are hired too young
  o Enforcing the order of dates
    o Birth Date < Hire Date < Termination Date
  o Terminated records without a valid termination date
  o Credential Date issues
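The date rules above, including Birth Date < Hire Date < Termination Date, can be sketched as one function. The age limits (16 at hire, 100 overall), the `as_of` date, and the parameter names are assumptions for illustration; the slides do not give the actual cutoffs.

```python
from datetime import date

def age_on(birth, when):
    """Whole years of age on the date `when`."""
    return when.year - birth.year - ((when.month, when.day) < (birth.month, birth.day))

def date_issues(birth, hire, termination=None, status="active",
                as_of=date(2012, 5, 1), min_hire_age=16, max_age=100):
    """Return date anomalies for one staff record (thresholds are hypothetical)."""
    issues = []
    if not birth < hire:
        issues.append("birth date is not before hire date")
    elif age_on(birth, hire) < min_hire_age:
        issues.append("hired too young")
    if age_on(birth, as_of) > max_age:
        issues.append("too old")
    if termination is not None and not hire < termination:
        issues.append("hire date is not before termination date")
    if status == "terminated" and termination is None:
        issues.append("terminated without a termination date")
    return issues
```

The function returns a list of findings rather than rejecting the record, in keeping with the goal of finding anomalies rather than errors.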
REP Data Quality – Title IX Issues
• Data Quality Checks built on Title IX Coordinator submissions:
  o No Title IX coordinator Submitted
  o Title IX coordinator submitted with a full FTE
  o Title IX coordinator submitted with a terminated status and no other staff member assigned to that position
• Have developed over time
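The Title IX checks look at all of a district's staff records together, which a per-record system edit cannot do. A minimal sketch, assuming each record is a dict with hypothetical `assignment`, `status`, and `fte` keys:

```python
def title_ix_issues(staff):
    """Return Title IX coordinator anomalies for one district's staff records."""
    coordinators = [s for s in staff if s["assignment"] == "title_ix_coordinator"]
    if not coordinators:
        return ["no Title IX coordinator submitted"]
    issues = []
    active = [c for c in coordinators if c["status"] != "terminated"]
    if not active:
        # Every submitted coordinator is terminated and nobody else holds the position.
        issues.append("coordinator terminated with no replacement")
    if any(c["fte"] >= 1.0 for c in active):
        issues.append("Title IX coordinator submitted with a full FTE")
    return issues
```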
REP Data Quality – Current State
• For Fall 2011:
  o Over 300 Checks were run
  o Districts were notified about 48 different issues
  o 1381 messages were sent out
  o 1058 different users of 540 districts received data quality feedback
REP Data Quality – Near Future
• Data Quality Checks are being added and improved
• Looking at improving the following issues:
  o Grade-Levels of Students submitted in MSDS
  o Accounting Function Codes and their use in the FID
  o Data contained in the Michigan Online Educator Certification System (MOECS)
  o Teacher-Student Data Link (TSDL) related issues
EEM DATA QUALITY
Educational Entity Master
EEM Data Quality – Differences
• EEM is different from the other collections in that it does not have a window
• Data quality is ongoing and periodic
• Often checking for data that is not in the correct format
• A starting point for using our data profiling tools
EEM Data Quality – Sample Issues
• Issues between EEM and other applications
  o Grades for a student or teacher
  o Educational Settings
  o Lead Administrator issues
• System edits working
• Physical Addresses that do not exist
• Data profiling has allowed us to find issues in the contents of the data where they might not be in a consistent form
EEM Data Quality – Profiling Finds
• Fields that contain both the descriptive value and the code value in the same field
  o County records that contain both “Wayne” and “81” referring to the same thing
• Leading zeros or spaces in a text field
  o State entries of “_ _ _ _ MI”
  o Congressional Districts of “1” “01” and “001”
• Zip Code formatting
  o Zip+four containing the dash or not?
• Capitalization inconsistencies
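Findings like these typically come from pattern profiling: reduce every value to its "shape" and count the shapes in a column. The sketch below is a generic illustration of that idea, not the profiling tool CEPI uses; the function names and the choice of a dashed Zip+four as the normal form are assumptions.

```python
import re
from collections import Counter

def profile_pattern(value):
    """Reduce a value to its shape: letters become A, digits become 9,
    and everything else (spaces, dashes) is kept as-is."""
    return re.sub(r"[0-9]", "9", re.sub(r"[A-Za-z]", "A", value))

def profile_column(values):
    """Count how often each shape occurs in a column."""
    return Counter(profile_pattern(v) for v in values)

def normalize_zip(value):
    """Return a Zip Code as 99999 or 99999-9999, or None if it is neither.
    Choosing the dashed form as the standard is an assumption."""
    digits = re.sub(r"\D", "", value)
    if len(digits) == 9:
        return f"{digits[:5]}-{digits[5:]}"
    if len(digits) == 5:
        return digits
    return None
```

A county column holding both “Wayne” and “81” shows up as two shapes (`AAAAA` and `99`), and Congressional Districts of “1”, “01” and “001” as three.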
CEPI DATA QUALITY
Questions and Answers