Presented by Phil Lim, Sr. Product Manager, Galvanize
Analytics applications for
State Auditors
Phil Lim, Sr. Product Manager, Galvanize
Consulting & Product Management
Risk, Audit, Advisory, Compliance, Analytics, and Information Security
13 years ACL Experience
Computer Assisted Audit Tool
1986 (ACL)
Continuous Monitoring
2003 (CCM)
Audit Management
2011 (ACL Workpapers)
Risk & Compliance Management
2014 (ACL GRC)
Third Party Risk & Information Security
2019 (Rsam)
What happened to
ACL? A journey…
ACL
Uniting audit, security, risk, & compliance
to strengthen organizations.
The operating system for conscious
organizations.
What is ACL being used for?
Use cases & case studies
IT Controls Testing
Medicaid
Unemployment Insurance
Reconciliations
Tax revenue assurance
T&E, PCard monitoring
AP Cost Control
Use Cases for Data
Analytics
Visit https://accounts.highbond.com/inspirations#byprocess for more
Basic Commands in ACL
demo
Sorting & Filtering
Benford & Outliers
Importing & Exporting data
Logs & Scripting
Basic Commands
Fuzzy Matching in ACL
Case sensitivity and special characters
Nicknames and titles: “Joe” vs. “Joseph”
Corporate name prefixes/suffixes: Incorporated, LLC, Partners
Order of names: “First Last” vs. “Last, First”
Many matches between two lists
Any others?
What are some
challenges matching
names from two
tables?
Matching a list of individuals against other
individuals or entities against other entities
Data cleansing using
ScriptHub
All of these are available in ScriptHub:
• Case sensitivity
• Nicknames
• Standardizing corporate names
• Standardizing street names
• Sort words in field
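To make the cleansing steps above concrete, here is a minimal Python sketch of the same ideas: case folding, nickname replacement, removal of corporate suffixes, and sorting the words in a field. It is not the ScriptHub code. The function name `normalize_name` and the two tiny lookup tables are hypothetical and exist only for illustration; a real nickname or suffix list would be far larger.

```python
import re

# Hypothetical, deliberately tiny lookup tables (illustration only).
NICKNAMES = {"joe": "joseph", "bob": "robert", "bill": "william"}
CORPORATE_SUFFIXES = {"inc", "incorporated", "llc", "ltd", "limited", "lp", "partners"}


def normalize_name(name: str) -> str:
    """Return a cleaned, order-independent form of a person or company name."""
    # Case sensitivity: lower-case everything; drop periods so "L.P." becomes "lp".
    cleaned = name.lower().replace(".", "")
    # Special characters: anything other than ASCII letters/digits becomes a space.
    words = re.sub(r"[^a-z0-9]+", " ", cleaned).split()
    # Nicknames: map known short forms to a standard form.
    words = [NICKNAMES.get(word, word) for word in words]
    # Corporate names: drop common suffixes.
    words = [word for word in words if word not in CORPORATE_SUFFIXES]
    # Order of names: sort the words so "Last, First" equals "First Last".
    return " ".join(sorted(words))


if __name__ == "__main__":
    for raw in ["Smith, John", "John Smith", "Joe Smith", "WORLDWIDE CORPORATE HOUSING, L.P."]:
        print(f"{raw!r} -> {normalize_name(raw)!r}")
```

Cleansing both lists this way before matching means the fuzzy techniques only have to absorb genuine spelling differences rather than formatting noise.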
Fuzzy Matching
Techniques
• Useful for employee-vendor matching, PEP list matching, identifying conflicts of interest…
• Levenshtein distance
• Dice coefficient
• Other techniques
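Levenshtein distance
example
Similarity here is 1 minus the distance divided by the length of the first (shorter) string.
Cat > Hat: distance 1; similarity 1 - 1/3 ≈ 67%
Cat > Cats: distance 1; similarity 1 - 1/3 ≈ 67%
Smith > Smythe: distance 2 (Smith > Smyth > Smythe); similarity 1 - 2/5 = 60%
Kitten > Sitting: distance 3 (Kitten > Sitten > Sittin > Sitting); similarity 1 - 3/6 = 50%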
What is the Levenshtein Distance from
Philadelphia to Vancouver? 11.
IT IS VERY SLOW
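For readers who want to check the distances on this slide, below is a minimal Python sketch of the standard dynamic-programming method for Levenshtein distance. It is not ACL's implementation, and the function name `levenshtein` is simply chosen for this example. The nested loops show why the technique is slow: every pair of strings costs roughly len(a) x len(b) steps, and matching two lists means running it for every pair of names.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character inserts, deletes, or substitutions
    needed to turn string a into string b (case-sensitive)."""
    # previous[j] = distance between the first i-1 characters of a
    # and the first j characters of b.
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]  # turning the first i characters of a into "" takes i deletions
        for j, char_b in enumerate(b, start=1):
            substitution = previous[j - 1] + (char_a != char_b)
            insertion = current[j - 1] + 1
            deletion = previous[j] + 1
            current.append(min(substitution, insertion, deletion))
        previous = current
    return previous[-1]


if __name__ == "__main__":
    pairs = [("Cat", "Hat"), ("Cat", "Cats"), ("Smith", "Smythe"),
             ("Kitten", "Sitting"), ("Philadelphia", "Vancouver")]
    for left, right in pairs:
        print(f"{left} > {right}: {levenshtein(left, right)}")
```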
DICE COEFFICIENT is
about n-grams
n-gram length 2
“John Smith” n-grams: Jo | oh | hn | n_ | _S | Sm | mi | it | th
“Smith, John D.” n-grams: Sm | mi | it | th | h, | ,_ | _J | Jo | oh | hn | n_ | _D | D.
n-gram length 3
“John Smith” n-grams: Joh | ohn | hn_ | n_S | _Sm | Smi | mit | ith
“Smith, John D.” n-grams: Smi | mit | ith | th, | h,_ | ,_J | _Jo | Joh | ohn | hn_ | n_D | _D.
DICE COEFFICIENT
examples
n-gram length 1
“John Smith” n-grams: J | o | h | n | _ | S | m | i | t | h (10 n-grams)
“Smith, John D.” n-grams: S | m | i | t | h | , | _ | J | o | h | n | _ | D | . (14 n-grams)
Shared n-grams: 10
Dice’s Coefficient: 2x10 / (10+14) = 0.8333
n-gram length 2 (default)
“John Smith” n-grams: Jo | oh | hn | n_ | _S | Sm | mi | it | th (9 n-grams)
“Smith, John D.” n-grams: Sm | mi | it | th | h, | ,_ | _J | Jo | oh | hn | n_ | _D | D. (13 n-grams)
Shared n-grams: 8
Dice’s Coefficient: 2x8 / (9+13) = 0.7273
n-gram length 3
“John Smith” n-grams: Joh | ohn | hn_ | n_S | _Sm | Smi | mit | ith (8 n-grams)
“Smith, John D.” n-grams: Smi | mit | ith | th, | h,_ | ,_J | _Jo | Joh | ohn | hn_ | n_D | _D. (12 n-grams)
Shared n-grams: 6
Dice’s Coefficient: 2x6 / (8+12) = 0.6000
n-gram length 4
“John Smith” n-grams: John | ohn_ | hn_S | n_Sm | _Smi | Smit | mith (7 n-grams)
“Smith, John D.” n-grams: Smit | mith | ith, | th,_ | h,_J | ,_Jo | _Joh | John | ohn_ | hn_D | n_D. (11 n-grams)
Shared n-grams: 4
Dice’s Coefficient: 2x4 / (7+11) = 0.4444
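The arithmetic in these examples can be reproduced with a short Python sketch. This is an illustration of the formula 2 x shared / (count A + count B), not a statement of how ACL computes it internally. The function names `ngrams` and `dice_coefficient` are chosen for this example. Shared n-grams are counted as a multiset overlap, so a repeated n-gram is shared only as many times as it appears in both strings.

```python
from collections import Counter


def ngrams(text: str, n: int = 2) -> Counter:
    """Count every run of n consecutive characters in text (spaces included)."""
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def dice_coefficient(a: str, b: str, n: int = 2) -> float:
    """2 x (shared n-grams) / (n-grams in a + n-grams in b), between 0 and 1."""
    grams_a, grams_b = ngrams(a, n), ngrams(b, n)
    total = sum(grams_a.values()) + sum(grams_b.values())
    if total == 0:
        return 0.0  # both strings are shorter than n characters
    # Counter & Counter keeps, for each n-gram, the smaller of the two counts.
    shared = sum((grams_a & grams_b).values())
    return 2 * shared / total


if __name__ == "__main__":
    for n in (1, 2, 3, 4):
        score = dice_coefficient("John Smith", "Smith, John D.", n)
        print(f"n-gram length {n}: {score:.4f}")
```

Unlike Levenshtein distance, the score barely changes when the word order is reversed, which is why it copes well with “First Last” vs. “Last, First”.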
Fuzzy Matching in ACL
demo
Attendance Check
Don’t let the “internal” in
internal audit limit your
universe of data.
US Government
Spending Data
https://www.usaspending.gov/#/download_center/custom_award_data
Gain an understanding of how your
competitors are doing business
with the federal government
Hundreds of very rich data-points,
gigabytes of data
Federal Government
Spending Data
Sanction lists, incarceration lists,
Offshore data leaks
SAM list includes OFAC and other
sanctioned entities/individuals (sam.gov)
Industry specific sanctioned lists
like the LEIE for healthcare
providers
Incarcerated Individuals for
entitlement eligibility
Offshore data leak lists (e.g. Paradise Papers)
that flag potential shell or fraudulent
entities (Google “icij offshore leaks”)
TERREMARK FEDERAL GROUP, INC.: a $961M contract shared the same mailing address as a WorldCom office
International Consortium of Investigative Journalists
WORLDWIDE CORPORATE HOUSING, L.P.: contracts totaling $21M shared a mailing address with a business exposed in the offshore leaks: “Oakwood Worldwide Insurance Company”
International Consortium of Investigative Journalists
Consider this:
Would an invoice from
“Worldwide Corporate Housing”
raise any suspicions?
Optical character
recognition
I have thousands of scanned documents in various formats, structures, and styles
Generally, I want to know which documents have:
• Handwriting, so I can parse it (or recognize it as a signature)
• Numbers/dates, and which label each relates to
• Checkboxes/circled options, and which label each relates to
I want to know these to reconcile against my systems of record, or for regulatory compliance testing (e.g. ensure that all appropriate information is disclosed and initialed by the customer)
Unstructured Data
user story
Data to extract:
• Is it signed?
• Are there initials near the critical locations?
• Do the dates/amounts/percentages agree with the system of record?
• Are there any other markups/anomalies in the contract?
Google’s open-source
Tesseract OCR
Tesseract is an open-source library for
applying optical character recognition (OCR), with development long sponsored by Google
https://opensource.google/projects/tesseract
Pre-process your data for best results
• Brightness/contrast/color
• Borders/orientation
ScriptHub script to do some of the
heavy-lifting:
https://scripts.highbond.com/#/scriptset/R_OCR_Import
Machine Learning for
journal entry testing
Predicting journal entry
classifications
Service Plan | Vendor Name | Date | Amount | Account Category
Basic Need | BG Energy Solutions Limited | 04/04/2018 | $1,098.12 | New Construction and Convs
Community Centres | Northern Elevator Limited | 04/05/2018 | $38.00 | Repairs and Maintenance
Local Area Teams | Northern Elevator Limited | 04/05/2018 | $20.00 | Repairs and Maintenance
Basic Need | SS Systems Ltd | 04/05/2018 | $280.00 | New Construction and Convs
Maintenance | East Yorkshire Roofing Services Ltd | 04/05/2018 | $1,580.00 | New Construction and Convs
Local Area Teams | Pinacl Solutions Ltd | 04/10/2018 | $715.95 | Repairs and Maintenance
Local Area Teams | Pinacl Solutions Ltd | 04/10/2018 | $290.00 | Repairs and Maintenance
Local Area Teams | Pinacl Solutions Ltd | 04/10/2018 | $329.45 | Repairs and Maintenance
Local Area Teams | Pinacl Solutions Ltd | 04/10/2018 | $2.70 | Repairs and Maintenance
Community Centres | J Mark Construction Ltd | 04/10/2018 | $492.00 | Repairs and Maintenance
TRAIN and PREDICT
1. Define a target data field.
2. Prepare training dataset.
3. Train a model to learn how to predict that field.
4. Use that model to predict that field from new data.
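The four steps above can be sketched in a few lines of Python. This is a toy stand-in and not the machine learning that ACL performs: the "model" simply remembers the most common Account Category for each Vendor Name and falls back to the overall most common category for vendors it has never seen. The function names `train` and `predict` are hypothetical. The training rows are the vendor and category pairs from the sample journal entries, and the training set is assumed to be non-empty.

```python
from collections import Counter, defaultdict

# Step 1: the target field is "Account Category"; the only input is "Vendor Name".
# Step 2: training data (vendor, category) taken from the sample journal entries.
TRAINING_ROWS = [
    ("BG Energy Solutions Limited", "New Construction and Convs"),
    ("Northern Elevator Limited", "Repairs and Maintenance"),
    ("Northern Elevator Limited", "Repairs and Maintenance"),
    ("SS Systems Ltd", "New Construction and Convs"),
    ("East Yorkshire Roofing Services Ltd", "New Construction and Convs"),
    ("Pinacl Solutions Ltd", "Repairs and Maintenance"),
    ("Pinacl Solutions Ltd", "Repairs and Maintenance"),
    ("Pinacl Solutions Ltd", "Repairs and Maintenance"),
    ("Pinacl Solutions Ltd", "Repairs and Maintenance"),
    ("J Mark Construction Ltd", "Repairs and Maintenance"),
]


def train(rows):
    """Step 3: learn the most common category per vendor (rows must be non-empty)."""
    per_vendor = defaultdict(Counter)
    overall = Counter()
    for vendor, category in rows:
        per_vendor[vendor][category] += 1
        overall[category] += 1
    return {
        "by_vendor": {v: c.most_common(1)[0][0] for v, c in per_vendor.items()},
        "default": overall.most_common(1)[0][0],
    }


def predict(model, vendor):
    """Step 4: predict the category for a new entry from its vendor name."""
    return model["by_vendor"].get(vendor, model["default"])


if __name__ == "__main__":
    model = train(TRAINING_ROWS)
    for vendor in ["SS Systems Ltd", "Pinacl Solutions Ltd", "Some New Vendor Ltd"]:
        print(f"{vendor}: {predict(model, vendor)}")
```

A real model would weigh many fields at once (service plan, amount, date), but the train-then-predict shape stays the same.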
Predicting journal entry
classifications
Service Plan | Vendor Name | Date | Amount | Account Category | Predicted Account Category
Basic Need | BG Energy Solutions Limited | 04/04/2018 | $1,098.12 | New Construction and Convs | New Construction and Convs
Community Centres | Northern Elevator Limited | 04/05/2018 | $38.00 | Repairs and Maintenance | Repairs and Maintenance
Local Area Teams | Northern Elevator Limited | 04/05/2018 | $20.00 | Repairs and Maintenance | Repairs and Maintenance
Basic Need | SS Systems Ltd | 04/05/2018 | $280.00 | New Construction and Convs | New Construction and Convs
Maintenance | East Yorkshire Roofing Services Ltd | 04/05/2018 | $1,580.00 | New Construction and Convs | New Construction and Convs
Local Area Teams | Pinacl Solutions Ltd | 04/10/2018 | $715.95 | Repairs and Maintenance | New Construction and Convs
Local Area Teams | Pinacl Solutions Ltd | 04/10/2018 | $290.00 | Repairs and Maintenance | Repairs and Maintenance
Local Area Teams | Pinacl Solutions Ltd | 04/10/2018 | $329.45 | Repairs and Maintenance | Repairs and Maintenance
Local Area Teams | Pinacl Solutions Ltd | 04/10/2018 | $2.70 | Repairs and Maintenance | Repairs and Maintenance
Community Centres | J Mark Construction Ltd | 04/10/2018 | $492.00 | Repairs and Maintenance | New Construction and Convs
Confusion Matrix-ish…
Account Category | Predicted Account Category | Total Count
New Construction and Convs | New Construction and Convs | 7022
Repairs and Maintenance | Repairs and Maintenance | 2315
Repairs and Maintenance | New Construction and Convs | 215
New Construction and Convs | Repairs and Maintenance | 98
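One way to read these counts is to turn them into accuracy, precision, and recall. The Python sketch below does that, assuming the first column is the actual category and the second is the model's prediction. The function names are chosen for this example and are not part of any ACL feature.

```python
# (actual category, predicted category) -> number of journal entries
COUNTS = {
    ("New Construction and Convs", "New Construction and Convs"): 7022,
    ("Repairs and Maintenance", "Repairs and Maintenance"): 2315,
    ("Repairs and Maintenance", "New Construction and Convs"): 215,
    ("New Construction and Convs", "Repairs and Maintenance"): 98,
}


def accuracy(counts):
    """Share of all entries where the prediction equals the actual category."""
    correct = sum(n for (actual, predicted), n in counts.items() if actual == predicted)
    return correct / sum(counts.values())


def precision(counts, label):
    """Of the entries predicted as label, the share that really are label."""
    predicted_as_label = sum(n for (_, predicted), n in counts.items() if predicted == label)
    return counts.get((label, label), 0) / predicted_as_label


def recall(counts, label):
    """Of the entries that really are label, the share the model found."""
    actually_label = sum(n for (actual, _), n in counts.items() if actual == label)
    return counts.get((label, label), 0) / actually_label


if __name__ == "__main__":
    print(f"accuracy: {accuracy(COUNTS):.4f}")
    for label in ("New Construction and Convs", "Repairs and Maintenance"):
        print(f"{label}: precision {precision(COUNTS, label):.4f}, "
              f"recall {recall(COUNTS, label):.4f}")
```

Overall, 9,337 of the 9,650 entries are classified correctly. The per-category figures show where the misclassifications concentrate, which is usually the more useful question for an auditor.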
Overfitting the problem
Better features
Department is important, as is how often the
vendor is generally classified as New
Construction, as well as… GL postings on
Thursdays.
Predicting fraud
What are the data features you could use for this data-set?
Data features that can be derived from a
small set of fields
Consider data enrichment to add new data features, especially for business entities:
• # of employees
• Revenue
• Establishment date, etc.
Data feature engineering can be done to combine, split, or otherwise prep your data to be ready for training
Data Enrichment &
Data feature
engineering
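As an illustration of deriving new data features from a small set of fields, the Python sketch below takes one journal entry in the sample layout (Vendor Name, Date, Amount) and splits it into several features a model could train on. The feature names and the function `derive_features` are hypothetical, and the dates are assumed to be in MM/DD/YYYY format, which the source does not state.

```python
from datetime import datetime

WEEKDAYS = ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday"]


def derive_features(entry: dict) -> dict:
    """Turn raw text fields into features (hypothetical feature set)."""
    # Assumption: dates are MM/DD/YYYY.
    posted = datetime.strptime(entry["Date"], "%m/%d/%Y")
    # Strip the currency symbol and thousands separators before converting.
    amount = float(entry["Amount"].replace("$", "").replace(",", "").strip())
    return {
        "weekday": WEEKDAYS[posted.weekday()],    # e.g. postings on Thursdays
        "month": posted.month,
        "amount": amount,
        "is_whole_amount": amount == int(amount),  # round-number amounts
        "vendor_word_count": len(entry["Vendor Name"].split()),
    }


if __name__ == "__main__":
    sample = {"Vendor Name": "Northern Elevator Limited",
              "Date": "04/05/2018", "Amount": "$ 38.00"}
    print(derive_features(sample))
```

Enrichment fields such as number of employees, revenue, or establishment date would be joined in from an external source and added to the same dictionary.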
Dozens of courses and CPE available to all ACL software users
in the Galvanize Academy
Demonstrate your expertise with ACDA certification
Cognitive (AI) Assets can be used for good or evil…
It’s our job to ensure that our AIs have morals.
Are your organization’s AI assets in your audit universe?
Thank you.
Questions?