Software Group | IBM Israel Software Laboratories SOA Advanced Technologies
© 2007 IBM Corporation
Joshua Fox
Regulatory Compliance through Metadata Mining
What Does My IT System Mean?
Real World
Metadata
Use Case: Security Marking
A simplified example
Security labeling has many drivers
Focusing here on the semantics
Use Case: Security Marking
[Figure: information services, each labeled “Weaponization-related” or “Not Weaponization-related”]
Biotech Lab
A lab takes its first DoD contract
Needs DIACAP approval; cannot risk non-compliance
Needs to apply security markings for access control in the Information Sharing Environment
The Metadata
Metadata for structured (machine-read) data
  Database schemas
  Web service WSDLs
  COBOL copybooks
  UML & DoDAF models
Security Markings: Find Subject
[Figure: Semantics and Metadata]
Find all info services in the semantic area of, e.g., “weaponization”
Metadata repository holds service descriptions, database schemas, other metadata
Repository also holds standard categories from data dictionary
Tool proposes categorization
Analyst uses this as input, saving valuable manual-analysis time
Historical Metadata (MD) Situation
MD in small quantities
Scattered in
  DBA teams
  Development teams
Background
Trends in leading-edge enterprises
Large, cross-organization metadata repositories
The Promise:
Governance across the organization,
but…
Mess of Metadata
[Figure: many scattered <xsd> schema documents]
Heterogeneity in Metadata
Different technologies: XML, RDB, UML
Different structures and terminologies
Confused Semantics in Metadata
[Figure: what does “Tank” mean? Army vs. Navy]
Confused Semantics in Metadata
“Secure”
  NSA: No eavesdropping
  Air Force: Buy it
  Army: Guard the perimeter
  Marines: Storm it
  Navy: Lock the door, turn off the lights
Huge Quantities of Metadata
Older Approaches
Build taxonomy/ontology
Map it to the metadata
[Figure: Ontology mapped to Metadata (e.g., XSD)]
Older Approaches Don’t Work
Painstaking human labor
High-cost labor with IT + business knowledge: Consultants!
Beyond human limits
New Opportunities Created By:
Moore’s Law
Great progress in data mining
  Searching, classifying and organizing
Recent innovative uses: terrorist threat analysis, security, Web 2.0, Google
The Time is Right
Well-known search and information-management techniques
Now, apply them to metadata
Functional Architecture
[Diagram, layers and components:]
Business Functionality: Compliance, Access, Reporting
Engine: Semi-automation of mapping
Persistence: Metadata Repository
  Real-Life Meaning: Ontology (AKA taxonomy, dictionary, glossary, logical model, categories)
  Mapping (ontology <-> metadata)
Methodology
1) Prepare Metadata
2) Set up Categories
3) Machine Learning
4) Suggest Category
(1) Prepare Metadata
a) Load metadata into repository
b) Pre-process metadata into
   Text: e.g., “Deployment”, “Location”
   Structure: e.g., “Deployment:Location” to represent Table and Column
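A minimal Python sketch of step (b), assuming each metadata item arrives as a table name and a column name. The function names `split_identifier` and `preprocess` are hypothetical and not taken from the tool described in this deck; the sketch only illustrates how text features (individual words) and a structure feature (“Table:Column”) might be derived.

```python
import re

def split_identifier(name):
    """Split a schema identifier such as 'Troop_Deployment' or 'troopDeployment' into words."""
    words = []
    # First break on underscores, hyphens, and whitespace.
    for part in re.split(r"[_\-\s]+", name):
        # Then break on case changes: acronyms, capitalized or lowercase words, digit runs.
        words.extend(re.findall(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+|\d+", part))
    return words

def preprocess(table, column):
    """Return text features (words) and one structure feature ('Table:Column')."""
    return {
        "text": split_identifier(table) + split_identifier(column),
        "structure": f"{table}:{column}",
    }

# Example taken from the slide: table "Deployment", column "Location".
features = preprocess("Deployment", "Location")
print(features["text"])       # ['Deployment', 'Location']
print(features["structure"])  # Deployment:Location
```

Keeping the structure feature separate from the word list lets a classifier treat “Deployment:Location” as a single token alongside the individual words.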
(2) Set up Categories
(AKA taxonomy, ontology, glossary, data dictionary, business model, domain model)
a) Follow Security Classification Guide
b) May use Community-of-Interest (CoI) vocabulary
c) DoD Discovery Metadata Specification (DDMS) for categories
d) Keep it simple!
(3) Machine Learning
a) Train on a sample of metadata items
b) Provide semantic category mappings for this sample
c) Standard Bayesian classification algorithms learn which words are common or uncommon in each category
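The deck does not say which Bayesian algorithm or library is used. As one possible illustration, here is a small multinomial naive Bayes classifier with add-one smoothing in plain Python. The class name, category labels, and training examples are all hypothetical.

```python
import math
from collections import Counter, defaultdict

class NaiveBayes:
    """Multinomial naive Bayes over words, with add-one (Laplace) smoothing."""

    def __init__(self):
        self.word_counts = defaultdict(Counter)  # category -> word -> count
        self.item_counts = Counter()             # category -> number of training items
        self.vocab = set()

    def train(self, words, category):
        self.item_counts[category] += 1
        for word in words:
            word = word.lower()
            self.word_counts[category][word] += 1
            self.vocab.add(word)

    def log_scores(self, words):
        """Return log(prior) + sum of log(smoothed word likelihood) per category."""
        total_items = sum(self.item_counts.values())
        scores = {}
        for category, n_items in self.item_counts.items():
            score = math.log(n_items / total_items)
            total_words = sum(self.word_counts[category].values())
            for word in words:
                count = self.word_counts[category][word.lower()]
                score += math.log((count + 1) / (total_words + len(self.vocab)))
            scores[category] = score
        return scores

    def classify(self, words):
        scores = self.log_scores(words)
        return max(scores, key=scores.get)

# Hypothetical training sample: words from metadata items plus analyst-provided categories.
nb = NaiveBayes()
nb.train(["Warhead", "Yield"], "weaponization")
nb.train(["Missile", "Payload"], "weaponization")
nb.train(["Employee", "Payroll"], "not_weaponization")
nb.train(["Cafeteria", "Menu"], "not_weaponization")

print(nb.classify(["Warhead", "Payload"]))  # weaponization
print(nb.classify(["Payroll", "Menu"]))     # not_weaponization
```

Smoothing matters here because schema vocabularies are small and new metadata items often contain words never seen in training.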
(4) Suggest Category for Metadata Item
a) Preprocess metadata
b) Submit to classification engine
c) Receive suggested category
d) Proceed with analysis
[Figure: Metadata, Classification Engine, Analyst]
Humans and machines complementing each other
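One way the hand-off between engine and analyst might look, sketched in Python. It assumes the classification engine returns a log score per category; the function names and the 0.8 review threshold are hypothetical choices for illustration, not part of the original tool.

```python
import math

def suggest_category(log_scores, review_threshold=0.8):
    """Turn per-category log scores into a suggestion for the analyst."""
    # Normalize log scores into probabilities (shift by the max for numerical stability).
    top = max(log_scores.values())
    weights = {cat: math.exp(score - top) for cat, score in log_scores.items()}
    total = sum(weights.values())
    probs = {cat: w / total for cat, w in weights.items()}
    best = max(probs, key=probs.get)
    return {
        "category": best,
        "confidence": probs[best],
        # Low-confidence suggestions are flagged so the analyst looks more closely.
        "needs_review": probs[best] < review_threshold,
    }

def final_category(suggestion, analyst_override=None):
    """The analyst has the last word; the suggestion is only an input."""
    return analyst_override if analyst_override is not None else suggestion["category"]

# Hypothetical engine output for one metadata item.
clear = suggest_category({"weaponization": math.log(0.9), "not_weaponization": math.log(0.1)})
unsure = suggest_category({"weaponization": math.log(0.55), "not_weaponization": math.log(0.45)})
print(clear["category"], clear["needs_review"])
print(final_category(unsure, analyst_override="not_weaponization"))
```

The tool never assigns a category on its own in this sketch: it proposes, and the analyst either accepts the proposal or overrides it.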
Understand Your IT: Use Cases
Legacy Transformation: What business services are hiding in your legacy applications?
Reuse: Where is a service with this business functionality?
Fast Start for Community of Interest
Use Case: SOX Reporting
[Figure: information services, each labeled “Financial” or “Non-Financial”]
SOX Compliance
[Figure: Real World and Metadata]
A telco needs to comply with SOX to avoid penalties
Build reports from all info services with “financial” information
Metadata repository holds services, DB schemas, etc.
Tool proposes categorization
Analyst can find relevant data sources more quickly, then build report
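As a rough illustration of the last two points, the Python sketch below selects report sources from a catalog of categorized metadata items. The catalog layout, the field names `proposed` and `confirmed`, and the item names are all hypothetical.

```python
def sources_for_report(catalog, category):
    """Return names of items in the given category.

    An analyst-confirmed category, when present, takes precedence over the tool's proposal.
    """
    return sorted(
        item["name"]
        for item in catalog
        if item.get("confirmed", item.get("proposed")) == category
    )

# Hypothetical repository contents after the tool has proposed categories.
catalog = [
    {"name": "BillingService", "proposed": "financial"},
    {"name": "NetworkTopologyDB", "proposed": "non_financial"},
    # The analyst corrected the tool's proposal for this item.
    {"name": "InvoiceDB", "proposed": "non_financial", "confirmed": "financial"},
]

print(sources_for_report(catalog, "financial"))  # ['BillingService', 'InvoiceDB']
```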
Why Mine the Metadata
Services: Invocation-level data is transient
Metadata already expresses semantics of the data
Metadata uncoupled from ever-changing data
Table: Troop_Deployment
Column: Total
[Figure: Troop_Deployment table whose Total column holds the values 154,650 and 25,390]
Mining the Metadata: More Secure
Tool & human analyst do not access actual data
Human analyst can avoid accessing even the metadata
Data Mining
Complements metadata mining
Build metadata from data
Differentiate on the resource level
Table: Deployment
Column: Location
[Figure: Deployment table whose Location column holds the values “DC LAN 1” and “Baghdad LAN 2”]
Simplicity
Our focus (first item in each pair):
Metadata vs. Data
Structured Data vs. Documents
Coarse-Grained vs. Fine-Grained
Classification vs. Search, metadata-internal relationships, transformation-building
Schema-to-Semantics vs. Schema-to-Schema
Feasible vs. Long-term Research
Reusable Functionality vs. Specialized Functionality
Business Value vs. Technical Value
Summary
[Figure: Real World and Metadata]
Too much metadata: humans need help
Use your metadata repository
Understand your metadata
Identify relevant metadata
Comply with regulations using IT metadata
Metadata mining: The time is right
Joshua Fox
Metadata Analytics
Israel Software Labs
IBM
http://www.joshuafox.com
Thank you