Accelerate your Data Science and DataOps projects with IBM DataStage and Watson Knowledge Catalog
IBM’s statements regarding its plans, directions, and intent are subject to change or withdrawal without notice and at IBM’s sole discretion.
Information regarding potential future products is intended to outline our general product direction and it should not be relied on in making a purchasing decision.
The information mentioned regarding potential future products is not a commitment, promise, or legal obligation to deliver any material, code or functionality. Information about potential future products may not be incorporated into any contract.
The development, release, and timing of any future features or functionality described for our products remains at our sole discretion.
Performance is based on measurements and projections using standard IBM benchmarks in a controlled environment. The actual throughput or performance that any user will experience will vary depending upon many factors, including considerations such as the amount of multiprogramming in the user’s job stream, the I/O configuration, the storage configuration, and the workload processed. Therefore, no assurance can be given that an individual user will achieve results similar to those stated here.
Please Note
© 2019 IBM Corporation
COLLECT
ORGANIZE
ANALYZE
INFUSE
Introducing DataOps
AI
Analytics and AI at scale and speed
to drive
Operational efficiency
Data privacy & compliance
DataOps (DevOps for Data + Data Operations)
• A concept, like DevOps for Data, enabling collaboration between data consumer & data provider at speed & scale
• Automated data operations providing curated data pipeline with quality & governance
• Drives agility and innovation everywhere
People, Process, Technology
Watson Knowledge Catalog supports end-to-end DataOps
Data Governance Teams
Data Quality – Trust your data
Data Stewards & Data Quality
Analysts
Data Consumption – Use your data
Data Citizens
Data is useful only if its quality, content, and structure are well understood. Delivering reliable, high-quality, timely data for business consumption is a continuous process.
To set up the foundation of a DataOps program, organizations need to comply with regulatory requirements, communicate and enforce policies and standards, and manage metadata.
Enterprises need to surface business-ready data to consumers, allowing them to deliver timely value to the business and make better decisions.
Knowledge Catalog
Data Governance – Know your data
All capabilities in a single experience
Data Governance
Data Quality
Data Consumption
Knowledge Catalog
Business Glossary
Policy Management
Policy Enforcement
Reference Data Management
Data Lineage
Classification
Self-Service
Data Prep
Social Collaboration
Data Discovery
Data Profiling & Analysis
Business Term Suggestions
Data Quality Issue Detection
Machine Learning and Automation make Data Governance less invasive
Getting Started Quickly
Using a body of knowledge for CCPA, GDPR, and CECL, get term assignment recommendations for assets in the catalog.
Quickly create and assign a data class to clusters of similar columns using the patent-protected Fingerprint algorithm.
Ingest a PDF and capture business terms and governance rules based on the document.
Profile data automatically and classify each column.
Search across catalogs, projects and categories based on metadata and past searches.
Using historical business term assignments and business term relationships, get recommendations for business terms to assign to columns.
Based on past searches and what’s popular, see recommended data assets.
Data protection rules automatically restrict access and anonymize data.
Automated Daily Activities
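As a rough illustration of two of the automated activities listed here (classifying each column during profiling, then letting a data protection rule anonymize columns of a sensitive class), the following is a minimal Python sketch. It is not how Watson Knowledge Catalog is implemented: the data class names, the regular expressions, the 80% match threshold, and the helpers `classify_column`, `redact`, and `protect` are all hypothetical and chosen only for the example.

```python
import re
from typing import Callable, Dict, List

# Hypothetical data classes: class name -> pattern a value must fully match.
DATA_CLASSES: Dict[str, "re.Pattern[str]"] = {
    "email": re.compile(r"[^@\s]+@[^@\s]+\.[^@\s]+"),
    "us_ssn": re.compile(r"\d{3}-\d{2}-\d{4}"),
}


def classify_column(values: List[str], threshold: float = 0.8) -> str:
    """Return the first data class matched by at least `threshold`
    of the non-empty values, or "unknown" if none qualifies."""
    non_empty = [v for v in values if v]
    if not non_empty:
        return "unknown"
    for name, pattern in DATA_CLASSES.items():
        hits = sum(1 for v in non_empty if pattern.fullmatch(v))
        if hits / len(non_empty) >= threshold:
            return name
    return "unknown"


def redact(value: str) -> str:
    """Replace every character with 'X' (a simple anonymization choice)."""
    return "X" * len(value)


# Hypothetical protection rules: data class -> masking function.
PROTECTION_RULES: Dict[str, Callable[[str], str]] = {
    "email": redact,
    "us_ssn": redact,
}


def protect(table: Dict[str, List[str]]) -> Dict[str, List[str]]:
    """Classify each column and mask it if a rule exists for its class."""
    protected: Dict[str, List[str]] = {}
    for column, values in table.items():
        rule = PROTECTION_RULES.get(classify_column(values))
        protected[column] = [rule(v) for v in values] if rule else list(values)
    return protected


if __name__ == "__main__":
    sample = {"contact": ["a@b.com", "c@d.org"], "city": ["Oslo", "Rome"]}
    print(protect(sample))
```

In this sketch the rule is keyed on the data class rather than on the column name, so a newly discovered column that profiles as an email address would be masked without anyone editing the rule.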
Integrations between WKC and DataStage
Data Lineage
Shared Connections
Use of Reference Data
Demo
© 2020 IBM Corporation
Poll
What integrations between Watson Knowledge Catalog and DataStage would be most useful?
Benefits of Cloud Pak for Data
✅ Reduced cost of custom integrations with disparate tools
✅ Pay for what you need, not the entire platform
✅ Governance of data and AI lifecycle
✅ Built on open source technology
✅ APIs available to integrate with all services on the platform
✅ Common experiences and administration across offerings
Moving Forward
Helping customers migrate to Watson Knowledge Catalog and Cloud Pak. Demonstrating how Watson Knowledge Catalog benefits and extends, delivering business-ready data to the enterprise.
Customers can migrate existing content from Information Server 11.5 and 11.7 to Watson Knowledge Catalog using existing export capabilities. This will ensure uninterrupted operability of their Glossary, Catalog and Analysis Projects.
Version 11.7 – ISTool Export Syntax
Version 11.5 – ISTool Export Syntax
Installation guidelines will be forthcoming. Cloud Pak will require a new dedicated deployment on Red Hat OpenShift. Cloud Pak supports IBM Cloud in addition to other public cloud vendors or on-premises deployment.
Moving Forward
Content Migration from Information Server to Watson Knowledge Catalog
Assets which can be fully migrated to Watson Knowledge Catalog via command scripts:
• Glossary Assets: Categories & Terms, Governance Policies & Rules, Stewards, Labels
• Information Assets: Database, Data File, Business Intelligence, Data Model
• Other Assets: OpenIGC, Extended Data Sources and Mappings, FastTrack
• DataStage Projects and Job definitions
• Analysis Projects and Workspace
• Metadata Asset Manager Import Area definitions
Known Limitations
• Custom Attribute Relationships and Restrictions on Governance Assets (expected Q3 2020)
• Published Analysis Results and Quality Score will not migrate, and need to be re-generated
• Metadata Asset Manager historical Import Area information will not migrate
• Glossary Term History and Development Glossary will not migrate
• Glossary Multilingual definitions will not migrate (Translation expected Q2 2020)
• Data and Business Lineage configuration settings will need to be re-defined
The following components or capabilities are not currently supported in Watson Knowledge Catalog:
• Business Glossary Anywhere
• Business Glossary for Eclipse
• Cognos Framework Manager / Report Designer integration
• Governance Dashboard / SQL Views
• Stewardship Center / Subscription Manager
• Business Process Manager Integration
• Governance Catalog Collections
Business Challenge
Associated Bank wanted to improve their client experiences, and be able to better analyze data from many existing data sources.
Solution
Associated Bank is adopting IBM Cloud Pak for Data System, for rapid deployment and scaling of AI. Initial projects include a new Customer 360 system for improving client experiences and a new governed data dashboard for improved analytics results.
Outcome
⎻ Cloud Pak for Data System provides the Bank a single interface platform for end-to-end enterprise analytics
⎻ Single source of all information around the customers through all the bank systems
⎻ Assist in compliance with privacy regulations like CCPA
Solution Components
Data Modernization and DataOps
‒ IBM Cloud Pak for Data System (on premise) with
‒ IBM DataStage
‒ IBM Watson Knowledge Catalog
‒ IBM Db2 Warehouse on Cloud
‒ Services from IBM’s Expert Labs team and IBM’s Data Science and AI Elite team
‒ Partner: iOLAP
"One of the great things about the Cloud Pak for Data System is the speed with which we'll be able to launch and scale our analytics platform. The integrated stack contains what we need to improve data quality, catalog our data assets, enable data collaboration, and build/operationalize data sciences. We're able to move quickly with design, test, build and deployment of new models and analytical applications."
Steve Lueck
Vice President, Data Management
Associated Bank
Rapid deployment and scaling of AI
Industry: Banking & Financial Markets
Geography: North America
Associated Bank
Business Challenge
Large, complex bank with mix of disparate data silos with both legacy and modern capabilities. Adherence to numerous industry regulatory requirements made accessing and querying data difficult and complex; data lineage was a large factor. Data insight initiatives were often slowed or delayed.
Solution
The bank sought to move to one corporate operating model, in anticipation of GDPR and other cross-border regulatory requirements.
The Bank partnered with IBM in order to streamline data management and applications across all operational countries by developing a single operating model strategy and platform.
Outcome
⎻ Ensure proper data governance, while simultaneously leveraging data from across the bank
⎻ Consolidate stacks into a single user experience platform; increasing collaboration, streamlining application management, and optimizing licensing and IT cost drivers
⎻ Leverage data virtualization for existing on-premise investments with data to remove data silos
Solution Components
Data Modernization and DataOps
⎻ IBM Cloud Pak for Data on premise with
⎻ IBM Data Virtualization
⎻ IBM DataStage
⎻ IBM Watson Knowledge Catalog
⎻ Services from IBM’s Expert Labs team and IBM’s Data Science and AI Elite team
Developing a single operating model strategy and platform
Industry: Banking & Financial Markets
Geography: Europe
A Large European Bank
Business Challenge
The bank, an existing IBM Information Server Suite customer (DataStage, QualityStage, IGC, IA, FastTrack), wanted to improve its data governance strategy, with a focus on data quality and data lineage, and as a result become compliant with regulations. It wanted to enhance its business user experience and be able to better analyze data from many existing data sources. It lacked centralized, enterprise-wide support processes for its Data and Analytics strategy and needed an enterprise data inventory to improve data governance.
Solution
The bank partnered with IBM to streamline data collection and management across the bank by developing a single Data Governance operating model strategy and platform.
The bank sees a clear path to automating its data governance practice, and sees IBM Cloud Pak for Data as the perfect solution: a product evolution and a modern, open platform to drive core system transformation. As the first step toward automating its data governance, the bank is implementing IBM Watson Knowledge Catalog to catalog all metadata across platforms and ultimately provide real-time, quality data to its data scientists and business users as a self-service.
Outcome
⎻ Ensure proper data governance, while simultaneously leveraging data from across the bank
⎻ Consolidate stacks into a single user experience platform and increase collaboration
⎻ Get trusted data and reduce the amount of data preparation
⎻ Free up time to spend on analysis and gain new insights
Solution Components
Data Modernization and DataOps
⎻ IBM Cloud Pak for Data on premise with
⎻ IBM DataStage
⎻ IBM Watson Knowledge Catalog
⎻ Services from IBM’s Expert Labs team
Automating data governance across the bank to meet changing expectations and increased regulation
Industry: Banking & Financial Markets
Geography: Europe
A Large European Bank
Governance
Expand connections Ecosystem
Customization of views by persona
Reference Data versioning
Enhanced Lineage – Business View
Support for Knowledge Accelerators
AI model policies and rules
Quality
ML assisted processing time estimates
DQ Remediation workflow
Address parse/enhance/verify
Delta data discovery
Connectivity to Hive over Kerberos
Consumption
Search and Drill Down asset hierarchy
Support additional asset types
Restrict access to data
Support external reporting and querying tools
Integration with ADP and Cognos
3rd Party Data Accelerators/Providers
Enhance Platform Roles & Permissions – CPD user groups
Governance
Data Protection rules in Data Virtualization
Workflow customization for governance artifacts
Rule Based Metadata Access Control**
Tag Data Source Connection**
Data Lineage**
Quality
Enhanced learning for term suggestions
View of data quality trends over time
Data Rule Exception Management
‘Fingerprint’ data classes
Simplified Discovery Experience
WKC Instascan
Import table or column from ERWin**
Consumption
New Connectors: SharePoint, Hive MetaStore, OracleBI, Impala, Planning Analytics
Search on Description in Catalog Assets**
Overall
New look and feel!
Globalization for Brazilian Portuguese, English, French, German, Italian, Japanese, Russian, Simplified Chinese, Spanish, and Traditional Chinese
Watson Knowledge Catalog on Cloud Pak for Data
2020/2021 Roadmap
Delivered 2H 2020 Nov
1H 2021 May
Governance
Reference Data Set mapping, hierarchies & custom columns
Workflow request management
Permissions and workflow by categories
Quality
Discovery and profiling of unstructured data
ML based data sampling
Quick scan discovery into catalog of choice
View sample data in term assignment
Consumption
Custom Asset Types
Asset relationships and hierarchies
Integration with Test Data Management
On Demand View of Sample Data from Term Assign
Overall
Import/Export to support move from dev/test/prod
Governance
Expand connections Ecosystem
Approval Process for publishing assets
Quality
Retire Information Assets View
Data Quality Analysis Workspace
Discovery queues for new term generation
Consumption
Apache Atlas Integration
Overall
Additional Languages
2H 2021
** Available Today