1. Product Documentation / User
1.1 Overview
1.1.1 Installation / Implementation Video Plus Demo Video Plus URL / Key Code / Login
1.2 Administrator Features
1.2.1 Administrator link to create and or update User Authority, Password, Email and more
1.2.2 Control File - A Core to making BigDataRevealed the marvel it is.
1.2.2.1 Amazon AWS S3 Security Credentials and Key Codes
1.2.2.2 Control File Additional Hadoop Parameters
1.2.2.3 Control File and Setup first time in or when changes are warranted
1.2.2.4 Control File Cloudera Impala / Navigator
1.2.2.5 Control File Databases
1.2.2.6 Control File Email Settings
1.2.2.7 Control File Kerberos Security Configuration Settings
1.2.2.8 Control File Twitter Connection
1.2.3 Manage Users and Maintain Users Credentials and Authorities
1.2.4 Modify RegEx / RegEx Group
1.2.4.1 Create, Modify, Delete Regular Expression
1.2.4.2 Create, Review, Maintain, Delete Regular Expression Groups
1.2.4.3 Create and Modify Regular Expression and RegEx Grouping for Pattern Discovery
1.2.5 Notifications
1.3 Admin User Section
1.3.1 Admin User Profile Maintenance
1.3.2 Create Watches for value ranges for maintenance, AML or for Patterns
1.3.3 Operational and User Lineage Logs
1.4 File Content Prep and Selection
1.4.1 File Content Viewer and Delimiter Validator
1.4.2 File System Tree for AWS S3 and Hadoop HDFS
1.5 Jobs
1.5.1 Display Jobs
1.5.1.1 Click on Social Security in Quick Classification showing 5 Discoveries found
1.5.1.2 Executive Summary and Quick Column Classification Graphs and drill
1.5.1.2.1 Click on Social Security in Quick Classification showing 5 Discoveries found and Drilling into Results
1.5.1.2.2 Executive Summary - Interactive Graphs and drill for Quick Columnar Classification
1.5.2 Run Jobs
1.5.2.1 Quick Classification
1.5.2.1.1 Quick Classification Running
1.5.2.1.2 Quick Classification Run Results
1.5.2.2 Run Pattern Discovery
1.5.2.2.1 Pattern Job Run and Display Screen
1.5.2.2.2 Run The Pattern Discovery for User Selected Patterns for Compliance Checking
1.6 Running of Basic Core Jobs
1.6.1 Run the Data Discovery Job
1.6.2 Validating Delimiters before final execution of the Data Discovery Run
1.6.3 Verify Job is running and view when completed
1.7 Encrypting Column 11 Social Security
1.7.1 Encryption of Compliance Violations - Validation
1.8 Decrypting one or more columns of data if credentials allowed
1.8.1 Validation that Column 11 has been Decrypted
1.9 Indirect Identifiers (the Regulations that will fail the vast majority)
1.9.1 Direct Identifiers Stage One of 2.
1.9.1.1 Direct Identifiers Results phase One of Two
1.9.1.2 Forgotten Identity Screen used to Decrypt Right of Erasure and Change of Consent
1.9.1.3 Forgotten Identity Screen used to Decrypt Right of Erasure and Change of Consent Cont2
1.9.2 Indirect Identifiers and the Citizens Right of Erasure AKA Right to be Forgotten.
1.9.3 Indirect Identifiers completed job Open and Review
1.9.4 Indirect Identifiers completed job Open and Review Cont.
1.9.5 Indirect Identifiers completed job Open and Review Cont. 3
1.9.6 Indirect Identifiers completed job Open and Review Cont. 4
1.9.7 Indirect Identifiers completed job Open and Review Cont. 5
1.9.8 Indirect Identifiers completed job Open and Review Cont. 6
1.9.9 To Discover the Indirect identifiers
1.10 Live Streaming Remediation / Encryption Results Viewer
1.10.1 Live Streaming Data Compliance / Remediation / Encryption on the Fly
1.10.2 Producer File Creator to connect and process data from live streams
1.10.3 Run Parameters for a Producer to access data and have that data Discovered by BigDataRevealed
Product Documentation / User
BigDataRevealed for EU GDPR and most any Regulatory Data Compliance
BigDataRevealed assists in your ability to protect your Customers' Private Data to meet the EU GDPR and most any Regulatory Compliance requirement.
Core team
Steven Meister, Founder
Tyler Miller, Vice President
Shashank, Senior Developer
Overview
BigDataRevealed was built with the EU GDPR and all governmental data regulatory agencies in mind. BigDataRevealed offers a means to collaboratively and extensively Discover (find, locate) Personally Identifiable Information in most any data format or type, Remediate (quarantine or encrypt) the sensitive data protected by regulatory agencies, and Encrypt live streams of business and social data on the fly.
BigDataRevealed also Discovers and Remediates the complexity of Indirect Identifiers, supports the Citizen's Right of Erasure (Right to be Forgotten), and allows Citizens to verify, accept or deny their consent to all or parts of their personally protected information.
BigDataRevealed, as its name suggests, was designed with Big Data in mind, though it delivers just the same for SMB/SME businesses, in-house and in the cloud.
BigDataRevealed keeps its overall costs to the Customer down by using the open-source Apache Hadoop platform as well as the free to low-cost Amazon AWS S3 environments.
BigDataRevealed is written in Spark with Java, and it scales cost-effectively and securely.
BigDataRevealed plans, in the first quarter, to also source directly from most RDBMSs, though it remains a believer in a Central Repository for proper and accurate results.
BigDataRevealed prides itself on the ability to install and implement in minutes and to start delivering Compliance Discovery and Remediation on day one for millions upon millions of rows of data, jump-starting your Compliance projects well ahead of in-house, consultative or other third-party applications.
Installation / Implementation Video Plus Demo Video Plus URL / Key Code / Login
Here you can find the video of the installation / implementation process.
Here you can find a basic demo video of the BigDataRevealed application.
You know it is your first time in if you are prompted to enter a Key Code. If you do not have the Key Code, please email privacyinfo@bigdatarevealed.com or call 847-440-4439.
If logging in for the first time, use the username hadoop and the password revealed, or ask your administrator for your assigned credentials. It is strongly recommended that you change your credentials in the Admin section at the top right once logged in.
Administrator Features
Administrator link to create and or update User Authority, Password, Email and more
A senior administrator can add or modify authorities, and a User can modify their own email and password.
Administrator screen for User rules and permissions, and for the User's ability to update their email and password only.
Control File - A Core to making BigDataRevealed the marvel it is.
The Control File assigns parameters for AWS S3, Hadoop, security, streaming and more.
Amazon AWS S3 Security Credentials and Key Codes
Set up all the AWS S3 security credentials needed to communicate with the server and the buckets of data assets.
Control File Additional Hadoop Parameters
Enter any additional Hadoop information that is required.
Control File and Setup first time in or when changes are warranted
Server Settings are for configuring Hadoop, Spark, Kafka, Drill and other Hadoop parameters.
Control File Cloudera Impala / Navigator
If you are using Cloudera Hadoop with either Impala or Navigator, fill in the appropriate parameters.
Control File Databases
Set up the parameters to connect to certain RDBMS files.
Control File Email Settings
Used to contact Users when Watches and Warnings occur, such as AML alerts or parts beginning to fail.
Control File Kerberos Security Configuration Settings
Set up the Kerberos security settings if the server BigDataRevealed is running on has implemented Kerberos.
Control File Twitter Connection
Twitter connection strings for reading and processing streams, as well as for contacting people when alerts and issues are encountered.
Manage Users and Maintain Users Credentials and Authorities
Modify RegEx / RegEx Group
Create, Modify, Delete Regular Expression
Regular expressions are shareable, collaborative pattern detectors.
Video showing how to create a RegEx pattern and a RegEx group and then run them in a Pattern Discovery run:
vimeo.com/251278398
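As an illustration of what such a pattern detector might look like, the sketch below defines three regular expressions in Python for the pattern types named elsewhere in this guide (email, Social Security Number, IP address). The expressions themselves, the PATTERNS dictionary and the detect function are assumptions made for this example; they are not taken from BigDataRevealed, which is written in Spark with Java.

```python
import re

# Hypothetical pattern library: illustrative expressions, not the product's own.
PATTERNS = {
    "email": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "ssn": re.compile(r"\d{3}-\d{2}-\d{4}"),
    "ipv4": re.compile(
        r"(?:(?:25[0-5]|2[0-4]\d|1?\d?\d)\.){3}(?:25[0-5]|2[0-4]\d|1?\d?\d)"
    ),
}


def detect(value):
    """Return the sorted names of every pattern that matches the whole value."""
    cell = value.strip()
    return sorted(name for name, rx in PATTERNS.items() if rx.fullmatch(cell))


print(detect("123-45-6789"))  # ['ssn']
print(detect("999.1.1.1"))    # []
```

Matching the whole cell with fullmatch, rather than searching inside it, is one way to reduce false positives when each value is expected to hold a single item.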
Create, Review, Maintain, Delete Regular Expression Groups
A group allows two or more Patterns to be searched for and Discovered at once.
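One way to picture a RegEx group is as several named expressions compiled into a single alternation, so that one pass over the text finds all of them. The following is a minimal sketch under that assumption; PATTERN_SOURCES, compile_group and scan are hypothetical names and the expressions are illustrative only.

```python
import re

# Hypothetical named expressions that a group can draw from.
PATTERN_SOURCES = {
    "email": r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}",
    "ssn": r"\b\d{3}-\d{2}-\d{4}\b",
}


def compile_group(names):
    """Combine the selected patterns into one regex with a named group each."""
    return re.compile("|".join(f"(?P<{n}>{PATTERN_SOURCES[n]})" for n in names))


def scan(text, group_rx):
    """Return (pattern_name, matched_text) for every hit, in order of appearance."""
    return [(m.lastgroup, m.group()) for m in group_rx.finditer(text)]


pii_group = compile_group(["email", "ssn"])
print(scan("Contact jane@example.com, SSN 123-45-6789", pii_group))
```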
Create and Modify Regular Expression and RegEx Grouping for Pattern Discovery
vimeo.com/251278398
Notifications
Notifications are sent to Users when Watches are triggered.
Admin User Section
Admin User Profile Maintenance
Add or Change content in Your User Profile
Create Watches for value ranges for maintenance, AML or for Patterns
For a file, specify a column to monitor at a set interval in order to detect conditions such as parts becoming defective, AML (anti-money-laundering) activity, or patterns such as email addresses.
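A value-range Watch can be thought of as a check that reports every row whose monitored column falls outside an allowed range. The sketch below assumes rows are dictionaries of strings and that the column holds numbers; check_watch and its parameters are hypothetical, and the periodic scheduling of the check is left out.

```python
def check_watch(rows, column, low, high):
    """Return (row_number, value) for every value outside the range [low, high]."""
    alerts = []
    for row_number, row in enumerate(rows, start=1):
        value = float(row[column])
        if not low <= value <= high:
            alerts.append((row_number, value))
    return alerts


readings = [{"temp": "70"}, {"temp": "105.5"}, {"temp": "68"}]
print(check_watch(readings, "temp", 60, 100))  # [(2, 105.5)]
```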
Operational and User Lineage Logs
Logs of operations run by users and System Logs
File Content Prep and Selection
Select folders or files to run BigDataRevealed for Discovery and Remediation
File Content Viewer and Delimiter Validator
https://vimeo.com/251370329 File Content Viewer and Delimiter Validator
File System Tree for AWS S3 and Hadoop HDFS
https://vimeo.com/251370329 File System Tree for AWS S3 and Hadoop HDFS to be processed by BigDataRevealed
This is where the User selects the folders or files to scan for Personally Identifiable Information and Business Columnar Classifications. The files can also be viewed in the File Content Viewer, and when a specific folder or file is selected, its prior run results are shown on the Executive Summary Graphs Technical Dashboard.
Jobs
Where the User can Run Jobs or Display Jobs
Display Jobs
Shows jobs that have been submitted and are running, as well as jobs already completed.
Click on Social Security in Quick Classification showing 5 Discoveries found
https://vimeo.com/251371464 Here we see which columns Social Security numbers were found in and what percentage of each column they represent. On the next screen we will drill deeper into the data assets.
Executive Summary and Quick Column Classification Graphs and drill
Allows for the simple drilling into the Pattern Discovery results
Click on Social Security in Quick Classification showing 5 Discoveries found and Drilling into Results
We can see this is not a false positive and may need to be Encrypted.
Executive Summary - Interactive Graphs and drill for Quick Columnar Classification
https://vimeo.com/251372458
Run Jobs
Below is a list of Jobs that can be run from this menu against the already selected folder or file.
Quick Classification
Quick Classification Running
Run Quick Classification to discover one or more patterns in a column and what percentage of the column's data matches each pattern.
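The idea of a per-column match percentage can be sketched as follows. This is only an illustration: the two expressions, the PATTERNS dictionary and classify_column are assumptions for this example, and the product may compute its percentages differently.

```python
import re

# Hypothetical patterns used only for this illustration.
PATTERNS = {
    "ssn": re.compile(r"\d{3}-\d{2}-\d{4}"),
    "email": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
}


def classify_column(values):
    """Map each pattern name to the percentage of non-empty values it fully matches."""
    cells = [v.strip() for v in values if v and v.strip()]
    if not cells:
        return {}
    result = {}
    for name, rx in PATTERNS.items():
        hits = sum(1 for cell in cells if rx.fullmatch(cell))
        if hits:
            result[name] = round(100.0 * hits / len(cells), 1)
    return result


column = ["123-45-6789", "987-65-4321", "n/a", "jane@example.com"]
print(classify_column(column))
```

With the sample column, two of the four values look like Social Security numbers (50%) and one looks like an email address (25%).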
Quick Classification Run Results
https://vimeo.com/251373215 Shows the results for some of the columns in which Discovery found matches and what percentage of each column those results represent.
Run Pattern Discovery
Pattern Job Run and Display Screen
Here you can see that the pattern job is running and which patterns are being discovered. By clicking on the Discovery Patterns, we can see that the discovery is being run for email, Social Security Number and valid IP Address.
Run The Pattern Discovery for User Selected Patterns for Compliance Checking
https://vimeo.com/251374195 The User can select groups or individual Patterns for Discovery when auditing for PII Compliance.
In this example we are running Pattern Discovery for email, IP Address and Social Security Number. After all the runs we will drill down and validate that the results are not false positives; if they are not, the User can decide to Encrypt those columns of data.
Running of Basic Core Jobs
The running of the BigDataRevealed Core jobs such as Data Discovery, Columnar Classification and Pattern Discovery
Run the Data Discovery Job
https://vimeo.com/251375024 From the selected folder and/or file, this job creates all the unique data assets and the unique patterns found among those the User selected to run.
Validating Delimiters before final execution of the Data Discovery Run
Allows the User to validate that the proper file column delimiter is selected; if it is not, the User may change the delimiter before running the job.
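A simple way to sanity-check a delimiter is to split a sample of the file with it and confirm that every row yields the same number of columns. The sketch below uses Python's standard csv module for that; delimiter_is_consistent is a hypothetical helper and is not the product's validator.

```python
import csv
import io


def delimiter_is_consistent(sample_text, delimiter):
    """True when every non-empty row splits into the same number (>1) of fields."""
    reader = csv.reader(io.StringIO(sample_text), delimiter=delimiter)
    widths = {len(row) for row in reader if row}
    return len(widths) == 1 and widths.pop() > 1


sample = "id|name|ssn\n1|Ann|123-45-6789\n2|Bob|987-65-4321\n"
print(delimiter_is_consistent(sample, "|"))  # True
print(delimiter_is_consistent(sample, ","))  # False
```

A wrong delimiter typically leaves each row as a single wide field, which is why a width of one is treated as a failure here.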
Verify Job is running and view when completed
This screen allows the User to see the current running jobs and view and drill into the jobs that have been completed
Encrypting Column 11 Social Security
https://vimeo.com/251375791
Encryption of Compliance Violations- Validation
https://vimeo.com/251375791 As we can see from the file content screen, column 11 (ssn) is encrypted.
Decrypting one or more columns of data if credentials allowed
Indirect Identifiers (the Regulations that will fail the vast majority)
Discovery and Remediation of Indirect Identifiers is probably the most difficult of the GDPR requirements to master. Perhaps as many as 90% of companies will falter when attempting to discover and protect Indirect Identifiers that are spread across multiple files.
Indirect Identifiers are fields that by themselves do not uniquely identify an individual but that, when grouped together, do identify an individual or a very small group of individuals.
A good example of Indirect Identifiers would be Date of Birth, Postal Code and Gender. Only a handful of individuals will have the same values in all three fields, so the combination can constitute a GDPR violation.
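The Date of Birth, Postal Code and Gender example can be made concrete by counting how many records share each combination of values and flagging the combinations held by very few people. In the sketch below, small_groups and the threshold k are assumptions for illustration; this guide does not state a numeric cut-off.

```python
from collections import Counter


def small_groups(records, quasi_identifiers, k):
    """Return each value combination shared by fewer than k records, with its count."""
    counts = Counter(tuple(r[q] for q in quasi_identifiers) for r in records)
    return {combo: n for combo, n in counts.items() if n < k}


people = [
    {"dob": "1980-01-01", "postal_code": "60601", "gender": "F"},
    {"dob": "1980-01-01", "postal_code": "60601", "gender": "F"},
    {"dob": "1975-05-05", "postal_code": "60602", "gender": "M"},
]
# The third record is the only one with its combination, so it is flagged.
print(small_groups(people, ["dob", "postal_code", "gender"], k=2))
```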
Discovery of Cross-File Indirect Identifiers: BigDataRevealed's Automated Direct Key finder works across your enterprise and allows a User to logically 'Join' multiple files by using another field found in all the files as the key to execute the logical Join. Rows from all files are then processed to determine whether the joined rows contain Indirect Identifiers that constitute a GDPR violation.
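The logical Join described above can be sketched in a few lines: rows from several files are merged on a shared direct key, and the merged records are then checked for rare combinations of Indirect Identifiers. The function names, the sample fields and the threshold k are hypothetical; this is an illustration of the idea, not the product's algorithm.

```python
from collections import Counter, defaultdict


def join_on_key(files, key):
    """Merge rows from several files into one record per key value."""
    joined = defaultdict(dict)
    for rows in files:
        for row in rows:
            if row.get(key):
                joined[row[key]].update(row)
    return dict(joined)


def risky_keys(joined, quasi_identifiers, k):
    """Return keys whose joined identifier combination occurs fewer than k times."""
    complete = {
        key: tuple(rec[q] for q in quasi_identifiers)
        for key, rec in joined.items()
        if all(q in rec for q in quasi_identifiers)
    }
    counts = Counter(complete.values())
    return sorted(key for key, combo in complete.items() if counts[combo] < k)


hr_file = [
    {"email": "a@x.com", "gender": "F"},
    {"email": "b@x.com", "gender": "F"},
    {"email": "c@x.com", "gender": "M"},
]
crm_file = [
    {"email": "a@x.com", "postal_code": "60601"},
    {"email": "b@x.com", "postal_code": "60601"},
    {"email": "c@x.com", "postal_code": "60601"},
]
joined = join_on_key([hr_file, crm_file], "email")
print(risky_keys(joined, ["gender", "postal_code"], k=2))  # ['c@x.com']
```

Neither file alone singles anyone out by postal code, but once joined on email the gender and postal code combination is unique for one person, which is the cross-file risk the text describes.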
Direct Identifiers Stage One of 2.
https://vimeo.com/251400656 Important video to watch on the Indirect Identifiers and Consent regulatory requirements.
Step one allows the User to select one or more Direct Identifiers (keys) such as email, Passport Number, National Insurance ID, Social Security Number, Phone Number and so on.
This process returns a list of the files that contain one or more of these key Patterns and the percentage of each file that has them. This ensures that phase two will be able to join the tables by like keys and then Discover which unique domain values (Indirect Identifiers) such as gender, age, postal code, illness and so on are present across two or more of the files the User previously selected.
If a combination of Indirect Identifiers does exist across files, and these combinations of values would allow a hacker or researcher to find a person or small group of people within a certain range of certainty, this would constitute a GDPR Compliance Violation.
Important to note: all permutations (joins of all Indirect Identifiers) must be attempted and matched across ALL other values of all the columns and rows of all the selected files.
Below are the stage one results: the files and their columns that have Direct Identifiers, which will allow the proper Discovery and joins across all files to identify Indirect Identifier violations. Here we searched a folder of files that contain emails, Social Security numbers and IP Addresses.
Direct Identifiers Results phase One of Two
Here we can see the list of results derived by BigDataRevealed Pattern Detection: the Direct Key Identifiers it found and what percentage of each file they represent. The User now knows which files can be used to detect the potential Indirect Identifier violations that occur in the Data Assets of their Customers.
Forgotten Identity Screen used to Decrypt Right of Erasure and Change of Consent
This screen shows columns that have been encrypted for the purpose of the Citizen's Right to be Forgotten or for the addition or removal of Consent for Private Information.
Forgotten Identity Screen used to Decrypt Right of Erasure and Change of Consent Cont2
This screen shows columns that have been encrypted for the purpose of the Citizen's Right to be Forgotten or for the addition or removal of Consent for Private Information.
Indirect Identifiers and the Citizens Right of Erasure AKA Right to be Forgotten.
They share commonalities in their Discovery and Remediation / Encryption.
Indirect Identifiers completed job Open and Review
Open the Job and review the results. Then Decide what columns of values need to be Encrypted to not be in violation of Indirect Identifiers.
This same process and results are used to meet the Citizens Right of Erasure AKA Right to be Forgotten Regulation.
Indirect Identifiers completed job Open and Review Cont.
Shows all the Unique Identifier results, which files they were found in, and whether they are intra-file or cross-file results.
In the next phase, selecting a unique value shows the file, column and row in which the Indirect Identifier was found. This allows the encryption needed to be compliant with Indirect Identifiers, or the value can be selected and encrypted for the Citizen's Right of Erasure, aka Right to be Forgotten.
Indirect Identifiers completed job Open and Review Cont. 3
In this phase, selecting a unique value shows the file, column and row in which the Indirect Identifier was found. This allows the encryption needed to be compliant with Indirect Identifiers, or the value can be selected and encrypted for the Citizen's Right of Erasure, aka Right to be Forgotten.
See all rows for the unique identifier, with the file, column, row and pattern type found for each.
Indirect Identifiers completed job Open and Review Cont. 4
Row 8 col 11 is being encrypted
Indirect Identifiers completed job Open and Review Cont. 5
This shows that row 8 col 11 is eligible for decryption
Indirect Identifiers completed job Open and Review Cont. 6
Relative row 8, col 11 (Social Security) is now encrypted, as remediated by the User.
To Discover the Indirect identifiers
https://vimeo.com/251400656 Select the Direct Identifier key to search for across the files, and the Indirect Identifier unique values to find across your selected files that are joinable by your selected Identifier Key.
Live Streaming Remediation / Encryption Results Viewer
Display and review the results of the remediation / encryption process on the live streaming data
Live Streaming Data Compliance/ Remediation / Encryption on the Fly
BigDataRevealed Discovers Personal Data in live streaming data for Regulatory Compliance and Encrypts that data before the data in the stream gets written to your file system.
Producer File Creator to connect and process data from live streams
Add the name of the stream, url, credentials and more
Run Parameters for a Producer to access data and have that data Discovered by BigDataRevealed
This screen lets you pick a producer, set the duration of time for reading the streamed data, and select the patterns to search for and remediate with encryption before the data is written to a file.
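The run parameters above (a producer, a read duration and a set of patterns) can be pictured with the following sketch, which reads records until the time limit and remediates each one before it is written. Everything here is hypothetical: remediate_stream, the sink callback and the sample expression are assumptions, and simple masking stands in for the product's encryption, whose method this guide does not describe.

```python
import re
import time

# Illustrative Social Security pattern; not the product's own expression.
SSN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")


def remediate_stream(records, patterns, duration_seconds, sink):
    """Mask pattern matches in each record and pass it to sink until time runs out.

    Masking is a stand-in for encryption in this sketch.
    Returns the number of records written.
    """
    deadline = time.monotonic() + duration_seconds
    written = 0
    for record in records:
        if time.monotonic() >= deadline:
            break
        for rx in patterns:
            record = rx.sub("[REDACTED]", record)
        sink(record)
        written += 1
    return written


output = []
incoming = iter(["user 1 ssn 123-45-6789", "no pii here"])
remediate_stream(incoming, [SSN], duration_seconds=60, sink=output.append)
print(output)  # ['user 1 ssn [REDACTED]', 'no pii here']
```

Remediating inside the loop, before the sink is called, reflects the requirement that sensitive values never reach the file in clear text.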