
Data Mining: Applications, Issues


    APPLICATIONS OF DATA MINING, ISSUES IN DATA MINING

    Applications:

    • Financial Data Analysis
    • Retail Industry
    • Telecommunication Industry
    • Biological Data Analysis
    • Other Scientific Applications
    • Intrusion Detection

    Financial Data Analysis:

    • Financial data
        • Collected from banks and financial institutions
        • Usually complete and reliable
    • Design and construction of data warehouses for multidimensional data analysis and mining
        • Analyze changes by month, by region, by sector, … using max, min, total, average, trend, etc.
        • Characteristic and comparative analysis, outlier analysis
    • Loan payment and customer credit policy analysis
        • Feature selection and attribute relevance ranking (debt ratio, credit history, income, education level, …)
        • Loan granting policy can be adjusted
        • Low-risk customers are granted loans
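
Attribute relevance ranking can be illustrated with information gain, one common relevance measure (the slides do not say which measure is intended). The sketch below assumes categorical attributes; the records, attribute names and the `repaid` target are hypothetical toy data, not taken from the source.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def information_gain(rows, attribute, target):
    """Reduction in entropy of `target` obtained by splitting `rows` on `attribute`."""
    base = entropy([r[target] for r in rows])
    remainder = 0.0
    for value in {r[attribute] for r in rows}:
        subset = [r[target] for r in rows if r[attribute] == value]
        remainder += len(subset) / len(rows) * entropy(subset)
    return base - remainder


def rank_attributes(rows, attributes, target):
    """Return (attribute, gain) pairs, most relevant attribute first."""
    scores = {a: information_gain(rows, a, target) for a in attributes}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical loan records (illustration only).
LOANS = [
    {"debt_ratio": "low", "credit_history": "good", "education": "degree", "repaid": "yes"},
    {"debt_ratio": "low", "credit_history": "good", "education": "school", "repaid": "yes"},
    {"debt_ratio": "high", "credit_history": "good", "education": "degree", "repaid": "yes"},
    {"debt_ratio": "high", "credit_history": "bad", "education": "school", "repaid": "no"},
    {"debt_ratio": "low", "credit_history": "bad", "education": "degree", "repaid": "no"},
    {"debt_ratio": "high", "credit_history": "bad", "education": "school", "repaid": "no"},
]

RANKING = rank_attributes(LOANS, ["debt_ratio", "credit_history", "education"], "repaid")
print(RANKING)  # credit_history ranks first on this toy data
```

On real data, numeric attributes such as income would first need to be discretized, or a different relevance measure used.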

    • Classification and clustering of customers for targeted marketing
        • Customer group identification
        • Multidimensional clustering techniques
        • Can associate a new customer with existing groups
    • Detection of money laundering and financial crimes
        • Data from several sources is integrated
        • Data analysis tools can be used to detect unusual patterns
        • Data visualization tools, linkage analysis tools, classification tools, clustering tools, outlier analysis tools
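
As a rough illustration of the outlier analysis mentioned above, the sketch below flags unusually large transfers with a median-based (modified z-score) test. The technique, the 0.6745 scaling constant and the 3.5 cut-off are a common convention chosen here as an assumption; the slides do not prescribe a method, and the amounts are made up.

```python
from statistics import median


def mad_outliers(amounts, threshold=3.5):
    """Return the values whose modified z-score exceeds `threshold`.

    The score uses the median and the median absolute deviation (MAD),
    which are less distorted by extreme values than mean and standard deviation.
    """
    med = median(amounts)
    mad = median([abs(x - med) for x in amounts])
    if mad == 0:
        return []  # no spread around the median: nothing can be scored
    return [x for x in amounts if 0.6745 * abs(x - med) / mad > threshold]


# Hypothetical transfer amounts for one account (illustration only).
TRANSFERS = [120, 95, 130, 110, 105, 98, 125, 9800, 115, 102]
SUSPICIOUS = mad_outliers(TRANSFERS)
print(SUSPICIOUS)  # [9800]
```

A flagged value is only a candidate for review; in practice it would be combined with the linkage and visualization tools listed above.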

    Retail Industry:

    • Sales data, customer shopping history, goods transportation, e-commerce
    • Mining can help to
        • Identify buying behaviour and discover shopping trends
        • Improve the quality of customer service and retain customers
    • Design and construction of data warehouses
        • There are several ways to design a warehouse
        • Entities involved: sales, customers, employees, goods transportation
        • Preliminary data mining exercises can help to guide the design process: which dimensions and levels to include and what preprocessing to do

    • Multidimensional analysis of sales, customers, products, time and region
        • Multi-feature data cube
        • Visualization tools
    • Analysis of the effectiveness of sales campaigns
        • Compare sales and transaction volume
        • Multidimensional analysis: compare the sales amount and the number of transactions containing the same items before and after the campaign
    • Association analysis
        • Identify items likely to be purchased together
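
Association analysis of this kind is usually expressed through support and confidence. The sketch below is a minimal version restricted to item pairs, not a full Apriori implementation; the baskets and the two thresholds are hypothetical.

```python
from collections import Counter
from itertools import combinations


def pair_rules(transactions, min_support=0.4, min_confidence=0.7):
    """Return rules (lhs, rhs, support, confidence) for pairs of items.

    support    = fraction of transactions containing both items
    confidence = fraction of transactions with lhs that also contain rhs
    """
    n = len(transactions)
    item_counts = Counter()
    pair_counts = Counter()
    for basket in transactions:
        items = sorted(set(basket))  # ignore duplicates within one basket
        item_counts.update(items)
        pair_counts.update(combinations(items, 2))

    rules = []
    for (a, b), count in pair_counts.items():
        support = count / n
        if support < min_support:
            continue
        for lhs, rhs in ((a, b), (b, a)):
            confidence = count / item_counts[lhs]
            if confidence >= min_confidence:
                rules.append((lhs, rhs, support, confidence))
    # Strongest rules first.
    return sorted(rules, key=lambda r: (-r[3], -r[2], r[0], r[1]))


# Hypothetical market baskets (illustration only).
BASKETS = [
    ["bread", "milk"],
    ["bread", "butter", "milk"],
    ["bread", "butter"],
    ["milk", "eggs"],
    ["bread", "butter", "eggs"],
]

RULES = pair_rules(BASKETS)
for lhs, rhs, support, confidence in RULES:
    print(f"{lhs} -> {rhs}: support={support:.2f}, confidence={confidence:.2f}")
```

On these baskets only the bread/butter pair survives both thresholds, giving the rules butter -> bread and bread -> butter.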

    • Customer retention
        • Customer loyalty and trends
        • Sequential pattern mining
        • Adjust pricing strategy and range of goods
    • Purchase recommendation and cross-reference of items
        • Recommender systems
        • Sales promotion by displaying deal information in association with items of interest

    Telecommunication Industry:

    • Computer and Web data transmission, fax, mobile phone, telephone services
    • Multidimensional analysis of telecommunication data
        • Helps to identify and compare data traffic, system workload, resource usage, user group behavior, profit, …
        • Time-of-day usage patterns
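
A time-of-day usage pattern is one slice of such a multidimensional view. The sketch below aggregates call minutes over two assumed dimensions, customer group and hour of day; the record fields (`group`, `hour`, `minutes`) and the call records themselves are hypothetical.

```python
from collections import defaultdict


def usage_by_group_and_hour(calls):
    """Total call minutes for each (customer_group, hour_of_day) cell."""
    cube = defaultdict(float)
    for call in calls:
        cube[(call["group"], call["hour"])] += call["minutes"]
    return dict(cube)


def peak_hour(cube, group):
    """Hour of day with the highest total usage for one customer group."""
    hours = {hour: minutes for (g, hour), minutes in cube.items() if g == group}
    return max(hours, key=hours.get)


# Hypothetical call records (illustration only).
CALLS = [
    {"group": "business", "hour": 10, "minutes": 30},
    {"group": "business", "hour": 10, "minutes": 45},
    {"group": "business", "hour": 15, "minutes": 20},
    {"group": "residential", "hour": 20, "minutes": 60},
    {"group": "residential", "hour": 20, "minutes": 25},
    {"group": "residential", "hour": 10, "minutes": 5},
]

CUBE = usage_by_group_and_hour(CALLS)
print(peak_hour(CUBE, "business"))     # 10
print(peak_hour(CUBE, "residential"))  # 20
```

Further dimensions such as region or service type could be added to the key tuple in the same way.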

    • Fraudulent pattern analysis
        • Identify fraudulent users and atypical usage patterns
        • Illegal customer account access
        • Automatic dial-out equipment
        • Switch and route congestion patterns
    • Multidimensional association and sequential pattern analysis
        • Usage patterns for a set of communication services by customer group and time of day
        • Sales promotion
    • Mobile telecommunication services
        • Spatio-temporal data mining
    • Use of visualization tools

    Biomedical and DNA Data Analysis:

    • Research in DNA analysis has led to
        • Development of new drugs
        • Cancer therapies
        • Human genome study
        • Discovery of genetic causes for many diseases
    • Genome research
        • Study of DNA sequences built from four nucleotides: adenine, cytosine, guanine, thymine
        • 100,000 genes, each with hundreds of nucleotides that can be combined in a number of ways
        • Identifying gene sequence patterns is challenging
    • Semantic integration of heterogeneous, distributed genome databases
        • Highly distributed generation and use of DNA data
        • Integrated data warehouses and distributed federated databases
        • Efficient data cleaning and integration methods

    • Similarity search and comparison among DNA sequences
        • Gene sequences are isolated from healthy and diseased tissues
        • Compare frequently occurring patterns in each class
        • Helps to identify the genetic factors of the disease and immune factors
        • The non-numeric nature of the data poses difficulties
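
One simple way to compare frequently occurring patterns between the two classes is to count short substrings (k-mers) and keep those that appear in a much larger share of diseased samples than healthy ones. This is an illustrative assumption rather than the method the slides have in mind; the sequences, `k` and `min_gap` are invented.

```python
from collections import Counter


def kmer_counts(sequences, k):
    """Count in how many sequences each length-k substring occurs."""
    counts = Counter()
    for seq in sequences:
        # A set so that each k-mer is counted once per sequence.
        counts.update({seq[i:i + k] for i in range(len(seq) - k + 1)})
    return counts


def enriched_kmers(diseased, healthy, k=4, min_gap=0.5):
    """k-mers whose share of diseased sequences exceeds their share of
    healthy sequences by at least `min_gap`."""
    d_counts = kmer_counts(diseased, k)
    h_counts = kmer_counts(healthy, k)
    result = {}
    for kmer, n in d_counts.items():
        gap = n / len(diseased) - h_counts[kmer] / len(healthy)
        if gap >= min_gap:
            result[kmer] = gap
    return result


# Hypothetical sequences (illustration only, not real genes).
DISEASED = ["ACGTTGCA", "TTGCAACG", "GGTTGCAT"]
HEALTHY = ["ACGTACGT", "CGTACGTA", "GTACGTAC"]

ENRICHED = enriched_kmers(DISEASED, HEALTHY)
print(sorted(ENRICHED))  # ['GTTG', 'TGCA', 'TTGC']
```

Because the data is symbolic, the comparison works on substring counts rather than numeric distances, which is the difficulty the last bullet above points to.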

    • Association analysis: identification of co-occurring gene sequences
        • Diseases triggered by a combination of genes acting together
        • Association analysis helps to detect the kinds of genes that may co-occur
        • Study interactions and relationships between them
    • Path analysis: linking genes to different stages of disease development
        • Different genes become active at different stages of the disease
        • Develop drug interventions that target specific stages
    • Visualization tools and genetic data analysis
        • Complex gene structures
        • Graphs, trees, cuboids and visualization tools
        • Better understanding and support for interactive data exploration

    Intrusion Detection:

    • Intrusions
        • Any set of actions that threaten the integrity, availability, or confidentiality of a network resource
    • Misuse detection: use patterns of well-known attacks to identify intrusions
        • Signatures must be updated
        • Classification based on known intrusions
        • E.g., three consecutive login failures: password guessing
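
The login-failure example can be written as a tiny signature check. The sketch below assumes a time-ordered stream of `(user, succeeded)` events; that event format and the user names are hypothetical.

```python
from collections import defaultdict


def password_guessing_suspects(events, limit=3):
    """Return users who reach `limit` consecutive failed logins.

    `events` is an iterable of (user, succeeded) pairs in time order.
    A successful login resets that user's failure streak.
    """
    streak = defaultdict(int)
    flagged = []
    for user, succeeded in events:
        if succeeded:
            streak[user] = 0
            continue
        streak[user] += 1
        if streak[user] == limit:  # flag once, when the limit is first reached
            flagged.append(user)
    return flagged


# Hypothetical login events (illustration only).
EVENTS = [
    ("alice", False),
    ("bob", False),
    ("alice", False),
    ("bob", True),     # bob's streak is reset here
    ("alice", False),  # third consecutive failure for alice
    ("bob", False),
]

SUSPECTS = password_guessing_suspects(EVENTS)
print(SUSPECTS)  # ['alice']
```

A rule like this only catches the attack it encodes, which is why the signatures must be kept up to date.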

    • Anomaly detection: use deviation from normal usage patterns to identify intrusions
        • Any significant deviations from the expected behavior are reported as possible attacks

    • Data mining algorithms
        • Misuse detection
            • Training data labeled normal / intrusion
            • Classifier can be used to detect known intrusions
            • Classification algorithms, association rule mining
        • Anomaly detection
            • Builds models of normal behavior and detects significant deviations
            • Supervised: trained on normal data
            • Unsupervised: no information about the training data
            • Classification, clustering
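
For the supervised flavour of anomaly detection, a minimal sketch is to build a profile from data known to be normal and report values that deviate strongly from it. The single-feature mean and standard-deviation model, the three-sigma cut-off and the traffic figures are all simplifying assumptions for illustration.

```python
from statistics import mean, stdev


def fit_normal_profile(normal_values):
    """Model normal behavior as (mean, standard deviation) of one feature."""
    return mean(normal_values), stdev(normal_values)


def is_anomalous(value, profile, max_sigma=3.0):
    """True if `value` lies more than `max_sigma` standard deviations from the mean."""
    mu, sigma = profile
    if sigma == 0:
        return value != mu
    return abs(value - mu) / sigma > max_sigma


# Hypothetical connections per minute observed during normal operation.
NORMAL_TRAFFIC = [20, 22, 19, 21, 23, 20, 18, 22]
PROFILE = fit_normal_profile(NORMAL_TRAFFIC)

print(is_anomalous(21, PROFILE))  # False
print(is_anomalous(95, PROFILE))  # True
```

Unlike a signature, this can flag attacks never seen before, at the cost of false alarms whenever legitimate behavior changes.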

    • Association and correlation analysis
        • Finds relationships between the system attributes describing the network data
        • Helps in the selection of useful attributes
    • Analysis of stream data
        • Transient and dynamic nature of intrusions
        • An event may be normal on its own but malicious when viewed as part of a sequence
    • Distributed data mining
        • Analysis of data from several locations
    • Visualization and querying tools

    Data Mining in other Scientific Applications:

    • Old scenario
        • Small, homogeneous data sets
        • Formulate hypothesis, build model, evaluate results
    • Current scenario
        • High-dimensional data, stream data, heterogeneous data (spatial, temporal)
        • Collect and store data, mine for new hypotheses, confirm with data or experimentation
    • Vast amounts of data have been collected from scientific domains
        • Climate and ecosystem modeling, chemical engineering, fluid dynamics, structural mechanics
    • Data warehouses and data preprocessing
        • Scientific applications need methods for integrating data from heterogeneous sources (geospatial data warehouse) and for identifying events (climate and ecosystem data)

    • Mining complex data types
        • Scientific data: semi-structured and unstructured
        • Multimedia and spatial data
    • Graph-based mining
        • Labeled graphs capture spatial, topological, geometric and other relational characteristics present in scientific data
        • Nodes: objects to be mined; edges: relationships
        • Scalable and efficient mining methods are needed
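
To make the node/edge view concrete, the sketch below represents each labeled graph as a node-label dictionary plus a list of labeled edges, and counts single-edge patterns that occur in several graphs. This is only the first step of frequent-subgraph mining, shown under assumptions; the graph encoding and the small molecule-like examples are hypothetical.

```python
from collections import Counter


def edge_patterns(graph):
    """Set of (node_label, edge_label, node_label) patterns in one graph.

    `graph` is (node_labels, edges): a dict node_id -> label and a list of
    (node_id, node_id, edge_label) tuples. Edges are treated as undirected,
    so the two node labels are stored in sorted order.
    """
    node_labels, edges = graph
    patterns = set()
    for u, v, edge_label in edges:
        a, b = sorted((node_labels[u], node_labels[v]))
        patterns.add((a, edge_label, b))
    return patterns


def frequent_edge_patterns(graphs, min_support):
    """Patterns that occur in at least `min_support` graphs."""
    counts = Counter()
    for graph in graphs:
        counts.update(edge_patterns(graph))
    return {p: n for p, n in counts.items() if n >= min_support}


# Hypothetical labeled graphs (illustration only).
GRAPHS = [
    ({0: "C", 1: "O", 2: "H"}, [(0, 1, "double"), (0, 2, "single")]),
    ({0: "C", 1: "O", 2: "N"}, [(0, 1, "double"), (0, 2, "single")]),
    ({0: "C", 1: "H", 2: "H"}, [(0, 1, "single"), (0, 2, "single")]),
]

FREQUENT = frequent_edge_patterns(GRAPHS, min_support=2)
print(FREQUENT)
```

Full graph miners grow such patterns edge by edge, and the number of candidate subgraphs rises quickly, which is where the scalability concern above comes from.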

    • Visualization tools and domain-specific knowledge
        • High-level GUIs and visualization tools are needed
        • Integrated with existing domain-specific systems and database systems

    Issues in Data Mining:

    • Mining methodology and user interaction
        • Mining different kinds of knowledge in databases
        • Interactive mining of knowledge at multiple levels of abstraction
        • Incorporation of background knowledge
        • Data mining query languages and ad hoc data mining
        • Expression and visualization of data mining results
        • Handling noise and incomplete data
        • Pattern evaluation
    • Issues relating to the diversity of data types
        • Handling relational and complex types of data
        • Mining information from heterogeneous databases and global information systems (WWW)
    • Performance and scalability
        • Efficiency and scalability of data mining algorithms
        • Parallel, distributed and incremental mining methods
