74
Granular Data Mining and Rough-fuzzy Computing: Data to Knowledge and Big Data Issues Sankar K. Pal Indian Statistical Institute Calcutta http://www.isical.ac.in/~sankar & Data Mining Services Ltd (DMS), U.K. 1 st Mediterranean Conf. on Pattern Recogntion and Artificial Intelligence, Tebessa, Algeria, Nov 22, 2016

Soft Computing, Machine Intelligence and Data Mining...Computing (Using lower & upper approximations) (Using information granules) Two Important Characteristics IEEE Trans. Syst.,

  • Upload
    others

  • View
    0

  • Download
    0

Embed Size (px)

Citation preview

Page 1: Soft Computing, Machine Intelligence and Data Mining...Computing (Using lower & upper approximations) (Using information granules) Two Important Characteristics IEEE Trans. Syst.,

Granular Data Mining and Rough-fuzzy

Computing: Data to Knowledge and Big Data Issues

Sankar K. Pal

Indian Statistical Institute

Calcutta http://www.isical.ac.in/~sankar

&

Data Mining Services Ltd (DMS), U.K.

1st Mediterranean Conf. on Pattern Recogntion and Artificial Intelligence,

Tebessa, Algeria, Nov 22, 2016

Page 2: Soft Computing, Machine Intelligence and Data Mining...Computing (Using lower & upper approximations) (Using information granules) Two Important Characteristics IEEE Trans. Syst.,

Contents Machine Intelligence & DM

Fuzzy sets and uncertainty analysis

Rough sets and information granules

Example: Case mining

Other applications

Rough-fuzzy computing: Significance

Generalized rough sets and entropy

Object extraction & video tracking

Bioinformatics: Gene and miRNA selection

Challenging issues

Relevance to Big Data

Alg-1

Page 3: Soft Computing, Machine Intelligence and Data Mining...Computing (Using lower & upper approximations) (Using information granules) Two Important Characteristics IEEE Trans. Syst.,

Machine

Intelligence

Knowledge-based

Systems

Probabilistic reasoning

Approximate reasoning

Case based reasoning

Fuzzy logic

Rough sets

Pattern recognition

and learning

Hybrid Systems

Neuro-fuzzy

Genetic neural

Rough fuzzy

Fuzzy neuro

genetic

Non-linear

Dynamics

Chaos theory

Rescaled range

analysis (wavelet)

Fractal analysis

Data Driven

Systems

Neural network

system

Evolutionary

computing

Machine Intelligence: A core concept for grouping various advanced

technologies with Pattern Recognition and Learning

IAS are physical embodiments of Machine Intelligence

Page 4: Soft Computing, Machine Intelligence and Data Mining...Computing (Using lower & upper approximations) (Using information granules) Two Important Characteristics IEEE Trans. Syst.,

Pattern Recognition and Machine Learning

principles applied to a very large (both in size

and dimension) heterogeneous database

Data Mining

Data Mining + Knowledge Interpretation

Knowledge Discovery

Process of identifying valid, novel, potentially

useful, and ultimately understandable patterns

in data

Algeria-3

Page 5: Soft Computing, Machine Intelligence and Data Mining...Computing (Using lower & upper approximations) (Using information granules) Two Important Characteristics IEEE Trans. Syst.,

Fuzzy Sets:

Flexibility & Uncertainty Analysis (Lotfi Zadeh, Inform. Contro1, 1965)

Page 6: Soft Computing, Machine Intelligence and Data Mining...Computing (Using lower & upper approximations) (Using information granules) Two Important Characteristics IEEE Trans. Syst.,

1

0 No

Yes

Black

White

Classical Set Hard

Fuzzy Set Soft ]1,0[

1,0

A = {(A(x), x) : for all x X}

µA(x) : Membership function : degree of belonging of x to A or

degree of possessing some imprecise property represented by A

Examples : tall man, long

street, large number, sharp

corner, very young, etc. FS are nothing but -functions

-functions are context

dependent

Page 7: Soft Computing, Machine Intelligence and Data Mining...Computing (Using lower & upper approximations) (Using information granules) Two Important Characteristics IEEE Trans. Syst.,

Meeting at 5 PM

5 6 7 4

0

1 Crisp Set

Fuzzy Set

Memb. Function

• Fuzzy Sets are nothing but Membership Functions • Membership Function: Context Dependent

Page 8: Soft Computing, Machine Intelligence and Data Mining...Computing (Using lower & upper approximations) (Using information granules) Two Important Characteristics IEEE Trans. Syst.,

FS is a Generalization of classical set theory

Greater flexibility in capturing faithfully various aspects

of incompleteness or imperfection in a situation

Flexibility is associated with the Concept of

As Amount of Stretching the Concept

FS are Elastic, Hard sets are inelastic

p(x): Concerns with the no. of occurrences of x

: Concerns with the compatibility (similarity) of x

with an imprecise concept

Characteristics of FS

x

Algeria-6

Page 9: Soft Computing, Machine Intelligence and Data Mining...Computing (Using lower & upper approximations) (Using information granules) Two Important Characteristics IEEE Trans. Syst.,

Concept of Flexibility & Uncertainty

Analysis (overlapping data/ concept/ regions)

Page 10: Soft Computing, Machine Intelligence and Data Mining...Computing (Using lower & upper approximations) (Using information granules) Two Important Characteristics IEEE Trans. Syst.,

Rough Sets and Granular Computing

Page 11: Soft Computing, Machine Intelligence and Data Mining...Computing (Using lower & upper approximations) (Using information granules) Two Important Characteristics IEEE Trans. Syst.,

Rough Sets

. x

Upper

Approximation BX

Set X

Lower

Approximation BX

[x]B (Granules)

[x]B = set of all points belonging to the same granule as of the point x in feature space WB.

[x]B is the set of all points which are indiscernible with point x

in terms of feature subset B.

UB WZ. Pawlak 1982, Int. J. Comp. Inf. Sci.

Page 12: Soft Computing, Machine Intelligence and Data Mining...Computing (Using lower & upper approximations) (Using information granules) Two Important Characteristics IEEE Trans. Syst.,

Approximations of the set UX

B-lower: BX = }][:{ XxUxB

B-upper: BX = }][:{ XxUxB

If BX = BX, X is B-exact or B-definable

Otherwise it is Roughly definable

Granules definitely

belonging to X

w.r.t feature subset B

Granules definitely

and possibly belonging

to X

Rough Sets are Crisp Sets, but with rough description

Page 13: Soft Computing, Machine Intelligence and Data Mining...Computing (Using lower & upper approximations) (Using information granules) Two Important Characteristics IEEE Trans. Syst.,

Rough Sets

Uncertainty

Handling Granular

Computing (Using lower & upper approximations) (Using information granules)

Two Important Characteristics

Page 14: Soft Computing, Machine Intelligence and Data Mining...Computing (Using lower & upper approximations) (Using information granules) Two Important Characteristics IEEE Trans. Syst.,

IEEE Trans. Syst., Man and Cyberns. Part B, 37(6), 1529-1540, 2007

Cluster definition using rough lower & upper approx

Sets and Granules can either or both be fuzzy (in real life)

• Lower and upper approximate regions could be fuzzy

Generalized Rough Sets – Stronger model of uncertainty handling

(uncertainty due to overlapping regions + granularity in domain)

Roughness in β :

1 - |lower|/ |upper|

Page 15: Soft Computing, Machine Intelligence and Data Mining...Computing (Using lower & upper approximations) (Using information granules) Two Important Characteristics IEEE Trans. Syst.,

Before I describe the Generalized Rough Sets

and example applications of Granular Mining,

let me explain the

Concept of granules

f-information granules & Case mining

Significance of rough granules

Relevance of Rough-Fuzzy computing in

SC paradigm

Algeria-11

Page 16: Soft Computing, Machine Intelligence and Data Mining...Computing (Using lower & upper approximations) (Using information granules) Two Important Characteristics IEEE Trans. Syst.,

Granulation: Natural clustering

L.A. Zadeh, “Toward a theory of fuzzy information granulation and its

centrality in human reasoning and fuzzy logic”, Fuzzy Sets and Systems, 90,

111–127, 1997

Replacing a fine-grained universe by a coarse-grained

one, more in line with human perception

Rough set theory concerns with a granulated domain

(crisp set defined over a crisp granulated domain)

Z. Pawlak, Rough sets, Int. J. Comp. Inf. Sci, 11(5), 341-356, 1982

Page 17: Soft Computing, Machine Intelligence and Data Mining...Computing (Using lower & upper approximations) (Using information granules) Two Important Characteristics IEEE Trans. Syst.,

Clump of indiscernible objects/points (similar

objects, not discriminable by given attributes/relation)

Concept of Granules

Examples: Granules in

Age: very young, young, not so old,…

Direction: slightly left, sharp right,…

School: each class/section

Image: regions of similar colors, gray values

e.g., max diff of 6 gray levels (Weber’s law)

Granules may be Crisp or Fuzzy (overlapping) $

Page 18: Soft Computing, Machine Intelligence and Data Mining...Computing (Using lower & upper approximations) (Using information granules) Two Important Characteristics IEEE Trans. Syst.,

Concept of -

f- Information Granules using

Rough Rules

Page 19: Soft Computing, Machine Intelligence and Data Mining...Computing (Using lower & upper approximations) (Using information granules) Two Important Characteristics IEEE Trans. Syst.,

low medium high

low

m

ediu

m

hig

h

F1

F2

Rule 21 MM

• Rule provides crude description of the class using

granule

Information Granules and Rough Set Theoretic Rules

Page 20: Soft Computing, Machine Intelligence and Data Mining...Computing (Using lower & upper approximations) (Using information granules) Two Important Characteristics IEEE Trans. Syst.,

Rule characterizing the granule can be viewed as the Case or Prototype representing the class/ concept/ region

Elongated objects need multiple rules/

granules Unsupervised: No. of granules is

determined automatically Cases (prototypes) are granules, not

sample points case generation, NOT selection

Note:

Page 21: Soft Computing, Machine Intelligence and Data Mining...Computing (Using lower & upper approximations) (Using information granules) Two Important Characteristics IEEE Trans. Syst.,

All the features may not appear in rules

Dimensionality reduction

Depending on topology, granules of

different classes may have different

dimensions Variable dimension reduction

Less storage requirement

Fast retrieval

Suitable for mining data with large dimension and size

Note:

Algeria-16

Page 22: Soft Computing, Machine Intelligence and Data Mining...Computing (Using lower & upper approximations) (Using information granules) Two Important Characteristics IEEE Trans. Syst.,

Example: IRIS data case generation

Three flowers: Setosa, Versicolor and Virginica

No of samples: 50 from each class

Features: sepal length, sepal width, petal length,

petal width

IEEE Trans. Knowledge Data Engg., 16(3), 292, 2004

Page 23: Soft Computing, Machine Intelligence and Data Mining...Computing (Using lower & upper approximations) (Using information granules) Two Important Characteristics IEEE Trans. Syst.,

(a) Sepal L- Sep W

(b) Sepal L – Petal L

(c) Sepal L – Petal W

Iris Folowers: Setosa, Versicolor and Virginica

(a) (b)

(c)

Page 24: Soft Computing, Machine Intelligence and Data Mining...Computing (Using lower & upper approximations) (Using information granules) Two Important Characteristics IEEE Trans. Syst.,

(a) Petal L - Sepal W

(b) Petal W - Sepal W

(c) Petal W - Petal L

Iris Folowers: Setosa, Versicolor & Virginica

(a) (b)

(c)

Page 25: Soft Computing, Machine Intelligence and Data Mining...Computing (Using lower & upper approximations) (Using information granules) Two Important Characteristics IEEE Trans. Syst.,

Iris Flowers: 4 features, 3 classes, 150 samples

0

0.5

1

1.5

2

2.5

3

3.5

4

avg. feature/case

Rough-fuzzy

IB3

IB4

Random

Number of cases = 3 (for all methods)

80%

82%

84%

86%

88%

90%

92%

94%

96%

98%

100%

Classification Accuracy (1-

NN)

Rough-fuzzy

IB3

IB4

Random

0

0.5

1

1.5

2

2.5

3

3.5

4

4.5

tgen(sec)

Rough-fuzzy

IB3

IB4

Random

0

0.001

0.002

0.003

0.004

0.005

0.006

0.007

0.008

0.009

0.01

tret(sec)

Rough-fuzzy

IB3

IB4

Random

Algeria-20

Page 26: Soft Computing, Machine Intelligence and Data Mining...Computing (Using lower & upper approximations) (Using information granules) Two Important Characteristics IEEE Trans. Syst.,

Information compression

Computational gain

Granular Computing (GrC): Computation is performed

using information granules and not the data points (objects)

Suitable for Mining Large Data

Rough set theory enriched GrC research

Page 27: Soft Computing, Machine Intelligence and Data Mining...Computing (Using lower & upper approximations) (Using information granules) Two Important Characteristics IEEE Trans. Syst.,

Applications of Rough Granules

Case Based Reasoning (evident is sparse)

Prototype generation and class representation

Clustering & Image segmentation (k selected autom)

Case representation and indexing

Knowledge encoding

Dimensionality reduction

Data compression and storing

Granular information retrieval

Page 28: Soft Computing, Machine Intelligence and Data Mining...Computing (Using lower & upper approximations) (Using information granules) Two Important Characteristics IEEE Trans. Syst.,

Certain Issues

Selection of granules and sizes

Fuzzy granules

Granular fuzzy computing

Fuzzy granular computing

Algeria-23

Page 29: Soft Computing, Machine Intelligence and Data Mining...Computing (Using lower & upper approximations) (Using information granules) Two Important Characteristics IEEE Trans. Syst.,

Generalized Rough Sets

Incorporate fuzziness in set & granules of rough sets

Concept of -

Page 30: Soft Computing, Machine Intelligence and Data Mining...Computing (Using lower & upper approximations) (Using information granules) Two Important Characteristics IEEE Trans. Syst.,

Generalized Rough Sets

. x

Upper

Approximation BX

Set X

Lower

Approximation BX

[x]B (Granules)

In practice, the Set and Granules, either or both, could be Fuzzy.

Generalized Rough Set

Stronger Paradigm for Uncertainty Handling

UB W

IEEE Trans. Syst, Man and Cyberns. Part B, 39(1), 117-128, 2009

Page 31: Soft Computing, Machine Intelligence and Data Mining...Computing (Using lower & upper approximations) (Using information granules) Two Important Characteristics IEEE Trans. Syst.,

Incorporate Fuzziness in Set & Granules

X is a crisp set & Granules have crisp

boundaries - rough set of X

X is a fuzzy set & Granules have crisp

boundaries - rough-fuzzy set of X

X is a crisp set & Granules have fuzzy

boundaries – fuzzy-rough set of X

X is a fuzzy set & Granules have fuzzy

boundaries – fuzzy rough-fuzzy set of X

IEEE Trans. Syst, Man and Cyberns. Part B, 39(1), 117-128, 2009

Generalized Rough Sets

Page 32: Soft Computing, Machine Intelligence and Data Mining...Computing (Using lower & upper approximations) (Using information granules) Two Important Characteristics IEEE Trans. Syst.,

RX RX The pair is referred to as the fuzzy rough-fuzzy set of X.

X is a fuzzy set & Granules have fuzzy boundaries

R is an equivalence relation

Page 33: Soft Computing, Machine Intelligence and Data Mining...Computing (Using lower & upper approximations) (Using information granules) Two Important Characteristics IEEE Trans. Syst.,

a measure of inexactness of X

and are the lower and upper approxs. of X

( ) 1R

RXX

RX

RX RX

Roughness Measure

Algeria-26

Page 34: Soft Computing, Machine Intelligence and Data Mining...Computing (Using lower & upper approximations) (Using information granules) Two Important Characteristics IEEE Trans. Syst.,

Proposed r-f entropy Fuzzy entropy

Proposed r-f entropy Fuzzy entropy

Proposed r-f entropy Fuzzy entropy Baboon image

Brain MR image

Remote sensing image

Seg

men

tati

on

Res

ult

s

Effect of

granules

(ω = 6)

Page 35: Soft Computing, Machine Intelligence and Data Mining...Computing (Using lower & upper approximations) (Using information granules) Two Important Characteristics IEEE Trans. Syst.,

So far we considered granules of equal size

Next, consider granules of unequal size

Page 36: Soft Computing, Machine Intelligence and Data Mining...Computing (Using lower & upper approximations) (Using information granules) Two Important Characteristics IEEE Trans. Syst.,

Example Comparison

Original

Otsu’s thresholding

RE with 4x4 granule

RE with 6x6 granule

Rough-fuzzy with crisp set and 6x6 granule

Proposed methodology

Page 37: Soft Computing, Machine Intelligence and Data Mining...Computing (Using lower & upper approximations) (Using information granules) Two Important Characteristics IEEE Trans. Syst.,

Variation of β-Index over sequence ‘a’

Homogeneous granules of unequal size reduce the formation of spurious segments →

Reduce abrupt change of index-value over frames.

Page 38: Soft Computing, Machine Intelligence and Data Mining...Computing (Using lower & upper approximations) (Using information granules) Two Important Characteristics IEEE Trans. Syst.,

Video Tracking

• Spatial segmentation on each frame

+

• Temporal segmentation based on 3 previous frames

Granules of Unequal Size

quad-tree decomposition (spatial)

merits over fixed size for Video Tracking

Page 39: Soft Computing, Machine Intelligence and Data Mining...Computing (Using lower & upper approximations) (Using information granules) Two Important Characteristics IEEE Trans. Syst.,

Proposed Other RE (6x6) - SP

Otsu RE (6x6) - PUM RE (4x4) - PUM

Algeria-35

Page 40: Soft Computing, Machine Intelligence and Data Mining...Computing (Using lower & upper approximations) (Using information granules) Two Important Characteristics IEEE Trans. Syst.,

Use Neighborhood Granules in RGB-D: 3-d spatio-temporal, 2-d spatio-color, 1-d color

Granules of Arbitrary Shape (natural)

• Use granules, instead of pixels, for O/B partition

• Granules obtained from Lower Approx. Object Model

• Granular level rule based decision

• Automatic updation of rule base with flow graph

Deals with ambiguous tracking situation (unsupervised)

(overlapping, newly appeared object, merged with similar color)

IEEE Trans. Cybernetics (to appear)

Page 41: Soft Computing, Machine Intelligence and Data Mining...Computing (Using lower & upper approximations) (Using information granules) Two Important Characteristics IEEE Trans. Syst.,

Extraction of Temporal Information (in D-space)

___ (abs diff)

τp

UNION INTERSECTION

Current Frame ft

Intersection & | ft – fP| : Lower & Upper approx. of object in temporal domain

ft-2

ft-3

ft-1

Page 42: Soft Computing, Machine Intelligence and Data Mining...Computing (Using lower & upper approximations) (Using information granules) Two Important Characteristics IEEE Trans. Syst.,

∩{τp ∀p ∈ P} denotes the common moving regions

of τp over P frames estimation of Lower

approximation of the moving object(s) (Olow) in

temporal domain Object model in D-space

⋃{ Olow , τP} = ⋃{ Olow , | ft – fP|} (basically τP)

denotes estimation of Upper approximation of the

moving object(s) (Oup) in temporal domain

Values of the features (RGB-D, Temporal) contained in

the set Olow are the core values of the object model

Values of the features in the set {Oup − Olow}, i.e.,

the boundary region, determine the extent to which

the values in the object model are allowed

Page 43: Soft Computing, Machine Intelligence and Data Mining...Computing (Using lower & upper approximations) (Using information granules) Two Important Characteristics IEEE Trans. Syst.,
Page 44: Soft Computing, Machine Intelligence and Data Mining...Computing (Using lower & upper approximations) (Using information granules) Two Important Characteristics IEEE Trans. Syst.,

Methodology

Algeria-40

Page 45: Soft Computing, Machine Intelligence and Data Mining...Computing (Using lower & upper approximations) (Using information granules) Two Important Characteristics IEEE Trans. Syst.,

Without Flow Graph Based Adaptation

Newly appeared object: Fails (Frames per sec = 15, P = 6)

Page 46: Soft Computing, Machine Intelligence and Data Mining...Computing (Using lower & upper approximations) (Using information granules) Two Important Characteristics IEEE Trans. Syst.,

Newly appeared object: Succeeds

With Flow Graph Based Adaptation

Page 47: Soft Computing, Machine Intelligence and Data Mining...Computing (Using lower & upper approximations) (Using information granules) Two Important Characteristics IEEE Trans. Syst.,

f-Information Measure and Gene

Selection from Microarray Data

Example:

Selection principle → Maximization of relevance of a gene w.r.t.

decision attribute and Minimization of redundancy w.r.t. other genes

Merits: FEPM based density approximation approach vs. those of

Discretization and Parzen-window for computing Entropy and

Mutual, V & Chi-square information FGC

Granules (CI) model Low, Medium & High for

overlapping classes Fuzzy Approx. Space (FEPM)

IEEE Trans, Syst., Man and Cyberns, Part B, 40(3), 741-752, 2010

IEEE Trans. Knowledge & Data Engineering, 22(6), 854-867, 2010

Small sample, large dimension

Algeria-43

Page 48: Soft Computing, Machine Intelligence and Data Mining...Computing (Using lower & upper approximations) (Using information granules) Two Important Characteristics IEEE Trans. Syst.,

FEPM Mg1g2

g2

X X

X X

0.8

0.4

0.2

0.1 0.5 0.8 g1

Each sample, in g1-g2 granulated space, is characterized by a 9-dimensional

membership vector, each component corresponds to one of the nine fuzzy

partitions L-L, L-M, L-H, .., M,H, .., H-H. Membership in (L,H) segment -

mg1g2

LH = mg

1L mg

2H

Attribute sets P & Q = g1 & g2

Joint freq of segment (Pi,Qj)

L M H

L

H

M X X

X

X

X

X

Example:

Page 49: Soft Computing, Machine Intelligence and Data Mining...Computing (Using lower & upper approximations) (Using information granules) Two Important Characteristics IEEE Trans. Syst.,

Gene Selection from Microarray Data

Mutual Information

82

84

86

88

90

92

94

96

98

100

102

Breast Leukemia Colon RAOA RAHC

Ac

cu

ra

cy

(%

)

FEPM

DISCRETE

PARZEN

Mutual Information

0

5

10

15

20

25

30

35

Breast Leukemia Colon RAOA RAHC

No

of

Sele

cte

d G

en

es

FEPM

DISCRETE

PARZEN

Maximization of relevance and minimization of redundancy

Maximum 50 genes selected

Highest accuracy obtained and the # genes required to obtain

that are plotted

SVM based classification accuracy (leave-one-out method) Higher or same accuracy with lower number of genes, except breast, in

case of FEPM

IEEE Trans. Syst., Man and Cyberns., Part B, 40(3), pp. 741-752, 2010

Genes/Samples

7129/49

7070/72

2000/62

18000/51

26000/50

Page 50: Soft Computing, Machine Intelligence and Data Mining...Computing (Using lower & upper approximations) (Using information granules) Two Important Characteristics IEEE Trans. Syst.,

miRNA Ranking in Cancer: Fuzzy-Rough

Entropy & Histogram Based Patient Selection

Set is crisp & Granules are fuzzy: Fuzzy-rough entropy

Fuzzy Lower Approx. of N & C classes: used to find prob. (relative

frequency) of definite and doubtful regions for entropy computation

Entropy minimization implies higher Relevance of a miRNA

Top 1% miRNAs provides significant improvement over entire set in

terms of F-score

Superiority over related methods

Example:

IEEE/ACM Trans. Computational Biology and Bioinformatics (to appear)

Algeria-46

Page 51: Soft Computing, Machine Intelligence and Data Mining...Computing (Using lower & upper approximations) (Using information granules) Two Important Characteristics IEEE Trans. Syst.,

Crisp Classes and Fuzzy Granules of Patients

Normal Class (N)

Cancer Class (C)

Normal patient

Normal granule

Cancer patient

Cancer granule

Page 52: Soft Computing, Machine Intelligence and Data Mining...Computing (Using lower & upper approximations) (Using information granules) Two Important Characteristics IEEE Trans. Syst.,

Granulation w.r.t. each miRNA (granules in one dimension)

M1

M1 a miRNA

Page 53: Soft Computing, Machine Intelligence and Data Mining...Computing (Using lower & upper approximations) (Using information granules) Two Important Characteristics IEEE Trans. Syst.,

F s

core

0

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1

Results: Relevance ( All 1% selected )

0

100

200

300

400

500

600

700

800

900

1000

No. of

miR

NA

s

Classification using SVM

IEEE/ACM Trans. Computational Biology and Bioinformatics (to appear)

miRNAs/Samples

309/98

352/66

866/36

864/57

847/158

887/50

Page 54: Soft Computing, Machine Intelligence and Data Mining...Computing (Using lower & upper approximations) (Using information granules) Two Important Characteristics IEEE Trans. Syst.,

F s

core

0

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

0

10

20

30

40

50

60

70

80

90

100

No

. of

miR

NA

s

( 10% selected 20% removed from them )

Results: Redundancy

Classification using SVM

Page 55: Soft Computing, Machine Intelligence and Data Mining...Computing (Using lower & upper approximations) (Using information granules) Two Important Characteristics IEEE Trans. Syst.,

Results: HFREM (histogram based selection of patients)

F score by selecting different percentages of patients from √n (n = # each categoty paients) bins of the N and C histograms

Algeria-50

Page 56: Soft Computing, Machine Intelligence and Data Mining...Computing (Using lower & upper approximations) (Using information granules) Two Important Characteristics IEEE Trans. Syst.,

Fuzzy-rough Community (virtual group) in

Social Networks

Example:

Information Sciences, 314, 100-117, 2015

Granules model f-Relations/ Interactions of actors

High volume, dynamic, complex

Page 57: Soft Computing, Machine Intelligence and Data Mining...Computing (Using lower & upper approximations) (Using information granules) Two Important Characteristics IEEE Trans. Syst.,

Where are these leading to ?

Different Machine learning tools

Data: Videos, Gene expression and social

networks

Summary

Page 58: Soft Computing, Machine Intelligence and Data Mining...Computing (Using lower & upper approximations) (Using information granules) Two Important Characteristics IEEE Trans. Syst.,

Fuzzy (F)-Granularity characteristics of

Computational Theory of Perceptions

(CTP) can be modeled using Fuzzy-Rough

computing concept

Promising Future Research Problem

Relevance to CTP

Algeria-52

Page 59: Soft Computing, Machine Intelligence and Data Mining...Computing (Using lower & upper approximations) (Using information granules) Two Important Characteristics IEEE Trans. Syst.,

Relevance to BIG Data handling

Page 60: Soft Computing, Machine Intelligence and Data Mining...Computing (Using lower & upper approximations) (Using information granules) Two Important Characteristics IEEE Trans. Syst.,

Big-Data is

High volume (scalable), high velocity (dynamic), high

variety (heterogeneous) information

Usually involves a collection of data sets so large and

complex that it becomes difficult to process using

conventional data analysis tools

Requires exceptional technologies to efficiently

process within tolerable elapsed of times

NEED completely new forms of processing to enable enhanced

decision making and knowledge discovery

New approaches – challenges, techniques, tools & architectures to

solve new problems

Page 61: Soft Computing, Machine Intelligence and Data Mining...Computing (Using lower & upper approximations) (Using information granules) Two Important Characteristics IEEE Trans. Syst.,

To develop complex procedures running over

large-scale, enormous-sized data repositories for

- extracting useful knowledge hidden therein,

- discovering new insights from Big Data,

and

- delivering accurate predictions of various

kinds as required in tolerable elapsed of

time.

BIG Data Objective:

Page 62: Soft Computing, Machine Intelligence and Data Mining...Computing (Using lower & upper approximations) (Using information granules) Two Important Characteristics IEEE Trans. Syst.,

Predicted lack of talent for Big-Data related

technologies in USA

Supply and demand of deep analytical talent by 2018 (in Thousand People)

SOURCE: US Bureau of Labor Statistics; US Census; Dun and Bradstreet; Company interview; McKinsey Global Institute analysis

• 2018 supply: 300 thousand

• 2018 projected demand:

440-490 thousand

Page 63: Soft Computing, Machine Intelligence and Data Mining...Computing (Using lower & upper approximations) (Using information granules) Two Important Characteristics IEEE Trans. Syst.,

In India

Source: Analytics Special Interest Group, set up by NASSCOM

There will be a shortage of about 200,000.00 data scientist

in India over the next few years

Algeria-58

Page 64: Soft Computing, Machine Intelligence and Data Mining...Computing (Using lower & upper approximations) (Using information granules) Two Important Characteristics IEEE Trans. Syst.,

Dealing with big data: Issues

Research areas for handling big

data include:

IR,

PR,

DM, KDD,

ML,

NLP,

SemWeb,

… Data Modalities

Additional Issues

Data Operators

Scalability

Streaming

Context

Quality

Usage

Collect Prepare

Represent

Model

Reason

Visualize

Page 65: Soft Computing, Machine Intelligence and Data Mining...Computing (Using lower & upper approximations) (Using information granules) Two Important Characteristics IEEE Trans. Syst.,

Dealing with big data (Handling challenges lying with all Vs)

Veracity

Variability

Demands a revolutionary change both in

Research Methodologies and Tools

Terabyte: 1012 = 10004

Zettabyte: 1021 = 10007

Page 66: Soft Computing, Machine Intelligence and Data Mining...Computing (Using lower & upper approximations) (Using information granules) Two Important Characteristics IEEE Trans. Syst.,

Example: PR (till 80’s) –-> DM (since late Ninties)

• New approaches developed for different tasks of PR

to handle DM problems (large data both in size and

dimension)

• Example: Feature Selection - where instead of

clustering samples in conventional PR, you cluster

features themselves in DM

Page 67: Soft Computing, Machine Intelligence and Data Mining...Computing (Using lower & upper approximations) (Using information granules) Two Important Characteristics IEEE Trans. Syst.,

Soft Computing: Roles

Uncertainty handling

Learning

Reasoning

Searching and optimization

Efficient, Flexible & Robust methodologies

Page 68: Soft Computing, Machine Intelligence and Data Mining...Computing (Using lower & upper approximations) (Using information granules) Two Important Characteristics IEEE Trans. Syst.,

Dealing with big data: Tasks

Tasks like:

Data size and feature space adaptation

Feature selection/ extraction in Big data

Uncertainty modeling in learning, sample selection, and classification/ clustering on Big data

Granular computing (a clump of objects…)

Distributed learning techniques in uncertain environment

Uncertainty in cloud computing

-

-

(Where SC methodologies can be used, in general)

Page 69: Soft Computing, Machine Intelligence and Data Mining...Computing (Using lower & upper approximations) (Using information granules) Two Important Characteristics IEEE Trans. Syst.,

• Soft computing paradigm appears to have a strong promise in developing methodologies • However, the existing computational intelligence

techniques may need be completely re-hauled to handle the challenging issues - • Large scale Simulation tasks: Need hundreads of

Parameter tuning • Computational complexity: Nlog(N), or N rather

than N2

• Incorporate high-level intelligence to gain insight - involve human in the process of computing

Computational aspects and scalability issues - not much addressed by SC community

Algeria-63

Page 70: Soft Computing, Machine Intelligence and Data Mining...Computing (Using lower & upper approximations) (Using information granules) Two Important Characteristics IEEE Trans. Syst.,

Without “Soft Computing”, Machine Intelligence

and Data Mining Research Remains Incomplete.

In conclusion -

Page 71: Soft Computing, Machine Intelligence and Data Mining...Computing (Using lower & upper approximations) (Using information granules) Two Important Characteristics IEEE Trans. Syst.,

Without Soft Computing and Granular

Mining, Big Data processing/analysis may

remain incomplete

Algeria-65

Page 72: Soft Computing, Machine Intelligence and Data Mining...Computing (Using lower & upper approximations) (Using information granules) Two Important Characteristics IEEE Trans. Syst.,

Acknowledgement

Students and younger colleagues/ collaborators

Raja Ramanna Fellowship of Govt. of India

Page 73: Soft Computing, Machine Intelligence and Data Mining...Computing (Using lower & upper approximations) (Using information granules) Two Important Characteristics IEEE Trans. Syst.,

Thank You!!

Page 74: Soft Computing, Machine Intelligence and Data Mining...Computing (Using lower & upper approximations) (Using information granules) Two Important Characteristics IEEE Trans. Syst.,

Giant Panda from Chendu: Life is so..o good with bamboo shoots