Big Data and the SP Theory of Intelligence
Varsha Prabhakaran, S8 CSE B, Roll No: 43

Contents

Introduction
SP Theory of Intelligence
Problems of Big Data:
  Volume
  Efficiency
  Transmission
  Variety
  Veracity
  Visualization


Introduction

The SP theory of intelligence can be applied to the management and analysis of big data.

It overcomes the problem of variety in big data.

Analysis of streaming data (velocity).

Economies in the transmission of data.

Veracity in big data.

Visualization of knowledge structures and inferential processes.


SP Theory of Intelligence


SP Theory of Intelligence

Designed to simplify and integrate concepts across artificial intelligence, mainstream computing, and human perception and cognition.

The product of an extensive programme of development and testing via the SP computer model.

Knowledge is represented with arrays of atomic symbols in one or two dimensions, called "patterns".

Processing is done by compressing information:
  via the matching and unification of patterns;
  via the building of multiple alignments.
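As a toy illustration of compression via the matching and unification of patterns, here is a minimal Python sketch. The patterns and the list-based representation are invented for the example; this is not the SP computer model itself.

```python
# Toy sketch of compression via matching and unification of patterns.
# Each distinct pattern is stored once; repeats become references to it.
# The data below is invented for this example.

def unify(patterns):
    """Store each distinct pattern once; re-express the data as references."""
    store = []      # unified patterns, each kept exactly once
    encoding = []   # the data, re-expressed as indices into the store
    index = {}      # pattern -> its position in the store
    for p in patterns:
        if p not in index:
            index[p] = len(store)
            store.append(p)
        encoding.append(index[p])
    return store, encoding

data = [("t", "h", "i", "s"), ("b", "o", "y"), ("t", "h", "i", "s"),
        ("g", "i", "r", "l"), ("t", "h", "i", "s"), ("b", "o", "y")]
store, encoding = unify(data)
print(store)     # three distinct patterns, each stored once
print(encoding)  # [0, 1, 0, 2, 0, 1]
```

The compression comes from storing the repeated pattern once and referring to it cheaply thereafter.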


Benefits of the SP Theory

Conceptual simplicity combined with descriptive and explanatory power across several aspects of intelligence.

Simplification of computing systems, including software.

Deeper insights and better solutions in several areas of application.

Seamless integration of structures and functions within and between different areas of application.


SIMPLIFICATION OF COMPUTING SYSTEMS


MULTIPLE ALIGNMENT: A CONCEPT BORROWED FROM BIOINFORMATICS


Multiple Alignment

The system aims to find multiple alignments that enable a New pattern to be encoded economically in terms of one or more Old patterns.

Multiple alignment provides the key to:
  Versatility in representing different kinds of knowledge.
  Versatility in different kinds of processing in AI and mainstream computing.


Multiple Alignment

S → NP V NP

NP → D N

D → t h i s

D → t h a t

N → g i r l

N → b o y

V → l o v e s

V → h a t e s
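To show how such a grammar can encode a sentence economically, here is a hedged Python sketch: a naive recursive matcher that records only which alternative was chosen at each choice point (D, N, V), in the spirit of SP code patterns such as "S 0 1 0 1 0 #S". The function and variable names are illustrative, not part of the SP model.

```python
# Hedged sketch: encoding a sentence in terms of the grammar above.
# At each choice point we record only the alternative that matched;
# everything else is recoverable from the grammar itself.

GRAMMAR = {
    "S":  [["NP", "V", "NP"]],
    "NP": [["D", "N"]],
    "D":  [list("this"), list("that")],
    "N":  [list("girl"), list("boy")],
    "V":  [list("loves"), list("hates")],
}

def encode(symbol, text, pos, code):
    """Try each alternative for `symbol`; return the new position or None."""
    for alt_no, alternative in enumerate(GRAMMAR[symbol]):
        p, chosen = pos, []
        ok = True
        for item in alternative:
            if item in GRAMMAR:                      # non-terminal: recurse
                p = encode(item, text, p, chosen)
                if p is None:
                    ok = False
                    break
            else:                                    # terminal: must match
                if p < len(text) and text[p] == item:
                    p += 1
                else:
                    ok = False
                    break
        if ok:
            if len(GRAMMAR[symbol]) > 1:             # a real choice point
                code.append(alt_no)
            code.extend(chosen)
            return p
    return None

sentence = list("thisboylovesthatgirl")
code = []
end = encode("S", sentence, 0, code)
print(code)  # [0, 1, 0, 1, 0]: this, boy, loves, that, girl
```

Five small integers encode the whole sentence: a compressed description of the New pattern in terms of the Old (grammar) patterns.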

Multiple Alignment

[Figures: example multiple alignments; the derived code pattern is "S 0 1 0 1 0 #S".]


Multiple Alignment

Compression difference:

  CD = BN - BE

where BN is the total number of bits in those symbols in the New pattern that are aligned with Old symbols in the alignment, and BE is the total number of bits in the symbols in the code pattern.

Compression ratio:

  CR = BN / BE


Multiple Alignment

BN is calculated as:

  BN = Σ_{i=1}^{h} C_i

where C_i is the size of the code for the ith symbol in the sequence H_1...H_h, comprising those symbols within the New pattern that are aligned with Old symbols.


Multiple Alignment

BE is calculated as:

  BE = Σ_{i=1}^{s} C_i

where C_i is the size of the code for the ith symbol in the sequence of s symbols in the code pattern derived from the multiple alignment.
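Putting the two sums together, a minimal worked example. The bit counts C_i below are invented for illustration; in the SP system they derive from the codes assigned to symbols.

```python
# Illustrative only: computing CD and CR from the two sums above,
# using made-up symbol code sizes C_i.

new_symbol_bits  = [4, 4, 4, 4, 4]   # C_i for the h aligned symbols of the New pattern
code_symbol_bits = [6, 2, 2]         # C_i for the s symbols of the code pattern

BN = sum(new_symbol_bits)   # BN = sum of C_i, i = 1..h
BE = sum(code_symbol_bits)  # BE = sum of C_i, i = 1..s
CD = BN - BE                # compression difference
CR = BN / BE                # compression ratio

print(BN, BE, CD, CR)       # 20 10 10 2.0
```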


Big Data


Problems of Big Data and Solutions

Volume: big data is … BIG!

Efficiency in computation and the use of energy.

Unsupervised learning: discovering ‘natural’ structures in data.

Transmission of information and the use of energy.

Variety: in kinds of data, formats, and modes of processing.

Veracity: errors and uncertainties in data.

Interpretation of data: pattern recognition, reasoning.

Velocity: analysis of streaming data.

Visualization: representing structures and processes.


Volume: Making Big Data Smaller

"Very-large-scale data sets introduce many data management challenges."

Information compression.

Direct benefits in storage, management and transmission.

Indirect benefits:
  Efficiency in computation and the use of energy.
  Unsupervised learning.
  Additional economies in transmission and the use of energy.
  Assistance in the management of errors and uncertainties in data.
  Processes of interpretation.


Energy, Speed and Bulk

In the SP theory, a process of searching for matching patterns is central in all kinds of 'processing' or 'computing'.

This means that anything that increases the efficiency of searching will increase computational efficiency and, probably, cut the use of energy:
  Reducing the volume of big data.
  Exploiting probabilities.
  Cutting out some searching.


Efficiency via Reduction in Volume

Information compression is central in how the SP system works:
  Reducing the size of big data.
  Reducing the size of search terms.

Both of these can increase the efficiency of searching, meaning gains in computational efficiency and cuts in the use of energy.


Efficiency Via Probabilities


Statistical knowledge flows directly from:
  Information compression in the SP system, and
  The intimate connection between information compression and concepts of prediction and probability.

There is great potential to cut out unnecessary searching, with consequent gains in efficiency.

Potential for savings at all levels and in all parts of the system, and on many fronts in its stored knowledge.


Efficiency via a Synergy with Data-Centric Computing


In SP-neural, SP patterns may be realized as neuronal pattern assemblies.

There would be close integration of data and processing, as in data-centric computing.

Direct connections may cut out some searching.


Unsupervised Learning

Lossless compression of a body of information.

Information compression, or "minimum length encoding", remains the key.

Matching and unification of patterns.

The SP computer model has already demonstrated an ability to discover generative grammars, including segmental structures, classes of structure, and abstract patterns.

For a body of information, I, the products of learning are: a grammar (G) and an encoding (E) of I in terms of G.
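The idea of learning a grammar G plus an encoding E can be caricatured in a few lines. This sketch uses greedy digram unification, a method in the minimum-length-encoding family but not the SP model's own algorithm; all names are illustrative.

```python
# Tiny caricature of learning G and E from I by unifying repeated pairs.
# NOT the SP model's algorithm; illustrative only. G + E reconstruct I.
from collections import Counter

def learn(I, min_count=2):
    """Return a grammar G and an encoding E of the string I in terms of G."""
    seq = list(I)
    G = {}                                    # rule name -> the pair it unifies
    while True:
        pairs = Counter(zip(seq, seq[1:]))
        if not pairs:
            break
        pair, count = max(pairs.items(), key=lambda kv: kv[1])
        if count < min_count:                 # no pair repeats: stop
            break
        rule = f"R{len(G)}"
        G[rule] = pair
        out, i = [], 0
        while i < len(seq):                   # replace every occurrence of the pair
            if i + 1 < len(seq) and (seq[i], seq[i + 1]) == pair:
                out.append(rule)
                i += 2
            else:
                out.append(seq[i])
                i += 1
        seq = out
    return G, seq                             # (grammar, encoding)

def expand(G, E):
    """Decode E back into the original string using G (lossless)."""
    out = []
    for s in E:
        out.extend(expand(G, G[s]) if s in G else [s])
    return "".join(out)

I = "abcabcabcX"
G, E = learn(I)
print(G, E)                                   # E is much shorter than I
assert expand(G, E) == I                      # lossless reconstruction
```

Note how G captures the repeated structure ("abc") while E retains the residue ("X"), echoing the G/E split described above.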


Product of Learning


Transmission of Data

• By making big data smaller ("Volume").
• By separating grammar (G) from encoding (E), as in some dictionary techniques and analysis/synthesis schemes.
• Efficiency in transmission can mean cuts in the use of energy.
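As an analogy (not the SP mechanism itself), zlib's preset-dictionary feature shows the economics of separating G from E: if sender and receiver already share the dictionary, only the compact encoding travels. The texts below are invented for the example.

```python
# Analogy only: a shared zlib dictionary plays the role of a pre-shared
# grammar G, so only the compact encoding E need be transmitted.
import zlib

G = b"the cat sat on the mat and the dog sat on the log"   # shared in advance
message = b"the dog sat on the mat and the cat sat on the log"

# Sender: compress against the shared dictionary; transmit only E.
c = zlib.compressobj(level=9, zdict=G)
E = c.compress(message) + c.flush()

# Receiver: reconstruct the message from E using its own copy of G.
d = zlib.decompressobj(zdict=G)
assert d.decompress(E) == message

# For comparison: compressing without the shared dictionary costs more.
plain = zlib.compress(message, 9)
print(len(E), len(plain))   # E is smaller: the dictionary carries the redundancy
```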


Transmission of Data

Simplicity of a focus on the matching and unification of patterns.

Aims to discover structures that are "natural".

The brain-inspired "DONSVIC" principle can mean relatively high levels of information compression.

Potential for G to include structures not recognized by most compression algorithms, such as:
  Generic 3D models of objects and scenes.
  Generic sequential redundancies across sequences of frames.


Overcoming the Problem of Variety in Big Data

Diverse kinds of data: the world’s many languages, spoken or written; static and moving images; music as sound and music in its written form; numbers and mathematical notations; tables; charts; graphs; networks; trees; grammars; computer programs; and more.

There are often several different computer formats for each kind of data. With images, for example: JPEG, TIFF, WMF, BMP, GIF, EPS, PDF, PNG, PBM, and more.

Adding to the complexity is that each kind of data and each format normally requires its own special mode of processing.

THIS IS A MESS! It needs cleaning up.

Although some kinds of diversity are useful, there is a case for developing a universal framework for the representation and processing of diverse kinds of knowledge (UFK).


Universal Framework for the Representation and Processing of Knowledge (UFK)

Potential benefits of a UFK in:
  ● Learning structure in data;
  ● Interpretation of data;
  ● Data fusion;
  ● Understanding and translation of natural languages;
  ● The semantic web and internet of things;
  ● Long-term preservation of data;
  ● Seamless integration in the representation and processing of diverse kinds of knowledge.

Most concepts are an amalgam of diverse kinds of knowledge (which implies some uniformity in the representation and processing of diverse kinds of knowledge).

The SP system is a good candidate for the role of UFK because of its versatility in the representation and processing of diverse kinds of knowledge.


How Variety Hinders Learning

Discovering the association between lightning and thunder is likely to be difficult when: lightning appears in big data as a static image in one of several formats; or in a moving image in one of several formats; or it is described, in spoken or written form, as any one of such things as "firebolt", "fulmination", "la foudre", "der Blitz", "lluched", "a big flash in the sky", or indeed "lightning".

Thunder is represented in one of several different audio formats; or it is described, in spoken or written form, as “thunder”, “gök gürültüsü”, “le tonnerre”, “a great rumble”, and so on.

If learning and discovery processes are going to work effectively, we need to get behind these surface forms and focus on the underlying meanings. This can be done using a UFK.


Veracity

“In building a statistical model from any data source, one must often deal with the fact that data are imperfect. Real-world data are corrupted with noise. … Measurement processes are inherently noisy, data can be recorded with error, and parts of the data may be missing.”

In tasks such as parsing or pattern recognition, the SP system is robust in the face of errors of omission, addition, or substitution.
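A rough illustration of error-tolerant recognition, using Python's difflib rather than the SP engine; the stored patterns are invented for the example.

```python
# Sketch of error-tolerant recognition using difflib (not the SP engine):
# a noisy input is still matched to the right stored pattern despite
# errors of omission, substitution, or addition.
import difflib

old_patterns = ["information compression", "multiple alignment",
                "unsupervised learning"]

def recognise(new_pattern):
    """Return the stored pattern that best matches a possibly noisy input."""
    score = lambda old: difflib.SequenceMatcher(None, new_pattern, old).ratio()
    return max(old_patterns, key=score)

assert recognise("informaton compression") == "information compression"  # omission
assert recognise("multiple alignmant") == "multiple alignment"           # substitution
assert recognise("unsupervised learnning") == "unsupervised learning"    # addition
print("recognition is robust to small errors")
```

Best-match scoring, rather than exact matching, is what buys the robustness: a few corrupted symbols merely lower a score instead of vetoing the match.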



Veracity

When we learn a first language (L):

We learn from a finite sample.

We generalize (to L) without over-generalizing.

We learn ‘correct’ knowledge despite ‘dirty data’.


Veracity

For any body of data, I, principles of minimum-length encoding provide the key:
  Aim to minimize the overall size of G and E.
  G is a distillation or 'essence' of I that excludes most 'errors' and generalizes beyond I.
  E + G is a lossless compression of I, including typos etc., but without generalizations.

Systematic distortions remain a problem.


Interpretation of Data

Processing I in conjunction with a pre-established grammar (G) to create a relatively compact encoding (E) of I.

Depending on the nature of I and G, the process of interpretation may be seen to achieve:
  Pattern recognition
  Information retrieval
  Parsing and production of natural language
  Translation from one representation to another
  Planning
  Problem solving


Velocity: Analysis of Streaming Data

In the context of big data, "velocity" means the analysis of streaming data as it is received.

"This is the way humans process information."

This style of analysis is at the heart of how the SP system has been designed.

Unsupervised learning.
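A minimal sketch of the streaming style of analysis, assuming a simple character stream; this is illustrative only, not the SP system's processing.

```python
# Velocity-style processing (illustrative): the model, here just running
# symbol counts, is updated as each item of the stream arrives, rather
# than after the whole data set has been collected.
from collections import Counter

def analyse(stream):
    """Consume a stream item by item, yielding a snapshot after each one."""
    counts = Counter()
    for item in stream:
        counts[item] += 1        # update the model on arrival
        yield dict(counts)

snapshots = list(analyse(iter("abca")))
print(snapshots[-1])             # {'a': 2, 'b': 1, 'c': 1}
```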


Visualizations

The SP system is well suited to visualization for these reasons:
  Transparency in the representation of knowledge.
  Transparency in processing.
  The system is designed to discover 'natural' structures in data.

There is clear potential to integrate visualization with the statistical techniques that lie at the heart of how the SP system works.


Conclusion

The SP system, designed to simplify and integrate concepts across artificial intelligence, mainstream computing, and human perception and cognition, has potential in the management and analysis of big data.

The SP system has potential as a universal framework for the representation and processing of diverse kinds of knowledge (UFK), helping to reduce the problem of variety in big data: the great diversity of formalisms and formats for knowledge, and how they are processed.


Bibliography

www.cognitionresearch.org/sp.htm

J. G. Wolff, "Big data and the SP theory of intelligence", IEEE Access, vol. 2, pp. 301-315, 2014.
