Institute of Scientific Computing – University of Vienna
P.Brezany1
Data Analysis for Decision and Management Processes
Univ.-Prof. Dr. Peter Brezany
Institute of Scientific Computing
Faculty for Information Science
University of Vienna
E-mail : [email protected]
WWW: http://www.par.univie.ac.at/~brezany
http://artemis.wszib.edu.pl/~brezany/
Institute of Scientific Computing – Research Profile
The primary objectives of the Institute are:
- to conduct research in high-performance advanced data analysis, knowledge management, programming languages, compilers, programming environments, and software tools for high-performance computing systems;
- to actively contribute to the transfer of technology to industry;
- to disseminate knowledge in the fields of parallel and distributed computing and software technology.
Institute for Software Science – Main Research Projects and Cooperations
Participation in 14 EU projects (coordination of 1 project)
The European Centre of Excellence for Parallel Computing, a department of the Institute, founded by the EU
Coordination of the CEI-PACT project (Austria, Slovakia, Czech Republic, Poland, Italy, Hungary, Slovenia)
Special Research Program AURORA of the Austrian Science Fund (1997-2007)
Many international cooperations (NASA, CalTech, CERN, ...)
New Research Field: GRID COMPUTING
The Grid – a new distributed computing infrastructure for science and engineering.
The Grid consists of physical resources (computers, disks, networks, databases, sensors, laboratory equipment) and “middleware” software that ensures access to and the coordinated use of such resources.
Media That Radically Influenced Society
1500s Printing Press
1840s Penny Post
1850s Telegraph
1920s Telephone
1930s Radio
1950s TV
1990s Web
20xx Grid
Outline
• Business Intelligence, knowledge management
• Relation: data, information, knowledge
• Knowledge discovery process – System Architectures
• Data warehousing and data webhousing
• Data preparation:
– selection, preprocessing (cleaning, transformation), integration
• Data mining techniques
– association rules, sequences, classification, prediction, neural networks, clustering, meta-learning
• Advanced topics
– Multi-agent and mobile agent systems
– Web mining
– intelligent search engines
– semantic web
– information and knowledge management on computing grids
– security issues
Basic Literature
Mark and Mary Whitehorn: Business Intelligence: The IBM Solution. Springer-Verlag, 2000.
R. Kimball: The Data Warehouse Toolkit. John Wiley, 1996.
J. Han, M. Kamber: Data Mining: Concepts and Techniques. Morgan Kaufmann Publishers, 2000.
M. Ester, J. Sander: Knowledge Discovery in Databases. Springer-Verlag, 2000 (in German).
I.H. Witten, E. Frank: Data Mining: Practical Machine Learning Tools and Techniques with Java Implementations. Morgan Kaufmann Publishers, 2000.
Time Schedule
• Monday, Feb 27: 17.15 – 20.30 (4 hours)
• Tuesday, Feb 28: 10.00 – 13.15 (4 hours)
• Wednesday, Mar 01: 15.30 – 18.45 (3 hours)
• Thursday, Mar 02: 16.00 – 18.15 (3 hours)
Location: s.1 AK4
Business Intelligence
Definition:
Business Intelligence is an umbrella term, broadly covering the processes involved in extracting valuable business information and knowledge from the mass of data that exists within a typical enterprise, as well as knowledge management (storing knowledge in an appropriate form and distributing it).
What is meant by information and knowledge? This is best understood by imagining a chain linking data → information → knowledge.
Data → Information → Knowledge
• Data are the facts about events or processes.
• Information is the organization of, associations between, and constraints upon data that allow it to be used by a user or a machine.
• Knowledge is the interpretation of information and its use in a problem solving context. Knowledge can lead to new insights, which in turn lead to new innovations and ultimately to wealth creation and improvements in the quality of life.
• Wisdom arises when one understands the foundational principles responsible for the patterns representing knowledge. (She or he can answer questions like “Why ...?” and knows how to find or derive new knowledge.)
Data
Example: When a customer visits a gas station and buys petrol, this transaction can be described with the following data: date/time, volume, price.
However, these data do not say why the customer chose this station and not another one, and it is not possible to find out from them whether the customer will come again, or whether this station is good or bad.
Data alone possess almost no meaning or purpose. They are the base material for obtaining information.
Information
• A piece of information can be described as a message.
• Like all messages, information has a sender and a receiver.
• Information is intended to shape the receiver's opinion of or attitude to a problem and to influence his or her behavior.
• We can also think of information as data that changes, forms, or influences something.
• The word “inform” originally meant “to give form to a thing or person”.
Information (2)
• Data become information when the receiver adds some meaning to the data. Such upgrading of data can be done in different ways, for example:
– Contextualizing: We know for what purpose the data was collected.
– Calculation: The data can be mathematically analyzed and statistically enriched.
– Correction: Errors are removed from the data material.
– Comprising: The data is transformed into a more compact form; the main components of the data material have to be identified.
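The upgrading steps can be illustrated on the gas-station transactions from the earlier Data slide. The following is a minimal sketch, not a prescribed method: the record fields, the sample values, and the plausibility check are assumptions made for illustration. Contextualizing appears only as documentation of what each field means and why it is recorded.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class Transaction:
    """One raw fuel purchase (hypothetical record layout)."""
    timestamp: str    # date/time of the purchase, ISO format
    volume_l: float   # litres sold
    price: float      # amount paid


def correct(transactions):
    """Correction: drop records with impossible values."""
    return [t for t in transactions if t.volume_l > 0 and t.price > 0]


def calculate(transactions):
    """Calculation: enrich the cleaned records with simple statistics."""
    return {
        "mean_volume_l": mean(t.volume_l for t in transactions),
        "mean_price_per_l": mean(t.price / t.volume_l for t in transactions),
    }


def comprise(transactions):
    """Comprising: condense the records into total volume per day."""
    totals = {}
    for t in transactions:
        day = t.timestamp[:10]  # keep only YYYY-MM-DD
        totals[day] = totals.get(day, 0.0) + t.volume_l
    return totals


# Invented sample data; the third record simulates a faulty reading.
raw = [
    Transaction("2006-02-27T08:15", 40.0, 48.0),
    Transaction("2006-02-27T17:40", 20.0, 24.0),
    Transaction("2006-02-28T09:05", -5.0, 6.0),
    Transaction("2006-02-28T12:30", 30.0, 39.0),
]

clean = correct(raw)
stats = calculate(clean)
per_day = comprise(clean)
print(stats, per_day)
```

None of the derived values answers why a customer chose this station; they only make the raw data usable for someone who asks that question.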
Information Management
Information management: all management tasks that deal with information and communication in an enterprise.
Knowledge
• Knowledge is the production factor of the future, which will replace energy and materials.
• Knowledge is produced by mental activity and by processes that model mental activity.
• Transformation process Information → Knowledge:
– Comparison: How should I assess information about the current situation in comparison with other known situations?
– Consequence: How will the information influence decisions and activities?
– Connection: Which relations exist between one concrete information element and another one?
– Conversation: What do other people think about a certain piece of information?
Knowledge (2)
• People gain knowledge through experience – they see, hear, touch, and taste the world around them.
• We can associate something we see with something we hear, thereby gaining new knowledge about the world.
• Suppose we know that the sun is hot, balls are round, and the sky is blue. These facts are knowledge about the world. How do we store this knowledge in our brain? How could we store this knowledge in a computer?
• This problem, called knowledge representation, is one of the first, most fundamental issues that researchers in artificial intelligence had to face.
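Knowledge Pyramid
[Figure: pyramid with the levels, from base to top, Characters, Data, Information, Knowledge, Action. The transitions between levels are labeled Syntax, Semantics (Meaning), Pragmatics (Associated with Context and Experience), and Decision.]
Knowledge has 3 dimensions: Syntax, Semantics, and Pragmatics.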
Example
• Characters: t i s n i o r o l a n i l w
• Data: With the right syntax (here, the order of the letters), the above characters yield the statement “It will rain soon”.
• Information: The above statement means: “Water drops will fall from the sky”.
• Knowledge: The information “Water drops will fall from the sky” is connected with experience and expectations like: “One can get wet; it can rain into the flat”.
• Action: Based on this knowledge, activities are developed: “I will take an umbrella, I will close the window, etc.”
Knowledge Management
Knowledge management: all management tasks of the enterprise that deal with acquiring, utilizing, and further developing knowledge.
Knowledge Representation
• Procedural representation
– Perhaps the most common technique for representing knowledge in computers is the procedural one. Procedural code not only encodes facts (constants and variables) but also defines a sequence of operations for using and manipulating those facts. Thus, program code is a perfectly natural way of encoding procedural knowledge. This “hardcoded” logic is typically not considered to be part of artificial intelligence per se.
• Declarative representation
– A user simply states facts, rules, and relationships. However, declarative knowledge must be processed by some procedural code. Most of the knowledge representation techniques studied in artificial intelligence are declarative. Some of them are shown on the following slides.
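The difference between the two representations can be sketched with the three facts used earlier (the sun is hot, balls are round, the sky is blue). This is only an illustration; the function names and the dictionary layout are assumptions and are not taken from the lecture.

```python
# Procedural representation: the facts are hard-coded into the control flow.
def describe_procedural(thing):
    if thing == "sun":
        return "hot"
    if thing == "ball":
        return "round"
    if thing == "sky":
        return "blue"
    return None  # unknown thing


# Declarative representation: the facts are only stated ...
FACTS = {"sun": "hot", "ball": "round", "sky": "blue"}


# ... and a small generic procedure processes them.
def describe_declarative(thing, facts=FACTS):
    return facts.get(thing)


# New knowledge is added by stating a fact, not by changing code.
extended = {**FACTS, "grass": "green"}
print(describe_procedural("sun"), describe_declarative("grass", extended))
```

In the declarative variant the interpreting procedure stays unchanged when facts are added, which is why declarative knowledge still needs some procedural code to process it.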
Knowledge Representation - Rules
General form of a predicate logic rule:
if antecedent(s) then consequent(s)
(Other names are also used: for the antecedent, e.g., precondition; for the consequent, e.g., conclusion, action, or hypothesis.)
Rules can have the following forms:
• if P then Q
• if P1 and P2 and ... and Pn then Q1 and Q2 and ... and Qm
• if P1 and P2 or ... or Pn then Q
Rules that produce new facts are called production rules.
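A conjunctive rule of the form "if P1 and ... and Pn then Q1 and ... and Qm" can be represented as two sets of facts. The sketch below is one possible encoding, not a standard API; the class name `Rule`, its methods, and the rain facts are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    antecedents: frozenset  # P1 and P2 and ... and Pn
    consequents: frozenset  # Q1 and Q2 and ... and Qm

    def applies_to(self, facts):
        """A conjunctive rule applies when all antecedents are known facts."""
        return self.antecedents <= facts

    def fire(self, facts):
        """Return the facts extended by the consequents (production rule)."""
        return facts | self.consequents


rain_rule = Rule(
    antecedents=frozenset({"it rains", "window is open"}),
    consequents=frozenset({"flat gets wet"}),
)

facts = frozenset({"it rains", "window is open"})
if rain_rule.applies_to(facts):
    facts = rain_rule.fire(facts)
print(sorted(facts))
```

Using sets makes the "and" of the antecedents a simple subset test; disjunctive rules would need a different check.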
Rules (2)
Architecture of a Production System
[Figure: a knowledge base holding the rules, a fact base holding the facts, and inference mechanisms that cycle through the steps Recognize, Select, and Act.]
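The recognize-select-act cycle of a production system can be sketched as a simple forward-chaining loop. This is a minimal sketch under stated assumptions: rules are plain (antecedents, consequents) pairs of sets, the selection strategy is simply "first applicable rule", and the example rules are invented.

```python
def run_production_system(rules, facts):
    """Forward chaining with a recognize-select-act cycle.

    rules: list of (antecedents, consequents) pairs of sets (knowledge base)
    facts: initial set of facts (fact base)
    """
    facts = set(facts)
    while True:
        # Recognize: rules whose antecedents hold and that would add new facts.
        conflict_set = [
            (ante, cons) for ante, cons in rules
            if ante <= facts and not cons <= facts
        ]
        if not conflict_set:
            return facts
        # Select: trivial strategy, take the first applicable rule.
        _, consequents = conflict_set[0]
        # Act: add the consequents to the fact base.
        facts |= consequents


RULES = [
    ({"it rains"}, {"one can get wet"}),
    ({"one can get wet", "going outside"}, {"take an umbrella"}),
    ({"it rains", "window is open"}, {"close the window"}),
]

result = run_production_system(RULES, {"it rains", "going outside"})
print(sorted(result))
```

The loop terminates because a rule is only recognized while it can still add a new fact, and the set of possible facts is finite.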
Semantic Nets
Semantic nets are used to define the meaning of a concept by its relationships to other concepts.
A graph data structure is used, with nodes holding concepts and links with natural language labels showing the relationships.
A portion of a semantic net representation of the vehicle domain is shown on the next slide.
Remark: The standard relationships such as is-a, has-part, and instance should be familiar to readers with object-oriented design experience.
A Semantic Net Example
[Figure: semantic net of the vehicle domain. Nodes: Vehicle, Automobile, Sports Car, Corvette, Motor, Wheels, Doors, Small, 2, 4. Link labels: is-a, instance, has-part, size, num-doors, num-wheels.]
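A semantic net of this kind can be stored as a list of labeled edges. The sketch below uses the node and link names of the vehicle example, but which node each link connects is an assumption here, since the wiring of the figure is not given in the text; the helper names `targets` and `lookup` are hypothetical.

```python
# Edges as (source, relation, target). The wiring is assumed, not
# taken from the figure.
EDGES = [
    ("Automobile", "is-a", "Vehicle"),
    ("Sports Car", "is-a", "Automobile"),
    ("Corvette", "instance", "Sports Car"),
    ("Automobile", "has-part", "Motor"),
    ("Automobile", "has-part", "Wheels"),
    ("Automobile", "has-part", "Doors"),
    ("Automobile", "num-wheels", 4),
    ("Sports Car", "num-doors", 2),
    ("Sports Car", "size", "Small"),
]


def targets(node, relation):
    """All nodes reached from `node` over links labeled `relation`."""
    return [t for s, r, t in EDGES if s == node and r == relation]


def lookup(node, relation):
    """Collect values of `relation`, inheriting along instance and is-a links."""
    found = []
    while node is not None:
        found.extend(targets(node, relation))
        parents = targets(node, "instance") + targets(node, "is-a")
        node = parents[0] if parents else None
    return found


print(lookup("Corvette", "num-doors"), lookup("Corvette", "has-part"))
```

Following the instance and is-a links upward is what lets the Corvette inherit properties stated only for sports cars or automobiles, much like inheritance in object-oriented design.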
Business Intelligence Tools
• Data warehouses
• OLAP (On-Line Analytical Processing) tools
• Data mining tools
• Text mining tools
• Data joiners
• Business Intelligence portals, etc.
Business Intelligence Tools (cont.)
• Data warehouse – a repository of multiple heterogeneous data sources, organized under a unified schema at a single site in order to facilitate management decision making.
• OLAP – analysis techniques with functionalities such as summarization, consolidation, and aggregation, as well as the ability to view information from different angles.
• Data mining – extracting or “mining” knowledge from large data sets.
• Text mining – “mining” large textual (document) databases. Related term: web mining.
• Data joiner – a tool for working with data from disparate, heterogeneous data sources.
• Business Intelligence portal – a Web site designed to be the first point of entry for visitors to information about a company. With the help of the portal's personalization functions, users can choose the information sources they need for a specific task. The portal provides easy access to valuable information and data analyses, which improves the basis for competent decisions.
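The OLAP idea of summarizing the same data along different dimensions ("viewing information from different angles") can be sketched in a few lines. The fact table, its dimension names, and the figures below are invented for illustration; real OLAP tools work on a data warehouse rather than an in-memory list.

```python
from collections import defaultdict

# Hypothetical fact table: (region, product, quarter, revenue)
SALES = [
    ("North", "petrol", "Q1", 100.0),
    ("North", "diesel", "Q1", 60.0),
    ("South", "petrol", "Q1", 80.0),
    ("North", "petrol", "Q2", 120.0),
    ("South", "diesel", "Q2", 40.0),
]
DIMENSIONS = ("region", "product", "quarter")


def roll_up(rows, *dims):
    """Aggregate revenue over the chosen dimensions (summarization)."""
    index = [DIMENSIONS.index(d) for d in dims]
    totals = defaultdict(float)
    for row in rows:
        key = tuple(row[i] for i in index)
        totals[key] += row[-1]  # revenue is the last column
    return dict(totals)


by_region = roll_up(SALES, "region")                       # one angle
by_product_quarter = roll_up(SALES, "product", "quarter")  # another angle
grand_total = roll_up(SALES)                               # full consolidation
print(by_region, by_product_quarter, grand_total)
```

Choosing fewer dimensions consolidates the data further; choosing different ones changes the angle of the view without touching the underlying records.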