Formatting and structuring knowledge: Chapter 2 of the textbook Organizing Knowledge: An Introduction to Managing Access to Information.

Formatting and structuring knowledge


Page 1: Formatting and structuring knowledge

Formatting and structuring knowledge

Chapter 2 of the textbook Organizing Knowledge: An Introduction to Managing Access to Information.

Page 2: Formatting and structuring knowledge

The tools for formatting and structuring knowledge

1. Databases

2. Bibliographic relationships

3. Text analysis

4. Text markup

5. Metadata

Page 3: Formatting and structuring knowledge

Databases

• A database is a collection of information that is organized so that it can easily be accessed, managed, and updated.

• Some databases hold publicly accessible information, such as abstracting and indexing databases, full texts of reports, and directories, while others are shared within an organization or a group of organizations.

Page 4: Formatting and structuring knowledge

Databases (cont.)

• There are two main types of databases:

1. Reference databases: these refer or point the user to another source, such as a document, an organization or an individual, for additional information or for the full text of a document.

2. Source databases: these contain the original source data and are one type of electronic document.

Page 5: Formatting and structuring knowledge

Reference databases

• These include:

1. Bibliographic databases: contain citations or bibliographic references, and sometimes abstracts, of literature. They tell the user what has been written and in which source.

2. Catalogue databases: show the stock of a given library or library network, but do not give much information on the contents of the documents.

3. Referral databases: offer references to information or data, such as the names and addresses of organizations and other directory-type data.

Information content decreases from bibliographic to catalogue to referral databases.

Page 6: Formatting and structuring knowledge

Source databases

• Data is available in machine-readable form instead of printed form.

• Source databases can be grouped according to their content:

1. Numeric databases (e.g. statistics, survey data)

2. Full-text databases (e.g. journal articles, newsletters)

3. Text-numeric databases (a mix of textual and numeric data)

4. Multimedia databases (sound, video, pictures)

Page 7: Formatting and structuring knowledge

Database structures

Inverted file

Relational model

Object-oriented model

Page 8: Formatting and structuring knowledge

The inverted file

• Useful for searching complex text-based databases, where the searcher does not know the form in which the search key may have been entered in the database, and has, essentially, to guess the most appropriate form.

• The inverted file is similar to an index.

• In the inverted file approach there may be two or three separate files:
▫ Two-file approach (text file, index file)
▫ Three-file approach (text file, intermediate file, index file)
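
The two-file approach can be illustrated with a minimal Python sketch. The record texts, the `build_index` and `search` helpers, and the choice to lower-case every term are assumptions made for illustration; the slides do not specify how the index file is built.

```python
import re
from collections import defaultdict

# Hypothetical "text file": record number -> record text.
text_file = {
    1: "Expert systems in libraries",
    2: "Computer science bibliography",
    3: "Expert indexing of computer records",
}

def build_index(records):
    """Build the "index file": each lower-cased word maps to the
    sorted list of record numbers in which it occurs."""
    index = defaultdict(set)
    for record_no, text in records.items():
        for word in re.findall(r"[a-z0-9]+", text.lower()):
            index[word].add(record_no)
    return {word: sorted(numbers) for word, numbers in index.items()}

index_file = build_index(text_file)

def search(term):
    """Look the term up in the index file, then fetch the matching
    records from the text file."""
    return [text_file[n] for n in index_file.get(term.lower(), [])]

print(index_file["expert"])  # [1, 3]
print(search("Computer"))
```

Because the lookup goes through the index rather than scanning every record, the searcher only has to guess a word that appears in the index, not the exact form of the whole entry.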

Page 9: Formatting and structuring knowledge

The relational model

• The relational model has been widely adopted in database systems.

• In a relational system, information is held in a set of relations, or tables.

• The rows in the tables are equivalent to records, and the columns are equivalent to fields.

a) catalogued-book relation occurrences

ISBN            Title              Author      Year
0-82112-462-3   Alchemy            O. Ahmad    2007
0-84131-460-7   Expert systems     R. Ali      2007
0-69213-517-8   Computer science   S. Saleh    2003
0-93112-345-9   Bibliography       M. Omar     2004

b) order-book relation occurrences

Order no   ISBN            Quantity ordered
644        0-82112-462-3   1
644        0-84131-460-7   4
645        0-69213-517-8   3
646        0-93112-345-9   2
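
As a rough sketch of how the two relations work together, the following Python example loads them into an in-memory SQLite database and joins them on ISBN. The table and column names are hypothetical, and the pairing of titles, authors and years with each ISBN assumes the values on the slide are listed in row order.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE catalogued_book (
    isbn   TEXT PRIMARY KEY,
    title  TEXT,
    author TEXT,
    year   INTEGER
);
CREATE TABLE order_book (
    order_no INTEGER,
    isbn     TEXT REFERENCES catalogued_book(isbn),
    quantity INTEGER
);
""")

# Rows (records) of the catalogued-book relation.
conn.executemany("INSERT INTO catalogued_book VALUES (?, ?, ?, ?)", [
    ("0-82112-462-3", "Alchemy", "O. Ahmad", 2007),
    ("0-84131-460-7", "Expert systems", "R. Ali", 2007),
    ("0-69213-517-8", "Computer science", "S. Saleh", 2003),
    ("0-93112-345-9", "Bibliography", "M. Omar", 2004),
])

# Rows (records) of the order-book relation.
conn.executemany("INSERT INTO order_book VALUES (?, ?, ?)", [
    (644, "0-82112-462-3", 1),
    (644, "0-84131-460-7", 4),
    (645, "0-69213-517-8", 3),
    (646, "0-93112-345-9", 2),
])

# The shared ISBN column links an order line to its catalogue record.
rows = conn.execute("""
    SELECT o.order_no, b.title, o.quantity
    FROM order_book AS o
    JOIN catalogued_book AS b ON b.isbn = o.isbn
    WHERE o.order_no = ?
    ORDER BY b.title
""", (644,)).fetchall()

print(rows)  # [(644, 'Alchemy', 1), (644, 'Expert systems', 4)]
```

Keeping titles only in the catalogue relation and quantities only in the order relation means each fact is stored once; the join reassembles them when needed.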

Page 10: Formatting and structuring knowledge

The object-oriented model

• The object-oriented approach to programming and database design constructs systems and databases as collections of reusable, interacting objects.

• The object-oriented approach is attractive because:

1. Objects are easy to change and develop without necessarily changing any other part of the system.

2. New objects can easily be created from existing objects.

3. Objects can be copied or transferred into new systems with little difficulty.
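
A small Python sketch may make these three points more concrete. The `Document` and `Book` classes and their attributes are hypothetical and are not taken from the textbook.

```python
from dataclasses import dataclass, replace

@dataclass
class Document:
    """A reusable object holding data and behaviour together."""
    title: str
    author: str

    def describe(self) -> str:
        return f"{self.title} by {self.author}"

@dataclass
class Book(Document):
    """A new kind of object created from an existing one (point 2):
    Book inherits from Document and only adds what is different."""
    isbn: str = ""

    def describe(self) -> str:
        # Extending behaviour here does not change Document (point 1).
        return f"{super().describe()} (ISBN {self.isbn})"

book = Book("Alchemy", "O. Ahmad", isbn="0-82112-462-3")

# Copying an object, changing one attribute on the copy (point 3).
second_edition = replace(book, title="Alchemy, 2nd ed.")

print(book.describe())  # Alchemy by O. Ahmad (ISBN 0-82112-462-3)
print(second_edition.describe())
```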

Page 11: Formatting and structuring knowledge

Complex database structures

• Standard database design focuses on data in a limited range of data types, such as integer and text.

• Other data types, such as image, audio and video, present special challenges.

• Multimedia DBMSs (MM-DBMSs) are used to manage different data types, such as images, audio and video.

• MM-DBMSs seek to use a range of technologies, such as:
▫ relational technology for tables
▫ text databases for documents
▫ image storage devices for graphics and animation

Page 12: Formatting and structuring knowledge

Text and multimedia

• A language has a vocabulary of words, a syntax and a semantics.
▫ Syntax: a set of rules for stringing words together to make meaningful statements.
▫ Semantics: the name given to the study of meaning in language.

• Structural patterns are:
▫ Problem-solution: in its simplest form, a problem is stated and a solution is proposed.
▫ General-particular: a generalization is made and supported with one or more examples.

• These structural patterns are often found in combination.

Page 13: Formatting and structuring knowledge

Documents

• A document is a record of knowledge, information or data, or a creative expression.

• Characteristics of electronic documents:
▫ Easily manipulable
▫ Internally and externally linkable, through hyperlinks
▫ Readily transformable
▫ Inherently searchable
▫ Instantly transportable
▫ Infinitely replicable

Page 14: Formatting and structuring knowledge

Bibliographic relationships

• Research has identified seven categories of relationship between two or more documents:

1. Equivalence relationships: hold between exact copies that can be used interchangeably, such as reproductions from the same typeset document, e.g. photocopies, reprints, faxes, e-mail, microfilm and microfiche.

2. Derivative (horizontal) relationships: hold between different expressions of a work, such as editions, translations, adaptations and arrangements.

3. Descriptive relationships: include critical and evaluative reviews, criticism and interpretation, annotated editions, commentaries and analyses (these are all new works).

Page 15: Formatting and structuring knowledge

Bibliographic relationships (cont)

4. Whole-part relationships: hierarchical relationships between component parts and their whole.

5. Accompanying relationships: e.g. supplements, indexes, and individual maps within magazines.

6. Sequential relationships

7. Shared characteristic relationships: include different works that share an attribute, such as title, subject or author.

Page 16: Formatting and structuring knowledge

Bibliographic relationships (cont.)

• The Functional Requirements for Bibliographic Records (FRBR) identified a different set of relationships in the definition of its concepts:

▫ Work: the artistic creation.

▫ Expression: the artistic realization of the work, through which the work can be read, seen, heard or felt.

▫ Manifestation: the format in which one of the expressions of the work can be found (e.g. HTML or PDF).

▫ Item: a single exemplar of a manifestation.
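
The four FRBR concepts nest inside one another, which can be sketched as a simple data structure. The class attributes and the sample values below are hypothetical and serve only to show the hierarchy; FRBR itself defines many more attributes and relationships.

```python
from dataclasses import dataclass, field

@dataclass
class Item:
    location: str  # hypothetical attribute: where this single copy lives

@dataclass
class Manifestation:
    file_format: str  # e.g. "HTML" or "PDF"
    items: list = field(default_factory=list)

@dataclass
class Expression:
    description: str  # e.g. a particular text or translation
    manifestations: list = field(default_factory=list)

@dataclass
class Work:
    title: str
    expressions: list = field(default_factory=list)

# One work, one expression, two manifestations, three items in total.
work = Work("Alchemy", [
    Expression("English text", [
        Manifestation("PDF", [Item("library server")]),
        Manifestation("HTML", [Item("web copy 1"), Item("web copy 2")]),
    ]),
])

def count_items(w: Work) -> int:
    """Walk Work -> Expression -> Manifestation -> Item and count the exemplars."""
    return sum(len(m.items) for e in w.expressions for m in e.manifestations)

print(count_items(work))  # 3
```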

Page 17: Formatting and structuring knowledge

Text analysis

• Text analysis can automate processes such as:

1. Extracting keywords

2. Preparing document representations
▫ e.g. by processing the text to generate an abstract

3. Determining various characteristics of a text
▫ e.g. the level of reading difficulty

Page 18: Formatting and structuring knowledge

Approaches to text analysis

•Statistical analysis: based on counting the frequency of particular words in the text

• Structural analysis (knowledge-based analysis): scans the text for words, phrases or sentences that are in significant positions within the text.
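
The statistical approach can be sketched in a few lines of Python: count word frequencies, ignore very common words, and treat the most frequent remaining words as candidate keywords. The stop-word list, the sample text and the `extract_keywords` helper are assumptions for illustration; practical systems use larger stop lists and more refined weighting.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word list.
STOP_WORDS = {"a", "an", "and", "the", "of", "in", "is", "to", "that", "it"}

def extract_keywords(text, top_n=2):
    """Return the top_n most frequent non-stop words with their counts."""
    words = re.findall(r"[a-z]+", text.lower())
    counts = Counter(word for word in words if word not in STOP_WORDS)
    return counts.most_common(top_n)

sample = (
    "A database is a collection of information. "
    "A reference database points to a source, and a source database "
    "contains the source data."
)

print(extract_keywords(sample))  # [('database', 3), ('source', 3)]
```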

Page 19: Formatting and structuring knowledge

Text markup and encoding

• Electronic text at its most basic uses the ASCII character set.

• The application of markup to plain (ASCII) text enables electronic documents to be stored and re-used efficiently.

• Markup is of two kinds:

1. Procedural markup: defines the final presentation of the document and is specific to the application.

2. Descriptive markup: defines the headings, contents list, paragraphs and other elements which make up the structure of the document.

Page 20: Formatting and structuring knowledge

SGML

• SGML (Standard Generalized Markup Language): used for embedding descriptive markup within a document, and thus for describing the structure of a document.

• SGML formally describes the role of each piece of text, using labels enclosed within <brackets>.

• It is a descriptive, not a procedural, markup language.

Page 21: Formatting and structuring knowledge

HTML

Page 22: Formatting and structuring knowledge

HTML

• HTML (Hypertext Markup Language): an application of SGML (formally, it is an SGML document type definition) that has been specially developed for creating World Wide Web documents.

•HTML is used to define the display of web documents, including features such as font size and type, background and text colors, the use of bold and italic, and page layout.

Page 23: Formatting and structuring knowledge

XML

• XML (eXtensible Markup Language): a version of SGML that can be used on the web. Compared with HTML, XML is extensible in the sense that new markup tags can be created to facilitate searching and exchange of information.

• An XML implementation typically consists of three parts: the XML document, a document type definition (DTD) and a style sheet (XSL).
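
To illustrate what extensibility buys, the following Python sketch parses a small XML document whose tags (`catalogue`, `book`, `title`, `author`, `year`) were made up for this example, then searches it by tag name. Only the XML document part is shown; the DTD and style sheet are left out, and the tag names are assumptions rather than part of any standard.

```python
import xml.etree.ElementTree as ET

# A hypothetical XML document using tags invented for a book catalogue.
xml_document = """<catalogue>
  <book isbn="0-82112-462-3">
    <title>Alchemy</title>
    <author>O. Ahmad</author>
    <year>2007</year>
  </book>
  <book isbn="0-84131-460-7">
    <title>Expert systems</title>
    <author>R. Ali</author>
    <year>2007</year>
  </book>
</catalogue>"""

root = ET.fromstring(xml_document)

# Because the tags describe meaning, not appearance, a program can
# search on them directly.
titles = [book.findtext("title") for book in root.findall("book")]
author_by_isbn = {book.get("isbn"): book.findtext("author") for book in root.iter("book")}

print(titles)  # ['Alchemy', 'Expert systems']
print(author_by_isbn["0-84131-460-7"])  # R. Ali
```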

Page 24: Formatting and structuring knowledge

DTD

• DTD (document type definition): DTDs are SGML or XML applications that define the structure of a particular type of document, using markup.

•An XML schema is a richer form of a DTD that defines not only the structure, but also the content and semantics or meaning of documents

Page 25: Formatting and structuring knowledge

Both DTDs and XML schemas define:

•Elements that might be part of a particular document type

•Element names and whether they are repeatable

• The content of elements

• What kinds of markup can be omitted

• Tag attributes and their default values

• Names of permissible entities

Page 26: Formatting and structuring knowledge

Metadata

What is metadata?

• Metadata is data about data, created to describe or represent the attributes and contents of an information package.

• Metadata is a form of document representation, but it is not a document surrogate in the way that a catalogue entry is.

• Metadata is linked directly to the resource and allows direct access to the resource.

Page 27: Formatting and structuring knowledge

The end