
UNIT 5 – Storing, getting and sending your data SUMMARY MATERIAL

www.ouw.co.uk, or contact Open University Worldwide, Michael Young Bu1908 858785; fax +44 (0)1908 858787; e-mail [email protected] Open University, Walton Hall, Milton Keynes, MK7 6AA

Licensed for use by the Arab Open University

___________________________________________________________________________

UNIT 5 – Storing, getting and sending your data

SUMMARY MATERIAL ______________________________________________

2008–2009


This unit examines issues which arise when storing and transmitting data using computers. It aims to:

Describe the notion of persistent data, how it is created, and how it is stored and accessed (logically and physically) on various types of storage device.

Explain how the internet and the applications that use it work, and address some of the issues that arise from transmitting data between computer systems.

Explain how databases facilitate the storage, access and protection of data, and how metadata is important in providing access to multimedia databases.

Explore the issues of privacy and ownership of data and analyze some of the risks arising from storing data on computers and transmitting it across networks.

Storing text-based data in Documents : Folders

We assume that you are familiar with the process of creating documents on your computer. If you want such documents to persist as so-called persistent data (i.e. to exist after closing down the application that created them or after switching off your computer) they need to be saved. To facilitate subsequent retrieval, you store your documents in some logical arrangement on a suitable storage medium for holding persistent data such as your computer’s hard disk.

In a window, there are many documents which can be reached by inspecting the contents of the folders. Each time you open a folder (by double clicking it) you see a window whose contents are more documents and folders. Eventually you will reach the lowest level of the hierarchy, in which the contents are all documents and there are no further folders to open. This is called a hierarchical or nested folder structure, because each folder may contain other folders.

With a bit of imagination, you can think of the folder structure loosely as a tree lying on its side (see Figure 2.5). The desktop is the root of the tree, and each folder is a branch. The leaves of the tree correspond to documents. Any similar hierarchical arrangement of objects is frequently called a tree structure or just a tree.


A path allows you to identify unambiguously a folder or document and is often referred to as its full name or full path name. Obviously documents in the same folder must have different names otherwise their path names would be identical; consequently there would be no way to distinguish between them. In any case, the computer will not allow you to have two documents with the same name in one folder.

The Search/Find function

Operating systems come with a search function which allows you to find items you have ‘lost’. The Windows XP search function is called ‘Search’, and you can access it through the ‘Start’ menu. The search will begin in ‘My Computer’ (if that is displayed in the ‘Look in:’ panel). You can choose to start the search elsewhere, either by using the drop-down list or by browsing to the place where you want the search to start. You can also search any disk or folder by right clicking its icon and selecting ‘Search’.

When you open a folder under a Windows operating system, the folder window shows the path name of the folder in its title bar. Consider the Windows path name: C:\Projects\M150\Assignments\TMA02.doc

The .doc extension identifies this document as a Word document. Here ‘C:’ is the root, and refers to the computer’s hard disk. (Windows uses a quaint convention which identifies hard disks and other storage media using letters of the alphabet followed by a colon.) ‘Projects’ is the name of a folder at the top level of the hard disk. ‘Projects’ contains a folder called ‘M150’, which in turn contains an ‘Assignments’ folder. The document ‘TMA02.doc’ is in the ‘Assignments’ folder. Another place where you can see full path names is in a ‘Search Results’ window. The files and folders that meet the search criterion are listed together with the full path name of the enclosing folder.
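The way a full path name breaks down into root, folders and document can be sketched in Python. This is only an illustration: it uses the standard `pathlib` module to take the example path apart, and it does not touch any real disk.

```python
from pathlib import PureWindowsPath

# The example path from the text; PureWindowsPath parses Windows-style
# path names on any operating system without accessing the disk.
path = PureWindowsPath(r"C:\Projects\M150\Assignments\TMA02.doc")

print(path.drive)      # the root volume: C:
print(path.parts[1:])  # the folders in order, then the document name
print(path.parent)     # the enclosing folder: C:\Projects\M150\Assignments
print(path.name)       # the document: TMA02.doc
print(path.suffix)     # the extension: .doc
```

Reading `path.parts` from left to right corresponds to walking down the tree from the root to the leaf.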

Directories

Each folder has a list, or directory, of the folders and documents that it contains. Part of the directory for a given folder can be displayed on screen (using the Views button) in a number of ways to aid human identification of the contents:

Alphabetically by name is an obvious ordering;

In order of last modification date, so that you can easily spot the documents you have worked on most recently;

By size is sometimes useful;

By type can make it easier to locate what you are looking for if you have too many items in a folder.
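The four orderings listed above amount to sorting the same directory entries on different fields. A minimal sketch, assuming a made-up directory whose entries (names, sizes and dates) are purely hypothetical:

```python
from datetime import date

# Hypothetical directory entries: (name, type, size in KB, last modified).
entries = [
    ("TMA02.doc", "doc", 48, date(2008, 11, 3)),
    ("notes.txt", "txt", 2, date(2008, 12, 1)),
    ("budget.xls", "xls", 120, date(2008, 10, 20)),
]

by_name = sorted(entries, key=lambda e: e[0].lower())        # alphabetical
by_date = sorted(entries, key=lambda e: e[3], reverse=True)  # most recent first
by_size = sorted(entries, key=lambda e: e[2])                # smallest first
by_type = sorted(entries, key=lambda e: e[1])                # grouped by type

for name, kind, size_kb, modified in by_date:
    print(name, kind, size_kb, modified)
```

The entries themselves never change; only the order in which they are displayed does.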

The directory of a folder also lists the address or physical location on the disk of each document and subfolder in that folder. This address is internal to the operating system and cannot be seen in a user window. In the case of a subfolder, the address given is the location of the directory of that subfolder.

Storage technologies : 1. Gigabytes and terabytes

The number of bytes of data that can be held on a storage medium is called the capacity of the medium. Typical document sizes are measured in kilobytes and megabytes, but media capacities are much larger than this and can be measured in gigabytes and terabytes. Table 2.1 gives the names and sizes of these commonly used terms.
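The relationship between these units can be sketched in a few lines of Python. The sketch assumes the binary convention in which each unit is 1024 times the previous one (so 1KB is 1024 bytes); this matches the block arithmetic used later in these notes, but it is an assumption about the table rather than a copy of it. The function name `human_readable` is made up for the illustration.

```python
UNITS = ["bytes", "KB", "MB", "GB", "TB"]

def human_readable(n_bytes):
    """Express a byte count in the largest unit that keeps the value >= 1."""
    value = float(n_bytes)
    for unit in UNITS:
        if value < 1024 or unit == UNITS[-1]:
            return f"{value:g} {unit}"
        value /= 1024  # step up to the next unit

# Size of each unit in bytes under the 1024-based assumption
for power, unit in enumerate(UNITS):
    print(unit, 1024 ** power)

print(human_readable(40 * 1024 ** 3))  # a 40GB hard disk
```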

2. Hard disk storage

Nowadays a typical PC comes with a 40GB hard disk, large enough to store the equivalent of several multi-volume encyclopedias (in 1990 20MB was considered generous). Much larger disks with a capacity of up to a terabyte are also available.

The actual reading from and writing to the disk surface is performed by a read/write head, which is attached to an arm that moves to and from the centre so that it can locate any track on the disk.

The disk is kept spinning continuously, so each sector is under the head at some time. The head hovers close to the spinning surface, which needs to be engineered carefully to avoid physical contact between the head and the surface. If the two should touch, the surface coating will be damaged, destroying the magnetic pattern so that the data stored can no longer be retrieved. This is an example of a disk crash. For each plate in a disk there are two read/write heads, one for each surface. In a read operation the head detects a magnetized pattern and transmits it as a series of bits to the processor. In a write operation, the head magnetizes the relevant pattern of bits on to the surface. When new data is written to a magnetic disk, the only thing that changes is the magnetic pattern recorded on the disk. For this reason, magnetic disks can be reused repeatedly. It is only when they deteriorate physically that they can no longer be used.

3. Removable storage media devices

Your computer may be fitted with a Zip drive, which can accept removable hard disks of 100MB or 250MB capacity. A Zip drive works on exactly the same principle as a fixed hard disk, but you can change the disk in the drive. One use for removable disks is to make documents portable. A document stored on a removable disk can be taken to another computer having a Zip drive and opened there.

Older computers are fitted with drives to read/write floppy disks, which work on the same principle as a Zip drive, but with a capacity limited to 1.4MB. A removable medium which is gaining in popularity is the memory card.

Conventional CDs are called CD-ROMs (where ROM stands for read-only memory), and have bits of data stored as ‘pits’ in their groove. Beams of laser light are used to burn the pits on the disc. A CD drive works by shining a low-power laser beam on the disc, which detects the presence or absence of a pit (the pits do not reflect the light). DVDs (also called DVD-ROMs) work in much the same way, but the data is packed more tightly, using:

Smaller pits;

A narrower groove;

Less overhead for error correction.

One important difference between optical CD/DVD-ROM discs and magnetic disks (fixed or removable) is the ability to write to them. Hard disks allow you to rewrite data, whereas standard optical discs do not. In fact, as mentioned above, there is no theoretical limit to the number of times the data on a hard disk can be changed. A new document can be saved on the disk, overwriting an existing one. However, once a pit has been burned on to the surface of an optical disc, it cannot be erased.

Most computers have drives capable of reading CD-ROMs and DVD-ROMs and nowadays it is common for them to have drives able to burn CDs as well. This requires special software and special writable CDs. There are two kinds of CDs which ordinary computer users can write to.

Recordable CDs, known as CD-R, have a sensitive dye layer. Instead of burning pits on the CD, the writing process dyes the relevant parts of the groove.

Rewritable CDs, known as CD-RW, use a different technology altogether. (For this reason not all CD drives can read CD-RW disks.) The CD-RW writer can heat a point on the disk to one of two temperatures corresponding to different states of the material. This process is reversible, so you can write to a CD-RW many times.

Labeling volumes

A printed book may come in one or more volumes and similar terminology is used for electronic media. Typically a Zip disk or CD is called a volume. A hard disk is likewise called a volume (this becomes significant if your computer has more than one hard disk mounted). Normally a hard disk is recognized by the operating system as a single volume. However, it is possible to partition or format a disk so that its contents appear to occupy more than one volume.

Besides its physical label a volume should also have an electronic label, which, for consistency, should be the same as the physical label. This electronic label is the name of the volume, and it will be displayed when you search the contents of your computer.

Sensible organization of storage

Each volume contains a large number of documents, so there has to be a means of locating the one you want. Just as the houses in a road are normally numbered for reference, so too are the available positions on a disk. In the case of a magnetic disk three numbers are required to identify a block of data: cylinder number, surface number and sector number. This set of three numbers is called the address of the block. To locate a document on the disk the operating system needs to know its address.

A single document might occupy one or more blocks on the disk. Given a block size of 512 bytes (0.5KB), a document whose size is 1MB will occupy 2,048 blocks. At the end of each block there is a marker which either indicates that this is the final block for the document or gives the address of the block that holds the next portion of the document.
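The block arithmetic and the end-of-block markers can be illustrated with a small Python sketch. The disk contents and addresses below are hypothetical, and a Python dictionary stands in for the disk surface; real file systems differ in detail.

```python
BLOCK_SIZE = 512  # bytes, the block size used in the text

def blocks_needed(document_bytes, block_size=BLOCK_SIZE):
    """Number of blocks a document occupies (the last one may be part empty)."""
    return -(-document_bytes // block_size)  # ceiling division

# Hypothetical disk: address (cylinder, surface, sector) -> (data, marker).
# The marker is the address of the next block, or None for the final block.
disk = {
    (3, 0, 7): (b"Dear ", (3, 1, 2)),
    (3, 1, 2): (b"Linda", (9, 0, 4)),
    (9, 0, 4): (b", hi!", None),
}

def read_document(disk, first_address):
    """Follow the chain of markers from the first block to the final one."""
    data = b""
    address = first_address
    while address is not None:
        chunk, address = disk[address]
        data += chunk
    return data

print(blocks_needed(1024 * 1024))       # a 1MB document: 2048 blocks
print(read_document(disk, (3, 0, 7)))   # b'Dear Linda, hi!'
```

Note that the three blocks of the sample document are scattered over the disk; only the first address needs to be recorded in the directory, because each block leads to the next.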

Moving documents

What actually happens when you move a document ‘M150notes.doc’ from a folder (say ‘Current’) to another (say ‘Models’) on your hard disk? Of course, you expect that from now on it will no longer be displayed in ‘Current’, but will appear when you inspect ‘Models’. What happens to the document itself, though? The simple answer is nothing. Moving a document between folders on a disk is really an illusion because the document does not move at all! The document’s physical location remains unchanged, but the directories change: the entry for the document is removed from the directory of ‘Current’ and added to the directory of ‘Models’.
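A move within one disk can be modelled as an update to two directories. In this minimal sketch the directories are Python dictionaries and the disk address is a made-up (cylinder, surface, sector) triple; the function name `move` is an assumption for the illustration.

```python
# Hypothetical model: each folder's directory maps a document name
# to the disk address where the document starts.
directories = {
    "Current": {"M150notes.doc": (12, 1, 5)},
    "Models": {},
}

def move(directories, name, source, target):
    """Move a document by updating the two directories; no data is copied."""
    if name in directories[target]:
        raise FileExistsError(f"{target} already contains {name}")
    directories[target][name] = directories[source].pop(name)

move(directories, "M150notes.doc", "Current", "Models")
print(directories["Current"])  # {}
print(directories["Models"])   # {'M150notes.doc': (12, 1, 5)}
```

The address (12, 1, 5) is the same before and after the move, which is the point of the passage above. The check for an existing name reflects the earlier rule that two documents in one folder cannot share a name.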

Deleting documents

What actually happens when you delete a document?

Modern operating systems usually have mechanisms to protect users against themselves. The first line of defense is that when you delete a document, the operating system does not obey your instruction, but, instead moves the document to a special folder called ‘Recycle Bin’, ‘Wastebasket’ or ‘Trash’ from which it can be retrieved. This corresponds to the situation in a paper-based environment where you could throw an unwanted document into a wastebasket from which it could be retrieved.

The deleted document remains in the same physical position on the disk. It is the directory entry for the document that is removed, with a new directory entry being created in the Recycle Bin. What you perceive when you navigate through the folders on your computer is not where the documents are located physically, but where they are located logically.

That is, you are given a logical view of your documents which shows their relationship to each other in a hierarchical (nested) folder structure. The operating system hides from you where items are located physically. Since the document is not moved physically when you put it into the Recycle Bin, it does not need to be moved when you empty the bin. Instead the entries for the document in the directory of the Recycle Bin are deleted. So the document may remain on your disk for a long time without being overwritten.

Since the document may remain on the disk long after it has been ‘deleted’, it may be possible to recover the deleted document using a disk-recovery utility. This is also useful if for some reason some of the folder directories (or the VTOC, the volume table of contents) become corrupted. Provided that the original documents are still intact on the disk, they can be recovered and, if the corruption is only a minor one, it may be possible to rebuild the folder directories.
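The difference between deleting a directory entry and erasing the data can be sketched as follows. Everything here is a simplified, hypothetical model: the dictionaries stand in for the disk and its directories, and the set of free addresses is an assumption about how an operating system might keep track of reusable space.

```python
# Hypothetical model: blocks hold the data, directories only point at them.
blocks = {(4, 0, 9): b"draft essay"}
directories = {"Current": {"essay.doc": (4, 0, 9)}, "Recycle Bin": {}}
free_addresses = set()  # space the system is allowed to reuse (assumption)

def delete(name, folder):
    """'Deleting' only moves the directory entry into the Recycle Bin."""
    directories["Recycle Bin"][name] = directories[folder].pop(name)

def empty_recycle_bin():
    """Emptying the bin removes the entries and marks the space as reusable."""
    free_addresses.update(directories["Recycle Bin"].values())
    directories["Recycle Bin"].clear()

delete("essay.doc", "Current")
empty_recycle_bin()

# No directory lists the document any more, yet its block is still intact
# until something overwrites it, so a recovery utility could still find it.
recoverable = [blocks[address] for address in free_addresses if address in blocks]
print(recoverable)  # [b'draft essay']
```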


Other storage media : Tape

Magnetic tape is a storage medium which is slow and difficult to access. The key difference between tape and most other storage media is that tape is linear. To reach a point on the tape it is necessary to wind the tape to that point; there is no direct access as there is with disks. This makes tape slow for normal storage. However, magnetic tape is ideal for:

Data back-up, to provide emergency copies in case the original is lost;

Archiving, to save data for an indefinite period. For example, if you have large amounts of data that is rarely accessed, you do not want it clogging up your disk storage. Instead you can archive it on to tape.

Holostore

One idea currently being investigated to increase the capacity of high-speed storage is the use of holograms. A hologram is a three-dimensional image made with the aid of a laser.

Unlike discs, which are two-dimensional, a hologram is three dimensional, opening the way to storing much higher volumes of data. It is claimed that a holostore no bigger than a pack of playing cards could hold a terabyte of data.

Biological storage media

There is a great deal of interest in harnessing biological properties to provide large-scale storage media. In May 2002, a US patent was granted for a DNA optical storage device using chromophoric (colour changing) DNA.

The basic idea is to represent 0s and 1s using two colour states of a suitable form of synthetic DNA. A number of such memory units would be attached to a support substrate to form a memory cell. The cell should be capable of transferring data at high speeds since there are no moving parts.

Computer networking

Networks of computers have been around for more than thirty years. In fact, computers were networked before it was possible to input data to a computer directly from a keyboard. In recent years the internet has become an all-pervasive part of society, like the telephone, radio and television.

The web, which is based on the internet, has become the platform on which all kinds of information are disseminated. For example, it has generated a whole new and unplanned sphere of commerce, called e-commerce, which involves buying and selling goods and services on the web and has its own computing practices and legal framework.

Networking issues

A network of computers is linked together by communications links. These links may be:

Dedicated cable links;

Public telephone networks;

Radio or microwave links.


Networks do not have to be dispersed over a wide area; their benefits are available locally. Any organisation using more than one computer is likely to have a local area network (LAN) to exploit the benefits of resource sharing. You could even have a LAN at home if you use more than one computer. A LAN may be contained within one building, or it may span several buildings on the same site.

The internet

The internet comprises a huge collection of computers (called hosts) with telecommunications links between them. The internet has its roots in the American military-funded research community of the early 1970s. The first applications to use the internet were based purely on text. Indeed, during the 1970s the bulk of computers were accessed by a command-line interface, in which text commands were issued to the computer and text responses were received. The internet then began to be used for email and for file transfer (i.e. for transferring documents and applications between computers). Modern graphical tools for accessing the internet (like Netscape Navigator) are much more recent, dating back to the early 1990s.

The internet links together not just one type of computer but any type of computer running any operating system. It makes no difference whether your computer is a Dell running Windows XP, a Macintosh running Mac OS X or a Sun running Linux. By adopting the internet protocol each of these computers can become an internet host.

Connecting to the internet requires a piece of equipment called a modem (modulator-demodulator). The modem may be a separate box sitting between your computer and the telephone socket, or it may be a card inside your computer, so that you just have to link the modem port on your computer directly to the telephone line. The modem converts the data signals from the computer into analogue signals, which travel down the telephone lines to the routeing computer of your internet service provider (ISP).

Browsing the web

In 1990, Tim Berners-Lee at CERN (the European Organisation for Nuclear Research) in Switzerland created the forerunner of the web which today is a collection of hypertext documents distributed worldwide and linked by the internet.

The value of the web is that trillions of pages of web content are linked together via multiple hyperlinks, like a spider’s web. Note also that:

The software you use on your computer to access and view documents on the web is called a web browser;

The basic unit of web content is the web page, which is an HTML document like the ones you created in Unit 4. The browser accesses the page, held on a remote computer, and downloads it to your computer;

Downloading a page means transmitting a document from a computer (the web server) somewhere in the world to your computer (the client).

Internet addressing


When you want to download a web page, how does your request find its way to the host computer containing that page? And, equally, what mechanism exists to ensure that your chosen page comes to your computer and not someone else’s? A message sent across the internet must have an address like a letter sent via the postal system. When you send a letter by ‘snail mail’, you address it like this:

Linda O’Toole
Computing Department
Faculty of Mathematics and Computing
Walton Hall
Milton Keynes
MK7 6AA

The address has several levels to it, enabling the item of mail to be routed successfully to the correct destination. The postal system will first route the item to the Milton Keynes sorting office. From there it will be sent to Walton Hall, where the internal mail service dispatches it to the Faculty of Mathematics and Computing. Here a clerk batches up all post addressed to the Computing Department. Finally, the letter is deposited in Linda’s pigeonhole, from which she can retrieve it.

The addressing mechanism of the internet has much in common with this system. At the highest level, there is the top-level domain where domain means a collection of internet hosts. The internet has two types of top-level domain; those with codes of three letters or more group users by category as in Table 3.2, and those with two-letter codes are normally country specific as in Table 3.3.

We need a method of addressing down to the document level. This is done using UNIX conventions, since it was largely UNIX users who developed the internet. The address associated with a hyperlink is given in the form of a URI (uniform resource identifier), which specifies the service requested and the full address of the required document. Here is an example of a URI:

http://mcs.open.ac.uk/mcsexternal/courses/m150.htm

The first part of the URI (http://) identifies the protocol (HTTP) to be used when transferring the document. The protocol guarantees that the web server on the computer being addressed (called a server) understands the nature of the request. By specifying the HTTP protocol, the requesting (client) host indicates to the server that it wants a document. However, if the host receiving the request is not a web server, the request will fail.

The next part of the address, ‘mcs.open.ac.uk’, specifies the server, i.e. the host that will supply the service. The host address is in two parts: ‘mcs’, which identifies a particular computer, and ‘open.ac.uk’, which identifies the domain in which ‘mcs’ is to be found. The rest of the address is the path within ‘mcs’ that leads to the required document. In this case the document is not at the root level of ‘mcs’.

Instead, there is a folder ‘mcsexternal’ at root level, which contains the folder ‘courses’, which in turn contains the document ‘m150.htm’. This document is, we hope, a page written in HTML. When it arrives at the host which requested it, the browser will display it on the screen. If the requested document does not exist, the request will fail.

Web servers also apply certain defaults. For example, if the requested address is a folder and not a document, the server will look in that folder for a document called ‘index.html’, and will deliver that document if it is there.
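The parts of the example URI described above can be separated with Python's standard `urllib.parse` module. This is only a sketch of the decomposition; it parses the text of the URI and does not contact the server. Splitting the host at its first dot into computer and domain follows the description in these notes and is not a general rule.

```python
from urllib.parse import urlsplit

uri = urlsplit("http://mcs.open.ac.uk/mcsexternal/courses/m150.htm")

scheme = uri.scheme                    # the protocol: http
host = uri.hostname                    # the server: mcs.open.ac.uk
computer, domain = host.split(".", 1)  # mcs and open.ac.uk
*folders, document = uri.path.strip("/").split("/")

print(scheme, host)
print(computer, domain)
print(folders)    # the folders from the root level down
print(document)   # the requested document
```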

Naming hosts

It is usually convenient to assign a name to each computer on a network so that users can identify it easily. In a small network the names may be chosen arbitrarily. In a larger network it is common to use a systematic naming scheme – this may be alphabetical or may follow a well-known .

IP numbers

The naming scheme for hosts is very convenient for humans but it is not actually used by the messages that travel across the internet. Instead each host has a 4-byte number associated with it, called its IP (internet protocol) number. The IP number carried by a message ensures that it reaches the correct destination. How does the message discover the IP number of its destination host? The answer is that special directories, called domain name servers, keep this information. The first thing that happens when a URI is executed is that the host name is sent to a domain name server to be resolved.

Logical and physical names

Suppose you chose to host your website on ‘orchid.open.ac.uk’. Later you might need to move your website to another host, because orchid has crashed, or because it no longer has enough storage. So you acquire an up-to-date computer which you name ‘peony.open.ac.uk’ and move your website there. Now no one can find your web pages any more, as they are no longer at the same address. For example, the document previously accessible at the URI http://orchid.open.ac.uk/M150Unit5/npm.htm is now reached at http://peony.open.ac.uk/M150Unit5/npm.htm and the old name will no longer work. One solution would be to maintain the old web server and redirect all the requests for web pages to the new server. This is a rather messy solution (and would not work if the old server no longer exists). A better solution is to avoid reliance on named physical machines.


The way to do this is to identify the web server to the internet not by the name of the physical computer it resides on, but by a logical name. In other words you choose a name for the web server which is independent of its physical host.
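The idea of resolving a name to a 4-byte IP number, and of repointing a logical name when the website moves, can be sketched with a dictionary standing in for a domain name server. The logical name ‘m150site.open.ac.uk’ and all the IP numbers are hypothetical (the numbers come from a range reserved for documentation), and no real name server is consulted.

```python
import ipaddress

# Hypothetical name-server table: host name -> IP number.
name_server = {
    "orchid.open.ac.uk": "192.0.2.10",
    "peony.open.ac.uk": "192.0.2.20",
    "m150site.open.ac.uk": "192.0.2.10",  # logical name, served by orchid
}

def resolve(host_name):
    """Look up a host name and return its IP number as 4 bytes."""
    return ipaddress.IPv4Address(name_server[host_name]).packed

before = resolve("m150site.open.ac.uk")

# The website moves from orchid to peony: only the table entry changes,
# so every URI that uses the logical name keeps working.
name_server["m150site.open.ac.uk"] = name_server["peony.open.ac.uk"]
after = resolve("m150site.open.ac.uk")

print(len(before), list(before))  # 4 [192, 0, 2, 10]
print(len(after), list(after))    # 4 [192, 0, 2, 20]
```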

Email and FirstClass :Email

You have already been introduced to another very popular use of computer networking: email. Typically you use a mail application, or mail client, on your computer to handle email. Email combines high speed with a permanent record. When you call someone by telephone, you want to achieve a synchronous conversation; that is, you require the other party to answer the call while you are on the phone. If they do not answer, you may be able to leave a message, provided they have an answering machine or subscribe to some form of message answering service.

Email over the internet

Like other internet applications, email works on all computing platforms. It depends only on being able to identify the recipient using the internet protocol. One way in which email achieves this universality is that it uses text messages comprising ASCII-coded text only. Unlike other URIs, which identify hosts on the internet, an email address identifies a user. It looks like this: [email protected]. The part before the @ symbol is the user name.

The internet works in much the same way, using a standard protocol. You dispatch a document or email message and expect it to arrive. You do not know which route it took or which countries it passed through. In fact, it probably did not all go the same way. When data travels across the internet, it is broken up into units of a standard size called packets. Each packet carries the address information so that it will reach its intended destination.

The packets are re-assembled into a single item on arrival. Along with the actual message (or data) content, an email also carries transmission information in a number of lines, called headers. Figure 3.3 shows the headers of a typical message.
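Breaking a message into packets and re-assembling it on arrival can be sketched as follows. The packet size, the destination address and the use of a sequence number to restore the order are all assumptions made for the illustration; the text above says only that each packet carries address information.

```python
import random

PACKET_SIZE = 8  # bytes of data per packet; an arbitrary small value

def to_packets(message, destination):
    """Split a message into numbered packets, each carrying the address."""
    return [
        {"to": destination, "seq": i, "data": message[start:start + PACKET_SIZE]}
        for i, start in enumerate(range(0, len(message), PACKET_SIZE))
    ]

def reassemble(packets):
    """Put the packets back in sequence order and join their data."""
    ordered = sorted(packets, key=lambda packet: packet["seq"])
    return b"".join(packet["data"] for packet in ordered)

message = b"Meet me at Walton Hall at noon."
packets = to_packets(message, "192.0.2.20")
random.shuffle(packets)  # packets may take different routes and arrive in any order
restored = reassemble(packets)
print(restored == message)  # True
```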

Sending attachments


Although email transmission is restricted to text, it is possible to attach documents of any kind to an email message. How is this paradox resolved? It is made possible by encoding the attached file as a series of text characters and appending them to the end of the message. Since any electronic document is encoded ultimately in a sequence of bits, it is possible to regroup these bits and interpret the resulting sequence as text characters. In this way an arbitrary attachment can be converted into ASCII code suitable for email transmission.
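One widely used way of turning arbitrary bytes into ASCII text is Base64, which MIME mail commonly uses for attachments. The sketch below uses Python's standard `base64` module; the six sample bytes are made up to stand in for an attached file.

```python
import base64

# Arbitrary binary data standing in for an attached file;
# several of these bytes are not valid ASCII characters.
attachment = bytes([0, 255, 137, 80, 78, 71])

encoded = base64.b64encode(attachment)  # what the sender appends to the message
decoded = base64.b64decode(encoded)     # what the receiving mail client recovers

print(encoded)                # ASCII characters only
print(decoded == attachment)  # True
```

Base64 regroups the bits into units of six rather than eight, so that every unit maps to one of 64 printable characters. The encoded form is therefore about a third longer than the original file.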

In order to enable the receiving mail client to decode the attachment, the encoding scheme must conform to a standard. There are a number of standards for transmitting attachments. The internet standard for encoding mail attachments is MIME (Multipurpose Internet Mail Extensions). There are other standards for mail attachments, most of which pre-date the widespread use of the internet. The key factor is that both sender and receiver can implement the protocol. For example, if someone sends you an attachment encoded using uuencode (the UNIX standard encoding method for mail attachments), you will not be able to unpack the attachment unless your mail client also supports uuencode. The MIME standard was originally published in 1992. Since then it has undergone a number of revisions. Currently it covers not only mail attachments but also further extensions to the basic principle that email be ASCII-coded text.

Accessing data : Databases

A database is a collection of data stored in a computer system according to a set of rules, and organized to facilitate access involving complex searches and selection. As such you will recognize a database as being a form of persistent data. Databases may be particular to an organization or may cover a particular area of knowledge. Database applications have a different emphasis. True, they are used to create and modify data but their primary emphasis is on making the data persistent, and structuring it so as to minimize redundancy, avoid inconsistency and maximize the usefulness of the data for the purposes of access and updating.

To get this information from the database we use a query (a request that specifies what the user wants). The response to the query ideally extracts from the database all the relevant information. So a database is part of an information system; it exists to satisfy information requirements.

The following figure is a form of data

To obtain information from this form, you do not need to read it through in a linear fashion. You choose a particular field such as ‘Engine size’ (that specifies a property or attribute of a car) to obtain the required information.
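The idea of choosing a field rather than reading linearly can be sketched with a small query. The example below uses Python's built-in sqlite3 module; the table name `car`, its columns and all of the rows are hypothetical and are not taken from the course material.

```python
import sqlite3

# An in-memory database, so nothing is written to disk.
conn = sqlite3.connect(":memory:")

# Each column corresponds to a field (an attribute of a car).
conn.execute(
    "CREATE TABLE car ("
    "registration TEXT PRIMARY KEY, make TEXT, model TEXT, engine_size_cc INTEGER)"
)

# Made-up rows for illustration only.
rows = [
    ("REG001", "Alpha", "Hatch", 1100),
    ("REG002", "Beta", "Saloon", 1800),
    ("REG003", "Gamma", "Estate", 2000),
]
conn.executemany("INSERT INTO car VALUES (?, ?, ?, ?)", rows)
conn.commit()

# The query states what is wanted (cars with an engine larger than 1500 cc);
# the database works out how to find the matching records.
query = "SELECT make, model FROM car WHERE engine_size_cc > ? ORDER BY make"
results = conn.execute(query, (1500,)).fetchall()

for make, model in results:
    print(make, model)

conn.close()
```

The query names only the fields of interest, so the other fields of each record never need to be read by the user.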

Databases consist of many tables holding vast amounts of data, which have to be designed with great care in order to be able to provide answers to (possibly complex) queries.

Object databases expand the concept of a database in much the same way that the XML markup language adds several dimensions to HTML. Indeed, there is a software product which can take an XML document and build a corresponding object database.

Metadata

In order to describe anything other than the simplest of data, it is necessary to provide some form of explanatory data (i.e. metadata) about the data. Web pages have a rudimentary form of metadata in the form of keywords that can be used by search engines to locate web pages on a particular topic.

An HTML document has two parts – a head and a body. You put the content of your document in the body. What goes into the head? Information about your document; data about data; metadata. For example, the title of your HTML document is metadata. HTML even has a <META> tag for including a variety of metadata. Figure 4.2 shows how metadata is included in an HTML document.

When data is assembled in multimedia databases, new methods are needed to make it searchable. Effectively you need to have an adequate collection of (metadata) hooks or pointers which identify where various features can be found.
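To illustrate how a program such as a search engine might read the metadata held in the head of an HTML document, here is a minimal sketch using Python's standard html.parser module. The sample page in `HTML_DOC` and the class name `MetadataParser` are invented for this example; a real search engine is far more elaborate.

```python
from html.parser import HTMLParser

# A made-up HTML document: the metadata sits in the head, the content in the body.
HTML_DOC = """<html>
<head>
<title>Car catalogue</title>
<meta name="keywords" content="cars, engine size, catalogue">
<meta name="description" content="A sample page about cars">
</head>
<body><p>Content goes here.</p></body>
</html>"""


class MetadataParser(HTMLParser):
    """Collects the title and the name/content pairs of meta tags."""

    def __init__(self):
        super().__init__()
        self.in_title = False
        self.title = ""
        self.meta = {}

    def handle_starttag(self, tag, attrs):
        # Tag and attribute names are passed in lower case by HTMLParser.
        if tag == "title":
            self.in_title = True
        elif tag == "meta":
            attributes = dict(attrs)
            if "name" in attributes and "content" in attributes:
                self.meta[attributes["name"]] = attributes["content"]

    def handle_endtag(self, tag):
        if tag == "title":
            self.in_title = False

    def handle_data(self, data):
        # Only text found between <title> and </title> is kept.
        if self.in_title:
            self.title += data


parser = MetadataParser()
parser.feed(HTML_DOC)

print(parser.title)
print(parser.meta)
```

Note that the parser never looks at the body: everything it gathers is data about the document rather than the document's content.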

MPEG-7

Nowadays a massive amount of audio-visual information is becoming available in digital form. It started with audio CDs, then DVDs with digital video. You may have a personal archive of digital audio-visual material on your hard disk. Other sources include the web, broadcast data streams and professional databases. The more material there is, the less valuable it is unless a desired item can be retrieved with relative ease. MPEG-7 is a standard for describing audio-visual content with metadata, so that such material can be searched.

Ethical, legal and security issues: Data and the law

Most Western countries now have some form of data protection or data access legislation in place. However, most legislation cannot keep pace with the rapid changes in technology. Also, legislation may vary dramatically from jurisdiction to jurisdiction, and in any case all legislation depends on the willingness of the relevant authorities to enforce it, and of most individuals and organizations to adhere to it.

Data protection laws in any jurisdiction are likely to have some or all of the following characteristics:

A legal definition of data – for example, whether the law is limited to electronic forms or also covers handwritten and typed data, photographs, audio recordings, and so on;

A description of how data may be acquired lawfully – for example, setting out who may acquire data and under what circumstances;

What uses the data may be put to;

Any time limits on storage;

Who may lawfully access and use the data, and for what purpose(s);

A description of what legal protection the subject of the data may have in regard to type, means of gathering, correctness, access and use.

Computer ethics

Ethics is defined as a set of moral principles that should guide our actions as citizens. This branch of philosophy is much older than the computer but its main principles can easily be applied to the use of a computer. Indeed, since the user of a computer can cause much more harm than a non-user, there is a compelling argument for ethical principles to be applied to computer use. We shall concentrate on the ten principles (written in the style of the Biblical Ten Commandments) listed by the Computer Ethics Institute.

1 Thou shalt not use a computer to harm other people.
2 Thou shalt not interfere with other people’s computer work.
3 Thou shalt not snoop around in other people’s computer files.
4 Thou shalt not use a computer to steal.
5 Thou shalt not use a computer to bear false witness.
6 Thou shalt not copy or use proprietary software for which you have not paid.
7 Thou shalt not use other people’s computer resources without authorisation or proper compensation.
8 Thou shalt not appropriate other people’s intellectual output.
9 Thou shalt think about the social consequences of the program you are writing or the system you are designing.
10 Thou shalt always use a computer in ways that ensure consideration and respect for your fellow humans.

Security

Once you link your computer to the internet, you need to think about ways of making it less accessible to unwanted visitors who, in modern computer jargon, are termed hackers.

In some circumstances, it may be desirable to secure a whole network of computers from unauthorized outside access. This may be achieved using a firewall; i.e. a software system which controls data traffic entering and leaving the network. It checks all arriving (and some leaving) data and allows passage only if it fulfils certain criteria.
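The checking described above can be sketched as a set of rules applied to each item of traffic. The following Python fragment is only an illustration: the `Packet` class, the rule sets and the addresses (taken from ranges reserved for documentation) are all assumptions made for this example, and real firewalls examine far more than a source address and a port.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Packet:
    direction: str  # "in" for arriving traffic, "out" for leaving traffic
    address: str    # the remote computer's address
    port: int       # the service being contacted


# Hypothetical criteria, chosen purely for illustration.
BLOCKED_ADDRESSES = {"203.0.113.7"}    # a remote machine we never accept data from
ALLOWED_INBOUND_PORTS = {25, 80, 443}  # mail and web traffic only
BLOCKED_OUTBOUND_PORTS = {23}          # one outbound service we choose to forbid


def allow(packet: Packet) -> bool:
    """Return True if the packet fulfils the criteria and may pass."""
    if packet.direction == "out":
        # Only some leaving traffic is restricted.
        return packet.port not in BLOCKED_OUTBOUND_PORTS
    # All arriving traffic is checked against both inbound rules.
    if packet.address in BLOCKED_ADDRESSES:
        return False
    return packet.port in ALLOWED_INBOUND_PORTS


print(allow(Packet("in", "198.51.100.5", 443)))
print(allow(Packet("in", "203.0.113.7", 443)))
print(allow(Packet("out", "192.0.2.10", 23)))
```

In this sketch arriving traffic is refused unless a rule explicitly permits it, while leaving traffic is permitted unless a rule forbids it.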

Ownership and rights over data

Most countries have copyright laws that afford some sort of protection for intellectual property, but details vary from country to country. A work does not need to be published to fall under copyright law: the unpublished letters you write are an example of works that benefit from the protection of copyright.

Junk email

A simple form of intrusion is unsolicited or junk email. Unless you are successful at hiding your email address from all but trusted acquaintances, there is a likelihood that it will get into the hands of the mass mailers. Such email is a nuisance but will not normally harm your computer.

Worse than junk email are the various forms of malicious software that some people take a delight in distributing on the internet.

Worms

One form of malicious behaviour is due to a worm, which is intended to subvert a whole network of computers. A worm is a program that propagates itself over a network, using the resources on one computer to transfer copies of itself to other machines on the network. It also consumes resources by running on a computer, and in a major attack the whole of a computer’s processing resource could be used in running copies of the worm.

Viruses

A virus is a program or piece of code designed to cause some specific damage to your software by attaching itself to documents held on your computer. A virus writer may, for example, include code that deletes important documents from your hard disk. In extreme cases a virus may make your system totally unusable. Viruses appear in many forms.

Trojan horses

A particularly nasty case of infection can arise when you execute code which looks legitimate but attempts to do something quite different. Typically the name of the document will be misleading. It might be called ‘new.scr’, suggesting that it is a screensaver, when in fact it is intended secretly to modify documents on your hard disk. This type of infection, called a Trojan horse, is sometimes written with criminal intent to collect passwords or convey network information.

How to protect against infection

In view of the widespread existence of viruses and other harmful software predators, it is necessary to protect your system with anti-virus software. This sort of software is available both commercially and in free versions. Unfortunately people with malicious intent are continually writing new viruses and distributing them over the internet.
