
Storage and Information Management - Unit 1 - Management Philosophies


8/6/2019 Storage and Information Management - Unit 1 - Management Philosophies

http://slidepdf.com/reader/full/storage-and-information-management-unit-1-management-philosophies 1/12

Information lifecycle management:
Information lifecycle management (ILM) is a process for managing information through its lifecycle, from conception until disposal, in a manner that optimizes storage and access at the lowest cost.

ILM is not just hardware or software; it includes processes and policies to manage the information. It is based on the recognition that different types of information can have different values at different points in their lifecycle. Predicting storage needs and controlling costs can be especially challenging as the business grows.

The overall objectives of managing information with ILM are to help reduce the total cost of ownership (TCO) and to help implement data retention and compliance policies. To implement ILM effectively, the owners of the data need to determine how information is created, how it ages, how it is modified, and if and when it can safely be deleted. ILM segments data according to value, which can help create an economical balance and a sustainable strategy that aligns storage costs with business objectives and information value.

ILM elements
To manage the data lifecycle and make your business ready for on demand business, an ILM-structured environment relies on four main elements:

1) Tiered storage management
2) Long-term data retention
3) Data lifecycle management
4) Policy-based archive management

Tiered storage management
Most organizations today seek a storage solution that can help them manage data more efficiently. They want to reduce the cost of storing large and growing amounts of data and files while maintaining business continuity. Tiered storage provides benefits such as:
1) Reducing overall disk-storage costs by allocating the most recent and most critical business data to higher-performance disk storage, while moving older and less critical business data to lower-cost disk storage.
2) Speeding business processes by providing high-performance access to the most recent and most frequently accessed data.
3) Reducing administrative tasks and human errors. Older data can be moved to lower-cost disk storage automatically and transparently.

Typical storage environment
Storage environments typically hold multiple tiers of data value, such as application data that is needed daily and archive data that is accessed infrequently. However, typical storage configurations offer only a single tier of storage, as shown in the figure, which limits the ability to optimize cost and performance.

Multi-tiered storage environment
A tiered storage environment that uses the SAN infrastructure offers the flexibility to align storage cost with the changing value of information. The tiers are related to data value: the most critical data is allocated to higher-performance disk storage, while less critical business data is allocated to lower-cost disk storage.
An IBM ILM solution in a tiered storage environment is designed to:


1) Reduce the total cost of ownership (TCO) of managing information. It can help optimize data costs and management, freeing expensive disk storage for the most valuable information.
2) Segment data according to value. This can help create an economical balance and a sustainable strategy to align storage costs with business objectives and information value.
3) Help make decisions about moving, retaining, and deleting data, because ILM solutions are closely tied to applications.
4) Manage information based on its content, rather than migrating data based on technical specifications. This approach can result in more responsive management and lets you retain or delete information in accordance with business rules.
5) Provide the framework for a comprehensive enterprise content management strategy.
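The allocation rule behind tiered storage (recent or critical data on higher-performance disk, older and less critical data on lower-cost disk) can be sketched as a small placement function. This is a minimal Python illustration under stated assumptions, not an IBM product API: the DataItem class, the two tier names and the 30-day recency window are hypothetical, and treating "recent or critical" as enough to earn the fast tier is one possible reading of the rule.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

# Hypothetical tier names: the notes only distinguish "higher performance"
# and "lower cost" disk storage.
HIGH_PERFORMANCE = "high-performance disk"
LOWER_COST = "lower-cost disk"


@dataclass
class DataItem:
    name: str
    last_access: datetime
    business_critical: bool


def assign_tier(item: DataItem, now: datetime,
                recent_window: timedelta = timedelta(days=30)) -> str:
    """Recent or business-critical data goes to the fast tier; the rest to the cheap tier."""
    is_recent = now - item.last_access <= recent_window
    if item.business_critical or is_recent:
        return HIGH_PERFORMANCE
    return LOWER_COST


now = datetime(2024, 1, 31)
items = [
    DataItem("orders.db", datetime(2024, 1, 30), business_critical=True),
    DataItem("logs-2019.tar", datetime(2019, 6, 1), business_critical=False),
]
for item in items:
    print(item.name, "->", assign_tier(item, now))
```

In a real ILM deployment the recency window and the notion of "critical" would come from business policy rather than constants in code.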

Long-term data retention
There is a rapidly growing class of data that is best described by the way in which it is managed rather than by the arrangement of its bits. The most important attribute of this kind of data is its retention period, hence it is called retention-managed data, and it is typically kept in an archive or a repository. In the past it has been variously known as archive data, fixed content data, reference data, unstructured data, and other terms implying its read-only nature. It is often measured in terabytes and is kept for long periods of time, sometimes forever.
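Because retention-managed data is defined by its retention period, the core check is simply whether that period has run out. The sketch below assumes a retention period expressed in whole years from the creation date and uses None for data that is kept forever; the function names are hypothetical, and real retention rules may depend on events or obligations not modeled here.

```python
from datetime import date
from typing import Optional


def retention_expires(created: date, retention_years: int) -> date:
    """Date on which a retention period of whole years ends."""
    try:
        return created.replace(year=created.year + retention_years)
    except ValueError:
        # Created on 29 February and the target year is not a leap year.
        return created.replace(year=created.year + retention_years, day=28)


def may_delete(created: date, retention_years: Optional[int], today: date) -> bool:
    """True once retention has expired; None means the data is kept forever."""
    if retention_years is None:
        return False
    return today >= retention_expires(created, retention_years)


print(may_delete(date(2015, 3, 1), 7, date(2023, 1, 1)))     # True
print(may_delete(date(2015, 3, 1), None, date(2023, 1, 1)))  # False
```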

Data lifecycle management
At its core, the process of ILM moves data up and down a path of tiered storage resources, including high-performance, high-capacity disk arrays, lower-cost disk arrays such as serial ATA (SATA), tape libraries, and permanent archival media where appropriate. Yet ILM involves more than just data movement; it also encompasses scheduled deletion and regulatory compliance. Because decisions about moving, retaining, and deleting data are closely tied to how applications use data, ILM solutions are usually closely tied to applications.
By migrating unused data off more costly, high-performance disks, ILM is designed to help:
1) Reduce costs to manage and retain data.
2) Improve application performance.
3) Reduce backup windows and ease system upgrades.
4) Streamline data management.
5) Allow the enterprise to respond to demand in real time.
6) Support a sustainable storage management strategy.
7) Scale as the business grows.

Policy-based archive management
As businesses of all sizes migrate to e-business solutions and a new way of doing business, they already have mountains of data and content that have been captured, stored, and distributed across the enterprise. This wealth of information provides a unique opportunity. By incorporating these assets into e-business solutions, while also delivering newly generated information media to their employees and clients, a business can reduce costs and information redundancy and leverage the profit-making potential of its information assets.

Five Pillars of Technology:

• Technologies are not in one language. In fact, the internet necessarily breaks down language barriers.
• Tech stuff isn't about expressions of the divine, but it is all about idioms, idiomatic expressions of what people claim as important (even sacred?) in their lives: that's the blogosphere in a nutshell.
• Blogging, Facebook, and Twitter are so biographical it's almost too much for me. They're all about biography and community, though.
• Clearly, there's no one center of the net, and that's what gives it enormous power.
• This last one is tricky, because technology doesn't really lose its cohesiveness when met with a new environment, but it does become co-opted and gain a stronger cohesiveness when used well in the new setting.

Data proliferation:
Data proliferation refers to the vast amount of data, structured and unstructured, that businesses and governments continue to generate at an unprecedented rate, and to the usability problems that result from attempting to store and manage that data. While the term originally referred to problems associated with paper documentation, data proliferation has become a major problem in primary and secondary data storage on computers.

Problems caused by data proliferation:

1) Difficulty finding and retrieving information.

2) Data loss and legal liability when data is disorganized, not properly replicated, or cannot be found in a timely manner.

3) Increased manpower requirements to manage increasingly chaotic data storage resources.

4) Slower network and application performance due to excess traffic as users search, and search again, for the material they need.

5) High cost in terms of the energy required to operate storage hardware. A 100 terabyte system can cost up to $35,040 a year to run.
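The $35,040 figure can be reproduced with simple arithmetic: annual cost = average power draw (kW) x 8,760 hours x electricity price per kWh. The notes state neither the power draw nor the price, so the inputs below (a constant 40 kW at $0.10 per kWh) are assumptions chosen only because they yield that figure; annual_energy_cost is a hypothetical helper.

```python
HOURS_PER_YEAR = 24 * 365  # 8,760 hours, ignoring leap years


def annual_energy_cost(power_kw: float, price_per_kwh: float) -> float:
    """Yearly electricity cost of equipment drawing a constant load."""
    return power_kw * HOURS_PER_YEAR * price_per_kwh


# Hypothetical inputs: a constant 40 kW draw at $0.10 per kWh.
cost = annual_energy_cost(40, 0.10)
print(f"${cost:,.0f} per year")  # $35,040 per year
```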

Proposed solutions:

1) Applications that make better use of modern technology.
2) Reductions in duplicate data (especially duplicates created by data movement).
3) Improvement of metadata structures.
4) Improvement of file and storage transfer structures.
5) The implementation of Information Lifecycle Management solutions to eliminate low-value information as early as possible, before putting the rest into actively managed long-term storage in which it can be quickly and cheaply accessed.
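Reducing duplicate data usually starts with detecting identical content. A common approach is to hash each item and group items whose digests match; the sketch below does this with SHA-256 from Python's standard hashlib module. The in-memory files dictionary stands in for real files and find_duplicates is a hypothetical helper; production deduplication systems often hash fixed- or variable-size chunks rather than whole files.

```python
import hashlib
from collections import defaultdict
from typing import Dict, List


def find_duplicates(blobs: Dict[str, bytes]) -> List[List[str]]:
    """Group names whose contents have the same SHA-256 digest."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for name, data in blobs.items():
        groups[hashlib.sha256(data).hexdigest()].append(name)
    # Only digests shared by two or more names indicate duplicates.
    return [sorted(names) for names in groups.values() if len(names) > 1]


files = {
    "report.pdf": b"quarterly numbers",
    "copy of report.pdf": b"quarterly numbers",
    "notes.txt": b"meeting notes",
}
print(find_duplicates(files))  # [['copy of report.pdf', 'report.pdf']]
```

Once duplicates are known, all but one copy can be replaced by a reference to the surviving copy.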

Data Center:
A data center is a facility used to house computer systems and associated components, such as telecommunications and storage systems. It generally includes redundant or backup power supplies, redundant data communications connections, environmental controls (e.g., air conditioning, fire suppression), and special security devices.

Data centers have their roots in the huge computer rooms of the early days of the computing industry. Early computer systems were complex to operate and maintain, and needed a special environment to keep working. Many cables were needed to connect all the parts. Old computers also required a lot of power and had to be cooled to avoid overheating. Security was important: computers were expensive and were often used for military purposes. For these reasons, engineering practices for computer rooms have been developed since the start of the computing industry. Elements such as standard racks to mount equipment, elevated floors, and cable trays (installed overhead or under the elevated floor) were introduced in this early period, and have changed relatively little compared to the computer systems themselves.

A data center can occupy one room of a building, one or more floors, or an entire building. Most of the equipment is typically servers mounted in 19-inch rack cabinets, which are usually placed in single rows forming corridors between them. This gives people access to the front and rear of each cabinet. The physical environment of the data center is usually under strict control:

1) Air conditioning is used to keep the room cool; it may also be used for humidity control. Generally, temperature is kept around 20-22 degrees Celsius (about 68-72 degrees Fahrenheit). The primary goal of data center air conditioning systems is to keep the server components at the board level within the manufacturer's specified temperature/humidity range. This is crucial because electronic equipment in a confined space generates a lot of excess heat and tends to malfunction if not adequately cooled. Air conditioning systems also help keep humidity within acceptable parameters, typically between 35% and 65% relative humidity. With too much humidity, water may begin to condense on internal components; with too little, static electricity may damage components.

2) Data centers often have elaborate fire prevention and fire extinguishing systems. Modern data centers tend to have two kinds of fire alarm systems: a first system designed to spot the slightest sign of particles given off by hot components, so that a potential fire can be investigated and extinguished locally before it takes hold (sometimes just by turning smoldering equipment off), and a second system designed to take full-scale action if the fire takes hold. Fire prevention and detection systems are also typically zoned, and high-quality fire doors and other physical fire breaks are used, so that even if a fire does break out it can be contained and extinguished within a small part of the facility.

3) Backup power is provided by one or more uninterruptible power supplies (UPS) and/or diesel generators.

4) To prevent single points of failure, all elements of the electrical systems, including the backup system, are typically fully duplicated, and critical servers are connected to both the "A-side" and "B-side" power feeds.

5) Older data centers typically have raised flooring made up of 60 cm (2 ft) removable square tiles; the trend is towards an 80-100 cm underfloor void for better and more uniform air distribution. The space beneath the floor provides a plenum for air to circulate, as part of the air conditioning system, as well as room for power cabling.

6) Using conventional water sprinkler systems on operational electrical equipment can do as much damage as a fire. Originally, Halon gas, a halogenated organic compound that chemically stops combustion, was used to extinguish flames. However, Halon production has been phased out under the Montreal Protocol because of the danger Halon poses to the ozone layer. More environmentally friendly alternatives include Argonite and HFC-227.
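The temperature and humidity ranges quoted in item 1) above can be expressed as a simple monitoring check. In this sketch the numeric thresholds come from the notes, but treating 20-22 degrees Celsius as hard limits (the notes say temperature is kept around that range) and the warning texts are assumptions; check_environment is a hypothetical helper, not part of any monitoring product.

```python
from typing import List

# Target ranges quoted in the notes; treating them as hard limits is an assumption.
TEMP_RANGE_C = (20.0, 22.0)
HUMIDITY_RANGE_PCT = (35.0, 65.0)


def celsius_to_fahrenheit(celsius: float) -> float:
    return celsius * 9 / 5 + 32


def check_environment(temp_c: float, humidity_pct: float) -> List[str]:
    """Return a warning for each reading outside its target range."""
    warnings = []
    t_low, t_high = TEMP_RANGE_C
    if not t_low <= temp_c <= t_high:
        warnings.append(
            f"temperature {temp_c} C ({celsius_to_fahrenheit(temp_c):.1f} F) "
            f"outside {t_low}-{t_high} C"
        )
    h_low, h_high = HUMIDITY_RANGE_PCT
    if humidity_pct > h_high:
        warnings.append(f"humidity {humidity_pct}% above {h_high}%: condensation risk")
    elif humidity_pct < h_low:
        warnings.append(f"humidity {humidity_pct}% below {h_low}%: static electricity risk")
    return warnings


print(check_environment(21.0, 50.0))  # []
print(check_environment(25.0, 30.0))
```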


selected files or entire file systems. The traditional method of backup was to make a backup copy on tape or, in the case of a personal computer, on a set of floppy disks or a small tape cartridge. However, as systems became networked together, LAN-based backup systems replaced media-oriented approaches; these ran automatically and unattended, often backing up from HDD to HDD. File-differential backup was subsequently introduced, in which only the changed bytes within a file are sent to and managed at the backup server.
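File-differential backup sends only what changed. One simple way to approximate this is to split the file into fixed-size blocks, compare each block with the previous version, and send only the blocks that differ together with their offsets; the backup server then patches its old copy. This is a simplified sketch: the 4-byte block size and both helper names are hypothetical, and fixed blocks handle in-place edits well but not insertions, which is why tools such as rsync use rolling checksums instead.

```python
from typing import Dict, Tuple

BLOCK_SIZE = 4  # deliberately tiny so the example is easy to follow


def changed_blocks(old: bytes, new: bytes,
                   block_size: int = BLOCK_SIZE) -> Tuple[int, Dict[int, bytes]]:
    """Return the new file length and the blocks of new that differ from old."""
    delta: Dict[int, bytes] = {}
    for offset in range(0, len(new), block_size):
        chunk = new[offset:offset + block_size]
        if old[offset:offset + block_size] != chunk:
            delta[offset] = chunk
    return len(new), delta


def apply_delta(old: bytes, new_length: int, delta: Dict[int, bytes]) -> bytes:
    """Rebuild the new version on the backup server from its old copy plus the delta."""
    # Start from the old copy, truncated or zero-padded to the new length.
    buf = bytearray(old[:new_length].ljust(new_length, b"\0"))
    for offset, chunk in delta.items():
        buf[offset:offset + len(chunk)] = chunk
    return bytes(buf)


old = b"hello world!"
new = b"hello WORLD!"
length, delta = changed_blocks(old, new)
print(sorted(delta))                          # [4, 8]: two of the three blocks are sent
print(apply_delta(old, length, delta) == new)  # True
```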

Hierarchical Storage Management (HSM) is a data storage technique that automatically moves data between high-cost and low-cost storage media. HSM systems exist because high-speed storage devices, such as hard disk drive arrays, are more expensive (per byte stored) than slower devices, such as optical discs and magnetic tape drives. While it would be ideal to have all data available on high-speed devices all the time, this is prohibitively expensive for many organizations. Instead, HSM systems store the bulk of the enterprise's data on slower devices and copy data to faster disk drives when needed. In effect, HSM turns the fast disk drives into caches for the slower mass storage devices. The HSM system monitors how data is used and makes best guesses as to which data can safely be moved to slower devices and which data should stay on the fast devices.

In a typical HSM scenario, frequently used data files are stored on disk drives, but are eventually migrated to tape if they are not used for a certain period of time, typically a few months. If a user reuses a file that is on tape, it is automatically moved back to disk storage. The advantage is that the total amount of stored data can be much larger than the capacity of the available disk storage, but since only rarely used files are on tape, most users will usually not notice any slowdown.
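The migrate-and-recall behaviour described above can be modelled with two dictionaries standing in for disk and tape. In this sketch the 90-day idle threshold (one reading of "a few months"), the integer day counter and the HsmStore class are all hypothetical; real HSM products typically leave a stub on disk in place of a migrated file, which is not modelled here.

```python
from dataclasses import dataclass, field
from typing import Dict, List

MIGRATE_AFTER_DAYS = 90  # hypothetical reading of "a few months"


@dataclass
class HsmStore:
    disk: Dict[str, bytes] = field(default_factory=dict)     # fast, expensive tier
    tape: Dict[str, bytes] = field(default_factory=dict)     # slow, cheap tier
    last_used: Dict[str, int] = field(default_factory=dict)  # day of last access

    def write(self, name: str, data: bytes, today: int) -> None:
        self.disk[name] = data
        self.tape.pop(name, None)
        self.last_used[name] = today

    def read(self, name: str, today: int) -> bytes:
        if name not in self.disk:
            # Transparent recall: move the file back from tape to disk.
            self.disk[name] = self.tape.pop(name)
        self.last_used[name] = today
        return self.disk[name]

    def migrate(self, today: int) -> List[str]:
        """Move files idle for MIGRATE_AFTER_DAYS or longer from disk to tape."""
        idle = [n for n in self.disk if today - self.last_used[n] >= MIGRATE_AFTER_DAYS]
        for name in idle:
            self.tape[name] = self.disk.pop(name)
        return idle


store = HsmStore()
store.write("q1-report.pdf", b"...", today=0)
store.write("orders.db", b"...", today=80)
print(store.migrate(today=100))         # ['q1-report.pdf']
store.read("q1-report.pdf", today=101)  # recalled from tape
print("q1-report.pdf" in store.disk)    # True
```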


Storage Management Challenges:

1) Variety of information: information technology holds the promise of bringing a variety of new types of information to the people who need it.

2) Volume of data: data is growing exponentially.

3) Velocity of change: IT organizations are under tremendous pressure to deliver the right IT services. 85% of problems are caused by IT staff changes, and 80% of problems are not detected by the IT staff until they are reported by the end user.

4) Leverage information: capitalize on data sharing for collaboration, and on storage investments and the value of information. Leveraging information can be addressed through reporting and data classification. Questions that may be asked are:
a) How much storage do I have available for my applications?
b) Which applications, users, and databases are the primary consumers of my storage?
c) When do I need to buy more storage?
d) How reliable is my SAN?
e) How is my storage being used?

5) Optimize IT: automate and simplify IT operations, and optimize performance and functionality. Optimizing IT can be addressed through centralized management and storage virtualization. Questions asked are:
a) How do I simplify and centralize my storage infrastructure?
b) How do I know the storage is not the bottleneck for user response-time issues?
c) Is the storage infrastructure available and performing as needed?

6) Mitigate risks: comply with regulatory and security requirements, and keep your business running continuously. Risks can be mitigated through tiered storage and ILM. Questions asked are:
a) How do I monitor and centrally manage my replication services?
b) How do I maintain storage service levels?
c) Which files must be backed up, archived, and retained for compliance?

7) Enable business flexibility: provide a flexible, on demand IT infrastructure and protect your IT investments. Business flexibility can be enabled through service management. Questions asked are:
a) How can I automate the provisioning of my storage systems, databases, file systems, and SAN?
b) How do I maintain storage service levels?
c) How do I monitor and centrally manage my replication services?

What needs to be managed?

1) Servers
• Applications
• Databases
• File systems
• Volume managers
• Host bus adaptors and multipath drivers

2) Network components
• Switches, hubs, routers
• Intelligent switch replication


A Few Issues Related to Data

Data identity: Persistent unique identifiers (or an alternative means of achieving this functionality) will enable global cross-referencing between data objects. Such identifiers will be used not only for data and software but also for other resources such as people, equipment, and organizations. On the other hand, any identification scheme is likely to evolve, so preservation, and in particular the integration of archival and current data, is likely to require active management of identifiers.

Data objects: Data will be made available with all the metadata necessary to enable reuse. From its creation and throughout its lifecycle, data will be packaged with its metadata, which will progressively accrue information about its history as it evolves.

Data agents: Data will be "intelligent" in that it maintains knowledge of where it has been used as well as what it uses. (This can be achieved by bidirectional links between data and its uses, or by making the associations between data stand-alone entities in their own right. In either case, active maintenance of the associations is required.)

Software: Software will join simulations, data, multimedia, and text as a core research output. It will therefore require similar treatment in terms of metadata schemas, metadata creation, and propagation (including the context of software development and use, resilience, versioning, and rights).

Data Forge: Rather like SourceForge for software, we imagine a global (probably distributed) self-service repository of data made available under a variety of access agreements. There is a requirement on data management technology for greater ease in data collection, interoperation, aggregation, and access. Technology and tooling must be developed to meet these requirements, both in the manifestation of the data itself and in the software that manages it. Critical to this will be the collection, management, and propagation of metadata along with the data itself.
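The ideas of persistent identifiers and of metadata that accrues history over a data object's lifecycle can be sketched as a small record type. Here a random UUID stands in for a persistent identifier (real schemes such as DOIs or handles are managed by registries), and the DataObject class and its fields are hypothetical.

```python
import uuid
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Any, Dict, List


@dataclass
class DataObject:
    payload: bytes
    metadata: Dict[str, Any]
    # Persistent identifier, assigned once and never changed.
    pid: str = field(default_factory=lambda: str(uuid.uuid4()))
    # Provenance entries that accrue over the object's lifecycle.
    history: List[Dict[str, Any]] = field(default_factory=list)

    def __post_init__(self) -> None:
        self._record("created")

    def _record(self, action: str, **details: Any) -> None:
        entry = {"action": action, "at": datetime.now(timezone.utc).isoformat()}
        entry.update(details)
        self.history.append(entry)

    def update(self, payload: bytes, by: str) -> None:
        self.payload = payload
        self._record("modified", by=by)


obj = DataObject(b"raw readings", {"format": "csv", "creator": "lab-A"})
obj.update(b"cleaned readings", by="cleaning-script")
print(obj.pid, [entry["action"] for entry in obj.history])
```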

Data source

A data source is any of the following types of sources for (mostly) digitized data:

• a database
• a computer file
• a data stream

A database is a structured collection of records or data that is stored in a computer system. The structure is achieved by organizing the data according to a database model. The model in most common use today is the relational model. Other models, such as the hierarchical model and the network model, use a more explicit representation of relationships.

A computer file is a block of arbitrary information, or a resource for storing information, that is available to a computer program and is usually based on some kind of durable storage. A file is durable in the sense that it remains available for programs to use after the current program has finished. Computer files can be considered the modern counterpart of paper documents, which were traditionally kept in the files of offices and libraries; this is the source of the term.

In telecommunications and computing, a data stream is a sequence of digitally encoded coherent signals (packets of data, or data packets) used to transmit or receive information that is in transmission.[1]

In electronics and computer architecture, a data stream determines which data item is scheduled to enter or leave which port of a systolic array, a Reconfigurable Data Path Array or similar pipe network, or another processing unit or block, and at what time. The data stream is often seen as the counterpart of an instruction stream, since the von Neumann machine is instruction-stream-driven, whereas its counterpart, the Anti machine, is data-stream-driven.

The term "data stream" has many other meanings beyond the one used in the context of systolic arrays.

Formally, a data stream is an ordered pair $(s, \Delta)$ where:

$s$ is a sequence of tuples, and

$\Delta$ is a sequence of time intervals (rational or real numbers), with each $\Delta_n > 0$.
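The formal definition maps directly onto a small data type: a sequence of tuples paired with a sequence of strictly positive time intervals. This sketch assumes the two sequences have equal length and reads delta[n] (the Δ_n above) as the gap before tuple n arrives, so cumulative sums give arrival times; both readings and the DataStream name are illustrative assumptions.

```python
from dataclasses import dataclass
from itertools import accumulate
from typing import List, Tuple


@dataclass(frozen=True)
class DataStream:
    s: Tuple[tuple, ...]      # the sequence of tuples
    delta: Tuple[float, ...]  # the sequence of time intervals

    def __post_init__(self) -> None:
        if len(self.s) != len(self.delta):
            raise ValueError("s and delta must have the same length")
        if any(d <= 0 for d in self.delta):
            raise ValueError("every time interval must be > 0")

    def arrival_times(self, start: float = 0.0) -> List[float]:
        """Arrival time of each tuple, reading delta[n] as the gap before tuple n."""
        return list(accumulate(self.delta, initial=start))[1:]


stream = DataStream(
    s=(("sensor-1", 20.5), ("sensor-1", 20.7), ("sensor-2", 19.9)),
    delta=(0.5, 0.5, 1.0),
)
print(stream.arrival_times())  # [0.5, 1.0, 2.0]
```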

DATA CLASSIFICATION:

Data classification is the process of determining the class intervals and class boundaries for the data to be mapped, and it depends in part on the number of observations. Most maps are designed with 4-6 classes; with more observations you may need more classes, but too many classes are not good either, since they make the map harder to interpret. There are four classification methods for making a graduated color or graduated symbol map:

1) Natural Breaks Classification
2) Quantile Classification
3) Equal Interval Classification
4) Standard Deviation Classification

Each of these methods produces a different pattern on the map display.

Natural Breaks Classification
This data classification method divides data into classes based on the natural groups in the data distribution. It uses a statistical formula (Jenks optimization) that calculates groupings of data values based on the data distribution, seeking to reduce variance within groups and maximize variance between groups.
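Two of the four methods are simple enough to compute directly. Equal interval classification splits the value range into classes of equal width; quantile classification puts roughly the same number of observations in each class. The sketch below returns each class's upper bound; Natural Breaks (Jenks) and Standard Deviation classification are not shown, and the function names and sample data are hypothetical.

```python
from bisect import bisect_left
from typing import List, Sequence


def equal_interval_breaks(values: Sequence[float], k: int) -> List[float]:
    """Upper bounds of k classes that each span the same width of values."""
    low, high = min(values), max(values)
    width = (high - low) / k
    return [low + width * i for i in range(1, k)] + [float(high)]


def quantile_breaks(values: Sequence[float], k: int) -> List[float]:
    """Upper bounds of k classes that each hold about n/k observations."""
    ordered = sorted(values)
    n = len(ordered)
    if n < k:
        raise ValueError("need at least k observations")
    return [float(ordered[(i * n) // k - 1]) for i in range(1, k + 1)]


def class_counts(values: Sequence[float], breaks: List[float]) -> List[int]:
    """Count observations per class; a value equal to a break joins the lower class."""
    counts = [0] * len(breaks)
    for v in values:
        counts[bisect_left(breaks, v)] += 1
    return counts


data = [1, 2, 2, 3, 4, 7, 9, 15, 30, 60]
print(equal_interval_breaks(data, 4), class_counts(data, equal_interval_breaks(data, 4)))
# [15.75, 30.5, 45.25, 60.0] [8, 1, 0, 1]
print(quantile_breaks(data, 4), class_counts(data, quantile_breaks(data, 4)))
# [2.0, 4.0, 9.0, 60.0] [3, 2, 2, 3]
```

On skewed data like this sample, equal intervals put most observations in the first class while quantiles spread them out, which is why the choice of method changes how the map looks.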
