
CRF-RDTE-TR-20100202-07
11/2/2009
Public Distribution | Michael Corsello

CORSELLO RESEARCH FOUNDATION

INFORMATION MANAGEMENT: BASIC ACTIVITIES


Abstract

The Information Lifecycle has several basic operational workflows that need to be planned for each organization. The basic concepts for these workflows are fairly straightforward when actively considered.


Table of Contents

Abstract
Introduction
Information Management Workflows
    Project Planning
    Data Management
    Data Integration
    Systems Integration
Business Operational Workflows
    Effort Initiation
    Data Standardization
    Data Collection
    Data Processing
    Data Analysis
Conclusions
Example Flow
Example Distributed Applications
    Basic Software Architecture
Appendices
    References


Introduction

Information management covers the entire spectrum of activities that relate to the creation, use and disposal of data. In general, information management encompasses most information technology activities as well as all information-related activities that enable people to make use of data, digital or otherwise. It is important to note that information management is not constrained by technology and is instead concerned with optimizing the human benefit realized from data.

Information Management Workflows

There are several basic functions in information management, each of which has its own workflows:

- Project Planning
- Data Management
- Data Integration
- Systems Integration

    Project Planning

    Project planning includes the coordinated effort to identify what information is required for and

    produced by a project. This effort includes matching identified data requirements to existing data

    repositories and initiating efforts to create repositories for new data sets.

    Data Management

    Data management activities include all aspects of coordinating and optimizing the collection, discovery,

    use, maintenance and disposal of data. This includes data modeling for new repositories and business

    process definition to select an effective methodology for data handling from collection to disposal.


    Data Integration

Data integration is the set of all activities required to ensure data is discoverable and that queries can span related data repositories. In multi-organizational scenarios this may include the development of data sharing agreements, policies and software tools to integrate separate data repositories.

    Systems Integration

The systems integration workflows include all activities for planning systems development, including the prioritization of potential development efforts. Systems integration includes the development of stand-alone tools, the integration of existing tools and the guiding of organizational architectures.


Business Operational Workflows

Outside of the Information Management-specific workflows, the day-to-day business activities involve the application of Information Management based upon the currently implemented processes and supporting technologies.

    As it relates to the information lifecycle, business operations will flow through similar phases of activities

    such as:

Business Phase          Information Lifecycle Phase
Effort Initiation       Creation (Planning)
Data Standardization    Creation (Planning)
Data Collection         Creation (Collection)
Data Processing         Creation (Assessment / Ingestion)
Data Analysis           Distribution and Use

    Each of these business phases provides an opportunity for automation where appropriate, and some

    automation tools (applications) may be used across multiple phases. In general terms, the data, phase,

    tools and use are all independent yet related. Application of technology and processes to business

    efforts should be designed to increase the effectiveness of the personnel and data.

    When depicted as an overall system, the flow of data products moves from collection through

    processing to analysis, which becomes a secondary form of collection. This overall cycle is depicted

    below.


When viewed at a high level, omitting the processes of initiating efforts and standardizing data, the process becomes a clear cycle, as shown below.

The primary flow of read → analyze → post becomes an endless cycle of reuse for data and analytic results. It is important to remember in this context that analysis includes any form of synthesis, such as writing a report, and not just numerical analysis. These output products are posted back to repositories, where they are available to be discovered for future use.
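As a complement to the figure, the following is a minimal Python sketch of this read → analyze → post cycle, assuming a simple in-memory list standing in for a repository of data products; all names are illustrative and not part of the report.

```python
# A minimal sketch of the read -> analyze -> post cycle against a notional
# in-memory "repository" of data products.

def analyze(inputs):
    """Stand-in for any synthesis: a model run, a chart, or a written report."""
    return {"kind": "report", "derived_from": [d["id"] for d in inputs]}

repository = [{"id": "obs-001", "kind": "raw"}]  # seeded with one collected data set

for cycle in range(3):
    inputs = list(repository)          # read: discover available data products
    product = analyze(inputs)          # analyze: synthesize a new product
    product["id"] = f"prod-{cycle}"
    repository.append(product)         # post: the result becomes new input

print(len(repository))  # 4 products: each output is available to later cycles
```

The point of the loop is that posted analytic results are themselves discoverable, so each pass through the cycle has more material to draw on than the last.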

Effort Initiation

Whenever a new work effort (project, study, etc.) is undertaken, the entire process of identifying the need, scoping the work and planning the execution falls within this phase of the process. The initiation phase also considers the high-level aspects of execution through effort wrap-up and close-out, to ensure the effort produces the expected results. An example of this phase workflow is provided below.


As the effort is planned, a scope is defined which will drive the data needs for the effort. Under the data standardization phase, the relevant collection standards will be produced. These data standards are relayed to the effort team prior to and along with the official kick-off of the effort.

    As the effort proceeds, technicians will collect data under the data collection phase based upon the

    standards provided for the effort. This data is then submitted to the quality assurance team which

    operates in the data processing phase and ensures all data is deposited in the appropriate operational

    data repositories (which could be a database or file system).

The use of collected data occurs under the data analysis phase, which allows for the use of data in analysis and report generation. After all operations are complete, data results, reports and other (non-technological) products contribute to the final portion of the initiation phase, which is effort closeout. At this point, all contractual work is complete, accepted and residing in the appropriate locations.
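The phase sequence described above can be summarized in a minimal sketch; the ordering and phase names below are illustrative assumptions, and in practice the collection, processing and analysis phases iterate rather than running once.

```python
# A minimal sketch of the effort-initiation flow as an ordered sequence of
# phases, assuming each phase is a simple step. All names are illustrative.

from enum import Enum, auto

class Phase(Enum):
    SCOPING = auto()
    DATA_STANDARDIZATION = auto()
    KICK_OFF = auto()
    DATA_COLLECTION = auto()
    DATA_PROCESSING = auto()
    DATA_ANALYSIS = auto()
    CLOSE_OUT = auto()

def run_effort(effort_name: str) -> None:
    # Phases run in declaration order; real efforts loop through the middle phases.
    for phase in Phase:
        print(f"{effort_name}: entering {phase.name}")

run_effort("Water Quality Study")
```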

It is of significant note that this set of processes indicates a logical set of flows that relate to where technologies may be used in the execution of operational activities. There is no implication that technologies actually must be used, or of how those technologies are applied. The adaptation of technologies to operations should be considered on a cost/benefit/time basis and implemented accordingly.


    Data Standardization

    The data standardization phase is a complex, ongoing set of processes that involves both project effort

    and organizational (overarching) activities. In the overarching context, data standardization involves the

    efforts of managing repositories and data models associated with the information lifecycle. In this

    paper, we will only cover the processes associated with project efforts.

    Data standardization processes involve the definition of data collection standards and data repositories

    to conform to industry and organizational best practices and standard data models. As the organization

    creates data repositories, those repositories will have well-defined data models. As data is collected, it

    will need to conform or be transformable to the data model of the current repository. Once a data

    collection standard is developed for a specific type of data (such as water quality or more simply water

    temperature), that standard will be applied to all efforts collecting that type of data. Since a given effort

will likely collect multiple types of data, each with its own standard, an effort-specific standard will be created that aggregates the individual standards (perhaps as simply as bundling the individual PDF files).

    An example of the data standardization workflow is depicted below.
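In addition to the workflow figure, the aggregation of per-data-type standards into an effort-specific standard can be sketched as a simple data structure; the record fields and names below are illustrative assumptions, not part of the report.

```python
# A minimal sketch of aggregating per-data-type collection standards into an
# effort-specific standard, assuming standards are tracked as simple records.

from dataclasses import dataclass, field

@dataclass
class CollectionStandard:
    data_type: str          # e.g. "water temperature"
    version: str
    document: str           # e.g. a path to the standard's PDF

@dataclass
class EffortStandard:
    effort: str
    standards: list[CollectionStandard] = field(default_factory=list)

    def add(self, std: CollectionStandard) -> None:
        # One standard per data type: the same standard applies to every
        # effort collecting that type of data.
        self.standards.append(std)

effort = EffortStandard("Riverbed Survey 2010")
effort.add(CollectionStandard("water temperature", "1.2", "std/water_temp_v1_2.pdf"))
effort.add(CollectionStandard("water quality", "2.0", "std/water_quality_v2_0.pdf"))
print([s.data_type for s in effort.standards])
```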

Once an effort is scoped, the activities and general deliverables are provided to the data standardization team, which will evaluate the data needs for the effort. If any anticipated data elements do not have a currently defined standard, the data team will evaluate existing industry standards, instrument vendor standards and applications to develop an appropriate data model. This data model will then be evaluated for data repository development under the overall Information Management processes.

    When the data collection standards are developed for the effort, they are delivered to the management

    team prior to the initial effort kick-off. From there, the data standards are provided to all field data


    technicians, labs, contractors and the internal data quality control team. From this point on in the

    effort, the data standardization team is available to accommodate issues as they arise and to provide

    guidance on implementation and execution of data related efforts.

Data Collection

The overall process of collecting data via any means (automated, telemetry, manual, etc.) occurs under the data collection phase of the effort. It is quite likely that some data collections will transcend individual efforts (such as telemetry) and as such should be considered an atomic data collection effort (data collection is the sum total of the effort). For operational efforts, the data collection

    activities will likely involve multiple physical locations, tools and techniques which must be handled

    under the constraints of the physical environment and the data collection standards. The general phase

    processes are loosely defined to support flexibility in the field and really only define the planning of data

    collection and the handling of data once collected. The overall workflow for data collection is defined

    below.

    Once data collection standards are provided to the effort management team, the effort can be officially

    initiated. The data collection standards are provided to all data collection team members and the

    quality assurance team. At this point, data collection efforts are conducted. As data is collected, the

    raw data files are submitted via the collaboration tools (in the example, this is SharePoint), which

    initiates the data processing phase for ingestion to the official repositories. If any data fails the QA/QC

    process, the data collection teams will perform corrective actions and re-submit in the same manner as

    before.

Availability and use of collaboration tools will enhance collection effectiveness and provide a basic chain of custody for data submittals and rejections. For many real-time collection efforts (such as SCADA/telemetry), the data submittals go directly to the repository and therefore have no posting to the collaboration tools. However, the collaboration tools are still used for communicating interactions between team members to provide a chain of communication and custody.
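The submit, reject, correct and re-submit loop described in this phase can be sketched in a few lines; the quality check, the record fields and the list standing in for the collaboration tool are all illustrative assumptions.

```python
# A minimal sketch of the submit -> QA/QC -> correct -> re-submit loop,
# assuming submissions are logged to a shared collaboration store.

def qa_check(record: dict) -> list[str]:
    issues = []
    if record.get("units") != "degC":
        issues.append("temperature must be reported in degC")
    return issues

collaboration_log = []   # stands in for the collaboration tool's chain of custody

def submit(record: dict) -> bool:
    issues = qa_check(record)
    collaboration_log.append({"record": record["id"], "issues": issues})
    return not issues

record = {"id": "WT-0042", "units": "degF", "value": 58.1}
if not submit(record):
    # corrective action by the collection team, then re-submit as before
    record["units"], record["value"] = "degC", round((58.1 - 32) * 5 / 9, 1)
    submit(record)

print(collaboration_log)  # both the rejection and the accepted re-submittal
```

Note that every submittal, accepted or rejected, leaves an entry in the log; that record of interactions is the chain of custody the prose describes.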


    Data Processing

    The overall process of assuring data is collected according to defined standards and is of sufficient

    quality is conducted in the data processing phase. This phase handles the entire set of processes to take

    collected data in its raw, submitted form through the QA/QC process and perform any actions upon the

data to prepare it for loading into the final data repositories. The final step of the data processing phase is the loading or transfer of accepted data into all final production repositories for the collected data.

    In many cases, a data collection effort will involve splitting the collected data into multiple repositories

and possibly copying data into project/effort-specific and global (authoritative) repositories. The

    overall process for data processing is depicted below.

    Once the data is collected and submitted in an acceptable format, the QA/QC team receives the

    submitted data for assessment. The QA/QC team evaluates the data and if any issues are found, those

    issues are posted to the collaboration tool where the collection team(s) are expected to perform

corrective actions. If the data passes all QA/QC checks, the QA/QC team will accept the data and begin processing it. The QA/QC team performs any necessary transformations and partitioning to put the data in a loadable form

    for the final repositories. At this point, the data is loaded into the appropriate locations/repositories

    (which may be databases or file systems) for general use. If this is final data for public release, this may

    also include steps to provide the data to the public affairs office for final handling and release. It is


significant to note that though the data may be loaded into separate repositories (as depicted above), those repositories may be related via any number of mechanisms. It is this processing and loading step that ensures the relations are in place, available and correct.
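The validate, transform and load steps of this phase can be sketched as a small pipeline; the repositories are modeled as plain dictionaries and the check and field names are illustrative assumptions.

```python
# A minimal sketch of the data-processing phase: validate, transform, then
# load accepted data into every target repository.

def validate(rows):
    # stand-in for the QA/QC checks: keep only complete records
    return [r for r in rows if "site" in r and "value" in r]

def transform(rows):
    # put accepted data in a loadable form for the final repositories
    return [{**r, "value": float(r["value"])} for r in rows]

def load(rows, repositories):
    # the same accepted data may go to effort-specific and global repositories
    for repo in repositories:
        repo.setdefault("rows", []).extend(rows)

effort_repo, authoritative_repo = {"name": "effort"}, {"name": "global"}
submitted = [{"site": "S1", "value": "12.5"}, {"value": "9.9"}]  # second row fails QA

accepted = transform(validate(submitted))
load(accepted, [effort_repo, authoritative_repo])
print(len(effort_repo["rows"]), len(authoritative_repo["rows"]))  # 1 1
```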

    Data Analysis

Finally, once data is loaded and available, the use of data (from any effort, including historical data) occurs during the data analysis phase. The data analysis phase includes all uses of data, from simple

    visualization and report generation to numerical analysis and modeling. If a data inventory system is

    available, the creation of data inventory (catalog) entries relating created products to source data may

    also be included in this phase. A model of the data analysis phase is depicted below.

    In this scenario, an analyst interacts with a data access application (user interface) that interacts with

    several independent but related repositories. Through this application the analyst extracts a data set for

    use in an analytical model run. The analyst then runs an analytic application which takes the extracted

    data set as input and executes an actual analytic model. In this scenario, the user interface is

independent of the model application, which is run in a batch mode. Once the model is completed, it writes an output analysis result file and a second analyst is notified of the completion. This second analyst uses the analysis result in the generation of a report.
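The extract-then-batch-run shape of this scenario can be sketched as follows; the file name, functions and sample repositories are illustrative assumptions, not the report's actual tooling.

```python
# A minimal sketch of the analysis scenario: extract a data set through a
# data-access layer, run a batch model on the extract, write a result file,
# and notify an analyst.

import json

def extract_data_set(repositories, site):
    # the data-access application queries several independent repositories
    return [row for repo in repositories for row in repo if row["site"] == site]

def run_model(data_set, out_path):
    # the analytic application runs in batch mode on the extracted input
    result = {"n": len(data_set), "mean": sum(r["value"] for r in data_set) / len(data_set)}
    with open(out_path, "w") as f:
        json.dump(result, f)
    return out_path

repos = [
    [{"site": "S1", "value": 11.0}],
    [{"site": "S1", "value": 13.0}, {"site": "S2", "value": 7.0}],
]
path = run_model(extract_data_set(repos, "S1"), "model_result.json")
print(f"model complete, notify analyst: {path}")  # the second analyst builds the report
```

The design point is the decoupling: the extraction step and the model run share only a data set, so either side can be replaced or automated independently.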


    Of specific note in this scenario is that there may be any number of user interface applications that

    connect to separate data repositories. This allows the creation of very lightweight, specialized user

    interface applications that are very easy to use for specific purposes. It is likely that this scenario could

    be further developed to automate the entire process of data extraction through the model run with no

interaction from the analyst. Finally, it is feasible that certain reports may also be generated automatically as a secondary output of the model run, avoiding the final report generation step as well.

Conclusions

Information management includes a complex set of activities that are focused on ensuring investments in data yield maximal returns. These activities primarily serve to enhance people's experiences interacting with information. A successful information management team will address data from a human perspective to ensure data is available where and when it is required, subject to relevant access and regulatory constraints. Further, information management provides cost savings to an organization by ensuring data is used effectively and avoiding duplicative collection efforts.

    The application of technology to operational practices and processes should above all increase the

    effectiveness of the organization. The consideration of effectiveness should apply to both human

activities and operational costs. Redundant data collection is a significant cost in both money and human labor. The effective application of technology may result in a change in where labor hours are

    spent, but should result in a net reduction in labor or cost. In current operational paradigms, there is

    always more work to be done than can be accomplished. In many cases, it is the effective application of

    technologies that enables these additional capabilities to be accomplished within current resource

constraints. This is the "do more with less" reality.

Automation technology, primarily in the form of software development, is an ongoing effort that is best initiated as an in-house practice to enhance effectiveness and efficiency. Only when a specific development effort becomes too large to effectively undertake in-house should it be contracted as a deliverable product. In all circumstances, integration of technologies with current and planned infrastructure should be a prerequisite when considering a technology acquisition. Commercial off-the-shelf applications should be considered, but not necessarily adopted, based upon their ability to be integrated into the organization's infrastructure.


Example Flow

This section depicts an example data workflow from data collection to final incorporation for use within a set of data repositories. The identification, modeling, implementation and management of data repositories is an ongoing process; the repositories listed in the example flow are merely notional at this time.


Example Distributed Applications

This section depicts how user interface applications can be constructed to provide differing views into isolated data repositories. All applications and data repositories are notional, providing a pictorial representation of how data repositories may be reused and dynamically integrated by user interface applications.

    Basic Software Architecture

    First, it is critical that there is a basic understanding of software applications. A software application is

    executable code that consists of three primary elements:

- User interface, which may be command-line, graphical or programmatic (such as a web service or RSS feed)
- Business logic, which is the portion of the code that performs the computation the application is constructed for
- Data logic, which is all portions of the code that deal with the manipulation and storage of the data for the application.

    Within every application, the code is arranged as a set of computational units which are a set of

    functions. Each function acts upon input data provided to the function to produce output data which is

    returned to the caller. This basic flow forms the basis of everything a computer can do.

    In many cases, the computational unit will access data from or save data to an external source (such as a

    file or a display field). This pattern forms the basis for long-term storage and retrieval of data.
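A minimal sketch of such a computational unit follows, first as a pure function and then wired to an external source; the function names, fields and file path are illustrative assumptions.

```python
# A minimal sketch of a computational unit: a function that takes input data,
# computes, and returns output to the caller, plus a variant that persists
# the result to an external source (a file).

def summarize(values: list[float]) -> dict:
    # pure computational unit: input in, output returned to the caller
    return {"count": len(values), "total": sum(values)}

def summarize_to_file(values: list[float], path: str) -> None:
    # the same unit wired to external storage: the basis of long-term
    # storage and retrieval described above
    summary = summarize(values)
    with open(path, "w") as f:
        f.write(f"{summary['count']},{summary['total']}\n")

print(summarize([1.0, 2.5, 3.5]))
summarize_to_file([1.0, 2.5, 3.5], "summary.csv")
```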

In a basic application, all of this may be contained in a single file, such as an .exe within Microsoft Windows operating systems. In more complex applications, this logic may be separated into any number of files, each of which performs some portion of the overall operations for the application.

As an application becomes more complex or is developed as a network application (such as a web site or web-enabled application), the three primary elements may be physically separate and run on different computers. For example, in a typical web site the user interface physically executes in the web browser, the business logic runs on the web server and the data logic runs within a database server. The separation of the application into separate pieces or tiers allows for more flexibility. This basic form of application is known as a three-tier application, as it involves three separate computing tiers: user interface (presentation), processing (business) and storage (data).
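The three tiers can be sketched as separable pieces; here they live in one process for brevity, but each class could run on a different computer (browser, web server, database server). The classes, data and conversion are illustrative assumptions.

```python
# A minimal sketch of the three-tier separation: presentation, business and
# data tiers as independent pieces composed together.

class DataTier:
    """Data logic: manipulation and storage of the application's data."""
    def __init__(self):
        self._rows = {"S1": 12.5}   # stand-in for a database
    def read(self, site: str) -> float:
        return self._rows[site]

class BusinessTier:
    """Business logic: the computation the application exists to perform."""
    def __init__(self, data: DataTier):
        self.data = data
    def fahrenheit(self, site: str) -> float:
        return self.data.read(site) * 9 / 5 + 32

class PresentationTier:
    """User interface: here just console output."""
    def __init__(self, logic: BusinessTier):
        self.logic = logic
    def show(self, site: str) -> None:
        print(f"{site}: {self.logic.fahrenheit(site):.1f} F")

PresentationTier(BusinessTier(DataTier())).show("S1")
```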


As applications become pervasive, there are many areas where any portion of an application may be reused. Data, for example, may be common to several applications and therefore use a single, common database. Likewise, computational processing can be reused across multiple applications.

    In general terms, there is a separation of concerns between each part of a system. The data for a

    system is handled by a specific data input/output (I/O) component. That data I/O component is

    specialized to a fixed set of data and operations upon that data. The analytical processing is handled by

    a specific business logic component. The user interactions are handled by a specialized user interface

    (presentation layer) component. This forms the basic three-tier architecture for an application as

    depicted below.

    As this model is depicted above, there is little room for enhancement. By extending this concept as

    separate, stand-alone applications, each underlying component can be reused as needed. In the model

of component-based computing, each part of the system is developed as a stand-alone component,

    which may be used as a stand-alone application. Each of these lightweight, purpose-specific

    applications may be connected in various ways to create new applications with minimal development

    effort.

    As new applications are developed, they each share an increasing amount of capability and retrieve

    their data using a standard component. This permits multiple applications to share data elements

    seamlessly even when using isolated databases. Eventually, each application becomes simply a user-


    interface connecting existing processing and data services together. An example of this form of reuse is

    depicted below.

    Each of the application blocks is a notional user interface application that connects to a single data

    service. The data service for the application connects to multiple data repositories as needed through

separate data-specific services. For example, the "Site Management Application" user interface application calls into a "projects" service (not depicted), which provides read-only access to the "Projects DB" repository. The "Site Management Application" also calls into a "sites" service (not depicted), which provides full read/write access to the "Sites DB" repository. These activities allow for a single repository for project data (the depiction of the project itself, not data collected for the project) to be reused by many applications consistently.
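A minimal sketch of this composition follows, with one read-only service and one read/write service behind a thin user interface; the service names follow the notional example above, and the dictionaries standing in for the databases are illustrative assumptions.

```python
# A minimal sketch of one UI application composing two data services with
# different access levels: read-only "projects" and read/write "sites".

class ProjectsService:           # read-only access to the "Projects DB"
    def __init__(self, db: dict):
        self._db = db
    def get(self, project_id: str) -> dict:
        return dict(self._db[project_id])   # returns a copy; no write path exposed

class SitesService:              # full read/write access to the "Sites DB"
    def __init__(self, db: dict):
        self._db = db
    def get(self, site_id: str) -> dict:
        return self._db[site_id]
    def update(self, site_id: str, **fields) -> None:
        self._db.setdefault(site_id, {}).update(fields)

projects_db = {"P-1": {"name": "Riverbed Survey"}}
sites_db = {}

# the "Site Management Application" is just a thin UI over the two services
projects, sites = ProjectsService(projects_db), SitesService(sites_db)
sites.update("S1", project=projects.get("P-1")["name"], status="active")
print(sites.get("S1"))
```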

In the graphic below, there are five separate applications, as shown by the presentation layers (GUIs). A user may log in to any of the five applications, and each will provide a different set of capabilities.


Applications "A", "B" and "C" are well-designed three-tier applications. Application "D" has a dedicated business layer, but uses the existing data layer from application "C" for its data. This allows application "D" to be developed more quickly using an existing data layer from application "C". Finally, application "E" reuses the business logic layer from application "D" and the data layer from application "A". This form of reuse allows new applications to be constructed from a minimal cross-section of new development, based upon which components already exist. In practice this has been difficult to achieve (these concepts have been around since the 1970s), due to technology limitations (which are finally being overcome via technologies such as XML and web services) and business limitations. The business limitations tend to be the hardest to overcome, as they are driven by time and costs. In general, this form of development requires a commitment to "doing the right thing" while striving to maintain reasonable costs and schedules.
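As a rough illustration of the reuse pattern, the sketch below models each layer as a plain function so that application "D" reuses the data layer of "C" and application "E" combines the business layer of "D" with the data layer of "A"; all functions and values are hypothetical.

```python
# A minimal sketch of cross-application layer reuse: one business layer
# composed with two different data layers.

def data_layer_a() -> list[float]:
    return [10.0, 20.0, 30.0]

def data_layer_c() -> list[float]:
    return [1.0, 2.0, 3.0]

def business_layer_d(fetch) -> float:
    # "D" supplies its own logic but accepts whichever data layer it is given
    rows = fetch()
    return sum(rows) / len(rows)

def app_d() -> float:
    return business_layer_d(data_layer_c)   # D's logic over C's data

def app_e() -> float:
    return business_layer_d(data_layer_a)   # D's logic reused over A's data

print(app_d(), app_e())  # 2.0 20.0
```

The design choice that makes this work is that each layer depends only on an interface (here, a callable returning rows), not on the application it was first built for.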

    Appendices

    References