8/9/2019 Information Management Flows - Doc
CRF-RDTE-TR-20100202-07
11/2/2009
Public Distribution | Michael Corsello
CORSELLO RESEARCH FOUNDATION
INFORMATION MANAGEMENT
BASIC ACTIVITIES
Abstract
The Information Lifecycle has several basic operational workflows that need to be planned for each organization. The basic concepts for these workflows are fairly straightforward when actively considered.
Table of Contents

Abstract
Introduction
Information Management Workflows
    Project Planning
    Data Management
    Data Integration
    Systems Integration
Business Operational Workflows
    Effort Initiation
    Data Standardization
    Data Collection
    Data Processing
    Data Analysis
Conclusions
Example Flow
Example Distributed Applications
    Basic Software Architecture
Appendices
    References
Introduction
Information management covers the entire spectrum of activities that relate to the creation, use and disposal of data. In general, information management encompasses most information technology activities as well as all information-related activities that enable people to make use of data, digital or otherwise. It is important to note that information management is not constrained by technology; instead, it is concerned with optimizing the human benefit realized from data.
Information Management Workflows
There are several basic functions in information management, each of which has its own workflows:

- Project Planning
- Data Management
- Data Integration
- Systems Integration
Project Planning
Project planning includes the coordinated effort to identify what information is required for and
produced by a project. This effort includes matching identified data requirements to existing data
repositories and initiating efforts to create repositories for new data sets.
Data Management
Data management activities include all aspects of coordinating and optimizing the collection, discovery,
use, maintenance and disposal of data. This includes data modeling for new repositories and business
process definition to select an effective methodology for data handling from collection to disposal.
Data Integration
Data integration is the set of all activities required to ensure data is discoverable and that related data
repositories can be queried across. In multi-organizational scenarios this may include the development
of data sharing agreements, policies and software tools to integrate separate data repositories.
Systems Integration
The systems integration workflows include all activities for the planning of systems development, including the prioritization of potential development activities. Systems integration includes the development of stand-alone tools, the integration of existing tools and the guidance of organizational architectures.
Business Operational Workflows
Outside of the Information Management specific workflows, the day-to-day business activities involve the application of Information Management based upon the currently implemented processes and supporting technologies.
As it relates to the information lifecycle, business operations will flow through similar phases of activities
such as:
Business Phase          Information Lifecycle Phase
Effort Initiation       Creation (Planning)
Data Standardization    Creation (Planning)
Data Collection         Creation (Collection)
Data Processing         Creation (Assessment / Ingestion)
Data Analysis           Distribution and Use
Each of these business phases provides an opportunity for automation where appropriate, and some
automation tools (applications) may be used across multiple phases. In general terms, the data, phase,
tools and use are all independent yet related. Application of technology and processes to business
efforts should be designed to increase the effectiveness of the personnel and data.
When depicted as an overall system, the flow of data products moves from collection through
processing to analysis, which becomes a secondary form of collection. This overall cycle is depicted
below.
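As a minimal sketch of this cycle (all names here are illustrative, not from the original report), the flow can be modeled as data products moving from collection through processing to analysis, with analysis output posted back as a secondary form of collection:

```python
# Hypothetical sketch of the collection -> processing -> analysis cycle,
# where analysis results are posted back as a secondary form of collection.

def process(raw):
    """Data processing: QA/QC and preparation for the repository."""
    return {"data": raw, "qa_passed": True}

def analyze(processed):
    """Data analysis: any synthesis, e.g. a report or model result."""
    return f"report derived from {processed['data']}"

repository = []          # notional operational data repository

def run_cycle(raw):
    """One turn of the cycle: collect -> process -> analyze -> post."""
    processed = process(raw)
    result = analyze(processed)
    repository.append(result)   # posting makes the result collectable again
    return result

first = run_cycle("field measurements")
second = run_cycle(first)       # analysis output re-enters as collection input
```

The key design point the sketch illustrates is that the output of one turn of the cycle is a valid input to the next, which is what makes the overall flow a cycle rather than a pipeline.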
When viewed at a high-level, omitting the processes of initiating efforts and standardizing data, the
process becomes a clear cycle as shown below.
The primary flow of read, analyze and post becomes an endless cycle of reuse for data and analytic results. It is important to remember in this context that analysis includes any form of synthesis, such as writing a report, and not just numerical analysis. These output products are posted back to repositories, where they are available to be discovered for future use.
Effort Initiation
Whenever a new work effort (project, study, etc.) is undertaken, the entire process of identifying the
need, scoping the work and planning the execution is within this phase of the process. The initiation
phase also considers the high-level aspects of execution to the effort wrap-up and close out to ensure
the effort produced the results expected. An example of this phase workflow is provided below.
As the effort is planned, a scope is defined which will drive the data needs for the effort. Under the data
standardization phase, the relevant collection standards will be produced. These data standards are
relayed to the effort team prior to and along with the official kick off of the effort.
As the effort proceeds, technicians will collect data under the data collection phase based upon the
standards provided for the effort. This data is then submitted to the quality assurance team which
operates in the data processing phase and ensures all data is deposited in the appropriate operational
data repositories (which could be a database or file system).
As collected data is used, this occurs under the data analysis phase which allows for the use of data in
analysis and report generation. After all operations are complete, data results, reports and other (non-
technological) products contribute to the final portion of the initiation phase, which is effort closeout.
At this point, the effort is complete and all contractual work is complete, accepted and residing in the
appropriate locations.
It is of significant note that this set of processes indicates a logical set of flows that relate to where technologies may be used in the execution of operational activities. There is no implication that technologies must be used, nor any prescription of how those technologies are applied. The adaptation of technologies to operations should be considered on a cost/benefit/time basis and implemented accordingly.
Data Standardization
The data standardization phase is a complex, ongoing set of processes that involves both project effort
and organizational (overarching) activities. In the overarching context, data standardization involves the
efforts of managing repositories and data models associated with the information lifecycle. In this
paper, we will only cover the processes associated with project efforts.
Data standardization processes involve the definition of data collection standards and data repositories
to conform to industry and organizational best practices and standard data models. As the organization
creates data repositories, those repositories will have well-defined data models. As data is collected, it
will need to conform or be transformable to the data model of the current repository. Once a data
collection standard is developed for a specific type of data (such as water quality or more simply water
temperature), that standard will be applied to all efforts collecting that type of data. Since a given effort
will likely collect multiple types of data, each with its own standard, an effort specific standard will be
created that aggregates the individual standards (perhaps as simple as just bundling individual pdf files).
An example of the data standardization workflow is depicted below.
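The aggregation step described above can be sketched in a few lines. This is a hypothetical illustration only; the catalog entries and file names are invented for the example, and in practice the "bundle" might be as simple as a set of PDF files, as the text notes:

```python
# Hypothetical sketch: an effort-specific standard aggregates the individual
# per-data-type standards (here represented as named documents).

standards_catalog = {
    "water quality": "wq-standard.pdf",        # illustrative names only
    "water temperature": "wtemp-standard.pdf",
}

def build_effort_standard(effort_name, data_types):
    """Bundle the existing standards for every data type the effort collects."""
    missing = [t for t in data_types if t not in standards_catalog]
    if missing:
        # A gap here would trigger the standards-development process instead.
        raise KeyError(f"no defined standard yet for: {missing}")
    return {
        "effort": effort_name,
        "documents": [standards_catalog[t] for t in data_types],
    }

effort_standard = build_effort_standard(
    "stream survey", ["water quality", "water temperature"]
)
```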
Once an effort is scoped, the activities and general deliverables are provided to the data standardization team, which will evaluate the data needs for the effort. If any anticipated data elements do not have a currently defined standard, the data team will evaluate existing industry standards, instrument vendor standards and applications to develop an appropriate data model. This data model will then be evaluated for data repository development under the overall Information Management processes.
When the data collection standards are developed for the effort, they are delivered to the management team prior to the initial effort kick-off. From there, the data standards are provided to all field data technicians, labs, contractors and the internal data quality control team. From this point on in the effort, the data standardization team is available to accommodate issues as they arise and to provide guidance on the implementation and execution of data-related efforts.
Data Collection
The overall process of collecting data via any means (automated, telemetry, manual, etc.) occurs under the data collection phase of the effort. It is quite likely that some data collections will transcend individual efforts (such as telemetry) and as such should be considered as an atomic data collection effort (data collection is the sum total of the effort). For operational efforts, the data collection activities will likely involve multiple physical locations, tools and techniques, which must be handled under the constraints of the physical environment and the data collection standards. The general phase processes are loosely defined to support flexibility in the field and really only define the planning of data collection and the handling of data once collected. The overall workflow for data collection is defined below.
Once data collection standards are provided to the effort management team, the effort can be officially
initiated. The data collection standards are provided to all data collection team members and the
quality assurance team. At this point, data collection efforts are conducted. As data is collected, the
raw data files are submitted via the collaboration tools (in the example, this is SharePoint), which
initiates the data processing phase for ingestion to the official repositories. If any data fails the QA/QC
process, the data collection teams will perform corrective actions and re-submit in the same manner as
before.
Availability and use of collaboration tools will enhance collection effectiveness and provide a basic chain of custody for data submittals and rejections. For many real-time collection efforts (such as SCADA / telemetry), the data submittals go directly to the repository and therefore have no posting to the collaboration tools. However, the collaboration tools are still used for communicating interactions between team members to provide a chain of communication and custody.
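The submit, reject, correct and re-submit loop can be sketched as follows. This is a notional illustration: the QA rule, field names and audit log are invented for the example, with the log standing in for the chain of custody a collaboration tool would provide:

```python
# Hypothetical sketch of the submit -> QA/QC -> correct -> re-submit loop.
# The audit log stands in for the collaboration tool's chain of custody.

audit_log = []

def qa_check(record):
    """Toy QA rule: the reading must fall within the standard's range."""
    return 0 <= record.get("temp_c", -999) <= 40

def submit(record, repository):
    """Submit a raw record; rejected records go back for corrective action."""
    if qa_check(record):
        repository.append(record)
        audit_log.append(("accepted", record))
        return True
    audit_log.append(("rejected", record))
    return False

repo = []
bad = {"site": "S1", "temp_c": 999}       # fails the QA/QC check
submit(bad, repo)
corrected = {"site": "S1", "temp_c": 18}  # corrective action, re-submitted
submit(corrected, repo)
```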
Data Processing
The overall process of assuring data is collected according to defined standards and is of sufficient
quality is conducted in the data processing phase. This phase handles the entire set of processes to take
collected data in its raw, submitted form through the QA/QC process and perform any actions upon the
data to prepare it for loading into the final data repositories. Finally, the last step of the data processing
phase is the loading or transfer of accepted data into any and all final production repositories for the
collected data.
In many cases, a data collection effort will involve splitting the collected data into multiple repositories
and possibly copying data into project/effort specific and global (authoritative) repositories. The
overall process for data processing is depicted below.
Once the data is collected and submitted in an acceptable format, the QA/QC team receives the
submitted data for assessment. The QA/QC team evaluates the data and if any issues are found, those
issues are posted to the collaboration tool where the collection team(s) are expected to perform
corrective actions. If the data passes all QA/QC checks, the QA/QC team will accept the data and begin processing it.
The QA/QC team performs any necessary transformations and partitioning to put the data in a loadable form
for the final repositories. At this point, the data is loaded into the appropriate locations/repositories
(which may be databases or file systems) for general use. If this is final data for public release, this may also include steps to provide the data to the public affairs office for final handling and release. It is significant to note that though the data may be loaded into separate repositories (as depicted above), those repositories may be related via any number of mechanisms. It is this processing and loading process that ensures the relations are in place, available and correct.
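The transform-then-load step, including the copy into both a project-specific and a global (authoritative) repository, can be sketched as below. The normalization rule is purely illustrative:

```python
# Hypothetical sketch: accepted data is transformed into a loadable form,
# then loaded into both a project-specific and a global repository.

def transform(record):
    """Put the record into a loadable form (illustrative normalization)."""
    return {"site": record["site"].upper(), "temp_c": float(record["temp_c"])}

def load(record, *repositories):
    """Copy the prepared record into every destination repository."""
    prepared = transform(record)
    for repo in repositories:
        repo.append(prepared)
    return prepared

project_repo, global_repo = [], []
load({"site": "s1", "temp_c": "18"}, project_repo, global_repo)
```

Loading through a single function is one way to keep the copies consistent, which is the point the text makes about this step ensuring relations are in place and correct.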
Data Analysis
Finally, once data is loaded and available, the use of data (from any effort, including historical data) occurs during the data analysis phase. The data analysis phase includes all uses of data, from simple visualization and report generation to numerical analysis and modeling. If a data inventory system is
available, the creation of data inventory (catalog) entries relating created products to source data may
also be included in this phase. A model of the data analysis phase is depicted below.
In this scenario, an analyst interacts with a data access application (user interface) that interacts with
several independent but related repositories. Through this application the analyst extracts a data set for
use in an analytical model run. The analyst then runs an analytic application which takes the extracted
data set as input and executes an actual analytic model. In this scenario, the user interface is
independent of the model application which is run in a batch mode. Once the model is completed, it
writes an output analysis result file and an analyst is notified of the completion. The second analyst uses
this analysis result in the generation of a report.
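The extract, batch-model and notify steps of this scenario can be sketched as below. All names and the "model" itself (a simple average) are invented for illustration; the notification list stands in for whatever mechanism alerts the second analyst:

```python
# Hypothetical sketch of the analysis scenario: extract a data set from
# several repositories, run a batch model on it, and notify the analyst.

notifications = []

def extract(repositories, site):
    """Data-access application: pull matching rows across repositories."""
    return [r for repo in repositories for r in repo if r["site"] == site]

def run_model(data_set):
    """Batch analytic model: here simply an average temperature."""
    temps = [r["temp_c"] for r in data_set]
    return sum(temps) / len(temps)

def run_analysis(repositories, site):
    result = run_model(extract(repositories, site))
    notifications.append(f"model complete for {site}")  # notify analyst
    return result

repo_a = [{"site": "S1", "temp_c": 16.0}, {"site": "S2", "temp_c": 30.0}]
repo_b = [{"site": "S1", "temp_c": 20.0}]
mean_temp = run_analysis([repo_a, repo_b], "S1")
```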
Of specific note in this scenario is that there may be any number of user interface applications that
connect to separate data repositories. This allows the creation of very lightweight, specialized user
interface applications that are very easy to use for specific purposes. It is likely that this scenario could
be further developed to automate the entire process of data extraction through the model run with no
interaction from the analyst. Finally, it is feasible that certain reports may also be automatically
generated as a secondary output of the model when run to avoid the final step of report generation as
well.
Conclusions
Information management includes a complex set of activities that are focused on ensuring investments in data yield maximal returns. These activities primarily serve to enhance people's experiences interacting with information. A successful information management team will address data from a human perspective to ensure data is available where and when it is required, subject to relevant access and regulatory constraints. Further, information management provides cost savings to an organization by ensuring data is used effectively and by avoiding duplicative collection efforts.
The application of technology to operational practices and processes should above all increase the
effectiveness of the organization. The consideration of effectiveness should apply to both human
activities and operational costs. Redundant data collection is a significant cost in both money and human labor. The effective application of technology may result in a change in where labor hours are spent, but should result in a net reduction in labor or cost. In current operational paradigms, there is always more work to be done than can be accomplished. In many cases, it is the effective application of technologies that enables this additional work to be accomplished within current resource constraints. This is the "do more with less" reality.
Automation technology, primarily in the form of software development, is an ongoing effort that is best initiated as an in-house practice to enhance effectiveness and efficiency. Only when a specific development effort becomes too large to effectively undertake in-house should it be contracted as a deliverable product. In all circumstances, integration of technologies into current and planned infrastructures should be a prerequisite when considering a technology acquisition. Commercial off-the-shelf applications should be considered, but not necessarily adopted, based upon their ability to be integrated into the organization's infrastructure.
Example Flow
This section depicts an example data workflow from data collection to final incorporation for use within a set of data repositories. The identification, modeling, implementation and management of data repositories is an ongoing process; those listed in the example flow are merely notional at this time.
Example Distributed Applications
This section depicts how user interface applications can be constructed to provide differing views into isolated data repositories. All applications and data repositories are notional, providing a pictorial representation of how data repositories may be reused and dynamically integrated by user interface applications.
Basic Software Architecture
First, it is critical that there is a basic understanding of software applications. A software application is
executable code that consists of three primary elements:
- User interface, which may be command-line, graphical or programmatic (such as a web service or RSS feed)
- Business logic, which is the portion of the code that performs the computation the application is constructed for
- Data logic, which is all portions of the code that deal with the manipulation and storage of the data for the application
Within every application, the code is arranged as a set of computational units, each of which is a set of functions. Each function acts upon input data provided to the function to produce output data, which is returned to the caller. This basic flow forms the basis of everything a computer can do.
In many cases, the computational unit will access data from or save data to an external source (such as a
file or a display field). This pattern forms the basis for long-term storage and retrieval of data.
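A minimal sketch of both patterns follows; the conversion function and the dictionary (standing in for a file or display field) are invented purely for illustration:

```python
# Minimal sketch of a computational unit: a function takes input data and
# returns output data to its caller.

def fahrenheit_to_celsius(temp_f):
    """A single computational unit: input in, output back to the caller."""
    return (temp_f - 32) * 5 / 9

# A unit may also read from or write to an external source. Here a dict
# stands in for a file or display field, the basis of long-term storage.
store = {}

def save_reading(name, temp_f):
    store[name] = fahrenheit_to_celsius(temp_f)
    return store[name]

celsius = save_reading("site-1", 212)
```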
In a basic application, all of this may be contained in a single file, such as an .exe within Microsoft Windows operating systems. In more complex applications, this logic may be separated into any number of files, each of which performs some portion of the overall operations for the application.
As an application becomes more complex or is developed as a network application (such as a web-site or
web-enabled application) the three primary elements may be physically separate and run on different
computers. For example, in a typical web-site the user interface physically executes on the web
browser, the business logic runs on the web server and the data logic runs within a database server. The separation of the application into separate pieces or tiers allows for more flexibility. This basic form of application is known as a three-tier application, as it involves three separate computing tiers: user interface (presentation), processing (business) and storage (data).
As applications become pervasive, there are many areas where any portion of an application may be
reused. Data for example, may be common to several applications and therefore use a single, common
database. Likewise, computational processing can be reused across multiple applications.
In general terms, there is a separation of concerns between each part of a system. The data for a
system is handled by a specific data input/output (I/O) component. That data I/O component is
specialized to a fixed set of data and operations upon that data. The analytical processing is handled by
a specific business logic component. The user interactions are handled by a specialized user interface
(presentation layer) component. This forms the basic three-tier architecture for an application as
depicted below.
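The separation of concerns described above can be sketched as three components, each owning one responsibility. The classes, data and formatting here are all invented for illustration:

```python
# Hypothetical sketch of the three-tier separation of concerns: a data I/O
# component, a business logic component, and a presentation component.

class DataIO:
    """Data tier: specialized to a fixed set of data and operations."""
    def __init__(self):
        self._rows = [{"site": "S1", "temp_c": 18.0}]
    def fetch_all(self):
        return list(self._rows)

class BusinessLogic:
    """Business tier: the analytical processing, independent of storage."""
    def __init__(self, data_io):
        self.data_io = data_io
    def max_temperature(self):
        return max(r["temp_c"] for r in self.data_io.fetch_all())

class Presentation:
    """Presentation tier: user-facing formatting only."""
    def __init__(self, logic):
        self.logic = logic
    def render(self):
        return f"Max temperature: {self.logic.max_temperature()} C"

ui = Presentation(BusinessLogic(DataIO()))
output = ui.render()
```

Because each tier only depends on the one beneath it, any tier can be swapped (for example, a different DataIO backed by a database) without touching the others.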
As this model is depicted above, there is little room for enhancement. By extending this concept as separate, stand-alone applications, each underlying component can be reused as needed. In the model of component-based computing, each part of the system is developed as a stand-alone component, which may be used as a stand-alone application. Each of these lightweight, purpose-specific
applications may be connected in various ways to create new applications with minimal development
effort.
As new applications are developed, they each share an increasing amount of capability and retrieve their data using a standard component. This permits multiple applications to share data elements seamlessly even when using isolated databases. Eventually, each application becomes simply a user interface connecting existing processing and data services together. An example of this form of reuse is depicted below.
Each of the application blocks is a notional user interface application that connects to a single data
service. The data service for the application connects to multiple data repositories as needed through
separate data-specific services. For example, the Site Management Application user interface calls into a "projects" service (not depicted), which provides read-only access to the Projects DB repository. The Site Management Application also calls into a "sites" service (not depicted), which provides full read/write access to the Sites DB repository. This allows a single repository for project data (the depiction of the project itself, not data collected for the project) to be reused by many applications consistently.
In the graphic below, there are five separate applications, shown by their presentation layers (GUIs). A user may log in to any of the five applications, and each will provide a different set of capabilities.
Applications "A", "B" and "C" are well-designed three-tier applications. Application "D" has a dedicated business layer, but uses the existing data layer from application "C" for its data. This allows application "D" to be developed more quickly using an existing data layer from application "C". Finally, application "E" reuses the business logic layer from application "D" and the data layer from application "A". This form of reuse allows new applications to be constructed based upon a minimal cross-section of what is required, given which components already exist. In practice this has been difficult to achieve (these concepts have been around since the 1970s), due to technology limitations (which are finally being overcome via technologies such as XML and web services) and business limitations. The business limitations tend to be the hardest to overcome as they are driven by time and costs. In general, this form of development requires a commitment to "doing the right thing" while striving to maintain reasonable costs and schedules.
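The layer reuse among applications "A" through "E" can be sketched as simple composition, where a new application is assembled from existing layers. All functions and values here are invented stand-ins for the layers in the graphic:

```python
# Hypothetical sketch: applications assembled by composing existing layers.

def data_layer_a():
    return [1.0, 2.0, 3.0]          # stands in for application "A"'s data tier

def data_layer_c():
    return [10.0, 20.0]             # stands in for application "C"'s data tier

def business_layer_d(fetch):
    """Application "D"'s business tier: works with any data layer passed in."""
    rows = fetch()
    return sum(rows) / len(rows)

# Application "D": its own business layer over "C"'s existing data layer.
app_d_result = business_layer_d(data_layer_c)

# Application "E": reuses "D"'s business layer and "A"'s data layer.
app_e_result = business_layer_d(data_layer_a)
```

The design choice that makes this reuse possible is that the business layer depends only on the interface of the data layer (here, a callable returning rows), not on any particular implementation.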
Appendices
References