
CRF-RDTE-TR-20100202-07
11/2/2009
Public Distribution | Michael Corsello

CORSELLO RESEARCH FOUNDATION

INFORMATION MANAGEMENT: BASIC ACTIVITIES


Abstract

The Information Lifecycle has several basic operational workflows that need to be planned for each organization. The basic concepts for these workflows are fairly straightforward when actively considered.


Table of Contents

Abstract
Introduction
Information Management Workflows
    Project Planning
    Data Management
    Data Integration
    Systems Integration
Business Operational Workflows
    Effort Initiation
    Data Standardization
    Data Collection
    Data Processing
    Data Analysis
Conclusions
Example Flow
Example Distributed Applications
    Basic Software Architecture
Appendices
    References


Introduction

Information management covers the entire spectrum of activities that relate to the creation, use and disposal of data. In general, information management encompasses most information technology activities as well as all information-related activities that enable people to make use of data, digital or otherwise. It is important to note that information management is not constrained by technology and is instead concerned with optimizing the human benefit realized from data.

Information Management Workflows

There are several basic functions in information management, each of which has its own workflows:

- Project Planning
- Data Management
- Data Integration
- Systems Integration

    Project Planning

    Project planning includes the coordinated effort to identify what information is required for and

    produced by a project. This effort includes matching identified data requirements to existing data

    repositories and initiating efforts to create repositories for new data sets.

    Data Management

    Data management activities include all aspects of coordinating and optimizing the collection, discovery,

    use, maintenance and disposal of data. This includes data modeling for new repositories and business

    process definition to select an effective methodology for data handling from collection to disposal.


    Data Integration

Data integration is the set of all activities required to ensure data is discoverable and that queries can span related data repositories. In multi-organizational scenarios this may include the development of data sharing agreements, policies and software tools to integrate separate data repositories.

    Systems Integration

The systems integration workflows include all activities for planning systems development, including the prioritization of potential development efforts. Systems integration includes the development of stand-alone tools, the integration of existing tools and the guiding of organizational architectures.


Business Operational Workflows

Outside of the Information Management-specific workflows, the day-to-day business activities involve the application of Information Management based upon the currently implemented processes and supporting technologies.

    As it relates to the information lifecycle, business operations will flow through similar phases of activities

    such as:

Business Phase          Information Lifecycle Phase
Effort Initiation       Creation (Planning)
Data Standardization    Creation (Planning)
Data Collection         Creation (Collection)
Data Processing         Creation (Assessment / Ingestion)
Data Analysis           Distribution and Use

    Each of these business phases provides an opportunity for automation where appropriate, and some

    automation tools (applications) may be used across multiple phases. In general terms, the data, phase,

    tools and use are all independent yet related. Application of technology and processes to business

    efforts should be designed to increase the effectiveness of the personnel and data.

    When depicted as an overall system, the flow of data products moves from collection through

    processing to analysis, which becomes a secondary form of collection. This overall cycle is depicted

    below.


When viewed at a high level, omitting the processes of initiating efforts and standardizing data, the process becomes a clear cycle, as shown below.

The primary flow of read → analyze → post becomes an endless cycle of reuse for data and analytic results. It is important to remember in this context that analysis includes any form of synthesis, such as writing a report, and not just numerical analysis. These output products are posted back to repositories, where they are available to be discovered for future use.
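As a complement to the figure, the following is a minimal Python sketch of this read → analyze → post cycle, assuming a simple in-memory list standing in for a repository of data products; all names are illustrative and not part of the report.

```python
# A minimal sketch of the read -> analyze -> post cycle against a notional
# in-memory "repository" of data products.

def analyze(inputs):
    """Stand-in for any synthesis: a model run, a chart, or a written report."""
    return {"kind": "report", "derived_from": [d["id"] for d in inputs]}

repository = [{"id": "obs-001", "kind": "raw"}]  # seeded with one collected data set

for cycle in range(3):
    inputs = list(repository)          # read: discover available data products
    product = analyze(inputs)          # analyze: synthesize a new product
    product["id"] = f"prod-{cycle}"
    repository.append(product)         # post: the result becomes new input

print(len(repository))  # 4 products: each output is available to later cycles
```

The point of the loop is that posted analytic results are themselves discoverable, so each pass through the cycle has more material to draw on than the last.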

Effort Initiation

Whenever a new work effort (project, study, etc.) is undertaken, the entire process of identifying the need, scoping the work and planning the execution falls within this phase of the process. The initiation phase also considers the high-level aspects of execution through effort wrap-up and close-out, to ensure the effort produces the expected results. An example of this phase workflow is provided below.


As the effort is planned, a scope is defined which will drive the data needs for the effort. Under the data standardization phase, the relevant collection standards will be produced. These data standards are relayed to the effort team prior to and along with the official kick-off of the effort.

    As the effort proceeds, technicians will collect data under the data collection phase based upon the

    standards provided for the effort. This data is then submitted to the quality assurance team which

    operates in the data processing phase and ensures all data is deposited in the appropriate operational

    data repositories (which could be a database or file system).

The use of collected data occurs under the data analysis phase, which allows for the use of data in analysis and report generation. After all operations are complete, data results, reports and other (non-technological) products contribute to the final portion of the initiation phase, which is effort closeout. At this point, all contractual work is complete, accepted and residing in the appropriate locations.
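The phase sequence described above can be summarized in a minimal sketch; the ordering and phase names below are illustrative assumptions, and in practice the collection, processing and analysis phases iterate rather than running once.

```python
# A minimal sketch of the effort-initiation flow as an ordered sequence of
# phases, assuming each phase is a simple step. All names are illustrative.

from enum import Enum, auto

class Phase(Enum):
    SCOPING = auto()
    DATA_STANDARDIZATION = auto()
    KICK_OFF = auto()
    DATA_COLLECTION = auto()
    DATA_PROCESSING = auto()
    DATA_ANALYSIS = auto()
    CLOSE_OUT = auto()

def run_effort(effort_name: str) -> None:
    # Phases run in declaration order; real efforts loop through the middle phases.
    for phase in Phase:
        print(f"{effort_name}: entering {phase.name}")

run_effort("Water Quality Study")
```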

It is of significant note that this set of processes indicates a logical set of flows that relate to where technologies may be used in the execution of operational activities. There is no implication that technologies actually must be used, or of how those technologies are applied. The adaptation of technologies to operations should be considered on a cost/benefit/time basis and implemented accordingly.


    Data Standardization

    The data standardization phase is a complex, ongoing set of processes that involves both project effort

    and organizational (overarching) activities. In the overarching context, data standardization involves the

    efforts of managing repositories and data models associated with the information lifecycle. In this

    paper, we will only cover the processes associated with project efforts.

    Data standardization processes involve the definition of data collection standards and data repositories

    to conform to industry and organizational best practices and standard data models. As the organization

    creates data repositories, those repositories will have well-defined data models. As data is collected, it

    will need to conform or be transformable to the data model of the current repository. Once a data

    collection standard is developed for a specific type of data (such as water quality or more simply water

    temperature), that standard will be applied to all efforts collecting that type of data. Since a given effort

will likely collect multiple types of data, each with its own standard, an effort-specific standard will be created that aggregates the individual standards (perhaps as simply as bundling the individual PDF files).

    An example of the data standardization workflow is depicted below.
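In addition to the workflow figure, the aggregation of per-data-type standards into an effort-specific standard can be sketched as a simple data structure; the record fields and names below are illustrative assumptions, not part of the report.

```python
# A minimal sketch of aggregating per-data-type collection standards into an
# effort-specific standard, assuming standards are tracked as simple records.

from dataclasses import dataclass, field

@dataclass
class CollectionStandard:
    data_type: str          # e.g. "water temperature"
    version: str
    document: str           # e.g. a path to the standard's PDF

@dataclass
class EffortStandard:
    effort: str
    standards: list[CollectionStandard] = field(default_factory=list)

    def add(self, std: CollectionStandard) -> None:
        # One standard per data type: the same standard applies to every
        # effort collecting that type of data.
        self.standards.append(std)

effort = EffortStandard("Riverbed Survey 2010")
effort.add(CollectionStandard("water temperature", "1.2", "std/water_temp_v1_2.pdf"))
effort.add(CollectionStandard("water quality", "2.0", "std/water_quality_v2_0.pdf"))
print([s.data_type for s in effort.standards])
```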

Once an effort is scoped, the activities and general deliverables are provided to the data standardization team, which will evaluate the data needs for the effort. If any anticipated data elements do not have a currently defined standard, the data team will evaluate existing industry standards, instrument vendor standards and applications to develop an appropriate data model. This data model will then be evaluated for data repository development under the overall Information Management processes.

    When the data collection standards are developed for the effort, they are delivered to the management

    team prior to the initial effort kick-off. From there, the data standards are provided to all field data


    technicians, labs, contractors and the internal data quality control team. From this point on in the

    effort, the data standardization team is available to accommodate issues as they arise and to provide

    guidance on implementation and execution of data related efforts.

Data Collection

The overall process of collecting data via any means (automated, telemetry, manual, etc.) occurs under the data collection phase of the effort. It is quite likely that some data collections will transcend individual efforts (such as telemetry) and as such should be considered an atomic data collection effort (data collection is the sum total of the effort). For operational efforts, the data collection

    activities will likely involve multiple physical locations, tools and techniques which must be handled

    under the constraints of the physical environment and the data collection standards. The general phase

    processes are loosely defined to support flexibility in the field and really only define the planning of data

    collection and the handling of data once collected. The overall workflow for data collection is defined

    below.

    Once data collection standards are provided to the effort management team, the effort can be officially

    initiated. The data collection standards are provided to all data collection team members and the

    quality assurance team. At this point, data collection efforts are conducted. As data is collected, the

    raw data files are submitted via the collaboration tools (in the example, this is SharePoint), which

    initiates the data processing phase for ingestion to the official repositories. If any data fails the QA/QC

    process, the data collection teams will perform corrective actions and re-submit in the same manner as

    before.

Availability and use of collaboration tools will enhance collection effectiveness and provide a basic chain of custody for data submittals and rejections. For many real-time collection efforts (such as SCADA/telemetry), the data submittals go directly to the repository and therefore have no posting to the collaboration tools. However, the collaboration tools are still used for communicating interactions between team members to provide a chain of communication and custody.
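The submit, reject, correct and re-submit loop described in this phase can be sketched in a few lines; the quality check, the record fields and the list standing in for the collaboration tool are all illustrative assumptions.

```python
# A minimal sketch of the submit -> QA/QC -> correct -> re-submit loop,
# assuming submissions are logged to a shared collaboration store.

def qa_check(record: dict) -> list[str]:
    issues = []
    if record.get("units") != "degC":
        issues.append("temperature must be reported in degC")
    return issues

collaboration_log = []   # stands in for the collaboration tool's chain of custody

def submit(record: dict) -> bool:
    issues = qa_check(record)
    collaboration_log.append({"record": record["id"], "issues": issues})
    return not issues

record = {"id": "WT-0042", "units": "degF", "value": 58.1}
if not submit(record):
    # corrective action by the collection team, then re-submit as before
    record["units"], record["value"] = "degC", round((58.1 - 32) * 5 / 9, 1)
    submit(record)

print(collaboration_log)  # both the rejection and the accepted re-submittal
```

Note that every submittal, accepted or rejected, leaves an entry in the log; that record of interactions is the chain of custody the prose describes.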


    Data Processing

    The overall process of assuring data is collected according to defined standards and is of sufficient

    quality is conducted in the data processing phase. This phase handles the entire set of processes to take

    collected data in its raw, submitted form through the QA/QC process and perform any actions upon the

data to prepare it for loading into the final data repositories. The final step of the data processing phase is the loading or transfer of accepted data into all final production repositories for the collected data.

    In many cases, a data collection effort will involve splitting the collected data into multiple repositories

and possibly copying data into project/effort-specific and global (authoritative) repositories. The

    overall process for data processing is depicted below.

    Once the data is collected and submitted in an acceptable format, the QA/QC team receives the

    submitted data for assessment. The QA/QC team evaluates the data and if any issues are found, those

    issues are posted to the collaboration tool where the collection team(s) are expected to perform

corrective actions. If the data passes all QA/QC checks, the QA/QC team will accept the data and begin processing it. The QA/QC team performs any necessary transformations and partitioning to put the data in a loadable form

    for the final repositories. At this point, the data is loaded into the appropriate locations/repositories

    (which may be databases or file systems) for general use. If this is final data for public release, this may

    also include steps to provide the data to the public affairs office for final handling and release. It is


significant to note that though the data may be loaded into separate repositories (as depicted above), those repositories may be related via any number of mechanisms. It is this processing and loading step that ensures the relations are in place, available and correct.
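The validate, transform and load steps of this phase can be sketched as a small pipeline; the repositories are modeled as plain dictionaries and the check and field names are illustrative assumptions.

```python
# A minimal sketch of the data-processing phase: validate, transform, then
# load accepted data into every target repository.

def validate(rows):
    # stand-in for the QA/QC checks: keep only complete records
    return [r for r in rows if "site" in r and "value" in r]

def transform(rows):
    # put accepted data in a loadable form for the final repositories
    return [{**r, "value": float(r["value"])} for r in rows]

def load(rows, repositories):
    # the same accepted data may go to effort-specific and global repositories
    for repo in repositories:
        repo.setdefault("rows", []).extend(rows)

effort_repo, authoritative_repo = {"name": "effort"}, {"name": "global"}
submitted = [{"site": "S1", "value": "12.5"}, {"value": "9.9"}]  # second row fails QA

accepted = transform(validate(submitted))
load(accepted, [effort_repo, authoritative_repo])
print(len(effort_repo["rows"]), len(authoritative_repo["rows"]))  # 1 1
```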

    Data Analysis

Finally, once data is loaded and available, the use of data (from any effort, including historical data) occurs during the data analysis phase. The data analysis phase includes all uses of data, from simple

    visualization and report generation to numerical analysis and modeling. If a data inventory system is

    available, the creation of data inventory (catalog) entries relating created products to source data may

    also be included in this phase. A model of the data analysis phase is depicted below.

    In this scenario, an analyst interacts with a data access application (user interface) that interacts with

    several independent but related repositories. Through this application the analyst extracts a data set for

    use in an analytical model run. The analyst then runs an analytic application which takes the extracted

    data set as input and executes an actual analytic model. In this scenario, the user interface is

independent of the model application, which is run in a batch mode. Once the model is completed, it writes an output analysis result file and a second analyst is notified of the completion. This second analyst uses the analysis result in the generation of a report.
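The extract-then-batch-run shape of this scenario can be sketched as follows; the file name, functions and sample repositories are illustrative assumptions, not the report's actual tooling.

```python
# A minimal sketch of the analysis scenario: extract a data set through a
# data-access layer, run a batch model on the extract, write a result file,
# and notify an analyst.

import json

def extract_data_set(repositories, site):
    # the data-access application queries several independent repositories
    return [row for repo in repositories for row in repo if row["site"] == site]

def run_model(data_set, out_path):
    # the analytic application runs in batch mode on the extracted input
    result = {"n": len(data_set), "mean": sum(r["value"] for r in data_set) / len(data_set)}
    with open(out_path, "w") as f:
        json.dump(result, f)
    return out_path

repos = [
    [{"site": "S1", "value": 11.0}],
    [{"site": "S1", "value": 13.0}, {"site": "S2", "value": 7.0}],
]
path = run_model(extract_data_set(repos, "S1"), "model_result.json")
print(f"model complete, notify analyst: {path}")  # the second analyst builds the report
```

The design point is the decoupling: the extraction step and the model run share only a data set, so either side can be replaced or automated independently.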


    Of specific note in this scenario is that there may be any number of user interface applications that

    connect to separate data repositories. This allows the creation of very lightweight, specialized user

    interface applications that are very easy to use for specific purposes. It is likely that this scenario could

    be further developed to automate the entire process of data extraction through the model run with no

interaction from the analyst. Finally, it is feasible that certain reports may also be generated automatically as a secondary output of the model run, avoiding the final report generation step as well.

Conclusions

Information management includes a complex set of activities that are focused on ensuring investments in data yield maximal returns. These activities primarily serve to enhance people's experiences interacting with information. A successful information management team will address data from a human perspective to ensure data is available where and when it is required, subject to relevant access and regulatory constraints. Further, information management provides cost savings to an organization by ensuring data is used effectively and avoiding duplicative collection efforts.

    The application of technology to operational practices and processes should above all increase the

    effectiveness of the organization. The consideration of effectiveness should apply to both human

activities and operational costs. Redundant data collection is a significant cost in both money and human labor. The effective application of technology may result in a change in where labor hours are

    spent, but should result in a net reduction in labor or cost. In current operational paradigms, there is

    always more work to be done than can be accomplished. In many cases, it is the effective application of

    technologies that enables these additional capabilities to be accomplished within current resource

constraints. This is the "do more with less" reality.

Automation technology, primarily in the form of software development, is an ongoing effort that is best initiated as an in-house practice to enhance effectiveness and efficiency. Only when a specific development effort becomes too large to effectively undertake in-house should it be contracted as a deliverable product. In all circumstances, integration of technologies with current and planned infrastructure should be a prerequisite when considering a technology acquisition. Commercial off-the-shelf applications should be considered, but not necessarily adopted, based upon their ability to be integrated into the organization's infrastructure.


Example Flow

This section depicts an example data workflow from data collection to final incorporation for use within a set of data repositories. The identification, modeling, implementation and management of data repositories is an ongoing process; the repositories listed in the example flow are merely notional at this time.


Example Distributed Applications

This section depicts how user interface applications can be constructed to provide differing views into isolated data repositories. All applications and data repositories are notional, providing a pictorial representation of how data repositories may be reused and dynamically integrated by user interface applications.

    Basic Software Architecture

    First, it is critical that there is a basic understanding of software applications. A software application is

    executable code that consists of three primary elements:

- User interface, which may be command-line, graphical or programmatic (such as a web service or RSS feed)
- Business logic, which is the portion of the code that performs the computation the application is constructed for
- Data logic, which is all portions of the code that deal with the manipulation and storage of the data for the application.

    Within every application, the code is arranged as a set of computational units which are a set of

    functions. Each function acts upon input data provided to the function to produce output data which is

    returned to the caller. This basic flow forms the basis of everything a computer can do.

    In many cases, the computational unit will access data from or save data to an external source (such as a

    file or a display field). This pattern forms the basis for long-term storage and retrieval of data.
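A minimal sketch of such a computational unit follows, first as a pure function and then wired to an external source; the function names, fields and file path are illustrative assumptions.

```python
# A minimal sketch of a computational unit: a function that takes input data,
# computes, and returns output to the caller, plus a variant that persists
# the result to an external source (a file).

def summarize(values: list[float]) -> dict:
    # pure computational unit: input in, output returned to the caller
    return {"count": len(values), "total": sum(values)}

def summarize_to_file(values: list[float], path: str) -> None:
    # the same unit wired to external storage: the basis of long-term
    # storage and retrieval described above
    summary = summarize(values)
    with open(path, "w") as f:
        f.write(f"{summary['count']},{summary['total']}\n")

print(summarize([1.0, 2.5, 3.5]))
summarize_to_file([1.0, 2.5, 3.5], "summary.csv")
```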

In a basic application, all of this may be contained in a single file, such as an .exe within Microsoft Windows operating systems. In more complex applications, this logic may be separated into any number of files, each of which performs some portion of the overall operations for the application.

As an application becomes more complex or is developed as a network application (such as a web site or web-enabled application), the three primary elements may be physically separate and run on different computers. For example, in a typical web site the user interface physically executes in the web browser, the business logic runs on the web server and the data logic runs within a database server. The separation of the application into separate pieces or tiers allows for more flexibility. This basic form of application is known as a three-tier application, as it involves three separate computing tiers: user interface (presentation), processing (business) and storage (data).
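The three tiers can be sketched as separable pieces; here they live in one process for brevity, but each class could run on a different computer (browser, web server, database server). The classes, data and conversion are illustrative assumptions.

```python
# A minimal sketch of the three-tier separation: presentation, business and
# data tiers as independent pieces composed together.

class DataTier:
    """Data logic: manipulation and storage of the application's data."""
    def __init__(self):
        self._rows = {"S1": 12.5}   # stand-in for a database
    def read(self, site: str) -> float:
        return self._rows[site]

class BusinessTier:
    """Business logic: the computation the application exists to perform."""
    def __init__(self, data: DataTier):
        self.data = data
    def fahrenheit(self, site: str) -> float:
        return self.data.read(site) * 9 / 5 + 32

class PresentationTier:
    """User interface: here just console output."""
    def __init__(self, logic: BusinessTier):
        self.logic = logic
    def show(self, site: str) -> None:
        print(f"{site}: {self.logic.fahrenheit(site):.1f} F")

PresentationTier(BusinessTier(DataTier())).show("S1")
```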


As applications become pervasive, there are many areas where any portion of an application may be reused. Data, for example, may be common to several applications and therefore use a single, common database. Likewise, computational processing can be reused across multiple applications.

    In general terms, there is a separation of concerns between each part of a system. The data for a

    system is handled by a specific data input/output (I/O) component. That data I/O component is

    specialized to a fixed set of data and operations upon that data. The analytical processing is handled by

    a specific business logic component. The user interactions are handled by a specialized user interface

    (presentation layer) component. This forms the basic three-tier architecture for an application as

    depicted below.

    As this model is depicted above, there is little room for enhancement. By extending this concept as

    separate, stand-alone applications, each underlying component can be reused as needed. In the model

of component-based computing, each part of the system is developed as a stand-alone component,

    which may be used as a stand-alone application. Each of these lightweight, purpose-specific

    applications may be connected in various ways to create new applications with minimal development

    effort.

    As new applications are developed, they each share an increasing amount of capability and retrieve

    their data using a standard component. This permits multiple applications to share data elements

    seamlessly even when using isolated databases. Eventually, each application becomes simply a user-


    interface connecting existing processing and data services together. An example of this form of reuse is

    depicted below.

    Each of the application blocks is a notional user interface application that connects to a single data

    service. The data service for the application connects to multiple data repositories as needed through

separate data-specific services. For example, the "Site Management Application" user interface application calls into a "projects" service (not depicted), which provides read-only access to the "Projects DB" repository. The "Site Management Application" also calls into a "sites" service (not depicted), which provides full read/write access to the "Sites DB" repository. These activities allow for a single repository for project data (the depiction of the project itself, not data collected for the project) to be reused by many applications consistently.
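A minimal sketch of this composition follows, with one read-only service and one read/write service behind a thin user interface; the service names follow the notional example above, and the dictionaries standing in for the databases are illustrative assumptions.

```python
# A minimal sketch of one UI application composing two data services with
# different access levels: read-only "projects" and read/write "sites".

class ProjectsService:           # read-only access to the "Projects DB"
    def __init__(self, db: dict):
        self._db = db
    def get(self, project_id: str) -> dict:
        return dict(self._db[project_id])   # returns a copy; no write path exposed

class SitesService:              # full read/write access to the "Sites DB"
    def __init__(self, db: dict):
        self._db = db
    def get(self, site_id: str) -> dict:
        return self._db[site_id]
    def update(self, site_id: str, **fields) -> None:
        self._db.setdefault(site_id, {}).update(fields)

projects_db = {"P-1": {"name": "Riverbed Survey"}}
sites_db = {}

# the "Site Management Application" is just a thin UI over the two services
projects, sites = ProjectsService(projects_db), SitesService(sites_db)
sites.update("S1", project=projects.get("P-1")["name"], status="active")
print(sites.get("S1"))
```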

In the graphic below, there are five separate applications, as shown by the presentation layers (GUIs). A user may log in to any of the five applications, and each will provide a different set of capabilities.


Applications "A", "B" and "C" are well-designed three-tier applications. Application "D" has a dedicated business layer, but uses the existing data layer from application "C" for its data. This allows application "D" to be developed more quickly using an existing data layer from application "C". Finally, application "E" reuses the business logic layer from application "D" and the data layer from application "A". This form of reuse allows new applications to be constructed from a minimal cross-section of new development, based upon which components already exist. In practice this has been difficult to achieve (these concepts have been around since the 1970s), due to technology limitations (which are finally being overcome via technologies such as XML and web services) and business limitations. The business limitations tend to be the hardest to overcome, as they are driven by time and costs. In general, this form of development requires a commitment to "doing the right thing" while striving to maintain reasonable costs and schedules.
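As a rough illustration of the reuse pattern, the sketch below models each layer as a plain function so that application "D" reuses the data layer of "C" and application "E" combines the business layer of "D" with the data layer of "A"; all functions and values are hypothetical.

```python
# A minimal sketch of cross-application layer reuse: one business layer
# composed with two different data layers.

def data_layer_a() -> list[float]:
    return [10.0, 20.0, 30.0]

def data_layer_c() -> list[float]:
    return [1.0, 2.0, 3.0]

def business_layer_d(fetch) -> float:
    # "D" supplies its own logic but accepts whichever data layer it is given
    rows = fetch()
    return sum(rows) / len(rows)

def app_d() -> float:
    return business_layer_d(data_layer_c)   # D's logic over C's data

def app_e() -> float:
    return business_layer_d(data_layer_a)   # D's logic reused over A's data

print(app_d(), app_e())  # 2.0 20.0
```

The design choice that makes this work is that each layer depends only on an interface (here, a callable returning rows), not on the application it was first built for.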

    Appendices

    References