Upload
shayne-fountain
View
218
Download
5
Tags:
Embed Size (px)
Citation preview
Exploiting Reference Architecture to
Guide the NASA Earth Science
System EnterpriseChris A. Mattmann
NASA Jet Propulsion LaboratoryUniversity of Southern California
Apache Software Foundation
2
Objectives
Concepts What is domain-specific software engineering
(DSSE) The “Three Lampposts” of DSSE: Domain,
Business, and Technology Domain Specific Software Architectures
Product Lines Relationship between DSSAs, Product Lines, and
Architectural Styles Examples of DSSE at work
3
Objectives
Concepts What is domain-specific software engineering
(DSSE) The Three Key Factors of DSSE: Domain,
Business, and Technology Domain Specific Software Architectures
Product Lines Relationship between DSSAs, Product Lines, and
Architectural Styles Examples of DSSE at work
4
Domain-Specific Software Engineering The traditional view of software engineering shows
us how to come up with solutions for problems de novo
But starting from scratch every time is infeasible This will involve re-inventing many wheels
Once we have built a number of systems that do similar things, we gain critical knowledge that lets us exploit common solutions to common problems In theory, we can simply build “the difference”
between our new target system and systems that have come before
5
Examples of Domains Compilers for programming languages Consumer electronics Electronic commerce system/Web stores Video game Business applications
Basic/Standard/“Pro”
We can subdivide, too: Avionics systems
Boeing Jets Boeing 747-400
Software Architecture: Foundations, Theory, and Practice; Richard N. Taylor, Nenad Medvidovic, and Eric M. Dashofy; (C) 2008 John Wiley & Sons, Inc. Reprinted with permission.
6
Traditional Software Engineering
One particular problem can be solved in innumerable ways
Software Architecture: Foundations, Theory, and Practice; Richard N. Taylor, Nenad Medvidovic, and Eric M. Dashofy; (C) 2008 John Wiley & Sons, Inc. Reprinted with permission.
7
Architecture-Based Software Engineering
Given a single problem, we select from a handful of potential architectural styles or architectures, and go from these into specific implementations
Software Architecture: Foundations, Theory, and Practice; Richard N. Taylor, Nenad Medvidovic, and Eric M. Dashofy; (C) 2008 John Wiley & Sons, Inc. Reprinted with permission.
8
Domain-Specific Software Engineering
We map regions of the problem space (domains) into domain-specific software architectures (DSSAs)
These are specialized into application-specific architectures These are implemented
Software Architecture: Foundations, Theory, and Practice; Richard N. Taylor, Nenad Medvidovic, and Eric M. Dashofy; (C) 2008 John Wiley & Sons, Inc. Reprinted with permission.
9
Three Key Factors of DSSE Domain
Must have a domain to constrain the problem space and focus development
Technology Must have a variety of technological solutions—tools,
patterns, architectures & styles, legacy systems—to bring to bear on a domain
Business Business goals motivate the use of DSSE
Minimizing costs: reuse assets when possibleMaximize market: develop many related
applications for different kinds of end users
10
Domain Business
Technology
Three Key Factors
Domain Must have a domain
to constrain the problem space and focus development
11
Domain Business
Technology
Three Key Factors Technology
Must have a variety of technological solutions—tools, patterns, architectures & styles, legacy systems—to bring to bear on a domain
12
Domain Business
Technology
Three Key Factors Business
Business goals motivate the use of DSSE
Minimizing costs: reuse assetswhen possible
Maximize market: develop many related applications for different kinds of end users
13
Domain Business
Technology
Three Key Factors Domain + Business “Corporate Core
Competencies” Domain expertise
augmented bybusinessacumen andknowledge ofthe market
14
Domain Business
Technology
Three Key Factors Domain + Technology “Application Family
Architectures” All possible
technologicalsolutions toproblems in adomain
Uninformed and unconstrained bybusiness goalsand knowledge
15
Domain Business
Technology
Three Key Factors Business + Technology “Domain independent
infrastructure” Tools and
techniques forconstructingsystems independent of anyparticular domain
E.g., most generic ADLs, UML, compilers, word processors, general-purpose PCs
16
Domain Business
Technology
Three Key Factors Domain + Business +
Technology “Domain-specific
software engineering” Applies technology
to domain-specificgoals, tempered bybusiness and marketknowledge
17
Domain Business
Technology
Three Key Factors Product-Line
Architectures A specific, related set
of solutions withina broader DSSE
More focus oncommonalities andvariability betweenindividual solutions
18
Becoming More Concrete
Applying DSSE means developing a set of artifacts more specific than an ordinary software architecture Focus on aspects of the domain Focus on domain-specific solutions, techniques,
and patterns These are
A domain model and A domain-specific software architecture (DSSA)
19
Domain Model
A domain model is a set of artifacts that capture information about a domain Functions performed Objects (also known as entities) that perform the
functions, and on which the functions are performed
Data and information that flows through the system Standardizes terminology and semantics Provides the basis for standardizing (or at least
normalizing) descriptions of problems to be solved in the domain
20
Domain Model
Software Architecture: Foundations, Theory, and Practice; Richard N. Taylor, Nenad Medvidovic, and Eric M. Dashofy; (C) 2008 John Wiley & Sons, Inc. Reprinted with permission.
21
Domain Model
Defines vocabulary for the domain
Software Architecture: Foundations, Theory, and Practice; Richard N. Taylor, Nenad Medvidovic, and Eric M. Dashofy; (C) 2008 John Wiley & Sons, Inc. Reprinted with permission.
22
Domain Model
Describes the entities and data in the domainSoftware Architecture: Foundations, Theory, and Practice; Richard N. Taylor, Nenad Medvidovic, and Eric M. Dashofy; (C) 2008 John Wiley & Sons, Inc. Reprinted with permission.
23
Domain Model
Defines how entities and data combine to provide features Software Architecture: Foundations, Theory, and Practice; Richard N. Taylor, Nenad Medvidovic, and Eric M. Dashofy; (C) 2008 John Wiley & Sons, Inc. Reprinted with permission.
24
Domain Model
Defines how data and control flow through entitiesSoftware Architecture: Foundations, Theory, and Practice; Richard N. Taylor, Nenad Medvidovic, and Eric M. Dashofy; (C) 2008 John Wiley & Sons, Inc. Reprinted with permission.
25
Info Model: Context Info Diagram Defines high-
level entities Defines what
is considered inside and outside the domain (or subdomains)
Defines relationships and high-level flows
Software Architecture: Foundations, Theory, and Practice; Richard N. Taylor, Nenad Medvidovic, and Eric M. Dashofy; (C) 2008 John Wiley & Sons, Inc. Reprinted with permission.
26
Info Model: Entity-Relationship Diagram
Defines entities and cardinal relationships between them
Software Architecture: Foundations, Theory, and Practice; Richard N. Taylor, Nenad Medvidovic, and Eric M. Dashofy; (C) 2008 John Wiley & Sons, Inc. Reprinted with permission.
27
Info Model: Object Diagram
Defines attributes and operations on entities
Closely resembles class diagram in UML but may be more abstract
Software Architecture: Foundations, Theory, and Practice; Richard N. Taylor, Nenad Medvidovic, and Eric M. Dashofy; (C) 2008 John Wiley & Sons, Inc. Reprinted with permission.
28
Feature Model: Use Case Diagram
Defines use caseswithin the domain
Similar to use casemodels in UML
Software Architecture: Foundations, Theory, and Practice; Richard N. Taylor, Nenad Medvidovic, and Eric M. Dashofy; (C) 2008 John Wiley & Sons, Inc. Reprinted with permission.
29
Operational Model: State Transition Diagram
Focuses on statesof systems andtransitions betweenthem
Resembles UMLstate diagrams
Software Architecture: Foundations, Theory, and Practice; Richard N. Taylor, Nenad Medvidovic, and Eric M. Dashofy; (C) 2008 John Wiley & Sons, Inc. Reprinted with permission.
30
Reference Requirements Mandatory
Must display the current status of the Lunar Lander (horizontal and vertical velocities, altitude, remaining fuel)
Must indicate points earned by player based on quality of landing
Optional May display time elapsed
Variant May have different levels of difficulty based on pilot
experience (novice, expert, etc) May have different types of input depending on whether
Auto Navigation is enabled Auto Throttle is enabled
May have to land on different celestial bodies Moon Mars Jupiter’s moons Asteroid
Software Architecture: Foundations, Theory, and Practice; Richard N. Taylor, Nenad Medvidovic, and Eric M. Dashofy; (C) 2008 John Wiley & Sons, Inc. Reprinted with permission.
31
Domain-Specific Software Architecture Definition: Definition. A domain-specific
software architecture (DSSA) comprises: a reference architecture, which describes a
general computational framework for a significant domain of applications;
a component library, which contains reusable chunks of domain expertise; and
an application configuration method for selecting and configuring components within the architecture to meet particular application requirements. (Hayes-Roth)
32
Reference Architecture
Definition. Reference architecture is the set of principal design decisions that are simultaneously applicable to multiple related systems, typically within an application domain, with explicitly defined points of variation.
Reference architectures are still architectures (since they are also sets of principal design decisions) Distinguished by the presence of explicit points
of variation (explicitly “unmade” decisions)
33
Different Kinds of Reference Architecture
Complete single product architecture A fully worked out exemplar of a system in a
domain, with optional documentation as to how to diversify
Can be relatively weak due to lack of explicit guidance and possibility that example is a ‘toy’
Incomplete invariant architecture Points of commonality as in ordinary architecture,
points of variation are indicated but omitted Invariant architecture with explicit variation
Points of commonality as in ordinary architecture, specific variations indicated and enumerated
34
Example Reference Architecture
Structural viewof Lunar LanderDSSA
Invariant withexplicit pointsof variation Satellite relay Sensors
Software Architecture: Foundations, Theory, and Practice; Richard N. Taylor, Nenad Medvidovic, and Eric M. Dashofy; (C) 2008 John Wiley & Sons, Inc. Reprinted with permission.
More Specific Example
OODT (Object Oriented Data Technology) framework
Catalog and Archive Service component
35
File Management
04/11/23 ESIP-SUMMER1036
Managing the locations and ancillary information about files, and collections of files Ancillary information is metadata
What’s a product? A collection of some set of files, and/or
collections of files So, you could have collections of other collections
Along with metadata about the product
File Management: Core Terminology
04/11/23 ESIP-SUMMER1037
File Management: Features (1/4)
04/11/23 ESIP-SUMMER1038
Persisting archived files using dynamic metadata and flexible, adaptable policies based on product types rather than the monolithic and inflexible old method of
ProductTypeRepository/ProductName/ProductVersion/ as the filesystem location to store products for all product types.
Clearly separating out the Workflow aspects of the File Manager, from Product ingestion, and flexibly supporting association of Workflows and their subsequent Tasks with any event, not only ingestion.
File Management: Features (2/4)
04/11/23 ESIP-SUMMER1039
Leverage existing transactional models such as Java's Transaction API to support transactional management rather than building our own API.
If we do use any database communication, then making sure that all DB communication is dealt with using standard, available, existing db pooling APIs such as commons-dbcp , available from Apache .
File Management: Features (3/4)
04/11/23 ESIP-SUMMER1040
Clearly separating out the administrative portions of policy management from the existing webapp, and distinguishing what pieces of the webapp are user-centric, and what are administrative-centric.
Supporting heirarchical product structures, such as nested directories that contain many sub-directories, and sub-directories of those sub-directories, with files strewn about at all levels rather than only supporting the existing method of
flat product structures, where all files in a product are at the same tree level.
File Management: Features (4/4)
04/11/23 ESIP-SUMMER1041
Support metadata extraction based on product type or mime-type
Support dynamic product types. The file management component should not need to know about every product type a priori
File Management: Architectural Implications (1/4)
04/11/23 ESIP-SUMMER1042
Managing files Repository: follow the typical repository pattern Manage information about Products, Product Types, and
References to products Managing metadata
Catalog: follow the typical registry pattern Manage product Metadata
Key/Value pairs Separate out the repository and the catalog
This allows data and metadata to be managed independently
File Management: Architectural Implications (2/4)
04/11/23 ESIP-SUMMER1043
Need a way to transfer a product from the client to the File Management service Client gives URIs of files, or collections of files,
which identify References belonging to a Product
How does the transfer actually occur? Can have many different types of data transfer
Local Use native system calls, or cp
Remote Use whatever protocol you want, REST, XML-RPC,
SOAP, WebDAV, etc. Don’t use CORBA or RMI: they’re sooooo last year!
File Management: Architectural Implications (3/4)
04/11/23 ESIP-SUMMER1044
Translating the URIs from the client to the File Manager presents an interesting challenge For example, where should
file:///home/chris/myfile.file be transferred to on the File Manager’s system?
Leverage and extend existing CAS method Existing CAS would have answered the above
questions with ProductTypeRepositoryPath/ProductName/VersionId/
Why should that be the only answer?
File Management: Architectural Implications (4/4)
04/11/23 ESIP-SUMMER1045
Have the concept of a Versioner interface Versioner is called by the File Manager before
the product is transferred from the client to the File Manager system (or by the system to itself) Versioner uses the Product metadata, and the
original product references to generate data store URIs that tell the DataTransfer implementation where to physically transfer the files for a particular Product
Where did I get the name Versioner? Don’t ask!
File Management: Architectural Conclusion
04/11/23 ESIP-SUMMER1046
So, how do we put all these different generic interfaces together?
Well, something like the following A File Manager has…
A repository manager that controls how data is stored A catalog, to store metadata and product instance
information to A set of Versioners that are associated with Product Types
in order to figure out how to generate the reference data store URIs for a particular product
A Data Transferer that moves a Product’s file from the client to the File Manager using the source URIs and the data store URIs
An external interface to it (e.g., XML-RPC, WebDAV, etc.)
File Management Architecture
04/11/23 ESIP-SUMMER1047
Workflow Management
04/11/23 ESIP-SUMMER1048
Modeling, executing and monitoring groups of one or more Workflow Tasks
Tasks could be A script file A java process An external command A call to a web service Many more…
Workflow
04/11/23 ESIP-SUMMER1049
Workflow has many definitions It’s typically represented as a graph
In traditional science data pipeline systems, this graph is constrained to be a sequential set of process nodes
Taxonomy of Workflow Management Systems http://www.gridbus.org/reports/GridWorkflowTaxonomy.pdf
Workflow Patterns http://is.tm.tue.nl/research/patterns/
Task A
Task B
Task C
Task D
Task E
04/11/23 ESIP-SUMMER1050
Workflow Management: Core Terminology
Workflow Management Features
04/11/23 ESIP-SUMMER1051
Workflow should be represented as a graph. This will allow for true parallelism.
Workflow Management should support identified workflow patterns especially control-flow. The current level of support for control-flow has to a large
extent been relegated to tasks. A collection of tasks is associated with a product ingestion and there is only a priority to sort out the order of execution.
Data-flow should be captured. The workflow should be able to minimally hook
together input and output streams between tasks. Workflow need not have any interaction with a
database What if I want to persist a workflow in XML? Or as a flat file, or some other lightweight format
Workflow Management: Architectural Implications (1/3)
04/11/23 ESIP-SUMMER1052
Workflow Repositories Places to go and fetch and “abstract” workflow
description from Places to store running “workflow instance”
information Workflow Execution Engines
Give it an abstract workflow, and let it rip Turns an abstract workflow into a “Workflow Instance”
Should allow monitoring of the workflow instance System interface
Associate abstract workflows with “events” This way, workflows can be tied to things other than
just product ingestion
Workflow Management: Architectural Implications (2/3)
04/11/23 ESIP-SUMMER1053
The Workflow Repository need not be a relational Database It could be a flat file A (set of) XML file(s) An object database Factories create Workflow Repositories, which create
Workflows Tasks are associated with “Workflows”, not
“Product Types” This decouples workflow from the File Management
aspects of the CAS Conditions can be pre, or post
Workflow Management: Architectural Implications (3/3)
04/11/23 ESIP-SUMMER1054
Workflows are interfaces They could be backed by a (directed graph), or by an
iterator (i.e., a sequential pipeline) or by a HashMap Workflow Tasks have clearly separated out
dynamic and static metadata, and they can share metadata Dynamic metadata is passed via the Workflow Engine
between all the tasks in a workflow They can all read/write to it
Static metadata is associated with each workflow task
Workflow Execution
04/11/23 ESIP-SUMMER1055
Once you’ve got a Workflow, how do you execute it and turn it into a Workflow Instance?
You hand it off to a Workflow Engine Workflow Engine manages:
A configurable, extensible thread pool “Worker Threads” are used to process the Workflow
Instance they are each handed A queue of worker threads if they aren’t any
available workers in the thread pool to process a Workflow
Monitoring which Workers are handling which Workflow Instances, and the state and status of each Workflow Instance
Workflow Engines execute instances of Workflows
What is the external interface to the system?
04/11/23 ESIP-SUMMER1056
Event-based Event names come into the Workflow Manager The Workflow Manager looks up any Workflows
associated with the event name The Workflow Manager then calls the Workflow
Repository to obtain representations of the Workflow
The Workflow Manager then hands off Workflow representations to the Workflow Engine for execution
Current implementation uses XML-RPC, but it’s an interface, so it could use REST/HTTP/SOAP/etc.
Workflow Management: Architectural Conclusion
04/11/23 ESIP-SUMMER1057
So, how do we put all of these things together? Well, something like:
A Workflow Manager hasA Workflow Repository to obtain abstract
Workflow descriptions fromA Workflow Instance Repository to
store/persist running Workflow Instance information
A Workflow Engines to execute Workflows on
An external interface
Workflow Management Architecture
04/11/23 ESIP-SUMMER1058
Resource Management
04/11/23 ESIP-SUMMER1059
Resource management is the notion borne from the high performance computing and grid communities dealing with the execution of Jobs, and Tasks across heterogeneous computing resources including grids, clouds, clusters and individual Resource Nodes
Involves monitoring the underlying hardware resources Disk space, CPU utilization, Load, etc., across the
nodes Should provide an easy to use façade for the
workflow manager to pipeline jobs to
04/11/23 ESIP-SUMMER1060
Resource Management: Core Terminology
Resource Management: Definitions
04/11/23 ESIP-SUMMER1061
Job - an abstract representation of an execution unit, that stores information about an underlying program, or execution that must be run on some hardware node ,including information about the Job Input that the Job requires, information about the job load, and the queue that the job should be submitted to.
Job Input - an abstract representation of the input that a Job requires.
Job Spec - a complete specification of a Job, including its Job Input, and the Job definition itself.
Job Instance - the physical code that performs the underlying job execution.
Resource Node - an available execution node that a Job is sent to by the Resource Manager.
Resource Management: Features
04/11/23 ESIP-SUMMER1062
Easy execution - of compute jobs to heterogeneous computing resources, with very different underlying specifications: large and small disks, network file storage, storage area networks, and exotic processor architectures.
Cluster Management Pluggability - the ability to plug into existing batch submission systems (e.g., Torque, LSF), and resource monitoring (e.g., Ganglia).
Scalability – The Resource Manager uses the popular client-server paradigm, allowing new Resource Manager servers to be instantiated, as needed, without affecting the Resource Manager clients, and vice-versa.
Communication over lightweight, standard protocols – The Resource Manager uses XML-RPC, as its main external interface, between Resource Manager client and server. XML-RPC, the little brother of SOAP, is fast, extensible, and uses the underlying HTTP protocol for data transfer.
Wrapping - the use of wrappers to insulate the internal code of Job Instances from their external interfaces allows a variety of different popular programming languages (e.g., shell scripting, Java, Python, Perl, Ruby) to be used to implement the actual job.
Scheduler Pluggability - the ability to define the underlying job scheduling policy.
XML-based job description - allows for existing XML-based editing tools to visualize the different job properties, and for standard job definitions, and interchange.
Resource Management: Architectural Implications (1/2)
04/11/23 ESIP-SUMMER1063
Batch Manager responsible for sending Jobs to the actual nodes that the
Resource Manager determines that it is appropriate that they execute on.
Job Queue responsible for queuing up Jobs when the Resource Manager
determines that there are no Resource Nodes available to execute the Job on
persistence, and queuing policy (e.g., LRU, FIFO) are all dealt with by this extension point.
Job Repository responsible for actual persistence of a Job, throughout its
lifecycle in the Resource Manager retrieve Job and Job Spec information whether the Job is queued,
or executing, or finished.
Resource Management: Architectural Implications (2/2)
04/11/23 ESIP-SUMMER1064
Monitor responsible for monitoring the execution of a Job once it
has been sent to a Resource Node by the Batch Manager extension point.
Job Scheduler responsible for determining the availability of
underlying Resource Nodes determining the policy for pulling Jobs off of the Job
Queue to schedule for execution, interacting with the Job Repository, the Batch Manager, the Monitor
System provides the external interface to the Resource
Manager services.
Resource Management: Architectural Conclusion
04/11/23 ESIP-SUMMER1065
So, how do we put all of these things together? Well, something like:
A Resource Manager has A Job Queue to hold Jobs that cannot be run for
scheduling or resource constraint reasons and to provide Jobs to the Scheduler.
A Scheduler to implement a policy for determining which Jobs are ready to run.
A Monitor to determine the current state (disk space, availability) of Resource Nodes in the system to run Jobs on.
A Batch Submission system to physically execute jobs on remote Resource Nodes.
A Job Repository to record the current state of a running Job.
An external interface
04/11/23 ESIP-SUMMER1066
Resource Management Architecture
Conclusions
Domain specific software engineering, reference architectures, etc. Keep it abstract, focus on terminology,
vocabulary and components/functions and styles
Cull reusable requirements, information, etc., from prior experience Need to have built it before to really get it
Lots of examples to draw from!
67