File Management
Chris A. Mattmann
OODT Component Working Group
Apr 18, 2023 FILE-MGMT CAM-2
What is File Management?
• Managing the locations of, and ancillary information about, files and collections of files
– Ancillary information is metadata
• What’s a product?
– A collection of files and/or collections of files (so you could have collections of other collections)
– Along with metadata about the product
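To make "collections of other collections" concrete, here is a minimal sketch of a product as a tree of files and nested collections, with key/value metadata attached. All class names (`ProductNode`, `FileNode`, `CollectionNode`, `SketchProduct`) are hypothetical illustrations, not the CAS classes.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: leaves are files, inner nodes are collections,
// and collections may contain other collections.
interface ProductNode {
    // Append the URI of every file at or below this node to 'out'.
    void collectFiles(List<String> out);
}

final class FileNode implements ProductNode {
    private final String uri;
    FileNode(String uri) { this.uri = uri; }
    public void collectFiles(List<String> out) { out.add(uri); }
}

final class CollectionNode implements ProductNode {
    private final String name;
    private final List<ProductNode> children = new ArrayList<>();
    CollectionNode(String name) { this.name = name; }
    String getName() { return name; }
    CollectionNode add(ProductNode child) { children.add(child); return this; }
    public void collectFiles(List<String> out) {
        // Recurse into nested files and collections in insertion order.
        for (ProductNode child : children) {
            child.collectFiles(out);
        }
    }
}

final class SketchProduct {
    private final String name;
    private final ProductNode root;
    // Ancillary information (metadata) as key/value pairs.
    private final Map<String, String> metadata = new HashMap<>();
    SketchProduct(String name, ProductNode root) { this.name = name; this.root = root; }
    String getName() { return name; }
    Map<String, String> getMetadata() { return metadata; }
    List<String> allFiles() {
        List<String> out = new ArrayList<>();
        root.collectFiles(out);
        return Collections.unmodifiableList(out);
    }
}
```

The composite shape is what lets a product be flat (one collection of files) or arbitrarily nested without changing the code that walks it.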
The state of things
• The existing CAS system does file management
– For past missions and projects, it has done the job well
• The CAS implementation
– Needs an update: an overall refactoring for modularity and separation of concerns, plus general technology and architectural updates
• In particular, a couple of new project requirements and drivers
– These suggest ways to extend and improve the CAS to meet them
• What are these new requirements and drivers?
New Requirements and Drivers
• Persisting archived files using dynamic metadata and flexible, adaptable policies based on product type
– rather than the existing monolithic, inflexible method of using ProductTypeRepository/ProductName/ProductVersion/ as the filesystem location for products of every product type.
• Clearly separating the Workflow aspects of the File Manager from Product ingestion, and flexibly supporting the association of Workflows (and their Tasks) with any event, not only ingestion.
• Leverage existing transactional models such as the Java Transaction API (JTA) to support transactional management, rather than building our own API.
• If we do use any database communication, make sure all of it goes through standard, existing DB connection-pooling APIs such as Apache commons-dbcp.
• Clearly separating the administrative portions of policy management from the existing webapp, and distinguishing which pieces of the webapp are user-centric and which are administrative.
• Supporting hierarchical product structures, such as nested directories containing sub-directories (and sub-directories of those), with files at every level
– rather than only the existing flat product structures, where all files in a product are at the same tree level.
• Support metadata extraction based on product type or MIME type
• Support dynamic product types: the file management component should not need to know about every product type a priori.
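One way to meet both requirements is a registry that maps product types and MIME types to extractors and is populated at runtime. A sketch, assuming a hypothetical `MetadataExtractor` interface and `ExtractorRegistry` class (neither is part of the CAS code described here):

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

// Hypothetical extractor: turns a file URI into key/value metadata.
@FunctionalInterface
interface MetadataExtractor {
    Map<String, String> extract(String fileUri);
}

// Hypothetical registry: extractors are registered at runtime, so the
// component does not need to know every product type in advance.
final class ExtractorRegistry {
    private final Map<String, MetadataExtractor> byMimeType = new HashMap<>();
    private final Map<String, MetadataExtractor> byProductType = new HashMap<>();

    void registerForMimeType(String mimeType, MetadataExtractor e) { byMimeType.put(mimeType, e); }
    void registerForProductType(String productType, MetadataExtractor e) { byProductType.put(productType, e); }

    // Prefer a product-type extractor, fall back to the MIME-type one,
    // and return empty metadata when neither is registered.
    Map<String, String> extract(String productType, String mimeType, String fileUri) {
        MetadataExtractor e = byProductType.get(productType);
        if (e == null) {
            e = byMimeType.get(mimeType);
        }
        return e == null ? Collections.<String, String>emptyMap() : e.extract(fileUri);
    }
}
```

Giving product-type extractors precedence over MIME-type ones is a design assumption; the opposite order is equally defensible.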
• You can read and add to the list
– Available at: http://oodt.jpl.nasa.gov/wiki/display/oodt/File+Management
• Please, speak your mind!
File Management: Architectural implications
• Managing files
– Data Store: follows the typical repository pattern
– Manages information about Products, Product Types, and References to products
• Managing metadata
– Metadata Store: follows the typical registry pattern
– Manages product Metadata as key/value pairs
• Separate the data store from the metadata store
– This allows data and metadata to be managed independently
Data Store
DataStore «Interface»
  +addProduct(Product product):Product
  +addProductReferences(String productId, String productTypeId, List refs)
  +addProductType(ProductType productType)
  +modifyProduct(Product product):Product
  +modifyProductType(ProductType productType):ProductType
  +removeProduct(String productId)
  +removeProductType(String productTypeId)
  +getProductById(String productId):Product
  +getProductByName(String productName):Product
  +getProductReferences(String productId, String productTypeId):List
  +getProducts():List
  +getProductsByProductTypeId(String productTypeId):List
  +getProductsGroupedByProductType():Map
  +getProductTypeById(String productTypeId):ProductType
  +getProductTypeByName(String productTypeName):ProductType
  +getProductTypes():List
DataStoreFactory «Interface»
  +createDataStore():DataStore
Product «Object»
  +getProductId():String
  +getProductName():String
  +getProductReferences():List
  +getProductStructure():String
  +getProductType():ProductType
  +getTransferStatus():String
  +setProductId(String productId)
  +setProductName(String productName)
  +setProductReferences(List references)
  +setProductStructure(String productStructure)
  +setProductType(ProductType productType)
  +setTransferStatus(String transferStatus)
ProductType «Object»
  +getDescription():String
  +getName():String
  +getProductRepositoryPath():String
  +getProductTypeId():String
  +getVersioner():String
  +setDescription(String description)
  +setName(String name)
  +setProductRepositoryPath(String path)
  +setProductTypeId(String productTypeId)
  +setVersioner(String versioner)
Reference «Object»
  +getDataStoreReference():String
  +getOrigReference():String
  +setDataStoreReference(String dataStoreRef)
  +setOrigReference(String origReference)
Relationships: a DataStoreFactory creates DataStores; a DataStore manages Products and ProductTypes; each Product has one ProductType and a List of References.
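To show how a backend might satisfy part of this interface, here is an in-memory sketch of a few DataStore operations. `MemProduct` and `MemDataStore` are simplified, hypothetical stand-ins: references are plain URI strings, `addProductReferences` drops the `productTypeId` argument, and IDs come from `UUID` purely for illustration. A real backend would persist to a database or similar store.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.UUID;

// Simplified stand-in for Product: id, name and product type id only.
final class MemProduct {
    String productId;
    final String productName;
    final String productTypeId;
    MemProduct(String productName, String productTypeId) {
        this.productName = productName;
        this.productTypeId = productTypeId;
    }
}

// Hypothetical in-memory store covering a subset of the DataStore operations.
final class MemDataStore {
    private final Map<String, MemProduct> byId = new HashMap<>();
    private final Map<String, List<String>> refsById = new HashMap<>();

    // Like addProduct(Product):Product, assigns an id and returns the stored product.
    MemProduct addProduct(MemProduct p) {
        p.productId = UUID.randomUUID().toString();
        byId.put(p.productId, p);
        return p;
    }

    void addProductReferences(String productId, List<String> refs) {
        refsById.computeIfAbsent(productId, k -> new ArrayList<>()).addAll(refs);
    }

    MemProduct getProductById(String productId) { return byId.get(productId); }

    MemProduct getProductByName(String name) {
        for (MemProduct p : byId.values()) {
            if (p.productName.equals(name)) return p;
        }
        return null;
    }

    List<MemProduct> getProductsByProductTypeId(String typeId) {
        List<MemProduct> out = new ArrayList<>();
        for (MemProduct p : byId.values()) {
            if (p.productTypeId.equals(typeId)) out.add(p);
        }
        return out;
    }

    List<String> getProductReferences(String productId) {
        return refsById.getOrDefault(productId, new ArrayList<>());
    }

    void removeProduct(String productId) {
        byId.remove(productId);
        refsById.remove(productId);
    }
}
```

Because callers would only see the interface, swapping this for a database-backed store would not affect them. That is the modularity argument made on these slides.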
Metadata Store
MetadataStore «Interface»
  +addMetadataElement(Element element):Element
  +addMetadataElementToProductType(String typeId, Element element)
  +modifyMetadataElement(Element element):Element
  +removeMetadataElement(String elementId)
  +removeMetadataElementFromProductType(String typeId, Element elem)
  +getMetadata(String productId, String productTypeId):Metadata
  +getMetadataElements():List
  +getMetadataElements(String productTypeId):List
MetadataStoreFactory «Interface»
  +createMetadataStore():MetadataStore
Element «Object»
  +getDescription():String
  +getElementId():String
  +getElementName():String
  +getProps():Properties
  +setDescription(String description)
  +setElementId(String elementId)
  +setElementName(String elementName)
  +setProps(Properties props)
Metadata «Object»
  +getElementMap():Map
  +setElementMap(Map elementMap)
  +toXML():org.w3c.dom.Document
Relationships: a MetadataStoreFactory creates MetadataStores; a MetadataStore manages metadata Elements (per product type) and returns the Metadata for a product.
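Because metadata is just key/value pairs, `toXML()` can be little more than a walk over the element map. Here is a sketch using the standard `javax.xml.parsers` DOM API. `SketchMetadata` and the `<metadata>/<keyval>/<key>/<val>` tag layout are assumptions, not a documented CAS schema.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

// Hypothetical key/value metadata holder with a DOM serialization.
final class SketchMetadata {
    // LinkedHashMap keeps insertion order, so the XML output is predictable.
    private Map<String, String> elementMap = new LinkedHashMap<>();

    Map<String, String> getElementMap() { return elementMap; }
    void setElementMap(Map<String, String> m) { elementMap = new LinkedHashMap<>(m); }

    // Builds <metadata><keyval><key>K</key><val>V</val></keyval>...</metadata>.
    // The tag names are an assumed layout for illustration only.
    Document toXML() throws ParserConfigurationException {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        Element root = doc.createElement("metadata");
        doc.appendChild(root);
        for (Map.Entry<String, String> e : elementMap.entrySet()) {
            Element keyval = doc.createElement("keyval");
            Element key = doc.createElement("key");
            key.setTextContent(e.getKey());
            Element val = doc.createElement("val");
            val.setTextContent(e.getValue());
            keyval.appendChild(key);
            keyval.appendChild(val);
            root.appendChild(keyval);
        }
        return doc;
    }
}
```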
How is this different from the existing CAS?
• Separation of concerns
– Anything to do with data goes into the data store package
– Anything to do with metadata goes into the metadata store package
• Modularity
– Data stores and metadata stores can have different backend implementations of the standard interfaces (e.g., Lucene as a metadata backend or, if you prefer, a traditional DB backend)
– A CAS can have multiple data stores and metadata stores
• The existing CAS lumped these two capabilities together
– It was difficult to reason about how to pull them apart
What else do we need to do File Management?
• Need a way to transfer a product from the client to the File Management service
– The client gives URIs of files, or collections of files, which identify the References belonging to a Product
Data Transfer Architecture
DataTransfer «Interface»
  +transferProduct(Product p)
DataTransferFactory «Interface»
  +createDataTransfer():DataTransfer
Relationship: a DataTransferFactory creates DataTransfers.
Transferring files
• How does the transfer actually occur?
• You, as a developer, define how that happens
– Implement the transferProduct(Product p) method
– You can have many different types of data transfer
• Local: use native system calls, or cp
• Remote: use whatever protocol you want (XML-RPC, SOAP, WebDAV, etc.). Don’t use CORBA or RMI: they’re sooooo last year!
Translating the URIs
• Translating the client’s URIs for the File Manager presents an interesting challenge
– For example, where on the File Manager’s system should file:///home/chris/myfile.file be transferred to?
• Leverage and extend the existing CAS method
– The existing CAS would have answered this question with ProductTypeRepositoryPath/ProductName/VersionId/
– Why should that be the only answer?
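For comparison, the existing CAS rule is a fixed function of the product type's repository path, the product name, and the version. Here is a sketch; `CasStyleLayout.casStylePath` and the sample values are hypothetical.

```java
// Hypothetical illustration of the existing fixed CAS layout:
// ProductTypeRepositoryPath/ProductName/VersionId/<file name>
final class CasStyleLayout {
    static String casStylePath(String repoPath, String productName, String versionId, String fileName) {
        // Normalize the trailing slash on the repository path before appending segments.
        String base = repoPath.endsWith("/") ? repoPath : repoPath + "/";
        return base + productName + "/" + versionId + "/" + fileName;
    }
}
```

For example, `casStylePath("file:///data/repo", "myfile", "1.0", "myfile.file")` returns `"file:///data/repo/myfile/1.0/myfile.file"`. Every product type gets the same rule, which is exactly the limitation a pluggable Versioner removes.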
Versioners
• Introduce the concept of a Versioner interface
• The File Manager calls the Versioner before the product is transferred from the client to the File Manager system
– The Versioner uses the Product metadata and the original product references to generate data store URIs, which tell the DataTransfer implementation where to physically transfer the files for a particular Product
Versioner Architecture
Versioner «Interface»
  +createDataStoreReferences(Product product, Metadata metadata)
ProductType «Object»
  +getDescription():String
  +getName():String
  +getProductRepositoryPath():String
  +getProductTypeId():String
  +getVersioner():String
  +setDescription(String description)
  +setName(String name)
  +setProductRepositoryPath(String path)
  +setProductTypeId(String productTypeId)
  +setVersioner(String versioner)
Relationship: each ProductType is associated with one Versioner (named by getVersioner()).
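Since `ProductType.getVersioner()` returns a `String`, one plausible wiring (an assumption, since the slides do not say) is that it names a Versioner class that the File Manager loads reflectively. Here is a sketch with a simplified, hypothetical `SimpleVersioner` interface in place of the real `Versioner`.

```java
import java.util.Map;

// Hypothetical, simplified Versioner contract: compute a data store URI for one file.
interface SimpleVersioner {
    String dataStoreUri(String origUri, Map<String, String> metadata);
}

// Example versioner that puts every file into one flat directory.
final class FlatVersioner implements SimpleVersioner {
    public FlatVersioner() { }
    public String dataStoreUri(String origUri, Map<String, String> metadata) {
        return "file:///repo/flat/" + origUri.substring(origUri.lastIndexOf('/') + 1);
    }
}

final class VersionerLoader {
    // Assumes the product type stores a class name whose class has a no-arg
    // constructor; loads it, checks it implements SimpleVersioner, and instantiates it.
    static SimpleVersioner load(String versionerClassName) throws ReflectiveOperationException {
        Class<?> cls = Class.forName(versionerClassName);
        return cls.asSubclass(SimpleVersioner.class).getDeclaredConstructor().newInstance();
    }
}
```

Storing a class name keeps product type definitions declarative, so new versioning policies can be added without changing the File Manager itself.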
Versioner Example
• Given an mp3 Product with Metadata:
– Mp3Artist: 50cent
– Mp3Genre: rap
• And with references:
– file:///home/chris/mp3s/gangsta-rap.mp3
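• Use a MusicVersioner:
public class MusicVersioner implements Versioner {
    // getRepoPath() and getFileName() are helper methods not shown on this slide
    public void createDataStoreReferences(Product p, Metadata m) throws VersioningException {
        // the Product's first Reference holds the client-side (original) URI
        Reference ref = (Reference) p.getProductReferences().get(0);
        String origUri = ref.getOrigReference();
        // repository root for the mp3 product type (assumed to end in "/")
        String mp3RepoPath = getRepoPath("Mp3ProductTypeName");
        // <repo>/<genre>/<artist>/<file name>
        String dataStoreUri = mp3RepoPath + m.getElementMap().get("Mp3Genre") + "/"
                + m.getElementMap().get("Mp3Artist") + "/" + getFileName(origUri);
        ref.setDataStoreReference(dataStoreUri);
    }
}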
• So
– file:///home/chris/mp3s/gangsta-rap.mp3
• …yields
– file:///path/to/mp3/repo/rap/50cent/gangsta-rap.mp3
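The slides do not show `getFileName` or `getRepoPath`. Assuming `getFileName` returns the last path segment of the original URI and the repository root ends in `/`, this self-contained sketch (hypothetical `MusicPathSketch` class) reproduces the example's result.

```java
import java.net.URI;
import java.util.HashMap;
import java.util.Map;

final class MusicPathSketch {
    // Hypothetical helper: last path segment of a URI, e.g. "gangsta-rap.mp3".
    static String getFileName(String uri) {
        String path = URI.create(uri).getPath();
        return path.substring(path.lastIndexOf('/') + 1);
    }

    // <repo root><genre>/<artist>/<file name>; repoRoot is assumed to end in "/".
    static String dataStoreUri(String repoRoot, Map<String, String> md, String origUri) {
        return repoRoot + md.get("Mp3Genre") + "/" + md.get("Mp3Artist") + "/" + getFileName(origUri);
    }

    public static void main(String[] args) {
        Map<String, String> md = new HashMap<>();
        md.put("Mp3Artist", "50cent");
        md.put("Mp3Genre", "rap");
        // prints file:///path/to/mp3/repo/rap/50cent/gangsta-rap.mp3
        System.out.println(dataStoreUri("file:///path/to/mp3/repo/", md,
                "file:///home/chris/mp3s/gangsta-rap.mp3"));
    }
}
```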
The File Manager
• So, how do we put all these generic interfaces together?
• Something like the following. A File Manager has:
– One or more data stores to store data in
– One or more metadata stores to store metadata in
– A set of Versioners, associated with Product Types, that generate the data store URIs for a particular product’s references
– A Data Transferer that moves a Product’s files from the client to the File Manager using the source URIs and the data store URIs
– An external interface (e.g., XML-RPC or WebDAV)
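The ingest flow implied by this list can be sketched as follows. The File Manager looks up the product type's Versioner, computes each data store URI, and transfers the file. It then records the product and its metadata in their separate stores. The interfaces below are hypothetical simplifications of DataStore, MetadataStore, DataTransfer, and Versioner, not their actual signatures. The external interface is omitted.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, simplified collaborators mirroring the components listed above.
interface UriVersioner { String toDataStoreUri(String origUri, Map<String, String> md); }
interface FileTransfer { void transfer(String fromUri, String toUri); }
interface ProductCatalog { void addProduct(String name, String typeId, List<String> dataStoreUris); }
interface MetadataCatalog { void addMetadata(String name, Map<String, String> md); }

final class SketchFileManager {
    private final Map<String, UriVersioner> versionersByType = new HashMap<>();
    private final FileTransfer transfer;
    private final ProductCatalog dataStore;
    private final MetadataCatalog metadataStore;

    SketchFileManager(FileTransfer t, ProductCatalog ds, MetadataCatalog ms) {
        transfer = t;
        dataStore = ds;
        metadataStore = ms;
    }

    void registerVersioner(String typeId, UriVersioner v) { versionersByType.put(typeId, v); }

    // Version (choose destinations), transfer, then record data and metadata separately.
    void ingest(String name, String typeId, List<String> origUris, Map<String, String> md) {
        UriVersioner v = versionersByType.get(typeId);
        if (v == null) {
            throw new IllegalArgumentException("no versioner for product type " + typeId);
        }
        List<String> dest = new ArrayList<>();
        for (String orig : origUris) {
            String target = v.toDataStoreUri(orig, md);
            transfer.transfer(orig, target);
            dest.add(target);
        }
        dataStore.addProduct(name, typeId, dest);
        metadataStore.addMetadata(name, md);
    }
}
```

Because each collaborator is an interface, a test or a new deployment can plug in different stores or transfer mechanisms without touching `ingest`.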
What’s implemented so far?
• The basic components of the architecture
• Several default implementations of the interfaces
– javax.sql.DataSource-based implementations of DataStore and MetadataStore, using Apache’s DBCP for connection pooling
– A Local Data Transfer using Apache’s commons-io component, which handles hierarchical as well as flat product structures
– Several Versioners, including one that versions Products using the existing CAS approach of ProductTypeRepositoryPath/ProductName/Version, and one that versions a product’s references based on production date/time
– An external interface based on Apache’s XML-RPC
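The slides do not give the date/time Versioner's directory layout. As an illustration only, this sketch assumes a `yyyy/MM/dd` structure under the repository root and takes the production time as a parameter. `DateTimeVersionerSketch` is hypothetical.

```java
import java.net.URI;
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;

// Hypothetical date/time versioner: <repo>/<yyyy>/<MM>/<dd>/<file name>.
final class DateTimeVersionerSketch {
    // '/' is not a pattern letter, so it is emitted literally between the fields.
    private static final DateTimeFormatter DIRS = DateTimeFormatter.ofPattern("yyyy/MM/dd");

    static String dataStoreUri(String repoRoot, LocalDateTime productionTime, String origUri) {
        String path = URI.create(origUri).getPath();
        String fileName = path.substring(path.lastIndexOf('/') + 1);
        String base = repoRoot.endsWith("/") ? repoRoot : repoRoot + "/";
        return base + productionTime.format(DIRS) + "/" + fileName;
    }
}
```

For example, a file produced on 7 March 2005 with original URI `file:///tmp/x.dat` and repository root `file:///data/repo` would map to `file:///data/repo/2005/03/07/x.dat`.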
What needs to be done?
• A lot!
– Check out http://oodt.jpl.nasa.gov/vc/ and log in with your JPL username and password. Navigate to “SVN” and check out the cas-filemgr component.
– Modify the code
– Look for bugs
– Contribute!
• I find new bugs every day
– Feel free to talk to me about them
– Create issues in JIRA (http://oodt.jpl.nasa.gov/jira/)
• Bug fixes, RFIs, new features, you name it!
• Be sure to check out the apidocs
– You can build them yourself by checking out cas-filemgr from our SVN repository and typing: maven site
– Or visit: http://terra.jpl.nasa.gov/~mattmann/oco/javadoc/cas-filemgr/
Questions?