Memops
Data modelling and automatic code generation
Edinburgh 9 September 2008
Memops - main points
■ Code generation framework
■ Data access subroutine libraries
■ Fully automatic code generation from model
■ Several programming languages in parallel
■ Precise, detailed, validated data
Memops
● Introduction
● Code generation
● Generated libraries
● Applications of Memops
The CCPN Project
■ Collaborative Computing Project for NMR
■ Since 1999
■ Unifying platform for NMR software, similar to CCP4 for X-ray crystallography
■ Community-based, open-source software development
■ Code generation, data model, applications, meetings
NMR Structural Biology Pipeline
[Figure: boxes labelled Sample Preparation, NMR Machine, Structure Calculation, Data Processing, Spectrum Analysis and Repository Database; annotation: "Slow, complex, interactive"]
Native Anarchy
[Figure: programs for Task1, Task2 and Task3, linked to one another by many separate "Convert" steps]
With Data Standard
[Figure: programs for Task1, Task2 and Task3, each linked by a "Convert" step to a central Data Standard]
Data standard - objectives
● Lossless data transfer between programs
  - different approaches and architectures
● All data needed for pipeline software
  ■ Creating data, not analysing end results
  ■ Intermediate results needed
  ■ Comprehensive, detailed, complex
● Completeness, integrity of changing data
● Precisely defined standard
  ■ A single central description
  ■ Validation directly against standard
CCPN approach
■ Standard API, no stable format
  ● easier to maintain as model changes
■ Abstract data model
  ● Exact correspondence to APIs
■ API implementations for several languages
■ Transparent access to XML or DB storage
■ Complete validation of model rules and constraints
Automatic Code generation
■ Model will change over time
  ● Several parallel implementations
  ● Synchronisation between APIs and model
  ● Maintenance and debugging
  ● Resources are limited
■ Automatic Code Generation
  ● Write and debug once and for all
  ● Any domain, from Astrophysics to Zoology
  ● Quick and simple to extend model
■ E.g. Application-specific packages
Code Generation Framework
[Figure: code generation framework. Labels: Domain Experts, UML Model (Package 1, Package 2, Package 3), Software Developers, MEMOPS framework, Handcoded (< 1%), Autogeneration, APIs (Python, Java, C), Storage (SQL, XML), Documentation, Wrappers, Application, Deposition, User]
Code Generation
[Figure: code generation flow. Labels: ObjectDomain (edit UML), UML data, Export, MetaModel (in-memory model, Python objects), on-disk model (XML file), Autogeneration, API code, schemas, mappings etc. Legend: CCPN code, off-the-shelf, files, CCPN generated]
API generator
[Figure: generator classes ModelTraverse, TextWriter, ApiGen, PyLanguage, PyType, FileApiGen, PyApiGen, PyFileApiGen]
• Written in Python
• Modular
• Different generators share code
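The class names on this slide suggest generators assembled from a model-traversal part, a text-writing part and language-specific parts. A minimal sketch of that idea follows. The names ModelTraverse, TextWriter and PyApiGen are borrowed from the slide, but their bodies, the dictionary-based toy model and the generated accessor style are assumptions made for illustration, not the Memops code.

```python
class TextWriter:
    """Collects indented lines of generated source text."""

    def __init__(self):
        self.lines = []
        self.indent = 0

    def write(self, text=""):
        self.lines.append("    " * self.indent + text if text else "")

    def getvalue(self):
        return "\n".join(self.lines) + "\n"


class ModelTraverse:
    """Walks a toy model and calls hooks that subclasses fill in."""

    def process(self, model):
        for class_name, attributes in model.items():
            self.start_class(class_name)
            for attr_name, type_name in attributes.items():
                self.process_attribute(attr_name, type_name)
            self.end_class(class_name)


class PyApiGen(ModelTraverse, TextWriter):
    """Emits a Python getter and a type-checking setter per attribute."""

    def start_class(self, class_name):
        self.write(f"class {class_name}:")
        self.indent += 1

    def process_attribute(self, name, type_name):
        cap = name[0].upper() + name[1:]
        self.write(f"def get{cap}(self):")
        self.write(f"    return self.__dict__.get('{name}')")
        self.write()
        self.write(f"def set{cap}(self, value):")
        self.write(f"    if not isinstance(value, {type_name}):")
        self.write(f"        raise TypeError('{name} must be {type_name}')")
        self.write(f"    self.__dict__['{name}'] = value")
        self.write()

    def end_class(self, class_name):
        self.indent -= 1


# Hypothetical toy model: class name -> {attribute name: Python type name}
model = {"Atom": {"name": "str", "elementSymbol": "str"}}

gen = PyApiGen()
gen.process(model)
namespace = {}
exec(gen.getvalue(), namespace)   # compile the generated API code

atom = namespace["Atom"]()
atom.setName("CA")
print(atom.getName())
```

Adding an attribute to the toy model and rerunning the generator yields the matching accessors without any hand-written code, which is the point the slides make about keeping several implementations in step with the model.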
Model features
■ Packages to subdivide model, code, and data files
■ Objects. Unique context, compare-by-identity
■ Complex data types. Different contexts, compare-by-value
■ Simple data types, PositiveInt, enumerations, …
■ Attributes and links:
  ● Cardinality, frozen/modifiable, derived
  ● Unique/ordered collections (sets, lists, unique lists)
■ Ad-hoc constraints on attributes, simple and complex datatypes, and objects.
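The distinction between objects, complex data types and constrained simple data types can be illustrated in a few lines of Python. This is a sketch under stated assumptions: the PositiveInt and Url names come from the slides, but the class bodies, the Residue fields and the snake_case helper names are invented for the example and are not the generated API.

```python
from dataclasses import dataclass


class PositiveInt(int):
    """Simple data type: an int constrained to be greater than zero."""

    def __new__(cls, value):
        if int(value) <= 0:
            raise ValueError(f"PositiveInt must be > 0, got {value}")
        return super().__new__(cls, value)


@dataclass(frozen=True)
class Url:
    """Complex data type: no identity of its own, compared by value."""
    protocol: str
    host: str
    path: str


class Residue:
    """Object: lives in one context and is compared by identity."""

    def __init__(self, seq_code):
        self.seq_code = PositiveInt(seq_code)
        self._atom_names = []           # ordered, unique collection

    def add_atom_name(self, name):
        if name in self._atom_names:    # uniqueness constraint
            raise ValueError(f"duplicate atom name {name!r}")
        self._atom_names.append(name)

    @property
    def atom_names(self):
        return tuple(self._atom_names)  # read-only view of the collection


a = Url("file", "", "/data/project")
b = Url("file", "", "/data/project")
print(a == b, a is b)    # equal by value, yet two distinct instances

r1, r2 = Residue(1), Residue(1)
print(r1 == r2)          # same content, but different identity
```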
Molstructure model package
[Figure: UML class diagram; link multiplicities (1, *) are not recoverable from the extracted text]

StructureEnsemble
  +ensembleId: Int
  +atomNamingSystem: Line
  +resNamingSystem: Line
  +getEnsembleValidations()
Model
  +serial: Int
  +name: Line
  +details: Text
Chain
  +code: Line
  +getChain()
Residue
  +seqId: Int
  +seqCode: Int
  +seqInsertCode: Line =
  +getResidue()
Atom
  +name: Word
  +elementSymbol: Word
  +getAtom()
  +getElementSymbol()
  +getChemAtom()
Coord
  +altLocationCode: Line =
  +x: Float
  +y: Float
  +z: Float
  +bFactor: Float = 0.0
  +occupancy: Float = 1.0

Linked classes from other packages:
  ccp.molecule.MolSystem.MolSystem (+code: Word, +name: Text, +keywords: Line, ...)
  ccp.molecule.MolSystem.Chain
  ccp.molecule.MolSystem.Residue
  ccp.molecule.MolSystem.Atom
  ccp.molecule.ChemComp.ChemAtom
Link role shown: +coordChains
CCPN APIs
■ Application Programming Interface
  ● Object oriented
  ● Data accessed in memory as if stored in the data model
■ Implementations come with:
  ● Integrated, transparent I/O (file or database)
  ● Complete validity checking
  ● Protection against casual change (data encapsulation)
  ● Versioning and backwards compatibility
  ● Event notifier system
  ● Slot for application-specific data
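Three of the listed features (validity checking, data encapsulation and the event notifier) can be sketched together. The following is a toy illustration, not the generated CCPN API: the Notifier class, its register/notify signatures and the occupancy range check are all assumptions made for the example; only the Coord attribute names come from the model diagram.

```python
class Notifier:
    """Registry of callbacks keyed by (class name, function name)."""

    def __init__(self):
        self._callbacks = {}

    def register(self, class_name, func_name, callback):
        self._callbacks.setdefault((class_name, func_name), []).append(callback)

    def notify(self, obj, func_name):
        key = (type(obj).__name__, func_name)
        for callback in self._callbacks.get(key, []):
            callback(obj)


notifier = Notifier()


class Coord:
    """Toy API class: data are reached only through get/set functions."""

    def __init__(self, x=0.0, occupancy=1.0):
        self._data = {}
        self.setX(x)
        self.setOccupancy(occupancy)

    def getX(self):
        return self._data["x"]

    def setX(self, value):
        if not isinstance(value, float):
            raise TypeError("x must be a Float")
        self._data["x"] = value
        notifier.notify(self, "setX")

    def getOccupancy(self):
        return self._data["occupancy"]

    def setOccupancy(self, value):
        # Hypothetical model constraint, invented for this example
        if not isinstance(value, float) or not 0.0 <= value <= 1.0:
            raise ValueError("occupancy must be a Float between 0 and 1")
        self._data["occupancy"] = value
        notifier.notify(self, "setOccupancy")


coord = Coord()
changed = []
notifier.register("Coord", "setX", lambda obj: changed.append(obj.getX()))

coord.setX(12.5)         # valid change: stored, listeners are called
print(changed)
```

Because every change goes through a setter, an invalid value is rejected before it reaches the stored data, and a user interface can stay in sync by registering callbacks instead of polling.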
Python+XML at runtime
[Figure: a user application (science code, user interface, utility functions) calls the Python API (data get, set; validity check). XML I/O code (generic XML read/write), XML I/O mappings (what to do for which element) and an XML parser connect the API to data storage in XML files (user data in CCPN XML format). Legend: CCPN code, off-the-shelf, application code, files, CCPN generated]
Java+DB at runtime
[Figure: a presentation layer (science code, user interface, utility functions) calls the Java API. Hibernate and the Hibernate mappings connect the API to the database and its database schema. Optional: custom queries in HQL (Hibernate Query Language). Legend: CCPN code, off-the-shelf, application code, files, CCPN generated]
Now Available
■ Version 2.0 just released
■ Python+XML, Java+XML, C+XML, Java+DB (with Hibernate)
■ Available under GPL license from Sourceforge or www.ccpn.ac.uk
■ CCPN Data Standard:
  ● NMR, Macromolecules, LIMS
  ● 46 packages
  ● 552 classes and data types
  ● Python+XML implementation: 800,000+ lines of code
CcpNmr Suite
■ Analysis
  ● Interactive NMR analysis
■ FormatConverter
  ● Convert between 30+ NMR and structure formats
■ Built on top of CCPN model (Python+XML)
■ Version 2.0 released
■ Widely used in macromolecular NMR
CcpNmr Analysis
ExtendNMR NMR pipeline
■ Integrated macromolecular NMR pipeline
  - from sample to structure
■ Pre-existing programs from 8 groups
■ In-memory conversion to internal data structures
■ Integrated versions released:
  ● ARIA (NMR structure generation)
  ● Bruker TOPSPIN, manufacturer's processing/analysis package
BIOXDM
■ Software pipeline for on-synchrotron crystallography
  ● Exploit new technology ( goniometers)
  ● Experiment optimisation, acquisition, and on-line processing
■ Independent data model, with Memops machinery
■ Java+DB implementation for runtime concurrent access
EUROCarbDB
■ Distributed deposition database
  ● Glycobiology and glycomics
  ● NMR, MS, HPLC and topology
■ Java. Database storage using Hibernate
■ CCPN model Java+DB implementation slots in as-is
Funding acknowledgements
■ BBSRC CCPN grants
■ European Union grants
  ● EXTEND-NMR, EU-NMR, NMR-Life, NMRQUAL, and TEMBLOR contracts
■ Industry support
  ● AstraZeneca, Dupont Pharma (now BMS), Genentech, GlaxoSmithKline
● Peter Keller (BIOXDM) thanks Synchrotron ‘Soleil’, the Global Phasing Consortium and EU FP6 ‘BIOXHIT’
People
■ Authors: Prof. Ernest Laue, Wayne Boucher, Rasmus Fogh, Tim Stevens, John Ionides, Wim Vranken (EBI), Peter Keller (Global Phasing)
■ Collaborators at U. Cambridge: Dan O’Donovan, Wolfgang Rieping, Alan da Silva, Darima Lamazhapova
■ Collaborators at EBI (MSD), Hinxton: Kim Henrick, Anne Pajon, Chris Penkett
■ Special thanks to: Bruker Biospin GmbH (TOPSPIN), Michael Nilges (ARIA), Bas Leeflang (EUROCarbDB; FP6 contract RIDS-CT-2004-01195
END
Overview
● Packages
● The Implementation package
  ■ Objects
  ■ DataTypes and DataObjTypes
● Access control
ARIA – structure generation from NMR data
[Figure: Application, ARIA Data Model and ARIA XML on one side, CCPN Data Model and CCPN XML on the other, joined by a custom conversion]
■ ARIA imports
  ● Peak Lists
  ● Constraints
  ● Sequences
  ● Chemical shifts
■ ARIA exports
  ● Peak Assignments
  ● Filtered Constraints
  ● Violations
  ● Structures
API functions
■ ‘get’ and ‘set’ (Attributes and links)
■ ‘add’ and ‘remove’ (Collection attributes and links)
■ ‘sorted’ (Unordered collection links)
■ ‘findFirst’ and ‘findAll’ (Collection links)
  ● Simple filtering (attribute == value)
■ create and ‘new’ (Objects)
  ● Normal and ‘factory function’ object creation
■ delete (Objects)
  ● ‘Delete’ function – cascades to objects rendered invalid by deletion
■ checkValid, checkAllValid (Objects)
■ API classes are strongly coupled. For efficiency reasons object-to-object links are two-way.
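The function families listed on this slide can be sketched with two toy classes. The function names below (newAtom, getAtoms, sortedAtoms, findFirstAtom, findAllAtoms, delete) only mimic the naming pattern on the slide; the class bodies and keyword-argument filtering are assumptions for illustration and should not be read as the generated CCPN API.

```python
class Atom:
    def __init__(self, residue, name, elementSymbol):
        self.residue = residue          # link back to the parent (two-way)
        self.name = name
        self.elementSymbol = elementSymbol
        self.isDeleted = False

    def delete(self):
        self.isDeleted = True
        self.residue._atoms.pop(self.name, None)


class Residue:
    def __init__(self, seqCode):
        self.seqCode = seqCode
        self._atoms = {}                # child objects keyed by name
        self.isDeleted = False

    def newAtom(self, name, elementSymbol):
        """Factory function: create a child and link it in both directions."""
        if name in self._atoms:
            raise ValueError(f"Atom {name!r} already exists")
        atom = Atom(self, name, elementSymbol)
        self._atoms[name] = atom
        return atom

    def getAtoms(self):
        return frozenset(self._atoms.values())

    def sortedAtoms(self):
        return sorted(self._atoms.values(), key=lambda atom: atom.name)

    def findAllAtoms(self, **conditions):
        """Simple filtering: every keyword must satisfy attribute == value."""
        return [atom for atom in self.sortedAtoms()
                if all(getattr(atom, key) == value
                       for key, value in conditions.items())]

    def findFirstAtom(self, **conditions):
        found = self.findAllAtoms(**conditions)
        return found[0] if found else None

    def delete(self):
        """Deleting a parent cascades to the children it would orphan."""
        for atom in list(self._atoms.values()):
            atom.delete()
        self.isDeleted = True


residue = Residue(seqCode=7)
residue.newAtom("CA", "C")
residue.newAtom("N", "N")
residue.newAtom("CB", "C")

carbons = residue.findAllAtoms(elementSymbol="C")
print([atom.name for atom in carbons])

first_n = residue.findFirstAtom(name="N")
residue.delete()
print(first_n.isDeleted)
```

The two-way link (each Atom holds its Residue, each Residue holds its Atoms) is what lets delete cascade cheaply, at the cost of the strong coupling the slide mentions.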
FormatConverter - The NMR Translator
[Figure: format-specific readers (XEasy, NmrView, Bruker, Varian, ...) pass peaks, chemical shifts, acquisition parameters and processing parameters to a generic peak converter, a generic chemical shift converter and a generic acquisition parameters converter for data model entry into the CCPN Data Model; format-specific writers (XEasy, NmrView, NMRPipe, Azara, ...) export peaks and chemical shifts]
ExtendNMR: ARIA
■ Structure generation from macromolecular NMR data, ambiguous distance constraints
■ One of two leading programs
■ Python and scripts, with CNS dynamics engine
■ All input and output integrated to CCPN standard
ARIA: CCPN object selection
ExtendNMR: Bruker TOPSPIN
■ NMR processing program of major NMR instrument company
■ Java. In-memory conversion to CCPN Java+XML implementation
■ CCPN output in current TOPSPIN release, expanded in upcoming release.
Data Model v. Data Format
Relational Database:
  Atom (Atom_ID, elementName)
  Atom_Bond_Connect (Bond_ID, Atom_ID)
  Bond (Bond_ID, bondOrder)

Abstract model (UML):
  Atom
    +elementName: String = C
  Bond
    +bondOrder: Float = 1.0
  Link: Atom (2, +atoms) to Bond (*, +bonds)

XML:
  <Atom ID="AT1" elementName="C">
    <BondList>
      <Bond IDREF="BD1"/>
      .
    </BondList>
  </Atom>
  <Bond ID="BD1" bondOrder="1.0">
    <Atom1 IDREF="AT1"/>
    <Atom2 IDREF="AT2"/>
    .
  </Bond>
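The point of this slide, that one abstract model can be written out in more than one format, can be tried with the Python standard library. This is a sketch under stated assumptions: the table and attribute names follow the slide, while the dictionary representation, the wrapper element "Molecule" and the flat XML layout are invented for the example and are not the CCPN XML format.

```python
import sqlite3
import xml.etree.ElementTree as ET

# In-memory model: two atoms joined by one bond (toy data)
atoms = {"AT1": "C", "AT2": "C"}
bonds = {"BD1": {"bondOrder": 1.0, "atoms": ("AT1", "AT2")}}


def to_xml(atoms, bonds):
    """Write the model as XML elements cross-referenced by ID."""
    root = ET.Element("Molecule")       # hypothetical wrapper element
    for atom_id, element in atoms.items():
        ET.SubElement(root, "Atom", ID=atom_id, elementName=element)
    for bond_id, bond in bonds.items():
        node = ET.SubElement(root, "Bond", ID=bond_id,
                             bondOrder=str(bond["bondOrder"]))
        for index, atom_id in enumerate(bond["atoms"], start=1):
            ET.SubElement(node, f"Atom{index}", IDREF=atom_id)
    return ET.tostring(root, encoding="unicode")


def to_sql(atoms, bonds):
    """Write the same model as three relational tables."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE Atom (Atom_ID TEXT PRIMARY KEY, elementName TEXT)")
    db.execute("CREATE TABLE Bond (Bond_ID TEXT PRIMARY KEY, bondOrder REAL)")
    db.execute("CREATE TABLE Atom_Bond_Connect (Bond_ID TEXT, Atom_ID TEXT)")
    db.executemany("INSERT INTO Atom VALUES (?, ?)", list(atoms.items()))
    for bond_id, bond in bonds.items():
        db.execute("INSERT INTO Bond VALUES (?, ?)",
                   (bond_id, bond["bondOrder"]))
        db.executemany("INSERT INTO Atom_Bond_Connect VALUES (?, ?)",
                       [(bond_id, atom_id) for atom_id in bond["atoms"]])
    return db


xml_text = to_xml(atoms, bonds)
db = to_sql(atoms, bonds)
linked = db.execute(
    "SELECT Atom_ID FROM Atom_Bond_Connect "
    "WHERE Bond_ID = 'BD1' ORDER BY Atom_ID"
).fetchall()
print(xml_text)
print(linked)
```

Application code that works on the in-memory model never sees either layout, so the storage format can change without touching it.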
Packages
[Figure: package diagram showing ChemElement, ChemComp, Molecule, MolStructure, MolSystem, memops.AccessControl and memops.Implementation]
Packages
■ Partition model, code, and data
■ Import each other
■ Can be omitted
■ All import Implementation and AccessControl
■ Each has a TopObject
■ No links between data from rival TopObjects (different extents of data)
Root and TopObjects
[Figure: UML class diagram; link multiplicities (1, *, 2) are not recoverable from the extracted text]

memops.Implementation.MemopsRoot
  +name: Word = ccpProject
  +override: Boolean = False
  +currentUserId: Word = user
  +newGuid()
  +getPackageLocator()
memops.Implementation.TopObject
  +guid: Line
  +getPackageLocator()
ccp.molecule.Molecule.Molecule
ccp.molecule.Molecule.MolResidue
ccp.molecule.ChemComp.ChemComp
ccp.molecule.ChemComp.AbstractChemAtom
ccp.molecule.ChemComp.ChemAtom
ccp.molecule.ChemComp.ChemBond
Link roles shown: +currentMolecule, +currentChemComp, +chemAtoms
TopObjects
■ One in every package
  ● Ultimate parent to all objects in package
■ Have globally unique identifier (‘guid’)
■ currentXyz links from root
■ Links can constrain links between descendants
■ In file implementations:
  ● Hold links to storage and backup locations
  ● Live in Implementation as almost empty shell
CcpNmr Analysis
■ NMR Assignment Program
  ● Inspired by ANSIG and Sparky
  ● Demonstrates CCPN approach
  ● Modern interface and scripting
  ● Scalable and extensible
■ Operating Systems
  ● Linux, Sun, SGI, OSX, Windows
■ Languages
  ● Python
    ■ Data model interaction
    ■ Tk Graphical interface
    ■ Scripting
  ● C
    ■ OpenGL/Tk contours
    ■ Structure display
    ■ Mathematical operations
Implementation Package
■ Model and Code:
  ● Supertypes that define all objects
    ■ Objects
    ■ DataTypes
    ■ DataObjTypes
  ● Basic data types
■ Data – how to access the real data:
  ● Data location pointers
  ● Current-package pointers
  ● Implementation data are not part of the data set, and are not in the database.
  ● Represent view or session?
Data Location
[Figure: UML class diagram; most link multiplicities are not recoverable from the extracted text]

MemopsRoot
  +name: Word = ccpProject
  +override: Boolean = False
  +currentUserId: Word = user
  +newGuid()
  +getPackageLocator()
TopObject
  +guid: Line
  +getPackageLocator()
FileStorageObject
  +isLoaded: Boolean
  +isModified: Boolean
  +isReading: Boolean
  +isModifiable: Boolean = True
  +createdBy: Word
  +lastUnlockedBy: Word
  +setIsModifiable()
  +touch()
  +saveTo(repository)
  +removeFrom(repository)
  +save()
  +backup()
Repository
  +name: Line
  +format: StorageFormat = xml
  +url: Url
  +getFileLocation(packageName)
PackageLocator
  +targetName: Word = any
Link roles shown: +repositories {ordered}, +activeRepositories, +backedUp / +backup, +stored / +repositories {ordered} (* to 1..*)
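How a package's data might be located through these classes can be sketched as follows. The class names, targetName default "any", the ordered repositories link and getFileLocation(packageName) come from the diagram. Everything else is an assumption made for illustration: the fallback to the "any" locator, the one-directory-per-name-component path layout, the packageName argument to getPackageLocator, the addPackageLocator helper and the example paths.

```python
import posixpath


class Repository:
    def __init__(self, name, url):
        self.name = name
        self.url = url

    def getFileLocation(self, packageName):
        # Assumption: one directory level per package name component
        return posixpath.join(self.url, *packageName.split("."))


class PackageLocator:
    def __init__(self, targetName="any", repositories=()):
        self.targetName = targetName
        self.repositories = list(repositories)   # ordered link


class MemopsRoot:
    def __init__(self):
        self._locators = {}

    def addPackageLocator(self, locator):         # hypothetical helper
        self._locators[locator.targetName] = locator

    def getPackageLocator(self, packageName):
        """Return the locator for the package, else the 'any' fallback."""
        return self._locators.get(packageName, self._locators["any"])


root = MemopsRoot()
user_data = Repository("userData", "/home/user/project")
ref_data = Repository("refData", "/opt/ccpn/data")
root.addPackageLocator(PackageLocator("any", [user_data]))
root.addPackageLocator(
    PackageLocator("ccp.molecule.ChemComp", [user_data, ref_data]))

locator = root.getPackageLocator("ccp.molecule.MolSystem")
path = locator.repositories[0].getFileLocation("ccp.molecule.MolSystem")
print(path)
```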
Objects and their Supertypes
[Figure: UML class diagram; inheritance arrows and most link multiplicities are not recoverable from the extracted text]

ImplementationObject
MemopsObject
  +isDeleted: Boolean
  +getExpandedKey()
DataObject
  +applicationData: ApplicationData
MemopsRoot
  +name: Word = ccpProject
  +override: Boolean = False
  +currentUserId: Word = user
  +newGuid()
  +getPackageLocator()
TopObject
  +guid: Line
  +getPackageLocator()
FileStorageObject
  +isLoaded: Boolean
  +isModified: Boolean
  +isReading: Boolean
  +isModifiable: Boolean = True
  +createdBy: Word
  +lastUnlockedBy: Word
  +setIsModifiable()
  +touch()
  +saveTo(repository)
  +removeFrom(repository)
  +save()
  +backup()
FileMemopsRoot
  +saveModified()
  +saveAll()
  +refreshTopObjects(packageName)
  +backupAll()
  +importData(filePath)
FileTopObject
  +loadFrom(repository)
  +load()
  +restore()
DbMemopsRoot
DbTopObject
ComplexDataType «DataType»
  +className: Word
  +packageName: Word
  +packageShortName: Word
  +qualifiedName: Line
  +inConstructor: Boolean
  +getQualifiedName()
ccp.molecule.Molecule.Molecule
ccp.molecule.Molecule.MolResidue
Link roles shown: +topObject (1), +root (1), +currentMolecule
Simple Data Types
Boolean DataType
Int DataType
Float DataType
String DataType
Line DataType
Text DataType
Long DataType
Double DataType
Word DataType
PositiveInt DataType
SingleLine DataType
NonNegativeInt DataType
Dict DataType
DateTime DataType
StringKeyDict DataType
Any DataType
Token DataType
NonNegativeFloat DataType
FloatRatio DataType
PositiveFloat DataType
SpacelessString DataType
LongWord DataType
PositiveDouble DataType
NonNegativeDouble DataType
UrlProtocol DataType
Complex Data Types
ComplexDataType «DataType»
  +className: Word
  +packageName: Word
  +packageShortName: Word
  +qualifiedName: Line
  +inConstructor: Boolean
  +getQualifiedName()
MemopsDataTypeObject «DataType»
  +override: Boolean
  +endOverride()
Url «DataType»
  +protocol: UrlProtocol = file
  +user: Line
  +password: Line
  +host: Line
  +path: PathString
  +port: Int
  +dataLocation: PathString
  +getDataLocation()
ApplicationData «DataType»
  +application: Line
  +keyword: Line
AppDataBoolean «DataType»
  +value: Boolean
AppDataDouble «DataType»
  +value: Double
AppDataFloat «DataType»
  +value: Float
AppDataInt «DataType»
  +value: Int
AppDataLong «DataType»
  +value: Long
AppDataString «DataType»
  +value: String