DESCRIPTION
The DOHMH (NYC Department of Health and Mental Hygiene) uses MongoDB for its internal document management system, DocSpace. This presentation outlines:
- the system
- how we came to adopt MongoDB
- migrating from a relational DB to a document-oriented one
- the advantages and disadvantages we've encountered and how we have managed them
- next steps with MongoDB
Why EDP chose MongoDB
Artyom Diky
William Biesty
Mark Velez
Agenda
• Who are we?
• Evolution of Document Management
• File system to relational DB
• Relational to document-oriented DB
• Paper to electronic
• Advantages and Challenges
• Questions?
Who Are We?
• New York City Department of Health and Mental Hygiene
• Environmental Health Services (EHS)
• Environmental Disease Prevention (EDP)
• Lead Poisoning Prevention Program (LPPP)
• MIS Unit (we are here)
• We support many programs within EDP
• Who are our stakeholders?
• Inspectors
• Researchers
• Clinical Staff
• Lawyers (FOIL)
Evolution of Document Management: Paper
• Many legal documents on paper
• Historic: from the 1970s onward
• Current (ongoing)
• Problems with paper
• Time- and labor-intensive
• Locate, copy, redact, copy, mail (repeat…)
• Storage Space
• Disaster Recovery
Evolution of Document Management: eFiles
• VB6
• Scanning utilities
• File-system based storage
• Millions of files
• Identifiers based on child ID
Evolution of Document Management: eFiles Issues
• Technical
• VB6 phased out
• Outdated third-party tools whose APIs changed
• License expired
• Security
• Documents have been redacted permanently
• No access control to private information
• Scalability
• New document types
• New indexing (tagging) mechanisms for search
Evolution of Document Management
• Need for better document management
• Paperless offices mandate
• Expand searchable attributes and document text
• Update technology
• Improved security
• HIPAA compliance
• Platform for future applications
File System to Relational DB
• Challenges:
• 1M+ historical documents as image files
• Need for document metadata
• Various and evolving schemas
• Security
• Updates and migration
• Fail-safe storage
Technologies
• We use Microsoft technologies
• SQL Server
• .NET
• We are a small team that develops and supports dozens of data collection apps (forms)
• Risk assessments
• Inspection Reports
• Research
• Case Management
Example Documents
• Metadata fields shown: event_date, child ID, document_type, me_num
File System to Relational DB
• MSSQL 2008
o Data storage with FileStream
o Metadata with Entity-Attribute-Value (sql_variant values)
o Data-driven application design
• Rich service-oriented API through WCF
• Search engine
• Added features
o Versioning: change and revert
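The Entity-Attribute-Value design stores each metadata field as its own row. A minimal sketch, assuming a hypothetical row shape of (document_id, attribute, value) and using `child_id` as a stand-in for the "child ID" field, shows the pivot the application has to perform to turn those rows back into one record per document. The sample rows are invented for illustration.

```python
from collections import defaultdict
from typing import Any, Dict, Iterable, List, Tuple

# Hypothetical EAV row shape: (document_id, attribute_name, value).
# In SQL Server the value column would be sql_variant; here it is Any.
EavRow = Tuple[int, str, Any]


def eav_to_documents(rows: Iterable[EavRow]) -> List[Dict[str, Any]]:
    """Pivot EAV rows into one dict (a "document") per document_id."""
    docs: Dict[int, Dict[str, Any]] = defaultdict(dict)
    for doc_id, attribute, value in rows:
        doc = docs[doc_id]
        doc.setdefault("_id", doc_id)  # first key of each document
        doc[attribute] = value
    # Sort by id so the output order is deterministic.
    return [docs[doc_id] for doc_id in sorted(docs)]


# Hypothetical sample rows; attribute names follow the example documents.
rows = [
    (1, "document_type", "Inspection Report"),
    (1, "event_date", "2012-03-14"),
    (1, "child_id", 1001),
    (2, "document_type", "Risk Assessment"),
    (2, "child_id", 1002),
]

for document in eav_to_documents(rows):
    print(document)
```

Adding a new attribute only means adding rows, which is where EAV gets its flexibility; the cost is that every read has to reassemble the record, and value typing rests on the single variant column.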
DocSpace SQL Architecture
Limitations of Relational Model
• Need faster development cycle
• Double effort for development and maintenance
• At both the application and database levels
• Document definition (metadata) first, content later
• Changing schema
• Rigid document structure
• Not amenable to change
• No support for non-primitive values
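To make the point about non-primitive values concrete: a single-valued EAV column can only hold a primitive, so arrays and sub-documents have to be encoded into the attribute name. The sketch below assumes the same hypothetical (document_id, attribute, value) rows and a dotted-name encoding chosen for illustration. It is not the encoding DocSpace used. A document store keeps the nested value as-is.

```python
from typing import Any, Dict, List, Tuple

EavRow = Tuple[int, str, Any]


def _flatten(doc_id: int, name: str, value: Any, rows: List[EavRow]) -> None:
    """Recursively encode nested structure into dotted attribute names."""
    if isinstance(value, dict):
        for key, item in value.items():
            _flatten(doc_id, f"{name}.{key}" if name else key, item, rows)
    elif isinstance(value, list):
        for index, item in enumerate(value):
            _flatten(doc_id, f"{name}.{index}", item, rows)
    else:
        # Only primitives reach the single-valued EAV column.
        # Note: empty dicts/lists produce no rows and are silently lost.
        rows.append((doc_id, name, value))


def flatten_to_eav(doc_id: int, document: Dict[str, Any]) -> List[EavRow]:
    """Turn one nested document into hypothetical EAV rows."""
    rows: List[EavRow] = []
    _flatten(doc_id, "", document, rows)
    return rows


# Hypothetical document with an array and a sub-document.
document = {
    "child_id": 1001,
    "tags": ["lead", "inspection"],
    "address": {"borough": "Brooklyn", "zip": "11201"},
}

for row in flatten_to_eav(7, document):
    print(row)
```

Every read must reverse this encoding, and information such as an empty list disappears entirely, which is the kind of translation work the later slides list as a reason to move.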
Effects on Development Cycle
• SQL: waterfall-like approach
• Fully develop requirements before implementation
• Get the schema right up front to avoid rework
• Change discouraged
• MongoDB: rapid application development
• Prototyping
• Change accommodated
Document Management System Done Right
• Faster development cycles
• No translation of complex document structure into relational model
• Application-driven schema
• Document content first, metadata later
• Flexible document structure driven by user requirements
• GridFS for large documents
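GridFS is MongoDB's convention for files larger than the 16 MB BSON document limit, such as large scanned images. A file is split into ordered chunks in a chunks collection, with one metadata document in a files collection. The sketch below mimics that layout with plain dictionaries so it runs without a server. The field names `files_id`, `n`, `data`, `length` and `chunkSize` follow the GridFS specification, while the file id and sample bytes are hypothetical. In practice a driver's GridFS API (for example PyMongo's `gridfs` module) does this work and also records fields such as `filename` and `uploadDate`.

```python
from typing import Any, Dict, List, Tuple

# GridFS's default chunk size is 255 KiB.
DEFAULT_CHUNK_SIZE = 255 * 1024


def split_into_chunks(
    file_id: Any, data: bytes, chunk_size: int = DEFAULT_CHUNK_SIZE
) -> Tuple[Dict[str, Any], List[Dict[str, Any]]]:
    """Build a files-collection document and its ordered chunk documents."""
    files_doc = {"_id": file_id, "length": len(data), "chunkSize": chunk_size}
    chunks = [
        {"files_id": file_id, "n": n, "data": data[offset:offset + chunk_size]}
        for n, offset in enumerate(range(0, len(data), chunk_size))
    ]
    return files_doc, chunks


def reassemble(chunks: List[Dict[str, Any]]) -> bytes:
    """Concatenate chunk payloads in order of their sequence number n."""
    return b"".join(c["data"] for c in sorted(chunks, key=lambda c: c["n"]))


# A hypothetical 600 KiB scanned image (all zero bytes for illustration).
scan = bytes(600 * 1024)
files_doc, chunks = split_into_chunks("scan-0001", scan)
print(files_doc["length"], [len(c["data"]) for c in chunks])
# prints: 614400 [261120, 261120, 92160]
```

Because each chunk carries its sequence number `n`, chunks can be read back in any order and reassembled, or a byte range can be served by fetching only the chunks that cover it.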
DocSpace MongoDB Architecture
Case Study - Traffic Fatalities
• A study of traffic-related fatalities in NYC
• Injury Surveillance and Prevention
• Offline data collection
• 330+ data points
• Multiple weekly changes to schema
o Add/remove fields
o Value types
• Developed in 500 hrs (3 months)
• 1 intermediate developer, 1 novice
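With weekly schema changes, documents written before and after a change coexist in the same collection, because MongoDB does not enforce a schema unless validation is configured. A common way to cope is to store a version number in each document and upgrade older documents in application code when they are read. The sketch below assumes that approach. It is not necessarily what the case-study team did, and the field names (`schema_version`, `victim_age`, `helmet_used`, `weather_notes`) are hypothetical.

```python
from typing import Any, Dict


def upgrade(doc: Dict[str, Any]) -> Dict[str, Any]:
    """Bring a stored document up to the current schema version on read."""
    doc = dict(doc)  # work on a copy; the stored document is untouched
    version = doc.get("schema_version", 1)  # unversioned documents count as v1

    if version < 2:
        # Hypothetical v2 change (value type): victim_age from string to int.
        age = doc.get("victim_age")
        if isinstance(age, str):
            doc["victim_age"] = int(age) if age.strip() else None
        version = 2

    if version < 3:
        # Hypothetical v3 change (add/remove fields):
        # helmet_used added, weather_notes removed.
        doc.setdefault("helmet_used", None)
        doc.pop("weather_notes", None)
        version = 3

    doc["schema_version"] = version
    return doc


old = {"_id": 1, "victim_age": "34", "weather_notes": "rain"}
print(upgrade(old))
```

Each step only runs for documents older than that version, so a document already at the latest version passes through unchanged, and old documents can be rewritten lazily rather than in one bulk migration.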
Evolving Use of MongoDB
• Single Node with Database Security
• Nightly Dump for Backup Archiving
• Master – Slave Nodes
• Replica Sets – 3 Nodes
• Distributed across Metropolitan Area Network
• Bare Iron Primary, VMware ESX and Hyper-V VM Secondaries
• Hurricane Sandy: no downtime, even though one node failed
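Surviving the loss of one node is consistent with how replica sets elect a primary. A primary can be elected only while a strict majority of voting members can reach each other, so three voters need two and can lose one. A minimal sketch of that arithmetic follows. The configuration document is hypothetical: the host names are placeholders, and the higher `priority` on the bare-iron node, which makes a member more likely to be elected primary, is an assumption.

```python
from typing import Any, Dict

# Hypothetical three-member replica set configuration, in the shape accepted
# by the replSetInitiate command. Host names are placeholders.
config: Dict[str, Any] = {
    "_id": "rs0",
    "members": [
        {"_id": 0, "host": "bare-iron.example:27017", "priority": 2},
        {"_id": 1, "host": "esx-vm.example:27017", "priority": 1},
        {"_id": 2, "host": "hyperv-vm.example:27017", "priority": 1},
    ],
}


def voting_members(cfg: Dict[str, Any]) -> int:
    """Count members with a vote (votes defaults to 1 when omitted)."""
    return sum(1 for m in cfg["members"] if m.get("votes", 1) > 0)


def majority(voters: int) -> int:
    """Votes needed to elect a primary: a strict majority."""
    return voters // 2 + 1


def failures_tolerated(voters: int) -> int:
    """Voting members that can be lost while a primary can still be elected."""
    return voters - majority(voters)


n = voting_members(config)
print(n, majority(n), failures_tolerated(n))  # prints: 3 2 1
for size in (3, 4, 5):
    print(size, failures_tolerated(size))
```

The loop shows why a fourth voting member would not add resilience: four voters need three, so they still tolerate only one failure. Against a live deployment, a configuration document like this could be passed to the `replSetInitiate` command, for example through PyMongo's `client.admin.command("replSetInitiate", config)`.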
Thank you!
Questions
Contact Us
William Biesty, Database Administrator
Art Diky, Software Engineer
Mark Velez, Software Engineer
nyc.gov/health