How Apollo Group Evaluated MongoDB



Brig Lamoreaux of Apollo Group worked with his colleagues to put together this white paper detailing their evaluation of MongoDB. He also presented at Oracle OpenWorld 2012 on their use case with MongoDB.


By Brig Lamoreaux, Forward Engineering Project Lead, Apollo Group

New York: 578 Broadway, New York, NY 10012 · London: 200 Aldersgate St., London EC1A · US (866) 237-8815 · INTL (650) 440-4474

Introduction: A Strategic Initiative

For most people, Apollo Group is best known as the parent company of the University of Phoenix. We educate 350,000+ students a year, reinventing higher education for today's working learner.

Always a leader in harnessing technology in the service of its students, Apollo Group found itself at a critical juncture in 2011. Our organization was planning a strategic initiative to create a cloud-based learning management platform. This initiative would require significant changes to the existing IT infrastructure in order to accommodate a substantial increase in the number and diversity of users, data, and applications.

We set out to evaluate which technologies would be best suited to support the platform. As part of this process, we evaluated MongoDB, a NoSQL document-oriented database. We tested MongoDB in several cloud-based, on-premise, and hybrid configurations, under a variety of stress conditions. MongoDB's ease of use, performance, availability, and cost effectiveness exceeded our expectations, leading us to choose it as one of the platform's underlying data stores. This paper describes the process and outcomes of our assessment.

The Challenges

We faced the following challenges with our existing infrastructure while planning for the new platform:

  • Scalability. We were unable to scale our current system to support the anticipated number of users and volume of content, which would increase significantly as we added applications to the platform.

  • Technology Fit. Much of the data targeted for the platform was semi-structured and thus not a natural fit with relational databases.

Our Approach

The Apollo IT team driving the initiative knew that it was contending with an aging infrastructure.
The Oracle system had been in place for nearly 20 years and would have neither the flexibility nor the capacity to meet our future needs. We decided to look for a solution with a better technological and financial fit, and the team developed a short list of potential solutions. While we strongly favored a solution that was already in-house, we added MongoDB to the short list because our research indicated that it might provide excellent query performance with less investment in software licenses and hardware than other solutions. However, our primary concern with MongoDB was that we had no hands-on experience with it.

Apollo management tasked the Forward Engineering group within IT (my team) with assessing MongoDB. We responded with an evaluation process designed to determine, in a rigorous yet time-sensitive manner, whether it would suit our needs.

The Purpose of This Paper

Our goal in producing this paper is to help other organizations with the type of analysis we applied to MongoDB. In eight weeks, we were able to produce relevant and useful data on the behavior of MongoDB when deployed in the Amazon Elastic Compute Cloud (Amazon EC2) that we are eager to share with the community. This paper also identifies areas for additional research on the behavior of MongoDB under stress conditions.

The Evaluation Process

Each year, the Forward Engineering team evaluates dozens of new technologies for use within Apollo Group. Given this level of experience, we were confident in our ability to evaluate the capabilities of MongoDB and to determine whether it would meet our requirements.

The Process

Our goal was to complete the evaluation in eight weeks. We divided the evaluation into four phases of two weeks each.
Each phase had pre-defined goals.

Table 1: Project Phase Schedule & Objectives

  • Phase 0 (2 weeks): Launch a cross-functional team of stakeholders; agree on goals and objectives; gather query usage data from the legacy Oracle system.
  • Phase 1 (2 weeks): Develop the MongoDB data model; develop a sample use case application and a generic service layer to access the data store; stand up a small MongoDB server.
  • Phase 2 (2 weeks): Stand up a five-node MongoDB deployment; develop Apollo's runbook for MongoDB.
  • Phase 3 (2 weeks): Performance test.

Phase 0

At the outset, our mission was somewhat loosely defined: to learn about MongoDB and to determine its suitability as a data store.

FORM TEAM OF STAKEHOLDERS

First, we launched a cross-functional team of stakeholders to determine objectives and to guide the project. Stakeholders included representatives from Apollo's business side, the database administration group, application development, and Forward Engineering.

IDENTIFY GOALS AND OUTCOMES

After some discussion, the Stakeholder Team decided that the evaluation process should meet the following goals:

  • Learn how to design and deploy a large MongoDB farm, and document it in a runbook for MongoDB.
  • Learn how to maintain and troubleshoot production-scale deployments of MongoDB, documenting it in the runbook.
  • Determine how Apollo should organize and train its teams to support a MongoDB deployment. This meant understanding the different roles needed (e.g., system administrator, developer, database administrator) and the expertise required of each person on the team.
  • Answer the team's questions about MongoDB (summarized in Table 2).

The stakeholder team sought to answer the following questions about MongoDB to determine whether it would be a suitable data store for the platform.

Table 2: Evaluation Questions for MongoDB

  • Resiliency: Is MongoDB robust enough to be a critical component in Apollo's next-generation platform? If failures occur, how does MongoDB respond?
  • Stability: MongoDB is relatively new. Is it high-quality enough to support our infrastructure without unexpected failures?
  • Adaptability of Data Model: If the data model needs to change, can this be done quickly and efficiently in MongoDB? How do changes to the data model impact the applications and services that consume it?
  • Performance: Does MongoDB perform well enough to serve a massive application and user base, without creating delays and a poor user experience? What is the performance of MongoDB when deployed on Amazon EC2? Is response time acceptable? Is throughput sufficient?
  • Configuration Flexibility: How suitable is MongoDB for a hybrid deployment with both cloud-based and on-premise components?
  • Time to Implement: How long does it take to install and deploy a production MongoDB configuration?
  • Administrator Functionality: How difficult is it to administer MongoDB, including tasks like performing backups, adding and removing indexes, and changing out hardware?
  • Training: What current and ongoing training do Apollo operations staff and developers need if we adopt MongoDB?
  • Data Migration & Movement: How should we migrate data from our current Oracle data stores into MongoDB? Once deployed, how should we load data into MongoDB on an ongoing basis? How can data be retrieved from MongoDB?
  • Conformity with Company & Industry Standards: Since MongoDB is not yet a corporate standard for Apollo, does it fit well with the rest of Apollo Group's technical infrastructure? Is MongoDB an industry de facto standard? If not, is it well positioned to become one?
  • Quality & Availability of Support: If something goes wrong with our MongoDB configuration, can we get qualified, top-notch assistance even in the middle of the night or on a holiday?
IDENTIFY A REALISTIC USE CASE

Shortly after organizing our stakeholder team and setting our goals, the team identified a good use case that would let us conduct an apples-to-apples comparison with our Oracle configuration. In creating a use case, the team focused on commonly-used, discrete functionality from the Oracle system:

  • For a specific student, retrieve a list of classes in which the student is enrolled.
  • For a specific class, retrieve a list of the students enrolled.
  • For a specific instructor, retrieve a list of the classes taught.

In Oracle, the data required to serve these queries was stored in 6 tables.

GATHER BASELINE METRICS FROM LEGACY SERVER

In order to have a basis for comparison, we pulled the actual historical usage logs for the student, course, and instructor tables in the legacy Oracle system. These logs covered the previous four weeks; they contained the queries triggered and their performance from whenever students, instructors, or administrators examined class schedules and class rosters.

ANALYZE LEGACY QUERY PERFORMANCE & IDENTIFY EVALUATION METRICS

Upon analyzing the query usage data from the legacy Oracle system, we discovered the following (see Figure 1):

  • The Oracle system was able to perform about 450 queries per second with acceptable response times. The production MongoDB system would need to handle at least the same load.
  • Logs revealed that Query 1, a query to return all courses for a given student, dominated all other queries. Four weeks of historical data showed that Query 1 was executed 15.6 million times out of 33.6 million query executions total, making it responsible for nearly 50% of all query executions. Furthermore, the top 5 queries comprised over 85% of all queries.

[Figure 1: Actual Oracle Query Usage. Bar chart of execution counts by query.]
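As a quick sanity check on those proportions, a few lines of Python reproduce the headline number from the log analysis. Only Query 1's count is published in the paper, so only its share is computed here:

```python
# Execution counts taken from the four-week Oracle log analysis above.
total_executions = 33_600_000   # all query executions in the window
query1_executions = 15_600_000  # "return all courses for a given student"

query1_share = query1_executions / total_executions
print(f"Query 1 share: {query1_share:.1%}")  # about 46.4%, i.e. "nearly 50%"
```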
Phase 1

Phase 1 involved implementing our use case with real-world data on a single-server configuration of MongoDB.

DESIGN THE DATA MODEL

The most fundamental difference between Oracle and MongoDB is the data model. In Oracle, all data is stored in relational tables, and most queries require joins of these tables. MongoDB, on the other hand, uses neither tables nor joins. Instead, it uses a document-based approach where data that is usually accessed together is also stored together in the same MongoDB document.

In Oracle, our use case's data was stored in 6 relational tables with several indexes, and accessed via complex SQL queries that used several joins. We needed to transform this data into a MongoDB document-based data model. Creating an optimized data model is critically important; if not done well, one can lose many of the benefits of MongoDB, such as reducing the number of queries and reducing the amount of reads and writes to disk.

Fortunately, the task came together easily, because we focused on how the data was being used. Armed with our query and table usage data from the Oracle system (from Phase 0), and guided by knowledgeable 10gen consultants, we designed a data model that was optimized for the most common queries.

As mentioned earlier, a single query accounted for nearly 50% of the query executions. We thus designed a data model that represented the exact fields from this query. We then progressed to the next most common query. This query was very similar to the first, and asked for all the students in a given course. In fact, the top five queries, which cumulatively accounted for over 85% of all query executions, were just variations of the same basic query.

As a result, we were able to reduce the original 6 relational tables with numerous indexes in Oracle to just one collection with 2 indexes in MongoDB (see Figure 2).
For our use case, data that is most frequently accessed together is also stored together. So, user data (students and instructors) and course data are stored as user-course pairs. In a relational database this would be stored differently, with separate tables for users and courses.

Figure 2: Our Simplified MongoDB Data Model

  {
    "_id": "8738728763872",
    "role": "Student",
    "user": {
      "id": "b7ed789f198a",
      "firstName": "Rick",
      "lastName": "Matin"
    },
    "course": {
      "dateRange": {
        "startDate": ISODate("2011-12-30T07:00:00Z"),
        "endDate": ISODate("2012-01-30T07:00:00Z")
      },
      "courseId": "734234274",
      "code": "MATH/101",
      "title": "Introduction to Mathematics"
    }
  }

DESIGN THE SERVICE LAYER FOR DATA STORE ACCESS

To do an apples-to-apples comparison of Oracle and MongoDB, we designed a common service layer that our student roster application could use to access either data store without code changes. This service layer allowed the test application to create, read, update, and delete records individually, and also performed some basic data aggregations.

STAND UP SMALL MONGODB SERVER

With our MongoDB data model designed, our next task was to get a small MongoDB server running. This was a simple process that took half a day.

  • Install and Activate. We quickly spun up a blank virtual machine[1] on Amazon EC2, and then installed MongoDB on it.
  • Populate with Course Data. We used approximately 300,000 very simple records. We used a Python script to import the data.

After 4 hours, our small MongoDB server was operational and fully loaded with our test data. We were ready to test the implementation of our use case, the student roster application.

RUN USE CASE ON SINGLE SERVER

We ran our student roster test case against the single MongoDB instance. All the queries executed correctly and we experienced no problems.
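To make the table-to-document transformation concrete, here is a small, self-contained Python sketch. The flattened rows, column names, and helper function are hypothetical stand-ins (the paper does not publish the six Oracle table schemas); the shape of the output matches the user-course document in Figure 2:

```python
# Hypothetical rows standing in for a join across the six Oracle tables
# (users, courses, enrollments, etc.) that served the roster queries.
enrollment_rows = [
    {"user_id": "b7ed789f198a", "first": "Rick", "last": "Matin",
     "role": "Student", "course_id": "734234274", "code": "MATH/101",
     "title": "Introduction to Mathematics",
     "start": "2011-12-30T07:00:00Z", "end": "2012-01-30T07:00:00Z"},
]

def to_user_course_document(row):
    """Collapse one joined relational row into a single embedded document."""
    return {
        "role": row["role"],
        "user": {
            "id": row["user_id"],
            "firstName": row["first"],
            "lastName": row["last"],
        },
        "course": {
            "dateRange": {"startDate": row["start"], "endDate": row["end"]},
            "courseId": row["course_id"],
            "code": row["code"],
            "title": row["title"],
        },
    }

documents = [to_user_course_document(r) for r in enrollment_rows]
```

In a real import, a script along these lines would feed the documents into MongoDB (for example with pymongo's `insert_many`) rather than keep them in memory, and the dominant query, all courses for a given student, becomes a single-collection lookup on the embedded `user.id` field instead of a multi-table join.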
While we were not measuring performance at this point, we noted that the system seemed quite responsive and returned results quickly.

Phase 2

Having successfully deployed and tested MongoDB on a single Amazon EC2 server, we moved on to standing up various large MongoDB configurations. Our aim in this phase was to learn how to deploy multi-node MongoDB clusters that used replica sets, and how to deploy MongoDB in a hybrid cloud/on-premise configuration.

Our operations teams built a hybrid configuration of geographically distributed nodes that were both in Amazon EC2 and on-premise in an Apollo data center (see Figure 3: Hybrid Configuration Multi-Node MongoDB Clusters). We deployed five nodes total: three nodes on Amazon EC2 (one master and two slaves) and two slaves within an Apollo Group data center. All the Amazon EC2 nodes were within the same Amazon EC2 Region, but each within different availability zones.

In standing up these various large configurations, we developed Chef and Puppet scripts that let us quickly deploy new MongoDB farms and add replica sets, as well as monitor the servers. We also created a runbook to instruct administrators on how to install MongoDB instances and keep them running.

Phase 3

Whereas the first three phases centered around getting familiar with MongoDB and how to set it up, Phase 3 focused on performance testing. MongoDB performed flawlessly in our preliminary setups, but we did not yet know how it would perform in a production-scale deployment. In particular, we wanted to learn how using various Amazon EC2 availability zones and regions affected overall performance.

Table 3: Results of Performance Testing Across Various MongoDB Configurations on Amazon EC2. See appendix for more details.

[1] The server and clients were Amazon EC2 m1.large instances: 7.5 GB memory, 4 EC2 Compute Units (2 virtual cores with 2 EC2 Compute Units each), 50 GB instance storage, 64-bit platform, I/O performance: high, EBS-Optimized available: 500 Mbps.
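For readers who want to reproduce a similar five-node hybrid layout, the replica set configuration below is an illustrative sketch, not Apollo's actual runbook content. The hostnames are made up, and it assumes the common pattern of giving on-premise members priority 0 so that elections keep the primary in EC2. (In MongoDB's replica set terminology, the "master" and "slaves" in the text correspond to a primary and secondaries.)

```python
# Illustrative replica set configuration for the five-node hybrid layout
# described above: three members in Amazon EC2 (spread across availability
# zones) and two in the on-premise data center. Hostnames are hypothetical.
# A document like this is what gets passed to the replSetInitiate command.
config = {
    "_id": "apollo-eval",
    "members": [
        {"_id": 0, "host": "ec2-az-a.example.com:27017", "priority": 2},
        {"_id": 1, "host": "ec2-az-b.example.com:27017", "priority": 1},
        {"_id": 2, "host": "ec2-az-c.example.com:27017", "priority": 1},
        # On-premise members: priority 0 means they can never be elected
        # primary, keeping the primary (the "master") in EC2.
        {"_id": 3, "host": "dc1.apollo.example.com:27017", "priority": 0},
        {"_id": 4, "host": "dc2.apollo.example.com:27017", "priority": 0},
    ],
}

ec2_members = [m for m in config["members"] if m["priority"] > 0]
onprem_members = [m for m in config["members"] if m["priority"] == 0]
```

Five members give an odd number of voters, so the set can still elect a primary if the on-premise link or a single availability zone goes down.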