Data Inglorious

Page 1: Data Inglorious

Data Inglorious

Atlas: “All this data sure is heavy.”

Data: “Indeed, may I suggest moving it to the cloud.”

Page 2: Data Inglorious

database defined

• A database is a collection of data, which is organized into files called tables.

• These tables provide a systematic way of accessing, managing, and updating data.

• A relational database is one that contains multiple tables of data that relate to each other through special key fields.

• Relational databases are far more flexible (though harder to design and maintain) than what are known as flat file databases, which contain a single table of data.
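
A minimal sketch of "tables that relate to each other through key fields", using Python's built-in sqlite3 module with an in-memory database. The tables and columns (authors, books, author_id) are hypothetical and chosen only for illustration; SQLite stands in here for whatever relational system is actually in use.

```python
import sqlite3

# In-memory database, so nothing is written to disk.
conn = sqlite3.connect(":memory:")

# Two tables: books.author_id is the key field that points at authors.
conn.executescript("""
    CREATE TABLE authors (
        author_id INTEGER PRIMARY KEY,
        name      TEXT NOT NULL
    );
    CREATE TABLE books (
        book_id   INTEGER PRIMARY KEY,
        title     TEXT NOT NULL,
        author_id INTEGER NOT NULL REFERENCES authors(author_id)
    );
""")

conn.execute("INSERT INTO authors VALUES (1, 'Ada')")
conn.executemany(
    "INSERT INTO books VALUES (?, ?, ?)",
    [(10, "Notes", 1), (11, "Letters", 1)],
)

# The join follows the key field from each book to its author.
rows = conn.execute("""
    SELECT books.title, authors.name
    FROM books
    JOIN authors ON books.author_id = authors.author_id
    ORDER BY books.book_id
""").fetchall()

print(rows)
```

The author's name is stored once in authors; each row in books carries only the key.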

Page 3: Data Inglorious

overview, the payload

• Oracle Internet Directory (OID)

• Zynga Games/Farmville

• Facebook

• bioinformatics

• Calmail

Page 4: Data Inglorious

ex. oracle OID

• Oracle Internet Directory: 400,000 operations per second on a 500-million-user database

Page 5: Data Inglorious

ex. zynga games

• 65 million players a day, millions of web browsers open, millions of farms (Farmville game), millions of frontiers, millions of objects bought and sold… all recorded on a database

• 500,000 operations-per-second database behind Farmville

• http://www.readwriteweb.com/cloud/2010/08/membase-the-database-powering.php

Page 6: Data Inglorious

ex. facebook

• 60,000 servers

• 1,800 MySQL servers

• 400 million active users

• 200 million a day

• 50 million operations per second

Page 7: Data Inglorious

ex. bioinformatics

• DNA sequence data = prime candidate for study with database systems

• Homologous strings

• Nucleobases: Adenine, Guanine, Cytosine, Thymine

• Roughly 3 billion base pairs in the human genome, expressed as a string of A, G, C, and T

• Human Genome Project: roughly 3 billion letters of the human genome; Sanger Institute: 1 billion on MySQL

Page 8: Data Inglorious

ex. calmail

• Calmail: 4 million e-mails offered a day, 1 million served, MySQL backend that just failed

Page 9: Data Inglorious

flat file v. relational

• Imagine the needs of two small companies that take customer orders for their products. Company A uses a flat file database with a single table named orders to record orders they receive, while Company B uses a relational database with two tables: orders and customers.

• When a customer places an order with Company A, a new record (or row) in the table orders is created. Because Company A has only one table of data, all the information pertaining to that order must be put into a single record. This means that the customer's general information, such as name and address, is stored in the same record as the order information, such as product description, quantity, and price. If customers place more than one order, their general information will need to be re-entered and thus duplicated for each order they place.

• Whenever there is duplicate data, as in the case above, many inconsistencies may arise when users try to query the database. Additionally, a customer's change of address would require the database manager to find all records in orders that the customer placed, and change the address data for each one.

• Company B is much better off with its relational database. Each of its customers has one and only one record of general information stored in the table customers. Each customer's record is identified by a unique customer code which will serve as the relational key. When a customer orders from Company B, the record in orders need contain only a reference to the customer's code, because all of the customer's general information is already stored in customers.

• This approach to entering data solves the problems of duplicate data and making changes to customer information. The database manager need change only one record in customers if someone changes addresses.

• This is document ahrp in domain all. Last modified on April 24, 2006.

• Indiana University Knowledge Base, http://kb.iu.edu/data/ahrp.html
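
The Company A / Company B comparison can be sketched with Python's built-in sqlite3 module. This is an illustration under assumptions, not the Knowledge Base's own example: the table flat_orders, the column names, and the sample customer and products are all hypothetical, and SQLite stands in for any relational system.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Company A: one flat table, customer details repeated on every order.
    CREATE TABLE flat_orders (
        order_id         INTEGER PRIMARY KEY,
        customer_name    TEXT,
        customer_address TEXT,
        product          TEXT,
        quantity         INTEGER,
        price            REAL
    );
    -- Company B: customer details stored once, orders refer to them by code.
    CREATE TABLE customers (
        customer_code INTEGER PRIMARY KEY,
        name          TEXT,
        address       TEXT
    );
    CREATE TABLE orders (
        order_id      INTEGER PRIMARY KEY,
        customer_code INTEGER REFERENCES customers(customer_code),
        product       TEXT,
        quantity      INTEGER,
        price         REAL
    );
""")

# Hypothetical sample data: one customer places three orders.
products = [("widget", 2, 9.50), ("gadget", 1, 24.00), ("gizmo", 5, 3.25)]

conn.executemany(
    "INSERT INTO flat_orders "
    "(customer_name, customer_address, product, quantity, price) "
    "VALUES ('Pat Lee', '1 Old Road', ?, ?, ?)",
    products,
)
conn.execute("INSERT INTO customers VALUES (1, 'Pat Lee', '1 Old Road')")
conn.executemany(
    "INSERT INTO orders (customer_code, product, quantity, price) "
    "VALUES (1, ?, ?, ?)",
    products,
)

# The customer moves. Company A must touch every order record...
flat_updated = conn.execute(
    "UPDATE flat_orders SET customer_address = '2 New Street' "
    "WHERE customer_name = 'Pat Lee'"
).rowcount

# ...while Company B changes a single record in customers.
rel_updated = conn.execute(
    "UPDATE customers SET address = '2 New Street' WHERE customer_code = 1"
).rowcount

# Every order still resolves to the new address through the key.
addresses = conn.execute("""
    SELECT orders.product, customers.address
    FROM orders
    JOIN customers ON orders.customer_code = customers.customer_code
    ORDER BY orders.order_id
""").fetchall()

print(flat_updated, rel_updated)
print(addresses)
```

The flat table needs as many updates as the customer has orders; the relational layout needs one, whatever the number of orders.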

Page 10: Data Inglorious

flat file v. relational

• Single table (flat file) v. multiple tables (relational)

Page 11: Data Inglorious

web connection

• Example: Plone Content Management System connection to a MySQL database

Page 12: Data Inglorious

go graphic, phpMyAdmin

• A graphic interface tool for working with MySQL

Page 13: Data Inglorious

phpMyAdmin

• GSPP and phpMyAdmin

• localhost

Page 14: Data Inglorious

other database systems

• Hadoop: distributed processing of large data sets

• http://code.zynga.com/2011/06/deciding-how-to-store-billions-of-rows-per-day/

• Membase: new for games and other apps

• http://www.readwriteweb.com/cloud/2010/08/membase-the-database-powering.php

• CouchDB: no schema

• http://couchdb.apache.org/docs/intro.html