Freeing Data Storage from Cages
Alekh Jindal, Jorge Quiané, Jens Dittrich
Saarland University, Germany; QCRI, Qatar
WWHow! Freeing Data Storage from Cages
Alekh Jindal (1), Jorge-Arnulfo Quiané-Ruiz (2; work done at Saarland University), Jens Dittrich (1)
(1) Information Systems Group, Saarland University
http://infosys.cs.uni-saarland.de
(2) QCRI, Qatar Foundation
http://www.qcri.qa
Abstract. Efficient data storage is a key component of data managing systems to achieve good performance. However, currently data storage is either heavily constrained by static decisions (e.g. fixed data stores in DBMSs) or left to be tuned and configured by users (e.g. manual data backup in File Systems). In this paper, we take a holistic view of data storage and envision a virtual storage layer. Our virtual storage layer provides a unified storage framework for several use-cases including personal, enterprise, and cloud storage.
This article is published under a Creative Commons Attribution License (http://creativecommons.org/licenses/by/3.0/), which permits distribution and reproduction in any medium as well as allowing derivative works, provided that you attribute the original work to the author(s) and CIDR 2013. 6th Biennial Conference on Innovative Data Systems Research (CIDR '13), January 6-9, 2013, Asilomar, California, USA.
Jorge-Arnulfo Quiané-Ruiz: work done at Saarland University.

1. INTRODUCTION
To better understand the problem, let us first see some of the typical data storage scenarios and analyze what they have in common.

1.1 Current Approaches to Data Storage
We start from the simplest use case of a personal user using a File System Storage, right up to an enterprise user using Cloud Storage.

Use Case 1: File System. Consider a researcher, Alice, having two laptops (one personal, one from her university). Alice stores different types of files at different locations. For example, she stores movies on her personal laptop and grant proposals on her university laptop. Alice also maintains regular copies of her data on external devices, in case one of her laptops crashes. For instance, she maintains copies of grant proposals on hard drives and recent conference talks on USB sticks. Finally, Alice also changes the data format of her files (e.g. by compressing them) in order to save storage space.

Use Case 2: RAID. Consider a University IT department using RAID storage servers to store its data. The RAID system automatically stripes data and stores parity information across different disks so that it tolerates one or more disk failures. The server admin does not need to worry about the reliability of data. She only needs to choose the RAID level once, at setup time. However, changing the RAID level afterwards is often a complex task in practice.

Use Case 3: Relational DBMS. Consider a car manufacturer using an RDBMS for managing its inventory and sales. The manufacturer provides the schema definition as well as the data to store. The RDBMS takes care of physical data storage, recovery, and data layouts (e.g. row store in IBM DB2). For an expert DBA, the RDBMS provides several additional data tuning knobs such as replication, indexing, partitioning, and storage locations.

Use Case 4: Cloud. Consider a web analytics start-up company using Cloud storage to store large volumes of web logs. The start-up company needs to pick a cloud provider for its data. However, it does not need to worry about the data placement. The Cloud Storage automatically replicates and distributes data over several storage locations in order to guarantee the availability of the company's data. Furthermore, the start-up company does not care about adding new storage locations: the Cloud Storage does so automatically. Additionally, the company may choose to store the data in multiple data layouts [10] or indexes [5].
All of these use-cases are very different, right? No! We argue that they are all facets of the same single data storage problem. We observe that in each of the four Use Cases, the users are implicitly or explicitly answering the following three key questions: (1) What to store? (i.e. which parts to store from a dataset or file and how many times), (2) Where to store? (i.e. on which devices or locations), and (3) How to store? (i.e. in which layout or format). Table 1 categorizes these storage decisions in the four Use Cases and labels them according to whether they use fixed, flexible & manual, or flexible & automatic storage decisions.

What to store? The second column of Table 1 shows the what decisions in the four Use Cases. In Use Case 1 (UC1 for short), the file system requires Alice to choose manually which master copies and which backup copies of the data to store. In UC2, RAID automatically creates data replicas or parity, depending on the RAID level. In UC3, an RDBMS typically stores the data as well as the recovery log. A DBA has further control to create indexes and materialized views to speed up query execution. Similarly, Fractured Mirrors makes a fixed decision of storing exactly two copies of the data. Finally, in UC4, Cloud Storage is flexible to create one or more data replicas for availability.

Where to store? The third column of Table 1 shows the where decisions in the four Use Cases. In UC1, Alice has to manually decide where to store each type of data, e.g. movies on her personal laptop and grant proposals on her university laptop. Going further, RAID (UC2) makes fixed decisions (based on the RAID level) to store data on a set of storage devices, which can be added or removed manually. Similarly, an RDBMS (UC3) allows users to define table spaces, typically residing on different storage locations, for their application. However, Fractured Mirrors stores the data on two (fixed) different machines. Cloud Storage (UC4), on the other hand, is fully flexible on where to store the data. A cloud provider can change data storage locations and even provision additional physical machines automatically.

How to store? The fourth column of Table 1 shows the how decisions in the four Use Cases. Again, users in UC1 must manually
Data Storage is Easy!
[Figure: Raw data, Data Management System, Queries]
Is performance as expected?
Is your Data Storage Efficient?
What to store?
[Figure: TPC-H Lineitem as copy 1, copy 2, copy 3]
Where to store?
[Figure: TPC-H Lineitem]
How to store?
[Figure: TPC-H Lineitem]
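The three questions, together with the fixed / flexible & manual / flexible & automatic labels used in the introduction, can be sketched as a small data model. This is only an illustration: the class and function names are made up here, and only the labels that the introduction states in words are filled in. Everything else is left as `None` rather than guessed from Table 1, which is not reproduced in this text.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List, Optional, Tuple


class Flexibility(Enum):
    FIXED = "fixed"
    MANUAL = "flexible & manual"
    AUTOMATIC = "flexible & automatic"


@dataclass(frozen=True)
class StorageProfile:
    """How one system answers the three questions (None = not stated in the text)."""
    system: str
    what: Optional[Flexibility] = None
    where: Optional[Flexibility] = None
    how: Optional[Flexibility] = None


# Hypothetical encoding: only labels spelled out in the introduction are set.
PROFILES = [
    StorageProfile("File System (UC1)", what=Flexibility.MANUAL, where=Flexibility.MANUAL),
    StorageProfile("RAID (UC2)", where=Flexibility.FIXED),
    StorageProfile("Fractured Mirrors", what=Flexibility.FIXED, where=Flexibility.FIXED),
    StorageProfile("Cloud Storage (UC4)", where=Flexibility.AUTOMATIC),
]


def decisions_with(profiles: List[StorageProfile], label: Flexibility) -> List[Tuple[str, str]]:
    """Return (system, question) pairs that carry the given label."""
    return [
        (p.system, question)
        for p in profiles
        for question in ("what", "where", "how")
        if getattr(p, question) is label
    ]


if __name__ == "__main__":
    for system, question in decisions_with(PROFILES, Flexibility.FIXED):
        print(f"{system}: the '{question}' decision is fixed")
```

Listing the fixed decisions this way makes the "cages" of the title concrete: they are the cells a flexible storage layer would have to open up.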
What, Where, and How in our everyday life
[Figure: ppt and pdf copies of files]
However
What, Where, How
[Figure: TPC-H Lineitem in a Data Management System (DBMS-X) whose Store Engine uses a row layout]
[Figure: ppt and pdf copies of files]
WWHow!: What, Where, How
WWHow! Vision
1. Flexible on What, Where, and How
2. Declarative Data Storage Language
3. Data Storage Optimizer
[Figure: WWHow! Layer. Labels: Data Management System, DSL, WWHow! Language, Logical Data View, Physical Data View, Physical Storage Interface]
What can we do?
I want to store 3 replicas of the weblogs on the cloud, with a different index on each replica.
    STORE '/webApp/logs/UserVisits.log'
    WHAT * AS rep-1, * AS rep-2, * AS rep-3
    WHERE ec2-007-23-compute-1-amazonaws.com
    HOW idx(url) FOR rep-1, idx(sourceIP) FOR rep-2, idx(visitDate) FOR rep-3;
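To make the clause structure of this statement explicit, here is a minimal sketch that models it as a record and renders it back to text. The `StoreStatement` class and its field names are hypothetical. The grammar is inferred only from the slide example above, so this is not the authors' language definition.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class StoreStatement:
    """Hypothetical in-memory form of a STORE statement as shown on the slide."""
    source: str                                            # file or dataset to store
    replicas: List[str] = field(default_factory=list)      # WHAT: replica names
    location: Optional[str] = None                         # WHERE: storage location
    indexes: Dict[str, str] = field(default_factory=dict)  # HOW: replica -> indexed attribute

    def __post_init__(self) -> None:
        # A HOW entry only makes sense for a replica declared in WHAT.
        unknown = [r for r in self.indexes if r not in self.replicas]
        if unknown:
            raise ValueError(f"HOW refers to undeclared replicas: {unknown}")

    def render(self) -> str:
        """Render the statement, leaving out any clause that was not specified."""
        lines = [f"STORE '{self.source}'"]
        if self.replicas:
            lines.append("WHAT " + ", ".join(f"* AS {r}" for r in self.replicas))
        if self.location:
            lines.append(f"WHERE {self.location}")
        if self.indexes:
            lines.append(
                "HOW " + ", ".join(f"idx({attr}) FOR {rep}" for rep, attr in self.indexes.items())
            )
        return "\n".join(lines) + ";"


if __name__ == "__main__":
    stmt = StoreStatement(
        source="/webApp/logs/UserVisits.log",
        replicas=["rep-1", "rep-2", "rep-3"],
        location="ec2-007-23-compute-1-amazonaws.com",
        indexes={"rep-1": "url", "rep-2": "sourceIP", "rep-3": "visitDate"},
    )
    print(stmt.render())
```

Keeping WHAT, WHERE, and HOW as separate optional fields mirrors the idea that a user may state some decisions and leave others open.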
I want to store 3 replicas of the weblogs, with a different index on each replica and with high privacy.
    STORE '/webApp/logs/UserVisits.log'
    WHAT * AS rep-1, * AS rep-2, * AS rep-3
    HOW idx(url) FOR rep-1, idx(sourceIP) FOR rep-2, idx(visitDate) FOR rep-3;
    CONSTRAINT Privacy = 'high'
A job for the WWHow! data storage optimizer.
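One way to read "a job for the optimizer" is that any of the WHAT, WHERE, or HOW clauses missing from a statement is left for the system to decide. The sketch below splits a statement into its clauses and reports which of the three decisions remain open. The function names are hypothetical and the keyword set is taken only from the slide examples. The splitter is deliberately naive: it would misparse a keyword appearing inside a quoted path.

```python
import re
from typing import Dict, List

# Clause keywords that appear in the slide examples.
CLAUSES = ("STORE", "WHAT", "WHERE", "HOW", "PREFERENCE", "CONSTRAINT")
_KEYWORD = re.compile(r"\b(" + "|".join(CLAUSES) + r")\b")


def interpret(statement: str) -> Dict[str, str]:
    """Split a statement into {keyword: clause body} (naive, case-sensitive)."""
    parts = _KEYWORD.split(statement)
    # parts = [text before first keyword, kw1, body1, kw2, body2, ...]
    clauses = {}
    for keyword, body in zip(parts[1::2], parts[2::2]):
        clauses[keyword] = body.strip().rstrip(";").strip()
    return clauses


def open_decisions(clauses: Dict[str, str]) -> List[str]:
    """Return the storage questions the user did not answer explicitly."""
    return [q for q in ("WHAT", "WHERE", "HOW") if q not in clauses]


if __name__ == "__main__":
    text = (
        "STORE '/webApp/logs/UserVisits.log' "
        "WHAT * AS rep-1, * AS rep-2, * AS rep-3 "
        "HOW idx(url) FOR rep-1, idx(sourceIP) FOR rep-2, idx(visitDate) FOR rep-3; "
        "CONSTRAINT Privacy = 'high'"
    )
    parsed = interpret(text)
    print(parsed["CONSTRAINT"])
    print(open_decisions(parsed))
```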
I want to store the weblogs with high privacy and, if possible, with high availability.
    STORE '/webApp/logs/UserVisits.log'
    PREFERENCE Availability = 'high'
    CONSTRAINT Privacy = 'high'
A job for the WWHow! data storage optimizer.
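The wording "with high privacy and, if possible, with high availability" suggests that a CONSTRAINT is a hard requirement and a PREFERENCE is a soft one. A minimal sketch of that reading follows. The candidate locations, their property values, and the ordered low/medium/high scale are all invented for illustration, and a real optimizer would also have to decide What and How.

```python
from dataclasses import dataclass
from typing import Dict, List

# Assumed ordering of property levels (not given in the source).
LEVELS = {"low": 0, "medium": 1, "high": 2}


@dataclass
class Location:
    name: str
    properties: Dict[str, str]  # e.g. {"Privacy": "high", "Availability": "low"}

    def meets(self, prop: str, level: str) -> bool:
        """True if this location offers at least the requested level."""
        return LEVELS[self.properties.get(prop, "low")] >= LEVELS[level]


def choose_location(
    candidates: List[Location],
    constraints: Dict[str, str],
    preferences: Dict[str, str],
) -> Location:
    """Drop candidates that violate a constraint, then pick the one meeting most preferences."""
    feasible = [c for c in candidates if all(c.meets(p, v) for p, v in constraints.items())]
    if not feasible:
        raise ValueError("no location satisfies the constraints")
    # max() keeps the first candidate on ties, so list order acts as a tie-breaker.
    return max(feasible, key=lambda c: sum(c.meets(p, v) for p, v in preferences.items()))


# Hypothetical candidate locations.
CANDIDATES = [
    Location("office-nas", {"Privacy": "high", "Availability": "low"}),
    Location("private-cluster", {"Privacy": "high", "Availability": "high"}),
    Location("public-cloud", {"Privacy": "low", "Availability": "high"}),
]

if __name__ == "__main__":
    best = choose_location(CANDIDATES, {"Privacy": "high"}, {"Availability": "high"})
    print(best.name)
```

If no candidate meets the preference, the function still returns a location that satisfies the constraint, which matches the "if possible" wording. A violated constraint raises an error instead.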
Euh! What about existing technology?
Where do we go?
[Figure: Data Management System with TPC-H Lineitem. Panels: Store Engine, row layout; WWHow! Layer, row layout; WWHow! Layer, column layout]
[Figure: Data Management Systems DMS-X and DMS-Y with TPC-H Lineitem]
How to proceed?
[Figure: WWHow! Layer architecture, built up over several slides. Labels: Data Management System, API (Logical View), Logical Data View, WWHow! Language, WWHow! Language Interpreter, WWHow! Controller, Storage themes, WWHow! Optimizer (What, Where, How), Data Storage Optimizer, Data Access Optimizer, Physical Storage Interface, Physical Data View]