Causality-Based Versioning
Kiran-Kumar Muniswamy-Reddy and David A. Holland
Slides by the authors and Aleatha Parker-Wood
Tuesday, June 1, 2010


Slides for CMPS229


Versioning

• Already popular

• Saves backup “versions” of files as they change

• Two flavors: versioning (event based) and snapshotting (time based)

• Snapshots: WAFL, Venti...

• Versioning: Elephant, VersionFS...


Why Version/Snapshot?

• Disaster recovery is baked into the file system

• “Oops, I needed that...”

• “Oops, I didn’t mean to click that virus...”

• “Oops, that new driver patch broke everything...”

• Maintains backup files to which you can recover (without going offsite)


Causality

• Depends on time (to cause Y, X must happen before Y)

• Uni-directional (If X causes Y, Y cannot cause X)

• Defined in terms of data flow

• A reads B ⇒ B causes A

• A writes B ⇒ A causes B

• PASS, Intrusion Detection Systems (BackTracker, Taser...)
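
The two data-flow rules above can be written down directly. The following is a minimal sketch, not code from PASS or any of the systems named; the `Event` record and the file names are hypothetical and exist only for illustration.

```python
from collections import namedtuple

# Hypothetical event record: an actor (e.g. a process) reads or writes a target.
Event = namedtuple("Event", ["actor", "op", "target"])


def causal_edge(event):
    """Return a (cause, effect) pair for a single read or write event."""
    if event.op == "read":
        # A reads B  =>  B causes A
        return (event.target, event.actor)
    if event.op == "write":
        # A writes B  =>  A causes B
        return (event.actor, event.target)
    raise ValueError(f"unsupported operation: {event.op}")


# Illustrative events only; the names are made up.
events = [
    Event("proc", "read", "input.txt"),
    Event("proc", "write", "output.txt"),
]
edges = [causal_edge(e) for e in events]
print(edges)
```

Chaining the two edges gives input.txt -> proc -> output.txt, which is the kind of propagation trail the following slides rely on.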


Why Causality?

• Track propagation of data

• Find out what files were modified by what processes

• Reconstruct the scene of the crime


Causality-Based Versioning

• Decide when to version using causal relationships between two files

• Has advantages of versioning file systems or snapshots

• Eases recovery from corruption, viruses, and user mistakes

• In addition, creates causal links between files

• Easier to decide what to restore

• Sort of like transactions on steroids


Applications

• Intrusion Recovery

• System configuration management

• IP compliance

• Reproduction of research results


A Scenario...

• Apache split-logfile Vulnerability

• Vulnerability in Apache 1.3

• Vulnerability allows attacker to overwrite any file with a .log extension

• Let’s look at the current versioning options...

[Figures: three diagrams of this scenario under current versioning options; not recoverable from the extracted text.]

The Goal

• One of these has too much information

• The other not enough

• Can we leverage causality to create just enough versions?


Creating Just Enough Versions

• Building on top of the Provenance Aware Storage System (PASS)

• Two options

• Cycle Avoidance

• Graph Finesse


How PASS works

• Translates system calls to provenance records (read/write become edges in a dependency graph)

• Maintains provenance for transient objects such as pipes and processes, and creates virtual objects as needed

• Analyzes the records to ensure there are no cyclic dependencies between objects

• Causality-based versioning extends the analysis phase
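
As a rough illustration of the first and third bullets, the sketch below turns read/write records into "depends on" edges and checks the resulting graph for cycles. It is a simplified sketch, not PASS's implementation: the `(process, op, path)` record shape and both function names are assumptions made for this example.

```python
from collections import defaultdict


def build_dependency_graph(syscalls):
    """Map (process, op, path) records to 'X depends on Y' edges."""
    deps = defaultdict(set)
    for process, op, path in syscalls:
        if op == "read":
            deps[process].add(path)   # the process now depends on the file
        elif op == "write":
            deps[path].add(process)   # the file now depends on the process
        else:
            raise ValueError(f"unsupported operation: {op}")
    return deps


def has_cycle(deps):
    """Depth-first search with three colors; reaching a gray node means a cycle."""
    WHITE, GRAY, BLACK = 0, 1, 2
    color = defaultdict(int)  # every node starts WHITE

    def visit(node):
        color[node] = GRAY
        for ancestor in deps.get(node, ()):
            if color[ancestor] == GRAY:
                return True
            if color[ancestor] == WHITE and visit(ancestor):
                return True
        color[node] = BLACK
        return False

    return any(color[n] == WHITE and visit(n) for n in list(deps))


# A process that reads one file and writes another: no cycle.
acyclic = build_dependency_graph([("p", "read", "a"), ("p", "write", "b")])
# A process that reads a file and then writes the same file: cycle without versions.
cyclic = build_dependency_graph([("p", "read", "a"), ("p", "write", "a")])
print(has_cycle(acyclic), has_cycle(cyclic))
```

The second case is why versions are needed: without distinguishing "a before the write" from "a after the write", the file and the process each depend on the other.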


The big idea

• Cycles are violations of causality

• The creation of a cycle is an indicator that this is an interesting event

• We can prevent cycles by creating a new version every time a cycle is about to occur

[Figures: a step-by-step example under Open-Close versioning; diagrams not recoverable from the extracted text.]

Version-On-Write?

• We could remove cycles using Version-On-Write

• Every read creates a new version of the process

• Every write creates a new version of the file

• But this results in 8 versions

• Huge management overhead
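
To make the cost concrete, here is a small sketch of the rule as stated on this slide: every read creates a new process version and every write creates a new file version. The event format and the choice to start every object at version 1 are assumptions for illustration, and the event trace is made up rather than taken from the slides' example.

```python
from collections import defaultdict


def version_on_write(events):
    """Apply 'version on every read and write' to (process, op, path) events.

    Returns the final version number of every object and the list of
    (dependent, ancestor) edges between versioned objects.
    Assumption: every object starts at version 1.
    """
    version = defaultdict(lambda: 1)
    edges = []
    for process, op, path in events:
        if op == "read":
            source = (path, version[path])
            version[process] += 1   # every read creates a new process version
            edges.append(((process, version[process]), source))
        elif op == "write":
            source = (process, version[process])
            version[path] += 1      # every write creates a new file version
            edges.append(((path, version[path]), source))
        else:
            raise ValueError(f"unsupported operation: {op}")
    return dict(version), edges


# Illustrative trace: one process reads a file, then writes it back.
versions, edges = version_on_write([("p", "read", "a"), ("p", "write", "a")])
print(versions)
print(edges)
```

No cycle can form because every edge points from a newer version to an older one, but the number of versions grows with every single operation.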


Cycle Avoidance Algorithm

• Uses local information about the object

• Create a new version of an object whenever a new ancestor is added

• Different versions are considered to be “new” ancestors

• Not every write causes a new version


The Algorithm

• Assume new data: A1 depends on B2

• If B is not in A’s dependencies, create a new version of A

• Else if B is already in A’s dependencies:

• If B2 is in dependencies, discard (no new information)

• If B3 is in dependencies, discard (no new causality)

• If B1 is in dependencies, create new version of A
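
The rules above can be sketched with purely local state: each object remembers, per ancestor, the newest version it already depends on. This is a minimal sketch under assumptions, not the authors' implementation. The class name is hypothetical, and carrying the ancestor table across versions of the object is an assumption the slide does not spell out.

```python
class VersionedObject:
    """One object plus the local ancestor table used by Cycle Avoidance."""

    def __init__(self, name):
        self.name = name
        self.version = 1
        # ancestor name -> newest version of that ancestor recorded so far
        # (assumption: the table carries over when this object gets a new version)
        self.ancestors = {}

    def add_dependency(self, ancestor_name, ancestor_version):
        """Apply the slide's rules; return True if a new version was created."""
        known = self.ancestors.get(ancestor_name)
        if known is not None and known >= ancestor_version:
            # Same version recorded: no new information.
            # Newer version recorded: no new causality.
            return False
        # Unknown ancestor, or only an older version recorded: a "new" ancestor.
        self.version += 1
        self.ancestors[ancestor_name] = ancestor_version
        return True


a = VersionedObject("A")
results = [
    a.add_dependency("B", 2),  # B unknown: new version of A
    a.add_dependency("B", 2),  # B2 already recorded: discard
    a.add_dependency("B", 1),  # a newer B is already recorded: discard
    a.add_dependency("B", 3),  # only an older B recorded: new version of A
]
print(results, a.version)
```

Nothing here looks at the rest of the graph, which is what keeps the check cheap, and also why it may create versions that a global check would show to be unnecessary.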

[Figures: the example under Cycle Avoidance; diagrams not recoverable from the extracted text.]

Graph Finesse

• As before: A1 depends on B2

• If B2 is already in A’s history, discard

• Otherwise, check for a path from B2 -> A1

• If yes, we have a cycle. Make a new version of A1

• Otherwise, add A1 -> B2 to the dependency graph
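
A sketch of these steps follows, using a global graph keyed by (name, version) pairs. It is an illustration under assumptions rather than the authors' code: the class and method names are hypothetical, "A's history" is taken to mean the direct dependencies of A's current version, and a freshly created version is assumed to depend on its predecessor.

```python
from collections import defaultdict


class GraphFinesse:
    """Global dependency graph over (name, version) nodes."""

    def __init__(self):
        self.deps = defaultdict(set)             # node -> nodes it depends on
        self.current = defaultdict(lambda: 1)    # name -> current version number

    def _reaches(self, start, goal):
        """True if goal can be reached from start by following dependency edges."""
        stack, seen = [start], set()
        while stack:
            node = stack.pop()
            if node == goal:
                return True
            if node in seen:
                continue
            seen.add(node)
            stack.extend(self.deps.get(node, ()))
        return False

    def add_dependency(self, name, ancestor):
        """Record that the current version of `name` depends on `ancestor`.

        `ancestor` is a (name, version) pair. Returns the node that received
        the edge, or None if the record was discarded as a duplicate.
        """
        node = (name, self.current[name])
        if ancestor in self.deps.get(node, ()):
            return None                          # already recorded: discard
        if self._reaches(ancestor, node):
            # The ancestor already depends on this node: the edge would close a cycle.
            self.current[name] += 1
            new_node = (name, self.current[name])
            self.deps[new_node].add(node)        # assumption: new version depends on the old
            node = new_node
        self.deps[node].add(ancestor)
        return node


g = GraphFinesse()
first = g.add_dependency("P", ("F", 1))    # P1 depends on F1
repeat = g.add_dependency("P", ("F", 1))   # duplicate record
closing = g.add_dependency("F", ("P", 1))  # would make F1 depend on P1: cycle
print(first, repeat, closing)
```

The path search is what distinguishes this approach from Cycle Avoidance: a new version is created only when the edge would really close a cycle, at the price of consulting global graph state.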

[Figures: the example under Graph Finesse, a side-by-side comparison with Cycle Avoidance, and a table contrasting the two algorithms; contents not recoverable from the extracted text.]

Evaluation

• Run-time overhead

• Space overhead

• Recovery costs

• All results are the average of 5 runs

• Less than 5% standard deviation
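
For readers who want to reproduce this kind of summary, the sketch below averages a set of runs, expresses the standard deviation as a percentage of the mean, and computes overhead relative to a baseline. The timings are made up for illustration and are not results from the slides; using the sample standard deviation is also an assumption.

```python
import statistics


def summarize(runs):
    """Return (mean, standard deviation as a percentage of the mean)."""
    mean = statistics.mean(runs)
    relative_stdev = statistics.stdev(runs) / mean * 100  # sample std dev (assumption)
    return mean, relative_stdev


def overhead_percent(baseline_mean, measured_mean):
    """Overhead of a configuration relative to the baseline, in percent."""
    return (measured_mean - baseline_mean) / baseline_mean * 100


# Made-up timings in seconds, for illustration only (not from the slides).
baseline_runs = [100.0, 101.0, 99.0, 100.5, 99.5]
versioned_runs = [110.0, 111.0, 109.0, 110.5, 109.5]

base_mean, base_rel = summarize(baseline_runs)
ver_mean, ver_rel = summarize(versioned_runs)
overhead = overhead_percent(base_mean, ver_mean)
print(f"overhead: {overhead:.1f}%")
```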


Workloads used

• Linux compile (CPU intensive)

• Postmark (I/O intensive)

• Applying patches with Mercurial (developer workload)

• BLAST protein sequencing (scientific workload)


Algorithms used

• Without causal data:

• Ext2: Baseline (Lasagna, Harvard’s versioning FS, on top of ext2)

• VER: Plain open-close versioning

• With causal data:

• OC: Open-close

• CA: Cycle-Avoidance

• GF: Graph Finesse

• ALL: version on every write

[Figure: Linux Compile: Elapsed Time. Bar chart for Ext2, VER, OC, CA, GF, and ALL; y-axis Time (s), 0 to 3000; percentage labels on the bars: 11.9%, 17.1%, 18.3%, 21.3%, 57.4%.]

[Figure: Linux Compile: Space Overhead. Bar chart for Ext2, VER, OC, CA, GF, and ALL; y-axis Space (GB), 0.0 to 3.0; percentage labels on the bars: 2.9%, 15.8%, 17.6%, 15.8%, 121.6%.]

[Figure: Mercurial Activity: Elapsed Time. Bar chart for Ext2, VER, OC, CA, GF, and ALL; y-axis Time (s), 0 to 1400; percentage labels on the bars: 25.9%, 28.8%, 27.9%, 89.6%, 61.3%.]

[Figure: Mercurial Activity: Space Overhead. Bar chart for Ext2, VER, OC, CA, GF, and ALL; y-axis Space (GB), 0.0 to 1.4; percentage labels on the bars: 26.6%, 31.6%, 30.2%, 31.9%, 53.7%.]

[Figures: Recovery Microbenchmark slides (description, diagram, space table, and two recovery-time charts); contents not recoverable from the extracted text.]

Conclusions

• Both algorithms require less time and space than Version-On-Write

• Both algorithms offer finer-grained control than Open-Close

• Graph-Finesse creates fewer unnecessary versions

• Cycle-Avoidance has overhead comparable to Open-Close


Expanding on it

• Not just good for disaster recovery

• Search

• Social network analysis
