Fault-tolerant and Transactional Stateful Serverless Workflows
Haoran Zhang, Adney Cardoza, Peter Baile Chen, Sebastian Angel, Vincent Liu
What is serverless?
[Diagram: User, Developer, and Client outside the Cloud; inside the Cloud, an API Gateway in front of four Workers that share a Database. One Worker is marked X.]
Workers can fail!
How could serverless go wrong?
Client: Send Request, Receive Error/Timeout, Should I Retry?
Cloud:
Start
N = Read(“attendees”)
Write(“attendees”, N+1)
End
Write Idempotent Functions!
Beldi makes stateful serverless functions idempotent automatically!
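To make the retry hazard concrete, here is a minimal sketch in Python. The in-memory dict `db`, the `register` function, the `WorkerCrash` exception, and the `crash_after_write` flag are hypothetical stand-ins for illustration, not Beldi code. The worker's write lands, the reply is lost, and the client's retry increments the counter a second time.

```python
# Hypothetical in-memory stand-in for the shared database.
db = {"attendees": 10}


class WorkerCrash(Exception):
    """Simulates a worker dying before it can reply to the client."""


def register(crash_after_write: bool = False) -> int:
    """Non-idempotent read-modify-write, as on the slide."""
    n = db["attendees"]          # N = Read("attendees")
    db["attendees"] = n + 1      # Write("attendees", N+1)
    if crash_after_write:
        # The write landed, but the client only sees an error or timeout.
        raise WorkerCrash("no response sent")
    return db["attendees"]


def client_call_with_retry() -> int:
    """The client cannot tell whether the write happened, so it retries."""
    try:
        return register(crash_after_write=True)
    except WorkerCrash:
        return register()


result = client_call_with_retry()
print(result)  # one logical request, but the counter moved by two
```

A single logical request leaves `attendees` at 12 instead of 11, which is why the function has to be made idempotent before retries are safe.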
Outline
• Beldi’s Infrastructure
• Linked DAAL
• Invocation with exactly-once semantics
• Evaluation
• Conclusion
Beldi’s architecture
Worker | Beldi Runtime | Storage

Worker:
Start
N = Read(“attendees”)
Write(“attendees”, N+1)
End

Beldi Runtime: Database API, Transaction API, Invocation API

Storage:
Key         Value
attendees   10

InstanceId  Done
d78590e     False

Operation   Value
d78590e-1   10

Progress Lambda

The read issued by instance d78590e is logged as operation d78590e-1 together with the value it returned (10).
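The Operation/Value table above can be read as a log of reads. The following is a minimal sketch of that idea, under the assumption that a re-execution reuses the same instance id and numbers its steps the same way. The `Instance` class, the dicts `data` and `read_log`, and the use of `setdefault` in place of an atomic conditional put are all hypothetical.

```python
from typing import Any, Dict

data: Dict[str, Any] = {"attendees": 10}   # Key / Value table
read_log: Dict[str, Any] = {}              # Operation / Value table


class Instance:
    """One execution of a function; a re-execution reuses the same instance id."""

    def __init__(self, instance_id: str) -> None:
        self.instance_id = instance_id
        self.step = 0

    def read(self, key: str) -> Any:
        self.step += 1
        op = f"{self.instance_id}-{self.step}"
        # The first execution records the value; a replay gets the recorded one.
        # setdefault stands in for an atomic conditional put on the log.
        return read_log.setdefault(op, data[key])


first = Instance("d78590e")
n1 = first.read("attendees")

data["attendees"] = 42         # another request changes the value in between

replay = Instance("d78590e")   # re-execution after a crash
n2 = replay.read("attendees")
```

Both executions observe the same value for step 1, so everything computed from `N` is the same on replay.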
Beldi’s architecture
Worker | Beldi Runtime | Storage

Write(“attendees”, N+1) goes through the Database API in two steps, ① and ②: the value is updated and the write is logged.

Key         Value
attendees   11

Operation   Value
d78590e-1   10
d78590e-2

Problem: ① and ② must be done atomically
Solution: Collocate write log with the data!
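One way to picture the collocation idea is a single row that holds both the value and the ids of the writes already applied, updated in one atomic step. The sketch below assumes the database offers an atomic conditional update on a single row; a `threading.Lock` stands in for that guarantee. The names `rows`, `logged_write`, and `recent_writes` are hypothetical.

```python
import threading
from typing import Any, Dict

# One row per key: the value and its write log live together.
rows: Dict[str, Dict[str, Any]] = {
    "attendees": {"value": 10, "recent_writes": []},
}
_row_lock = threading.Lock()  # stands in for the database's per-row atomicity


def logged_write(key: str, op: str, value: Any) -> bool:
    """Apply the write only if `op` is not already in the row's log.

    Returns True if the write was applied, False if it was a replay.
    """
    with _row_lock:
        row = rows[key]
        if op in row["recent_writes"]:
            return False
        row["value"] = value
        row["recent_writes"].append(op)
        return True


applied_first = logged_write("attendees", "d78590e-2", 11)
applied_retry = logged_write("attendees", "d78590e-2", 11)  # re-execution
```

Because the check, the update, and the log append touch one row, there is no window in which the value has changed but the log has not.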
Beldi’s architecture
Worker | Beldi Runtime | Storage

Before the write:
Key         Value   RecentWrites
attendees   10

After Write(“attendees”, N+1):
Key         Value   RecentWrites
attendees   11      [d78590e-2]

InstanceId  Done
d78590e     False

Operation   Value
d78590e-1   10

Database API
Progress Lambda
Garbage Collector
Technical Challenges
1. Limitation of databases
2. Federated setup
3. Transactions across multiple lambdas
Limitation of databases

Key         Value   RecentWrites
attendees   10      [d78590e-1, d78590e-2, …, d78590e-1000]

Solution: spread the log for a given key across multiple rows

RowId     Key         Value   RecentWrites                              NextRow
HEAD      attendees   10      [d78590e-1, d78590e-2, …, d78590e-1000]   f9cec2e
f9cec2e   attendees   11      [d78590e-1001]
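The row chaining shown above can be sketched as a linked list of rows for one key. This is a single-threaded illustration only: it does not attempt the concurrency handling a real lock-free structure needs. `MAX_LOG`, `new_row`, `tail_id`, `logged_write`, and the dict `table` are hypothetical names, and the capacity of 3 is chosen just to keep the example short.

```python
import uuid
from typing import Any, Dict

MAX_LOG = 3  # hypothetical per-row capacity; the slide fills a row at 1000 entries


def new_row(key: str, value: Any) -> Dict[str, Any]:
    return {"key": key, "value": value, "recent_writes": [], "next_row": None}


# RowId -> row. The chain for the key starts at a single HEAD row.
table: Dict[str, Dict[str, Any]] = {"HEAD": new_row("attendees", 10)}


def tail_id() -> str:
    """Follow NextRow pointers from HEAD until the last row."""
    row_id = "HEAD"
    while table[row_id]["next_row"] is not None:
        row_id = table[row_id]["next_row"]
    return row_id


def logged_write(op: str, value: Any) -> bool:
    """Append-style write; returns False when `op` was already applied."""
    if any(op in row["recent_writes"] for row in table.values()):
        return False
    tail = table[tail_id()]
    if len(tail["recent_writes"]) >= MAX_LOG:
        # The tail's log is full: link a fresh row and continue there.
        fresh_id = uuid.uuid4().hex[:7]
        table[fresh_id] = new_row(tail["key"], tail["value"])
        tail["next_row"] = fresh_id
        tail = table[fresh_id]
    tail["value"] = value
    tail["recent_writes"].append(op)
    return True


for i in range(1, 5):
    logged_write(f"d78590e-{i}", 10 + i)
```

After four writes the HEAD row holds the first three log entries and points at a second row, which carries the current value and the fourth entry.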
Linked DAAL

RowId   Key   Value   RecentWrites   NextRow
HEAD    Key   Value   RecentWrites   NextRow
RowId   Key   Value   RecentWrites   NextRow
RowId   Key   Value   RecentWrites   NextRow

Primary key: RowId and Key

How do we traverse to the tail?
Solution: Use scan and projection to download a skeleton version of Linked DAAL

Skeleton (256 bits per row):
RowId   NextRow
HEAD    NextRow
RowId   NextRow
RowId   NextRow
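The skeleton traversal can be sketched as follows: fetch only the RowId and NextRow columns for a key in one pass, then walk the pointers locally. Here a list of dicts stands in for the table and a dict comprehension stands in for a scan with a projection; `scan_skeleton`, `find_tail`, and the sample rows are hypothetical, and the chain is assumed to contain no cycles.

```python
from typing import Any, Dict, List, Optional

# Full rows as stored; Value and RecentWrites may be large.
table: List[Dict[str, Any]] = [
    {"row_id": "f9cec2e", "key": "attendees", "value": 11,
     "recent_writes": ["d78590e-1001"], "next_row": None},
    {"row_id": "HEAD", "key": "attendees", "value": 10,
     "recent_writes": ["d78590e-1", "d78590e-2"], "next_row": "f9cec2e"},
]


def scan_skeleton(key: str) -> Dict[str, Optional[str]]:
    """Stands in for one scan projected onto (RowId, NextRow)."""
    return {row["row_id"]: row["next_row"] for row in table if row["key"] == key}


def find_tail(key: str) -> str:
    """Walk the downloaded skeleton in memory, starting from HEAD."""
    skeleton = scan_skeleton(key)
    row_id = "HEAD"
    while skeleton[row_id] is not None and skeleton[row_id] in skeleton:
        row_id = skeleton[row_id]
    return row_id


tail = find_tail("attendees")
```

The walk itself costs no further round trips, since only the small skeleton is transferred and the large columns stay in the database.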
Invocation with exactly-once semantics

Lambda1 calls Lambda2 (Call Lambda2) and logs the call:
Operation   Callee
d78590e-1   b97bbe0

Lambda2 runs three steps: Log in Progress Table, make some writes, Mark as Done.
InstanceId  Done
b97bbe0     False (True after Mark as Done)

Failure (X): if the call fails after Lambda2 has finished, Lambda1 calls Lambda2 again with the same callee id. Lambda2 finds b97bbe0 marked Done = True, and Lambda1 receives the response (Receive Response).

GC: once the garbage collector has removed the entry for b97bbe0, a repeated Call Lambda2 looks like a new instance. Lambda2 logs b97bbe0 again with Done = False and makes its writes a second time.

Fix: Lambda1's log gets a Result column, and Lambda2 adds a Callback step before Mark as Done.
Lambda2 steps: Log in Intent Table, make some writes, Callback, Mark as Done.
Operation   Callee    Result
d78590e-1   b97bbe0   result

With the result stored in Lambda1's log, a failure (X) after Mark as Done no longer requires calling Lambda2 again.
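The callback scheme above can be sketched in a few lines. This is an illustrative model, not Beldi's implementation: `invoke_log`, `intent_table`, `writes_made`, `lambda2`, `invoke`, and `garbage_collect` are hypothetical names, the tables are plain dicts, and `None` is used as a "no result yet" marker, so a callee that legitimately returns `None` is not handled.

```python
import uuid
from typing import Any, Callable, Dict, List

invoke_log: Dict[str, Dict[str, Any]] = {}  # Lambda1: Operation -> callee, result
intent_table: Dict[str, bool] = {}          # Lambda2: InstanceId -> Done
writes_made: List[str] = []                 # records each run of Lambda2's body


def lambda2(instance_id: str, callback: Callable[[Any], None]) -> None:
    intent_table.setdefault(instance_id, False)   # log in intent table
    if intent_table[instance_id]:
        return                                    # already done: skip the body
    writes_made.append(instance_id)               # "make some writes"
    callback("result")                            # record result at the caller
    intent_table[instance_id] = True              # mark as done


def invoke(op: str) -> Any:
    """Lambda1's side of the call, keyed by its own operation id."""
    entry = invoke_log.setdefault(
        op, {"callee": uuid.uuid4().hex[:7], "result": None})

    def store_result(result: Any) -> None:
        entry["result"] = result

    if entry["result"] is None:                   # no result yet: (re)issue call
        lambda2(entry["callee"], store_result)
    return entry["result"]


def garbage_collect() -> None:
    """Drop finished instances from the callee's intent table."""
    for instance_id in [i for i, done in intent_table.items() if done]:
        del intent_table[instance_id]


first = invoke("d78590e-1")
garbage_collect()
retry = invoke("d78590e-1")   # replay after GC: answered from the invoke log
```

Because the result reaches the caller's log before the callee is marked done, the retry after garbage collection never reaches `lambda2`, and its body runs once.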
Evaluation
1. What are the costs of Beldi’s API operations?
2. How does Beldi perform in real-world applications?
3. What is the effect of garbage collection?
What are the costs of Beldi’s API operations?
With 20 rows in the Linked DAAL: 2-4× more expensive than baseline
How does Beldi perform in real-world applications?
[Diagram components: Client, Frontend, Search, Recommend, Reserve, User, Profile, Geo, Rate, Reserve Flight, Reserve Hotel]
DeathStarBench (ASPLOS ’19): open-source microservices benchmark
• Movie review service (Cf. IMDB)
• Travel reservation (Cf. Expedia)
• Social media site (Cf. Twitter)
How does Beldi perform in real-world applications?
< 400 req/s: 2× higher than baseline
700 req/s (saturation): 3.3× higher than baseline
Conclusion
1. A framework to write transactional and fault-tolerant applications on serverless.
2. A lock-free data structure (Linked DAAL) to support fast logging and exactly-once semantics
3. A collaborative distributed transaction protocol across multiple lambdas
4. An efficient garbage collection algorithm that runs independently without affecting running lambdas or requiring any pauses.
https://github.com/eniac/beldi
Thank you!