Ungooglable: managing “disasters” without losing your cool, by @eleddy

DESCRIPTION

This talk covers a basic methodology for finding and fixing problems in a live system. It covers general techniques for quickly finding the source of issues, workarounds, patching, digging into code, and when and how to get help.

Page 1: Ungooglable

Ungooglable: managing “disasters” without losing your cool

@eleddy

Page 2: Ungooglable

This talk is for the Develadminisystemators who have to constantly deal with UNKNOWNS.

Page 3: Ungooglable
Page 4: Ungooglable

Three Commands

‣ Know thy system

‣ Know thy tools

‣ Know thy neighbors

Page 5: Ungooglable
Page 6: Ungooglable

Stairway to Freedom

Prepare

Isolate

Damage Control

Diagnose

Patch

Clean

Fix

Document

Horizon of Intervention

Page 7: Ungooglable

Communicate

Prepare Isolate Control Diagnose Patch Clean Fix Document

Dear Magic Makers -

As some of you may already know, customers are having trouble retrieving their historical records because our archive server is not responding. I am investigating the issue now and will send an update in 20 minutes.

Please field calls in the meantime. If someone could also get me a Red Bull and some nacho cheese corn nuts, that would be stellar.

Thanks!

coworkers

Mayday! High Priority

bossman

Page 8: Ungooglable

Prepare for the Worst

‣ Backups

‣ Local Data.fs

‣ Set a time limit
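Before touching anything, take a copy of the database so you can experiment on a local Data.fs instead of production. A minimal sketch, assuming a standard FileStorage file such as the var/filestorage/Data.fs path used later in the talk; snapshot_filestorage is a hypothetical helper name. A plain file copy is safest with the processes that write to the file stopped; for backups of a FileStorage that is still being written to, ZODB's repozo script is the usual tool.

```python
import shutil
import time
from pathlib import Path


def snapshot_filestorage(data_fs, backup_dir):
    """Copy a FileStorage file (e.g. var/filestorage/Data.fs) to a timestamped name.

    Assumes the Zope/ZEO processes writing to data_fs are stopped. The
    .index file is not copied because FileStorage can rebuild it on open.
    Returns the path of the copy.
    """
    src = Path(data_fs)
    dest_dir = Path(backup_dir)
    dest_dir.mkdir(parents=True, exist_ok=True)
    stamp = time.strftime("%Y%m%d-%H%M%S")
    dest = dest_dir / f"{src.stem}-{stamp}{src.suffix}"
    shutil.copy2(src, dest)  # copy2 also preserves the modification time
    return dest
```

Point a local instance at the copy and do the risky poking there.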

Page 9: Ungooglable

Disable Interference

Disabled all backups and packing

Opened up port 8080 to outside network

Moved logs to temporary disk

Page 10: Ungooglable

Isolation by Elimination

Network: "works for me"

Hardware: obvious, sporadic crazy shit

Software: everything else

Data: not recreatable locally
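Isolation by elimination means ruling layers out one at a time and stopping at the first one that fails. A minimal sketch: the layer names come from the slide, while the check callables are hypothetical stand-ins for whatever test you actually run against each layer (a ping, a disk check, a request against a local copy, and so on).

```python
from typing import Callable, List, Optional, Sequence, Tuple

# A check returns True when its layer looks healthy.
Check = Tuple[str, Callable[[], bool]]


def isolate(checks: Sequence[Check]) -> Tuple[List[str], Optional[str]]:
    """Rule out layers in order and stop at the first one that fails its check.

    Returns (ruled_out, suspect). suspect is None when every check passed,
    which usually means the checks are not yet testing the right thing.
    """
    ruled_out: List[str] = []
    for layer, looks_healthy in checks:
        if not looks_healthy():
            return ruled_out, layer
        ruled_out.append(layer)
    return ruled_out, None
```

Ordering the checks from cheapest to most expensive keeps the search fast.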

Page 15: Ungooglable

Zopesplosion 3000 Architecture

Diagram: Apache, Varnish, HAProxy, CDN, APIs, seven Zope instances, MySQL, MongoDB, SPARQL, and ZEO servers 1-4, 5-8 and 9-12

WTF mate
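With this many moving parts, one quick way to narrow things down is to walk the request path front to back and see where connections stop getting through. A sketch using only the standard socket module; port_open and walk_request_path are hypothetical names, and this only proves that each hop accepts TCP connections, not that the service behind it is healthy.

```python
import socket


def port_open(host, port, timeout=2.0):
    """Return True if a TCP connection to (host, port) succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, unreachable, or timed out
        return False


def walk_request_path(hops, timeout=2.0):
    """Probe (name, host, port) hops in order; return (name, reachable) pairs."""
    return [(name, port_open(host, port, timeout)) for name, host, port in hops]
```

For example, `walk_request_path([("HAProxy", "10.0.1.2", 80), ("Zope", "10.0.1.4", 8080), ("ZEO", "10.0.1.5", 8001)])`; the first two hosts and the HAProxy port are made up. The first hop that reports False is where to start digging.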

Page 19: Ungooglable

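How ZEO Cache Works

Machine A and Machine B each run a Zope client with a memory cache; both talk to the same ZEO server.

‣ "I want X": X is not in the Zope memory cache, so Zope asks ZEO "I need X" and caches the copy it gets back

‣ Both machines now hold a cached copy of X

‣ One machine saves a modified X, and ZEO passes "Modified X" on to the other machine

With a Zope disk cache instead:

‣ "I want X": X is loaded and cached on disk

‣ X is modified, but the other machine restarts (RESTART) and still has the old X in its disk cache

‣ Inconsistent State!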
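A toy Python model of the scenario above, assuming a simplified world where the server pushes invalidations only to clients that are connected at that moment. The class and function names are hypothetical and this is not ZEO's protocol: real ZEO clients try to verify a persistent cache against the server when they reconnect. It only shows why a cache that survives a restart can serve a stale X.

```python
class ToyZeoServer:
    """Authoritative copy of every object; pushes invalidations to connected clients."""

    def __init__(self, data):
        self.data = dict(data)
        self.clients = []

    def load(self, oid):
        return self.data[oid]

    def store(self, oid, value, sender):
        self.data[oid] = value
        for client in self.clients:
            if client is not sender:
                client.invalidate(oid)  # "Modified X": drop your old copy


class ToyZopeClient:
    def __init__(self, server):
        self.server = server
        self.cache = {}

    def connect(self):
        self.server.clients.append(self)

    def disconnect(self):
        self.server.clients.remove(self)

    def get(self, oid):
        if oid not in self.cache:  # "I want X" missed the cache: "I need X"
            self.cache[oid] = self.server.load(oid)
        return self.cache[oid]

    def set(self, oid, value):
        self.cache[oid] = value
        self.server.store(oid, value, sender=self)

    def invalidate(self, oid):
        self.cache.pop(oid, None)


def scenario(restart_b, keep_cache_on_restart=False):
    """A and B both cache X, then A modifies it. Returns the X that B sees."""
    server = ToyZeoServer({"X": "X"})
    a, b = ToyZopeClient(server), ToyZopeClient(server)
    a.connect()
    b.connect()
    a.get("X")
    b.get("X")
    if restart_b:
        b.disconnect()  # B is restarting and misses the invalidation
    a.set("X", "Modified X")
    if restart_b:
        if not keep_cache_on_restart:
            b.cache.clear()  # a memory cache does not survive the restart
        b.connect()
    return b.get("X")
```

Only the combination of a restart and a cache that persists across it leaves B holding the old X.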

Page 29: Ungooglable

Zopesplosion 3000 Architecture


Hot damn!

Page 30: Ungooglable

Take time to make time

‣ Minimize customer angst

‣ Hang out in custom

‣ Acquisition is your friend

‣ Remember request and response

Page 31: Ungooglable

Page 32: Ungooglable

Unique or Just Not Obvious?

‣ Zope, zeo, system logs

‣ System stats/monitoring
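A problem that looks unique is often just buried in log noise. A quick tally of error entries per logger across the Zope and ZEO logs shows what keeps recurring. A sketch assuming entries start with "timestamp LEVEL logger message", the shape Zope 2 event logs usually have; adjust the pattern for other formats. summarize is a hypothetical name.

```python
import re
from collections import Counter

# Assumed entry header: "<timestamp> <LEVEL> <logger name> <message>".
# Traceback lines and separator lines do not match and are skipped.
ENTRY = re.compile(r"^(\S+) (CRITICAL|ERROR|WARNING|INFO|DEBUG) (\S+)")


def summarize(lines, levels=("ERROR", "CRITICAL")):
    """Count log entries per (level, logger) for the given levels."""
    counts = Counter()
    for line in lines:
        match = ENTRY.match(line)
        if match and match.group(2) in levels:
            counts[(match.group(2), match.group(3))] += 1
    return counts
```

Pass it an open log file and look at `.most_common(10)`; any file path you use is your own, not one the talk specifies.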

Page 33: Ungooglable

Test Case

Sarcoidosis!

Probably not...

Estimate Fix Time

Page 34: Ungooglable

Horizon of Intervention

Can I handle this problem?

Can I do it in a timely manner?

Yes

IRC, Plone-users

Yes

No, No

Friends, Colleagues

Page 35: Ungooglable

Front End Errors

Take the performance hit

Disable the malfunctioning piece

Page 36: Ungooglable

temporary patch

full patch
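The slides do not say how the temporary patch was applied. In Python, one common stopgap is a monkey patch: replace the misbehaving function at runtime and keep a way to put the original back once the full patch ships. A sketch; temporary_patch is a hypothetical helper, and it assumes the target is a plain function or method (not a staticmethod or classmethod).

```python
import functools


def temporary_patch(owner, name, replacement):
    """Replace owner.name with a wrapper around replacement; return an undo function.

    replacement receives the original callable as its first argument, so it
    can fall back to the old behaviour.
    """
    defined_here = name in vars(owner)
    raw = vars(owner).get(name)
    original = getattr(owner, name)

    @functools.wraps(original)
    def patched(*args, **kwargs):
        return replacement(original, *args, **kwargs)

    setattr(owner, name, patched)

    def undo():
        if defined_here:
            setattr(owner, name, raw)  # restore exactly what was there
        else:
            delattr(owner, name)  # the name was inherited; uncover it again

    return undo
```

Keeping the undo function around makes removing the stopgap part of clean-up rather than an afterthought.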

Page 37: Ungooglable

Have I mentioned the importance of working with BACKUPS yet?

Especially when unfucking data...

Page 38: Ungooglable

Clean up

Disabled all backups and packing

Opened up port 8080 to outside network

Moved logs to temporary disk

Disabled zopes 5-10
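The clean-up list is just every change made during damage control, undone. Recording those changes as you go makes this step mechanical. A sketch with a hypothetical IncidentLog class: each change is stored with a callable that reverses it, and clean-up runs the reversals newest first, since later changes may depend on earlier ones.

```python
class IncidentLog:
    """Keep every emergency change next to the step that undoes it."""

    def __init__(self):
        self._entries = []  # (description, undo) pairs, oldest first

    def record(self, description, undo):
        self._entries.append((description, undo))

    def pending(self):
        return [description for description, _ in self._entries]

    def clean_up(self):
        """Undo changes newest first; return the descriptions in the order undone.

        An entry is dropped only after its undo step succeeds, so a failure
        leaves it (and everything older) in the log for another attempt.
        """
        undone = []
        while self._entries:
            description, undo = self._entries[-1]
            undo()
            self._entries.pop()
            undone.append(description)
        return undone
```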

Page 40: Ungooglable

Clean up

Delete extra/bad files

Scripts in version control

Communicate

Page 41: Ungooglable

I’ve got a fever, and the only solution... is MORE PATCH!

Page 42: Ungooglable

‣ Update/Close Tickets

‣ Integrate Test Cases

‣ Document Processes

Page 43: Ungooglable

Handling Data Errors

Network: "works for me"

Hardware: obvious, sporadic crazy shit

Software: everything else

Data: not recreatable locally

Page 48: Ungooglable

Page 49: Ungooglable

How Data is Stored

root (app)
    Plone
        News
            news.2010.09.08
            news.2010.06.13
        Members
        Events
        acl_users
            users
            roles
    acl_users
        users
        roles
    temp_folder

Page 50: Ungooglable

The Basics

‣ ./bin/instance debug

‣ app

‣ dir, __dict__
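From the debug prompt, a small recursive helper can print the object tree shown on the "How Data is Stored" slide. A sketch: walk and zope_children are hypothetical names. It relies on objectItems(), which Zope folders provide, and checks the isPrincipiaFolderish flag first because acquisition can make a container's methods reachable from a plain item. Keep max_depth small on a real site, since every object visited gets loaded.

```python
def zope_children(obj):
    """(name, child) pairs of a Zope-style folder; nothing for other objects."""
    if getattr(obj, "isPrincipiaFolderish", False):
        return obj.objectItems()
    return ()


def walk(obj, name="app", depth=0, max_depth=3, children=zope_children):
    """Yield (depth, name) for obj and its descendants, depth-first."""
    yield depth, name
    if depth >= max_depth:
        return
    for child_name, child in children(obj):
        yield from walk(child, child_name, depth + 1, max_depth, children)
```

At the prompt that might look like `for depth, name in walk(app, max_depth=2): print("    " * depth + name)`.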

Page 51: Ungooglable

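Direct Connect

>>> from ZODB.FileStorage import FileStorage
>>> from ZODB.DB import DB
>>> storage = FileStorage('var/filestorage/Data.fs')
>>> db = DB(storage)
>>> connection = db.open()
>>> root = connection.root()

Only one process can open a FileStorage for writing (it takes a lock on Data.fs.lock), so use this route on a stopped site or a local copy of Data.fs. Against a running site, connect through ZEO instead:

>>> from ZEO import ClientStorage
>>> from ZODB import DB
>>> address = ('10.0.1.5', 8001)
>>> storage = ClientStorage.ClientStorage(address)
>>> db = DB(storage)
>>> connection = db.open()
>>> root = connection.root()

The root behaves like a dictionary; nothing is written until the transaction is committed:

>>> root['app'] = PloneSite()
>>> root['status'] = 'Running'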

Page 52: Ungooglable

>>> import transaction

>>> del app.Plone.news['news-item-id']

>>> transaction.commit()

Page 53: Ungooglable

_p_changed
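In ZODB, a persistent object is written at commit only if it is marked as changed. Assigning an attribute marks it automatically; mutating a plain list or dict stored on it does not, so that change is not saved unless you set `obj._p_changed = True` (or use PersistentList / PersistentMapping). The toy class below mimics that behaviour with the standard library only; ToyPersistent is hypothetical and is not persistent.Persistent.

```python
class ToyPersistent:
    """Toy stand-in for persistent.Persistent's change flag (not the real class).

    Assigning a normal attribute sets _p_changed; mutating a list or dict
    that is already stored on the object does not.
    """

    def __init__(self):
        object.__setattr__(self, "_p_changed", False)

    def __setattr__(self, name, value):
        object.__setattr__(self, name, value)
        if not name.startswith("_p_"):
            object.__setattr__(self, "_p_changed", True)


item = ToyPersistent()
item.tags = ["news"]          # attribute assignment: flagged as changed
item._p_changed = False       # pretend the transaction was just committed
item.tags.append("urgent")    # in-place mutation: nobody notices
item._p_changed = True        # the fix: flag it by hand
```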

Page 54: Ungooglable

When in doubt...

‣ PDB is your friend

‣ The source is your friend

‣ Throw a party for your friends
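When the documentation runs out, the fastest route to the source is often asking Python where a class or function lives. A small helper built on the standard inspect module; where_defined is a hypothetical name. Objects implemented in C have no Python source, so inspect raises TypeError for them.

```python
import inspect


def where_defined(obj):
    """Return (source file, starting line) of a module, class or function.

    Instances are looked up through __class__ rather than type(), so
    proxies that forward __class__ resolve to the real class.
    """
    if not (inspect.ismodule(obj) or inspect.isclass(obj) or inspect.isroutine(obj)):
        obj = obj.__class__
    path = inspect.getsourcefile(obj)
    _, first_line = inspect.getsourcelines(obj)
    return path, first_line
```

Open the file at that line, read it, and set a PDB breakpoint there if reading is not enough.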

Page 55: Ungooglable

‣ Know your System

‣ Understand the Tools

‣ Be Nice to your Neighbors