#engageug Back from the Dead: When Bad Code Kills a Good Server Engage User Group Conference, Eindhoven March 2016 Serdar Basegmez - Developi - @serdar_basegmez William Malchisky Jr. - ESS - @BillMalchisky

Engage 2016: Back From the Dead: How Bad Code Kills a Good Server


#engageug © 2016 Serdar Basegmez and William Malchisky Jr. Licensed under Creative Commons BY-NC-SA 4.0

Our Story in Forty Minutes

• Preface
• Chapter 1 - The Beginning
• Chapter 2 - Searching for Clues
• Chapter 3 - Creating a Solid Platform
• Chapter 4 - The Soft Side of Performance Gains
• The Final Chapter - Results

Disclaimer

"Ladies and Gentlemen. The story you are about to see is true; the names have been changed to protect the innocent." --Dragnet

For example... Acme Corporation is now referred to as Acme, Inc.

Setting Expectations

• What we will cover
  • Problem analysis
  • Troubleshooting skills
  • Best practices
  • The performance impact of suboptimal applications
• What we omitted
  • Boring, rambling, dry, lectures
  • Useless drivel


Customer Calls

• "We're having a problem. Can you help?"
• "Absolutely. What's happening?"
• "Our mission critical DB is really $%&@#$^& our users. It's way too slow. It takes less time to reboot [Windows 3.1 on an i386 with 32MB RAM] than to open a document."
• "Any idea what changed?"
• "We don't know. We have not touched the box."

Why Domino Servers Fail?

• Lack of expertise and/or knowledge
• Unplanned and/or unexpected expansion
• No dedicated Administrator
• No change management
• No monitoring
• Workaround overloading


"Round Up the Usual Suspects"

• While waiting for access... request the following
• Helps establish the level of criticality

notes.ini
log.nsf
sh tasks
top
vmstat
iostat
df -h
mount
swapon -s
User to server ping results
Server NAB DB copy, sans users
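The OS-level part of this request list can be gathered in one pass. The following is a minimal sketch, assuming a Linux host with the listed tools on the PATH; the `collect` helper and its parameters are illustrative names, not something from the talk. Domino-side items such as `sh tasks`, notes.ini, and log.nsf still need console or file access and are not covered.

```python
import shutil
import subprocess

# OS commands named on the slide. "sh tasks" is a Domino console command
# and is deliberately left out because it needs console access.
OS_COMMANDS = {
    "top": ["top", "-b", "-n", "1"],   # one batch-mode snapshot
    "vmstat": ["vmstat"],
    "iostat": ["iostat"],
    "df": ["df", "-h"],
    "mount": ["mount"],
    "swapon": ["swapon", "-s"],
}


def collect(commands=OS_COMMANDS, run=None, which=shutil.which):
    """Run each available command and return {name: output or a skip note}."""
    if run is None:
        def run(argv):
            done = subprocess.run(argv, capture_output=True, text=True, timeout=30)
            return done.stdout
    results = {}
    for name, argv in commands.items():
        if which(argv[0]) is None:
            # Tool not installed (iostat, for instance, ships in sysstat).
            results[name] = "skipped: command not installed"
            continue
        results[name] = run(argv)
    return results
```

Calling `collect()` with no arguments runs the real commands; the `run` and `which` parameters exist only so the function can be exercised without touching the system.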

Quick Example - iostat, vmstat

malchw@san-domino:~$ iostat
Linux 3.13.0-83-generic (san-domino)  03/23/2016  _x86_64_  (8 CPU)

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           6.21    0.25    3.69    0.51    0.00   89.34

Device:     tps   kB_read/s   kB_wrtn/s   kB_read   kB_wrtn
sda       45.34     2075.44      778.25   6028264   2260469
sdb        0.36        1.52        0.03      4422        80
dm-0      24.51      117.04      186.80    339957    542584
dm-1      16.17      415.61       79.82   1207173    231836
dm-2      17.64     1540.92      511.61   4475713   1485996

malchw@san-domino:~$ vmstat
procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
 r  b   swpd     free   buff   cache   si   so    bi    bo   in   cs us sy id wa st
 1  0      0 16943764 153144 7941660    0    0   262    98  144  681  6  4 89  1  0
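When the same output has to be compared across many samples, it helps to turn it into named fields. This is a small sketch that parses the plain `vmstat` output shown on this slide; `parse_vmstat` is a hypothetical helper and assumes the default single-sample layout (a dashed group line, a column header line, then one value line).

```python
def parse_vmstat(text):
    """Return the last sample of plain `vmstat` output as a dict of ints."""
    rows = [line.split() for line in text.strip().splitlines() if line.strip()]
    header, values = rows[-2], rows[-1]
    return dict(zip(header, map(int, values)))


# The sample from the slide.
SAMPLE = """
procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
 r  b   swpd     free   buff   cache   si   so    bi    bo   in   cs us sy id wa st
 1  0      0 16943764 153144 7941660    0    0   262    98  144  681  6  4 89  1  0
"""

sample = parse_vmstat(SAMPLE)
# Non-zero si/so means pages are moving to or from swap.
swapping = sample["si"] > 0 or sample["so"] > 0
print(sample["id"], sample["wa"], swapping)  # prints: 89 1 False
```

On this healthy sample the CPU is 89% idle, I/O wait is 1%, and nothing is being swapped.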

Data, Data Everywhere

• Run DCT - returned a few items, but nothing applicable to the performance issue experienced
• Check Domino stats
• Located a key issue - needle in haystack
• SAI fluctuated wildly, frequently, plummeting to 18% for minutes on end
• Locate any recent NSD files for analysis
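A dip that lasts "minutes on end" is easy to miss when scanning raw statistics by eye. The sketch below, under the assumption that SAI values have already been exported as one number per sampling interval, reports runs of consecutive low samples. The function name, the threshold of 50, and the minimum run length are illustrative choices, not values from the talk.

```python
def sustained_dips(samples, threshold=50, min_len=3):
    """Return (start, end) index pairs, inclusive, for every run of at
    least min_len consecutive samples below threshold."""
    dips, start = [], None
    for i, value in enumerate(samples):
        if value < threshold:
            if start is None:
                start = i          # a new low run begins here
        else:
            if start is not None and i - start >= min_len:
                dips.append((start, i - 1))
            start = None
    # Close a run that is still open at the end of the data.
    if start is not None and len(samples) - start >= min_len:
        dips.append((start, len(samples) - 1))
    return dips
```

For example, `sustained_dips([95, 90, 18, 18, 20, 85, 40, 88, 18, 18, 18])` reports the two three-sample dips and ignores the single low reading of 40.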

Pro Tip on Data Collection

• Watch the server when nobody else does
• Lots of strange things happen on servers overnight
• Observed the system processing over one million records in :15 twice a week, at different times
• For example… no one at Acme, Inc. knew this occurred or why

Initial Data Analysis - OS

• Swap space 50% of installed memory
• Memory was under 1GB for mission critical server
• Several key DBs contained 100k+ docs
• Combination created page faulting plague further eroding performance
• System properly patched
• Free space adequate

Initial Data Analysis - Notes.ini

• Obvious but important data points
• Server layout
• Where items located
• Recognized server.id file
• Server tasks
• Contrast to sh tasks requested earlier
• No obvious problems

Initial Data Analysis - Amgr

• Agents running all hours of the night and day
• Agents running from DBs actively being compacted
• Agents running from DBs when updall and fixup running
• Not all scheduled agents needed to run all weekend

Initial Data Analysis - Log.nsf

• Compact still running when updall Program fires off
• Compact never finished before execution time ceiling hit
• Left largest DBs in a completely suboptimal state
• Connected to servers that did not exist
• Scheduled replication documents
• Significant delays with replica synchronization
• Ensured data never properly synchronized across domain
• Certain connection documents only covered two DBs

Initial Data Analysis - DBs

• Several big DBs last fixup completed two years ago
• Most heavily used files 30-75% Used
• Many views means clicking one forces a new index build
• No design, document, or attachment compression
• Design server task citing non-existent templates
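The "% Used" figures can be turned into a ranked worklist for the first maintenance window. A minimal sketch, assuming the file name, size, and percent-used values have already been exported from the server; the database names below are hypothetical, the 90% threshold is an illustrative choice, and the reclaimable figure is only a rough estimate.

```python
def compact_candidates(databases, min_percent_used=90.0):
    """databases: iterable of (name, size_mb, percent_used).
    Returns (name, estimated_reclaimable_mb) pairs, largest first."""
    flagged = []
    for name, size_mb, percent_used in databases:
        if percent_used < min_percent_used:
            # Rough estimate: the unused share of the file size.
            reclaimable = size_mb * (100.0 - percent_used) / 100.0
            flagged.append((name, round(reclaimable, 1)))
    return sorted(flagged, key=lambda item: item[1], reverse=True)


# Hypothetical inventory for illustration only.
inventory = [
    ("sales.nsf", 4000, 30.0),
    ("crm.nsf", 1000, 75.0),
    ("help.nsf", 200, 95.0),
]
print(compact_candidates(inventory))
```

Sorting by estimated reclaimable space puts the databases that will take longest to compact at the top, which is useful when the maintenance window is limited.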


Tier 1 - OS

• Swap space - No set rule these days
• 1.5x-2.0x RAM is good rule of thumb
• Memory - 4GB per processor on busy servers
• VMware settings if available
• Avoid temptation of too many processors
• Review partitions and free space
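The two rules of thumb on this slide are simple enough to check mechanically. The sketch below encodes them as stated (swap at 1.5x to 2.0x RAM, 4 GB of RAM per processor on busy servers); `sizing_report` is a hypothetical helper, and since the slide itself says there is no set rule these days, its findings are prompts for review rather than verdicts.

```python
def sizing_report(ram_gb, swap_gb, processors):
    """Compare a server against the slide's rules of thumb and
    return a list of findings (empty when both rules are met)."""
    findings = []
    if swap_gb < 1.5 * ram_gb:
        findings.append("swap below 1.5x RAM")
    elif swap_gb > 2.0 * ram_gb:
        findings.append("swap above 2.0x RAM")
    if ram_gb < 4 * processors:
        findings.append("less than 4 GB RAM per processor")
    return findings


# A server like the one in the story: under 1 GB RAM, swap at 50% of RAM.
# The processor count of 2 is an assumption for illustration.
print(sizing_report(ram_gb=1, swap_gb=0.5, processors=2))
```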

Additional OS Considerations

• Check that previously made system changes stick
• Unfamiliar servers can exhibit odd behavior
• Check Technotes for any recent performance issues
• Once OS is working, check to ensure that virtualization is optimal

Tier 2 - Domino

• Space Program documents properly
• Avoid overlap with agents and other Programs
• Pause agent schedule during maintenance
• Schedule a weekend to complete first full maintenance
• First full compact will take much longer than you realize
• Create maintenance schedule of tasks agreed to by business line managers
• Ensures all needed jobs are available when needed
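Overlap between Program documents and agents can be checked on paper before anything is rescheduled. A minimal sketch, assuming each job is described by a start time in minutes after midnight and an expected duration; the job names and timings are hypothetical, and runs that cross midnight are not handled.

```python
from itertools import combinations


def overlaps(jobs):
    """jobs: list of (name, start_minute, duration_minutes) within one day.
    Returns the name pairs whose time windows intersect."""
    clashes = []
    for (a, a_start, a_len), (b, b_start, b_len) in combinations(jobs, 2):
        # Two windows intersect when each starts before the other ends.
        if a_start < b_start + b_len and b_start < a_start + a_len:
            clashes.append((a, b))
    return clashes


# Hypothetical schedule: compact 01:00-05:00, updall 02:00-03:00,
# a nightly agent 05:30-06:00.
schedule = [("compact", 60, 240), ("updall", 120, 60), ("nightly agent", 330, 30)]
print(overlaps(schedule))  # prints: [('compact', 'updall')]
```

The durations should come from observed run times, since the first full compact in particular tends to run far longer than planned.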

Additional Items to Fix

• Review all enabled Domino features to ensure that they function properly
• Simple configuration miscues can impact negatively
• Cluster replication unable to locate a cluster member
• DNS errors create lookup delays
• Remove unneeded, deprecated network ports
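Lookup delays caused by DNS errors can be measured directly from the server. The sketch below times a single name resolution using Python's standard `socket.getaddrinfo`; `time_lookup` is a hypothetical helper, and the host names to test (cluster mates, connection document targets) have to be supplied by whoever runs it.

```python
import socket
import time


def time_lookup(host, resolver=socket.getaddrinfo):
    """Return (seconds, error) for one name lookup; error is None on success."""
    started = time.perf_counter()
    try:
        resolver(host, None)
        error = None
    except OSError as exc:  # socket.gaierror is a subclass of OSError
        error = str(exc)
    return time.perf_counter() - started, error
```

A name that fails only after several seconds is the kind of entry worth fixing or removing, because every connection attempt to it pays that delay.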


Where are We?

• Domino Admin handled the first level treatment
• Server performs well, but not good enough
• Triangulated the issue to a mission-critical application
• Now what?

Why Domino Apps Fail?

• Lack of expertise and/or knowledge
• Developers evolved from power users
• Architecture overloading
• Unplanned and/or unexpected expansion
• Undocumented code and/or business process
• No change management
• Quick & dirty development

Developers vs Performance Issues

• There is no magic pill for finding a performance issue
• Many problems are circumstantial
• Depends on who/when/how…
• Repeating the problem in a controlled environment
• Need for Proof!
• The most difficult part of the task
• Need to be systematic

Science Just Works!

• Research and Assessment,
• Speculation for fixes,
• Experiment,
• Prove!

http://www.wired.com/2013/04/whats-wrong-with-the-scientific-method/

Methodology

Research
✤ Symptoms (e.g. logs, performance data, etc.)
✤ Story (e.g. user input)
✤ Application code

Hypothesis
✤ Speculation on possible reasons
✤ Search for ‘Usual Suspects’

Experiment
✤ Testing for possible reasons

Analyze
✤ Check symptoms if fixed

Conclusion
✤ Issue validated and proved to be fixed.

Research & Assessment

• What to collect, based on the symptom;
• CPU/memory load, hangs, spikes, crashes, etc.
• All the time, the same time every day or random?
• Experienced by specific users?
• We are looking for a pattern between incidents.
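Whether incidents happen all the time, at the same time every day, or at random becomes visible once they are bucketed by hour. A minimal sketch, assuming incident times have been collected as ISO 8601 strings; the timestamps below are hypothetical.

```python
from collections import Counter
from datetime import datetime


def incidents_by_hour(timestamps):
    """Count incidents per hour of day from ISO 8601 timestamp strings."""
    counts = Counter(datetime.fromisoformat(ts).hour for ts in timestamps)
    return dict(sorted(counts.items()))


# Hypothetical incident reports.
reports = ["2016-03-01T14:05:00", "2016-03-02T14:40:00", "2016-03-03T09:10:00"]
print(incidents_by_hour(reports))  # prints: {9: 1, 14: 2}
```

A cluster in one hour points toward something scheduled; an even spread points toward load or user behavior instead.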

Data Collection Checklist

• Log/NSD/Semaphore files
• Server configuration (inc. notes.ini)
• Server monitoring and statistics data
• Web logs (for web application issues)
• XPages and OSGi logs (for XPages specific issues)
• Application and dependencies

Isolate the Application

• Sometimes, even opening in DDE may cause issues!
  • e.g. XPages components are automatically built
• Application code might have side effects
  • e.g. Updating on another data source, adding audit logs, performance degradation on the server, etc.
• There will be dependencies
• Once isolated, we can start inspection…

Usual Suspects

• Database corruptions
• @Today/@Now in views
• Code snippets acting like an admin
  • Updating views, replicating databases, running server commands, etc.
• Code snippets using the worst practices
  • Search in a large database, wrong looping, etc.
• Anything that fits into the pattern if there is one
  • e.g. An agent matching the incident timing
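Matching agents against incident timing can be done with a simple window comparison once both lists exist. This sketch assumes agent run windows and incident times have already been pulled from the logs as ISO 8601 strings; the agent names, the data, and the five-minute slack are all hypothetical.

```python
from datetime import datetime, timedelta


def suspects(agent_runs, incidents, slack_minutes=5):
    """agent_runs: list of (name, start_iso, end_iso); incidents: ISO strings.
    Returns {agent name: number of incidents inside its run windows}."""
    slack = timedelta(minutes=slack_minutes)
    moments = [datetime.fromisoformat(ts) for ts in incidents]
    hits = {}
    for name, start, end in agent_runs:
        begin = datetime.fromisoformat(start) - slack
        finish = datetime.fromisoformat(end) + slack
        count = sum(begin <= moment <= finish for moment in moments)
        if count:
            hits[name] = hits.get(name, 0) + count
    return hits


# Hypothetical data for illustration.
runs = [
    ("ExportOrders", "2016-03-01T14:00:00", "2016-03-01T14:20:00"),
    ("Cleanup", "2016-03-01T02:00:00", "2016-03-01T02:10:00"),
]
seen = ["2016-03-01T14:05:00", "2016-03-01T14:23:00", "2016-03-01T09:00:00"]
print(suspects(runs, seen))  # prints: {'ExportOrders': 2}
```

A match is only a lead for the hypothesis step; it still has to be reproduced before it counts as proof.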

Nothing yet? Digging deeper!

Team Up!

• Deeper investigation needs a team effort
• Admins and Developers should collaborate
• A test setup to simulate the production environment
• Intensive/Controlled debugging sessions in limited time windows
• Sharing expertise
• Experimenting on production should be the last resort
• Once a repeatable error is found, cooperate for a solution

Example Case - Analysis

• JVM Crash with the HTTP task
• Random times
• No pattern in the log
• Memory dumps point to a leak in the JVM Heap
• Inspected XPages applications, nothing found
• Triangulated the problem into one XPages app, following clues in intensive debugging on memory
• Isolated the application for a load test, nothing found
• Increased logging, to collect more data, no hope!

Example Case - Resolution

• Checked the server configuration and noticed
• Logging data incomplete
• Removed exclusions
• New logs pointed to the problem
• Searching software crawling a specific page
• Page generates state data, fills up the memory
• Simulated the same crash on the test environment
• One line of code fixed the issue
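Once logging is complete, a crawler hammering one page stands out in a simple count of requests per client and path. The sketch below assumes the web log is written in the common "combined" format with the user agent as the last quoted field; that format, the regular expression, and the `top_requesters` helper are assumptions for illustration, not details from the case.

```python
import re
from collections import Counter

# Matches the request, status, size, referer and user agent fields
# of a combined-format access log line (an assumed format).
LINE = re.compile(
    r'"(?:GET|POST|HEAD) (?P<path>\S+) [^"]*" \d{3} \S+ "[^"]*" "(?P<agent>[^"]*)"'
)


def top_requesters(lines, limit=3):
    """Return the most frequent (user agent, path) pairs with their counts."""
    counts = Counter()
    for line in lines:
        match = LINE.search(line)
        if match:
            counts[(match.group("agent"), match.group("path"))] += 1
    return counts.most_common(limit)
```

Lines that do not match the pattern are skipped silently, so a low total is a hint that the real log format differs from the one assumed here.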

Another Case - Analysis

• A mission critical application at a bank
• Web application with 2000+ users
• CPU spikes and random hangs, mostly afternoon
• Logs are clear, no crashes, no error messages
• Isolated the application, inspected the ‘usual suspects’
• Found a web agent updating a view!
• Triangulated the problem using web logs and SEMDEBUG
• But, cannot validate the issue on the test environment…

Another Case - Resolution

• Cooperated with the Domino Admin
• Detailed assessment on the server configuration
• We found the issue!
• “ServerTasksAt14” running an updall task.
• Another Program file running Updall on a specific database, every 30 minutes
• Applied to the test platform, validated by a load test
• Problem solved!
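A setting like the one in this case is quick to spot once notes.ini is scanned for scheduled tasks. A minimal sketch, assuming the usual `ServerTasksAt<hour>=task,task` layout of notes.ini; the `scheduled_tasks` helper and the sample file content are hypothetical.

```python
import re


def scheduled_tasks(ini_text, task="updall"):
    """Return the hours at which a ServerTasksAt<hour> line lists the task."""
    hours = []
    for line in ini_text.splitlines():
        match = re.match(r"\s*ServerTasksAt(\d{1,2})\s*=\s*(.*)", line, re.IGNORECASE)
        if not match:
            continue
        # Keep only the task name, dropping any arguments after it.
        names = [
            entry.strip().split()[0].lower()
            for entry in match.group(2).split(",")
            if entry.strip()
        ]
        if task.lower() in names:
            hours.append(int(match.group(1)))
    return sorted(hours)


# Hypothetical notes.ini fragment.
SAMPLE_INI = """
ServerTasks=Replica,Router,Update,AMgr,HTTP
ServerTasksAt2=Updall,Object Collect mailobj.nsf
ServerTasksAt5=Statlog
ServerTasksAt14=Updall
"""
print(scheduled_tasks(SAMPLE_INI))  # prints: [2, 14]
```

This covers only notes.ini; Program documents, such as the one running Updall every 30 minutes in this case, live in the Domino Directory and have to be reviewed separately.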


Quality Analysis Yields Quality Results

• Page faults reduced to zero
• General DB usage and administration tasks work well
• SAI now over 80%
• Weird overnight (agent) system operations resolved
• Key DBs have 93% used space threshold now
• All DBs compressed: design, documents, all attachments
• Program documents, agent schedules all adjusted: finish, no overlap

Note on Performance

When done properly, few users tend to notice the change, but if reverted they will all complain

Teamwork vs. Performance

Neither admin nor developer could solve all of these issues alone!

Bonus Slide

• You can get help inspecting applications and servers!
• They have also helped Engage!

cooperteam, MartinScott, teamstudio, Ytria

Serdar Başeğmez

• IBM Champion (2011-2016)
• Developi Information Systems, Istanbul
• Contributing…
  • OpenNTF / LUGTR / LotusNotus.com
• Featured on…
  • Engage UG, IBM Connect, ICON UK, NotesIn9…
• Also…
  • Blogger and Podcaster on Scientific Skepticism

William Malchisky Jr.

• IBM Champion (2011-2016)
• Effective Software Solutions, LLC
• Co-founder of Linuxfest at Lotusphere/Connect
• Speaker at 20+ Lotus/IBM related events/LUGs
• Co-authored two IBM Redbooks
• Co-wrote the IBM Education Administration certification track for Domino 8.5

Follow Up - Contact Information

Serdar Basegmez

[email protected]

@serdar_basegmez Skype: sbasegmez

Bill Malchisky Jr.

[email protected]

@billmalchisky Skype: FairTaxBill