Ruby Hacking Guide


Preface

This book explores several themes with the following goals in mind:

To have knowledge of the structure of ruby
To gain knowledge about language processing systems in general
To acquire skills in reading source code

Ruby is an object-oriented language developed by Yukihiro Matsumoto. The official implementation of the Ruby language is called ruby. It is actively developed and maintained by the open source community. Our first goal is to understand the inner workings of the ruby implementation. This book is going to investigate ruby as a whole.

Secondly, by knowing about the implementation of Ruby, we will be able to know about other language processing systems. I tried to cover all topics necessary for implementing a language, such as hash tables, the scanner and parser, the evaluation procedure, and many others. Because this book is not intended as a textbook, covering every area and idea exhaustively was not reasonable. However, the parts relating to the essential structures of a language implementation are adequately explained. A brief summary of the Ruby language itself is also included so that readers who don’t know Ruby can read this book.

The main themes of this book are the first and second points above. However, what I want to emphasize the most is the third one: to acquire skill in reading source code. I dare to say it’s a “hidden” theme. I will explain why I thought it necessary.

It is often said, “To be a skilled programmer, you should read source code written by others.” This is certainly true. But I haven’t found a book that explains how you can actually do it. There are many books that explain OS kernels and the interior of language processing systems by showing the concrete structure or “the answer,” but they don’t explain the way to reach that answer. It’s clearly one-sided.

Can you, perhaps, naturally read code just because you know how to write a program? Is it true that reading code is so easy that everyone in this world can read code written by others with no sweat? I don’t think so. Reading programs is certainly as difficult as writing programs.

Therefore, this book does not simply explain ruby as something already known; rather, it demonstrates the process of analysis as graphically as possible. Though I think I’m a reasonably seasoned Ruby programmer, I did not fully understand the inner structure of ruby at the time I started to write this book. In other words, regarding the content of ruby, I started from a position as close as possible to that of the readers. This book is the summary of both the process of analysis started from that point and its result.

I asked Yukihiro Matsumoto, the author of ruby, for supervision. But I thought the spirit of this book would be lost if each analysis was monitored by the author of the language himself. Therefore I limited his review to the final stage of writing. In this way, without losing the sense of actually reading the source code, I think I could also assure the correctness of the contents.

To be honest, this book is not easy. At the very least, it is limited in its simplicity by the inherent complexity of its aim. However, this complexity may be what makes the book interesting to you. Do you find it interesting to fuss over something that is a piece of cake? Do you take to your desk to solve a puzzle that you know the answer to in a heartbeat? How about a suspense novel whose criminal you can guess halfway through? If you really want to come to new knowledge, you need to solve a problem engaging all your capacities. This is the book that lets you practice such idealism exhaustively. “It’s interesting because it’s difficult.” I will be glad if the number of people who think so increases because of this book.

Target audience

Firstly, knowledge about the Ruby language isn’t required.

However, since knowledge of the Ruby language is absolutely necessary to understand certain explanations of its structure, supplementary explanations of the language are inserted here and there.

Knowledge about the C language is required, to some extent. I assume you can allocate some structs with malloc() at runtime to create a list or a stack, and that you have used function pointers at least a few times.

Also, since the basics of object-oriented programming will not be explained in depth, you will probably have a difficult time without experience in at least one object-oriented language. In this book, I tried to use many examples in Java and C++.

Structure of this book

This book has four main parts:

Part 1: Objects
Part 2: Syntactic analysis
Part 3: Evaluation
Part 4: Peripheral around the evaluator

Supplementary chapters are included at the beginning of each part when necessary. These provide a basic introduction for those who are not familiar with Ruby and the general mechanism of a language processing system.

Now, we are going through the overview of the four main parts. The symbol in parentheses after each explanation indicates the difficulty. They are (C), (B), (A) in order from easy to hard, with (S) being the highest.

Part 1: Objects
Chapter 1: Focuses on the basics of Ruby to get ready for Part 1. (C)
Chapter 2: Gives the concrete inner structure of Ruby objects. (C)
Chapter 3: Covers hash tables. (C)
Chapter 4: Covers the Ruby class system. You may read through this chapter quickly at first, because it tells plenty of abstract stories. (A)
Chapter 5: Shows the garbage collector, which is responsible for generating and releasing objects. The first story in the low-level series. (B)
Chapter 6: Describes the implementation of global variables, class variables, and constants. (C)
Chapter 7: Outlines the security features of Ruby. (C)

Part 2: Syntactic analysis
Chapter 8: Talks about the almost complete specification of the Ruby language, in order to prepare for Part 2 and Part 3. (C)
Chapter 9: An introduction to yacc, at least enough to read the syntax file. (B)
Chapter 10: Looks through the rules and physical structure of the parser. (A)
Chapter 11: Explores the peripherals of lex_state, which is the most difficult part of the parser. The most difficult part of this book. (S)
Chapter 12: Finalization of Part 2 and connection to Part 3. (C)

Part 3: Evaluation
Chapter 13: Describes the basic mechanism of the evaluator. (C)
Chapter 14: Reads the evaluation stack that creates the main context of Ruby. (A)
Chapter 15: Talks about search and initialization of methods. (B)
Chapter 16: Delves into the implementation of the iterator, the most characteristic feature of Ruby. (A)
Chapter 17: Describes the implementation of the eval methods. (B)

Part 4: Peripheral around the evaluator
Chapter 18: Run-time loading of libraries in C and Ruby. (B)
Chapter 19: Describes the implementation of threads at the end of the core part. (A)

Environment

This book describes ruby 1.7.3, version 2002-09-12. It’s included on the CD-ROM. Choose any one of ruby-rhg.tar.gz, ruby-rhg.lzh, or ruby-rhg.zip according to your convenience. The content is the same for all. Alternatively, you can obtain it from the support site (http://i.loveruby.net/ja/rhg/) of this book.

For the publication of this book, the following build environments were prepared to confirm compilation and test basic operation. The details of this build test are given in doc/buildtest.html on the attached CD-ROM. However, this does not necessarily guarantee that ruby will run, even under the same environments listed here. The author doesn’t guarantee in any form the execution of ruby.

BeOS 5 Personal Edition/i386
Debian GNU/Linux potato/i386
Debian GNU/Linux woody/i386
Debian GNU/Linux sid/i386
FreeBSD 4.4-RELEASE/Alpha (Requires the local patch for this book)
FreeBSD 4.5-RELEASE/i386
FreeBSD 4.5-RELEASE/PC98
FreeBSD 5-CURRENT/i386
HP-UX 10.20
HP-UX 11.00 (32bit mode)
HP-UX 11.11 (32bit mode)
Mac OS X 10.2
NetBSD 1.6F/i386
OpenBSD 3.1
Plamo Linux 2.0/i386
Linux for PlayStation 2 Release 1.0
Redhat Linux 7.3/i386
Solaris 2.6/Sparc
Solaris 8/Sparc
UX/4800
Vine Linux 2.1.5
Vine Linux 2.5
VineSeed
Windows 98SE (Cygwin, MinGW+Cygwin, MinGW+MSYS)
Windows Me (Borland C++ Compiler 5.5, Cygwin, MinGW+Cygwin, MinGW+MSYS, Visual C++ 6)
Windows NT 4.0 (Cygwin, MinGW+Cygwin)
Windows 2000 (Borland C++ Compiler 5.5, Visual C++ 6, Visual C++ .NET)
Windows XP (Visual C++ .NET, MinGW+Cygwin)

These numerous tests were not the effort of the author alone. These test builds couldn’t have been achieved without the magnificent cooperation of the people listed below.

I’d like to extend my warmest thanks from my heart.

Tietew
kjana
nyasu
sakazuki
Masahiro Sato
Kenichi Tamura
Morikyu
Yuya Kato
Yasuhiro Kubo
Kentaro Goto
Tomoyuki Shimomura
Masaki Sukeda
Koji Arai
Kazuhiro Nishiyama
Shinya Kawaji
Tetsuya Watanabe
Naokuni Fujimoto

However, the author bears the responsibility for this test. Please refrain from attempting to contact these people directly. If there’s any flaw in execution, please contact the author by e-mail: aamine@loveruby.net.

Web site

The web site for this book is http://i.loveruby.net/ja/rhg/. I will add information about related programs and additional documentation, as well as errata. In addition, I’m going to publicize the first few chapters of this book at the time of the release. I will look for an opportunity to publicize more chapters, and the whole contents of the book will eventually be on this web site.

Acknowledgment

First of all, I would like to thank Mr. Yukihiro Matsumoto. He is the author of Ruby, and he made it public as open source software. Not only did he willingly approve of my publishing a book analyzing ruby, but he also agreed to supervise its content. In addition, he helped me during my stay in Florida with simultaneous translation. There are more things than I can enumerate for which I have to thank him. Instead of writing them all, I give this book to him.

Next, I would like to thank arton, who proposed that I publish this book. The words of arton always move me. One of the things I’m currently struggling with because of his words is that I have no reason not to get a .NET machine.

Koji Arai, the ‘captain’ of documentation in the Ruby community, conducted a review as scrupulous as if he had become the official editor of this book, though I had not been told so. I thank him for all his reviews.

Also I’d like to mention those who gave me comments, pointed out mistakes and submitted proposals about the construction of the book throughout all my work.

Tietew, Yuya, Kawaji, Gotoken, Tamura, Funaba, Morikyu, Ishizuka, Shimomura, Kubo, Sukeda, Nishiyama, Fujimoto, Yanagawa (I’m sorry if anyone is missing): I thank all those who contributed.

As a final note, I thank Otsuka, Haruta, and Kanemitsu for arranging everything even though I broke the deadline as many as four times and the manuscript ran more than 200 pages over the original plan.

I cannot list here the names of all the people who contributed to this book, but I can say that I couldn’t have published it successfully without such assistance. Let me take this opportunity to express my appreciation. Thank you very much.

Minero Aoki

If you want to send remarks, suggestions and reports of typographical errors, please address them to Minero Aoki <aamine@loveruby.net>.


Copyright © 2002-2004 Minero Aoki, All rights reserved.

The original work is Copyright © 2002-2004 Minero AOKI. Translated by Vincent ISAMBART and Clifford Escobar CAOILE. This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 2.5 License


Introduction

Characteristics of Ruby

Some of the readers may already be familiar with Ruby, but (I hope) there are also many readers who are not. First let’s go through a rough summary of the characteristics of Ruby for such people.

Hereafter capital “Ruby” refers to Ruby as a language specification, and lowercase “ruby” refers to the ruby command as an implementation.

Development style

Ruby is a language that is being developed by the hand of Yukihiro Matsumoto as an individual. Unlike C or Java or Scheme, it does not have any standard. The specification is merely shown as an implementation, ruby, and it is varying continuously. For good or bad, it’s free.

Furthermore, ruby itself is free software. It’s probably necessary to mention at least two points here: the source code is open to the public, and it is distributed free of charge. Thanks to these conditions, an attempt like this book is possible.

If you’d like to know the exact license, you can read README and LEGAL. For the time being, I’d like you to remember that you can do at least the following things:

You can redistribute the source code of ruby
You can modify the source code of ruby
You can redistribute a copy of the source code with your modifications

There is no need for special permission or payment in any of these cases.

By the way, the purpose of this book is to read the original ruby, thus the target source is the unmodified one unless otherwise specified. However, whitespace, newlines and comments were added or removed without notice.

It’s conservative

Ruby is a very conservative language. It is equipped with only carefully chosen features that have been tested and refined in a variety of languages. Therefore it doesn’t have many fresh and experimental features. So it has a tendency to appeal to programmers who put importance on practical functionality. Dyed-in-the-wool hackers such as Scheme and Haskell lovers don’t seem to find ruby appealing, at least at first glance.

The library is conservative in the same way. Clear and unabbreviated names are given to new functions, while names that appear in C and Perl libraries have been taken from them as they are. For example, printf, getpwent, sub, and tr.

It is also conservative in implementation. Assembler is not an option it takes in seeking speed. Portability is always given higher priority when it conflicts with speed.

It is an object-oriented language

Ruby is an object-oriented language. It is absolutely impossible to exclude this from the features of Ruby.

I will not give a page of this book to what an object-oriented language is. As for the object-oriented features of Ruby, the code that is about to be explained is itself the best sample.

It is a script language

Ruby is a script language. It seems also absolutely impossible to exclude this from the features of Ruby. To gain everyone’s agreement, an introduction of Ruby must include “object-oriented” and “script language”.

However, what is a “script language”, for example? I couldn’t work out a definition successfully. For example, John K. Ousterhout, the author of Tcl/Tk, gives a definition as “executable language using #! on UNIX”. There are other definitions depending on the viewpoint, such as a language that can express a useful program in only one line, or one that can execute code by passing a program file on the command line, etc.

However, I dare to use another definition, because I don’t find much interest in “what” a script language is. I have only one measure for deciding to call something a script language, that is, whether no one would complain about calling it a script language. To fulfill this definition, I would define the meaning of “script language” as follows.

A language that its author calls a “script language”.

I’m sure this definition will not fail. And Ruby fulfills it. Therefore I call Ruby a “script language”.

It’s an interpreter

ruby is an interpreter. That’s the fact. But why is it an interpreter? For example, couldn’t it be made as a compiler? It must be because in some respects being an interpreter is better than being a compiler… at least for ruby, it must be better. Well, what is good about being an interpreter?

As a preparation step to investigating this, let’s start by thinking about the difference between an interpreter and a compiler. If we attempt a theoretical comparison of the process by which a program is executed, there’s no difference between an interpreted language and a compiled language. Because a compiled program works by letting the CPU interpret the code compiled to machine language, it may be possible to say it too works as an interpreter. Then where is the place that actually makes a difference? It is a more practical place: in the process of development.

I know somebody, as soon as they hear “in the process of development”, would claim, using a stereotypical phrase, that an interpreter removes the effort of compilation and so makes development easier. But I don’t think that’s accurate. A language could be designed so that it never shows the process of compilation. Actually, Delphi can compile a project by hitting just F5. A complaint about long compilation times derives from the size of the project or the optimization of the code. Compilation itself is not a negative.

Well, why do people perceive an interpreter and a compiler as so different? I think that it is because language developers so far have chosen either implementation based on the traits of each language. In other words, if it is a language for a comparatively small purpose such as a daily routine, it would be an interpreter. If it is for a large project where a number of people are involved in the development and accuracy is required, it would be a compiler. That may be because of speed, as well as the ease of creating the language.

Therefore, I think “it’s handy because it’s an interpreter” is an outsized myth. Being an interpreter doesn’t necessarily contribute to ease of use; rather, seeking ease of use naturally leads you toward building an interpreted language.

Anyway, ruby is an interpreter; this fact is important for where this book is heading, so I emphasize it here again. Though I don’t know about “it’s handy because it is an interpreter”, ruby is in any case implemented as an interpreter.

High portability

Even with the problem that its interfaces are fundamentally Unix-centered, I would insist ruby possesses high portability. It doesn’t require any extremely unfamiliar library. It has only a few parts written in assembler. Therefore porting to a new platform is comparatively easy. Namely, it currently works on the following platforms.

Linux
Win32 (Windows 95, 98, Me, NT, 2000, XP)
Cygwin
djgpp
FreeBSD
NetBSD
OpenBSD
BSD/OS
Mac OS X
Solaris
Tru64 UNIX
HP-UX
AIX
VMS
UX/4800
BeOS
OS/2 (emx)
Psion

I heard that the main machine of the author, Matsumoto, runs Linux. Thus when using Linux, you should not fail to compile at any time.

Furthermore, you can expect stable functionality on a (typical) Unix environment. Considering the release cycle of packages, the primary choice of environment for playing around with ruby should currently be some flavor of PC UNIX.

On the other hand, the Win32 environment definitely tends to cause problems. The large gaps in the underlying OS model tend to cause problems around the machine stack and the linker. Yet, recently Windows hackers have contributed to make support better. I use a native ruby on Windows 2000 and Me. Once it runs successfully, it doesn’t seem to show special concerns like frequent crashing. The main problems on Windows may be the gaps in the specifications.

Other types of OS that many people may be interested in are probably Mac OS (v9 and earlier) and handheld OSes like Palm.

Around ruby 1.2 and before, it supported legacy Mac OS, but the development seems to be suspended. It doesn’t even compile. The biggest causes are the compiler environment of legacy Mac OS and the decrease in developers. As for Mac OS X, there are no worries because its body is UNIX.

There seem to have been several discussions about porting to Palm, but I have never heard of a successful project. I guess the difficulty lies in the necessity of settling specification-level standards such as stdio on the Palm platform, rather than in the actual implementation. Well, I saw that a port to Psion has been done ([ruby-list:36028]).

How about the hot stories about VMs seen in Java and .NET? Because I’d like to talk about them together with the implementation, this topic will be in the final chapter.

Automatic memory control

Functionally it’s called GC, or Garbage Collection. In C terms, this feature allows you to skip free() after malloc(). Unused memory is detected by the system automatically and released. It’s so convenient that once you get used to GC you won’t be willing to do manual memory control again.

GC has become a common topic because of its popularity in recent languages that have GC as standard equipment, and it is fun that its algorithms can still be improved further.

Typeless variables

The variables in Ruby don’t have types. The reason is probably that typeless variables conform better with polymorphism, which is one of the strongest advantages of an object-oriented language. Of course a language with typed variables has ways to deal with polymorphism. What I mean here is that typeless variables conform better.

“Conforms better” here means something like “handy”. Sometimes it is crucially important, sometimes it doesn’t matter practically. Yet, this is certainly an appealing point if a language seeks to be “handy and easy”, and Ruby does.
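As a small illustration of how typeless variables and polymorphism fit together, here is a minimal sketch. It is not taken from the book; the classes Duck and Robot and the method announce are made up for this example.

```ruby
# Hypothetical classes for illustration only.
class Duck
  def speak
    "Quack"
  end
end

class Robot
  def speak
    "Beep"
  end
end

# The parameter `thing` has no declared type. Any object that
# responds to `speak` can be passed in.
def announce(thing)
  "It says: #{thing.speak}"
end

v = Duck.new
first = announce(v)

v = Robot.new            # the same variable now refers to an object of another class
second = announce(v)

puts first
puts second
```

No common superclass or interface declaration is needed for announce to accept both objects, which is the kind of handiness the text refers to.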

Most syntactic elements are expressions

This topic is probably difficult to understand instantly without a little supplemental explanation. For example, the following C program results in a syntax error.

    result = if (cond) { process(val); } else { 0; }

This is because the C syntax defines if as a statement. But you can write it as follows.

    result = cond ? process(val) : 0;

This rewrite is possible because the conditional operator (a?b:c) is defined as an expression.

On the other hand, in Ruby, you can write as follows because if is an expression.

    result = if cond then process(val) else nil end

Roughly speaking, if it can be an argument of a function or a method, you can consider it an expression.
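The rule of thumb about arguments can be tried directly. The following is a minimal sketch; process and describe are hypothetical helpers defined only for this example.

```ruby
# Hypothetical helpers for illustration only.
def process(val)
  val * 2
end

def describe(x)
  "got #{x.inspect}"
end

cond = true
val  = 21

# `if` yields a value, so it can sit on the right-hand side of an assignment.
result = if cond then process(val) else nil end

# It can also be passed directly as a method argument.
text = describe(if cond then "yes" else "no" end)

# `case` is an expression in the same way.
size = case val
       when 0..9 then :small
       else :large
       end

puts result, text, size
```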

Of course, there are other languages whose syntactic elements are mostly expressions. Lisp is the best example. Because of this characteristic, there seem to be many people who feel that “Ruby is similar to Lisp”.

Iterators

Ruby has iterators. What is an iterator? Before getting into iterators, I should mention the necessity of using an alternative term, because the word “iterator” has fallen out of favor recently. However, I don’t have a good alternative. So let us keep calling it “iterator” for the time being.

Well again, what is an iterator? If you know higher-order functions, for the time being, you can regard it as something similar. In C, the counterpart would be passing a function pointer as an argument. In C++, it would be a method that encloses the operation part of an STL Iterator. If you know sh or Perl, it’s good to imagine something like a custom for statement that we can define ourselves.

Yet, the above are merely examples of “similar” concepts. All of them are similar, but they are not identical to Ruby’s iterator. I will expand on the precise story when the time comes later.
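To make the “custom for statement” image concrete, here is a minimal sketch of a user-defined iterator. The method name countdown is made up for this example; only yield and the built-in each are standard Ruby.

```ruby
# Hypothetical iterator: calls the given block once for each number
# from `from` down to 1.
def countdown(from)
  n = from
  while n > 0
    yield n        # hand control (and the value n) to the caller's block
    n -= 1
  end
end

seen = []
countdown(3) { |i| seen << i }

# The built-in Array#each is used in the same way.
total = 0
[1, 2, 3].each { |x| total += x }

p seen
p total
```

The loop control lives inside countdown, while the loop body is supplied by the caller as a block, which is what distinguishes this from an ordinary method call.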

Written in C language

Being written in C is not notable these days, but it’s still a characteristic for sure. At least it is not written in Haskell or PL/I, thus there’s a high possibility that ordinary people can read it. (Whether it is truly so, I’d like you to confirm for yourself.)

Well, I just said it’s in C, but the actual language version which ruby targets is basically K&R C. Until a little while ago, there were a decent number of K&R-only environments, though not plenty. But recently, there are few environments which do not accept programs written in ANSI C, so technically there’s no problem in moving on to ANSI C. However, partly because of the author Matsumoto’s personal preference, it is still written in K&R style.

For this reason, the function definitions are all in K&R style, and the prototype declarations are not written so rigorously. If you carelessly specify the -Wall option of gcc, plenty of warnings will be shown. If you try to compile it with a C++ compiler, it will complain about prototype mismatches and fail to compile. … These kinds of stories are often reported to the mailing list.

Extension library

We can write a Ruby library in C and load it at runtime without recompiling Ruby. This type of library is called a “Ruby extension library” or just an “extension library”.

Not only can we write it in C, but the very small difference in code expression between the Ruby level and the C level is also a significant trait. The operations available in Ruby can be used in C in almost the same way. See the following example.

    # Method call
    obj.method(arg)                                 # Ruby
    rb_funcall(obj, rb_intern("method"), 1, arg);   # C

    # Block call
    yield arg        # Ruby
    rb_yield(arg);   # C

    # Raising exception
    raise ArgumentError, 'wrong number of arguments'       # Ruby
    rb_raise(rb_eArgError, "wrong number of arguments");   # C

    # Generating an object
    arr = Array.new             # Ruby
    VALUE arr = rb_ary_new();   # C

This is good because it makes composing an extension library easy, and it actually gives ruby an indispensable prominence. However, it’s also a burden for the ruby implementation. You can see its effects in many places. The effects on GC and thread processing are especially notable.

Thread

Ruby is equipped with threads. Assuming that very few people these days know nothing about threads, I will omit an explanation of threads themselves and go straight into the details.

ruby’s thread is an originally written user-level thread. The characteristic of this implementation is very high portability in both specification and implementation. Surprisingly, even MS-DOS can run the threads. Furthermore, you can expect the same behavior in any environment. Many people mention that this point is the best feature of ruby.

However, as a tradeoff for such extreme portability, ruby abandons speed. It’s, say, probably the slowest of all user-level thread implementations in this world. The tendency of the ruby implementation may be seen here the most clearly.
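For readers who have not used threads from the Ruby side, the following minimal sketch shows only the language-level interface. Note the assumption: it says nothing about the user-level implementation in ruby 1.7 described above, and current ruby releases run threads on native OS threads instead.

```ruby
# Start three threads; each computes the square of the number passed to it.
threads = (1..3).map do |n|
  Thread.new(n) { |k| k * k }
end

# Thread#value waits for the thread to finish and returns the block's result.
squares = threads.map(&:value)

p squares
```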

Technique to read source code

Well. After an introduction of ruby, we are about to start reading source code. But wait.

Any programmer has to read source code somewhere, but I guess there are not many occasions when someone teaches you concrete ways to read it. Why? Does it mean you can naturally read a program if you can write a program?

But I don’t think reading programs written by other people is so easy. Just as with writing programs, there must be techniques and theories for reading programs. And they are necessary. Therefore, before starting to read ruby, I’d like to give a general summary of the approach you need to take in reading source code.

Principles

At first, I mention the principle.

Decide a goal

An important key to reading the source code is to set a concrete goal.

These are the words of the author of Ruby, Matsumoto. Indeed, his words are very convincing to me. When the motivation is a spontaneous idea such as “Maybe I should read a kernel, at least…”, you would get the source code unpacked or explanatory books ready on the desk. But not knowing what to do, the studies are left untouched. Haven’t you done so? On the other hand, when you have in mind “I’m sure there is a bug somewhere in this tool. I need to quickly fix it and make it work. Otherwise I will not be able to make the deadline…”, you will probably be able to fix the code in a blink, even if it’s written by someone else. Haven’t you?

The difference between these two cases is the motivation you have. In order to know something, you at least have to know what you want to know. Therefore, the first step of all is to figure out what you want to know in explicit words.

However, of course this is not all that is needed to make it your own “technique”, because a “technique” needs to be a common method that anybody can make use of by following it. In the following section, I will explain how to bring the first step to the landing place where you finally achieve the goal.

Visualising the goal

Now let us suppose that our final goal is set as “Understand all about ruby”. This certainly counts as “a set goal”, but apparently it will not actually be useful for reading the source code. It will not trigger any concrete action. Therefore, your first job will be to drag the vague goal down to the level of something concrete.

Then how can we do it? The first way is to think as if you were the person who wrote the program. You can utilize your knowledge of writing programs in this case. For example, when you are reading a traditional “structured” program by somebody, you will analyze it employing the strategies of structured programming too. That is, you will divide the target into pieces, little by little. If it is something circulating in an event loop, such as a GUI program, first roughly browse the event loop, then try to find out the role of each event handler. Or, try to investigate the “M” of MVC (Model View Controller) first.

Second, it’s good to be aware of your method of analysis. Everybody might have certain analysis methods, but they are often applied relying on experience or intuition. In what way can we read source code well? Thinking about the way itself and being aware of it are crucially important.

Well, what are such methods like? I will explain them in the next section.

Analysis methods

The methods to read source code can be roughly divided into two: one is the static method and the other is the dynamic method. The static method is to read and analyze the source code without running the program. The dynamic method is to watch the actual behavior using tools like a debugger.

It’s better to start studying a program by dynamic analysis. That is because what you can see there is the “fact”. The results from static analysis, because the program is not actually run, may well be “prediction” to a greater or lesser extent. If you want to know the truth, you should start from watching the fact.

Of course, you don’t know whether the results of dynamic analysis are really the fact. The debugger could run with a bug, or the CPU may not be working properly due to overheating. The conditions of your configuration could be wrong. However, the results of dynamic analysis should at least be closer to the fact than those of static analysis.

Dynamic analysis

Using the target program

You can’t start without the target program. First of all, you need to know in advance what the program is like, and what its expected behaviors are.

Following the behavior using the debugger

If you want to see the paths of code execution and the data structures produced as a result, it’s quicker to look at the result by actually running the program than to emulate the behavior in your brain. In order to do so easily, use the debugger.

I would be happier if the data structure at runtime could be seen as a picture, but unfortunately we can scarcely find a tool for that purpose (especially few tools are available for free). If it is a snapshot of a comparatively simple structure, we might be able to write it out as text and convert it to a picture by using a tool like graphviz (see doc/graphviz.html on the attached CD-ROM). But it’s very difficult to find a way to do general-purpose, real-time analysis.

Tracer

You can use a tracer if you want to trace the procedures that code goes through. For C, there is a tool named ctrace (http://www.vicente.org/ctrace). For tracing system calls, you can use tools like strace (http://www.wi.leidenuniv.nl/~wichert/strace/), truss, and ktrace.
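The tools above trace C programs and system calls. For a program written in Ruby, a comparable effect can be sketched with the standard TracePoint class. This is an illustration only: TracePoint belongs to current ruby releases, not to the ruby 1.7 this book reads, and the method fact is made up for the example.

```ruby
# Hypothetical function to be traced.
def fact(n)
  n <= 1 ? 1 : n * fact(n - 1)
end

calls = []

# Record the name of every Ruby-level method that gets called.
tracer = TracePoint.new(:call) do |tp|
  calls << tp.method_id
end

# The tracer is active only while the block runs.
tracer.enable { fact(3) }

p calls
```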

Print everywhere

There is a term “printf debugging”. This method also works for analysis other than debugging. If you are watching the history of one variable, for example, it may be easier to understand by looking at the dump produced by embedded print statements than by tracking the variable with a debugger.
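A minimal sketch of watching the history of variables with embedded print statements follows. The helper trace, the global $trace_log and the function gcd are all made up for this example.

```ruby
$trace_log = []   # hypothetical global, used only to keep the printed lines

# Format a line printf-style, remember it, and echo it to standard error.
def trace(fmt, *args)
  line = format(fmt, *args)
  $trace_log << line
  $stderr.puts(line)
end

# Euclid's algorithm with one embedded print statement.
def gcd(a, b)
  while b != 0
    trace("a=%d b=%d", a, b)
    a, b = b, a % b
  end
  a
end

answer = gcd(48, 18)
puts answer
```

Reading the dumped lines from top to bottom shows how a and b change on each pass through the loop, without stepping through it in a debugger.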

Modifying the code and running it

Say, for example, in a place where the behavior is not easy to understand, just make a small change in some part of the code or a particular parameter and then re-run the program. Naturally it would change the behavior, thus you would be able to infer the meaning of the code from it.

It goes without saying that you should also keep an original binary and do the same thing on both of them.

Static analysis

The importance of names

Static analysis is simply source code analysis. And source code analysis is really an analysis of names. File names, function names, variable names, type names, member names: a program is a bunch of names.

This may seem obvious because one of the most powerful tools for creating abstractions in programming is naming, but keeping this in mind will make reading much more efficient.

Also, we’d like to know about coding rules beforehand to some extent. For example, in C, extern functions often use prefixes to distinguish the type of function. And in object-oriented programs, function names sometimes carry information about where they belong in their prefixes, and this becomes valuable information (e.g. rb_str_length).

Reading documents

Sometimes a document that describes the internal structure is included. Look out especially for a file named HACKING and the like.

Reading the directory structure

Look at the policy by which the directories are divided. Grasp the overview, such as how the program is structured and what the parts are.

Reading the file structure

While browsing (the names of) the functions, also look at the policy by which the files are divided. You should pay attention to the file names because they are like comments whose lifetime is very long.

Additionally, if a file contains several modules, the functions composing each module should be grouped together, so you can find out the module structure from the order of the functions.

Investigating abbreviations

As you encounter ambiguous abbreviations, make a list of them and investigate each of them as early as possible. For example, when “GC” is written, things will be very different depending on whether it means “Garbage Collection” or “Graphic Context”.

Abbreviations in a program are generally made by methods like taking the initial letters or dropping the vowels. In particular, abbreviations popular in the field of the target program are used without explanation, thus you should become familiar with them at an early stage.
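The two ways of making abbreviations mentioned above can be sketched as follows. The function names initials and drop_vowels are made up for this example, and real programs apply such rules far less mechanically.

```ruby
# Take the initial letter of each word.
def initials(phrase)
  phrase.split.map { |word| word[0].upcase }.join
end

# Keep the first letter and drop the vowels from the rest.
def drop_vowels(word)
  word[0] + word[1..-1].delete("aeiou")
end

puts initials("garbage collection")
puts initials("graphic context")
puts drop_vowels("count")
puts drop_vowels("index")
```

The first two calls produce the same abbreviation, which is exactly why ambiguous abbreviations are worth investigating early.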

Understanding data structure

If you find both data and code, you should first investigate the data structure. In other words, when exploring code in C, it’s better to start with header files. And in this case, let’s make the most of our imagination from their file names. For example, if you find frame.h, it would probably be the stack frame definition.

Also, you can understand many things from the member names of a struct and their types. For example, if you find the member next, which points to its own type, then it will be a linked list. Similarly, when you find members such as parent, children, and sibling, then it must be a tree structure. When prev, it will be a stack.

Understanding the calling relationship between functions

After names, the next most important thing to understand is the relationships between functions. A diagram that visualizes the calling relationships is called a “call graph”, and it is very useful. For this, we’d like to utilize tools.

A text-based tool is sufficient, but it’s even better if a tool can generate diagrams. However such tools are seldom available (especially few tools are free). When I analyzed ruby to write this book, I wrote a small command language and a parser in Ruby and generated diagrams half-automatically by passing the results to the tool named graphviz.
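The author’s own command language and parser are not shown here. As a rough sketch of the general idea only, the following turns a hand-written table of calling relationships into graphviz’s DOT text. The table CALLS and the function names in it are hypothetical.

```ruby
# Hypothetical calling relationships, written by hand for illustration.
CALLS = {
  "main"  => ["parse", "eval"],
  "parse" => ["lex"],
  "eval"  => ["call_method"],
}

# Convert a {caller => [callees]} table into DOT source text.
def to_dot(calls)
  lines = ["digraph callgraph {"]
  calls.each do |from, targets|
    targets.each do |to|
      lines << "  \"#{from}\" -> \"#{to}\";"
    end
  end
  lines << "}"
  lines.join("\n")
end

dot = to_dot(CALLS)
puts dot
```

Saving the output to a file such as callgraph.dot and running graphviz’s dot command on it (for example `dot -Tpng callgraph.dot -o callgraph.png`) produces the diagram.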

Reading functions

Read how a function works so that you can explain concisely what it does. It’s good to read it part by part while looking at the diagram of the function relationships.

What is important when reading functions is not “what to read” but “what not to read”. The ease of reading is decided by how much code we can cut out. What exactly should be cut out? It is hard to understand without seeing an actual example, thus it will be explained in the main part.

Additionally, when you don’t like the coding style, you can convert it by using a tool like indent.

Experimenting by modifying it as you like

It’s a mystery of the human body: when something is done using many parts of your body, it easily persists in your memory. I think the reason why not a few people prefer manuscript paper to a keyboard is not only nostalgia but also this fact.

Therefore, because merely reading on a monitor is very ineffective for remembering with our bodies, rewrite the code while reading it. This often helps our bodies get used to the code relatively soon. If there are names or code you don’t like, rewrite them. If there’s a cryptic abbreviation, replace it with the unabbreviated form.

However, it goes without saying that you should also keep the original source aside and check it whenever something stops making sense along the way. Otherwise, you would be puzzling for hours over a simple mistake of your own. And since the purpose of rewriting is getting used to the code and not rewriting itself, please be careful not to get too enthusiastic.

Reading the history

A program often comes with a document recording the history of changes. For example, GNU software always comes with a file named ChangeLog. This is the best resource for learning “the reason why the program is as it is”.

Alternatively, when a version control system like CVS or SCCS is used and you can access it, its utility value is higher than ChangeLog. Taking CVS as an example, cvs annotate, which displays the revision that last modified each line, cvs diff, which takes the difference from a specified version, and so on are convenient.

Moreover, when there’s a mailing list or a newsgroup for developers, you should get the archives so that you can search them at any time, because they often contain information about the exact reason for a certain change. Of course, if you can search online, that is also sufficient.

The tools for static analysis

Since various tools are available for various purposes, I can’t describe them as a whole. But if I had to choose only one of them, I’d recommend global. Its most attractive point is that its structure allows us to easily use it for other purposes. For instance, gctags, which comes with it, is actually a tool to create tag files, but you can also use it to create a list of the function names contained in a file.

~/src/ruby% gctags class.c | awk '{print $1}'
SPECIAL_SINGLETON
SPECIAL_SINGLETON
clone_method
include_class_new
ins_methods_i
ins_methods_priv_i
ins_methods_prot_i
method_list
        :
        :
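The awk step in the command above can also be mirrored in Ruby. The following is only a sketch: the tag lines are made up for illustration, and their column layout is an assumption, not the documented output format of gctags. Only the idea of keeping the first whitespace-separated field is taken from the command.

```ruby
# Hypothetical tag lines (invented for this sketch; the real gctags output
# layout may differ). Only the first field, the name, matters here.
TAG_LINES = <<~TAGS
  clone_method       30 class.c static int clone_method(mid, body, tbl)
  include_class_new 120 class.c include_class_new(module, super)
  method_list       400 class.c method_list(mod, option, func)
TAGS

# Same job as awk '{print $1}': keep the first whitespace-separated field
# of every line, skipping blank lines.
def first_fields(text)
  text.each_line.map { |line| line.split.first }.compact
end

puts first_fields(TAG_LINES)
```

Because the helper only takes a string, it could be fed the output of any tag tool whose lines start with the function name.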

That said, this is just the author’s recommendation; as a reader, you can use whichever tool you like. But in that case, you should choose a tool equipped with at least the following features.

list the function names contained in a file
find the location from a function name or a variable name (it is preferable if you can jump to the location)
function cross-reference

Build

Target version

The version of ruby described in this book is 1.7 (2002-09-12). For ruby, a version whose minor version is an even number is a stable version, and one whose minor version is an odd number is a development version. Hence, 1.7 is a development version. Moreover, 9/12 does not mark any particular milestone, so this version is not distributed as an official package. Therefore, to get this version, you can take it from the CD-ROM attached to this book or from the support site (the support site of this book: http://i.loveruby.net/ja/rhg/), or you need to use CVS, which will be described later.
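The even/odd rule stated above can be written down as a few lines of Ruby. This is a minimal sketch that only encodes the rule as the text gives it for the versions discussed here; the method name stable_series? is made up for illustration.

```ruby
# Returns true when the minor version number is even (a stable series under
# the numbering rule described in the text), false when it is odd.
def stable_series?(version)
  minor = version.split(".")[1]
  raise ArgumentError, "no minor version in #{version.inspect}" if minor.nil?
  Integer(minor, 10).even?
end

p(stable_series?("1.6"))   # true
p(stable_series?("1.7"))   # false
```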

There are several reasons why it is 1.7 and not 1.6, the stable version. First, because both the specification and the implementation are better organized, 1.7 is easier to deal with. Second, it is easier to use CVS when working with the edge of the development version. Additionally, it is likely that 1.8, the next stable version, will be out in the near future. And the last one is that investigating the edge puts us in a more pleasant mood.

Getting the source code

The archive of the target version is included in the attached CD-ROM. In the top directory of the CD-ROM, the following three versions are placed:

ruby-rhg.tar.gz
ruby-rhg.zip
ruby-rhg.lzh

Use whichever one is convenient for you. Of course, whichever one you choose, the content is the same. For example, the tar.gz archive can be extracted as follows.

~/src% mount /mnt/cdrom
~/src% gzip -dc /mnt/cdrom/ruby-rhg.tar.gz | tar xf -
~/src% umount /mnt/cdrom

Compiling

Just by looking at the source code, you can “read” it. But in order to know about the program, you need to actually use it, remodel it, and experiment with it. Experimenting is meaningless unless you use the same version you are looking at, so naturally you need to compile it yourself.

Therefore, from now on, I’ll explain how to compile. First, let’s start with the case of a Unix-like OS. There are several things to consider on Windows, so they will be described together in the next section. However, Cygwin runs on Windows but is almost Unix, so please read this section for it.

Building on a Unix-like OS

A Unix-like OS is generally equipped with a C compiler, so following the procedure below works in most cases. Let us suppose ~/src/ruby is the place where the source code is extracted.

~/src/ruby% ./configure
~/src/ruby% make
~/src/ruby% su
~/src/ruby# make install

Below, I’ll describe several points to be careful about.

On some platforms, such as Cygwin and UX/4800, you need to specify the --enable-shared option at the configure phase, or linking will fail. --enable-shared is an option to move most of ruby out of the command into a shared library (libruby.so).

~/src/ruby% ./configure --enable-shared

A detailed tutorial about building is included in doc/build.html on the attached CD-ROM; please try building while reading it.

Building on Windows

Building on Windows is far more complicated. The source of the problem is that there are multiple build environments.

Visual C++
MinGW
Cygwin
Borland C++ Compiler

First, the Cygwin environment is closer to UNIX than to Windows, so you can follow the build procedure for a Unix-like OS.

If you’d like to compile with Visual C++, Visual C++ 5.0 or later is required. There is probably no problem with version 6 or .NET.

MinGW, or Minimalist GNU for Windows, is a port of the GNU compilation environment (namely, gcc and binutils) to Windows. Cygwin ports the whole UNIX environment. In contrast, MinGW ports only the tools needed to compile. Moreover, a program compiled with MinGW does not require any special DLL at runtime. This means that the ruby compiled with MinGW can be treated exactly the same as the Visual C++ version.

Alternatively, for personal use, you can download version 5.5 of Borland C++ Compiler for free from the Borland site (http://www.borland.co.jp). Because ruby started to support this environment fairly recently, there is some cause for concern, but there was no particular problem in the build test done before the publication of this book.

Then, among the above four environments, which one should we choose? First, the Visual C++ version is basically the least likely to cause problems, so I recommend it. If you have experience with UNIX, installing the whole of Cygwin and using it is good. If you have no experience with UNIX and you don’t have Visual C++, using MinGW is probably good.

Below, I’ll explain how to build with Visual C++ and MinGW, but only in outline. More detailed explanations, and how to build with Borland C++ Compiler, are included in doc/build.html on the attached CD-ROM, so please check it when necessary.

Visual C++

Although we say Visual C++, the IDE is usually not used; we’ll build from the DOS prompt. In this case, we first need to initialize the environment variables so that Visual C++ itself can run. Since a batch file for this purpose comes with Visual C++, let’s execute it first.

C:\> cd "\Program Files\Microsoft Visual Studio .NET\Vc7\bin"
C:\Program Files\Microsoft Visual Studio .NET\Vc7\bin> vcvars32

This is the case for Visual C++ .NET. For version 6, it can be found in the following place.

C:\Program Files\Microsoft Visual Studio\VC98\bin\

After executing vcvars32, all you have to do is move to the win32\ folder of the ruby source tree and build. Below, let us suppose the source tree is in C:\src.

C:\> cd src\ruby
C:\src\ruby> cd win32
C:\src\ruby\win32> configure
C:\src\ruby\win32> nmake
C:\src\ruby\win32> nmake DESTDIR="C:\Program Files\ruby" install

Then the ruby command will be installed in C:\Program Files\ruby\bin\, and the Ruby libraries in C:\Program Files\ruby\lib\. Because ruby does not use the registry or anything similar at all, you can uninstall it by deleting C:\Program Files\ruby and everything below it.

MinGW

As described before, MinGW is only a compilation environment, so general UNIX tools like sed or sh are not available. However, because they are necessary to build ruby, you need to obtain them from somewhere. For this, there are two methods: Cygwin and MSYS (Minimal SYStem).

However, I can’t recommend MSYS because troubles occurred continuously in the building contest held before the publication of this book. In contrast, with Cygwin the build passes very straightforwardly. Therefore, in this book, I’ll explain the way using Cygwin.

First, install MinGW and the entire set of development tools by using setup.exe of Cygwin. Both Cygwin and MinGW are also included in the attached CD-ROM (see also doc/win.html on the attached CD-ROM). After that, all you have to do is type the following from the bash prompt of Cygwin.

~/src/ruby% ./configure --with-gcc='gcc -mno-cygwin' \
                        --enable-shared i386-mingw32
~/src/ruby% make
~/src/ruby% make install

That’s it. Here the configure line spans multiple lines, but in practice we’d write it on one line, and then the backslash is not necessary. The install location is \usr\local\ and below on the drive on which it is compiled. Because really complicated things occur around here, the explanation would be fairly long, so I explain it comprehensively in doc/build.html on the attached CD-ROM.

Building Details

Up to here, this has been a README-like description. This time, let’s look at exactly what is done by what we have been doing. However, the discussion here partially requires very advanced knowledge. If you can’t understand it, please skip this and jump directly to the next section. It is written so that you can understand it by coming back after reading the entire book.

Now, on whichever platform, building ruby is separated into three phases: configure, make, and make install. Considering an explanation of make install unnecessary, I’ll explain the configure phase and the make phase.

configure

First, configure. It is a shell script, and we use it to detect system parameters. For example, “whether the header file setjmp.h exists” or “whether alloca() is available” are checked. The way to check is unexpectedly simple.

Target to check    Method
commands           actually execute it and then check $?
header files       if [ -f $includedir/stdio.h ]
functions          compile a small program and check whether linking succeeds

When some differences are detected, they should somehow be reported to us. The first way to report them is the Makefile. If we provide a Makefile.in in which parameters are embedded in the form @param@, configure generates a Makefile in which they are substituted with the actual values. For example, as follows:

Makefile.in:  CFLAGS = @CFLAGS@
                     ↓
Makefile   :  CFLAGS = -g -O2
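The @param@ substitution shown above can be sketched in a few lines of Ruby. This is not how configure itself is implemented (it is a shell script); the method name substitute and the sample values are assumptions made only to illustrate the idea.

```ruby
# Replace every @name@ placeholder with its value from params.
# Placeholders without a known value are left untouched.
def substitute(template, params)
  template.gsub(/@(\w+)@/) do
    name = Regexp.last_match(1)
    params.fetch(name) { "@#{name}@" }
  end
end

makefile_in = "CFLAGS = @CFLAGS@\nprefix = @prefix@\n"
detected    = { "CFLAGS" => "-g -O2" }   # made-up detection result

print substitute(makefile_in, detected)
# CFLAGS = -g -O2
# prefix = @prefix@
```

Leaving unknown placeholders in place is a design choice of this sketch; it makes a missing parameter easy to spot in the generated text.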

Alternatively, it writes out information about, for instance, whether certain functions or particular header files exist, into a header file. Because the output file name can be changed, it differs from program to program, but in ruby it is config.h. Please confirm that this file is created after executing configure. Its content is something like this.

▼ config.h

        :
        :
#define HAVE_SYS_STAT_H 1
#define HAVE_STDLIB_H 1
#define HAVE_STRING_H 1
#define HAVE_MEMORY_H 1
#define HAVE_STRINGS_H 1
#define HAVE_INTTYPES_H 1
#define HAVE_STDINT_H 1
#define HAVE_UNISTD_H 1
#define _FILE_OFFSET_BITS 64
#define HAVE_LONG_LONG 1
#define HAVE_OFF_T 1
#define SIZEOF_INT 4
#define SIZEOF_SHORT 2
        :
        :

Each meaning is easy to understand. HAVE_xxxx_H probably indicates whether a certain header file exists, and SIZEOF_SHORT must indicate the size of the short type of C. Likewise, SIZEOF_INT indicates the byte length of int, and HAVE_OFF_T indicates whether the off_t type is defined or not.

As we can understand from the above, configure does detect the differences, but it does not automatically absorb them. Bridging the differences is left to each programmer. For example, as follows:

▼ A typical usage of the HAVE_ macro

#ifdef HAVE_STDLIB_H
#include <stdlib.h>
#endif

(ruby.h)

autoconf

configure is not a ruby-specific tool. Whether certain functions exist, whether certain header files exist, ... it is obvious that these tests have regularity. It would be wasteful if every person who writes a program wrote their own distinct tool.

This is where a tool named autoconf comes in. In a file named configure.in or configure.ac, write “I’d like to do these checks”, process it with autoconf, and an adequate configure is generated. The .in of configure.in is probably an abbreviation of input. It is the same as the relationship between Makefile and Makefile.in. .ac is, of course, an abbreviation of AutoConf.

Illustrating the discussion up to here gives Figure 1.

Figure 1: The process until Makefile is created

For readers who want to know more details, I recommend “GNU Autoconf/Automake/Libtool” by Gary V. Vaughan, Ben Elliston, Tom Tromey, and Ian Lance Taylor.

By the way, ruby’s configure is, as said before, generated by autoconf, but not every configure in this world is generated with autoconf. It can be written by hand, or another tool for automatic generation can be used. In any case, it is sufficient if ultimately there are a Makefile, config.h, and so on.

make

What is done at the second phase, make? Of course, it compiles the source code of ruby, but looking at the output of make, it seems to do many other things as well. I’ll briefly explain the process.

1. compile the source code composing ruby itself
2. create the static library libruby.a gathering the crucial parts of ruby
3. create “miniruby”, which is an always statically-linked ruby
4. create the shared library libruby.so when --enable-shared is specified
5. compile the extension libraries (under ext/) by using miniruby
6. at last, generate the real ruby

There are two reasons why it creates miniruby and ruby separately. The first one is that compiling the extension libraries requires ruby. With --enable-shared, ruby itself is dynamically linked, so there is a possibility that it cannot run right away because of the load paths of the libraries. Therefore, miniruby, which is statically linked, is created and used during the build process.

The second reason is that, on a platform where we cannot use shared libraries, there are cases where the extension libraries are statically linked into ruby itself. In this case, ruby cannot be created before all extension libraries are compiled, but the extension libraries cannot be compiled without ruby. In order to resolve this dilemma, miniruby is used.

CVS

The ruby archive included in the attached CD-ROM is, like the official release package, just a snapshot: the appearance of ruby, a continuously changing program, at one particular moment. How ruby has changed and why it has changed are not described there. Then how can we see the entire picture, including the past? We can do it by using CVS.

About CVS

CVS is, in short, the undo list of an editor. If the source code is under the management of CVS, its past appearance can be restored at any time, and we can immediately find out who changed it, where, when, and how. Generally, a program doing such a job is called a source code management system, and CVS is the most famous open-source source code management system in this world.

Since ruby is also managed with CVS, I’ll explain a little about the mechanism and usage of CVS. First, the most important ideas of CVS are the repository and the working copy. I said CVS is something like the undo list of an editor; in order to achieve this, the records of the entire change history must be saved somewhere. The place that stores all of them is the “CVS repository”.

Plainly speaking, the repository is what gathers all the past source code. Of course, this is only a concept; in reality, in order to save space, it is stored in the form of one recent appearance plus the differences between changes (namely, patches). In any case, it is sufficient if we can obtain the appearance of a particular file at a particular moment at any time.

On the other hand, a “working copy” is the result of taking files from the repository at a certain point. There is only one repository, but you can have multiple working copies (Figure 2).

Figure 2: Repository and working copies

When you’d like to modify the source code, first take a working copy, edit it with an editor or the like, and “return” it. Then the change is recorded in the repository. Taking a working copy from the repository is called “checkout”, and returning it is called “checkin” or “commit” (Figure 3). By checking in, the change is recorded in the repository, and we can then obtain it at any time.

Figure 3: Checkin and Checkout

The biggest trait of CVS is that we can access it over networks. This means that if there is a single server holding the repository, everyone can check in and check out over the Internet at any time. But generally, checkin access is restricted and we can’t do it freely.

Revision

How can we obtain a certain version from the repository? One way is to specify it by time. By requesting “give me the edge version as of that time”, it is selected. But in practice, we rarely specify by time. Most commonly, we use something named a “revision”.

“Revision” and “version” have almost the same meaning. But usually “version” is attached to the project itself, so using the word “version” can be confusing. Therefore, the word “revision” is used to indicate a slightly smaller unit.

In CVS, a file just stored in the repository is revision 1.1. After checking it out, modifying it, and checking it in, it becomes revision 1.2. Next it becomes 1.3, then 1.4.
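To make the ideas of repository, working copy, checkout, checkin, and revision numbers concrete, here is a toy model in Ruby. It is only a sketch under simplifying assumptions: the class ToyRepository and its methods are invented for illustration, it handles a single file, and it stores a full copy of every revision, whereas real CVS stores differences.

```ruby
# A toy, in-memory model of one file in a repository.
class ToyRepository
  def initialize(initial_content)
    # A file just stored in the repository is revision 1.1.
    @revisions = { "1.1" => initial_content.dup.freeze }
    @head = 1
  end

  def head_revision
    "1.#{@head}"
  end

  # "checkout": hand out an independent working copy of a revision.
  def checkout(revision = head_revision)
    @revisions.fetch(revision).dup
  end

  # "checkin": record the working copy as the next revision
  # and return the new revision number.
  def checkin(working_copy)
    @head += 1
    @revisions[head_revision] = working_copy.dup.freeze
    head_revision
  end
end

repo = ToyRepository.new("puts 'hello'\n")
copy = repo.checkout          # working copy of revision 1.1
copy << "puts 'world'\n"      # edit the working copy only
new_rev = repo.checkin(copy)

p(new_rev)                    # "1.2"
p(repo.checkout("1.1"))       # "puts 'hello'\n"
```

Editing the working copy never touches the stored revisions, which is why the old appearance can still be obtained after the checkin.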

A simple usage example of CVS

Keeping the above in mind, I’ll talk about the usage of CVS very, very briefly. First, the cvs command is essential, so please install it beforehand. The source code of cvs is included in the attached CD-ROM (cvs: archives/cvs-1.11.2.tar.gz). How to install cvs is really far from the main line, so it won’t be explained here.

After installing it, let’s check out the source code of ruby as an experiment. Type the following commands while you are online.

% cvs -d :pserver:anonymous@cvs.ruby-lang.org:/src login
CVS Password: anonymous
% cvs -d :pserver:anonymous@cvs.ruby-lang.org:/src checkout ruby

No options were specified, so the edge version is checked out automatically. The truly latest version of ruby should appear under ruby/.

Additionally, if you’d like to obtain the version of a certain day, you can use the -D option of cvs checkout. By typing the following, you can obtain a working copy of the version explained in this book.

% cvs -d :pserver:anonymous@cvs.ruby-lang.org:/src checkout -D 2002-09-12 ruby

Note that you have to write the options immediately after checkout. If you wrote “ruby” first, it would cause a strange error complaining about a “missing module”.

And with anonymous access like in this example, we cannot check in. In order to practice checking in, it is good to create a (local) repository and store a “Hello, World!” program in it. The concrete way to store it is not explained here. The manual that comes with cvs is fairly friendly. As for books you can read in Japanese, I recommend the translation of “Open Source Development with CVS” by Karl Fogel and Moshe Bar.

The composition of ruby

The physical structure

Now it is time to start reading the source code, but what should we do first? Look over the directory structure. In most cases, the directory structure, meaning the source tree, directly indicates the module structure of the program. Abruptly searching for main() with grep and reading from the top in processing order is not smart. Of course finding main() is also important, but first let’s take the time to run ls or head to grasp the whole picture.

Below is the appearance of the top directory immediately after checking out from the CVS repository. Names that end with a slash are subdirectories.

COPYING        compar.c       gc.c           numeric.c      sample/
COPYING.ja     config.guess   hash.c         object.c       signal.c
CVS/           config.sub     inits.c        pack.c         sprintf.c
ChangeLog      configure.in   install-sh     parse.y        st.c
GPL            cygwin/        instruby.rb    prec.c         st.h
LEGAL          defines.h      intern.h       process.c      string.c
LGPL           dir.c          io.c           random.c       struct.c
MANIFEST       djgpp/         keywords       range.c        time.c
Makefile.in    dln.c          lex.c          re.c           util.c
README         dln.h          lib/           re.h           util.h
README.EXT     dmyext.c       main.c         regex.c        variable.c
README.EXT.ja  doc/           marshal.c      regex.h        version.c
README.ja      enum.c         math.c         ruby.1         version.h
ToDo           env.h          misc/          ruby.c         vms/
array.c        error.c        missing/       ruby.h         win32/
bcc32/         eval.c         missing.h      rubyio.h       x68/
bignum.c       ext/           mkconfig.rb    rubysig.h
class.c        file.c         node.h         rubytest.rb
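A listing in this style, with subdirectories marked by a trailing slash, can also be produced from Ruby. The sketch below assumes a reasonably recent Ruby (Dir.children is needed); the helper name list_entries and the throwaway directory contents are made up for the demonstration.

```ruby
require "tmpdir"
require "fileutils"

# List the entries of a directory in sorted order,
# marking subdirectories with a trailing slash.
def list_entries(dir)
  Dir.children(dir).sort.map do |name|
    File.directory?(File.join(dir, name)) ? "#{name}/" : name
  end
end

# Demonstration on a temporary directory with made-up contents.
Dir.mktmpdir do |dir|
  FileUtils.touch(File.join(dir, "eval.c"))
  FileUtils.touch(File.join(dir, "parse.y"))
  Dir.mkdir(File.join(dir, "ext"))
  puts list_entries(dir)
end
```

Pointing list_entries at an extracted source tree would give the same kind of overview as running ls there.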

Recently programs themselves have become larger, and there is much software whose source is divided into many subdirectories, but ruby has consistently used the top directory for a long time. It becomes problematic if there are too many files, but we can get used to this amount.

The files at the top level can be categorized into six groups:

documents
the source code of ruby itself
the tool to build ruby
standard extension libraries
standard Ruby libraries
the others

The source code and the build tool are obviously important. Aside from them, I’ll list what seems useful for us.

ChangeLog

The record of changes to ruby. This is very important when investigating the reason for a certain change.

README.EXT
README.EXT.ja

These describe how to create an extension library, but in the course of that, things relating to the implementation of ruby itself are also written.

Dissecting Source Code

From now on, I’ll further split the source code of ruby itself into smaller pieces. As for the main files, their categorization is described in README.EXT, so I’ll follow it. What is not described there, I categorized myself.

Ruby Language Core

class.c      class relating API
error.c      exception relating API
eval.c       evaluator
gc.c         garbage collector
lex.c        reserved word table
object.c     object system
parse.y      parser
variable.c   constants, global variables, class variables
ruby.h       the main macros and prototypes of ruby
intern.h     the prototypes of the C API of ruby. intern seems to be an abbreviation of internal, but the functions written here can be used from extension libraries.
rubysig.h    the header file containing the macros relating to signals
node.h       the definitions relating to the syntax tree nodes
env.h        the definitions of the structs to express the context of the evaluator

These are the parts that compose the core of the ruby interpreter. Most of the files explained in this book are contained here. If you consider the number of files in the entire ruby, this is really only a few. But in terms of byte size, 50% of the entire amount is occupied by these files. In particular, eval.c is 200KB and parse.y is 100KB; these files are large.

Utility

dln.c     dynamic loader
regex.c   regular expression engine
st.c      hash table
util.c    libraries for radix conversions, sort, and so on

This means utility for ruby. However, some of them are so large that you would not imagine it from the word “utility”. For instance, regex.c is 120KB.

Implementation of ruby command

dmyext.c    dummy of the routine to initialize extension libraries (DumMY EXTension)
inits.c     the entry point for the core and the routine to initialize extension libraries
main.c      the entry point of the ruby command (this is unnecessary for libruby)
ruby.c      the main part of the ruby command (this is also necessary for libruby)
version.c   the version of ruby

The implementation of the ruby command, that is, of what happens when you type ruby on the command line and execute it. This is the part that, for instance, interprets the command line options. Aside from the ruby command, mod_ruby and vim are examples of commands that utilize the ruby core. These commands work by linking to the libruby library (.a/.so/.dll and so on).

Class Libraries

array.c     class Array
bignum.c    class Bignum
compar.c    module Comparable
dir.c       class Dir
enum.c      module Enumerable
file.c      class File
hash.c      class Hash (its actual body is st.c)
io.c        class IO
marshal.c   module Marshal
math.c      module Math
numeric.c   class Numeric, Integer, Fixnum, Float
pack.c      Array#pack, String#unpack
prec.c      module Precision
process.c   module Process
random.c    Kernel#srand(), rand()
range.c     class Range
re.c        class Regexp (its actual body is regex.c)
signal.c    module Signal
sprintf.c   ruby-specific sprintf()
string.c    class String
struct.c    class Struct
time.c      class Time

The implementations of the Ruby class libraries. What is listed here is basically implemented in exactly the same way as ordinary Ruby extension libraries. This means that these libraries are also examples of how to write an extension library.

Files depending on a particular platform

bcc32/    Borland C++ (Win32)
beos/     BeOS
cygwin/   Cygwin (the UNIX simulation layer on Win32)
djgpp/    djgpp (the free development environment for DOS)
vms/      VMS (an OS formerly released by DEC)
win32/    Visual C++ (Win32)
x68/      Sharp X680x0 series (OS is Human68k)

The code specific to each platform is stored here.

fallback functions

missing/

Files to make up for functions that are missing on each platform. Mainly functions of libc.

Logical Structure

Now, there are the above four groups, and the core can be divided further into three. First, the “object space”, which creates the object world of Ruby. Second, the “parser”, which converts Ruby programs (in text) to the internal format. Third, the “evaluator”, which drives Ruby programs. Both the parser and the evaluator are built on top of the object space; the parser converts a program into the internal format, and the evaluator runs the program. Let me explain them in order.

Object Space

The first one is the object space. This is very easy to understand, because everything it deals with basically resides in memory, so we can directly show or manipulate it by using functions. Therefore, in this book, the explanation starts with this part. Part 1 is from chapter 2 to chapter 7.

Parser

The second one is the parser. Some preliminary explanation is probably necessary for this.

The ruby command is the interpreter of the Ruby language. This means that, on invocation, it analyzes its input, which is text, and executes it by following it. Therefore, ruby needs to be able to interpret the meaning of a program written as text, but unfortunately text is very hard for computers to understand. For computers, text files are merely byte sequences and nothing more. In order to comprehend the meaning of the text, some special gimmick is necessary. And that gimmick is the parser. By passing through the parser, a Ruby program (as text) is converted into a ruby-specific internal representation that can be handled easily from the program.

This internal representation is called a “syntax tree”. A syntax tree expresses a program as a tree structure; for instance, Figure 4 shows how an if statement is expressed.

Figure 4: an if statement and its corresponding syntax tree

The parser will be described in Part 2, “Syntactic Analysis”. Part 2 is from chapter 10 to chapter 12. Its target file is only parse.y.

Evaluator

Objects are easy to understand because they are tangible. As for the parser, what it does is ultimately convert one data format into another, so it is reasonably easy to understand. However, the third one, the evaluator, is completely elusive.

What the evaluator does is “execute” a program by following a syntax tree. This sounds easy, but what is “executing”? Answering this question precisely is fairly difficult. What is “executing an if statement”? What is “executing a while statement”? What does “assigning to a local variable” mean? We cannot understand the evaluator without answering all such questions clearly and precisely.
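To give a first feeling for what “executing by following a syntax tree” can mean, here is a toy evaluator written in Ruby. It is only a sketch: the tree is built from plain arrays, and the node names (:lit, :lvar, :lasgn, :lt, :if) are made up for illustration and are not the definitions found in node.h or the logic of eval.c.

```ruby
# A toy syntax tree: each node is an array whose first element names the
# node type.
#   [:lit, value]            a literal
#   [:lvar, name]            read a local variable
#   [:lasgn, name, node]     assign to a local variable
#   [:lt, left, right]       left < right
#   [:if, cond, then, else]  conditional
def evaluate(node, locals)
  type, *args = node
  case type
  when :lit   then args[0]
  when :lvar  then locals.fetch(args[0])
  when :lasgn then locals[args[0]] = evaluate(args[1], locals)
  when :lt    then evaluate(args[0], locals) < evaluate(args[1], locals)
  when :if
    cond, then_node, else_node = args
    # Evaluate the condition first, then follow only the chosen branch.
    evaluate(evaluate(cond, locals) ? then_node : else_node, locals)
  else
    raise ArgumentError, "unknown node type: #{type.inspect}"
  end
end

# Tree for:  if i < 10 then result = "small" else result = "large" end
tree = [:if, [:lt, [:lvar, :i], [:lit, 10]],
        [:lasgn, :result, [:lit, "small"]],
        [:lasgn, :result, [:lit, "large"]]]

locals = { i: 3 }
evaluate(tree, locals)
p(locals[:result])   # "small"
```

In this sketch, “executing an if statement” means evaluating the condition node and then evaluating exactly one of the two branch nodes, and “assigning to a local variable” means storing a value in a table keyed by the variable name.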

In this book, the evaluator will be discussed in Part 3, “Evaluate”. Its target file is eval.c. eval is an abbreviation of “evaluator”.

Now, I’ve briefly described the structure of ruby; however, even though the ideas have been explained, that does not help us much in understanding the behavior of the program. In the next chapter, we’ll start by actually using ruby.

The original work is Copyright © 2002 - 2004 Minero AOKI. Translated by Vincent ISAMBART and Clifford Escobar CAOILE. This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 2.5 License.

Ruby Hacking Guide

Translated by Sebastian Krause

Chapter 1: Introduction

A Minimal Introduction to Ruby

Here the Ruby prerequisites are explained, which one needs to know in order to understand the first section. I won’t point out programming techniques or points one should be careful about. So don’t think you’ll be able to write Ruby programs just because you read this chapter. Readers who have prior experience with Ruby can skip this chapter.

We will talk about grammar extensively in the second section, hence I won’t delve into the finer points of grammar here. For hash literals and the like, I’ll show only the most widely used notations. As a principle, I won’t omit things even where they can be omitted. This way the syntax becomes simpler. I won’t always say “We can omit this”.

Objects

Strings

Everything that can be manipulated in a Ruby program is an object. There are no primitives like Java’s int and long. For instance, if we write the following, it denotes a string object with the content content.

"content"

I casually called it a string object, but to be precise this is an expression which generates a string object. Therefore, if we write it several times, another string object is generated each time.

"content"
"content"
"content"

Here three string objects with the content content are generated.
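That each evaluation of a string literal generates a separate object can be checked from Ruby itself. The sketch below assumes default interpreter settings, that is, no frozen_string_literal magic comment or option, under which identical literals could share one object.

```ruby
# Evaluate the same string literal three times and collect the results.
strings = 3.times.map { "content" }

# All three have equal contents ...
p(strings.uniq.size)                        # 1
# ... but equal? compares object identity, and no two are the same object.
p(strings[0].equal?(strings[1]))            # false
p(strings.map(&:object_id).uniq.size)       # 3
```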

By the way, objects that just exist there can’t be seen by programmers. Let’s show how to print them on the terminal.

p("content")   # Shows "content"

Everything after a # is a comment. From now on, I’ll put the result of an expression in a comment after it.

p(……) calls the function p. It displays arbitrary objects “as such”. It’s basically a debugging function.

Precisely speaking, there are no functions in Ruby, but just for now we can think of it as a function. You can use functions wherever you are.

Various Literals

Now, let’s explain some more of the expressions which directly generate objects, the so-called literals. First, the integers and floating point numbers.

# Integer
1
2
100
9999999999999999999999999   # Arbitrarily big integers

# Float
1.0
99.999
1.3e4     # 1.3×10^4

Don’t forget that these are all expressions which generate objects. I’m repeating myself, but there are no primitives in Ruby.

Below, an array object is generated.

[1, 2, 3]

This program generates an array which consists of the three integers 1, 2 and 3 in that order. As the elements of an array can be arbitrary objects, the following is also possible.

[1, "string", 2, ["nested", "array"]]

And finally, a hash table is generated by the expression below.

{"key"=>"value", "key2"=>"value2", "key3"=>"value3"}

A hash table is a structure which expresses one-to-one relationships between arbitrary objects. The above line creates a table which stores the following relationships.

"key"   →  "value"
"key2"  →  "value2"
"key3"  →  "value3"

If we ask a hash table created in this way “What corresponds to key?”, it’ll answer “That’s value.” How can we ask? We use methods.
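As a small preview of how such a question is put to a hash table, the sketch below uses two standard lookup methods of Hash. The variable name table is chosen only for this example.

```ruby
table = {"key"=>"value", "key2"=>"value2", "key3"=>"value3"}

# Hash#[] is the method that asks "what corresponds to this key?".
p(table["key"])          # "value"
p(table.fetch("key2"))   # "value2"

# Asking for a key that is not in the table yields nil with Hash#[].
p(table["no such key"])  # nil
```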

Method Calls

We can call methods on an object. In C++ jargon they are member functions. I don’t think it’s necessary to explain what a method is. I’ll just explain the notation.

"content".upcase()

Here the upcase method is called on a string object (with the content content). As upcase is a method which returns a new string with the small letters replaced by capital letters, we get the following result.

p("content".upcase())   # Shows "CONTENT"

Method calls can be chained.

"content".upcase().downcase()

Here the method downcase is called on the return value of "content".upcase().

There are no public fields (member variables) as in Java or C++. The object interface consists of methods only.

The Program

Top Level

In Ruby we can just write expressions and it becomes a program. One doesn’t need to define a main() as in C++ or Java.

p("content")

This is a complete Ruby program. If we put this into a file called first.rb, we can execute it from the command line as follows.

% ruby first.rb
"content"

With the -e option of the ruby program we don’t even need to create a file.

% ruby -e 'p("content")'
"content"

By the way, the place where p is written is the lowest nesting level of the program, which means the highest level from the program’s standpoint, thus it’s called “top-level”. Having a top level is a characteristic trait of Ruby as a scripting language.

In Ruby, one line is usually one statement. A semicolon at the end isn’t necessary. Therefore the program below is interpreted as three statements.

p("content")
p("content".upcase())
p("CONTENT".downcase())

When we execute it, it looks like this.

% ruby second.rb
"content"
"CONTENT"
"content"

Local Variables

In Ruby all variables and constants store references to objects. That’s why one can’t copy the content by assigning one variable to another variable. Variables of type Object in Java or pointers to objects in C++ are good to think of. However, you can’t change the value of each pointer itself.

In Ruby one can tell the classification (scope) of a variable by the beginning of its name. Local variables start with a small letter or an underscore. One can write assignments by using “=”.

str = "content"
arr = [1,2,3]

An initial assignment serves as declaration; an explicit declaration is not necessary. Because variables don’t have types, we can assign any kind of object indiscriminately. The program below is completely legal.

lvar = "content"
lvar = [1,2,3]
lvar = 1

But even if we can, we don’t have to do it. If different kinds of objects are put in one variable, it tends to become difficult to read. In a real-world Ruby program one doesn’t do this kind of thing without a good reason. The above was just an example for the sake of it.

Variable reference also has a pretty sensible notation.

str = "content"
p(str)           # Shows "content"

In addition, let’s confirm that a variable holds a reference by taking an example.

a = "content"
b = a
c = b

After we execute this program, all three local variables a, b and c point to the same object, a string object with content "content" created on the first line (Figure 1).

Figure 1: Ruby variables store references to objects
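The claim that a, b and c share one object can be verified directly. The following sketch uses equal? for identity comparison and a destructive method to make the sharing visible; the extra variable d is introduced only to contrast assignment with an explicit copy.

```ruby
a = "content".dup   # dup gives a fresh, unfrozen string for the demonstration
b = a
c = b

# All three variables refer to one and the same object.
p(a.equal?(b) && b.equal?(c))   # true

# So a change made through one variable is visible through the others.
c << "!"
p(a)                            # "content!"

# Assignment itself never copies; an explicit dup is needed for that.
d = a.dup
p(d == a)                       # true  (same content)
p(d.equal?(a))                  # false (different object)
```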

By the way, as these variables are called local, they should be local to something, but we cannot talk about this scope without reading a bit further. Let’s say for now that the top level is one local scope.

Constants

Constants start with a capital letter. They can only be assigned once (at their creation).

Const = "content"
PI = 3.1415926535

p(Const)   # Shows "content"

I’d like to say that if we assign twice an error occurs. But there is just a warning, not an error. It is this way in order to avoid raising an error even when the same file is loaded twice in applications that manipulate Ruby programs themselves, for instance in development environments. Therefore, it is allowed due to practical requirements and there’s no other choice, but essentially there should be an error. In fact, up until version 1.1 there really was an error.

C = 1
C = 2   # There is a warning but ideally there should be an error.

A lot of people are fooled by the word constant. A constant only means that it does not switch objects once it is assigned. It does not mean that the object it points to won’t change. The term “read only” might capture the concept better than “constant”.

By the way, to indicate that an object itself shouldn’t be changed, another means is used: freeze.
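The difference between a constant and a frozen object can be tried out as follows. This is a sketch assuming a recent Ruby, where modifying a frozen object raises FrozenError (a subclass of RuntimeError); the constant name GREETING is made up for the example.

```ruby
GREETING = "content".dup     # the constant refers to a mutable string

# The constant keeps pointing at the same object, yet the object can change.
GREETING << "!"
p(GREETING)                  # "content!"

# freeze makes the object itself unmodifiable.
GREETING.freeze
p(GREETING.frozen?)          # true

begin
  GREETING << "?"
rescue RuntimeError => e
  # Recent Ruby versions raise FrozenError, a subclass of RuntimeError.
  p(e.class <= RuntimeError) # true
end
p(GREETING)                  # "content!"
```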

Figure 2: constant means read only

The scope of constants also cannot be described yet. It will be discussed in the next section, together with classes.

Control Structures

Since Ruby has a wide abundance of control structures, just listing them up can be a huge task. For now, I’ll just mention that there are if and while.

if i < 10 then
  # body
end

while i < 10 do
  # body
end

In a conditional expression, only the two objects false and nil are false, and all other objects are true. 0 and the empty string are also true, of course.

It wouldn’t be wise if there were only false; there is also true. And it is of course true.
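The truth rule above is easy to confirm with a small helper. The method name truthy? is made up for this sketch; it simply reports which branch of an if an object selects.

```ruby
# Report how an object behaves when used as a condition.
def truthy?(obj)
  if obj then true else false end
end

p(truthy?(false))   # false
p(truthy?(nil))     # false
p(truthy?(0))       # true
p(truthy?(""))      # true
p(truthy?(true))    # true

# while uses the same rule for its condition.
i = 0
while i < 10 do
  i = i + 1
end
p(i)                # 10
```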

Classes and Methods

Classes

In an object-oriented system, essentially methods belong to objects. That can hold only in an ideal world, though. In a normal program there are a lot of objects which have the same set of methods, and it would be an enormous amount of work if each object had to remember its own set of callable methods. Usually a mechanism like classes or multimethods is used to get rid of the duplication of definitions.

In Ruby, the concept of classes is used as the traditional way to bind objects and methods together. Namely, every object belongs to a class, and the methods which can be called are determined by the class. An object is then called “an instance of the XX class”.

For example, the string "str" is an instance of the String class. And on this String class the methods upcase, downcase, strip and many others are defined. So it looks as if each string object can respond to all these methods.

# They all belong to the String class,
# hence the same methods are defined
       "content".upcase()
"This is a pen.".upcase()
    "chapter II".upcase()

       "content".length()
"This is a pen.".length()
    "chapter II".length()

By the way, what happens if the called method isn’t defined? In a static language a compiler error occurs, but in Ruby there is a runtime exception. Let’s try it out. For this kind of program the -e option is handy.

% ruby -e '"str".bad_method()'
-e:1: undefined method `bad_method' for "str":String (NoMethodError)

When the method isn’t found, there’s apparently a NoMethodError.
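Since the failure is an ordinary runtime exception, a program can also catch it or avoid it. The sketch below reuses the bad_method call from the command line above and shows respond_to?, a standard way to ask an object beforehand.

```ruby
begin
  "str".bad_method()
rescue NoMethodError => e
  p(e.class)   # NoMethodError
  p(e.name)    # :bad_method
end

# respond_to? checks beforehand whether an object can handle a method.
p("str".respond_to?(:upcase))       # true
p("str".respond_to?(:bad_method))   # false
```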

Always saying “the upcase method of String” and such is cumbersome. Let’s introduce a special notation: String#upcase refers to the method upcase defined in the class String.

By the way, if we write String.upcase it has a completely different meaning in the Ruby world. What could that be? I’ll explain it shortly.

Class Definition

Up to now we talked about already defined classes. We can of course also define our own classes. To define classes we use the class statement.

class C
end

This is the definition of a new class C. After we defined it we can use it as follows.

class C
end
c = C.new()   # create an instance of C and assign it to the variable c

Note that the notation for creating a new instance is not new C. The astute reader might think: Hmm, this C.new() really looks like a method call. In Ruby the object generating expressions are indeed just methods.

In Ruby class names and constant names are the same. Then, what is stored in the constant whose name is the same as a class name? In fact, it’s the class. In Ruby all things which a program can manipulate are objects. So of course classes are also expressed as objects. Let’s call these class objects. Every class is an instance of the class Class.

In other words, a class statement creates a new class object and assigns it to a constant named after the class. On the other hand, the generation of an instance references this constant and calls a method on this object (usually new). If we look at the example below, it’s pretty obvious that the creation of an instance doesn’t differ from a normal method call.

S = "content"
class C
end

S.upcase()   # Get the object the constant S points to and call upcase
C.new()      # Get the object the constant C points to and call new

So new is not a reserved word in Ruby.
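Since a class is just an object held by a constant, it can be passed around like any other value. The following sketch illustrates this; make_instance is a hypothetical helper name, not something defined by Ruby.

```ruby
class C
end

# Hypothetical helper: creates an instance through whatever class object it is given.
def make_instance(klass)
  return klass.new()
end

p(C.class)                        # Class
p(make_instance(C).class)         # C
p(make_instance(String).class)    # String
```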

And we can also use p for an instance of a class, even immediately after its creation.

class C
end

c = C.new()
p(c)   # #<C:0x2acbd7e4>

It won’t display as nicely as a string or an integer, but it shows its respective class and its internal ID. This ID is the pointer value which points to the object.

Oh, I completely forgot to mention the notation of method names: Object.new means the class object Object and the new method called on the class itself. So Object#new and Object.new are completely different things, and we have to separate them strictly.

obj = Object.new()   # Object.new
obj.new()            # Object#new

In practice a method Object#new is almost never defined, so the second line will raise an error. Please regard this as an example of the notation.

Method Definition

Even if we can define classes, it is useless if we cannot define methods. Let’s define a method for our class C.

class C
  def myupcase(str)
    return str.upcase()
  end
end

To define a method we use the def statement. In this example we defined the method myupcase. The name of the only parameter is str. As with variables, it’s not necessary to write parameter types or the return type. And we can use any number of parameters.

Let’s use the defined method. By default, methods can be called from the outside.

c = C.new()
result = c.myupcase("content")
p(result)   # Shows "CONTENT"

Of course if you get used to it you don’t need to assign every time. The line below gives the same result.

p(C.new().myupcase("content"))   # Also shows "CONTENT"

self

During the execution of a method, the information about who “itself” is (the instance on which the method was called) is always saved and can be picked up in self. It is like this in C++ or Java. Let’s check this out.

class C
  def get_self()
    return self
  end
end

c = C.new()
p(c)              # #<C:0x40274e44>
p(c.get_self())   # #<C:0x40274e44>

As we see, the above two expressions return the exact same object. We could confirm that self is c during the method call on c.

Then what is the way to call a method on itself? What first comes to mind is calling via self.

class C
  def my_p(obj)
    self.real_my_p(obj)   # calls a method on oneself
  end

  def real_my_p(obj)
    p(obj)
  end
end

C.new().my_p(1)   # Output 1

But always adding self when calling one’s own method is tedious. Hence, it is designed so that one can omit the receiver whenever one calls a method on self.

class C
  def my_p(obj)
    real_my_p(obj)   # You can call without specifying the receiver
  end

  def real_my_p(obj)
    p(obj)
  end
end

C.new().my_p(1)   # Output 1

Instance Variables

As the saying goes, “Objects are data and code”, so just being able to define methods would not be very useful. Each object must also be able to store data. In other words, instance variables. Or in C++ jargon, member variables.

In the fashion of Ruby’s variable naming convention, the variable type can be determined by the first few characters. For instance variables it’s an @.

class C
  def set_i(value)
    @i = value
  end

  def get_i()
    return @i
  end
end

c = C.new()
c.set_i("ok")
p(c.get_i())   # Shows "ok"

Instance variables differ a bit from the variables seen before: we can reference them without assigning (defining) them. To see what happens we add the following lines to the code above.

c = C.new()
p(c.get_i())   # Shows nil

Calling get without set gives nil. nil is the object which indicates “nothing”. It’s mysterious that there’s really an object but it means nothing, but that’s just the way it is.

We can use nil like a literal as well.

p(nil)   # Shows nil

initialize

As we saw before, when we call ‘new’ on a freshly defined class, we can create an instance. That’s true, but sometimes we might want a customized instantiation. In this case we don’t change the new method, we define the initialize method. When we do this, it gets called within new.

class C
  def initialize()
    @i = "ok"
  end
  def get_i()
    return @i
  end
end
c = C.new()
p(c.get_i())   # Shows "ok"

Strictly speaking this is the specification of the new method and not the specification of the language itself.
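To make the point concrete, here is a rough sketch of what new does, written by hand with Class#allocate. The helper name new_by_hand is hypothetical, and the real new also passes its arguments and block on to initialize, which this sketch leaves out.

```ruby
class C
  def initialize()
    @i = "ok"
  end

  def get_i()
    return @i
  end
end

# Roughly what C.new() does: allocate an empty object, then call initialize on it.
# initialize is private, so send is used to call it from the outside.
def new_by_hand(klass)
  obj = klass.allocate()
  obj.send(:initialize)
  return obj
end

p(new_by_hand(C).get_i())   # "ok"
p(C.new().get_i())          # "ok"
```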

Inheritance

Classes can inherit from other classes. For instance String inherits from Object. In this book, we’ll indicate this relation by a vertical arrow as in Fig. 3.

Figure 3: Inheritance

In the case of this illustration, the inherited class (Object) is called the superclass or superior class. The inheriting class (String) is called the subclass or inferior class. This terminology differs from C++ jargon, so be careful. But it’s the same as in Java.

Anyway let’s try it out. Let our created class inherit from another class. To inherit from another class (or designate a superclass) write the following.

class C < SuperClassName
end

When we leave out the superclass like in the cases before, the class Object tacitly becomes the superclass.

Now, why should we want to inherit? Of course to hand over methods. Handing over means that the methods which were defined in the superclass also work in the subclass as if they were defined in there once more. Let’s check it out.

class C
  def hello()
    return "hello"
  end
end

class Sub < C
end

sub = Sub.new()
p(sub.hello())   # Shows "hello"

hello was defined in the class C but we could call it on an instance of the class Sub as well. Of course we don’t need to assign variables. The above is the same as the line below.

p(Sub.new().hello())

By defining a method with the same name, we can overwrite the method. In C++ and Object Pascal (Delphi) it’s only possible to overwrite functions explicitly defined with the keyword virtual, but in Ruby every method can be overwritten unconditionally.

class C
  def hello()
    return "Hello"
  end
end

class Sub < C
  def hello()
    return "Hello from Sub"
  end
end

p(Sub.new().hello())   # Shows "Hello from Sub"
p(C.new().hello())     # Shows "Hello"

We can inherit over several steps. For instance, as in Fig. 4, Fixnum inherits every method from Object, Numeric and Integer. When there are methods with the same name, the nearer classes take preference. As type overloading isn’t there at all, the requisites are extremely straightforward.

Figure 4: Inheritance over multiple steps

In C++ it’s possible to create a class which inherits nothing, while in Ruby one has to inherit from the Object class either directly or indirectly. In other words, when we draw the inheritance relations it becomes a single tree with Object at the top. For example, when we draw a tree of the inheritance relations among the important classes of the basic library, it would look like Fig. 5.

Figure 5: Ruby’s class tree

Once the superclass is appointed (in the definition statement) it’s impossible to change it. In other words, one can add a new class to the class tree but cannot change a position or delete a class.
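The following sketch shows what happens when a program tries anyway. The class names Base1, Base2 and Sub and the constant CHANGE_RESULT are made up for this illustration.

```ruby
class Base1
end

class Base2
end

class Sub < Base1
end

# Reopening Sub with a different superclass is rejected at runtime.
CHANGE_RESULT = begin
  class Sub < Base2
  end
  "changed"
rescue TypeError => e
  e.class.name
end

p(CHANGE_RESULT)    # "TypeError"
p(Sub.superclass)   # Base1
```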

Inheritance of Variables……?

In Ruby (instance) variables aren’t inherited. Even when inheriting, a class does not know what variables are going to be used.

But when an inherited method is called (on an instance of a subclass), assignment of instance variables happens, which means they become defined. Then, since the namespace of instance variables is completely flat for each instance, they can be accessed by a method of whichever class.

class A
  def initialize()   # called while processing new()
    @i = "ok"
  end
end

class B < A
  def print_i()
    p(@i)
  end
end

B.new().print_i()   # Shows "ok"

If you can’t agree with this behavior, let’s forget about classes and inheritance. When there’s an instance obj of the class C, then think as if all the methods of the superclass of C are defined in C. Of course we keep the overwrite rule in mind. Then the methods of C get attached to the instance obj (Fig. 6). This strong palpability is a specialty of Ruby’s object orientation.

Figure 6: A conception of a Ruby object

Modules

Only a single superclass can be designated. So Ruby looks like single inheritance. But because of modules it has in practice an ability which is identical to multiple inheritance. Let’s explain these modules next.

In short, modules are classes for which a superclass cannot be designated and instances cannot be created. For the definition we write as follows.

module M
end

Here the module M was defined. Methods are defined exactly the same way as for classes.

module M
  def myupcase(str)
    return str.upcase()
  end
end

But because we cannot create instances, we cannot call them directly. To do that, we use the module by “including” it into other classes. Then we become able to deal with it as if a class inherited the module.

module M
  def myupcase(str)
    return str.upcase()
  end
end

class C
  include M
end

p(C.new().myupcase("content"))   # "CONTENT" is shown

Even though no method was defined in the class C we can call the method myupcase. It means it “inherited” the method of the module M. Inclusion is functionally completely the same as inheritance. There’s no limit on defining methods or accessing instance variables.

I said we cannot specify any superclass of a module, but other modules can be included.

module M
end

module M2
  include M
end

In other words it’s functionally the same as appointing a superclass. But a class cannot come above a module. Only modules are allowed above modules.

The example below also contains the inheritance of methods.

module OneMore
  def method_OneMore()
    p("OneMore")
  end
end

module M
  include OneMore

  def method_M()
    p("M")
  end
end

class C
  include M
end

C.new().method_M()         # Output "M"
C.new().method_OneMore()   # Output "OneMore"

As with classes, when we sketch inheritance it looks like Fig. 7.

Figure 7: multilevel inclusion

Besides, the class C also has a superclass. How is its relationship to modules? For instance, let’s think of the following case.

# modcls.rb

class Cls
  def test()
    return "class"
  end
end

module Mod
  def test()
    return "module"
  end
end

class C < Cls
  include Mod
end

p(C.new().test())   # "class"? "module"?

C inherits from Cls and includes Mod. Which will be shown in this case, "class" or "module"? In other words, which one is “closer”, class or module? We’d better ask Ruby about Ruby, thus let’s execute it:

% ruby modcls.rb
"module"

Apparently a module takes preference before the superclass.

In general, in Ruby when a module is included, it is inherited by going in between the class and the superclass. As a picture it might look like Fig. 8.

Figure 8: The relation between modules and classes

And if we also take the modules included in the module into account, it would look like Fig. 9.

Figure 9: The relation between modules and classes (2)
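The same ordering can also be observed from within Ruby with Module#ancestors, which lists a class together with everything it inherits from, nearest first. This is a small sketch that reuses the names from modcls.rb.

```ruby
class Cls
  def test()
    return "class"
  end
end

module Mod
  def test()
    return "module"
  end
end

class C < Cls
  include Mod
end

p(C.new().test())        # "module"
p(C.ancestors.take(3))   # [C, Mod, Cls]
```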

The Program revisited

Caution. This section is extremely important and explains elements which are hard to get used to for programmers who have only used static languages before. For other parts just skimming is sufficient, but for this part alone I’d like you to read it carefully. The explanation will also be relatively attentive.

Nesting of Constants

First a repetition of constants. As a constant begins with a capital letter, the definition goes as follows.

Const = 3

Now we reference the constant in this way.

p(Const)   # Shows 3

Actually we can also write this.

p(::Const)   # Shows 3 in the same way.

The :: in front shows that it’s a constant defined at the top level. You can think of the path in a filesystem. Assume there is a file vmunix in the root directory. Being at / one can write vmunix to access the file. One can also write /vmunix as its full path. It’s the same with Const and ::Const. At top level it’s okay to write only Const or to write the full path ::Const.

And what corresponds to a filesystem’s directories in Ruby? That should be class and module definition statements. However mentioning both is cumbersome, so I’ll just subsume them under class definition. When one enters a class definition the level for constants rises (as if entering a directory).

class SomeClass
  Const = 3
end

p(::SomeClass::Const)   # Shows 3
p(SomeClass::Const)     # The same. Shows 3

SomeClass is defined at toplevel. Hence one can reference it by writing either SomeClass or ::SomeClass. And as the constant Const nested in the class definition is a Const “inside SomeClass”, it becomes ::SomeClass::Const.

As we can create a directory in a directory, we can create a class inside a class. For instance like this:

class C          # ::C
  class C2       # ::C::C2
    class C3     # ::C::C2::C3
    end
  end
end

By the way, for a constant defined in a class definition statement, should we always write its full name? Of course not. As with the filesystem, if one is inside the same class definition one can skip the ::. It becomes like this:

class SomeClass
  Const = 3
  p(Const)   # Shows 3.
end

“What?” you might think. Surprisingly, even inside a class definition statement, we can write a program which is going to be executed. People who are used to only static languages will find this quite exceptional. I was also flabbergasted the first time I saw it.

Let’s add that we can of course also view a constant inside a method. The reference rules are the same as within the class definition (outside the method).

class C
  Const = "ok"
  def test()
    p(Const)
  end
end

C.new().test()   # Shows "ok"

Everything is executed

Looking at the big picture I want to write one more thing. In Ruby almost every part of a program is “executed”. Constant definitions, class definitions, method definitions and almost all the rest are executed in the apparent order.

Look for instance at the following code. I used various constructions which have been used before.

 1:  p("first")
 2:
 3:  class C < Object
 4:    Const = "in C"
 5:
 6:    p(Const)
 7:
 8:    def myupcase(str)
 9:       return str.upcase()
10:    end
11:  end
12:
13:  p(C.new().myupcase("content"))

This program is executed in the following order:

 1: p("first")                  Shows "first"
 3: < Object                    The constant Object is referenced and the class object Object is obtained
 3: class C                     A new class object with superclass Object is generated, and assigned to the constant C
 4: Const = "in C"              Assigning the value "in C" to the constant ::C::Const
 6: p(Const)                    Showing the constant ::C::Const, hence "in C"
 8: def myupcase(...)...end     Define C#myupcase
13: C.new().myupcase(...)       Refer to the constant C, call the method new on it, and then myupcase on the return value
 9: return str.upcase()         Returns "CONTENT"
13: p(...)                      Shows "CONTENT"

The Scope of Local Variables

At last we can talk about the scope of local variables.

The toplevel, the interior of a class definition, the interior of a module definition and a method body each have a completely independent local variable scope. In other words, the lvar variables in the following program are all different variables, and they do not influence each other.

lvar = 'toplevel'

class C
  lvar = 'in C'
  def method()
    lvar = 'in C#method'
  end
end

p(lvar)   # Shows "toplevel"

module M
  lvar = 'in M'
end

p(lvar)   # Shows "toplevel"

self as context

Previously, I said that during method execution oneself (the object on which the method was called) becomes self. That’s true but only half true. Actually, during the execution of a Ruby program, self is always set, wherever it is. It means there’s self also at the top level or in a class definition statement.

For instance the self at the toplevel is main. It’s an instance of the Object class which is nothing special. main is provided to set up self for the time being. There’s no deeper meaning attached to it.

Hence the toplevel’s self, i.e. main, is an instance of Object, such that one can call the methods of Object there. And in Object the module Kernel is included. In there the function-flavor methods like p and puts are defined (Fig. 10). That’s why one can call puts and p also at the toplevel.

Figure 10: main, Object and Kernel

Thus p isn’t a function, it’s a method. It is just that, because it is defined in Kernel, it can be called like a function as “its own” method wherever it is and no matter what the class of self is. Therefore, there aren’t functions in the true sense, there are only methods.
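These relationships can be checked from a script. The following is a minimal sketch meant to be run at the toplevel; it uses Object#method and Method#owner to ask where a method is defined.

```ruby
p(self)                      # main
p(self.class)                # Object
p(Object.include?(Kernel))   # true
p(method(:p).owner)          # Kernel
p(method(:puts).owner)       # Kernel
```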

By the way, besides p and puts there are the function-flavor methods print, printf, sprintf, gets, fork, exec and many more with somewhat familiar names. When you look at the choice of names you might be able to imagine Ruby’s character.

Well, since self is set up everywhere, self should also be in a class definition in the same way. The self in the class definition is the class itself (the class object). Hence it would look like this.

class C
  p(self)   # C
end

What should this be good for? In fact, we’ve already seen an example in which it is very useful. This one.

module M
end

class C
  include M
end

This include is actually a method call on the class object C. I haven’t mentioned it yet, but the parentheses around arguments can be omitted for method calls. And I omitted the parentheses around include so that it doesn’t look like a method call, because we had not finished the talk about the class definition statement.
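To see that include really is a method of the class object, it can be called with an explicit receiver from outside the class definition. This is only a sketch; the method name hello is made up, and send is used on the assumption that include may be private, as it was in older versions of Ruby.

```ruby
module M
  def hello()
    return "hello from M"
  end
end

class C
end

# The same effect as writing "include M" inside the class definition,
# but with the receiver C written out explicitly.
C.send(:include, M)

p(C.include?(M))     # true
p(C.new().hello())   # "hello from M"
```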

Loading

In Ruby the loading of libraries also happens at runtime. Normally one writes this.

require("library_name")

The impression isn’t wrong: require is a method. It’s not even a reserved word. When it is written this way, loading is executed on the line it is written, and the execution is handed over to (the code of) the library. As there is no concept like Java packages in Ruby, when we’d like to separate namespaces of files, it is done by putting files into a directory.

require("somelib/file1")
require("somelib/file2")

And in the library usually classes and such are defined with class statements or module statements. The constant scope of the top level is flat without the distinction of files, so one can see classes defined in another file without any special preparation. To partition the namespace of class names one has to explicitly nest modules as shown below.

# example of the namespace partition of net library
module Net
  class SMTP
    # ...
  end
  class POP
    # ...
  end
  class HTTP
    # ...
  end
end

More about Classes

The talk about Constants still goes on

Up to now we used the filesystem metaphor for the scope of constants, but I want you to completely forget that.

There is more about constants. Firstly one can also see constants in the “outer” class.

Const = "ok"
class C
  p(Const)   # Shows "ok"
end

The reason why this is designed in this way is that it becomes useful when modules are used as namespaces. Let’s explain this by adding a few things to the previous example of the net library.

module Net
  class SMTP
    # Uses Net::SMTPHelper in the methods
  end
  class SMTPHelper
    # Supports the class Net::SMTP
  end
end

In such a case, it’s convenient if we can refer to it also from the SMTP class just by writing SMTPHelper, isn’t it? Therefore, it is concluded that “it’s convenient if we can see the outer classes”.

The outer class can be referenced no matter how deep the nesting is. When the same name is defined on different levels, the one which is found first when searching from within will be referred to.

Const = "far"
class C
  Const = "near"   # This one is closer than the one above
  class C2
    class C3
      p(Const)     # "near" is shown
    end
  end
end

There’s another way of searching constants. If the toplevel is reached when going further and further outside, then the class’s own superclass is searched for the constant.

class A
  Const = "ok"
end
class B < A
  p(Const)   # "ok" is shown
end

Really, that’s pretty complicated.

Let’s summarize. When looking up a constant, first the outer classes are searched, then the superclasses. This is quite contrived, but let’s assume a class hierarchy as follows.

class A1
end
class A2 < A1
end
class A3 < A2
  class B1
  end
  class B2 < B1
  end
  class B3 < B2
    class C1
    end
    class C2 < C1
    end
    class C3 < C2
      p(Const)
    end
  end
end

When the constant Const in C3 is referenced, it’s looked up in the order depicted in Fig. 11.

Figure 11: Search order for constants

Be careful about one point. The superclasses of the classes outside, for instance A1 and B2, aren’t searched at all. If it’s outside once it’s always outside, and if it’s superclass once it’s always superclass. Otherwise, the number of classes searched would become too big and the behavior of such a complicated thing would become unpredictable.
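Both rules can be tried out in a smaller setting. In the sketch below all class, constant and method names (Base, OuterBase, Outer, Inner, Hidden, find_const, find_hidden) are invented for the illustration. The outer class is searched before the superclass, and the superclass of the outer class is not searched at all.

```ruby
class Base
  Const = "from the superclass"
end

class OuterBase
  Hidden = "from the superclass of the outer class"
end

class Outer < OuterBase
  Const = "from the outer class"

  class Inner < Base
    def self.find_const()
      return Const
    end

    def self.find_hidden()
      begin
        return Hidden
      rescue NameError => e
        return e.class.name
      end
    end
  end
end

p(Outer::Inner.find_const())    # "from the outer class"
p(Outer::Inner.find_hidden())   # "NameError"
```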

Metaclasses

I said that a method can be called on anything that is an object. I also said that the methods that can be called are determined by the class of an object. Then shouldn’t there be a class for class objects? (Fig. 12)

Figure 12: A class of classes?

In this kind of situation, in Ruby, we can check in practice. It’s because there’s “a method which returns the class (class object) to which an object itself belongs”, Object#class.

p("string".class())   # String is shown
p(String.class())     # Class is shown
p(Object.class())     # Class is shown

Apparently String belongs to the class named Class. Then what’s the class of Class?

p(Class.class())   # Class is shown

Again Class. In other words, whatever object it is, by following like .class().class().class() …, it would reach Class in the end, then it will stall in the loop (Fig. 13).

Figure 13: The class of the class of the class…

Class is the class of classes. And what has a recursive structure as “X of X” is called a meta-X. Hence Class is a metaclass.

Metaobjects

Let’s change the target and think about modules. As modules are also objects, there also should be a class for them. Let’s see.

module M
end

p(M.class())   # Module is shown

The class of a module seems to be Module. And what should be the class of the class Module?

p(Module.class())   # Class

It’s again Class.

Now we change the direction and examine the inheritance relationships. What’s the superclass of Class and Module? In Ruby, we can find it out with Class#superclass.

p(Class.superclass())    # Module
p(Module.superclass())   # Object
p(Object.superclass())   # nil

So Class is a subclass of Module. Based on these facts, Figure 14 shows the relationships between the important classes of Ruby.

Figure 14: The class relationship between the important Ruby classes

Up to now we used new and include without any explanation, but finally I can explain their true form. new is really a method defined for the class Class. Therefore on whatever class (because it is an instance of Class), new can be used immediately. But new isn’t defined in Module. Hence it’s not possible to create instances of a module. And since include is defined in the Module class, it can be called on both modules and classes.

These three classes Object, Module and Class are objects that support the foundation of Ruby. We can say that these three objects describe Ruby’s object world itself. Namely they are objects which describe objects. Hence, Object, Module and Class are Ruby’s “meta-objects”.
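Where new and include live can also be confirmed by asking Ruby. The following is a sketch; defined_in_module_class? is a hypothetical helper, written so that it does not depend on whether include is public or private in the Ruby version at hand.

```ruby
module M
end

class C
end

p(Class.instance_method(:new).owner)   # Class
p(C.respond_to?(:new))                 # true
p(M.respond_to?(:new))                 # false

# Hypothetical helper: is the named method defined for instances of Module,
# with any of the visibilities checked here?
def defined_in_module_class?(name)
  return Module.method_defined?(name) || Module.private_method_defined?(name)
end

p(defined_in_module_class?(:include))  # true
p(M.is_a?(Module))                     # true
p(C.is_a?(Module))                     # true, because Class is a subclass of Module
```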

Singleton Methods

I said that methods can be called on anything that is an object. I also said that the methods that can be called are determined by the object’s class. However I think I also said that ideally methods belong to objects. Classes are just a means to eliminate the effort of defining the same method more than once.

Actually in Ruby there’s also a means to define methods for individual objects (instances) without depending on the class. To do this, you can write this way.

obj = Object.new()
def obj.my_first()
  puts("My first singleton method")
end
obj.my_first()   # Shows My first singleton method

As you already know, Object is the root for every class. It’s very unlikely that a method with such a weird name as my_first is defined in such an important class. And obj is an instance of Object. However the method my_first can be called on obj. Hence we have created without doubt a method which has nothing to do with the class the object belongs to. These methods which are defined for each object individually are called singleton methods.

When are singleton methods used? First, they are used when defining something like the static methods of Java or C++. In other words, methods which can be used without creating an instance. These methods are expressed in Ruby as singleton methods of a class object.

For example in UNIX there’s a system call unlink. This call deletes a file entry from the filesystem. In Ruby it can be used directly as the singleton method unlink of the File class. Let’s try it out.

File.unlink("core")   # deletes the coredump

It’s cumbersome to say “the singleton method unlink of the object File”. We simply write File.unlink. Don’t mix it up and write File#unlink, or vice versa don’t write File.write for the method write defined in File.

▼ A summary of the method notation

notation      the target object       example
File.unlink   the File class itself   File.unlink("core")
File#write    an instance of File     f.write("str")
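A user-defined class can get such “static” methods in exactly the same way, by defining a singleton method on the class object. The sketch below uses an invented class Temperature with an invented method freezing, purely as an illustration.

```ruby
class Temperature
  def initialize(degrees)
    @degrees = degrees
  end

  def degrees()
    return @degrees
  end

  # A singleton method of the class object Temperature,
  # callable without creating an instance first.
  def Temperature.freezing()
    return Temperature.new(0)
  end
end

p(Temperature.freezing().degrees())                      # 0
p(Temperature.singleton_methods().include?(:freezing))   # true
p(Temperature.new(20).respond_to?(:freezing))            # false
```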

Class Variables

Class variables were added to Ruby from 1.6 on, so they are a relatively new mechanism. As with constants, they belong to a class, and they can be referenced and assigned from both the class and its instances. Let’s look at an example. The beginning of the name is @@.

class C
  @@cvar = "ok"
  p(@@cvar)   # "ok" is shown

  def print_cvar()
    p(@@cvar)
  end
end

C.new().print_cvar()   # "ok" is shown

As the first assignment serves as the definition, a reference before an assignment like the one shown below leads to a runtime error. There is an ‘@’ in front but the behavior differs completely from instance variables.

% ruby -e '
class C
  @@cvar
end
'
-e:3: uninitialized class variable @@cvar in C (NameError)

Here I was a bit lazy and used the -e option. The program is the three lines between the single quotes.

Class variables are inherited. Or saying it differently, a variable in a superior class can be assigned and referenced in the inferior class.

class A
  @@cvar = "ok"
end

class B < A
  p(@@cvar)   # Shows "ok"
  def print_cvar()
    p(@@cvar)
  end
end

B.new().print_cvar()   # Shows "ok"

Global Variables

At last there are also global variables. They can be referenced from everywhere and assigned everywhere. The first letter of the name is a $.

$gvar = "global variable"
p($gvar)   # Shows "global variable"

As with instance variables, all kinds of names can be considered defined for global variables before assignments. In other words a reference before an assignment gives nil and doesn’t raise an error.

Copyright © 2002-2004 Minero Aoki, All rights reserved.

English Translation: Sebastian Krause <skra@pantolog.de>

The original work is Copyright © 2002-2004 Minero AOKI. Translated by Vincent ISAMBART and Clifford Escobar CAOILE. This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 2.5 License.


Translated by Vincent ISAMBART

Chapter 2: Objects

Structure of Ruby objects

Guideline

From this chapter, we will begin actually exploring the ruby source code. First, as declared at the beginning of this book, we’ll start with the object structure.

What are the necessary conditions for objects to be objects? There could be many ways to explain objects themselves, but there are only three conditions that are truly indispensable.

1. The ability to differentiate itself from other objects (an identity)
2. The ability to respond to messages (methods)
3. The ability to store internal state (instance variables)

In this chapter, we are going to confirm these three features one by one.

The target file is mainly ruby.h, but we will also briefly look at other files such as object.c, class.c or variable.c.

VALUE and object struct

In ruby, the body of an object is expressed by a struct and always handled via a pointer. A different struct type is used for each class, but the pointer type will always be VALUE (figure 1).

Figure 1: VALUE and struct

Here is the definition of VALUE:

▼ VALUE

typedef unsigned long VALUE;

(ruby.h)

In practice, when using a VALUE, we cast it to the pointer to each object struct. Therefore if an unsigned long and a pointer have a different size, ruby will not work well. Strictly speaking, it will not work if there’s a pointer type that is bigger than sizeof(unsigned long). Fortunately, systems which cannot meet this requirement are rare nowadays, but some time ago it seems there were quite a few of them.

The structs, on the other hand, have several variations: a different struct is used based on the class of the object.

struct RObject    all things for which none of the following applies
struct RClass     class object
struct RFloat     floating-point numbers
struct RString    string
struct RArray     array
struct RRegexp    regular expression
struct RHash      hash table
struct RFile      IO, File, Socket, etc…
struct RData      all the classes defined at C level, except the ones mentioned above
struct RStruct    Ruby’s Struct class
struct RBignum    big integers

For example, for a string object, struct RString is used, so we will have something like the following.

Figure 2: String object

Let’s look at the definition of a few object structs.

▼ Examples of object struct

/* struct for ordinary objects */
struct RObject {
    struct RBasic basic;
    struct st_table *iv_tbl;
};

/* struct for strings (instance of String) */
struct RString {
    struct RBasic basic;
    long len;
    char *ptr;
    union {
        long capa;
        VALUE shared;
    } aux;
};

/* struct for arrays (instance of Array) */
struct RArray {
    struct RBasic basic;
    long len;
    union {
        long capa;
        VALUE shared;
    } aux;
    VALUE *ptr;
};

(ruby.h)

Before looking at every one of them in detail, let’s begin with something more general.

First, as VALUE is defined as unsigned long, it must be cast before being used as a pointer. That’s why Rxxxx() macros have been made for each object struct. For example, for struct RString there is RSTRING(), for struct RArray there is RARRAY(), etc… These macros are used like this:

VALUE str = ....;
VALUE arr = ....;
RSTRING(str)->len;   /* ((struct RString*)str)->len */
RARRAY(arr)->len;    /* ((struct RArray*)arr)->len */

Another important point to mention is that all object structs start with a member basic of type struct RBasic. As a result, if you cast this VALUE to struct RBasic*, you will be able to access the content of basic, regardless of the type of struct pointed to by VALUE.

Figure 3: struct RBasic

Because it is purposefully designed this way, struct RBasic must contain very important information for Ruby objects. Here is the definition for struct RBasic:

▼ struct RBasic

struct RBasic {
    unsigned long flags;
    VALUE klass;
};

(ruby.h)

flags are multipurpose flags, mostly used to register the struct type (for instance struct RObject). The type flags are named T_xxxx, and can be obtained from a VALUE using the macro TYPE(). Here is an example:

VALUE str;
str = rb_str_new();   /* creates a Ruby string (its struct is RString) */
TYPE(str);            /* the return value is T_STRING */

All these flags are named T_xxxx, like T_STRING for struct RString and T_ARRAY for struct RArray. They correspond very straightforwardly to the type names.

The other member of struct RBasic, klass, contains the class this object belongs to. As the klass member is of type VALUE, what is stored is (a pointer to) a Ruby object. In short, it is a class object.

Figure 4: object and class

The relation between an object and its class will be detailed in the “Methods” section of this chapter.

By the way, this member is named klass so as not to conflict with the reserved word class when the file is processed by a C++ compiler.

About struct types

I said that the type of struct is stored in the flags member of struct RBasic. But why do we have to store the type of struct? It’s to be able to handle all different types of struct via VALUE. If you cast a pointer to a struct to VALUE, as the type information does not remain, the compiler won’t be able to help. Therefore we have to manage the type ourselves. That’s the consequence of being able to handle all the struct types in a unified way.

OK, but the struct used is determined by the class, so why are the struct type and the class stored separately? Being able to find the struct type from the class should be enough. There are two reasons for not doing this.

The first one is (I’m sorry for contradicting what I said before), in fact there are structs that do not have a struct RBasic (i.e. they have no klass member). For example struct RNode, which will appear in the second part of the book. However, flags is guaranteed to be among the first members even in special structs like this. So if you put the type of struct in flags, all the object structs can be differentiated in one unified way.

The second reason is that there is no one-to-one correspondence between class and struct. For example, all the instances of classes defined at the Ruby level use struct RObject, so finding a struct from a class would require keeping track of the correspondence between each class and struct. That’s why it’s easier and faster to put the information about the type in the struct.

The use of basic.flags

Regarding the use of basic.flags, because I feel bad saying it is the struct type “and such”, I’ll illustrate it entirely here (Figure 5). There is no need to understand everything right away, because this is prepared for the time when you will be wondering about it later.

Figure 5: Use of flags

When looking at the diagram, it looks like 21 bits are not used on 32 bit machines. On these additional bits, the flags FL_USER0 to FL_USER8 are defined, and are used for a different purpose for each struct. In the diagram I also put FL_USER0 (FL_SINGLETON) as an example.

Objects embedded in VALUE

As I said, VALUE is an unsigned long. As VALUE is a pointer, it may look like void* would also be all right, but there is a reason for not doing this. In fact, VALUE can also not be a pointer. The 6 cases for which VALUE is not a pointer are the following:

1. small integers
2. symbols
3. true
4. false
5. nil
6. Qundef

I’ll explain them one by one.

Small integers

All data are objects in Ruby, thus integers are also objects. But since there are so many kinds of integer objects, if each of them were expressed as a struct, it would risk slowing down execution significantly. For example, when incrementing from 0 to 50000, we would hesitate to create 50000 objects for only that purpose.

That’s why in ruby, integers that are small to some extent are treated specially and embedded directly into VALUE. “Small” means signed integers that can be held in sizeof(VALUE)*8-1 bits. In other words, on 32 bit machines, the integers have 1 bit for the sign, and 30 bits for the integer part. Integers in this range will belong to the Fixnum class and the other integers will belong to the Bignum class.

Let’s see in practice the INT2FIX() macro that converts from a C int to a Fixnum, and confirm that Fixnum are directly embedded in VALUE.

▼ INT2FIX

#define FIXNUM_FLAG 0x01
#define INT2FIX(i) ((VALUE)(((long)(i))<<1 | FIXNUM_FLAG))

(ruby.h)

In brief, shift 1 bit to the left, and bitwise or it with 1.

 110100001000    before conversion
1101000010001    after conversion

That means that Fixnum as VALUE will always be an odd number. On the other hand, as Ruby object structs are allocated with malloc(), they are generally arranged on addresses that are multiples of 4. So they do not overlap with the values of Fixnum as VALUE.
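The arithmetic of the macro can be replayed with ordinary Ruby integers. The following is only an imitation at the Ruby level, not the C code of ruby itself; int2fix and fix2int are hypothetical names, and fix2int is simply assumed to be the inverse shift.

```ruby
FIXNUM_FLAG = 0x01

# Ruby-level imitation of the C macro INT2FIX().
def int2fix(i)
  return (i << 1) | FIXNUM_FLAG
end

# Hypothetical inverse: drop the flag bit again with a right shift.
def fix2int(v)
  return v >> 1
end

v = int2fix(0b110100001000)
p(v.to_s(2))               # "1101000010001"
p(v.odd?)                  # true
p(fix2int(v))              # 3336
p(fix2int(int2fix(-5)))    # -5
```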

Also, to convert int or long to VALUE, we can use macros like INT2NUM() or LONG2NUM(). Any conversion macro XXXX2XXXX with a name containing NUM can manage both Fixnum and Bignum. For example if INT2NUM() can’t convert an integer into a Fixnum, it will automatically convert it to Bignum. NUM2INT() will convert both Fixnum and Bignum to int. If the number can’t fit in an int, an exception will be raised, so there is no need to check the value range.

Symbols

What are symbols?

As this question is quite troublesome to answer, let’s start with the reasons why symbols were necessary. In the first place, there’s a type named ID used inside ruby. Here it is.

▼ ID

typedef unsigned long ID;

(ruby.h)

This ID is a number having a one-to-one association with a string. However, it’s not possible to have an association between all strings in this world and numerical values. It is limited to the one-to-one relationships inside one ruby process. I’ll speak of the method to find an ID in the next chapter “Names and name tables”.

In a language processor, there are a lot of names to handle. Method names or variable names, constant names, file names, class names… It’s troublesome to handle all of them as strings (char*), because of memory management and memory management and memory management… Also, lots of comparisons would certainly be necessary, but comparing strings character by character will slow down the execution. That’s why strings are not handled directly, something will be associated and used instead. And generally that “something” will be integers, as they are the simplest to handle.

These ID are found as symbols in the Ruby world. Up to ruby 1.4, the values of ID converted to Fixnum were used as symbols. Even today these values can be obtained using Symbol#to_i. However, as real use results came piling up, it was understood that making Fixnum and Symbol the same was not a good idea, so since 1.6 an independent class Symbol has been created.

Symbol objects are used a lot, especially as keys for hash tables. That’s why Symbol, like Fixnum, was made embedded in VALUE. Let’s look at the ID2SYM() macro converting ID to Symbol object.

▼ ID2SYM

#define SYMBOL_FLAG 0x0e
#define ID2SYM(x) ((VALUE)(((long)(x))<<8|SYMBOL_FLAG))

(ruby.h)

When shifting 8 bits left, x becomes a multiple of 256, that means a multiple of 4. Then after a bitwise or (in this case it’s the same as adding) with 0x0e (14 in decimal), the VALUE expressing the symbol is not a multiple of 4. Nor is it an odd number. So it does not overlap the range of any other VALUE. Quite a clever trick.

Finally, let’s see the reverse conversion of ID2SYM(), SYM2ID().

▼ SYM2ID()

#define SYM2ID(x) RSHIFT((long)x,8)

(ruby.h)

RSHIFT is a bit shift to the right. As a right shift may or may not keep the sign depending on the platform, it became a macro.
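The round trip between ID and symbol can be sketched as follows. This is only an illustration under our own assumptions: the toy_ names are hypothetical, the value is kept unsigned so that the right shift is well defined without a helper like RSHIFT, and toy_symbol_p() is our own check on the low 8 bits rather than a quotation from ruby.h.

```c
#include <stdint.h>

/* Hypothetical stand-ins for VALUE, ID and SYMBOL_FLAG. */
typedef uintptr_t toy_value;
typedef unsigned long toy_id;
#define TOY_SYMBOL_FLAG 0x0e

/* Shift the ID 8 bits to the left and put the flag in the low byte. */
static toy_value toy_id2sym(toy_id id)
{
    return ((toy_value)id << 8) | TOY_SYMBOL_FLAG;
}

/* The low byte tells us the value is a symbol (our own helper). */
static int toy_symbol_p(toy_value v)
{
    return (v & 0xff) == TOY_SYMBOL_FLAG;
}

/* Shifting back drops the flag and recovers the ID. */
static toy_id toy_sym2id(toy_value v)
{
    return (toy_id)(v >> 8);
}
```

For instance the ID 3 becomes 0x30e (782), which is neither odd nor a multiple of 4, so it can be confused neither with a Fixnum nor with a struct pointer.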

true false nil

These three are Ruby special objects. true and false represent the boolean values. nil is an object used to denote that there is no object. Their values at the C level are defined like this:

▼ true false nil

#define Qfalse 0        /* Ruby's false */
#define Qtrue  2        /* Ruby's true */
#define Qnil   4        /* Ruby's nil */

(ruby.h)

This time it’s even numbers, but as 0 or 2 can’t be used by pointers, they can’t overlap with other VALUE. It’s because usually the first block of virtual memory is not allocated, to make the programs dereferencing a NULL pointer crash.

And as Qfalse is 0, it can also be used as false at C level. In practice, in ruby, when a function returns a boolean value, it’s often made to return an int or VALUE, and returns Qtrue/Qfalse.

For Qnil, there is a macro dedicated to check if a VALUE is Qnil or not, NIL_P().

▼ NIL_P()

#define NIL_P(v) ((VALUE)(v) == Qnil)

(ruby.h)

The name ending with p is a notation coming from Lisp denoting that it is a function returning a boolean value. In other words, NIL_P means “is the argument nil?”. It seems the “p” character comes from “predicate.” This naming rule is used at many different places in ruby.

Also, in Ruby, false and nil are false (in conditional statements) and all the other objects are true. However, in C, nil (Qnil) is true. That’s why there’s the RTEST() macro to do Ruby-style test in C.

▼ RTEST()

#define RTEST(v) (((VALUE)(v) & ~Qnil) != 0)

(ruby.h)

As in Qnil only the third lower bit is 1, in ~Qnil only the third lower bit is 0. Then only Qfalse and Qnil become 0 with a bitwise and.

!=0 has been added to be certain to only have 0 or 1, to satisfy the requirements of the glib library that only wants 0 or 1 ([ruby-dev:11049]).
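The bit trick is easy to check by hand. The following sketch reuses the three constants under hypothetical TOY_ names (they are our copies, not ruby.h itself) and applies the same expression as RTEST().

```c
/* Our own copies of the three special values described above. */
#define TOY_QFALSE 0UL
#define TOY_QTRUE  2UL
#define TOY_QNIL   4UL

/* Same expression as RTEST(): clear the third lowest bit, then test. */
static int toy_rtest(unsigned long v)
{
    return (v & ~TOY_QNIL) != 0;
}
```

Only TOY_QFALSE (0) and TOY_QNIL (4) give 0. Everything else gives 1, including the VALUE 1, which is what INT2FIX(0) produces: in Ruby the integer 0 is true.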

By the way, what is the ‘Q’ of Qnil? ‘R’ I would have understood but why ‘Q’? When I asked, the answer was “Because it’s like that in Emacs.” I did not have the fun answer I was expecting…

Qundef

▼ Qundef

#define Qundef 6        /* undefined value for placeholder */

(ruby.h)

This value is used to express an undefined value in the interpreter. It can’t (must not) be found at all at the Ruby level.

Methods

I already brought up the three important points of a Ruby object: having an identity, being able to call a method, and keeping data for each instance. In this section, I’ll explain in a simple way the structure linking objects and methods.

struct RClass

In Ruby, classes exist as objects during the execution. Of course. So there must be a struct for class objects. That struct is struct RClass. Its struct type flag is T_CLASS.

As classes and modules are very similar, there is no need to differentiate their content. That’s why modules also use the struct RClass struct, and are differentiated by the T_MODULE struct flag.

▼ struct RClass

struct RClass {
    struct RBasic basic;
    struct st_table *iv_tbl;
    struct st_table *m_tbl;
    VALUE super;
};

(ruby.h)

First, let’s focus on the m_tbl (Method TaBLe) member. struct st_table is a hash table used everywhere in ruby. Its details will be explained in the next chapter “Names and name tables”, but basically, it is a table mapping names to objects. In the case of m_tbl, it keeps the correspondence between the name (ID) of the methods possessed by this class and the method entity itself. As for the structure of the method entity, it will be explained in Part 2 and Part 3.

The fourth member super keeps, like its name suggests, the superclass. As it’s a VALUE, it’s (a pointer to) the class object of the superclass. In Ruby there is only one class that has no superclass (the root class): Object.

However I already said that all Object methods are defined in the Kernel module, Object just includes it. As modules are functionally similar to multiple inheritance, it may seem that having just super is problematic, but in ruby some clever conversions are made to make it look like single inheritance. The details of this process will be explained in the fourth chapter “Classes and modules”.

Because of this conversion, super of the struct of Object points to the struct RClass which is the entity of the Kernel object, and the super of Kernel is NULL. So to put it conversely, if super is NULL, its RClass is the entity of Kernel (figure 6).

Figure 6: Class tree at the C level

Methods search

With classes structured like this, you can easily imagine the method call process. The m_tbl of the object’s class is searched, and if the method was not found, the m_tbl of super is searched, and so on. If there is no more super, that is to say the method was not found even in Object, then it must not be defined.

The sequential search process in m_tbl is done by search_method().

▼ search_method()

static NODE*
search_method(klass, id, origin)
    VALUE klass, *origin;
    ID id;
{
    NODE *body;

    if (!klass) return 0;
    while (!st_lookup(RCLASS(klass)->m_tbl, id, &body)) {
        klass = RCLASS(klass)->super;
        if (!klass) return 0;
    }

    if (origin) *origin = klass;
    return body;
}

(eval.c)

This function searches the method named id in the class object klass.

RCLASS(value) is the macro doing:

((struct RClass*)(value))

st_lookup() is a function that searches in st_table the value corresponding to a key. If the value is found, the function returns true and puts the found value at the address given in third parameter (&body).

Nevertheless, doing this search each time whatever the circumstances would be too slow. That’s why in reality, once called, a method is cached. So starting from the second time it will be found without following super one by one. This cache and its search will be seen in the 15th chapter “Methods”.
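The walk along super can be tried out in isolation with a toy model. Everything below is an assumption made for illustration: the toy_ types are ours, a small array stands in for the m_tbl hash table, and an int stands in for the method body. Only the shape of the loop follows search_method().

```c
#include <stddef.h>

typedef unsigned long toy_id;

/* One method: a name and a stand-in for its body. */
struct toy_method { toy_id name; int body; };

/* A class: a tiny method table and a pointer to the superclass. */
struct toy_class {
    const struct toy_method *m_tbl;
    size_t n_methods;
    const struct toy_class *super;   /* NULL when there is no more super */
};

/* Search klass, then its super, and so on. Returns NULL if not found;
   otherwise stores the class where the method was found in *origin. */
static const struct toy_method *
toy_search_method(const struct toy_class *klass, toy_id id,
                  const struct toy_class **origin)
{
    for (; klass != NULL; klass = klass->super) {
        size_t i;
        for (i = 0; i < klass->n_methods; i++) {
            if (klass->m_tbl[i].name == id) {
                if (origin) *origin = klass;
                return &klass->m_tbl[i];
            }
        }
    }
    return NULL;
}

/* Sample data mirroring figure 6: Object's super is Kernel. */
static const struct toy_method toy_kernel_methods[] = { { 1, 100 } };
static const struct toy_method toy_object_methods[] = { { 2, 200 } };
static const struct toy_class toy_kernel = { toy_kernel_methods, 1, NULL };
static const struct toy_class toy_object = { toy_object_methods, 1, &toy_kernel };
```

Looking up the name 1 starting from toy_object fails in the first table and succeeds in toy_kernel, which is then reported through origin.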

Instance variables

In this section, I will explain the implementation of the third essential condition, instance variables.

rb_ivar_set()

Instance variable is the mechanism that allows each object to hold its specific data. Since it is specific to each object, it seems good to store it in each object itself (i.e. in its object struct), but is it really so? Let’s look at the function rb_ivar_set(), which assigns an object to an instance variable.

▼ rb_ivar_set()

/* assign val to the id instance variable of obj */
VALUE
rb_ivar_set(obj, id, val)
    VALUE obj;
    ID id;
    VALUE val;
{
    if (!OBJ_TAINTED(obj) && rb_safe_level() >= 4)
        rb_raise(rb_eSecurityError, "Insecure: can't modify instance variable");
    if (OBJ_FROZEN(obj)) rb_error_frozen("object");
    switch (TYPE(obj)) {
      case T_OBJECT:
      case T_CLASS:
      case T_MODULE:
        if (!ROBJECT(obj)->iv_tbl)
            ROBJECT(obj)->iv_tbl = st_init_numtable();
        st_insert(ROBJECT(obj)->iv_tbl, id, val);
        break;
      default:
        generic_ivar_set(obj, id, val);
        break;
    }
    return val;
}

(variable.c)

rb_raise() and rb_error_frozen() are both error checks. This can always be said hereafter: Error checks are necessary in reality, but it’s not the main part of the process. Therefore, we should wholly ignore them at first read.

After removing the error handling, only the switch remains, but

switch (TYPE(obj)) {
  case T_aaaa:
  case T_bbbb:
    ...
}

this form is an idiom of ruby. TYPE() is the macro returning the type flag of the object struct (T_OBJECT, T_STRING, etc.). In other words as the type flag is an integer constant, we can branch depending on it with a switch. Fixnum or Symbol do not have structs, but inside TYPE() a special treatment is done to properly return T_FIXNUM and T_SYMBOL, so there’s no need to worry.

Well, let’s go back to rb_ivar_set(). It seems only the treatments of T_OBJECT, T_CLASS and T_MODULE are different. These 3 have been chosen on the basis that their second member is iv_tbl. Let’s confirm it in practice.

▼ Structs whose second member is iv_tbl

/* TYPE(val) == T_OBJECT */
struct RObject {
    struct RBasic basic;
    struct st_table *iv_tbl;
};

/* TYPE(val) == T_CLASS or T_MODULE */
struct RClass {
    struct RBasic basic;
    struct st_table *iv_tbl;
    struct st_table *m_tbl;
    VALUE super;
};

(ruby.h)

iv_tbl is the Instance Variable TaBLe. It records the correspondences between the instance variable names and their values.

In rb_ivar_set(), let’s look again at the code for the structs having iv_tbl.

if (!ROBJECT(obj)->iv_tbl)
    ROBJECT(obj)->iv_tbl = st_init_numtable();
st_insert(ROBJECT(obj)->iv_tbl, id, val);
break;

ROBJECT() is a macro that casts a VALUE into a struct RObject*. It’s possible that what obj points to is actually a struct RClass, but when accessing only the second member, no problem will occur.

st_init_numtable() is a function creating a new st_table. st_insert() is a function doing associations in a st_table.

In conclusion, this code does the following: if iv_tbl does not exist, it creates it, then stores the [variable name → object] association.

There’s one thing to be careful about. As struct RClass is the struct of a class object, its instance variable table is for the class object itself. In Ruby programs, it corresponds to something like the following:

class C
  @ivar = "content"
end

generic_ivar_set()

What happens when assigning to an instance variable of an object whose struct is not one of T_OBJECT, T_MODULE, T_CLASS?

▼ rb_ivar_set() in the case there is no iv_tbl

      default:
        generic_ivar_set(obj, id, val);
        break;

(variable.c)

This is delegated to generic_ivar_set(). Before looking at this function, let’s first explain its general idea.

Structs that are not T_OBJECT, T_MODULE or T_CLASS do not have an iv_tbl member (the reason why they do not have it will be explained later). However, even if it does not have the member, if there’s another method linking an instance to a struct st_table, it would be able to have instance variables. In ruby, these associations are solved by using a global st_table, generic_iv_tbl (figure 7).

Figure 7: generic_iv_table

Let’s see this in practice.

▼ generic_ivar_set()

static st_table *generic_iv_tbl;

static void
generic_ivar_set(obj, id, val)
    VALUE obj;
    ID id;
    VALUE val;
{
    st_table *tbl;

    /* for the time being you can ignore this */
    if (rb_special_const_p(obj)) {
        special_generic_ivar = 1;
    }
    /* initialize generic_iv_tbl if it does not exist */
    if (!generic_iv_tbl) {
        generic_iv_tbl = st_init_numtable();
    }

    /* the process itself */
    if (!st_lookup(generic_iv_tbl, obj, &tbl)) {
        FL_SET(obj, FL_EXIVAR);
        tbl = st_init_numtable();
        st_add_direct(generic_iv_tbl, obj, tbl);
        st_add_direct(tbl, id, val);
        return;
    }
    st_insert(tbl, id, val);
}

(variable.c)

rb_special_const_p() is true when its parameter is not a pointer. However, as this if part requires knowledge of the garbage collector, we’ll skip it for now. I’d like you to check it again after reading the chapter 5 “Garbage collection”.

st_init_numtable() already appeared some time ago. It creates a new hash table.

st_lookup() searches a value corresponding to a key. In this case it searches for what’s attached to obj. If an attached value can be found, the whole function returns true and stores the value at the address (&tbl) given as third parameter. In short, !st_lookup(...) can be read “if a value can’t be found”.

st_insert() was also already explained. It stores a new association in a table.

st_add_direct() is similar to st_insert(), but it does not check if the key was already stored before adding an association. It means, in the case of st_add_direct(), if a key already registered is being used, two associations linked to this same key will be stored. We can use st_add_direct() only when the check for existence has already been done, or when a new table has just been created. And this code would meet these requirements.

FL_SET(obj, FL_EXIVAR) is the macro that sets the FL_EXIVAR flag in the basic.flags of obj. The basic.flags flags are all named FL_xxxx and can be set using FL_SET(). These flags can be unset with FL_UNSET(). The EXIVAR from FL_EXIVAR seems to be the abbreviation of EXternal Instance VARiable.

This flag is set to speed up the reading of instance variables. If FL_EXIVAR is not set, even without searching in generic_iv_tbl, we can see the object does not have any instance variables. And of course a bit check is way faster than searching a struct st_table.

Gaps in structs

Now you understand the way to store the instance variables, but why are there structs without iv_tbl? Why is there no iv_tbl in struct RString or struct RArray? Couldn’t iv_tbl be part of RBasic?

To tell the conclusion first, we can do such a thing, but should not. As a matter of fact, this problem is deeply linked to the way ruby manages objects.

In ruby, the memory used for string data (char[]) and such is directly allocated using malloc(). However, the object structs are handled in a particular way. ruby allocates them by clusters, and then distributes them from these clusters. And in this way, if the types (or rather their sizes) were diverse, it would be hard to manage, thus RVALUE, which is the union of all the structs, is defined and the array of the unions is managed.

The size of a union is the same as the size of the biggest member, so for instance, if one of the structs is big, a lot of space would be wasted. Therefore, it’s preferable that each struct size is as similar as possible.

The most used struct might be usually struct RString. After that, depending on each program, there comes struct RArray (array), RHash (hash), RObject (user defined object), etc. However, this struct RObject only uses the space of struct RBasic + 1 pointer. On the other hand, struct RString, RArray and RHash take the space of struct RBasic + 3 pointers. In other words, when the number of struct RObject is being increased, the memory space of the two pointers for each object is wasted. Furthermore, if the size of RString was as much as 4 pointers, RObject would use less than half the size of the union, and this is too wasteful.
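The waste can be measured directly with sizeof. The structs below are simplified stand-ins of ours that only copy the shape described above (a two-word header, then one pointer or three words); they are not ruby’s real definitions, and toy_wasted_per_object() is a hypothetical helper.

```c
#include <stddef.h>

/* Simplified stand-ins: a header like RBasic, then the members. */
struct toy_basic  { unsigned long flags; void *klass; };
struct toy_object { struct toy_basic basic; void *iv_tbl; };
struct toy_string { struct toy_basic basic; long len; char *ptr; long capa; };

/* Every slot of the object heap has the size of this union. */
union toy_rvalue {
    struct toy_object object;
    struct toy_string string;
};

/* Bytes left unused in a slot that holds a toy_object. */
static size_t toy_wasted_per_object(void)
{
    return sizeof(union toy_rvalue) - sizeof(struct toy_object);
}
```

On a typical platform the result is the size of two pointers, which is the gap discussed above. Adding one more member to toy_string would widen it for every toy_object slot.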

So the merit received from iv_tbl is more or less saving memory and speeding up. Furthermore we do not know if it is used often or not. In fact, generic_iv_tbl was not introduced before ruby 1.2, so it was not possible to use instance variables in String or Array at that time. Nevertheless, it was not much of a problem. Making large amounts of memory useless just for such functionality looks stupid.

If you take all this into consideration, you can conclude that increasing the size of object structs for iv_tbl does not do any good.

rb_ivar_get()

We saw the rb_ivar_set() function that sets variables, so let’s see quickly how to get them.

▼ rb_ivar_get()

VALUE
rb_ivar_get(obj, id)
    VALUE obj;
    ID id;
{
    VALUE val;

    switch (TYPE(obj)) {
  /* (A) */
      case T_OBJECT:
      case T_CLASS:
      case T_MODULE:
        if (ROBJECT(obj)->iv_tbl &&
            st_lookup(ROBJECT(obj)->iv_tbl, id, &val))
            return val;
        break;
  /* (B) */
      default:
        if (FL_TEST(obj, FL_EXIVAR) || rb_special_const_p(obj))
            return generic_ivar_get(obj, id);
        break;
    }
  /* (C) */
    rb_warning("instance variable %s not initialized", rb_id2name(id));

    return Qnil;
}

(variable.c)

The structure is completely the same.

(A) For struct RObject or RClass, we search the variable in iv_tbl. As iv_tbl can also be NULL, we must check it before using it. Then if st_lookup() finds the relation, it returns true, so the whole if can be read as “If the instance variable has been set, return its value”.

(C) If no correspondence could be found, in other words if we read an instance variable that has not been set, we first leave the if then the switch. rb_warning() will then issue a warning and nil will be returned. That’s because you can read instance variables that have not been set in Ruby.

(B) On the other hand, if the struct is neither struct RObject nor RClass, the instance variable table is searched in generic_iv_tbl. What generic_ivar_get() does can be easily guessed, so I won’t explain it. I’d rather want you to focus on the condition of the if statement.

I already told you that the FL_EXIVAR flag is set to the object on which generic_ivar_set() is used. Here, that flag is utilized to make the check faster.

And what is rb_special_const_p()? This function returns true when its parameter obj does not point to a struct. As no struct means no basic.flags, no flag can be set in the first place. Thus FL_xxxx() is designed to always return false for such an object. Hence, objects that are rb_special_const_p() should be treated specially here.

Object Structs

In this section, about the important ones among object structs, we’ll briefly see their concrete appearances and how to deal with them.

struct RString

struct RString is the struct for the instances of the String class and its subclasses.

▼ struct RString

struct RString {
    struct RBasic basic;
    long len;
    char *ptr;
    union {
        long capa;
        VALUE shared;
    } aux;
};

(ruby.h)

ptr is a pointer to the string, and len the length of that string. Very straightforward.

Rather than a string, Ruby’s string is more a byte array, and can contain any byte including NUL. So when thinking at the Ruby level, ending the string with NUL does not mean anything. But as C functions require NUL, for convenience the ending NUL is there. However, its size is not included in len.

When dealing with a string from the interpreter or an extension library, you can access ptr and len by writing RSTRING(str)->ptr or RSTRING(str)->len, and it is allowed. But there are some points to pay attention to.

1. you have to check if str really points to a struct RString by yourself beforehand
2. you can read the members, but you must not modify them
3. you can’t store RSTRING(str)->ptr in something like a local variable and use it later

Why is that? First, there is an important software engineering principle: Don’t arbitrarily tamper with someone’s data. When there are interface functions, we should use them. However, there are also concrete reasons in ruby’s design why you should not refer to or store a pointer, and that’s related to the fourth member aux. However, to explain properly how to use aux, we have to explain first a little more of Ruby’s strings’ characteristics.

Ruby’s strings can be modified (are mutable). By mutable I mean after the following code:

s = "str"        # create a string and assign it to s
s.concat("ing")  # append "ing" to this string object
p(s)             # show "string"

the content of the object pointed by s will become “string”. It’s different from Java or Python string objects. Java’s StringBuffer is closer.

And what’s the relation? First, mutable means the length (len) of the string can change. We have to increase or decrease the allocated memory size each time the length changes. We can of course use realloc() for that, but generally malloc() and realloc() are heavy operations. Having to realloc() each time the string changes is a huge burden.

That’s why the memory pointed by ptr has been allocated with a size a little bigger than len. Because of that, if the added part can fit into the remaining memory, it’s taken care of without calling realloc(), so it’s faster. The struct member aux.capa contains the length including this additional memory.
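The effect of keeping a capacity next to the length can be sketched with a small buffer of our own. Note the assumptions: struct toy_str and its functions are hypothetical, the doubling growth policy is ours and not necessarily ruby’s, and the reallocs counter exists only so that the saving can be observed.

```c
#include <stdlib.h>
#include <string.h>

struct toy_str {
    long len;       /* bytes in use, not counting the ending NUL */
    long capa;      /* bytes available for content, NUL excluded */
    char *ptr;
    long reallocs;  /* hypothetical counter, for observation only */
};

/* Allocate room for capa bytes plus the ending NUL. 0 on success. */
static int toy_str_init(struct toy_str *s, long capa)
{
    s->ptr = malloc((size_t)capa + 1);
    if (!s->ptr) return -1;
    s->ptr[0] = '\0';
    s->len = 0;
    s->capa = capa;
    s->reallocs = 0;
    return 0;
}

/* Append n bytes; realloc() is called only when the spare room is used up. */
static int toy_str_cat(struct toy_str *s, const char *src, long n)
{
    if (s->len + n > s->capa) {
        long new_capa = (s->len + n) * 2;   /* growth policy: our assumption */
        char *p = realloc(s->ptr, (size_t)new_capa + 1);
        if (!p) return -1;
        s->ptr = p;
        s->capa = new_capa;
        s->reallocs++;
    }
    memcpy(s->ptr + s->len, src, (size_t)n);
    s->len += n;
    s->ptr[s->len] = '\0';
    return 0;
}
```

Starting with a capacity of 8, appending "str" and then "ing" does not reallocate at all. Only an append that goes past 8 bytes triggers realloc().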

So what is this other aux.shared? It’s to speed up the creation of literal strings. Have a look at the following Ruby program.

while true do      # repeat indefinitely
  a = "str"        # create a string with "str" as content and assign it to a
  a.concat("ing")  # append "ing" to the object pointed by a
  p(a)             # show "string"
end

Whatever the number of times you repeat the loop, the fourth line’s p has to show "string". And to do so, the expression "str" must every time create an object that holds a distinct char[]. But there must be also the high possibility that strings are not modified at all, and a lot of useless copies of char[] would be created in such a situation. If possible, we’d like to share one common char[].

The trick to share is aux.shared. Every string object created with a literal uses one shared char[]. And after a change occurs, the object-specific memory is allocated. When using a shared char[], the flag ELTS_SHARED is set in the object struct’s basic.flags, and aux.shared contains the original object. ELTS seems to be the abbreviation of ELemenTS.

Then, let’s return to our talk about RSTRING(str)->ptr. Though referring to a pointer is OK, you must not assign to it. This is first because the value of len or capa will no longer agree with the actual body, and also because when modifying strings created as literals, aux.shared has to be separated.

Before ending this section, I’ll write some examples of dealing with RString. I’d like you to regard str as a VALUE that points to RString when reading this.

RSTRING(str)->len;               /* length */
RSTRING(str)->ptr[0];            /* first character */
str = rb_str_new("content", 7);  /* create a string with "content" as its content
                                    the second parameter is the length */
str = rb_str_new2("content");    /* create a string with "content" as its content
                                    its length is calculated with strlen() */
rb_str_cat2(str, "end");         /* Concatenate a C string to a Ruby string */

struct RArray

struct RArray is the struct for the instances of Ruby’s array class Array.

▼ struct RArray

struct RArray {
    struct RBasic basic;
    long len;
    union {
        long capa;
        VALUE shared;
    } aux;
    VALUE *ptr;
};

(ruby.h)

Except for the type of ptr, this structure is almost the same as struct RString. ptr points to the content of the array, and len is its length. aux is exactly the same as in struct RString. aux.capa is the “real” length of the memory pointed by ptr, and if ptr is shared, aux.shared stores the shared original array object.

From this structure, it’s clear that Ruby’s Array is an array and not a list. So when the number of elements changes in a big way, a realloc() must be done, and if an element must be inserted at another place than the end, a memmove() will occur. But even when that happens, it is so fast that we don’t notice it. Recent machines are really impressive.

And the way to access its members is similar to the way of RString. With RARRAY(arr)->ptr and RARRAY(arr)->len, you can refer to the members, and it is allowed, but you must not assign to them, etc. We’ll only look at simple examples:

/* manage an array from C */
VALUE ary;
ary = rb_ary_new();             /* create an empty array */
rb_ary_push(ary, INT2FIX(9));   /* push a Ruby 9 */
RARRAY(ary)->ptr[0];            /* look what's at index 0 */
rb_p(RARRAY(ary)->ptr[0]);      /* do p on ary[0] (the result is 9) */

# manage an array from Ruby
ary = []      # create an empty array
ary.push(9)   # push 9
ary[0]        # look what's at index 0
p(ary[0])     # do p on ary[0] (the result is 9)

struct RRegexp

It’s the struct for the instances of the regular expression class Regexp.

▼ struct RRegexp

struct RRegexp {
    struct RBasic basic;
    struct re_pattern_buffer *ptr;
    long len;
    char *str;
};

(ruby.h)

ptr is the compiled regular expression. str is the string before compilation (the source code of the regular expression), and len is this string’s length.

As any code to handle Regexp objects doesn’t appear in this book, we won’t see how to use it. Even if you use it in extension libraries, as long as you do not want to use it a very particular way, the interface functions are enough.

struct RHash

struct RHash is the struct for Hash object, which is Ruby’s hash table.

▼ struct RHash

struct RHash {
    struct RBasic basic;
    struct st_table *tbl;
    int iter_lev;
    VALUE ifnone;
};

(ruby.h)

It’s a wrapper for struct st_table. st_table will be detailed in the next chapter “Names and name tables”.

ifnone is the value when a key does not have an associated value, its default is nil. iter_lev is to make the hash table reentrant (multithread safe).

struct RFile

struct RFile is a struct for instances of the built-in IO class and its subclasses.

▼ struct RFile

struct RFile {
    struct RBasic basic;
    struct OpenFile *fptr;
};

(ruby.h)

▼ OpenFile

typedef struct OpenFile {
    FILE *f;                    /* stdio ptr for read/write */
    FILE *f2;                   /* additional ptr for rw pipes */
    int mode;                   /* mode flags */
    int pid;                    /* child's pid (for pipes) */
    int lineno;                 /* number of lines read */
    char *path;                 /* pathname for file */
    void (*finalize) _((struct OpenFile*)); /* finalize proc */
} OpenFile;

(rubyio.h)

All members have been transferred in struct OpenFile. As there aren’t many instances of IO objects, it’s OK to do it like this. The purpose of each member is written in the comments. Basically, it’s a wrapper around C’s stdio.

struct RData

struct RData has a different tenor from what we saw before. It is the struct for implementation of extension libraries.

Of course structs for classes created in extension libraries are necessary, but as the types of these structs depend on the created class, it’s impossible to know their size or struct in advance. That’s why a “struct for managing a pointer to a user defined struct” has been created on ruby’s side to manage this. This struct is struct RData.

▼ struct RData

struct RData {
    struct RBasic basic;
    void (*dmark) _((void*));
    void (*dfree) _((void*));
    void *data;
};

(ruby.h)

data is a pointer to the user defined struct, dfree is the function used to free that user defined struct, and dmark is the function to do “mark” of the mark and sweep.

Because explaining struct RData is still too complicated, for the time being let’s just look at its representation (figure 8). The detailed explanation of its members will be introduced after we’ll finish chapter 5 “Garbage collection”.

Figure 8: Representation of struct RData

The original work is Copyright © 2002 - 2004 Minero AOKI. Translated by Vincent ISAMBART and Clifford Escobar CAOILE.
This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 2.5 License


Translated by Clifford Escobar CAOILE

Chapter 3: Names and Name Table

st_table

st_table has already appeared several times as a method table and an instance table. In this chapter let’s look at the structure of the st_table in detail.

Summary

I previously mentioned that the st_table is a hash table. What is a hash table? It is a data structure that records one-to-one relations, for example, a variable name and its value, or a function name and its body, etc.

However, data structures other than hash tables can, of course, record one-to-one relations. For example, a list of the following structs will suffice for this purpose.

struct entry {
    ID key;
    VALUE val;
    struct entry *next;   /* point to the next entry */
};

However, this method is slow. If the list contains a thousand items, in the worst case, it is necessary to traverse a thousand links. In other words, the search time increases in proportion to the number of elements. This is bad. Since ancient times, various speed improvement methods have been conceived. The hash table is one of those improved methods. In other words, the point is not that the hash table is necessary but that it can be made faster.
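To see why the list is slow, here is a minimal sketch of the search. The toy_ names are hypothetical, plain unsigned long stands in for ID and VALUE, and the steps output is our addition, there only to count how many links were followed.

```c
#include <stddef.h>

typedef unsigned long toy_id;
typedef unsigned long toy_value;

struct toy_entry {
    toy_id key;
    toy_value val;
    struct toy_entry *next;   /* point to the next entry */
};

/* Follow the links one by one. Returns 1 and stores the value in *val
   when key is found, 0 otherwise; *steps receives the entries visited. */
static int toy_list_lookup(const struct toy_entry *head, toy_id key,
                           toy_value *val, long *steps)
{
    long n = 0;
    for (; head != NULL; head = head->next) {
        n++;
        if (head->key == key) {
            if (val) *val = head->val;
            if (steps) *steps = n;
            return 1;
        }
    }
    if (steps) *steps = n;
    return 0;
}
```

Finding the last entry, or discovering that a key is absent, visits every entry of the list, which is the proportional cost mentioned above.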

Now then, let us examine the st_table. As it turns out, this library is not created by Matsumoto, rather:

▼ st.c credits

/* This is a public domain general purpose hash table package written by Peter Moore @ UCB. */

(st.c)

as shown above.

By the way, when I searched Google and found another version, it mentioned that st_table is a contraction of “STring TABLE”. However, I find it contradictory that it has both “general purpose” and “string” aspects.

What is a hash table?

A hash table can be thought of as the following: Let us think of an array with n items. For example, let us make n=64 (figure 1).

Figure 1: Array

Then let us specify a function f that takes a key and produces an integer i from 0 to n-1 (0-63). We call this f a hash function. f when given the same key always produces the same i. For example, if we can assume that the key is limited to positive integers, when the key is divided by 64, the remainder should always fall between 0 and 63. Therefore, this calculating expression has a possibility of being the function f.

When recording relationships, given a key, function f generates i, and places the value into index i of the array we have prepared. Index access into an array is very fast. The key concern is changing a key into an integer.

Figure 2: Array assignment

However, in the real world it isn’t that easy. There is a critical problem with this idea. Because n is only 64, if there are more than 64 relationships to be recorded, it is certain that there will be the same index for two different keys. It is also possible that with fewer than 64, the same thing can occur. For example, given the previous hash function “key % 64”, keys 65 and 129 will both have a hash value of 1. This is called a hash value collision. There are many ways to resolve such a collision.
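The collision in this example can be reproduced in a few lines. This is only the “key % 64” function from the text under hypothetical names of ours (TOY_NUM_BINS, toy_bin_index(), toy_collides()).

```c
#define TOY_NUM_BINS 64

/* The hash function f of the example: the remainder of key / 64. */
static unsigned int toy_bin_index(unsigned long key)
{
    return (unsigned int)(key % TOY_NUM_BINS);
}

/* Two different keys collide when they land on the same index. */
static int toy_collides(unsigned long a, unsigned long b)
{
    return a != b && toy_bin_index(a) == toy_bin_index(b);
}
```

Both 65 and 129 give the index 1, so they collide, while 65 and 66 do not.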

One solution is to insert into the next element when a collision occurs. This is called open addressing. (Figure 3).

Figure 3: Open addressing

Other than using the array like this, there are other possible approaches, like using a pointer to a respective linked list in each element of the array. Then when a collision occurs, grow the linked list. This is called chaining. (Figure 4) st_table uses this chaining method.

Figure 4: Chaining

However, if it can be determined a priori what set of keys will be used, it is possible to imagine a hash function that will never create collisions. This type of function is called a “perfect hash function”. Actually, there are tools which create a perfect hash function given a set of arbitrary strings. GNU gperf is one of those. ruby’s parser implementation uses GNU gperf but… this is not the time to discuss it. We’ll discuss this in the second part of the book.

Data Structure

Let us start looking at the source code. As written in the introductory chapter, if there is data and code, it is better to read the data first. The following is the data type of st_table.

▼ st_table

typedef struct st_table st_table;

struct st_table {
    struct st_hash_type *type;
    int num_bins;                   /* slot count */
    int num_entries;                /* total number of entries */
    struct st_table_entry **bins;   /* slot */
};

(st.h)

▼ struct st_table_entry

struct st_table_entry {
    unsigned int hash;
    char *key;
    char *record;
    st_table_entry *next;
};

(st.c)

st_table is the main table structure. st_table_entry is a holder that stores one value. st_table_entry contains a member called next which of course is used to make st_table_entry into a linked list. This is the chain part of the chaining method. The st_hash_type data type is used, but I will explain this later. First let me explain the other parts so you can compare and understand the roles.

Figure 5: st_table data structure

So, let us comment on st_hash_type.

▼ struct st_hash_type

struct st_hash_type {
    int (*compare)();   /* comparison function */
    int (*hash)();      /* hash function */
};

(st.h)

This is still Chapter 3 so let us examine it attentively.

int (*compare)()

This part shows, of course, the member compare which has a data type of “a pointer to a function that returns an int”. hash is also of the same type. A variable of this type is assigned in the following way:

int
great_function(int n)
{
    /* ToDo: Do something great! */
    return n;
}

{
    int (*f)();
    f = great_function;

And it is called like this:

    (*f)(7);
}

Here let us return to the st_hash_type commentary. Of the two members hash and compare, hash is the hash function f explained previously.

On the other hand, compare is a function that evaluates if the key is actually the same or not. With the chaining method, in the spot with the same hash value n, multiple elements can be inserted. To know exactly which element is being searched for, this time it is necessary to use a comparison function that we can absolutely trust. compare will be that function.

This st_hash_type is a good generalized technique. The hash table itself cannot determine what the stored keys’ data type will be. For example, in ruby, st_table’s keys are ID or char* or VALUE, but to write the same kind of hash for each (data type) is foolish. Usually, the things that change with the different key data types are things like the hash function. For things like memory allocation and collision detection, typically most of the code is the same. Only the parts where the implementation changes with a differing data type will be bundled up into a function, and a pointer to that function will be used. In this fashion, the majority of the code that makes up the hash table implementation can use it.
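The following sketch shows the technique on a small scale. It is modeled on st_hash_type but is not st.c’s code: the toy_ names are hypothetical, the function pointers are given full prototypes taking const void* (an assumption of ours, the original leaves the parameters unspecified), and the string hash is an arbitrary simple one.

```c
#include <string.h>

/* The type-dependent parts, bundled as two function pointers. */
struct toy_hash_type {
    int (*compare)(const void *a, const void *b);   /* 0 when equal */
    unsigned int (*hash)(const void *key);
};

/* Keys that are C strings. */
static int toy_strcmp(const void *a, const void *b)
{
    return strcmp((const char *)a, (const char *)b);
}

static unsigned int toy_strhash(const void *key)
{
    const unsigned char *p = key;
    unsigned int h = 0;
    while (*p) h = h * 31 + *p++;
    return h;
}

/* Keys that are long integers, passed by address. */
static int toy_numcmp(const void *a, const void *b)
{
    return *(const long *)a != *(const long *)b;
}

static unsigned int toy_numhash(const void *key)
{
    return (unsigned int)*(const long *)key;
}

static const struct toy_hash_type toy_type_str = { toy_strcmp, toy_strhash };
static const struct toy_hash_type toy_type_num = { toy_numcmp, toy_numhash };

/* Type-independent code: it only goes through the two pointers. */
static int toy_same_key(const struct toy_hash_type *type,
                        const void *x, const void *y)
{
    return type->hash(x) == type->hash(y) && type->compare(x, y) == 0;
}
```

toy_same_key() never needs to know whether it is handling strings or numbers. That is the part of the code that can be written once and shared by every key type.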

In object-oriented languages, in the first place, you can attach a procedure to an object and pass it (around), so this mechanism is not necessary. Perhaps it is more correct to say that this mechanism is built-in as a language’s feature.

st_hash_type example

The usage of a data structure like st_hash_type is good as an abstraction. On the other hand, what kind of code it actually passes through may be difficult to understand. If we do not examine what sort of function is used for hash or compare, we will not grasp the reality. To understand this, it is probably sufficient to look at st_init_numtable() introduced in the previous chapter. This function creates a table for integer data type keys.

▼ st_init_numtable()

st_table*
st_init_numtable()
{
    return st_init_table(&type_numhash);
}

(st.c)

st_init_table() is the function that allocates the table memory and so on. type_numhash is an st_hash_type (it is the member named “type” of st_table). Regarding this type_numhash:

▼ type_numhash

static struct st_hash_type type_numhash = {
    numcmp,
    numhash,
};

static int
numcmp(x, y)
    long x, y;
{
    return x != y;
}

static int
numhash(n)
    long n;
{
    return n;
}

(st.c)

Very simple. The table that the ruby interpreter uses is by and large this type_numhash.

st_lookup()

Now then, let us look at the function that uses this data structure. First, it’s a good idea to look at the function that does the searching. Shown below is the function that searches the hash table, st_lookup().

▼ st_lookup()

int
st_lookup(table, key, value)
    st_table *table;
    register char *key;
    char **value;
{
    unsigned int hash_val, bin_pos;
    register st_table_entry *ptr;

    hash_val = do_hash(key, table);
    FIND_ENTRY(table, ptr, hash_val, bin_pos);

    if (ptr == 0) {
        return 0;
    }
    else {
        if (value != 0)  *value = ptr->record;
        return 1;
    }
}

(st.c)

The important parts are pretty much in do_hash() and FIND_ENTRY(). Let us look at them in order.

▼ do_hash()

#define do_hash(key,table) (unsigned int)(*(table)->type->hash)((key))

(st.c)

Just in case, let us write down the macro body that is difficult to understand:

(table)->type->hash

is a function pointer where the key is passed as a parameter. This is the syntax for calling the function. * is not applied to table. In other words, this macro is a hash value generator for a key, using the prepared hash function type->hash for each data type.

Next, let us examine FIND_ENTRY().

▼ FIND_ENTRY()

#define FIND_ENTRY(table, ptr, hash_val, bin_pos) do {\
    bin_pos = hash_val%(table)->num_bins;\
    ptr = (table)->bins[bin_pos];\
    if (PTR_NOT_EQUAL(table, ptr, hash_val, key)) {\
        COLLISION;\
        while (PTR_NOT_EQUAL(table, ptr->next, hash_val, key)) {\
            ptr = ptr->next;\
        }\
        ptr = ptr->next;\
    }\
} while (0)

#define PTR_NOT_EQUAL(table, ptr, hash_val, key) ((ptr) != 0 && \
    (ptr->hash != (hash_val) || !EQUAL((table), (key), (ptr)->key)))

#define EQUAL(table,x,y) \
    ((x)==(y) || (*table->type->compare)((x),(y)) == 0)

(st.c)

COLLISION is a debug macro so we will (should) ignore it.

The parameters of FIND_ENTRY(), starting from the left are:

1. st_table
2. the found entry will be pointed to by this parameter
3. hash value
4. temporary variable

And, the second parameter will point to the found st_table_entry*.

At the outermost level, a do .. while(0) is used to safely wrap up a multiple expression macro. This is ruby’s, or rather, C language’s preprocessor idiom. In the case of if(1), there may be a danger of adding an else part. In the case of while(1), it becomes necessary to add a break at the very end.

Also, there is no semicolon added after the while(0).

FIND_ENTRY();

This is so that the semicolon that is normally written at the end of an expression will not go to waste.
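The idiom can be tried with a small macro of our own. TOY_SWAP() and toy_sort2() are hypothetical examples, not taken from st.c. The point is that the macro expands to exactly one statement, so it can sit between an if and an else.

```c
/* A multi-statement macro wrapped in do .. while (0), without the
   final semicolon: the caller supplies it. */
#define TOY_SWAP(a, b) do { \
    int toy_tmp = (a);      \
    (a) = (b);              \
    (b) = toy_tmp;          \
} while (0)

/* Put *x and *y in ascending order. Returns 1 if they were swapped. */
static int toy_sort2(int *x, int *y)
{
    if (*x > *y)
        TOY_SWAP(*x, *y);   /* one statement; the else still pairs with this if */
    else
        return 0;
    return 1;
}
```

Had the macro been a bare { ... } block, the semicolon after TOY_SWAP(*x, *y) would have ended the if statement and the else would have been a syntax error.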

st_add_direct()

Continuingon,letusexaminest_add_direct()whichisafunctionthataddsanewrelationshiptothehashtable.Thisfunctiondoesnotcheckifthekeyisalreadyregistered.Italwaysaddsanewentry.Thisisthemeaningofdirectinthefunctionname.

▼st_add_direct()

308void309st_add_direct(table,key,value)310st_table*table;311char*key;312char*value;313{314unsignedinthash_val,bin_pos;

315316hash_val=do_hash(key,table);317bin_pos=hash_val%table->num_bins;318ADD_DIRECT(table,key,value,hash_val,bin_pos);319}

(st.c)

Just as before, the do_hash() macro that obtains a hash value is called here. After that, the next calculation is the same as at the start of FIND_ENTRY(), which is to exchange the hash value for a real index.

Then the insertion operation seems to be implemented by ADD_DIRECT(). Since the name is all uppercase, we can anticipate that it is a macro.

▼ ADD_DIRECT()

#define ADD_DIRECT(table, key, value, hash_val, bin_pos) \
do { \
    st_table_entry *entry; \
    if (table->num_entries / (table->num_bins) \
                        > ST_DEFAULT_MAX_DENSITY) { \
        rehash(table); \
        bin_pos = hash_val % table->num_bins; \
    } \
    \
    /* (A) */ \
    entry = alloc(st_table_entry); \
    \
    entry->hash = hash_val; \
    entry->key = key; \
    entry->record = value; \
    /* (B) */ \
    entry->next = table->bins[bin_pos]; \
    table->bins[bin_pos] = entry; \
    table->num_entries++; \
} while (0)

(st.c)

The first if is an exception case so I will explain it afterwards.

(A) Allocate and initialize a st_table_entry.

(B) Insert the entry at the start of the list. This is the idiom for handling the list. In other words,

entry->next = list_beg;
list_beg = entry;

makes it possible to insert an entry at the front of the list. This is similar to “cons-ing” in the Lisp language. Check for yourself that even if list_beg is NULL, this code holds true.
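If you would rather check the NULL case by running something, here is a minimal Ruby sketch of the same idiom. Entry, push_front and keys_of are hypothetical names made up for this illustration; they do not exist in st.c, and nil plays the role of NULL.

```ruby
# Hypothetical stand-in for st_table_entry, reduced to what the idiom needs.
Entry = Struct.new(:key, :next)

# Returns the new head of the list after putting `key` in front of it.
def push_front(list_beg, key)
  entry = Entry.new(key, nil)
  entry.next = list_beg   # entry->next = list_beg;
  entry                   # list_beg = entry; (done by the caller)
end

# Collects the keys from the head to the tail of the list.
def keys_of(list_beg)
  keys = []
  entry = list_beg
  while entry
    keys << entry.key
    entry = entry.next
  end
  keys
end

list_beg = nil                        # the empty list, i.e. the NULL case
list_beg = push_front(list_beg, :a)   # works although list_beg was nil
list_beg = push_front(list_beg, :b)
p keys_of(list_beg)
```

The entry added last comes out first, which is exactly what inserting at the front means.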

Now, let me explain the code I left aside.

▼ ADD_DIRECT() - rehash

    if (table->num_entries / (table->num_bins) \
                        > ST_DEFAULT_MAX_DENSITY) { \
        rehash(table); \
        bin_pos = hash_val % table->num_bins; \
    } \

(st.c)

DENSITY is “concentration”. In other words, this conditional checks if the hash table is “crowded” or not. In the st_table, as the number of values that use the same bin_pos increases, the linked list becomes longer. In other words, search becomes slower. That is why, for a given bin count, when the average number of elements per bin becomes too large, the number of bins is increased and the crowding is reduced.

The current ST_DEFAULT_MAX_DENSITY is

▼ ST_DEFAULT_MAX_DENSITY

#define ST_DEFAULT_MAX_DENSITY 5

(st.c)

Because of this setting, once num_entries / num_bins exceeds 5, that is, roughly when every bin_pos holds more than 5 st_table_entry on average, the size will be increased.
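Since the condition uses integer division, the exact threshold is easy to get wrong, so here is a small Ruby sketch of the check alone. MAX_DENSITY and crowded? are hypothetical names for this illustration, and the bin count 11 is just an example value.

```ruby
MAX_DENSITY = 5   # same value as ST_DEFAULT_MAX_DENSITY in the listing

# Mirrors the condition of the if in ADD_DIRECT().
# Integer#/ truncates for non-negative integers, like the C division.
def crowded?(num_entries, num_bins)
  num_entries / num_bins > MAX_DENSITY
end

p crowded?(55, 11)   # exactly 5 per bin on average
p crowded?(65, 11)   # 65 / 11 is still 5 after truncation
p crowded?(66, 11)   # 66 / 11 is 6
```

With 11 bins, the check only becomes true at 66 entries, because the remainder of the division is thrown away.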

st_insert()

st_insert() is nothing more than a combination of st_add_direct() and st_lookup(), so if you understand those two, this will be easy.

▼ st_insert()

int
st_insert(table, key, value)
    register st_table *table;
    register char *key;
    char *value;
{
    unsigned int hash_val, bin_pos;
    register st_table_entry *ptr;

    hash_val = do_hash(key, table);
    FIND_ENTRY(table, ptr, hash_val, bin_pos);

    if (ptr == 0) {
        ADD_DIRECT(table, key, value, hash_val, bin_pos);
        return 0;
    }
    else {
        ptr->record = value;
        return 1;
    }
}

(st.c)

It checks if the element is already registered in the table. Only when it is not registered will a new entry be added; otherwise the record of the existing entry is overwritten. If there is an insertion, it returns 0. If there is no insertion, it returns 1.
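To compare the behaviour of st_insert() and st_add_direct() side by side, the following is a deliberately tiny Ruby model. It is only a sketch under assumptions: TinyTable and its method names are hypothetical, the bin count is fixed, rehash is left out, and Ruby's own hash and == stand in for type->hash and type->compare.

```ruby
# A deliberately tiny model of st_table: fixed bin count, no rehash.
class TinyTable
  Entry = Struct.new(:hash_val, :key, :record, :next)
  attr_reader :num_entries

  def initialize(num_bins = 11)
    @bins = Array.new(num_bins)   # every bin starts as nil (empty list)
    @num_entries = 0
  end

  # Plays the role of FIND_ENTRY(): returns the entry or nil.
  def find_entry(key)
    hash_val = key.hash
    entry = @bins[hash_val % @bins.size]
    entry = entry.next while entry && !(entry.hash_val == hash_val && entry.key == key)
    entry
  end

  # Like st_add_direct(): never checks for an existing key.
  def add_direct(key, value)
    hash_val = key.hash
    bin_pos = hash_val % @bins.size
    @bins[bin_pos] = Entry.new(hash_val, key, value, @bins[bin_pos])
    @num_entries += 1
  end

  # Like st_insert(): 0 when an entry was added, 1 when one was overwritten.
  def insert(key, value)
    entry = find_entry(key)
    if entry.nil?
      add_direct(key, value)
      0
    else
      entry.record = value
      1
    end
  end

  def lookup(key)
    entry = find_entry(key)
    entry && entry.record
  end
end

table = TinyTable.new
p table.insert(:a, 1)
p table.insert(:a, 2)
p table.lookup(:a)
table.add_direct(:a, 3)
p table.num_entries
```

In this model, calling add_direct with a key that is already present leaves two entries for that key in the same list, and a lookup finds the newer one first because it sits at the front.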

ID and Symbols

I’ve already discussed what an ID is. It is a correspondence between an arbitrary string of characters and a value. It is used to declare various names. The actual data type is unsigned int.

From char* to ID

The conversion from string to ID is executed by rb_intern(). This function is rather long, so let’s omit the middle.

▼ rb_intern() (simplified)

static st_table *sym_tbl;       /* char* to ID */
static st_table *sym_rev_tbl;   /* ID to char* */

ID
rb_intern(name)
    const char *name;
{
    const char *m = name;
    ID id;
    int last;

    /* If for a name, there is a corresponding ID that is already
       registered, then return that ID */
    if (st_lookup(sym_tbl, name, &id))
        return id;

    /* omitted ... create a new ID */

    /* register the name and ID relation */
  id_regist:
    name = strdup(name);
    st_add_direct(sym_tbl, name, id);
    st_add_direct(sym_rev_tbl, id, name);
    return id;
}

(parse.y)

The string and ID correspondence relationship can be accomplished by using the st_table. There probably isn’t any especially difficult part here.

What is the omitted section doing? It is treating global variable names and instance variable names as special and flagging them. This is because in the parser, it is necessary to know the variable’s classification from the ID. However, the fundamental part of ID is unrelated to this, so I won’t explain it here.

From ID to char*

The reverse of rb_intern() is rb_id2name(), which takes an ID and generates a char*. You probably know this, but the 2 in id2name is “to”. “To” and “two” have the same pronunciation, so “2” is used for “to”. This naming is often seen.

This function also sets the ID classification flags so it is long. Let me simplify it.

▼ rb_id2name() (simplified)

char *
rb_id2name(id)
    ID id;
{
    char *name;

    if (st_lookup(sym_rev_tbl, id, &name))
        return name;
    return 0;
}

Maybe it seems that it is a little over-simplified, but in reality if we remove the details it really becomes this simple.

The point I want to emphasize is that the found name is not copied. The ruby API does not require (or rather, it forbids) the free()-ing of the return value. Also, when parameters are passed, it always copies them. In other words, the creation and release are completed by one side, either by the user or by ruby.

So then, when creation and release cannot be completed on one side (when a value is passed and not returned), a Ruby object is used. I have not yet discussed it, but a Ruby object is automatically released when it is no longer needed, even if we are not taking care of the object.
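To see the two tables working together, here is a rough Ruby model of the simplified rb_intern() and rb_id2name(). It is a sketch, not the real thing: TinyInterner is a hypothetical class, Ruby Hash objects stand in for the two st_table, and the IDs are simply numbered 1, 2, 3, which is an assumption made for brevity (the omitted section of the real function builds the ID differently).

```ruby
class TinyInterner
  def initialize
    @sym_tbl = {}       # name -> ID, like sym_tbl
    @sym_rev_tbl = {}   # ID -> name, like sym_rev_tbl
    @last_id = 0
  end

  def intern(name)
    id = @sym_tbl[name]
    return id if id               # already registered: return that ID

    id = (@last_id += 1)          # stands in for "create a new ID"
    name = name.dup.freeze        # plays the role of strdup(name)
    @sym_tbl[name] = id
    @sym_rev_tbl[id] = name
    id
  end

  def id2name(id)
    @sym_rev_tbl[id]              # nil when unknown, like returning 0
  end
end

tbl = TinyInterner.new
foo = tbl.intern("foo")
p foo == tbl.intern("foo")
p tbl.id2name(foo)
p tbl.id2name(foo).equal?(tbl.id2name(foo))
```

The last line is there for the point about copying: in this model too, id2name hands out the stored name itself, not a fresh copy.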

Converting VALUE and ID

ID is shown as an instance of the Symbol class at the Ruby level. And it can be obtained like so: "string".intern. The implementation of String#intern is rb_str_intern().

▼ rb_str_intern()

static VALUE
rb_str_intern(str)
    VALUE str;
{
    ID id;

    if (!RSTRING(str)->ptr || RSTRING(str)->len == 0) {
        rb_raise(rb_eArgError, "interning empty string");
    }
    if (strlen(RSTRING(str)->ptr) != RSTRING(str)->len)
        rb_raise(rb_eArgError, "string contains `\\0'");
    id = rb_intern(RSTRING(str)->ptr);
    return ID2SYM(id);
}

(string.c)

This function is quite reasonable as a ruby class library code example. Please pay attention to the part where RSTRING() is used and casted, and where the data structure’s member is accessed.

Let’s read the code. First, rb_raise() is merely error handling so we ignore it for now. The rb_intern() we previously examined is here, and also ID2SYM is here. ID2SYM() is a macro that converts ID to Symbol.

And the reverse operation is accomplished using Symbol#to_s and such. The implementation is in sym_to_s.

▼ sym_to_s()

static VALUE
sym_to_s(sym)
    VALUE sym;
{
    return rb_str_new2(rb_id2name(SYM2ID(sym)));
}

(object.c)

SYM2ID() is the macro that converts Symbol (VALUE) to an ID.

It looks like the function is not doing anything unreasonable. However, it is probably necessary to pay attention to the area around the memory handling. rb_id2name() returns a char* that must not be free()-d. rb_str_new2() copies the parameter’s char* and uses the copy (and does not change the parameter). In this way the policy is consistent, which allows the line to be written just by chaining the functions.
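At the Ruby level the round trip through these two functions can be tried directly. The snippet below is only a quick check that uses String#intern and Symbol#to_s as they exist in current Ruby; details such as the error raised for an empty string belong to the ruby version read in this book and are not exercised here.

```ruby
sym = "string".intern            # String -> Symbol, via rb_str_intern()
p sym
p sym.equal?("string".intern)    # the same name always gives the same Symbol

str = sym.to_s                   # Symbol -> String, via sym_to_s()
p str
p str.intern.equal?(sym)         # and back again
```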

The original work is Copyright © 2002 - 2004 Minero AOKI. Translated by Vincent ISAMBART and Clifford Escobar CAOILE. This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 2.5 License

Ruby Hacking Guide

Translated by Vincent ISAMBART

Chapter 4: Classes and modules

In this chapter, we’ll see the details of the data structures created by classes and modules.

Classes and methods definition

First, I’d like to have a look at how Ruby classes are defined at the C level. This chapter investigates almost only particular cases, so I’d like you to know first the way used most often.

The main API to define classes and modules consists of the following 6 functions:

rb_define_class()

rb_define_class_under()

rb_define_module()

rb_define_module_under()

rb_define_method()

rb_define_singleton_method()

There are a few other versions of these functions, but the extension libraries and even most of the core library are defined using just this API. I’ll introduce these functions to you one by one.

Class definition

rb_define_class() defines a class at the top-level. Let’s take the Ruby array class, Array, as an example.

▼ Array class definition

VALUE rb_cArray;

void
Init_Array()
{
    rb_cArray = rb_define_class("Array", rb_cObject);

(array.c)

rb_cObject and rb_cArray correspond respectively to Object and Array at the Ruby level. The added prefix rb shows that it belongs to ruby and the c that it is a class object. These naming rules are used everywhere in ruby.

This call to rb_define_class() defines a class called Array, which inherits from Object. At the same time as rb_define_class() creates the class object, it also defines the constant. That means that after this you can already access Array from a Ruby program. It corresponds to the following Ruby program:

class Array < Object

I’d like you to note the fact that there is no end. It was written like this on purpose. It is because with rb_define_class() the body of the class has not been executed.
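The two things rb_define_class() does, creating the class object and defining the constant, can be imitated as two separate steps at the Ruby level. This is only an analogy written with Class.new and Module#const_set, not what the C function literally calls, and Sample is a hypothetical class name.

```ruby
klass = Class.new(Object)           # step 1: create the class object
Object.const_set(:Sample, klass)    # step 2: define the top-level constant

p Sample.equal?(klass)              # the constant refers to that very object
p Sample.superclass
p Sample.instance_methods(false)    # no class body has been executed
```

As with the class statement without end, the class exists and is reachable by its name, but nothing has been defined in it yet.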

Nested class definition

After that, there’s rb_define_class_under(). This function defines a class nested in another class or module. This time the example is what is returned by stat(2), File::Stat.

▼ Definition of File::Stat

VALUE rb_cFile;
static VALUE rb_cStat;

    rb_cFile = rb_define_class("File", rb_cIO);
    rb_cStat = rb_define_class_under(rb_cFile, "Stat", rb_cObject);

(file.c)

This code corresponds to the following Ruby program:

class File < IO
  class Stat < Object

This time again I omitted the end on purpose.

Module definition

rb_define_module() is simple so let’s end this quickly.

▼ Definition of Enumerable

VALUE rb_mEnumerable;

    rb_mEnumerable = rb_define_module("Enumerable");

(enum.c)

The m in the beginning of rb_mEnumerable is similar to the c for classes: it shows that it is a module. The corresponding Ruby program is:

module Enumerable

rb_define_module_under() is not used much so we’ll skip it.

Method definition

This time the function is the one for defining methods, rb_define_method(). It’s used very often. We’ll take once again an example from Array.

▼ Definition of Array#to_s

    rb_define_method(rb_cArray, "to_s", rb_ary_to_s, 0);

(array.c)

With this the to_s method is defined in Array. The method body is given by a function pointer (rb_ary_to_s). The fourth parameter is the number of parameters taken by the method. As to_s does not take any parameters, it’s 0. If we write the corresponding Ruby program, we’ll have this:

class Array < Object
  def to_s
    # content of rb_ary_to_s()
  end
end

Of course the class part is not included in rb_define_method() and only the def part is accurate. But if there is no class part, it will look like the method is defined like a function, so I also wrote the enclosing class part.

One more example, this time taking a parameter:

▼ Definition of Array#concat

    rb_define_method(rb_cArray, "concat", rb_ary_concat, 1);

(array.c)

The class for the definition is rb_cArray (Array), the method name is concat, its body is rb_ary_concat() and the number of parameters is 1. It corresponds to the following Ruby program:

class Array < Object
  def concat(str)
    # content of rb_ary_concat()
  end
end

Singleton methods definition

We can define methods that are specific to a single object instance. They are called singleton methods. As I used File.unlink as an example in chapter 1 “Ruby language minimum”, I first wanted to show it here, but for a particular reason we’ll look at File.link instead.

▼ Definition of File.link

    rb_define_singleton_method(rb_cFile, "link", rb_file_s_link, 2);

(file.c)

It’s used like rb_define_method(). The only difference is that here the first parameter is just the “object” where the method is defined. In this case, it’s defined in rb_cFile.
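For comparison with the earlier examples, this is roughly what such a definition looks like when written in Ruby. To avoid touching the real File class, the sketch uses a hypothetical class Holder, and the method body is made up; only the shape (a method with 2 parameters defined on one object) follows the C call above.

```ruby
class Holder; end   # hypothetical class standing in for File

# Roughly what rb_define_singleton_method(klass, "link", func, 2) sets up:
# a method with two parameters that lives on this one object only.
def Holder.link(from, to)
  "#{from} -> #{to}"
end

p Holder.link("a", "b")
p Holder.singleton_methods
p Holder.new.respond_to?(:link)   # instances of Holder do not get it
```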

Entry point

Being able to make definitions like before is great, but where are these functions called from, and by what means are they executed? These definitions are grouped in functions named Init_xxxx(). For instance, for Array a function Init_Array() like this has been made:

▼ Init_Array

void
Init_Array()
{
    rb_cArray = rb_define_class("Array", rb_cObject);
    rb_include_module(rb_cArray, rb_mEnumerable);

    rb_define_singleton_method(rb_cArray, "allocate", rb_ary_s_alloc, 0);
    rb_define_singleton_method(rb_cArray, "[]", rb_ary_s_create, -1);
    rb_define_method(rb_cArray, "initialize", rb_ary_initialize, -1);
    rb_define_method(rb_cArray, "to_s", rb_ary_to_s, 0);
    rb_define_method(rb_cArray, "inspect", rb_ary_inspect, 0);
    rb_define_method(rb_cArray, "to_a", rb_ary_to_a, 0);
    rb_define_method(rb_cArray, "to_ary", rb_ary_to_a, 0);
    rb_define_method(rb_cArray, "frozen?", rb_ary_frozen_p, 0);

(array.c)

The Init functions of the built-in libraries are explicitly called during the startup of ruby. This is done in inits.c.

▼ rb_call_inits()

void
rb_call_inits()
{
    Init_sym();
    Init_var_tables();
    Init_Object();
    Init_Comparable();
    Init_Enumerable();
    Init_Precision();
    Init_eval();
    Init_String();
    Init_Exception();
    Init_Thread();
    Init_Numeric();
    Init_Bignum();
    Init_Array();

(inits.c)

This way, Init_Array() is called properly.

That explains it for the built-in libraries, but what about extension libraries? In fact, for extension libraries the convention is the same. Take the following code:

require "myextension"

With this, if the loaded extension library is myextension.so, at load time, the (extern) function named Init_myextension() is called. How they are called is beyond the scope of this chapter. For that, you should read chapter 18, “Load”. Here we’ll just end this with an example of Init.

The following example is from stringio, an extension library provided with ruby, that is to say not from a built-in library.

▼ Init_stringio() (beginning)

void
Init_stringio()
{
    VALUE StringIO = rb_define_class("StringIO", rb_cData);
    rb_define_singleton_method(StringIO, "allocate", strio_s_allocate, 0);
    rb_define_singleton_method(StringIO, "open", strio_s_open, -1);
    rb_define_method(StringIO, "initialize", strio_initialize, -1);
    rb_enable_super(StringIO, "initialize");
    rb_define_method(StringIO, "become", strio_become, 1);
    rb_define_method(StringIO, "reopen", strio_reopen, -1);

(ext/stringio/stringio.c)

Singleton classes

rb_define_singleton_method()

You should now be able to more or less understand how normal methods are defined. Somehow making the body of the method, then registering it in m_tbl will do. But what about singleton methods? We’ll now look into the way singleton methods are defined.

▼ rb_define_singleton_method()

void
rb_define_singleton_method(obj, name, func, argc)
    VALUE obj;
    const char *name;
    VALUE (*func)();
    int argc;
{
    rb_define_method(rb_singleton_class(obj), name, func, argc);
}

(class.c)

As I explained, rb_define_method() is a function used to define normal methods, so the difference from normal methods is only rb_singleton_class(). But what on earth are singleton classes?

In brief, singleton classes are virtual classes that are only used to execute singleton methods. Singleton methods are functions defined in singleton classes. Classes themselves are in the first place (in a way) the “implementation” to link objects and methods, but singleton classes are even more on the implementation side. In the Ruby language way, they are not formally included, and don’t appear much at the Ruby level.

rb_singleton_class()

Well, let’s confirm what the singleton classes are made of. It’s too simple to just show you the code of a function each time so this time I’ll use a new weapon, a call graph.

rb_define_singleton_method
    rb_define_method
    rb_singleton_class
        SPECIAL_SINGLETON
        rb_make_metaclass
            rb_class_boot
            rb_singleton_class_attached

Call graphs are graphs showing calling relationships among functions (or more generally procedures). The call graphs showing all the calls written in the source code are called static call graphs. The ones expressing only the calls done during an execution are called dynamic call graphs.

This diagram is a static call graph and the indentation expresses which function calls which one. For instance, rb_define_singleton_method() calls rb_define_method() and rb_singleton_class(). And this rb_singleton_class() itself calls SPECIAL_SINGLETON() and rb_make_metaclass(). In order to obtain call graphs, you can use cflow and such. {cflow: see also doc/callgraph.html in the attached CD-ROM}

In this book, because I wanted to obtain call graphs that contain only functions, I created a ruby-specific tool by myself. Perhaps it can be generalized by modifying its code analyzing part, thus I’d like to somehow make it until around the publication of this book. These situations are also explained in doc/callgraph.html of the attached CD-ROM.

Let’s go back to the code. When looking at the call graph, you can see that the calls made by rb_singleton_class() go very deep. Until now all call levels were shallow, so we could simply look at the functions without getting too lost. But at this depth, I easily forget what I was doing. In such a situation you should keep a call graph at hand to stay aware of where you are while reading. This time, as an example, we’ll decode the procedures below rb_singleton_class() in parallel. We should look out for the following two points:

What exactly are singleton classes?

What is the purpose of singleton classes?

Normal classes and singleton classes

Singleton classes are special classes: they’re basically the same as normal classes, but there are a few differences. We can say that finding these differences is explaining concretely singleton classes.

What should we do to find them? We should find the differences between the function creating normal classes and the one creating singleton classes. For this, we have to find the function for creating normal classes. As normal classes can be defined by rb_define_class(), it must call in one way or another a function to create normal classes. For the moment, we’ll not look at the content of rb_define_class() itself. I have some reasons to be interested in something that’s deeper. That’s why we will first look at the call graph of rb_define_class().

rb_define_class
    rb_class_inherited
    rb_define_class_id
        rb_class_new
            rb_class_boot
        rb_make_metaclass
            rb_class_boot
            rb_singleton_class_attached

I’m interested in rb_class_new(). Doesn’t this name mean it creates a new class? Let’s confirm that.

▼ rb_class_new()

VALUE
rb_class_new(super)
    VALUE super;
{
    Check_Type(super, T_CLASS);
    if (super == rb_cClass) {
        rb_raise(rb_eTypeError, "can't make subclass of Class");
    }
    if (FL_TEST(super, FL_SINGLETON)) {
        rb_raise(rb_eTypeError, "can't make subclass of virtual class");
    }
    return rb_class_boot(super);
}

(class.c)

Check_Type() checks the type of the object structure, so we can ignore it. rb_raise() is error handling so we can ignore it. Only rb_class_boot() remains. So let’s look at it.

▼ rb_class_boot()

VALUE
rb_class_boot(super)
    VALUE super;
{
    NEWOBJ(klass, struct RClass);        /* allocates struct RClass */
    OBJSETUP(klass, rb_cClass, T_CLASS); /* initialization of the RBasic part */

    klass->super = super;       /* (A) */
    klass->iv_tbl = 0;
    klass->m_tbl = 0;
    klass->m_tbl = st_init_numtable();

    OBJ_INFECT(klass, super);
    return (VALUE)klass;
}

(class.c)

NEWOBJ() and OBJSETUP() are fixed expressions used when creating Ruby objects that possess one of the built-in structure types (struct Rxxxx). They are both macros. In NEWOBJ(), struct RClass is created and the pointer is put in its first parameter klass. In OBJSETUP(), the struct RBasic member of the RClass (and thus basic.klass and basic.flags) is initialized.

OBJ_INFECT() is a macro related to security. From now on, we’ll ignore it.

At (A), the super member of klass is set to the super parameter. It looks like rb_class_boot() is a function that creates a class inheriting from super.

So rb_class_boot() is a function that creates a class, and rb_class_new() is almost identical to it.

Then, let’s once more look at rb_singleton_class()’s call graph:

rb_singleton_class
    SPECIAL_SINGLETON
    rb_make_metaclass
        rb_class_boot
        rb_singleton_class_attached

Here also rb_class_boot() is called. So up to that point, it’s the same as in normal classes. What’s going on after is what’s different between normal classes and singleton classes, in other words the characteristics of singleton classes. If everything’s clear so far, we just need to read rb_singleton_class() and rb_make_metaclass().

Compressed rb_singleton_class()

rb_singleton_class() is a little long so we’ll first remove its non-essential parts.

▼ rb_singleton_class()

#define SPECIAL_SINGLETON(x,c) do {\
    if (obj == (x)) {\
        return c;\
    }\
} while (0)

VALUE
rb_singleton_class(obj)
    VALUE obj;
{
    VALUE klass;

    if (FIXNUM_P(obj) || SYMBOL_P(obj)) {
        rb_raise(rb_eTypeError, "can't define singleton");
    }
    if (rb_special_const_p(obj)) {
        SPECIAL_SINGLETON(Qnil, rb_cNilClass);
        SPECIAL_SINGLETON(Qfalse, rb_cFalseClass);
        SPECIAL_SINGLETON(Qtrue, rb_cTrueClass);
        rb_bug("unknown immediate %ld", obj);
    }

    DEFER_INTS;
    if (FL_TEST(RBASIC(obj)->klass, FL_SINGLETON) &&
        (BUILTIN_TYPE(obj) == T_CLASS ||
         rb_iv_get(RBASIC(obj)->klass, "__attached__") == obj)) {
        klass = RBASIC(obj)->klass;
    }
    else {
        klass = rb_make_metaclass(obj, RBASIC(obj)->klass);
    }
    if (OBJ_TAINTED(obj)) {
        OBJ_TAINT(klass);
    }
    else {
        FL_UNSET(klass, FL_TAINT);
    }
    if (OBJ_FROZEN(obj)) OBJ_FREEZE(klass);
    ALLOW_INTS;

    return klass;
}

(class.c)

The first and the second half are separated by a blank line. The first half handles special cases and the second half handles the general case. In other words, the second half is the trunk of the function. That’s why we’ll keep it for later and talk about the first half.

Everything that is handled in the first half is a non-pointer VALUE, which means that its object struct does not exist. First, Fixnum and Symbol are explicitly picked. Then, rb_special_const_p() is a function that returns true for non-pointer VALUEs, so there only Qtrue, Qfalse and Qnil should get caught. Other than that, there is no valid non-pointer VALUE, so it would be reported as a bug with rb_bug().

DEFER_INTS() and ALLOW_INTS() both end with the same INTS so you should see a pair in them. That’s the case, and they are macros related to signals. Because they are defined in rubysig.h, you can guess that INTS is the abbreviation of interrupts. You can ignore them.

Compressed rb_make_metaclass()

▼ rb_make_metaclass()

VALUE
rb_make_metaclass(obj, super)
    VALUE obj, super;
{
    VALUE klass = rb_class_boot(super);
    FL_SET(klass, FL_SINGLETON);
    RBASIC(obj)->klass = klass;
    rb_singleton_class_attached(klass, obj);
    if (BUILTIN_TYPE(obj) == T_CLASS) {
        RBASIC(klass)->klass = klass;
        if (FL_TEST(obj, FL_SINGLETON)) {
            RCLASS(klass)->super =
                RBASIC(rb_class_real(RCLASS(obj)->super))->klass;
        }
    }

    return klass;
}

(class.c)

We already saw rb_class_boot(). It creates a (normal) class using the super parameter as its superclass. After that, the FL_SINGLETON of this class is set. This is clearly suspicious. The name of the flag makes us think that it is the indication of a singleton class.

What are singleton classes?

Finishing the above process, furthermore, we’ll throw away the declarations because parameters, return values and local variables are all VALUE. That lets us compress the code to the following:

▼ rb_singleton_class() rb_make_metaclass() (after compression)

rb_singleton_class(obj)
{
    if (FL_TEST(RBASIC(obj)->klass, FL_SINGLETON) &&
        (BUILTIN_TYPE(obj) == T_CLASS || BUILTIN_TYPE(obj) == T_MODULE) &&
        rb_iv_get(RBASIC(obj)->klass, "__attached__") == obj) {
        klass = RBASIC(obj)->klass;
    }
    else {
        klass = rb_make_metaclass(obj, RBASIC(obj)->klass);
    }
    return klass;
}

rb_make_metaclass(obj, super)
{
    klass = create a class with super as superclass;
    FL_SET(klass, FL_SINGLETON);
    RBASIC(obj)->klass = klass;
    rb_singleton_class_attached(klass, obj);
    if (BUILTIN_TYPE(obj) == T_CLASS) {
        RBASIC(klass)->klass = klass;
        if (FL_TEST(obj, FL_SINGLETON)) {
            RCLASS(klass)->super =
                RBASIC(rb_class_real(RCLASS(obj)->super))->klass;
        }
    }

    return klass;
}

The condition of the if statement of rb_singleton_class() seems quite complicated. However, this condition is not connected to rb_make_metaclass(), which is the mainstream, so we’ll see it later. Let’s first think about what happens on the false branch of the if.

The BUILTIN_TYPE() of rb_make_metaclass() is similar to TYPE() as it is a macro to get the structure type flag (T_xxxx). That means this check in rb_make_metaclass means “if obj is a class”. For the moment we assume that obj is not a class, so we’ll remove it.

With these simplifications, we get the following:

▼ rb_singleton_class() rb_make_metaclass() (after recompression)

rb_singleton_class(obj)
{
    klass = create a class with RBASIC(obj)->klass as superclass;
    FL_SET(klass, FL_SINGLETON);
    RBASIC(obj)->klass = klass;
    return klass;
}

But there is still a quite hard to understand side to it. That’s because klass is used too often. So let’s rename the klass variable to sclass.

▼ rb_singleton_class() rb_make_metaclass() (variable substitution)

rb_singleton_class(obj)
{
    sclass = create a class with RBASIC(obj)->klass as superclass;
    FL_SET(sclass, FL_SINGLETON);
    RBASIC(obj)->klass = sclass;
    return sclass;
}

Now it should be very easy to understand. To make it even simpler, I’ve represented what is done with a diagram (figure 1). In the horizontal direction is the “instance – class” relation, and in the vertical direction is inheritance (the superclasses are above).

Figure 1: rb_singleton_class

When comparing the first and last part of this diagram, you can understand that sclass is inserted without changing the structure. That’s all there is to singleton classes. In other words the inheritance is increased one step. By defining methods there, we can define methods which have completely nothing to do with other instances of klass.
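The inserted class can also be observed from a Ruby program. The following sketch assumes a current Ruby, where Object#singleton_class is available; Item and shout are hypothetical names used only here.

```ruby
class Item; end           # hypothetical class used only for this sketch

a = Item.new
b = Item.new

def a.shout               # defined in the singleton class of a
  "only a"
end

sclass = a.singleton_class
p sclass.superclass       # the original class sits one step above
p a.class                 # the class seen at the Ruby level is unchanged
p a.shout
p b.respond_to?(:shout)   # the other instance of Item is not affected
```

The singleton class sits between a and Item, so the method defined there is found for a but has nothing to do with b.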

Singleton classes and instances

By the way, did you notice that, during the compression process, the call to rb_singleton_class_attached() was stealthily removed? Here:

rb_make_metaclass(obj, super)
{
    klass = create a class with super as superclass;
    FL_SET(klass, FL_SINGLETON);
    RBASIC(obj)->klass = klass;
    rb_singleton_class_attached(klass, obj);   /* THIS */

Let’s have a look at what it does.

▼ rb_singleton_class_attached()

void
rb_singleton_class_attached(klass, obj)
    VALUE klass, obj;
{
    if (FL_TEST(klass, FL_SINGLETON)) {
        if (!RCLASS(klass)->iv_tbl) {
            RCLASS(klass)->iv_tbl = st_init_numtable();
        }
        st_insert(RCLASS(klass)->iv_tbl,
                  rb_intern("__attached__"), obj);
    }
}

(class.c)

If the FL_SINGLETON flag of klass is set… in other words if it’s a singleton class, put the __attached__ → obj relation in the instance variable table of klass (iv_tbl). That’s how it looks (in our case klass is always a singleton class… in other words its FL_SINGLETON flag is always set).

__attached__ does not have the @ prefix, but it’s stored in the instance variables table so it’s still an instance variable. Such an instance variable can never be read at the Ruby level so it can be used to keep values for the system’s exclusive use.

Let’s now think about the relationship between klass and obj. klass is the singleton class of obj. In other words, this “invisible” instance variable allows the singleton class to remember the instance it was created from. Its value is used when the singleton class is changed, notably to call hook methods on the instance (i.e. obj). For example, when a method is added to a singleton class, the obj’s singleton_method_added method is called. There is no logical necessity to doing it, it was done because that’s how it was defined in the language.
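The hook can be watched from Ruby. In this small sketch the names obj, log and hello are made up for the illustration; it only records which names the hook receives, and should not be read as a description of how ruby locates obj internally.

```ruby
obj = Object.new
log = []

# The hook is itself a singleton method of obj.
obj.define_singleton_method(:singleton_method_added) do |name|
  log << name
end

def obj.hello
  "hello"
end

p log
```

The method was added to the singleton class, yet the hook is called on obj, which is only possible if the singleton class can find its way back to the instance.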

But is it really all right? Storing the instance in __attached__ will force one singleton class to have only one attached instance. For example, by getting (in some way or another) the singleton class and calling new on it, won’t a singleton class end up having multiple instances?

This cannot be done because the proper checks are done to prevent the creation of an instance of a singleton class.
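These checks can be provoked from a current Ruby, where the singleton class can be obtained with Object#singleton_class. The helper error_class below is a hypothetical name; it just reports which exception class, if any, the block raised.

```ruby
obj = Object.new
sclass = obj.singleton_class

# Returns the exception class raised by the block, or nil if nothing is raised.
def error_class
  yield
  nil
rescue TypeError => e
  e.class
end

p error_class { sclass.new }          # trying to create a second instance
p error_class { Class.new(sclass) }   # trying to subclass, as checked in rb_class_new()
p error_class { Object.new }          # a normal class, for comparison
```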

Singleton classes are in the first place for singleton methods. Singleton methods are methods existing only on a particular object. If singleton classes could have multiple instances, they would be the same as normal classes. Hence, each singleton class has only one instance… or rather, it must be limited to one.

Summary

We’ve done a lot, maybe made a real mayhem, so let’s finish and put everything in order with a summary.

What are singleton classes? They are classes that have the FL_SINGLETON flag set and that can only have one instance.

What are singleton methods? They are methods defined in the singleton class of an object.

Metaclasses

Inheritance of singleton methods

Infinite chain of classes

Even a class has a class, and it’s Class. And the class of Class is again Class. We find ourselves in an infinite loop (figure 2).

Figure 2: Infinite loop of classes

Up to here it’s something we’ve already gone through. What’s going after that is the theme of this chapter. Why do classes have to make a loop?

First, in Ruby all data are objects. And classes are data in Ruby so they have to be objects.

As they are objects, they must answer to methods. And setting the rule “to answer to methods you must belong to a class” made processing easier. That’s where comes the need for a class to also have a class.

Let’s base ourselves on this and think about the way to implement it. First, we can try with the most naïve way: Class’s class is ClassClass, ClassClass’s class is ClassClassClass…, chaining classes of classes one by one. But whichever way you look at it, this can’t be implemented effectively. That’s why it’s common in object oriented languages where classes are objects that Class’s class is Class itself, creating an endless virtual instance-class relationship.

((errata: This structure is implemented efficiently in recent Ruby 1.8, thus it can be implemented efficiently.))

I’m repeating myself, but the fact that Class’s class is Class is only to make the implementation easier, there’s nothing important in this logic.

“Class is also an object”

“Everything is an object” is often used as advertising statement when speaking about Ruby. And as a part of that, “Classes are also objects!” also appears. But these expressions often go too far. When thinking about these sayings, we have to split them in two:

all data are objects

classes are data

Talking about data or code makes a discussion much harder to understand. That’s why here we’ll restrict the meaning of “data” to “what can be put in variables in programs”.

Being able to manipulate classes from programs gives programs the ability to manipulate themselves. This is called reflection. In Ruby, which is an object oriented language and furthermore has classes, it is equivalent to being able to directly manipulate classes.

Nevertheless, there’s also a way in which classes are not objects. For example, there’s no problem in providing a feature to manipulate classes as function-style methods (functions defined at the top-level). However, as inside the interpreter there are data structures to represent the classes, it’s more natural in object oriented languages to make them available directly. And Ruby made this choice.

Furthermore, an objective in Ruby is for all data to be objects. That’s why it’s appropriate to make them objects.

By the way, there is also a reason not linked to reflection why in Ruby classes had to be made objects. That is to prepare the place to define methods which are independent from instances (what are called static methods in Java and C++).

And to implement static methods, another thing was necessary: singleton methods. By chain reaction, that also makes singleton classes necessary. Figure 3 shows these dependency relationships.

Figure 3: Requirements dependencies

Class methods inheritance

In Ruby, singleton methods defined in a class are called class methods. However, their specification is a little strange. For some reasons, class methods are inheritable.

class A
  def A.test    # defines a singleton method in A
    puts("ok")
  end
end

class B < A
end

B.test()  # calls it

This can’t occur with singleton methods from objects that are not classes. In other words, classes are the only ones handled specially. In the following section we’ll see how class methods are inherited.

Singleton class of a class

Assuming that class methods are inherited, where is this operation done? It must be done either at class definition (creation) or at singleton method definition. Then let’s first look at the code defining classes.

Class definition means of course rb_define_class(). Now let’s take the call graph of this function.

rb_define_class
    rb_class_inherited
    rb_define_class_id
        rb_class_new
            rb_class_boot
        rb_make_metaclass
            rb_class_boot
            rb_singleton_class_attached

If you’re wondering where you’ve seen it before, we looked at it in the previous section. At that time you did not see it but if you look closely, somehow rb_make_metaclass() appeared. As we saw before, this function introduces a singleton class. This is very suspicious. Why is this called even if we are not defining a singleton method? Furthermore, why is the lower level rb_make_metaclass() used instead of rb_singleton_class()? It looks like we have to check these surroundings again.

rb_define_class_id()

Let’s first start our reading with its caller, rb_define_class_id().

▼ rb_define_class_id()

VALUE
rb_define_class_id(id, super)
    ID id;
    VALUE super;
{
    VALUE klass;

    if (!super) super = rb_cObject;
    klass = rb_class_new(super);
    rb_name_class(klass, id);
    rb_make_metaclass(klass, RBASIC(super)->klass);

    return klass;
}

(class.c)

rb_class_new() was a function that creates a class with super as its superclass. rb_name_class()’s name means it names a class, but for the moment we do not care about names so we’ll skip it. After that there’s the rb_make_metaclass() in question. I’m concerned by the fact that when called from rb_singleton_class(), the parameters were different. Last time was like this:

rb_make_metaclass(obj, RBASIC(obj)->klass);

But this time is like this:

rb_make_metaclass(klass, RBASIC(super)->klass);

So as you can see it’s slightly different. How do the results change depending on that? Let’s have once again a look at a simplified rb_make_metaclass().

rb_make_metaclass (once more)

▼ rb_make_metaclass (after first compression)

rb_make_metaclass(obj, super)
{
    klass = create a class with super as superclass;
    FL_SET(klass, FL_SINGLETON);
    RBASIC(obj)->klass = klass;
    rb_singleton_class_attached(klass, obj);
    if (BUILTIN_TYPE(obj) == T_CLASS) {
        RBASIC(klass)->klass = klass;
        if (FL_TEST(obj, FL_SINGLETON)) {
            RCLASS(klass)->super =
                RBASIC(rb_class_real(RCLASS(obj)->super))->klass;
        }
    }

    return klass;
}

Last time, the if statement was wholly skipped, but looking once again, something is done only for T_CLASS, in other words classes. This clearly looks important. In rb_define_class_id(), it’s called like this:

rb_make_metaclass(klass, RBASIC(super)->klass);

Let’s expand rb_make_metaclass()’s parameter variables with the actual values.

▼ rb_make_metaclass (recompression)

rb_make_metaclass(klass, super_klass /* == RBASIC(super)->klass */)
{
    sclass = create a class with super_klass as superclass;
    RBASIC(klass)->klass = sclass;
    RBASIC(sclass)->klass = sclass;
    return sclass;
}

Drawn as a diagram, this gives something like figure 4. In it, the names in parentheses are singleton classes. This notation is used often in this book, so I'd like you to remember it: obj's singleton class is written as (obj), and (klass) is the singleton class of klass. It looks as if the singleton class is caught between a class and the class of that class's superclass.

Figure 4: Introduction of a class's singleton class

Extending our imagination a little further from this result, we can guess that the superclass's class (the c in figure 4) must itself be a singleton class. You'll understand with one more level of inheritance (figure 5).

Figure 5: Hierarchy of multi-level inheritance

As the relationship between super and klass is the same as the one between klass and klass2, c must be the singleton class (super). If you continue like this, you finally arrive at the conclusion that Object's class must be (Object). And that is the case in practice. For example, by inheriting as in the following program:

class A < Object
end
class B < A
end

internally, a structure like figure 6 is created.

Figure 6: Class hierarchy and metaclasses

As classes and their metaclasses are linked and inherit like this, class methods are inherited.
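The parallel hierarchy of classes and metaclasses can be observed from Ruby itself. The following is a minimal sketch, assuming a recent interpreter: `singleton_class` does not exist in the ruby version analysed in this book, and the class names A and B are only illustrative.

```ruby
# Class methods live in the singleton class (metaclass) of a class.
class A
  def self.greet
    "hello from #{name}"
  end
end

class B < A
end

# B can call A's class method because (B) inherits from (A).
puts B.greet
puts B.singleton_class.superclass == A.singleton_class
puts A.singleton_class.superclass == Object.singleton_class
```

The last two lines check the links drawn in figure 6: the superclass of (B) is (A), and the superclass of (A) is (Object).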

Class of a class of a class

You've understood how the inheritance of class methods works, but in exchange some new questions have appeared. What is the class of a class's singleton class? We can check this with a debugger. I made figure 7 from the results of this investigation.

Figure 7: Class of a class's singleton class

A class's singleton class sets itself as its own class. Quite complicated.

The second question: the class of Object must be Class. Didn't I confirm this properly in chapter 1: Ruby language minimum by using the class() method?

p(Object.class())   # Class

Certainly, that's the case "at the Ruby level". But "at the C level", it's the singleton class (Object). If (Object) does not appear at the Ruby level, it's because Object#class skips singleton classes. Let's look at the body of the method, rb_obj_class(), to confirm that.

▼ rb_obj_class()

VALUE
rb_obj_class(obj)
    VALUE obj;
{
    return rb_class_real(CLASS_OF(obj));
}

VALUE
rb_class_real(cl)
    VALUE cl;
{
    while (FL_TEST(cl, FL_SINGLETON) || TYPE(cl) == T_ICLASS) {
        cl = RCLASS(cl)->super;
    }
    return cl;
}

(object.c)

CLASS_OF(obj) returns the basic.klass of obj, while rb_class_real() skips all singleton classes by advancing towards the superclass. In the first place, a singleton class is caught between a class and its superclass, like a proxy. That's why, when a "real" class is needed, we have to follow the superclass chain (figure 8).

T_ICLASS will appear later, when we talk about include.

Figure 8: Singleton class and real class
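The effect of rb_class_real() is visible at the Ruby level. Here is a small sketch, assuming a recent interpreter where `singleton_class` is available (it is not part of the ruby version discussed here); the object and method names are made up for illustration.

```ruby
obj = Object.new
def obj.hello        # defining a singleton method creates (obj)
  "hi"
end

real      = obj.class            # the "real" class, singleton classes skipped
singleton = obj.singleton_class  # the class actually attached to obj

puts real                          # Object
puts singleton == real             # false
puts singleton.superclass == real  # true: one step up the super chain
puts Object.class                  # Class, not (Object)
```

Object#class reports Object even though obj's immediate class is (obj), which is what the loop in rb_class_real() would lead us to expect.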

Singleton class and metaclass

Well, the singleton class introduced for a class is also a kind of class: it is the class of a class. So it can be called a metaclass.

However, you should be wary of the fact that being a singleton class does not mean being a metaclass. The singleton classes introduced for classes are metaclasses. The important fact is not that they are singleton classes, but that they are the classes of classes. I was stuck on this point when I started learning Ruby. As I may not be the only one, I would like to make this clear.

Thinking about this, the name of the rb_make_metaclass() function is not very good. When used for a class it does indeed create a metaclass, but when used for other objects, the created class is not a metaclass.

Finally, even if you understand that some classes are metaclasses, there is no concrete gain from it. I'd like you not to care too much about it.

Bootstrap

We have nearly finished our talk about classes and metaclasses, but there is still one problem left. It concerns the three metaobjects Object, Module and Class. These three cannot be created with the common-use API. To make a class, its metaclass must be built, but as we saw a while ago, the metaclass's superclass is Class. However, as Class has not been created yet, the metaclass cannot be built. So in ruby, only the creation of these three classes is handled specially.

Then let's look at the code:

▼ Object, Module and Class creation

rb_cObject = boot_defclass("Object", 0);
rb_cModule = boot_defclass("Module", rb_cObject);
rb_cClass  = boot_defclass("Class",  rb_cModule);

metaclass = rb_make_metaclass(rb_cObject, rb_cClass);
metaclass = rb_make_metaclass(rb_cModule, metaclass);
metaclass = rb_make_metaclass(rb_cClass, metaclass);

(object.c)

First, in the first half, boot_defclass() is similar to rb_class_boot(): it just creates a class with the given superclass set. These links give us something like the left part of figure 9.

Then, in the three lines of the second half, (Object), (Module) and (Class) are created and set (right part of figure 9). The classes of (Object) and (Module), that is, themselves, are already set in rb_make_metaclass(), so there is no problem. With this, the bootstrap of the metaobjects is finished.

Figure 9: Metaobjects creation

Taking everything into account, we get the final shape shown in figure 10.

Figure 10: Ruby metaobjects

Class names

In this section, we will analyse how the reciprocal conversion between classes and class names, in other words constants, is built. Concretely, we will target rb_define_class() and rb_define_class_under().

Name → class

First we'll read rb_define_class(). After this function finishes, the class can be found from the constant.

▼ rb_define_class()

VALUE
rb_define_class(name, super)
    const char *name;
    VALUE super;
{
    VALUE klass;
    ID id;

    id = rb_intern(name);
    if (rb_autoload_defined(id)) {             /* (A) autoload */
        rb_autoload_load(id);
    }
    if (rb_const_defined(rb_cObject, id)) {    /* (B) rb_const_defined */
        klass = rb_const_get(rb_cObject, id);  /* (C) rb_const_get */
        if (TYPE(klass) != T_CLASS) {
            rb_raise(rb_eTypeError, "%s is not a class", name);
        }                                      /* (D) rb_class_real */
        if (rb_class_real(RCLASS(klass)->super) != super) {
            rb_name_error(id, "%s is already defined", name);
        }
        return klass;
    }
    if (!super) {
        rb_warn("no super class for '%s', Object assumed", name);
    }
    klass = rb_define_class_id(id, super);
    rb_class_inherited(super, klass);
    st_add_direct(rb_class_tbl, id, klass);

    return klass;
}

(class.c)

This can be clearly divided into two parts: before and after rb_define_class_id(). The former acquires or creates the class; the latter assigns it to the constant. We will look at it in more detail below.

(A) In Ruby, there is a feature named autoload that automatically loads libraries when certain constants are accessed. The functions named rb_autoload_xxxx() perform its checks. You can ignore them without any problem.

(B) We determine whether the constant name has been defined in Object.

(C) Get the value of the constant name. This will be explained in detail in chapter 6.

(D) We saw rb_class_real() a while ago. If the class cl is a singleton class or an ICLASS, it climbs the super hierarchy up to a class that is neither and returns it. In short, this function skips the virtual classes that should not appear at the Ruby level.

That's what we can read from this part.

As constants are involved here, it is quite troublesome. However, the chapter about constants is probably not the right place to talk about class definitions either, which is why the description here stops halfway.

Moreover, about what comes after rb_define_class_id(),

st_add_direct(rb_class_tbl, id, klass);

This part assigns the class to the constant. However, whichever way you look at it, it does not appear to do so. In fact, top-level classes and modules that are defined in C are kept separate from the other constants and grouped in rb_class_tbl. The split is slightly related to the GC. It's not essential.

Class → name

We understood how the class can be obtained from the class name, but how do we do the opposite? By calling p or Class#name we can get the name of a class, but how is that implemented?

In fact this is done by rb_name_class(), which already appeared a long time ago. The call graph is roughly the following:

rb_define_class
    rb_define_class_id
        rb_name_class

Let's look at its content:

▼ rb_name_class()

void
rb_name_class(klass, id)
    VALUE klass;
    ID id;
{
    rb_iv_set(klass, "__classid__", ID2SYM(id));
}

(variable.c)

__classid__ is another instance variable that can't be seen from Ruby. As only VALUEs can be put in the instance variable table, the ID is converted to a Symbol using ID2SYM().

That's how we are able to find the constant name from the class.

Nested classes

So, for classes defined at the top level, we know how the reciprocal link between name and class works. What's left is the case of classes defined inside modules or other classes, and that is a little more complicated. The function that defines these nested classes is rb_define_class_under().

▼ rb_define_class_under()

VALUE
rb_define_class_under(outer, name, super)
    VALUE outer;
    const char *name;
    VALUE super;
{
    VALUE klass;
    ID id;

    id = rb_intern(name);
    if (rb_const_defined_at(outer, id)) {
        klass = rb_const_get(outer, id);
        if (TYPE(klass) != T_CLASS) {
            rb_raise(rb_eTypeError, "%s is not a class", name);
        }
        if (rb_class_real(RCLASS(klass)->super) != super) {
            rb_name_error(id, "%s is already defined", name);
        }
        return klass;
    }
    if (!super) {
        rb_warn("no super class for '%s::%s', Object assumed",
                rb_class2name(outer), name);
    }
    klass = rb_define_class_id(id, super);
    rb_set_class_path(klass, outer, name);
    rb_class_inherited(super, klass);
    rb_const_set(outer, id, klass);

    return klass;
}

(class.c)

The structure is like that of rb_define_class(): before the call to rb_define_class_id() comes the redefinition check, and after it the creation of the reciprocal link between constant and class. The first half is boringly similar to rb_define_class(), so we'll skip it. In the second half, rb_set_class_path() is new. We're going to look at it.

rb_set_class_path()

This function gives the name name to the class klass nested in the class under. "Class path" means a constant name including all the nesting information starting from the top level, for example "Net::NetPrivate::Socket".

▼ rb_set_class_path()

void
rb_set_class_path(klass, under, name)
    VALUE klass, under;
    const char *name;
{
    VALUE str;

    if (under == rb_cObject) {
        /* defined at top-level */
        str = rb_str_new2(name);    /* create a Ruby string from name */
    }
    else {
        /* nested constant */
        str = rb_str_dup(rb_class_path(under));  /* copy the return value */
        rb_str_cat2(str, "::");     /* concatenate "::" */
        rb_str_cat2(str, name);     /* concatenate name */
    }
    rb_iv_set(klass, "__classpath__", str);
}

(variable.c)

Everything except the last line is the construction of the class path, and the last line makes the class remember its own name. __classpath__ is of course another instance variable that can't be seen from a Ruby program. In rb_name_class() there was __classid__, but the id is different because it does not include nesting information (look at the table below).

__classpath__    Net::NetPrivate::Socket
__classid__      Socket

This means that classes defined, for example, by rb_define_class() all have __classid__ or __classpath__ defined. So to find under's class path we can look it up in these instance variables. This is done by rb_class_path(). We'll omit its content.
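The difference between the class path and the plain id can be illustrated from Ruby. This is only a sketch: the hidden variables __classpath__ and __classid__ are not accessible, so it uses Class#name and derives the last component by splitting the string, which is an illustrative stand-in rather than what ruby does internally.

```ruby
module Net
  module NetPrivate
    class Socket
    end
  end
end

klass = Net::NetPrivate::Socket
path  = klass.name               # full class path, in the spirit of __classpath__
id    = path.split("::").last    # last component only, in the spirit of __classid__

puts path   # Net::NetPrivate::Socket
puts id     # Socket
```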

Nameless classes

Contrary to what I have just said, there are in fact cases in which neither __classpath__ nor __classid__ is set. That is because in Ruby you can use a method like the following to create a class.

c = Class.new()

If a class is created like this, it won't go through rb_define_class_id() and the class path won't be set. In this case, c does not have any name, which is to say we get an unnamed class.

However, if it is later assigned to a constant, a name will be attached to the class at that moment.

SomeClass = c   # the class name is SomeClass

Strictly speaking, the name is attached to the class the first time the name is requested after the assignment to a constant, for instance when calling p on this SomeClass class or when calling the Class#name method. When this happens, a value equal to the class is searched for in rb_class_tbl, and a name has to be chosen. The following case can also happen:

class A
  class B
    C = tmp = Class.new()
    p(tmp)   # here we search for the name
  end
end

so in the worst case we have to search the whole constant space. However, there generally aren't many constants, so even searching all constants does not take too much time.
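The naming of an anonymous class can be tried directly. The sketch below assumes a recent interpreter, where an unnamed class reports nil as its name and the name is attached at the time of the constant assignment; the lazy lookup described above belongs to the ruby version this book analyses, so only the observable result is shown.

```ruby
c = Class.new
before = c.name      # nil on recent interpreters: no constant refers to c yet

SomeClass = c        # assigning to a constant gives the class its name
after = c.name

p before
p after              # "SomeClass"
p c.equal?(SomeClass)
```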

Include

So far we have only talked about classes, so let's finish this chapter with something else and talk about module inclusion.

rb_include_module (1)

Includes are done by the ordinary method Module#include. Its corresponding function in C is rb_include_module(). In fact, to be precise, its body is rb_mod_include(), which calls Module#append_features, and the default implementation of that method finally calls rb_include_module(). Mixing what happens in Ruby and C gives us the following call graph.

Module#include (rb_mod_include)
    Module#append_features (rb_mod_append_features)
        rb_include_module

Anyway, the manipulations that are usually regarded as inclusion are done by rb_include_module(). This function is a little long, so we'll look at it half at a time.

▼ rb_include_module (first half)

/* include module in class */
void
rb_include_module(klass, module)
    VALUE klass, module;
{
    VALUE p, c;
    int changed = 0;

    rb_frozen_class_p(klass);
    if (!OBJ_TAINTED(klass)) {
        rb_secure(4);
    }

    if (NIL_P(module)) return;
    if (klass == module) return;

    switch (TYPE(module)) {
      case T_MODULE:
      case T_CLASS:
      case T_ICLASS:
        break;
      default:
        Check_Type(module, T_MODULE);
    }

(class.c)

So far it's only security and type checking, so we can ignore it. The actual processing is below:

▼ rb_include_module (second half)

    OBJ_INFECT(klass, module);
    c = klass;
    while (module) {
        int superclass_seen = Qfalse;

        if (RCLASS(klass)->m_tbl == RCLASS(module)->m_tbl)
            rb_raise(rb_eArgError, "cyclic include detected");
        /* (A) skip if the superclass already includes module */
        for (p = RCLASS(klass)->super; p; p = RCLASS(p)->super) {
            switch (BUILTIN_TYPE(p)) {
              case T_ICLASS:
                if (RCLASS(p)->m_tbl == RCLASS(module)->m_tbl) {
                    if (!superclass_seen) {
                        c = p;  /* move the insertion point */
                    }
                    goto skip;
                }
                break;
              case T_CLASS:
                superclass_seen = Qtrue;
                break;
            }
        }
        c = RCLASS(c)->super =
            include_class_new(module, RCLASS(c)->super);
        changed = 1;
      skip:
        module = RCLASS(module)->super;
    }
    if (changed) rb_clear_cache();
}

(class.c)

First, what block (A) does is written in its comment. It seems to be a special condition, so let's skip it for now. Extracting the important parts from the rest gives the following:

c = klass;
while (module) {
    c = RCLASS(c)->super = include_class_new(module, RCLASS(c)->super);
    module = RCLASS(module)->super;
}

In other words, it iterates over module's super chain. What is in module's super must be a module included by module (our intuition tells us so). Then the superclass of the class in which the inclusion occurs is replaced with something. We do not know exactly what yet, but the moment I saw this I thought "Ah, doesn't this look like adding elements to a list (like LISP's cons)?", and that suddenly made the reading go faster. In other words, it has the following form:

list = new(item, list)

Thinking about this, we can expect that module is inserted between c and c->super. If so, it fits the specification of modules.

But to be sure of this we have to look at include_class_new().

include_class_new()

▼ include_class_new()

static VALUE
include_class_new(module, super)
    VALUE module, super;
{
    NEWOBJ(klass, struct RClass);               /* (A) */
    OBJSETUP(klass, rb_cClass, T_ICLASS);

    if (BUILTIN_TYPE(module) == T_ICLASS) {
        module = RBASIC(module)->klass;
    }
    if (!RCLASS(module)->iv_tbl) {
        RCLASS(module)->iv_tbl = st_init_numtable();
    }
    klass->iv_tbl = RCLASS(module)->iv_tbl;     /* (B) */
    klass->m_tbl = RCLASS(module)->m_tbl;
    klass->super = super;                       /* (C) */
    if (TYPE(module) == T_ICLASS) {             /* (D) */
        RBASIC(klass)->klass = RBASIC(module)->klass;   /* (D-1) */
    }
    else {
        RBASIC(klass)->klass = module;                  /* (D-2) */
    }
    OBJ_INFECT(klass, module);
    OBJ_INFECT(klass, super);

    return (VALUE)klass;
}

(class.c)

We're lucky: there's nothing here we do not know.

(A) First create a new class.

(B) Transplant module's instance variable and method tables into this class.

(C) Make the including class's superclass (super) the superclass of this new class.

In other words, this function creates an include class, which we can regard as something like an "avatar" of the module. The important point is that at (B) only the pointer is copied, without duplicating the table. If a method is added later, the module's body and the include class will still have exactly the same methods (figure 11).

Figure 11: Include class

If you look closely at (A), the struct-type flag is set to T_ICLASS. This seems to be the mark of an include class. This function's name is include_class_new(), so the I of ICLASS must stand for include.

And if you combine what this function and rb_include_module() do, we see that our previous expectations were not wrong. In brief, including means inserting the include class of a module between a class and its superclass (figure 12).

Figure 12: Include

At (D-2) the module is stored in the include class's klass. At (D-1), the module's body is taken out... is what I'd like to say, but in fact this check has no use. The T_ICLASS check is already done at the beginning of this function, so by the time we arrive here module can no longer be a T_ICLASS. Modifications to ruby have piled up piece by piece over quite a long period of time, so there are quite a few small oversights like this.

There is one more thing to consider. The include class's basic.klass is used only to point to the module's body, so, for example, calling a method on the include class would be very bad. Include classes therefore must not be seen from Ruby programs. And in practice all methods skip include classes, with no exception.
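The observable consequences of this design can be checked from Ruby. A minimal sketch, with illustrative module and class names: the include classes themselves stay invisible, Class#superclass skips them, and a method added to the module after the include is still found, which is consistent with the shared method table at (B).

```ruby
module M2
  def from_m2; :m2; end
end

module M1
  include M2
  def from_m1; :m1; end
end

class C1
  include M1
end

p C1.ancestors.take(4)   # [C1, M1, M2, Object]
p C1.superclass          # Object: the include classes are skipped

# Add a method to M1 after it has been included.
module M1
  def added_later; :late; end
end

p C1.new.added_later     # :late
```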

Simulation

It was complicated, so let's look at a concrete example. I'd like you to look at figure 13 (1). We have the c1 class and the m1 module that includes m2. From there, the changes made to include m1 in c1 are (2) and (3). The ims are of course include classes.

Figure 13: Include

rb_include_module (2)

Well, now we can explain the part of rb_include_module() we skipped.

▼ rb_include_module (avoiding double inclusion)

        /* (A) skip if the superclass already includes module */
        for (p = RCLASS(klass)->super; p; p = RCLASS(p)->super) {
            switch (BUILTIN_TYPE(p)) {
              case T_ICLASS:
                if (RCLASS(p)->m_tbl == RCLASS(module)->m_tbl) {
                    if (!superclass_seen) {
                        c = p;  /* the inserting point is moved */
                    }
                    goto skip;
                }
                break;
              case T_CLASS:
                superclass_seen = Qtrue;
                break;
            }
        }

(class.c)

Among the superclasses of klass (p), if a p is a T_ICLASS (an include class) and has the same method table as the module we want to include (module), it means that p is an include class of that module. Therefore, it is skipped so that the module is not included twice. However, if this module includes another module (module->super), that one is checked once more.

But, because p is a module that has already been included once, the modules included by it must also already be included... that's what I thought for a moment, but we can have the following situation:

module M
end
module M2
end
class C
  include M   # M2 is not yet included in M
end           # therefore M2 is not in C's superclasses

module M
  include M2  # as there M2 is included in M,
end
class C
  include M   # I would like here to only add M2
end

To put it the other way around, there are cases in which the result of an include is not propagated immediately.

For class inheritance, the class's singleton methods were inherited, but in the case of modules there is no such thing. Therefore the singleton methods of a module are not inherited by the including class (or module). When you also want to inherit singleton methods, the usual way is to override Module#append_features.
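As an illustration of that last point, here is one way overriding append_features might look. It is a sketch, not something taken from ruby's sources: the module Countable, the nested ClassMethods helper and the class Item are all hypothetical names.

```ruby
module Countable
  # Hypothetical helper module holding the would-be class methods.
  module ClassMethods
    def kind
      "countable"
    end
  end

  def self.append_features(base)
    super                      # keep the default inclusion behaviour
    base.extend(ClassMethods)  # also give the including class these methods
  end

  def self.only_on_module
    :module_only
  end
end

class Item
  include Countable
end

p Item.kind                           # "countable"
p Item.respond_to?(:only_on_module)   # false: not inherited through include
```

The singleton method only_on_module stays on the module, while the methods handed over explicitly in append_features become available on the including class.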

The original work is Copyright © 2002 - 2004 Minero AOKI. Translated by Vincent ISAMBART and Clifford Escobar CAOILE. This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 2.5 License.


Translated by Sebastian Krause & ocha-

Chapter 5: Garbage Collection

A conception of an executing program

This comes all of a sudden, but at the beginning of this chapter we'll learn about the memory space of an executing program. In this chapter we'll step quite a bit into the lower-level parts of a computer, so without some preliminary knowledge it will be hard to follow. This knowledge will also be necessary for the following chapters. Once we have gone through it here, the rest will be easier.

Memory Segments

A general C program has the following parts in its memory space:

1. the text area
2. a place for static and global variables
3. the machine stack
4. the heap

The text area is where the code lies. Obviously, the second area holds static and global variables. Arguments and local variables of functions pile up on the machine stack. The heap is where memory allocated by malloc() lives.

Let's talk a bit more about number three, the machine stack. Since it is called the machine "stack", it obviously has a stack structure. In other words, new items are piled on top of it one after another. When values are actually pushed on the stack, each value is a tiny piece such as an int. Logically, however, there are somewhat larger units, called stack frames.

One stack frame corresponds to one function call. In other words, when a function is called, one stack frame is pushed, and on return one stack frame is popped. Figure 1 shows a heavily simplified picture of the machine stack.

Figure 1: Machine Stack

In this picture, "above" is written at the top of the stack, but the machine stack does not necessarily grow from low addresses to high addresses. For instance, on x86 machines the stack grows from high to low addresses.

alloca()

By using malloc(), we can get an arbitrarily large memory area from the heap. alloca() is the machine stack version of it. But unlike malloc(), it is not necessary to free the memory allocated with alloca(). Or rather: it is freed automatically at the moment the calling function returns. That's why a value allocated this way cannot be used as a return value. It's the same as "You must not return a pointer to a local variable."

There is nothing difficult so far. We can think of it as a way to locally allocate an array whose size is decided at runtime.

However, there are environments with no native alloca(). Many people still want to use alloca() even in such environments, so sometimes a function that does the same thing is written in C. But in that case, only the property that we don't have to free the memory ourselves is implemented, and the memory is not necessarily allocated on the machine stack. In fact, it usually is not. If that were possible, a native alloca() could have been implemented in the first place.

How can one implement alloca() in C? The simplest implementation is: first allocate memory normally with malloc(). Then remember the pair of the function that called alloca() and the assigned address in a global list. After that, whenever alloca() is called, check this list, and if it contains memory allocated for functions that have already finished, free it with free().

Figure 2: The behavior of an alloca() implemented in C

missing/alloca.c in ruby is an example of an emulated alloca().

Overview

From here on we can at last talk about the main subject of this chapter: garbage collection.

What is GC?

Objects normally reside in memory. Naturally, if a lot of objects are created, a lot of memory is used. If memory were infinite there would be no problem, but in reality there is always a memory limit. That's why memory that is no longer used must be collected and recycled. More concretely, the memory received through malloc() must be returned with free().

However, it would require a lot of effort if the management of malloc() and free() were left entirely to programmers. Especially in object-oriented programs, because objects refer to each other, it is difficult to tell when to release memory.

This is where garbage collection comes in. Garbage collection (GC) is a feature that automatically detects and frees memory that has become unnecessary. With garbage collection, the worry "When should I free()?" disappears. Whether or not GC exists makes a considerable difference in how easy programs are to write.

By the way, a book I once read described GC as "the thing that tidies up fragmented usable memory". That task is called "compaction"; it is called so because it makes things compact. Because compaction makes the memory cache hit more often, it speeds things up to some extent, but it is not the main purpose of GC. The purpose of GC is to collect memory. There are many GCs that collect memory but don't do compaction. The GC of ruby does not do compaction either.

Then, in what kind of systems is GC available? For C and C++, there is Boehm GC (http://www.hpl.hp.com/personal/Hans_Boehm/gc), which can be used as an add-on. In more recent languages such as Java, Perl, Python, C# and Eiffel, GC is standard equipment. And of course, Ruby has its GC. Let's follow the details of ruby's GC in this chapter. The target file is gc.c.

What does GC do?

Before explaining the GC algorithm, I should explain what garbage collection is. In other words, what state of memory counts as "unnecessary memory"?

To make the description more concrete, let's simplify the structure by assuming that there are only objects and links. This would look as shown in Figure 3.

Figure 3: Objects

The objects pointed to by global variables and the objects on the stack of a language are surely necessary. Objects pointed to by instance variables of these objects are also necessary. Furthermore, the objects that are reachable by following links from these objects are also necessary.

To put it more logically, the necessary objects are all the objects that can be reached recursively via links, starting from the "surely necessary objects". This is depicted in figure 4. What is to the left of the line are the "surely necessary objects", and the objects that can be reached from them are colored black. These black objects are the necessary objects. The rest can be released.

Figure 4: necessary objects and unnecessary objects

In technical terms, the "surely necessary objects" are called "the roots of GC". That's because they are the roots of the tree structures that emerge as a consequence of tracing the necessary objects.

Mark and Sweep

GC was first implemented in Lisp. The GC first implemented in Lisp, that is, the world's first GC, is called mark & sweep GC. The GC of ruby is one variety of it.

The idea of mark-and-sweep GC is pretty close to our definition of "necessary object". First, put "marks" on the root objects. Using them as starting points, put "marks" on all reachable objects. This is the mark phase.

When there are no more reachable objects left to mark, check all objects in the object pool and release (sweep) all objects that have not been marked. "Sweep" is the "sweep" of Minesweeper.

There are two advantages.

- There does not need to be any (or almost any) concern for garbage collection outside the implementation of GC.
- Cycles can also be released. (As for cycles, see also the section "Reference counting".)

There are also two disadvantages.

- In order to sweep, every object must be touched at least once.
- The load of the GC is concentrated at one point.

When using the emacs editor, "Garbage collecting..." sometimes appears and it completely stops reacting. That is an example of the second disadvantage. But this point can be alleviated by modifying the algorithm (this is called incremental GC).
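The two phases can be sketched with a toy object graph. This is a simulation written for illustration only, not ruby's actual collector: the Obj struct, the pool array and the roots array are all assumptions of the sketch.

```ruby
# Toy objects: a name, outgoing links, and a mark bit.
Obj = Struct.new(:name, :refs, :marked)

# Mark phase: recursively mark everything reachable from obj.
def mark(obj)
  return if obj.marked          # already visited; this also stops on cycles
  obj.marked = true
  obj.refs.each { |child| mark(child) }
end

# Sweep phase: split the pool into marked (live) and unmarked (dead) objects.
def sweep(pool)
  live, dead = pool.partition(&:marked)
  live.each { |o| o.marked = false }   # reset the marks for the next GC
  [live, dead]
end

a = Obj.new("a", [], false)
b = Obj.new("b", [], false)
c = Obj.new("c", [], false)
d = Obj.new("d", [], false)
a.refs << b          # root -> a -> b
c.refs << d          # c and d form a cycle that no root reaches
d.refs << c

pool  = [a, b, c, d]
roots = [a]

roots.each { |r| mark(r) }
live, dead = sweep(pool)
p live.map(&:name)   # ["a", "b"]
p dead.map(&:name)   # ["c", "d"]
```

Note that the unreachable cycle between c and d is collected, which is the second advantage listed above.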

Stop and Copy

Stop and Copy is a variation of Mark and Sweep. First, prepare several object areas. To simplify the description, assume there are two areas, A and B. Put an "active" mark on one of the areas. When creating an object, create it only in the "active" one (Figure 5).

Figure 5: Stop and Copy (1)

When the GC starts, follow links from the roots in the same manner as mark-and-sweep. However, move objects to the other area instead of marking them (Figure 6). When all the links have been followed, discard all the elements that remain in A, and make B the active area.

Figure 6: Stop and Copy (2)

Stop and Copy also has two advantages:

- Compaction happens at the same time as collecting the memory.
- Since objects that reference each other move closer together, the cache is more likely to be hit.

And also two disadvantages:

- The object area needs to be more than twice as big.
- The positions of objects change.

It seems that not everything in this world is positive.
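A toy version of the copying step might look as follows. It is only a sketch under simplifying assumptions: the two areas are plain arrays, and the Cell struct with its forward field (a forwarding pointer to the moved copy) is a hypothetical construction for the example.

```ruby
Cell = Struct.new(:name, :refs, :forward)

# Move obj into to_space (at most once) and return the moved copy.
def copy(obj, to_space)
  return obj.forward if obj.forward          # already moved
  moved = Cell.new(obj.name, [], nil)
  obj.forward = moved                        # leave a forwarding pointer
  to_space << moved
  moved.refs = obj.refs.map { |child| copy(child, to_space) }
  moved
end

x = Cell.new("x", [], nil)
y = Cell.new("y", [], nil)
z = Cell.new("z", [], nil)   # nothing refers to z
x.refs << y
y.refs << x                  # a reachable cycle

area_a = [z, x, y]           # the active area before GC
area_b = []                  # the empty area
roots  = [x]

roots = roots.map { |r| copy(r, area_b) }
area_a.clear                 # discard everything left in A; B becomes active

p area_b.map(&:name)         # ["x", "y"]
```

After the copy, the survivors sit next to each other in area B, and every reference had to be rewritten to point at the moved copies, which reflects both the compaction advantage and the "positions change" disadvantage.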

Reference counting

Reference counting differs a bit from the aforementioned GCs: the reach-check code is distributed over several places.

First, attach an integer count to each element. When an object is referred to via a variable or an array, its counter is increased. When a reference goes away, the counter is decreased. When the counter of an object becomes zero, the object is released. This is the method called reference counting (Figure 7).

Figure 7: Reference counting

This method also has two advantages:

- The load of GC is distributed over the entire program.
- An object that becomes unnecessary is freed immediately.

And also two disadvantages.

- The counter handling tends to be forgotten.
- When done naively, cycles are not released.

I'll explain the second point just in case. A cycle is a cycle of references as shown in Figure 8. In this case the counters never reach zero and the objects are never released.

Figure 8: Cycle

By the way, the latest Python (2.2) uses reference counting GC, but it can free cycles. However, this is not because of the reference counting itself, but because it sometimes invokes mark and sweep GC to check.
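Both the immediate release and the cycle problem can be reproduced with a small simulation. The Counted class below is hypothetical and exists only for this sketch; "freeing" is modelled by appending the object's name to a log array.

```ruby
class Counted
  attr_reader :name, :count

  def initialize(name, log)
    @name, @count, @refs, @log = name, 0, [], log
  end

  def retain
    @count += 1
  end

  def release
    @count -= 1
    free if @count.zero?
  end

  # Store a reference to other, incrementing its counter.
  def add_ref(other)
    other.retain
    @refs << other
  end

  private

  def free
    @log << @name
    @refs.each(&:release)   # a freed object drops its own references
    @refs.clear
  end
end

freed = []
a = Counted.new("a", freed)
b = Counted.new("b", freed)
a.retain            # some variable refers to a
a.add_ref(b)        # a -> b
a.release           # the variable goes away: a is freed, then b
p freed             # ["a", "b"]

leaked = []
c = Counted.new("c", leaked)
d = Counted.new("d", leaked)
c.retain
c.add_ref(d)
d.add_ref(c)        # c and d now refer to each other
c.release           # c's counter goes from 2 to 1 and never reaches zero
p leaked            # []
p [c.count, d.count]
```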

Object Management

Ruby's garbage collection is only concerned with ruby objects. Moreover, it is only concerned with the objects created and managed by ruby. Conversely, memory allocated without following the proper procedure is not taken care of. For instance, the following function causes a memory leak even when it runs inside ruby.

void not_ok()
{
    malloc(1024);  /* receive memory and discard it */
}

However, the following function does not cause a memory leak.

void this_is_ok()
{
    rb_ary_new();  /* create a ruby array and discard it */
}

Since rb_ary_new() uses Ruby's proper interface to allocate memory, the created object is under the management of ruby's GC, and thus ruby will take care of it.

struct RVALUE

Since the substance of an object is a struct, managing objects means managing those structs. Of course, non-pointer objects such as Fixnum, Symbol, nil, true and false are exceptions, but I won't mention this every time, to keep the descriptions from becoming redundant.

Each struct type has a different size, but, probably in order to keep management simpler, a union of all the structs of the built-in classes is declared, and this union is always used when dealing with memory. The declaration of that union is as follows.

▼ RVALUE

typedef struct RVALUE {
    union {
        struct {
            unsigned long flags;   /* 0 if not used */
            struct RVALUE *next;
        } free;
        struct RBasic  basic;
        struct RObject object;
        struct RClass  klass;
        struct RFloat  flonum;
        struct RString string;
        struct RArray  array;
        struct RRegexp regexp;
        struct RHash   hash;
        struct RData   data;
        struct RStruct rstruct;
        struct RBignum bignum;
        struct RFile   file;
        struct RNode   node;
        struct RMatch  match;
        struct RVarmap varmap;
        struct SCOPE   scope;
    } as;
} RVALUE;

(gc.c)

struct RVALUE is a struct that has only one element. I've heard that the reason why the union is not used directly is to make it easy to add members when debugging or when extending it in the future.

First, let's focus on the first member of the union, free.flags. The comment says "0 if not used", but is that true? Is there no possibility that free.flags is 0 by chance?

As we saw in Chapter 2: Objects, all object structs have struct RBasic as their first element. Therefore, whichever member of the union we access through, obj->as.free.flags means the same thing as obj->as.basic.flags. Objects always have the struct-type flag (such as T_STRING), and that flag is never 0. Therefore, the flags of a "live" object will never be 0 by coincidence. Hence, we can confirm that setting flags to 0 is necessary and sufficient to represent a "dead" object.

Object heap

The memory for all the object structs is gathered in the global variable heaps. Hereafter, let's call this the object heap.

▼ Object heap

#define HEAPS_INCREMENT 10
static RVALUE **heaps;
static int heaps_length = 0;
static int heaps_used   = 0;

#define HEAP_MIN_SLOTS 10000
static int *heaps_limits;
static int heap_slots = HEAP_MIN_SLOTS;

(gc.c)

heaps is an array of arrays of struct RVALUE. Since it is called heapS, each contained array is probably a heap. Each element of a heap is a slot (Figure 9).

Figure 9: heaps, heap, slot

The length of heaps is heaps_length, and it can be changed. The number of heaps actually in use is heaps_used. The length of each heap is stored in the corresponding heaps_limits[index]. Figure 10 shows the structure of the object heap.

Figure 10: conceptual diagram of heaps in memory

There is a reason this structure has to be this way. For instance, if all structs were stored in a single array, the memory would be the most compact, but we could not use realloc() because it might change the addresses. This is because VALUEs are mere pointers.

In one implementation of Java, the counterparts of VALUEs are not addresses but object indexes. Since objects are handled through a pointer table, they are movable. However, in this case an array lookup happens on every object access, which lowers performance to some degree.

On the other hand, what happens if it is a one-dimensional array of pointers to RVALUEs (that is, VALUEs)? This seems to work well at first glance, but it does not work for GC. As I'll describe in detail later, the GC of ruby needs to find the integers "which look like VALUEs (pointers to RVALUEs)". If all RVALUEs were allocated at addresses far from each other, it would have to compare the addresses of all RVALUEs with all integers "which could be pointers". This means the time for GC would be of order O(n^2) or more, which is not acceptable.

Given these requirements, it is good for the object heap to have a structure in which the addresses are clustered to some extent while its position and total size are not restricted.

freelist

Unused RVALUEs are managed by linking them into a single linked list that starts at freelist. The as.free.next member of RVALUE is the link used for this purpose.

▼ freelist

static RVALUE *freelist = 0;

(gc.c)

add_heap()

Now that we understand the data structure, let's read add_heap(), the function that adds a heap. Because this function contains a lot of lines that are not part of the main line, I'll show a version simplified by omitting error handling and casts.

▼ add_heap() (simplified)

static void
add_heap()
{
    RVALUE *p, *pend;

    /* extend heaps if necessary */
    if (heaps_used == heaps_length) {
        heaps_length += HEAPS_INCREMENT;
        heaps = realloc(heaps, heaps_length * sizeof(RVALUE*));
        heaps_limits = realloc(heaps_limits, heaps_length * sizeof(int));
    }

    /* increase heaps by 1 */
    p = heaps[heaps_used] = malloc(sizeof(RVALUE) * heap_slots);
    heaps_limits[heaps_used] = heap_slots;
    pend = p + heap_slots;
    if (lomem == 0 || lomem > p) lomem = p;
    if (himem < pend) himem = pend;
    heaps_used++;
    heap_slots *= 1.8;

    /* link the allocated RVALUE to freelist */
    while (p < pend) {
        p->as.free.flags = 0;
        p->as.free.next = freelist;
        freelist = p;
        p++;
    }
}

Please check the following points.

- the length of a heap is heap_slots
- heap_slots becomes 1.8 times larger every time a heap is added
- the length of heaps[i] (the value of heap_slots when the heap was created) is stored in heaps_limits[i]

Also, since lomem and himem are modified only by this function, this function alone is enough to understand their mechanism. These variables hold the lowest and the highest addresses of the object heap. Their values are used later when determining the integers "which look like VALUEs".

rb_newobj()

Considering all of the above, we can immediately tell how to create an object. If there is at least one RVALUE linked from freelist, we can use it. Otherwise, we run the GC or add a heap. Let's confirm this by reading rb_newobj(), the function that creates an object.

▼ rb_newobj()

VALUE
rb_newobj()
{
    VALUE obj;

    if (!freelist) rb_gc();

    obj = (VALUE)freelist;
    freelist = freelist->as.free.next;
    MEMZERO((void*)obj, RVALUE, 1);
    return obj;
}

(gc.c)

If freelist is 0, in other words if there are no unused structs, the GC is invoked to make space. Even if not a single object could be collected, there is no problem, because in that case new space is allocated inside rb_gc(). Then a struct is taken from freelist, zero-filled with MEMZERO(), and returned.
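The interplay between add_heap(), freelist and rb_newobj() can be modelled in a few lines. The following is a simulation under simplifying assumptions, not a port of gc.c: ToyHeap and Slot are hypothetical names, HEAP_MIN_SLOTS is shrunk to 4 so the growth is visible, and where the real code would run rb_gc() the toy simply adds a heap.

```ruby
class ToyHeap
  Slot = Struct.new(:flags, :next_free, :value)
  HEAP_MIN_SLOTS = 4             # tiny for the demo; gc.c uses 10000

  attr_reader :heaps

  def initialize
    @heaps = []
    @heap_slots = HEAP_MIN_SLOTS
    @freelist = nil
  end

  # Allocate one more heap and push all of its slots onto the free list.
  def add_heap
    heap = Array.new(@heap_slots) { Slot.new(0, nil, nil) }
    @heaps << heap
    @heap_slots = (@heap_slots * 1.8).to_i   # next heap is 1.8 times larger
    heap.each do |slot|
      slot.flags = 0                          # 0 means "not used"
      slot.next_free = @freelist
      @freelist = slot
    end
  end

  # Take one slot from the free list, growing the heap when it is empty.
  def newobj(value)
    add_heap if @freelist.nil?   # the real code calls rb_gc() here
    slot = @freelist
    @freelist = slot.next_free
    slot.flags = 1               # non-zero flags: the slot is in use
    slot.next_free = nil
    slot.value = value
    slot
  end
end

heap = ToyHeap.new
5.times { |i| heap.newobj(i) }
p heap.heaps.map(&:size)                            # [4, 7]
p heap.heaps.flatten.count { |s| s.flags != 0 }     # 5
```

The fifth allocation finds the free list empty and triggers a second, larger heap, mirroring the growth rule noted for add_heap().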

Mark

Asdescribed,ruby’sGCisMark&Sweep.Its“mark”is,concretelyspeaking,tosetaFL_MARKflag:lookforunusedVALUE,setFL_MARKflagstofoundones,thenlookattheobjectheapafterinvestigatingallandfreeobjectsthatFL_MARKhasnotbeenset.

rb_gc_mark()

rb_gc_mark() is the function to mark objects recursively.

▼ rb_gc_mark()

void
rb_gc_mark(ptr)
    VALUE ptr;
{
    int ret;
    register RVALUE *obj = RANY(ptr);

    if (rb_special_const_p(ptr)) return;        /* special const not marked */
    if (obj->as.basic.flags == 0) return;       /* free cell */
    if (obj->as.basic.flags & FL_MARK) return;  /* already marked */

    obj->as.basic.flags |= FL_MARK;

    CHECK_STACK(ret);
    if (ret) {
        if (!mark_stack_overflow) {
            if (mark_stack_ptr - mark_stack < MARK_STACK_MAX) {
                *mark_stack_ptr = ptr;
                mark_stack_ptr++;
            }
            else {
                mark_stack_overflow = 1;
            }
        }
    }
    else {
        rb_gc_mark_children(ptr);
    }
}

(gc.c)

The definition of RANY() is as follows. It is not particularly important.

▼ RANY()

#define RANY(o) ((RVALUE*)(o))

(gc.c)

At the beginning come the checks for non-pointers and already freed objects, and the check that stops the recursion at objects that are already marked. After them,

obj->as.basic.flags |= FL_MARK;

marks obj (that is, the ptr parameter of this function). Next comes the turn to follow the references from obj and mark them. rb_gc_mark_children() does it.

The rest, the lengthy part that starts with CHECK_STACK(), is a device to prevent machine stack overflow. Since rb_gc_mark() uses recursive calls to mark objects, a big cluster of objects can exhaust the machine stack. To counter that, when the machine stack is about to overflow, it stops the recursive calls, piles the objects up on a global list, and marks them again later. This code is omitted because it is not part of the main line.
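Although the real code is left out, the idea of deferring work when the recursion gets too deep can be sketched in a few lines. Everything here is assumed for illustration: Node, MAX_DEPTH, DEFER_MAX and the function names are hypothetical, and a simple depth counter stands in for CHECK_STACK(), which inspects the real machine stack.

```c
#include <stddef.h>

#define MAX_DEPTH 3     /* hypothetical recursion budget */
#define DEFER_MAX 16    /* hypothetical capacity of the deferred list */

typedef struct Node {
    int marked;
    struct Node *child[2];
} Node;

static Node *deferred[DEFER_MAX];
static size_t deferred_len = 0;
static int deferred_overflow = 0;

static void mark_children(Node *n, int depth);

/* Mark n; recurse into its children only while the depth budget allows. */
static void mark(Node *n, int depth)
{
    if (n == NULL || n->marked) return;
    n->marked = 1;
    if (depth >= MAX_DEPTH) {
        /* too deep: remember the object instead of recursing */
        if (deferred_len < DEFER_MAX) deferred[deferred_len++] = n;
        else deferred_overflow = 1;
    }
    else {
        mark_children(n, depth + 1);
    }
}

static void mark_children(Node *n, int depth)
{
    mark(n->child[0], depth);
    mark(n->child[1], depth);
}

/* Mark everything reachable from root, then drain the deferred list. */
static void mark_from_root(Node *root)
{
    mark(root, 0);
    while (deferred_len > 0) {
        Node *n = deferred[--deferred_len];
        mark_children(n, 1);    /* restart with a fresh depth budget */
    }
}
```

A deferred object is already marked, so only its children are visited when the list is drained. The sketch merely records an overflow of the deferred list in deferred_overflow; a real collector has to recover from that case.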

rb_gc_mark_children()

Now, as for rb_gc_mark_children(), it just lists the internal types and marks them one by one, so it is not only long but also uninteresting. It is shown here with the simple enumerations omitted:

▼ rb_gc_mark_children()

void
rb_gc_mark_children(ptr)
    VALUE ptr;
{
    register RVALUE *obj = RANY(ptr);

    if (FL_TEST(obj, FL_EXIVAR)) {
        rb_mark_generic_ivar((VALUE)obj);
    }

    switch (obj->as.basic.flags & T_MASK) {
      case T_NIL:
      case T_FIXNUM:
        rb_bug("rb_gc_mark() called for broken object");
        break;

      case T_NODE:
        mark_source_filename(obj->as.node.nd_file);
        switch (nd_type(obj)) {
          case NODE_IF:         /* 1,2,3 */
          case NODE_FOR:
          case NODE_ITER:
                /*…………omitted…………*/
        }
        return;   /* not need to mark basic.klass */
    }

    rb_gc_mark(obj->as.basic.klass);
    switch (obj->as.basic.flags & T_MASK) {
      case T_ICLASS:
      case T_CLASS:
      case T_MODULE:
        rb_gc_mark(obj->as.klass.super);
        rb_mark_tbl(obj->as.klass.m_tbl);
        rb_mark_tbl(obj->as.klass.iv_tbl);
        break;

      case T_ARRAY:
        if (FL_TEST(obj, ELTS_SHARED)) {
            rb_gc_mark(obj->as.array.aux.shared);
        }
        else {
            long i, len = obj->as.array.len;
            VALUE *ptr = obj->as.array.ptr;

            for (i=0; i < len; i++) {
                rb_gc_mark(*ptr++);
            }
        }
        break;

            /*…………omitted…………*/

      default:
        rb_bug("rb_gc_mark(): unknown data type 0x%x(0x%x) %s",
               obj->as.basic.flags & T_MASK, obj,
               is_pointer_to_heap(obj) ? "corrupted object" : "non object");
    }
}

(gc.c)

The only thing I'd like you to confirm is that it calls rb_gc_mark() recursively. In the omitted parts, the NODE types and the T_xxxx types are enumerated respectively. NODE will be introduced in Part 2.

Additionally, let's look at the part that marks T_DATA (the struct used for extension libraries), because there's something we'd like to check. This code is extracted from the second switch statement.

▼ rb_gc_mark_children() – T_DATA

      case T_DATA:
        if (obj->as.data.dmark) (*obj->as.data.dmark)(DATA_PTR(obj));
        break;

(gc.c)

Here it does not use rb_gc_mark() or similar functions, but the dmark function supplied by the user. Inside it, of course, rb_gc_mark() or something like it may be used, but not using it is also possible. For example, in an extreme case, if a user-defined object does not contain any VALUE, there is no need to mark anything.
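The division of labor between the collector and a user-supplied mark function can be sketched without ruby at all. In the following, Obj, DataBox, struct pair and the function names are hypothetical stand-ins (DataBox plays the role of struct RData); only the shape of the call mirrors the T_DATA case above.

```c
#include <stddef.h>

typedef struct Obj { int marked; } Obj;     /* stand-in for a Ruby object */
typedef void (*mark_func)(void *);

/* Stand-in for struct RData: an opaque user pointer plus an optional hook. */
typedef struct DataBox {
    void *data;
    mark_func dmark;
} DataBox;

/* A user struct that holds references to two objects. */
struct pair { Obj *left; Obj *right; };

static void mark_obj(Obj *o) { if (o) o->marked = 1; }

/* User-supplied hook: only the user knows the layout, so the user marks the fields. */
static void pair_mark(void *p)
{
    struct pair *pr = p;
    mark_obj(pr->left);
    mark_obj(pr->right);
}

/* What the collector does for such a box: call the hook if one was given. */
static void mark_box(DataBox *box)
{
    if (box->dmark) (*box->dmark)(box->data);
}
```

A box whose struct holds no object references can simply leave dmark as NULL, which is the "no need to mark" case mentioned above.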

rb_gc()

By now we've finished talking about how each object is marked. From now on, let's look at rb_gc(), the function that presides over the whole. The objects marked here are the "objects which are obviously necessary", in other words, "the roots of GC".

▼ rb_gc()

void
rb_gc()
{
    struct gc_list *list;
    struct FRAME * volatile frame; /* gcc 2.7.2.3 -O2 bug?? */
    jmp_buf save_regs_gc_mark;
    SET_STACK_END;

    if (dont_gc || during_gc) {
        if (!freelist) {
            add_heap();
        }
        return;
    }

    /* ……mark from the all roots…… */

    gc_sweep();
}

(gc.c)

The roots to be marked will be shown one by one after this, but I'd like to mention just one point here.

In ruby the CPU registers and the machine stack are also roots. This means that the local variables and arguments of C are automatically marked. For example,

static int
f(void)
{
    VALUE arr = rb_ary_new();

    /* ……do various things…… */
}

in this way we can protect an object just by putting it into a variable. This is a very significant trait of ruby's GC. Because of this feature, ruby's extension libraries are insanely easy to write.

However, what is on the stack is not only VALUEs. There are a lot of totally unrelated values. How this is resolved is the key point when reading the implementation of the GC.

The Ruby Stack

First, it marks the (ruby's) stack frames used by the interpreter. Since you will be able to find out what they are after reaching Part 3, you don't have to think much about them for now.

▼ Marking the Ruby Stack

    /* mark frame stack */
    for (frame = ruby_frame; frame; frame = frame->prev) {
        rb_gc_mark_frame(frame);
        if (frame->tmp) {
            struct FRAME *tmp = frame->tmp;
            while (tmp) {
                rb_gc_mark_frame(tmp);
                tmp = tmp->prev;
            }
        }
    }
    rb_gc_mark((VALUE)ruby_class);
    rb_gc_mark((VALUE)ruby_scope);
    rb_gc_mark((VALUE)ruby_dyna_vars);

(gc.c)

ruby_frame, ruby_class, ruby_scope and ruby_dyna_vars are the variables that point to the top of each stack of the evaluator. They hold, respectively, the frame, the class scope, the local variable scope, and the block local variables at that time.

Register

Next, it marks the CPU registers.

▼ marking the registers

    FLUSH_REGISTER_WINDOWS;
    /* Here, all registers must be saved into jmp_buf. */
    setjmp(save_regs_gc_mark);
    mark_locations_array((VALUE*)save_regs_gc_mark,
                         sizeof(save_regs_gc_mark) / sizeof(VALUE *));

(gc.c)

FLUSH_REGISTER_WINDOWS is special. We will see it later.

setjmp() is essentially a function for non-local jumps, but as a side effect it saves the content of the registers into its argument (a variable of type jmp_buf). Making use of this, the code attempts to mark the content of the registers. Things around here really look like secret techniques.
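The trick can be tried in isolation. The following sketch saves the registers with setjmp() and then looks at the buffer one word at a time; heap_lo, heap_hi and scan_saved_registers() are hypothetical names, and a plain range test stands in for the real check. Note that what a jmp_buf contains is up to the C library, and some libraries encode part of the saved state, so this is an illustration of the idea rather than a portable guarantee.

```c
#include <setjmp.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical bounds standing in for lomem and himem. */
static uintptr_t heap_lo = 0, heap_hi = 0;

/* Save the registers with setjmp(), then inspect the buffer word by word.
   Returns how many words fall inside [heap_lo, heap_hi). */
static size_t scan_saved_registers(size_t *words_scanned)
{
    jmp_buf regs;
    uintptr_t w;
    size_t n = sizeof(regs) / sizeof(w);
    size_t i, hits = 0;

    memset(regs, 0, sizeof(regs));
    (void)setjmp(regs);     /* side effect: register contents land in regs */
    for (i = 0; i < n; i++) {
        memcpy(&w, (const unsigned char *)regs + i * sizeof(w), sizeof(w));
        if (heap_lo <= w && w < heap_hi) hits++;
    }
    *words_scanned = n;
    return hits;
}
```

With the bounds left at zero the range is empty, so no word can match; setting them to the limits of a real heap turns the loop into a conservative scan of the saved registers.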

However, only djgpp and Human68k are specially treated. djgpp is a gcc environment for DOS. Human68k is an OS for the SHARP X680x0 series. In these two environments the ordinary setjmp() apparently does not save all of the registers, so setjmp() is redefined as follows in inline assembler to write the registers out explicitly.

▼ the original version of setjmp

#ifdef __GNUC__
#if defined(__human68k__) || defined(DJGPP)
#if defined(__human68k__)
typedef unsigned long rb_jmp_buf[8];
__asm__ (".even\n\                   2-byte alignment
_rb_setjmp:\n\                       the label of rb_setjmp() function
        move.l  4(sp),a0\n\          load the first argument to the a0 register
        movem.l d3-d7/a3-a5,(a0)\n\  copy the registers to where a0 points to
        moveq.l #0,d0\n\             set 0 to d0 (as the return value)
        rts");                       return
#ifdef setjmp
#undef setjmp
#endif
#else
#if defined(DJGPP)
typedef unsigned long rb_jmp_buf[6];
__asm__ (".align 4\n\                order 4-byte alignment
_rb_setjmp:\n\                       the label for rb_setjmp() function
        pushl   %ebp\n\              push ebp to the stack
        movl    %esp,%ebp\n\         set the stack pointer to ebp
        movl    8(%ebp),%ebp\n\      pick up the first argument and set to ebp
        movl    %eax,(%ebp)\n\       in the followings, store each register
        movl    %ebx,4(%ebp)\n\      to where ebp points to
        movl    %ecx,8(%ebp)\n\
        movl    %edx,12(%ebp)\n\
        movl    %esi,16(%ebp)\n\
        movl    %edi,20(%ebp)\n\
        popl    %ebp\n\              restore ebp from the stack
        xorl    %eax,%eax\n\         set 0 to eax (as the return value)
        ret");                       return
#endif
#endif
int rb_setjmp (rb_jmp_buf);
#define jmp_buf rb_jmp_buf
#define setjmp rb_setjmp
#endif /* __human68k__ or DJGPP */
#endif /* __GNUC__ */

(gc.c)

Alignment is a constraint on where variables can be placed in memory. For example, on a 32-bit machine int is usually 32 bits, but we cannot always take 32 bits from just anywhere in memory. RISC machines in particular have strict constraints, fixed as "from a multiple of 4 bytes" or "from an even byte". With such constraints the memory access unit can be simplified (and thus it can be faster). When the constraint is "from a multiple of 4 bytes", it is called "4-byte alignment".
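As a small illustration of the arithmetic behind alignment, the following two helpers test whether an address satisfies a given alignment and round an offset up to the next aligned position. The helper names are my own and do not appear in ruby.

```c
#include <stddef.h>
#include <stdint.h>

/* True when addr is a multiple of align (align must be non-zero). */
static int is_aligned(const void *addr, size_t align)
{
    return ((uintptr_t)addr % align) == 0;
}

/* Round offset up to the next multiple of align (align must be non-zero). */
static size_t align_up(size_t offset, size_t align)
{
    return (offset + align - 1) / align * align;
}
```

For example, with 4-byte alignment an object that would start at offset 5 is pushed to offset 8, while one that would start at offset 8 stays where it is.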

Plus, in the cc of djgpp or Human68k there is a rule that the compiler puts an underscore at the head of each function name. Therefore, when writing a C function in assembler, we need to put the underscore (_) at its head ourselves. This kind of rule is a technique to avoid name conflicts with library functions. It is said that UNIX also used to attach the underscore until some time ago, but it has almost disappeared now.

Now that the content of the registers has been written out into jmp_buf, it is marked by the next code:

▼ mark the registers (shown again)

    mark_locations_array((VALUE*)save_regs_gc_mark,
                         sizeof(save_regs_gc_mark) / sizeof(VALUE *));

(gc.c)

This is the first time that mark_locations_array() appears. I'll describe it in the next section.

mark_locations_array()

▼ mark_locations_array()

static void
mark_locations_array(x, n)
    register VALUE *x;
    register long n;
{
    while (n--) {
        if (is_pointer_to_heap((void *)*x)) {
            rb_gc_mark(*x);
        }
        x++;
    }
}

(gc.c)

This function marks all the elements of an array, but it differs slightly from the previous mark functions. Until now, every place to be marked was one we knew for sure to hold a VALUE (a pointer to an object). This time, however, what it attempts to mark is the register space, where we must expect things that are not VALUEs as well. To counter that, it first tests whether the value looks like a VALUE (a pointer), and if it does, the value is handled as a pointer. This kind of method is called "conservative GC". It seems to be called conservative because it "tentatively leans toward the safe side".

Next, we'll look at the function that checks whether "it looks like a VALUE". It is is_pointer_to_heap().

is_pointer_to_heap()

▼ is_pointer_to_heap()

static inline int
is_pointer_to_heap(ptr)
    void *ptr;
{
    register RVALUE *p = RANY(ptr);
    register RVALUE *heap_org;
    register long i;

    if (p < lomem || p > himem) return Qfalse;

    /* check if there's the possibility that p is a pointer */
    for (i=0; i < heaps_used; i++) {
        heap_org = heaps[i];
        if (heap_org <= p && p < heap_org + heaps_limits[i] &&
            ((((char*)p)-((char*)heap_org))%sizeof(RVALUE)) == 0)
            return Qtrue;
    }
    return Qfalse;
}

(gc.c)

Briefly explained, it does the following:

Check that the value lies between the lowest and the highest addresses where RVALUEs reside.
Check that it is within the range of one of the heaps.
Make sure the value points to the head of an RVALUE.

Since the mechanism is like this, it is obviously possible that a non-VALUE value is mistakenly handled as a VALUE. But at least it will never fail to find the VALUEs in use. And with this many tests, it will rarely pick up a non-VALUE value unless someone arranges it on purpose. Therefore, considering the benefits we obtain from GC, the compromise is acceptable.
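The three checks are easy to reproduce in a standalone sketch. Here a single static array stands in for the object heap; Cell, HEAP_CELLS and looks_like_cell() are names assumed for illustration, and the first two checks collapse into one range test because there is only one heap.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for RVALUE. */
typedef struct Cell { unsigned long flags; void *a; void *b; } Cell;

#define HEAP_CELLS 8
static Cell heap[HEAP_CELLS];   /* a single heap, for brevity */

/* Conservative test: could this machine word be a pointer to a cell? */
static int looks_like_cell(uintptr_t word)
{
    uintptr_t lo = (uintptr_t)&heap[0];
    uintptr_t hi = (uintptr_t)&heap[HEAP_CELLS];

    if (word < lo || word >= hi) return 0;          /* outside the heap */
    if ((word - lo) % sizeof(Cell) != 0) return 0;  /* not the head of a cell */
    return 1;
}
```

An integer that happens to have the same bit pattern as the address of a cell passes the test just as a real pointer does. That is exactly the false positive the text describes: it costs at most one object kept alive longer than necessary, never a live object lost.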

Register Window

This section is about FLUSH_REGISTER_WINDOWS, which was deferred earlier.

Register windows are a mechanism that lets a part of the machine stack live inside the CPU. In short, it is a cache with a narrowed-down purpose. Nowadays it exists only in the Sparc architecture. VALUEs may be sitting in register windows too, so they also need to be written down into memory.

The content of the macro is like this:

▼ FLUSH_REGISTER_WINDOWS

#if defined(sparc) || defined(__sparc__)
# if defined(linux) || defined(__linux__)
#define FLUSH_REGISTER_WINDOWS  asm("ta  0x83")
# else /* Solaris, not sparc linux */
#define FLUSH_REGISTER_WINDOWS  asm("ta  0x03")
# endif
#else /* Not a sparc */
#define FLUSH_REGISTER_WINDOWS
#endif

(defines.h)

asm(...) is the built-in assembler. However, even though I call it assembler, this instruction named ta is a call to a privileged instruction. In other words, the call is not to the CPU but to the OS. That's why the instruction differs for each OS. The comments mention only Linux and Solaris, but FreeBSD and NetBSD also work on Sparc, so the comment is wrong.

Plus, if it is not Sparc, flushing is unnecessary, so FLUSH_REGISTER_WINDOWS is defined as nothing. Reducing a macro to nothing like this is a very well-known technique that is also convenient when debugging.
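The "macro defined as nothing" technique looks like this in a debugging setting. TRACE and ENABLE_TRACE are hypothetical names chosen for this sketch; nothing like them is claimed to exist in ruby.

```c
#include <stdio.h>

static int trace_calls = 0;

#ifdef ENABLE_TRACE                 /* hypothetical build flag */
#define TRACE(msg) (trace_calls++, fprintf(stderr, "trace: %s\n", (msg)))
#else
#define TRACE(msg)                  /* defined as nothing: the call vanishes */
#endif

static int add(int a, int b)
{
    TRACE("add called");            /* expands to an empty statement by default */
    return a + b;
}
```

Call sites stay untouched in either configuration. Whether the macro does real work is decided in one place, just as FLUSH_REGISTER_WINDOWS does real work only on Sparc.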

Machine Stack

Then, let's go back to the rest of rb_gc(). This time, it marks the VALUEs in the machine stack.

▼ mark the machine stack

    rb_gc_mark_locations(rb_gc_stack_start, (VALUE*)STACK_END);
#if defined(__human68k__)
    rb_gc_mark_locations((VALUE*)((char*)rb_gc_stack_start + 2),
                         (VALUE*)((char*)STACK_END + 2));
#endif

(gc.c)

rb_gc_stack_start seems to be the start address (the end of the stack) and STACK_END seems to be the end address (the top). And rb_gc_mark_locations() is what actually marks the stack space.

rb_gc_mark_locations() is called twice in order to deal with architectures that do not have 4-byte alignment. rb_gc_mark_locations() tries to mark in steps of sizeof(VALUE), so in an environment with 2-byte alignment it may sometimes fail to mark properly. In that case it shifts the range by 2 bytes and marks again.
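Why a second, shifted pass helps can be seen with a byte buffer standing in for the stack. In this sketch (scan_words() and scan_twice() are hypothetical names) a word stored 2 bytes off the word grid is invisible to the first pass and found by the second. With a 4-byte VALUE and 2-byte alignment the two passes cover every possible position; with the 8-byte words of a typical modern machine they would not, so the sketch only demonstrates the principle.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Count the words in buf[offset..len) equal to target, stepping one word at a time. */
static size_t scan_words(const unsigned char *buf, size_t len,
                         size_t offset, uintptr_t target)
{
    size_t hits = 0, pos;
    uintptr_t w;

    for (pos = offset; pos + sizeof(w) <= len; pos += sizeof(w)) {
        memcpy(&w, buf + pos, sizeof(w));   /* safe for unaligned positions */
        if (w == target) hits++;
    }
    return hits;
}

/* Scan twice, the second time shifted by 2 bytes. */
static size_t scan_twice(const unsigned char *buf, size_t len, uintptr_t target)
{
    return scan_words(buf, len, 0, target) + scan_words(buf, len, 2, target);
}
```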

Now, rb_gc_stack_start, STACK_END, rb_gc_mark_locations(): let's examine these three in this order.

Init_stack()

The first thing is rb_gc_stack_start. This variable is set only in Init_stack(). As the prefix Init_ might suggest, this function is called when the ruby interpreter is initialized.

▼ Init_stack()

void
Init_stack(addr)
    VALUE *addr;
{
#if defined(__human68k__)
    extern void *_SEND;
    rb_gc_stack_start = _SEND;
#else
    VALUE start;

    if (!addr) addr = &start;
    rb_gc_stack_start = addr;
#endif
#ifdef HAVE_GETRLIMIT
    {
        struct rlimit rlim;

        if (getrlimit(RLIMIT_STACK, &rlim) == 0) {
            double space = (double)rlim.rlim_cur*0.2;

            if (space > 1024*1024) space = 1024*1024;
            STACK_LEVEL_MAX = (rlim.rlim_cur - space) / sizeof(VALUE);
        }
    }
#endif
}

(gc.c)

Only the part in the middle is important. It defines an arbitrary local variable (which is allocated on the stack) and sets its address to rb_gc_stack_start. The _SEND in the code for __human68k__ is probably a variable defined by a library of the compiler or the system. Naturally, you can presume that it is a contraction of Stack END.
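The idea of taking the address of a local variable as a stack landmark can be tried directly. In this sketch, init_stack_sketch() and stack_extent() are hypothetical names; the distance is computed without assuming in which direction the stack grows, and it is only an approximation of how much stack lies between the two points.

```c
#include <stddef.h>
#include <stdint.h>

static uintptr_t stack_start;   /* recorded once, as early as possible */

/* Remember an address that lies near the bottom of the stack. */
static void init_stack_sketch(void *addr)
{
    stack_start = (uintptr_t)addr;
}

/* Approximate distance in bytes between the recorded start and the current frame. */
static size_t stack_extent(void)
{
    volatile char here = 0;     /* a local of the current frame */
    uintptr_t now = (uintptr_t)&here;

    return now > stack_start ? (size_t)(now - stack_start)
                             : (size_t)(stack_start - now);
}
```

A typical use would pass the address of a local variable of main() to init_stack_sketch() and call stack_extent() from deeper frames later on.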

Meanwhile, the code after that, enclosed in HAVE_GETRLIMIT, appears to check the length of the stack and do mysterious things. This belongs to the same context as what rb_gc_mark() does to prevent stack overflow. We can ignore it.

STACK_END

Next, we'll look at STACK_END, the macro that detects the end of the stack.

▼ STACK_END

#ifdef C_ALLOCA
# define SET_STACK_END VALUE stack_end; alloca(0);
# define STACK_END (&stack_end)
#else
# if defined(__GNUC__) && defined(USE_BUILTIN_FRAME_ADDRESS)
#  define SET_STACK_END  VALUE *stack_end = __builtin_frame_address(0)
# else
#  define SET_STACK_END  VALUE *stack_end = alloca(1)
# endif
# define STACK_END (stack_end)
#endif

(gc.c)

As there are three variations of SET_STACK_END, let's start with the bottom one. alloca() allocates space at the end of the stack and returns it, so its return value and the end address of the stack should be very close. Hence, the return value of alloca() is taken as an approximation of the end of the stack.

Let's go back and look at the one at the top. When the macro C_ALLOCA is defined, alloca() is not natively available; in other words, a compatible function written in C is in use. I mentioned that in this case alloca() internally allocates memory with malloc(). That, however, does not help at all in finding the position of the stack. To deal with this situation, the code assumes that the local variable stack_end of the currently executing function is close to the end of the stack and uses its address (&stack_end).

Plus, this code contains alloca(0), whose purpose is not easy to see. This has been a feature of the alloca() written in C since early times, and it means "please check and free the unused space". Since this is used when doing GC, it attempts to free the memory allocated with alloca() at the same time. But I think it would be better to put it in another macro instead of mixing it into such a place...

And at last, the middle case, which is about __builtin_frame_address(). __GNUC__ is a symbol defined in gcc (the compiler of GNU C). Since the condition is restricted to it, __builtin_frame_address() is a built-in of gcc. You can get the address of the stack frame n levels up with __builtin_frame_address(n). As for __builtin_frame_address(0), it gives the address of the current frame.

rb_gc_mark_locations()

The last one is rb_gc_mark_locations(), the function that actually marks the stack.

▼ rb_gc_mark_locations()

void
rb_gc_mark_locations(start, end)
    VALUE *start, *end;
{
    VALUE *tmp;
    long n;

    if (start > end) {
        tmp = start;
        start = end;
        end = tmp;
    }
    n = end - start + 1;
    mark_locations_array(start, n);
}

(gc.c)

Basically, delegating to mark_locations_array(), which marks a region, is sufficient. What this function itself does is adjust the arguments properly. The adjustment is required because the direction in which the machine stack grows is not fixed. If the machine stack grows toward lower addresses, end is smaller; if it grows toward higher addresses, start is smaller. Therefore the arguments are swapped here when necessary so that the smaller one becomes start.
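The argument adjustment can be isolated into a tiny function. words_between() is a hypothetical name, and long stands in for VALUE; the count is inclusive of both ends, following the n = end - start + 1 of the original.

```c
#include <stddef.h>

/* Number of words in the inclusive range, whichever end comes first. */
static size_t words_between(const long *start, const long *end)
{
    if (start > end) {
        const long *tmp = start;
        start = end;
        end = tmp;
    }
    return (size_t)(end - start) + 1;
}
```

Because the two bounds are normalized first, the caller does not have to know whether the stack grows upward or downward.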

The other root objects

Finally, it marks the built-in VALUE containers of the interpreter.

▼ The other roots

    /* mark the registered global variables */
    for (list = global_List; list; list = list->next) {
        rb_gc_mark(*list->varptr);
    }
    rb_mark_end_proc();
    rb_gc_mark_global_tbl();

    rb_mark_tbl(rb_class_tbl);
    rb_gc_mark_trap_list();

    /* mark the instance variables of true, false, etc if exist */
    rb_mark_generic_ivar_tbl();

    /* mark the variables used in the ruby parser (only while parsing) */
    rb_gc_mark_parser();

(gc.c)

When a VALUE is put into a global variable of C, the user is required to register its address via rb_gc_register_address(). As these addresses are saved in global_List, all of them are marked.

rb_mark_end_proc() marks the procedural objects that are registered via the END statement of Ruby and the like and executed when a program finishes. (END statements will not be described in this book.)

rb_gc_mark_global_tbl() marks the global variable table rb_global_tbl. (See also the next chapter, "Variables and Constants".)

rb_mark_tbl(rb_class_tbl) marks rb_class_tbl, which was discussed in the previous chapter.

rb_gc_mark_trap_list() marks the procedural objects that are registered via Ruby's function-like method trap. (This is related to signals and will also not be described in this book.)

rb_mark_generic_ivar_tbl() marks the instance variable table prepared for non-pointer VALUEs such as true.

rb_gc_mark_parser() marks the semantic stack of the parser. (The semantic stack will be described in Part 2.)

With this, the mark phase is finished.

Sweep

The special treatment for NODE

The sweep phase is the procedure that finds and frees the objects that were not marked. But, for some reason, the objects of type T_NODE are specially treated. Take a look at the next part:

▼ at the beginning of gc_sweep()

static void
gc_sweep()
{
    RVALUE *p, *pend, *final_list;
    int freed = 0;
    int i, used = heaps_used;

    if (ruby_in_compile && ruby_parser_stack_on_heap()) {
        /* If the yacc stack is not on the machine stack,
           do not collect NODE while parsing */
        for (i = 0; i < used; i++) {
            p = heaps[i]; pend = p + heaps_limits[i];
            while (p < pend) {
                if (!(p->as.basic.flags & FL_MARK) &&
                                        BUILTIN_TYPE(p) == T_NODE)
                    rb_gc_mark((VALUE)p);
                p++;
            }
        }
    }

(gc.c)

NODE is an object that expresses a program inside the parser. While compiling, NODEs are put on the stack prepared by a tool named yacc, but that stack is not always on the machine stack. Concretely speaking, when ruby_parser_stack_on_heap() is true, the stack is not on the machine stack. In this case a NODE could be accidentally collected in the middle of its creation, so the objects of type T_NODE are unconditionally marked and protected from being collected while compiling (ruby_in_compile).

Finalizer

Once it has reached here, all the unmarked objects can be freed. However, there is one more thing to do before freeing. In Ruby the freeing of objects can be hooked, and the hooks need to be called. Such a hook is called a "finalizer".

▼ gc_sweep() Middle

    freelist = 0;
    final_list = deferred_final_list;
    deferred_final_list = 0;
    for (i = 0; i < used; i++) {
        int n = 0;

        p = heaps[i]; pend = p + heaps_limits[i];
        while (p < pend) {
            if (!(p->as.basic.flags & FL_MARK)) {
(A)             if (p->as.basic.flags) {
                    obj_free((VALUE)p);
                }
(B)             if (need_call_final && FL_TEST(p, FL_FINALIZE)) {
                    p->as.free.flags = FL_MARK; /* remains marked */
                    p->as.free.next = final_list;
                    final_list = p;
                }
                else {
                    p->as.free.flags = 0;
                    p->as.free.next = freelist;
                    freelist = p;
                }
                n++;
            }
(C)         else if (RBASIC(p)->flags == FL_MARK) {
                /* the objects that need to finalize */
                /* are left untouched */
            }
            else {
                RBASIC(p)->flags &= ~FL_MARK;
            }
            p++;
        }
        freed += n;
    }
    if (freed < FREE_MIN) {
        add_heap();
    }
    during_gc = 0;

(gc.c)

This scans the whole object heap from one end to the other and frees, with obj_free(), the objects on which the FL_MARK flag is not set (A). obj_free() frees only things such as the char[] used by String objects or the VALUE[] used by Array objects; it does not free the RVALUE struct and does not touch basic.flags at all. Therefore, manipulating a struct after obj_free() has been called on it cannot cause a crash.

After freeing the objects, it branches on the FL_FINALIZE flag (B). If FL_FINALIZE is set on an object, at least one finalizer is defined on it, so the object is added to final_list. Otherwise, the object is immediately added to freelist. For an object to be finalized, basic.flags becomes exactly FL_MARK. This clears the struct-type flag (such as T_STRING), so the object can be distinguished from live objects.

Then this phase completes by executing all the finalizers. Notice that the hooked objects have already died by the time the finalizers are called. This means that the hooked objects cannot be used while the finalizers are running.
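The three routes through the loop are easier to see with the details stripped away. The following is a simplified sketch, not the code of ruby: Slot, the F_* flag values and sweep() are assumed for illustration, and the freeing of the memory owned by each object (route A) is left out.

```c
#include <stddef.h>

#define F_MARK     0x1u
#define F_FINALIZE 0x2u
#define F_TYPE     0x4u     /* hypothetical "type" bit: set while the slot is in use */

typedef struct Slot { unsigned flags; struct Slot *next; } Slot;

static Slot *free_list, *final_list;

/* One pass over a heap. Returns the number of slots taken off the live set. */
static int sweep(Slot *heap, size_t n)
{
    int freed = 0;
    size_t i;

    for (i = 0; i < n; i++) {
        Slot *p = &heap[i];

        if (!(p->flags & F_MARK)) {
            if (p->flags & F_FINALIZE) {
                p->flags = F_MARK;      /* only the mark remains: dead, pending */
                p->next = final_list;
                final_list = p;
            }
            else {
                p->flags = 0;
                p->next = free_list;
                free_list = p;
            }
            freed++;
        }
        else if (p->flags == F_MARK) {
            /* finalization was deferred in an earlier cycle: leave untouched */
        }
        else {
            p->flags &= ~F_MARK;        /* survivor: clear the mark for the next cycle */
        }
    }
    return freed;
}
```

Reusing the mark bit as the "pending finalization" state works because a live object always has some type bit set, so flags equal to exactly F_MARK cannot be confused with a marked live object.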

▼ gc_sweep() the rest

    if (final_list) {
        RVALUE *tmp;

        if (rb_prohibit_interrupt || ruby_in_compile) {
            deferred_final_list = final_list;
            return;
        }

        for (p = final_list; p; p = tmp) {
            tmp = p->as.free.next;
            run_final((VALUE)p);
            p->as.free.flags = 0;
            p->as.free.next = freelist;
            freelist = p;
        }
    }
}

(gc.c)

The for in the second half is the main finalizing procedure. The if in the first half handles the case when execution cannot be handed over to the Ruby program for various reasons. The objects whose finalization is deferred will show up in route (C) of the previous listing.

rb_gc_force_recycle()

I'll talk about something a little different at the end. Until now it has been ruby's garbage collector that decides whether or not to collect each object, but there is also a way for users to make it collect a particular object explicitly. It's rb_gc_force_recycle().

▼ rb_gc_force_recycle()

void
rb_gc_force_recycle(p)
    VALUE p;
{
    RANY(p)->as.free.flags = 0;
    RANY(p)->as.free.next = freelist;
    freelist = RANY(p);
}

(gc.c)

Its mechanism is not so special, but I introduced it because you'll see it several times in Part 2 and Part 3.

Discussions

To free spaces

The space allocated by an individual object, say, the char[] of a String, is freed during the sweep phase, but the code that frees the RVALUE struct itself has not appeared yet. The object heap also does not keep track of things such as the number of structs in use. This means that once ruby's object space is allocated, it is never freed.

For example, the mailer I'm creating now temporarily uses almost 40 Mbytes when constructing the threads for 500 mails, and even if most of that space becomes unused as a result of GC, it keeps occupying the 40 Mbytes. Because my machine is reasonably modern, it does not matter if just 40 Mbytes are used. But if this occurs in a server that keeps running, it may become a problem.

However, one also needs to consider that free() does not always mean a decrease in the amount of memory in use. Unless memory is returned to the OS, the amount of memory used by the process never decreases. And depending on the implementation of malloc(), calling free() often does not return memory to the OS.

... I had written so, but just before the deadline of this book, RVALUE came to be freed. The attached CD-ROM also contains the edge ruby, so please check with diff. ... What a sad ending.

Generational GC

Mark & Sweep has a weak point: it "needs to touch the entire object space at least once". It is possible that the idea of Generational GC can make up for this weak point.

The foundation of Generational GC is the empirical rule that "most objects last for either a very long or a very short time". You may be convinced of this by thinking for a few seconds about the programs you write.

Then, thinking based on this rule, one may come up with the idea that "long-lived objects do not need to be marked or swept each and every time". Once an object is judged to be long-lived, it is treated specially and excluded from the targets of GC. Then the number of target objects decreases significantly for both marking and sweeping. For example, if half of the objects are long-lived at a particular GC, the number of target objects is halved.

There's a problem, though. Generational GC is very difficult to do if objects can't be moved, because the long-lived objects need, as I just wrote, to "be treated specially". Generational GC cuts the cost by reducing the number of objects it deals with, so if the generation an object belongs to is not clearly categorized, the result is equivalent to dealing with both generations. Furthermore, ruby's GC is also a conservative GC, so it has to be built in such a way that is_pointer_to_heap() still works. This is particularly difficult.

How to solve this problem is... By the hand of Mr. Kiyama Masato, an implementation of Generational GC for ruby has been published. I'll briefly describe how this patch deals with each problem. This time, by courtesy of Mr. Kiyama, the Generational GC patch and its paper are included on the attached CD-ROM. (See also doc/generational-gc.html)

Then, I shall start the explanation. To ease the explanation, from now on the long-lived objects are called "old-generation objects" and the short-lived objects are called "new-generation objects".

First, about the biggest problem, the special treatment of the old-generation objects. This is resolved by linking only the new-generation objects into a list named newlist. The list is realized by adding members to RVALUE.

Second, about the way to detect the old-generation objects. This is done very simply: the newlist objects that were not garbage collected are removed from newlist. In other words, once an object survives a GC, it is treated as an old-generation object.

Third, about the way to detect references from old-generation objects to new-generation objects. In Generational GC the old-generation objects are, so to speak, kept permanently in the marked state. However, when there are links from the old generation to the new generation, the new-generation objects will not be marked. (Figure 11)

Figure 11: reference over generations

This is not good, so at the moment an old-generation object comes to refer to a new-generation object, the new-generation object must be turned into an old-generation one. The patch modifies the libraries and adds checks wherever this kind of reference can arise.

This is the outline of its mechanism. It was planned that this patch would be included in ruby 1.7, but it has not been included yet. The reason is said to be its speed. There is a guess that the cost of the third point, "check all references", matters, but the precise cause has not been figured out.

Compaction

Could ruby's GC do compaction? Since a VALUE of ruby is a direct pointer to a struct, if the addresses of structs change because of compaction, all the VALUEs that point to the moved structs must be changed.

However, since ruby's GC is a conservative GC, there can be cases where "it is impossible to determine whether or not it is really a VALUE". If the value is rewritten in such a situation and it was not a VALUE after all, something awful will happen. Compaction and conservative GC are really incompatible.

But let's contrive countermeasures one way or another. The first way is to let VALUE be an object ID instead of a pointer (Figure 12). It means sandwiching an indirection layer between VALUE and the struct. In this way VALUEs need not be rewritten, so structs can be moved safely. But as trade-offs, access becomes slower and compatibility with extension libraries is lost.

Figure 12: reference through the object ID
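The object ID scheme can be sketched as a table lookup. Everything in the following is assumed for illustration (ObjId, table, the two spaces and the function names); ruby itself does not work this way, which is the whole point of the discussion above.

```c
#include <stddef.h>

typedef struct Obj { int value; } Obj;
typedef size_t ObjId;                   /* what VALUE would become */

#define TABLE_SIZE 8
static Obj *table[TABLE_SIZE];          /* object ID -> current address */
static Obj from_space[TABLE_SIZE], to_space[TABLE_SIZE];

/* Every access pays for one extra indirection through the table. */
static Obj *deref(ObjId id)
{
    return table[id];
}

/* Move an object into to_space; only the table entry changes, never the ID. */
static void move_object(ObjId id, size_t new_slot)
{
    to_space[new_slot] = *table[id];
    table[id] = &to_space[new_slot];
}
```

Holders of an ID never notice the move, so nothing has to be rewritten, not even words on the stack that merely look like IDs. The price is the extra lookup in deref() on every access.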

Then, the next way is to allow moving a struct only when it is pointed to solely by pointers that "are surely VALUEs" (Figure 13). This method is called Mostly-copying garbage collection. In ordinary programs there are not so many objects reached through values that merely pass is_pointer_to_heap(), so the probability of being able to move the object structs is quite high.

Figure 13: Mostly-copying garbage collection

On top of that, once structs can be moved, the implementation of Generational GC becomes simple at the same time. It seems to be worth a try.

volatile to protect from GC

I wrote that GC takes care of the VALUEs on the stack, so a VALUE held in a local variable should certainly be marked. But in reality, due to the effects of optimization, a variable may disappear. For example, it may disappear in the following case:

VALUE str;
str = rb_str_new2("...");
printf("%s\n", RSTRING(str)->ptr);

Because this code does not access str itself, some compilers keep only str->ptr in memory and delete str. If this happened, str would be collected and the process would crash. There is no choice in this case but to write

volatile VALUE str;

volatile is a reserved word of C, and it has the effect of forbidding optimizations that involve this variable. If volatile is attached in code related to Ruby, you can assume almost certainly that it is there for GC. When I read K&R, I thought "what is the use of this?", and totally didn't expect to see plenty of them in ruby.

Considering these aspects, the promise of conservative GC that "users don't have to care about GC" does not seem to be always true. There was once a discussion that "the Scheme's GC named KSM does not need volatile", but it seems it could not be applied to ruby because its algorithm has a hole.

When to invoke

Inside gc.c

When is GC invoked? Inside gc.c, there are three places that call rb_gc():

ruby_xmalloc()

ruby_xrealloc()

rb_newobj()

As for ruby_xmalloc() and ruby_xrealloc(), it is when memory allocation fails. Doing GC may free memory, so space may become available again. rb_newobj() is in a similar situation: it invokes GC when freelist becomes empty.

Inside the interpreter

There are several places outside gc.c where the interpreter calls rb_gc().

First, in io.c and dir.c, when it runs out of file descriptors and cannot open a file, it invokes GC. If IO objects are garbage collected, their files may be closed and file descriptors may become available.

In ruby.c, rb_gc() is sometimes called after loading a file. As I mentioned in the Sweep section above, this compensates for the fact that NODEs cannot be garbage collected while compiling.

Object Creation

We've finished with GC and have become able to follow Ruby objects from their creation to their freeing. So I'd like to describe object creation here. This is not so much related to GC; rather, it is related a little to the discussion about classes in the previous chapter.

Allocation Framework

We've created objects many times. For example, in this way:

class C
end
C.new()

At this time, how does C.new create an object?

First, C.new is actually Class#new. Its actual body is this:

▼ rb_class_new_instance()

VALUE
rb_class_new_instance(argc, argv, klass)
    int argc;
    VALUE *argv;
    VALUE klass;
{
    VALUE obj;

    obj = rb_obj_alloc(klass);
    rb_obj_call_init(obj, argc, argv);

    return obj;
}

(object.c)

rb_obj_alloc() calls the allocate method on klass. In other words, it calls C.allocate in the example being explained. By default this is Class#allocate, and its actual body is rb_class_allocate_instance().

▼ rb_class_allocate_instance()

static VALUE
rb_class_allocate_instance(klass)
    VALUE klass;
{
    if (FL_TEST(klass, FL_SINGLETON)) {
        rb_raise(rb_eTypeError,
                 "can't create instance of virtual class");
    }
    if (rb_frame_last_func() != alloc) {
        return rb_obj_alloc(klass);
    }
    else {
        NEWOBJ(obj, struct RObject);
        OBJSETUP(obj, klass, T_OBJECT);
        return (VALUE)obj;
    }
}

(object.c)

rb_newobj() is the function that returns an RVALUE taken from freelist. NEWOBJ() is just rb_newobj() with type-casting. OBJSETUP() is a macro that initializes the struct RBasic part; you can think of it as existing only so that nobody forgets to set the FL_TAINT flag.

The rest is going back to rb_class_new_instance(), which then calls rb_obj_call_init(). This function calls initialize on the object just created, and the initialization is complete.

This is summarized as follows:

SomeClass.new = Class#new (rb_class_new_instance)
SomeClass.allocate = Class#allocate (rb_class_allocate_instance)
SomeClass#initialize = Object#initialize (rb_obj_dummy)

I could say that the allocate class method initializes physically and initialize initializes logically. A mechanism like this, in other words a mechanism in which object creation is divided into allocate and initialize while new presides over both, is called the "allocation framework".
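The division of roles can be modeled in plain C with two function pointers per class. This is only an analogy under assumed names (Class, Object, class_new() and so on are hypothetical); in ruby the two steps are ordinary Ruby methods found by method lookup, not struct members.

```c
#include <stdlib.h>

typedef struct Object Object;

typedef struct Class {
    Object *(*allocate)(struct Class *);    /* physical initialization */
    void (*initialize)(Object *, int);      /* logical initialization */
} Class;

struct Object { Class *klass; int state; };

/* Default allocate: get zeroed memory and record the class. */
static Object *default_allocate(Class *k)
{
    Object *o = calloc(1, sizeof(Object));
    if (o == NULL) abort();
    o->klass = k;
    return o;
}

/* Default initialize does nothing, in the spirit of rb_obj_dummy. */
static void default_initialize(Object *o, int arg) { (void)o; (void)arg; }

/* A class-specific initialize that stores its argument. */
static void counter_initialize(Object *o, int arg) { o->state = arg; }

/* "new" presides over both steps, in the spirit of rb_class_new_instance(). */
static Object *class_new(Class *k, int arg)
{
    Object *o = k->allocate(k);
    k->initialize(o, arg);
    return o;
}
```

A class that needs a different memory layout replaces only allocate, and a class that needs different setup replaces only initialize; class_new() stays the same for all of them.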

Creating User Defined Objects

Next, we'll examine the creation of instances of classes defined in extension libraries. As they are called user-defined, their struct is not fixed, and unless we tell ruby how to allocate it, ruby does not know how to create such an object. Let's look at how to tell it.

Data_Wrap_Struct()

Whether a class is user-defined or not, its creation mechanism itself can follow the allocation framework. This means that when defining a new class SomeClass in C, we override both SomeClass.allocate and SomeClass#initialize.

Let's look at the allocate side first. Here the physical initialization is done. What needs to be allocated? I mentioned that an instance of a user-defined class is a pair of a struct RData and a user-prepared struct. We'll assume that the struct is of type struct my. To create a VALUE based on the struct my, you can use Data_Wrap_Struct(). This is how to use it:

struct my *ptr = malloc(sizeof(struct my));  /* arbitrarily allocate in the heap */
VALUE val = Data_Wrap_Struct(data_class, mark_f, free_f, ptr);

data_class is the class that val belongs to, and ptr is the pointer to be wrapped. mark_f is (the pointer to) the function that marks this struct. However, it does not mark ptr itself; it is used when the struct pointed to by ptr contains VALUEs. On the other hand, free_f is the function that frees ptr itself. The argument of both functions is ptr. Going back a little and reading the marking code may help you understand things around here in one shot.

Let's also look at the content of Data_Wrap_Struct().

▼ Data_Wrap_Struct()

#define Data_Wrap_Struct(klass, mark, free, sval) \
    rb_data_object_alloc(klass, sval,             \
                         (RUBY_DATA_FUNC)mark,    \
                         (RUBY_DATA_FUNC)free)

typedef void (*RUBY_DATA_FUNC) _((void*));

(ruby.h)

Mostofitisdelegatedtorb_object_alloc().

▼rb_data_object_alloc()

310VALUE311rb_data_object_alloc(klass,datap,dmark,dfree)312VALUEklass;313void*datap;314RUBY_DATA_FUNCdmark;315RUBY_DATA_FUNCdfree;316{317NEWOBJ(data,structRData);318OBJSETUP(data,klass,T_DATA);319data->data=datap;320data->dfree=dfree;321data->dmark=dmark;322323return(VALUE)data;324}

(gc.c)

Thisisnotcomplicated.Asthesameastheordinaryobjects,itpreparesaRVALUEbyusingNEWOBJ()OBJSETUP(),andsetsthemembers.

Here,let’sgobacktoallocate.We’vesucceededtocreateaVALUEbynow,sotherestisputtingitinanarbitraryfunctionanddefiningthefunctiononaclassbyrb_define_singleton_method().

Data_Get_Struct()

The next thing is initialize. Not only initialize: methods in general need a way to pull the struct my* out of the previously created VALUE. To do that, you can use the Data_Get_Struct() macro.

▼ Data_Get_Struct()

    #define Data_Get_Struct(obj, type, sval) do {\
        Check_Type(obj, T_DATA); \
        sval = (type*)DATA_PTR(obj);\
    } while (0)

    #define DATA_PTR(dta) (RDATA(dta)->data)

(ruby.h)

As you see, it just takes the pointer (to struct my) from a member of RData. This is simple. Check_Type() just checks the struct type.

The Issues of the Allocation Framework

So, I've explained innocently until now, but actually the current allocation framework has a fatal issue. I just described that the object created with allocate is handed to initialize and the other methods, but if the object passed there was created with the allocate of a different class, it is a very serious problem. For example, if an object created with the default Object.allocate (Class#allocate) is passed to a method of String, this causes a serious problem. That is because even though the methods of String are written on the assumption that a struct of type struct RString is given, the given object is actually a struct RObject. In order to avoid such a situation, the object created with C.allocate must be passed only to the methods of C or its subclasses.

Of course, this is always true when things are done ordinarily. As C.allocate creates an instance of the class C, it is not passed to the methods of other classes. As an exception, it may be passed to a method of Object, but the methods of Object do not depend on the struct type.

However, what if things are not done ordinarily? Since C.allocate is exposed at the Ruby level, the definition of allocate can be moved to another class by making use of alias, super, or the like, though I've not described them yet. In this way, you can create an object whose class is String but whose actual struct type is struct RObject. It means that you can freely bring ruby down from the Ruby level. This is a problem.

The source of the issue is that allocate is exposed to the Ruby level as a method. Conversely, a solution is to define the content of allocate on the class in a way that is anything but a method. So,

    rb_define_allocator(rb_cMy, my_allocate);

an alternative like this is currently under discussion.

The original work is Copyright © 2002-2004 Minero AOKI. Translated by Vincent ISAMBART and Clifford Escobar CAOILE. This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 2.5 License.


Translated by Vincent ISAMBART

Chapter 6: Variables and constants

Outline of this chapter

Ruby variables

In Ruby there are quite a lot of different types of variables and constants. Let's line them up, starting from the largest scope.

Global variables
Constants
Class variables
Instance variables
Local variables

Instance variables were already explained in chapter 2 “Objects”. In this chapter we'll talk about:

Global variables
Class variables
Constants

We will talk about local variables in the third part of the book.

API for variables

The object of this chapter's analysis is variable.c. Let me first introduce the APIs that serve as its entry points.

    VALUE rb_iv_get(VALUE obj, char *name)
    VALUE rb_ivar_get(VALUE obj, ID name)
    VALUE rb_iv_set(VALUE obj, char *name, VALUE val)
    VALUE rb_ivar_set(VALUE obj, ID name, VALUE val)

These are the APIs to access instance variables, which have already been described. They are shown here again because their definitions are in variable.c.

    VALUE rb_cv_get(VALUE klass, char *name)
    VALUE rb_cvar_get(VALUE klass, ID name)
    VALUE rb_cv_set(VALUE klass, char *name, VALUE val)
    VALUE rb_cvar_set(VALUE klass, ID name, VALUE val)

These functions are the API for accessing class variables. Class variables belong directly to classes, so the functions take a class as parameter. They come in two groups, depending on whether their name starts with rb_Xv or rb_Xvar. The difference lies in the type of the variable “name”. The ones with a shorter name are generally easier to use because they take a char*. The ones with a longer name are more for internal use, as they take an ID.

    VALUE rb_const_get(VALUE klass, ID name)
    VALUE rb_const_get_at(VALUE klass, ID name)
    VALUE rb_const_set(VALUE klass, ID name, VALUE val)

These functions are for accessing constants. Constants also belong to classes, so they take a class as parameter. rb_const_get() follows the superclass chain, whereas rb_const_get_at() does not (it just looks in klass).
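The difference between the two lookups can be observed from Ruby itself. This is a sketch under an assumption: the optional second argument of Module#const_get and Module#const_defined? comes from later Ruby versions than the one analyzed here, and it is used only as a rough analogue of rb_const_get_at(). ConstParent and ConstChild are hypothetical names.

```ruby
class ConstParent
  Const = "ok"
end

class ConstChild < ConstParent
end

# Follows the superclass chain, in the spirit of rb_const_get().
inherited = ConstChild.const_get(:Const)

# Looks only in the receiver, in the spirit of rb_const_get_at().
own_only = ConstChild.const_defined?(:Const, false)
```

Here inherited is the string from ConstParent, while own_only is false because ConstChild itself defines nothing.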

    struct global_entry *rb_global_entry(ID name)
    VALUE rb_gv_get(char *name)
    VALUE rb_gvar_get(struct global_entry *ent)
    VALUE rb_gv_set(char *name, VALUE val)
    VALUE rb_gvar_set(struct global_entry *ent, VALUE val)

These last functions are for accessing global variables. They are a little different from the others due to the use of struct global_entry. We'll explain this while describing the implementation.

Points of this chapter

The most important point when talking about variables is “Where and how are variables stored?”, in other words: data structures.

The second most important matter is how we search for the values. The scopes of Ruby variables and constants are quite complicated because variables and constants are sometimes inherited, sometimes looked for outside of the local scope... To get a better understanding, you should think by comparing the implementation with the specification, like “It behaves like this in this situation, so its implementation couldn't be other than this!”

Class variables

Class variables are variables that belong to classes. In Java or C++ they are called static variables. They can be accessed from both the class and its instances. But “from an instance” or “from the class” is information only available in the evaluator, and we do not have one for the moment. So from the C level it's like having no access range. We'll just focus on the way these variables are stored.

Reading

The functions to get a class variable are rb_cvar_get() and rb_cv_get(). The function with the longer name takes an ID as parameter and the one with the shorter name takes a char*. Because the one taking an ID seems closer to the internals, we'll look at it.

▼ rb_cvar_get()

    VALUE
    rb_cvar_get(klass, id)
        VALUE klass;
        ID id;
    {
        VALUE value;
        VALUE tmp;

        tmp = klass;
        while (tmp) {
            if (RCLASS(tmp)->iv_tbl) {
                if (st_lookup(RCLASS(tmp)->iv_tbl,id,&value)) {
                    if (RTEST(ruby_verbose)) {
                        cvar_override_check(id, tmp);
                    }
                    return value;
                }
            }
            tmp = RCLASS(tmp)->super;
        }

        rb_name_error(id,"uninitialized class variable %s in %s",
                      rb_id2name(id), rb_class2name(klass));
        return Qnil;                /* not reached */
    }

(variable.c)

This function reads a class variable in klass.

Error management functions like rb_raise() can be simply ignored, as I said before. The rb_name_error() that appears this time is a function for raising an exception, so it can be ignored for the same reasons. In ruby, you can assume that all functions ending with _error raise an exception.

After removing all this, we can see that it just follows klass's superclass chain one by one and searches in each iv_tbl. ... At this point, I'd like you to say “What? iv_tbl is the instance variables table, isn't it?” As a matter of fact, class variables are stored in the instance variable table.

We can do this because when creating IDs, the whole name of the variable is taken into account, including the prefix: rb_intern() will return different IDs for “@var” and “@@var”. At the Ruby level, the variable type is determined only by the prefix, so there's no way to access a class variable called @var from Ruby.
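The visible consequences of this design can be checked from Ruby. The sketch below only shows Ruby-level behavior; it assumes nothing about how a current interpreter lays out its tables internally, which may differ from the version analyzed here. CvarBase and CvarDerived are hypothetical names.

```ruby
class CvarBase
  @@count = 1     # class variable
  @count  = 100   # instance variable of the class object itself

  def self.own_ivar
    @count
  end
end

class CvarDerived < CvarBase
end

# Found by walking up the superclass chain, as rb_cvar_get() does.
from_derived = CvarDerived.class_variable_get(:@@count)

# The two names differ only by prefix, yet they never collide.
names = [CvarBase.class_variables, CvarBase.instance_variables]
```

The class variable is reachable from the subclass, whereas the class-level instance variable @count belongs to CvarBase alone.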

Constants

It's a little abrupt, but I'd like you to remember the members of struct RClass. If we exclude the basic member, struct RClass contains:

    VALUE super
    struct st_table *iv_tbl
    struct st_table *m_tbl

Then, considering that:

1. constants belong to a class
2. we can't see any table dedicated to constants in struct RClass
3. class variables and instance variables are both in iv_tbl

Could it mean that the constants are also...

Assignment

rb_const_set() is a function to set the value of constants: it sets the constant id in the class klass to the value val.

▼ rb_const_set()

    void
    rb_const_set(klass, id, val)
        VALUE klass;
        ID id;
        VALUE val;
    {
        mod_av_set(klass, id, val, Qtrue);
    }

(variable.c)

mod_av_set() does all the hard work:

▼ mod_av_set()

    static void
    mod_av_set(klass, id, val, isconst)
        VALUE klass;
        ID id;
        VALUE val;
        int isconst;
    {
        char *dest = isconst ? "constant" : "class variable";

        if (!OBJ_TAINTED(klass) && rb_safe_level() >= 4)
            rb_raise(rb_eSecurityError, "Insecure: can't set %s", dest);
        if (OBJ_FROZEN(klass)) rb_error_frozen("class/module");
        if (!RCLASS(klass)->iv_tbl) {
            RCLASS(klass)->iv_tbl = st_init_numtable();
        }
        else if (isconst) {
            if (st_lookup(RCLASS(klass)->iv_tbl, id, 0) ||
                (klass == rb_cObject && st_lookup(rb_class_tbl, id, 0))) {
                rb_warn("already initialized %s %s", dest, rb_id2name(id));
            }
        }

        st_insert(RCLASS(klass)->iv_tbl, id, val);
    }

(variable.c)

Once again, you can ignore the error and warning checks (rb_raise(), rb_error_frozen() and rb_warn()). Here's what's left:

▼ mod_av_set() (only the important part)

    if (!RCLASS(klass)->iv_tbl) {
        RCLASS(klass)->iv_tbl = st_init_numtable();
    }
    st_insert(RCLASS(klass)->iv_tbl, id, val);

We're now sure constants also reside in the instance variable table. It means that in the iv_tbl of struct RClass, the following are mixed together:

1. the class's own instance variables
2. class variables
3. constants
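To make the “one table for everything” idea concrete, here is a toy model written in Ruby. It is only a sketch: ToyClass, table_set and table_get are hypothetical names, a Hash stands in for st_table, and Symbols stand in for IDs. table_get mimics the superclass-chain search used for class variables and constants.

```ruby
class ToyClass
  attr_reader :super_class, :iv_tbl

  def initialize(super_class = nil)
    @super_class = super_class
    @iv_tbl = nil                 # created lazily, like in mod_av_set()
  end

  # The "important part" of mod_av_set(): create the table, then insert.
  def table_set(name, value)
    @iv_tbl ||= {}
    @iv_tbl[name] = value
  end

  # Walk up the superclass chain, searching each table in turn.
  def table_get(name)
    klass = self
    while klass
      tbl = klass.iv_tbl
      return tbl[name] if tbl && tbl.key?(name)
      klass = klass.super_class
    end
    raise NameError, "uninitialized #{name}"
  end
end
```

Because the full name, prefix included, is the key, entries such as :Const, :@@cvar and :@ivar can sit side by side in the same table without clashing.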

Reading

We now know how the constants are stored. We'll now check how they really work.

rb_const_get()

We'll now look at rb_const_get(), the function to read a constant. This function returns the constant referred to by id from the class klass.

▼ rb_const_get()

    VALUE
    rb_const_get(klass, id)
        VALUE klass;
        ID id;
    {
        VALUE value, tmp;
        int mod_retry = 0;

        tmp = klass;
      retry:
        while (tmp) {
            if (RCLASS(tmp)->iv_tbl &&
                st_lookup(RCLASS(tmp)->iv_tbl,id,&value)) {
                return value;
            }
            if (tmp == rb_cObject && top_const_get(id, &value))
                return value;
            tmp = RCLASS(tmp)->super;
        }
        if (!mod_retry && BUILTIN_TYPE(klass) == T_MODULE) {
            mod_retry = 1;
            tmp = rb_cObject;
            goto retry;
        }

        /* Uninitialized constant */
        if (klass && klass != rb_cObject) {
            rb_name_error(id, "uninitialized constant %s at %s",
                          rb_id2name(id),
                          RSTRING(rb_class_path(klass))->ptr);
        }
        else { /* global_uninitialized */
            rb_name_error(id, "uninitialized constant %s",rb_id2name(id));
        }
        return Qnil;                /* not reached */
    }

(variable.c)

There's a lot of code in the way. First, we should at least remove the rb_name_error() in the second half. In the middle, what's around mod_retry seems to be a special handling for modules. Let's also remove that for the time being. The function gets reduced to this:

▼ rb_const_get (simplified)

    VALUE
    rb_const_get(klass, id)
        VALUE klass;
        ID id;
    {
        VALUE value, tmp;

        tmp = klass;
        while (tmp) {
            if (RCLASS(tmp)->iv_tbl &&
                st_lookup(RCLASS(tmp)->iv_tbl,id,&value)) {
                return value;
            }
            if (tmp == rb_cObject && top_const_get(id, &value))
                return value;
            tmp = RCLASS(tmp)->super;
        }
    }

Now it should be pretty easy to understand. The function searches for the constant in iv_tbl while climbing klass's superclass chain. That means:

    class A
      Const = "ok"
    end
    class B < A
      p(Const)    # can be accessed
    end

The only problem remaining is top_const_get(). This function is only called for rb_cObject, so top must mean “top-level”. If you don't remember, at the top-level, the class is Object. It is the same idea as “inside the class statement defining C, the class becomes C”: at the top-level, the class is Object.

    # the class of the top-level is Object
    class A
      # the class is A
      class B
        # the class is B
      end
    end

So top_const_get() probably does something specific to the top level.

top_const_get()

Let's look at this top_const_get() function. It looks up the constant id, writes its value to klassp, and returns.

▼ top_const_get()

    static int
    top_const_get(id, klassp)
        ID id;
        VALUE *klassp;
    {
        /* pre-defined class */
        if (st_lookup(rb_class_tbl, id, klassp)) return Qtrue;

        /* autoload */
        if (autoload_tbl && st_lookup(autoload_tbl, id, 0)) {
            rb_autoload_load(id);
            *klassp = rb_const_get(rb_cObject, id);
            return Qtrue;
        }
        return Qfalse;
    }

(variable.c)

rb_class_tbl was already mentioned in chapter 4 “Classes and modules”. It's the table for storing the classes defined at the top-level. Built-in classes like String or Array, for example, have an entry in it. That's why we should not forget to search in this table when looking for top-level constants.

The next block is related to autoloading. It is designed so that a library can be registered to be loaded automatically when a particular top-level constant is accessed for the first time. This can be used like this:

    autoload(:VeryBigClass, "verybigclass")   # VeryBigClass is defined in it

After this, when VeryBigClass is accessed for the first time, the verybigclass library is loaded (with require). As long as VeryBigClass is defined in the library, execution can continue smoothly. It's an efficient approach when a library is big and a lot of time is spent on loading it.
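The lazy loading can be watched in action with a throwaway library file. This is a sketch under assumptions: the temporary file and the AUTOLOAD_DEMO_PATH constant are invented for the demonstration, and Module#autoload? is used only to observe whether the registration is still pending.

```ruby
require "tmpdir"

# Write a tiny library into a temporary directory (made up for this demo).
AUTOLOAD_DEMO_PATH = File.join(Dir.mktmpdir, "verybigclass.rb")
File.write(AUTOLOAD_DEMO_PATH, "class VeryBigClass; end\n")

# Register the constant: nothing is loaded yet.
Object.autoload(:VeryBigClass, AUTOLOAD_DEMO_PATH)
pending_before = Object.autoload?(:VeryBigClass)   # the path, while still pending

loaded_name = VeryBigClass.name                    # first access triggers the load
pending_after = Object.autoload?(:VeryBigClass)    # nil once the load has happened
```

The file is read only when the constant is first touched, which is the point of the mechanism.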

This autoload is processed by rb_autoload_xxxx(). We won't discuss autoload further in this chapter because there will probably be a big change in how it works soon.

(translator's note: The way autoload works did change in 1.8: autoloaded constants do not need to be defined at top-level anymore).

Other classes?

But where did the code for looking up constants in other classes end up? After all, constants are first looked up in the outside classes, then in the superclasses.

In fact, we do not yet have enough knowledge to look at that. The outside classes change depending on the location in the program. In other words, it depends on the program context. So we first need to understand how the internal state of the evaluator is handled. Specifically, this search in other classes is done in the ev_const_get() function of eval.c. We'll look at it and finish with the constants in the third part of the book.

Global variables

General remarks

Global variables can be accessed from anywhere. Or, put the other way around, there is no need to restrict access to them. Because they are not attached to any context, the table only has to be in one place, and there's no need to do any check. Therefore the implementation is very simple.

But there is still quite a lot of code. The reason for this is that the global variables of Ruby are equipped with some gimmicks which make it hard to regard them as mere variables. Features like the following are only available for global variables:

you can “hook” access to global variables
you can alias them with alias

Let's explain this simply.

Aliases of variables

    alias $newname $oldname

After this, you can use $newname instead of $oldname. alias for variables is mainly a counter-measure for “symbol variables”. “Symbol variables” are variables inherited from Perl like $= or $0. $= decides whether upper and lower case letters should be differentiated during string comparison. $0 shows the name of the main Ruby program. There are some other symbol variables, but anyway, as their names are only one character long, they are difficult to remember for people who don't know Perl. So aliases were created to make them a little easier to understand.
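A short sketch of what the alias means in practice; the variable names are the placeholder ones from the text, not special variables.

```ruby
$oldname = "first"
alias $newname $oldname      # both names now refer to the same variable

$newname = "second"          # assigning through the alias...
seen_through_old = $oldname  # ...is visible through the original name
```

Reading and writing through either name touches one and the same variable, rather than a copy made at alias time.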

That said, symbol variables are currently not recommended, and are being moved one by one to singleton methods of suitable modules. The current school of thought is that $= and others will be abolished in 2.0.

Hooks

You can “hook” reads and writes of global variables.

Although hooks can also be set at the Ruby level, I think their purpose is rather to prepare, at the C level, special variables for system use like $KCODE. $KCODE is the variable containing the encoding the interpreter currently uses to handle strings. Essentially only special strings like "EUC" or "UTF8" can be assigned to it, but this is too bothersome, so it is designed so that "e" or "u" can also be used.

    p($KCODE)      # "NONE" (default)
    $KCODE = "e"
    p($KCODE)      # "EUC"
    $KCODE = "u"
    p($KCODE)      # "UTF8"

Knowing that you can hook assignment of global variables, you should easily understand how this can be done. By the way, $KCODE's K comes from “kanji” (the name of Chinese characters in Japanese).
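At the Ruby level, a hook on assignment can be set with Kernel#trace_var. The sketch below only observes assignments; unlike the C-level setter behind $KCODE it does not rewrite the assigned value. $watched and $hook_log are hypothetical variables introduced for illustration.

```ruby
$hook_log = []

# The block is called with the new value each time $watched is assigned.
trace_var(:$watched) { |new_value| $hook_log << new_value }

$watched = 1
$watched = "two"

untrace_var(:$watched)   # remove the hook again
$watched = :three        # no longer recorded
```

After this runs, $hook_log holds the two values assigned while the hook was active.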

You might say that even with alias or hooks, global variables just aren't used much, so it's functionality that doesn't really matter. It's adequate not to talk much about unused functions, and I'd like to use more pages for the analysis of the parser and evaluator. That's why I'll proceed with the explanation below with a half-heartedness of about 85%.

Data structure

I said that the point when looking at how variables work is the way they are stored. First, I'd like you to firmly grasp the structure used by global variables.

▼ Data structure for global variables

    static st_table *rb_global_tbl;

    struct global_entry {
        struct global_variable *var;
        ID id;
    };

    struct global_variable {
        int   counter;      /* reference counter */
        void *data;         /* value of the variable */
        VALUE (*getter)();  /* function to get the variable */
        void  (*setter)();  /* function to set the variable */
        void  (*marker)();  /* function to mark the variable */
        int block_trace;
        struct trace_var *trace;
    };

(variable.c)

rb_global_tbl is the main table. All global variables are stored in this table. The keys of this table are of course variable names (ID). A value is expressed by a struct global_entry and a struct global_variable (figure 1).

Figure 1: Global variables table at execution time

The structure representing the variables is split in two to be able to create aliases. When an alias is established, two global_entry structs point to the same struct global_variable.

It's at this time that the reference counter (the counter member of struct global_variable) is necessary. I explained the general idea of a reference counter in the previous chapter, “Garbage collection”. Reviewing it briefly, when a new reference to the structure is made, the counter is incremented by 1. When the reference is not used anymore, the counter is decreased by 1. When the counter reaches 0, the structure is no longer useful, so free() can be called.

When hooks are set at the Ruby level, a list of struct trace_var is stored in the trace member of struct global_variable, but I won't talk about it, and omit struct trace_var.

Reading

You can get a general understanding of global variables just by looking at how they are read. The functions for reading them are rb_gv_get() and rb_gvar_get().

▼ rb_gv_get() rb_gvar_get()

    VALUE
    rb_gv_get(name)
        const char *name;
    {
        struct global_entry *entry;

        entry = rb_global_entry(global_id(name));
        return rb_gvar_get(entry);
    }

    VALUE
    rb_gvar_get(entry)
        struct global_entry *entry;
    {
        struct global_variable *var = entry->var;
        return (*var->getter)(entry->id, var->data, var);
    }

(variable.c)

A substantial part of the content seems to revolve around the rb_global_entry() function, but that does not prevent us from understanding what's going on. global_id() is a function that converts a char* to an ID and checks that it is the ID of a global variable. (*var->getter)(...) is of course a function call using the function pointer var->getter. If p is a function pointer, (*p)(arg) calls the function.

But the main part is still rb_global_entry().

▼ rb_global_entry()

    struct global_entry*
    rb_global_entry(id)
        ID id;
    {
        struct global_entry *entry;

        if (!st_lookup(rb_global_tbl, id, &entry)) {
            struct global_variable *var;
            entry = ALLOC(struct global_entry);
            st_add_direct(rb_global_tbl, id, entry);
            var = ALLOC(struct global_variable);
            entry->id = id;
            entry->var = var;
            var->counter = 1;
            var->data = 0;
            var->getter = undef_getter;
            var->setter = undef_setter;
            var->marker = undef_marker;

            var->block_trace = 0;
            var->trace = 0;
        }
        return entry;
    }

(variable.c)

The main work is done just by the st_lookup() at the beginning. What's done afterwards is only creating (and storing) a new entry. Since an entry is automatically created when a nonexistent global variable is accessed, rb_global_entry() will never return NULL.

This was mainly done for speed. When the parser finds a global variable, it gets the corresponding struct global_entry. When reading the value of the variable, the value is just obtained from the entry (using rb_gvar_get()).

Let's now continue a little with the code that follows. var->getter and the others are set to undef_xxxx. undef probably means that they are the setter/getter/marker for a global variable whose state is undefined.

undef_getter() just shows a warning and returns nil, as even undefined global variables can be read. undef_setter() is a little bit interesting, so let's look at it.

▼ undef_setter()

    static void
    undef_setter(val, id, data, var)
        VALUE val;
        ID id;
        void *data;
        struct global_variable *var;
    {
        var->getter = val_getter;
        var->setter = val_setter;
        var->marker = val_marker;

        var->data = (void*)val;
    }

(variable.c)

val_getter() takes the value from var->data and returns it. val_setter() just puts a value in var->data. Setting handlers this way means we need no special handling for undefined variables (figure 2). Skillfully done, isn't it?
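The handler-swapping trick and the entry/variable split can be modeled in a few lines of Ruby. This is a toy sketch, not a transcription of variable.c: ToyGlobals, ToyEntry and ToyVariable are hypothetical names, lambdas stand in for function pointers, the undefined getter omits the warning, and alias_var ignores the cleanup the real alias code has to do.

```ruby
ToyVariable = Struct.new(:counter, :data, :getter, :setter)
ToyEntry    = Struct.new(:var, :id)

class ToyGlobals
  VAL_GETTER   = ->(var) { var.data }
  VAL_SETTER   = ->(var, val) { var.data = val }
  UNDEF_GETTER = ->(_var) { nil }          # the real one also warns
  UNDEF_SETTER = lambda do |var, val|
    # First assignment: switch to the ordinary handlers, then store.
    var.getter = VAL_GETTER
    var.setter = VAL_SETTER
    var.data   = val
  end

  def initialize
    @tbl = {}
  end

  # Like rb_global_entry(): creates the entry on demand, never returns nil.
  def entry(id)
    @tbl[id] ||= ToyEntry.new(
      ToyVariable.new(1, nil, UNDEF_GETTER, UNDEF_SETTER), id)
  end

  def get(id)
    var = entry(id).var
    var.getter.call(var)
  end

  def set(id, val)
    var = entry(id).var
    var.setter.call(var, val)
  end

  # Simplified alias: the new entry shares the old entry's variable.
  def alias_var(new_id, old_id)
    var = entry(old_id).var
    var.counter += 1
    @tbl[new_id] = ToyEntry.new(var, new_id)
  end
end
```

Neither get nor set ever tests “is this variable defined?”; the first assignment simply replaces the handlers, which is the design the text praises.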

Figure 2: Setting and consultation of global variables


Translated by Clifford Escobar CAOILE & ocha-

Chapter 7: Security

Fundamentals

I say security, but I don't mean passwords or encryption. The Ruby security feature is used for handling untrusted objects in an environment like CGI programming.

For example, when you want to convert a string representing a number into an integer, you can use the eval method. However, eval is a method that “runs a string as a Ruby program.” If you eval a string from an unknown person on the network, it is very dangerous. However, for the programmer to fully differentiate between safe and unsafe things is very tiresome and cumbersome. Therefore, it is certain that a mistake will be made. So let us make it part of the language: that was the reasoning behind this feature.

So then, how does Ruby protect us from that sort of danger? Causes of dangerous operations, for example, opening unintended files, are roughly divided into two groups:

Dangerous data
Dangerous code

For the former, the code that handles these values is created by the programmers themselves, so it is (relatively) safe. For the latter, the program code absolutely cannot be trusted.

Because the solution is vastly different between the two causes, it is important to differentiate them by level. These are called security levels. The Ruby security level is represented by the $SAFE global variable. The value ranges from a minimum of 0 to a maximum of 4. When the variable is assigned, the level increases. Once the level is raised it can never be lowered. And for each level, the operations are limited.

I will not explain levels 2 or 3. Level 0 is the normal program environment, in which the security system is not running. Level 1 handles dangerous values; level 2 currently has no use. Level 4 handles dangerous code. We can skip 0 and move on to explain levels 1 and 4 in detail.

Level 1

This level is for dangerous data, for example, in normal CGI applications, etc.

A per-object “tainted mark” serves as the basis for the Level 1 implementation. All objects read in externally are marked tainted, and any attempt to eval or File.open with a tainted object will cause an exception to be raised and the attempt will be stopped.

This tainted mark is “infectious”. For example, when taking a part of a tainted string, that part is also tainted.

Level 4

This level is for dangerous programs, for example, running external (unknown) programs, etc.

At level 1, operations and the data they use are checked, but at level 4, operations themselves are restricted. For example, exit, file I/O, thread manipulation, redefining methods, etc. Of course, the tainted mark information is used, but basically the operations are the criteria.

Unit of Security

$SAFE looks like a global variable but is in actuality a thread local variable. In other words, Ruby's security system works in units of threads. In Java and .NET, rights can be set per component (object), but Ruby does not implement that. The assumed main target was probably CGI.

Therefore, if one wants to raise the security level of one part of the program, that part should be made into a different thread and have its security level raised. I haven't yet explained how to create a thread, but I will show an example here:

    # Raise the security level in a different thread
    p($SAFE)   # 0 is the default
    Thread.fork {    # Start a different thread
        $SAFE = 4    # Raise the level
        eval(str)    # Run the dangerous program
    }
    p($SAFE)   # Outside of the block, the level is still 0

Reliability of $SAFE

Even with the spreading of tainted marks and the restriction of operations implemented, ultimately it is all still handled manually. In other words, the internal libraries and external libraries must all cooperate fully; if they don't, the taint will stop spreading partway and the security will be lost. And actually this kind of hole is often reported. For this reason, this writer does not wholly trust it.

That is not to say, of course, that all Ruby programs are dangerous. Even at $SAFE=0 it is possible to write a secure program, and even at $SAFE=4 it is possible to write a program that fits your whim. However, one cannot put too much confidence in $SAFE (yet).

In the first place, functionality and security do not go together. It is common sense that adding new features can make holes easier to open. Therefore it is prudent to think that ruby can probably be dangerous.

Implementation

From now on, we'll start to look into its implementation. In order to wholly grasp the security system of ruby, we have to look at “where checks are made” rather than at its mechanism. However, this time we don't have enough pages to do that, and just listing them is not interesting. Therefore, in this chapter, I'll only describe the mechanism used for security checks. The APIs for checking are mainly these two:

rb_secure(n): If the current level is greater than or equal to n, it raises SecurityError.
SafeStringValue(): If the level is greater than or equal to 1 and the string is tainted, it raises an exception.

We won't read SafeStringValue() here.

Tainted Mark

The taint mark is, to be concrete, the FL_TAINT flag, which is set in basic->flags, and what is used to spread it is the OBJ_INFECT() macro. Here is its usage.

    OBJ_TAINT(obj)            /* set FL_TAINT to obj */
    OBJ_TAINTED(obj)          /* check if FL_TAINT is set to obj */
    OBJ_INFECT(dest, src)     /* infect FL_TAINT from src to dest */

Since OBJ_TAINT() and OBJ_TAINTED() can be assumed not important, let's briefly look over only OBJ_INFECT().

▼ OBJ_INFECT

    #define OBJ_INFECT(x,s) do {                             \
        if (FL_ABLE(x) && FL_ABLE(s))                        \
            RBASIC(x)->flags |= RBASIC(s)->flags & FL_TAINT; \
    } while (0)

(ruby.h)

FL_ABLE() checks if the argument VALUE is a pointer or not. If both objects are pointers (which means each of them has its flags member), it propagates the flag.
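The bit manipulation inside OBJ_INFECT() can be imitated with plain integers. This is a toy model only: ToyObject, toy_fl_able? and toy_obj_infect are hypothetical names, the bit position of TOY_FL_TAINT is chosen arbitrarily, and no real taint API of any Ruby version is involved.

```ruby
TOY_FL_TAINT = 1 << 8          # bit position chosen arbitrarily for this sketch

ToyObject = Struct.new(:flags) do
  def tainted?
    (flags & TOY_FL_TAINT) != 0
  end
end

# Stand-in for FL_ABLE(): immediates such as small integers carry no flags.
def toy_fl_able?(obj)
  obj.is_a?(ToyObject)
end

# Copy only the taint bit from src into dest, leaving other bits alone.
def toy_obj_infect(dest, src)
  return unless toy_fl_able?(dest) && toy_fl_able?(src)
  dest.flags |= src.flags & TOY_FL_TAINT
end
```

Masking with the taint bit before OR-ing is what keeps unrelated flags of the source from leaking into the destination.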

$SAFE

▼ ruby_safe_level

    int ruby_safe_level = 0;

    static void
    safe_setter(val)
        VALUE val;
    {
        int level = NUM2INT(val);

        if (level < ruby_safe_level) {
            rb_raise(rb_eSecurityError, "tried to downgrade safe level from %d to %d",
                     ruby_safe_level, level);
        }
        ruby_safe_level = level;
        curr_thread->safe = level;
    }

(eval.c)

The substance of $SAFE is ruby_safe_level in eval.c. As I wrote previously, $SAFE is local to each thread, so it needs to be written in eval.c, where the implementation of threads is located. In other words, it is in eval.c only because of the restrictions of C; essentially it could be located elsewhere.

safe_setter() is the setter of the $SAFE global variable. Because this function is the only way to access it from the Ruby level, the security level cannot be lowered.

However, as you can see, because static is not attached to ruby_safe_level, from the C level you can ignore the interface and modify the security level.

rb_secure()

▼ rb_secure()

    void
    rb_secure(level)
        int level;
    {
        if (level <= ruby_safe_level) {
            rb_raise(rb_eSecurityError, "Insecure operation `%s' at level %d",
                     rb_id2name(ruby_frame->last_func), ruby_safe_level);
        }
    }

(eval.c)

If the current safe level is greater than or equal to level, this raises SecurityError. It's simple.
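The two checks, “never downgrade” in the setter and “raise at or above level n” in rb_secure(), fit in a small Ruby model. ToySafeLevel is a hypothetical class for illustration; it does not touch the real $SAFE, and the per-thread aspect is left out.

```ruby
class ToySafeLevel
  attr_reader :level

  def initialize
    @level = 0
  end

  # Mirrors safe_setter(): the level may stay or rise, never fall.
  def level=(new_level)
    if new_level < @level
      raise SecurityError,
            "tried to downgrade safe level from #{@level} to #{new_level}"
    end
    @level = new_level
  end

  # Mirrors rb_secure(n): refuse when the current level is n or higher.
  def secure(n)
    raise SecurityError, "Insecure operation at level #{@level}" if n <= @level
  end
end
```

An operation guarded by secure(4) passes at levels 0 to 3 and is refused once the level has been raised to 4.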


Chapter 8: Ruby Language Details

I'll talk about the details of Ruby's syntax and evaluation which haven't been covered yet. I didn't intend a complete exposition, so I left out everything which doesn't come up in this book. That's why you won't be able to write Ruby programs just by reading this. A complete exposition can be found in the Ruby reference manual (archives/ruby-refm.tar.gz on the attached CD-ROM).

Readers who know Ruby can skip over this chapter.

Literals

The expressiveness of Ruby's literals is extremely high. In my opinion, what makes Ruby a script language is firstly the existence of the top level, and secondly the expressiveness of its literals. Thirdly, it might be the richness of its standard library.

A single literal already has enormous power, and even more so when multiple literals are combined. In particular, the ability to create complex literals in which hash and array literals are combined is the biggest advantage of Ruby's literals. One can straightforwardly write, for instance, a hash of arrays of regular expressions.

What kind of expressions are valid? Let's look at them one by one.

Strings

Strings and regular expressions can't be missing in a scripting language. Ruby's string literals are even more varied than its other literals.

Single Quoted Strings

    'string'              # 「string」
    '\\begin{document}'   # 「\begin{document}」
    '\n'                  # 「\n」 backslash and an n, not a newline
    '\1'                  # 「\1」 backslash and 1
    '\''                  # 「'」

This is the simplest form. In C, what is enclosed in single quotes becomes a character, but in Ruby, it becomes a string. Let's call this a '-string. The backslash escape is in effect only for \ itself and '. If one puts a backslash in front of another character, the backslash remains, as in the fourth example.
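A quick way to convince oneself of these rules is to look at string lengths. The sketch below is just a check of the behavior described above; sq_lengths is a hypothetical helper name.

```ruby
# Lengths make it visible which backslashes survive in '-strings.
def sq_lengths
  {
    backslash_n:       '\n'.length,   # backslash and n are both kept
    escaped_backslash: '\\'.length,   # the escape yields one backslash
    escaped_quote:     '\''.length,   # the escape yields one quote
    backslash_one:     '\1'.length,   # backslash and 1 are both kept
  }
end
```

Only the two escapes that the text names collapse to a single character; every other backslash sequence keeps both characters.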

And Ruby's strings aren't divided by newline characters. If we write a string over several lines, the newlines are contained in the string.

    'multi
        line
            string'

And if the -K option is given to the ruby command, multibyte strings will be accepted. At present the three encodings EUC-JP (-Ke), Shift JIS (-Ks), and UTF8 (-Ku) can be specified.

    '「漢字が通る」と「マルチバイト文字が通る」はちょっと違う'
    # 'There's a little difference between "Kanji are accepted" and "Multibyte characters are accepted".'

Double Quoted Strings

    "string"              # 「string」
    "\n"                  # newline
    "\x0f"                # a byte given in hexadecimal form
    "page#{n}.html"       # embedding an expression

With double quotes we can use expression expansion and backslash notation. The backslash notation is something classical that is also supported in C; for instance, \n is a newline and \b is a backspace. In Ruby, Ctrl-C and ESC can also be expressed, which is convenient. However, merely listing the whole notation is not fun, and as for its implementation, it just means a large number of cases to be handled, with nothing especially interesting. Therefore, they are entirely left out here.

On the other hand, expression expansion is even more fantastic. We can write an arbitrary Ruby expression inside #{ } and it will be evaluated at runtime and embedded into the string. There are no limitations like only one variable or only one method. Having come this far, it is not a mere literal anymore; the entire thing can be considered an expression that expresses a string.

    "embedded #{lvar} expression"
    "embedded #{@ivar} expression"
    "embedded #{1 + 1} expression"
    "embedded #{method_call(arg)} expression"
    "embedded #{"string in string"} expression"

Strings with %

    %q(string)            # same as 'string'
    %Q(string)            # same as "string"
    %(string)             # same as %Q(string) or "string"

If a lot of separator characters appear in a string, escaping all of them becomes a burden. In that case the separator characters can be changed by using %. In the following example, the same string is written as a "-string and as a %-string.

    "<a href=\"http://i.loveruby.net#{path}\">"
    %Q(<a href="http://i.loveruby.net#{path}">)

Both expressions have the same length, but the %-one is a lot nicer to look at. When there are more characters to escape, the %-string also has an advantage in length.

Here we have used parentheses as delimiters, but something else is fine, too, like brackets or braces or #. Almost every symbol is fine, even %.

    %q#this is string#
    %q[this is string]
    %q%this is string%
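The equivalences claimed above can be checked directly. link_pair and percent_samples are hypothetical helper names; the URL is the one from the text.

```ruby
# The same tag written as a "-string and as a %Q-string.
def link_pair(path)
  quoted  = "<a href=\"http://i.loveruby.net#{path}\">"
  percent = %Q(<a href="http://i.loveruby.net#{path}">)
  [quoted, percent]
end

# %q behaves like single quotes whatever delimiter is chosen,
# and balanced delimiters may nest inside the literal.
def percent_samples
  [%q[this is string], %q{no #{expansion} here}, %q(nested (parens) ok)]
end
```

link_pair returns two equal strings, and the %q samples keep their text literally, including the unexpanded #{...}.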

Here Documents

A here document is a syntax which can express strings spanning multiple lines. A normal string starts right after the delimiter " and everything until the ending " is the content. With a here document, the lines between the line which contains the starting <<EOS and the line which contains the ending EOS are the content.

    "the characters between the starting symbol and the ending symbol
    will become a string."

    <<EOS
    All lines between the starting and
    the ending line are in this
    here document
    EOS

Here we used EOS as identifier, but any word is fine. Precisely speaking, all the characters matching [a-zA-Z_0-9] and multi-byte characters can be used.

The characteristic of a here document is that the delimiters are “the lines containing the starting identifier or the ending identifier”. The line which contains the start symbol is the starting delimiter. Therefore, the position of the start identifier in the line is not important. Taking advantage of this, it doesn't matter if, for instance, it is written in the middle of an expression:

    printf(<<EOS, count_n(str))
    count=%d
    EOS

In this case the string "count=%d\n" goes in the place of <<EOS. So it's the same as the following.

    printf("count=%d\n", count_n(str))

The position of the starting identifier is really not restricted, but in contrast there are strict rules for the ending symbol: it must be at the beginning of the line and there must not be any other letter in that line. However, if we write the start symbol with a minus, like <<-EOS, we can indent the line with the end symbol.

        <<-EOS
    It would be convenient if one could indent the content
    of a here document. But that's not possible.
    If you want that, writing a method to delete indents is
    usually a way to go. But beware of tabs.
        EOS

Furthermore, the start symbol can be enclosed in single or double quotes. Then the properties of the whole here document change. When we change <<EOS to <<"EOS" we can use embedded expressions and backslash notation.

    <<"EOS"
    One day is #{24 * 60 * 60} seconds.
    Incredible.
    EOS

But <<'EOS' is not the same as a single quoted string. It starts the complete literal mode. Everything, even backslashes, goes into the string as typed. This is useful for a string which contains many backslashes.
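The three variants discussed above can be compared side by side. heredoc_samples is a hypothetical helper; the terminators sit where the rules require them, which is why part of the method body is not indented.

```ruby
def heredoc_samples(n)
  plain = <<EOS
count=#{n}
EOS

  indented_end = <<-EOS
  body keeps its indent
  EOS

  raw = <<'EOS'
no #{expansion} and no \n escapes
EOS

  [plain, indented_end, raw]
end
```

The <<- form only relaxes the position of the terminator and leaves the body's leading spaces in the string, and the quoted 'EOS' form keeps #{...} and \n exactly as typed.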

In Part 2, I'll explain how to parse a here document. But I'd like you to try to guess it beforehand.

Characters

Ruby strings are byte sequences; there are no character objects. Instead there are the following expressions, which return the integers that correspond to a certain character in ASCII code.

    ?a                    # the integer which corresponds to "a"
    ?.                    # the integer which corresponds to "."
    ?\n                   # LF
    ?\C-a                 # Ctrl-a
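One caveat when trying this out: in later Ruby versions (1.9 and after, as far as I know) the ?x notation yields a one-character String rather than an Integer. The sketch below therefore normalizes both cases; char_code is a hypothetical helper written for this purpose.

```ruby
# Return the character code whether ?x gave an Integer (as described
# in the text) or a one-character String (as in later Ruby versions).
def char_code(ch)
  ch.is_a?(Integer) ? ch : ch.ord
end

codes = [char_code(?a), char_code(?\n), char_code(?\C-a)]
```

Either way, the codes obtained are the ASCII values the text talks about.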

RegularExpressions/regexp//^Content-Length:/i/正規表現//\/\*.*?\*\//m#AnexpressionwhichmatchesCcomments/reg#{1+1}exp/#thesameas/reg2exp/

What is contained between slashes is a regular expression. Regular expressions are a language to designate string patterns. For example

/abc/

This regular expression matches a string where there’s an a followed by a b followed by a c. It matches “abc” or “fffffffabc” or “abcxxxxx”.

One can designate more special patterns.

/^From:/

This matches a string where there’s a From followed by a : at the beginning of a line. There are several more expressions of this kind, so that one can create quite complex patterns.

The uses are infinite: changing the matched part to another string, deleting the matched part, determining if there’s a match, and so on.

A more concrete use case would be, for instance, extracting the From: header from a mail, or changing the \n to an \r, or checking if a string looks like a mail address.
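The uses listed above can be sketched as follows. The mail text and the address pattern are made-up sample data, and the pattern is a deliberately crude approximation, not a real address validator.

```ruby
# Sample data invented for this illustration.
mail = "From: someone@example.com\nSubject: hello\n"

# Extracting the From: header (the first capture group).
from = mail[/^From: *(.+)$/, 1]

# Changing the matched part to another string: every \n becomes \r.
cr = mail.gsub(/\n/, "\r")

# Deleting the matched part: drop the Subject line.
no_subject = mail.sub(/^Subject:.*\n/, "")

# Determining if there is a match: a crude "looks like a mail address" check.
looks_like_address = from.match?(/\A[^@\s]+@[^@\s]+\z/)

p from, cr, no_subject, looks_like_address
```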

Since the regular expression itself is an independent language, it has its own parser and evaluator which are separate from those of ruby. They can be found in regex.c. Hence, it’s enough for ruby to be able to cut out the regular expression part from a Ruby program and feed it to them. As a consequence, regular expressions are treated almost the same as strings from the grammatical point of view. Almost all of the features which strings have, such as escapes, backslash notation and embedded expressions, can be used in the same way in regular expressions.

However, we can say they are treated the same as strings only from the viewpoint of “Ruby’s syntax”. As mentioned before, since the regular expression itself is a language, naturally we have to follow its language constraints. Describing regular expressions in detail would take another whole book, so I’d like you to read another book on this subject. I recommend “Mastering Regular Expressions” by Jeffrey E.F. Friedl.

Regular Expressions with %

As with strings, regular expressions also have a syntax for changing delimiters. In this case it is %r. A few examples are enough to understand it.

%r(regexp)
%r[/\*.*?\*/]            # matches a C comment
%r("(?:[^"\\]+|\\.)*")   # matches a string in C
%r{reg#{1 + 1}exp}       # embedding a Ruby expression

Arrays

A comma-separated list enclosed in brackets [] is an array literal.

[1, 2, 3]
['This', 'is', 'an', 'array', 'of', 'string']

[/regexp/, {'hash'=>3}, 4, 'string', ?\C-a]

lvar = $gvar = @ivar = @@cvar = nil
[lvar, $gvar, @ivar, @@cvar]
[Object.new(), Object.new(), Object.new()]

Ruby’s array (Array) is a list of arbitrary objects. From a syntactical standpoint, its characteristic is that arbitrary expressions can be elements. As mentioned earlier, an array of hashes of regular expressions can easily be made. Not just literals but also expressions that combine variables and method calls can be written straightforwardly.

Note that this is “an expression which generates an array object”, as with the other literals.

i = 0
while i < 5
  p([1,2,3].id)    # Each time another object id is shown.
  i += 1
end
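The same point can be checked on a current interpreter with a minimal sketch. It assumes object_id and equal? instead of the older id method used above, and it keeps the arrays alive in a list so that their identities can be compared.

```ruby
# Evaluate the same array literal three times and keep every result.
arrays = []
3.times { arrays << [1, 2, 3] }

same_contents = (arrays[0] == arrays[1])        # the contents are equal
same_object   = arrays[0].equal?(arrays[1])     # but the objects are distinct
distinct_ids  = arrays.map(&:object_id).uniq.size

p same_contents, same_object, distinct_ids
```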

Word Arrays

When writing scripts one uses arrays of strings a lot, hence there is a special notation only for arrays of strings. That is %w. With an example it’s immediately obvious.

%w( alpha beta gamma delta )   # ['alpha','beta','gamma','delta']
%w( 月 火 水 木 金 土 日 )
%w( Jan Feb Mar Apr May Jun
    Jul Aug Sep Oct Nov Dec )

There’s also %W where expressions can be embedded. It’s a feature implemented fairly recently.

n = 5
%w( list0 list#{n} )   # ['list0', 'list#{n}']
%W( list0 list#{n} )   # ['list0', 'list5']

The author hasn’t come up with a good use of %W yet.

Hashes

Hash tables are data structures which store a one-to-one relation between arbitrary objects. Expressions written as follows generate such tables.

{ 'key' => 'value', 'key2' => 'value2' }
{ 3 => 0, 'string' => 5, ['array'] => 9 }
{ Object.new() => 3, Object.new() => 'string' }

# Of course we can put it in several lines.
{ 0 => 0,
  1 => 3,
  2 => 6 }

We explained hashes in detail in the third chapter “Names and Nametables”. They are fast lookup tables which allocate memory slots depending on the hash values. In Ruby grammar, both keys and values can be arbitrary expressions.

Furthermore, when used as an argument of a method call, the {...} can be omitted under a certain condition.

  some_method(arg, key => value, key2 => value2)
# some_method(arg, {key => value, key2 => value2}) # same as above

With this we can imitate named (keyword) arguments.

button.set_geometry('x' => 80, 'y' => '240')

Of course in this case set_geometry must accept a hash as input. Though real keyword arguments would be transformed into parameter variables, that is not the case here because this is just an “imitation”.
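A minimal sketch of the receiving side may make the “imitation” clearer. The Button class below is hypothetical; only the calling form comes from the text above.

```ruby
# Button and its attributes are hypothetical, assumed only for this illustration.
class Button
  attr_reader :x, :y

  # The trailing key => value pairs arrive as one ordinary Hash object.
  def set_geometry(opts)
    @x = opts['x']
    @y = opts['y']
  end
end

button = Button.new
button.set_geometry('x' => 80, 'y' => 240)   # braces omitted
p [button.x, button.y]
```

The method has to pick the values out of the hash by itself; nothing is bound to parameter variables automatically.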

Ranges

Range literals are oddballs which don’t appear in most other languages. Here are some expressions which generate Range objects.

0..5          # from 0 to 5 containing 5
0...5         # from 0 to 5 not containing 5
1+2 .. 9+0    # from 3 to 9 containing 9
'a'..'z'      # strings from 'a' to 'z' containing 'z'

If there are two dots the last element is included. If there are three dots it is not included. Not only integers but also floats and strings can be made into ranges; even a range between arbitrary objects can be created if you attempt it. However, this is a specification of the Range class, which is the class of range objects (that is, of a library); it is not a matter of grammar. From the parser’s standpoint, it just allows arbitrary expressions to be joined with `..`. If a range cannot be generated from the objects that result from evaluation, it is a runtime error.

By the way, because the precedence of .. and ... is quite low, sometimes it is interpreted in a surprising way.

1..5.to_a()   # 1..(5.to_a())

I think my personality is relatively bent toward Ruby grammar, but somehow this is the one specification I don’t like.
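The difference between the two forms, and the point that an impossible range is rejected at run time rather than by the parser, can be sketched like this. The parentheses avoid the precedence trap just mentioned; the variable names are only for illustration.

```ruby
inclusive = (0..5).to_a          # two dots: the last element is included
exclusive = (0...5).to_a         # three dots: the last element is excluded
computed  = (1+2 .. 9+0).to_a    # arbitrary expressions on both sides
letters   = ('a'..'e').to_a

# Syntactically fine, but two plain Objects cannot be compared,
# so building the range fails when the expression is evaluated.
outcome = begin
  r = (Object.new..Object.new)
  :range_created
rescue ArgumentError
  :runtime_error
end

p inclusive, exclusive, computed, letters, outcome
```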

Symbols

In Part 1, we talked about symbols at length. A symbol is something that corresponds one-to-one to an arbitrary string. In Ruby, symbols are expressed with a : in front.

:identifier
:abcde

These examples are pretty normal. Actually, besides them, all variable names and method names can become symbols with a : in front. Like this:

:$gvar
:@ivar
:@@cvar
:CONST

Moreover, though we haven’t talked about this yet, [] or attr= can be used as method names, so naturally they can also be used as symbols.

:[]
:attr=

When one uses these symbols as values in an array, it’ll look quite complicated.

Numerical Values

This is the least interesting. One possible thing I can introduce here is that, when writing a million,

1_000_000

as written above, we can use underscore delimiters in the middle. But even this isn’t particularly interesting. From here on in this book, we’ll completely forget about numerical values.

Methods

Let’s talk about the definition and calling of methods.

Definition and Calls

def some_method( arg )
  ....
end

class C
  def some_method( arg )
    ....
  end
end

Methods are defined with def. If they are defined at toplevel they become function style methods; inside a class they become methods of this class. To call a method which was defined in a class, one usually has to create an instance with new as shown below.

C.new().some_method(0)

The Return Value of Methods

The return value of a method is, if a return is executed in the middle, its value. Otherwise, it’s the value of the statement which was executed last.

def one()     # 1 is returned
  return 1
  999
end

def two()     # 2 is returned
  999
  2
end

def three()   # 3 is returned
  if true then
    3
  else
    999
  end
end

If the method body is empty, the value is automatically nil, and an expression without a value cannot be put at the end. Hence every method has a return value.

Optional Arguments

Optional arguments can also be defined. If the number of arguments doesn’t suffice, the parameters are automatically assigned their default values.

def some_method( arg = 9 )  # default value is 9
  p arg
end

some_method(0)    # 0 is shown.
some_method()     # The default value 9 is shown.

There can also be several optional arguments. But in that case they must all come at the end of the argument list. If elements in the middle of the list were optional, the correspondence between arguments and parameters would be very unclear.

def right_decl( arg1, arg2, darg1 = nil, darg2 = nil )
  ....
end

# This is not possible
def wrong_decl( arg, default = nil, arg2 )  # A middle argument cannot be optional
  ....
end

Omitting argument parentheses

In fact, the parentheses of a method call can be omitted.

puts 'Hello, World!'   # puts("Hello, World")
obj = Object.new       # obj = Object.new()

In Python we can get the method object by leaving out parentheses, but there is no such thing in Ruby.

If you’d like to, you can omit more parentheses.

  puts(File.basename fname)
# puts(File.basename(fname)) same as the above

If we like we can even leave out more.

  puts File.basename fname
# puts(File.basename(fname)) same as the above

However, recently this kind of “nested omission” became a cause of warnings. It’s likely that this will not pass anymore in Ruby 2.0.

Actually, even the parentheses of the parameter definition can be omitted.

def some_method param1, param2, param3
end

def other_method    # without arguments ... we see this a lot
end

Parentheses are often left out in method calls, but leaving out parentheses in the definition is not very popular. However, if there are no arguments, the parentheses are frequently omitted.

Arguments and Lists

Because arguments form a list of objects, there’s nothing odd about doing the converse: expanding a list (an array) into arguments, as in the following example.

def delegate(a, b, c)
  p(a, b, c)
end

list = [1, 2, 3]
delegate(*list)   # identical to delegate(1, 2, 3)

In this way we can distribute an array into arguments. Let’s call this device a *argument for now. Here we used a local variable for demonstration, but of course there is no such limitation. We can also directly put a literal or a method call instead.

m(*[1,2,3])    # We could have written the expanded form in the first place...
m(*mcall())

The *argument can be used together with ordinary arguments, but the *argument must come last. Otherwise, the correspondence to parameter variables cannot be determined uniquely.

In the definition, on the other hand, we can handle the arguments in bulk when we put a * in front of the parameter variable.

def some_method( *args )
  p args
end

some_method()          # prints []
some_method(0)         # prints [0]
some_method(0, 1)      # prints [0,1]

The surplus arguments are gathered in an array. Only one *parameter can be declared. It must also come after the default arguments.

def some_method0( arg, *rest )
end
def some_method1( arg, darg = nil, *rest )
end

If we combine list expansion and bulk reception, the arguments of one method can be passed as a whole to another method. This might be the most practical use of the *parameter.

# a method which passes its arguments to other_method
def delegate(*args)
  other_method(*args)
end

def other_method(a, b, c)
  return a + b + c
end

delegate(0, 1, 2)      # same as other_method(0, 1, 2)
delegate(10, 20, 30)   # same as other_method(10, 20, 30)

Various Method Call Expressions

Being a single feature, ‘method call’, does not mean that its representation is also single. This is about so-called syntactic sugar. In Ruby there is a ton of it, and it is really attractive for a person who has a fetish for parsers. For instance the examples below are all method calls.

1 + 2                   # 1.+(2)
a == b                  # a.==(b)
~/regexp/               # /regexp/.~
obj.attr = val          # obj.attr=(val)
obj[i]                  # obj.[](i)
obj[k] = v              # obj.[]=(k,v)
`cvs diff abstract.rd`  # Kernel.`('cvs diff abstract.rd')

It’s hard to believe until you get used to it, but attr=, []=, \` are (indeed) all method names. They can appear as names in a method definition and can also be used as symbols.

class C
  def []( index ) end
  def +( another ) end
end
p(:attr=)
p(:[]=)
p(:`)

Just as there are people who don’t like sweets, there are also many people who dislike syntactic sugar. Maybe they feel it is unfair when things which are essentially the same appear in disguise. (Why’s everyone so serious?)

Let’s see some more details.

Symbol Appendices

obj.name?
obj.name!

First a small thing. It’s just appending a ? or a !. Call and definition do not differ, so it’s not too painful. There are conventions for how to use these method names, but there is no enforcement at the language level. It’s just a convention at the human level. This is probably influenced by Lisp, in which a great variety of characters can be used in procedure names.

Binary Operators

1 + 2    # 1.+(2)

Binary operators are converted to a method call on the object on the left hand side. Here the method + of the object 1 is called. As listed below there are many of them. There are the general operators + and -, also the equivalence operator == and the spaceship operator `<=>’ as in Perl, all sorts. They are listed in order of their precedence.

**
* / %
+ -
<< >>
&
| ^
> >= < <=
<=> == === =~

The symbols & and | are methods, but the double symbols && and || are built-in operators. Remember how it is in C.

Unary Operators

+2
-1.0
~/regexp/

These are the unary operators. There are only three of them: + - ~. + and - work as they look (by default). The operator ~ matches a string or a regular expression against the variable $_. With an integer it stands for bit inversion.

To distinguish the unary + from the binary +, the method names for the unary operators are +@ and -@ respectively. Of course they can be called by just writing +n or -n.

((errata: + or - as the prefix of a numeric literal is actually scanned as a part of the literal. This is a kind of optimization.))

Attribute Assignment

obj.attr = val   # obj.attr=(val)

This is the attribute assignment fashion. The above will be translated into a call of the method attr=. When using this together with method calls whose parentheses are omitted, we can write code which looks like attribute access.

class C
  def i() @i end          # We can write the definition in one line
  def i=(n) @i = n end
end

c = C.new
c.i = 99
p c.i    # prints 99

However, it turns out that both are method calls. They are similar to get/set properties in Delphi or slot accessors in CLOS.

Besides, we cannot define a method such as obj.attr(arg)=, which would take another argument in the attribute assignment fashion.

Index Notation

obj[i]    # obj.[](i)

The above will be translated into a method call for []. Array and hash access are also implemented with this device.

obj[i] = val   # obj.[]=(i, val)

This is the index assignment fashion. It is translated into a call to a method named []=.
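To see that all of these notations really are ordinary method calls, here is a small sketch with a hypothetical class Vec that defines a binary operator, a unary operator, and both index notations.

```ruby
# Vec is a hypothetical two-element vector, assumed only for this illustration.
class Vec
  def initialize(x, y)
    @elems = [x, y]
  end

  def +(other)            # called for  v + w
    Vec.new(self[0] + other[0], self[1] + other[1])
  end

  def -@                  # called for  -v  (unary minus)
    Vec.new(-self[0], -self[1])
  end

  def [](i)               # called for  v[i]
    @elems[i]
  end

  def []=(i, val)         # called for  v[i] = val
    @elems[i] = val
  end
end

v = Vec.new(1, 2) + Vec.new(10, 20)
v[0] = 99
w = -v
explicit = 1.+(2)         # the explicit method-call form of 1 + 2

p [v[0], v[1]], [w[0], w[1]], explicit
```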

super

We relatively often have a situation where we want to add a little bit to the behaviour of an already existing method rather than replacing it. Here a mechanism to call a method of the superclass when overriding a method is required. In Ruby, that’s super.

class A
  def test
    puts 'in A'
  end
end
class B < A
  def test
    super   # invokes A#test
  end
end

Ruby’s super differs from the one in Java. This single word means “call the method with the same name in the superclass”. super is a reserved word.

When using super, be careful about the difference between super with no arguments and super whose arguments are omitted. The super whose arguments are omitted passes on all the given parameter variables.

class A
  def test( *args )
    p args
  end
end

class B < A
  def test( a, b, c )
    # super with no arguments
    super()   # shows []

    # super with omitted arguments. Same result as super(a, b, c)
    super     # shows [1, 2, 3]
  end
end

B.new.test(1,2,3)

Visibility

In Ruby, even when calling the same method, it can or cannot be called depending on the location (meaning the object). This functionality is usually called “visibility” (whether it is visible). In Ruby, the following three types of methods can be defined.

public

private

protected

public methods can be called from anywhere in any form. private methods can only be called in a form “syntactically” without a receiver. In effect they can only be called by instances of the class in which they were defined and by instances of its subclasses. protected methods can only be called by instances of the defining class and its subclasses. They differ from private methods in that they can still be called from other instances of the same class.

The terms are the same as in C++ but the meaning is slightly different. Be careful.

Usually we control visibility as shown below.

class C
  public
  def a1() end   # becomes public
  def a2() end   # becomes public

  private
  def b1() end   # becomes private
  def b2() end   # becomes private

  protected
  def c1() end   # becomes protected
  def c2() end   # becomes protected
end

Here public, private and protected are method calls without parentheses. These aren’t even reserved words.

public and private can also be used with an argument to set the visibility of a particular method. But its mechanism is not interesting. We’ll leave this out.
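The difference between private and protected is easier to see in a running sketch. The Account class and its method names are hypothetical.

```ruby
# Account is a hypothetical class, assumed only for this illustration.
class Account
  def initialize(balance)
    @balance = balance
  end

  def richer_than?(other)
    balance > other.balance     # protected: another instance may be the receiver
  end

  def report
    "balance: #{formatted}"     # private: called without a receiver
  end

  protected
  def balance() @balance end

  private
  def formatted() format('%05d', @balance) end
end

a = Account.new(100)
b = Account.new(50)
richer = a.richer_than?(b)
report = a.report

# From the outside, both calls with an explicit receiver are rejected.
errors = []
begin
  a.balance
rescue NoMethodError => e
  errors << e.class
end
begin
  a.formatted
rescue NoMethodError => e
  errors << e.class
end

p richer, report, errors
```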

Module functions

Given a module ‘M’. If there are two methods with the exact same content

M.method_name

M#method_name (Visibility is private)

then we call this a module function.

It is not apparent why this should be useful. But let’s look at the next example, which is happily used.

Math.sin(5)       # If used for a few times this is more convenient

include Math
sin(5)            # If used more often this is more practical

It’s important that both functions have the same content. With a different self but with the same code the behavior should still be the same. Instance variables become extremely difficult to use. Hence such a method is very likely one in which only procedures are written (like sin). That’s why they are called module “functions”.
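One way to get such a pair of methods is Module#module_function. The sketch below uses a hypothetical module Geometry and a hypothetical class Calc; it shows both calling styles and that the instance-method copy is private.

```ruby
# Geometry and Calc are hypothetical names, assumed only for this illustration.
module Geometry
  def hypot2(a, b)
    a * a + b * b
  end
  module_function :hypot2   # creates Geometry.hypot2 and makes Geometry#hypot2 private
end

class Calc
  include Geometry
  def run
    hypot2(3, 4)            # function-style call, no receiver
  end
end

via_module  = Geometry.hypot2(3, 4)
via_include = Calc.new.run

is_private = begin
  Calc.new.hypot2(3, 4)     # explicit receiver: rejected
  false
rescue NoMethodError
  true
end

p via_module, via_include, is_private
```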

Iterators

Ruby’s iterators differ a bit from Java’s or C++’s iterator classes or the ‘Iterator’ design pattern. Precisely speaking, those iterators are called exterior iterators, while Ruby’s iterators are interior iterators. This is difficult to understand from the definition alone, so let’s explain it with a concrete example.

arr = [0, 2, 4, 6, 8]

This array is given and we want to access the elements in order. In C style we would write the following.

i = 0
while i < arr.length
  print arr[i]
  i += 1
end

Using an iterator we can write:

arr.each do |item|
  print item
end

Everything from each do to end is the call to an iterator method. More precisely, each is the iterator method and what lies between do and end is the iterator block. The part between the vertical bars is called the block parameters, which become variables that receive the values passed from the iterator method to the block.

Saying it a little abstractly, an iterator is something like a piece of code which has been cut out and passed. In our example the piece print item has been cut out and is passed to the each method. Then each takes all the elements of the array in order and passes them to the cut out piece of code.

We can also think the other way round. The parts other than print item are cut out and enclosed in the each method.

i = 0
while i < arr.length
  print arr[i]
  i += 1
end

arr.each do |item|
  print item
end

Comparison with higher order functions

What comes closest in C to iterators are functions which receive function pointers, that is, higher order functions. But there are two points in which iterators in Ruby and higher order functions in C differ.

Firstly, Ruby iterators can only take one block. For instance we can’t do the following.

# Mistake. Several blocks cannot be passed.
array_of_array.each do |i|
  ....
end do |j|
  ....
end

Secondly, Ruby’s blocks can share local variables with the code outside.

lvar = 'ok'
[0,1,2].each do |i|
  p lvar    # Can access the local variable outside the block.
end

That’s where iterators are convenient.

But variables can only be shared with the outside. They cannot be shared with the inside of the iterator method (e.g. each). Intuitively, only the variables in places where the source code appears continuous are visible.

Block Local Variables

Local variables which are assigned inside a block stay local to that block; that is, they become block local variables. Let’s check it out.

[0].each do
  i = 0
  p i     # 0
end

For now, to create a block, we apply each to an array of length 1 (we can fully leave out the block parameter). In that block, the variable i is first assigned, meaning declared. This makes i block local.

It is said to be block local, so it should not be accessible from the outside. Let’s test it.

% ruby -e '
[0].each do
  i = 0
end
p i     # Here occurs an error.
'
-e:5: undefined local variable or method `i'
for #<Object:0x40163a9c> (NameError)

When we referenced a block local variable from outside the block, an error surely occurred. Without a doubt it stayed local to the block.

Iterators can also be nested repeatedly. Each time the new block creates another scope.

lvar = 0
[1].each do
  var1 = 1
  [2].each do
    var2 = 2
    [3].each do
      var3 = 3
      # Here lvar, var1, var2, var3 can be seen
    end
    # Here lvar, var1, var2 can be seen
  end
  # Here lvar, var1 can be seen
end
# Here only lvar can be seen

There’s one point which you have to keep in mind. Differing from today’s major languages, Ruby’s block local variables don’t do shadowing. Shadowing means, for instance in C, that in the code below the two declared variables i are different.

{
    int i = 3;
    printf("%d\n", i);         /* 3 */
    {
        int i = 99;
        printf("%d\n", i);     /* 99 */
    }
    printf("%d\n", i);         /* 3 (back to the original) */
}

Inside the block the inner i overshadows the outer i. That’s why it’s called shadowing.

But what happens with block local variables in Ruby, where there’s no shadowing? Let’s look at this example.

i = 0
p i           # 0
[0].each do
  i = 1
  p i         # 1
end
p i           # 1 the change is preserved

Even when we assign i inside the block, if there is a variable of the same name outside, that one is used. Therefore when we assign to the inner i, the value of the outer i is changed. On this point there have been many complaints: “This is error prone. Please do shadowing.” Each time there’s nearly a flame war, but till now no conclusion has been reached.

The syntax of iterators

There are some smaller topics left.

First, there are two ways to write an iterator. One is the do ~ end as used above, the other one is the enclosing in braces. The two expressions below have exactly the same meaning.

arr.each do |i|
  puts i
end

arr.each {|i|    # The author likes a four space indentation for
    puts i       # an iterator with braces.
}

But grammatically the precedence is different. The braces bind much stronger than do ~ end.

m m do .... end    # m(m) do....end
m m { .... }       # m(m() {....})
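The binding difference can be observed directly. In this sketch, outer, inner and the LOG array are hypothetical names; each method records whether it received a block.

```ruby
# LOG, outer and inner are hypothetical, assumed only for this illustration.
LOG = []

def outer(arg = nil)
  LOG << [:outer, block_given?]
end

def inner
  LOG << [:inner, block_given?]
  :inner_result
end

outer inner do end        # parsed as outer(inner) do end
with_do = LOG.dup
LOG.clear

outer inner { }           # parsed as outer(inner() { })
with_braces = LOG.dup

p with_do, with_braces
```

In both cases inner runs first because it is the argument; only the owner of the block changes.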

And iterators are definitely methods, so there are also iterators that take arguments.

re = /^\d/                 # regular expression to match a digit at the beginning of the line
$stdin.grep(re) do |line|  # look repeatedly for this regular expression
  ....
end

yield

Of course users can write their own iterators. Methods which have a yield in their definition text are iterators. Let’s try to write an iterator with the same effect as Array#each:

# adding the definition to the Array class
class Array
  def my_each
    i = 0
    while i < self.length
      yield self[i]
      i += 1
    end
  end
end

# this is the original each
[0,1,2,3,4].each do |i|
  p i
end

# my_each works the same
[0,1,2,3,4].my_each do |i|
  p i
end

yield calls the block. At this point control is passed to the block, and when the execution of the block finishes it returns to the same location. Think of it as a peculiar kind of function call. When the present method does not have a block, a runtime error will occur.

% ruby -e '[0,1,2].each'
-e:1:in `each': no block given (LocalJumpError)
        from -e:1

Proc

I said that iterators are like cut out code which is passed as an argument. But we can even more directly turn code into an object and carry it around.

twice = Proc.new {|n| n * 2 }
p twice.call(9)   # 18 will be printed

In short, it is like a function. As might be expected from the fact that it is created with new, the return value of Proc.new is an instance of the Proc class.

Proc.new surely looks like an iterator, and it is indeed so. It is an ordinary iterator. There’s only some mystic mechanism inside Proc.new which turns an iterator block into an object.

Besides, there is a function style method lambda which has the same effect as Proc.new. Choose whatever suits you.

twice = lambda {|n| n * 2 }

Iterators and Proc

Why did we start talking all of a sudden about Proc? Because there is a deep relationship between iterators and Proc. In fact, iterator blocks and Proc objects are quite the same thing. That’s why one can be transformed into the other.

First, to turn an iterator block into a Proc object one has to put an & in front of the parameter name.

def print_block( &block )
  p block
end

print_block() do end   # Shows something like <Proc:0x40155884>
print_block()          # Without a block nil is printed

With an & in front of the argument name, the block is transformed into a Proc object and assigned to the variable. If the method is not called as an iterator (there’s no block attached), nil is assigned.

And in the other direction, if we want to pass a Proc to an iterator we also use &.

block = Proc.new {|i| p i }
[0,1,2].each(&block)

This code means exactly the same as the code below.

[0,1,2].each {|i| p i }

If we combine these two, we can delegate an iterator block to a method somewhere else.

def each_item( &block )
  [0,1,2].each(&block)
end

each_item do |i|    # same as [0,1,2].each do |i|
  p i
end

Expressions

“Expressions” in Ruby are things which can be combined with others to create other expressions or statements. For instance a method call can be another method call’s argument, so it is an expression. The same goes for literals. But literals and method calls are not always combinations of elements. On the contrary, the “expressions” which I’m going to introduce here always consist of several elements.

if

We probably do not need to explain the if expression. If the conditional expression is true, the body is executed. As explained in Part 1, every object except nil and false is true in Ruby.

if cond0 then
  ....
elsif cond1 then
  ....
elsif cond2 then
  ....
else
  ....
end

elsif/else clauses can be omitted. Each then as well. But there are some finer requirements concerning then. For this kind of thing, looking at some examples is the best way to understand. The only thing I’d say here is that the code below is all valid.

# 1
if cond then ..... end

# 2
if cond; .... end

# 3
if cond then; .... end

# 4
if cond
then .... end

# 5
if cond
then
  ....
end

And in Ruby, if is an expression, so the entire if expression has a value. It is the value of the body whose condition was met. For example, if the condition of the first if is true, the value is that of its body.

p(if true  then 1 else 2 end)   #=> 1
p(if false then 1 else 2 end)   #=> 2
p(if false then 1 elsif true then 2 else 3 end)   #=> 2

If there’s no match, or the matched clause is empty, the value is nil.

p(if false then 1 end)    #=> nil
p(if true  then   end)    #=> nil

unless

An if with a negated condition is an unless. The following two expressions have the same meaning.

unless cond then          if not (cond) then
  ....                      ....
end                       end

unless can also have an else clause attached, but an elsif cannot be attached. Needless to say, then can be omitted.

unless also has a value, and the way it is determined is completely the same as for if. That is, the entire value is the value of the body of the matched clause. If there’s no match or the matched clause is empty, the value is nil.

and && or ||

The most likely use of and is probably as a boolean operation. For instance in the conditional expression of an if.

if cond1 and cond2
  puts 'ok'
end

But as in Perl, sh or Lisp, it can also be used as a conditional branch expression. The two following expressions have the same meaning.

invalid?(key) and return nil

if invalid?(key)
  return nil
end

&& and and have the same meaning. What differs is the binding order.

method arg0 &&  arg1    # method(arg0 && arg1)
method arg0 and arg1    # method(arg0) and arg1

Basically the symbolic operator creates an expression which can be an argument (arg). The alphabetical operator creates an expression which cannot become an argument (expr).

As for and, if the evaluation of the left hand side is true, the right hand side will also be evaluated.

On the other hand, or is the opposite of and. If the evaluation of the left hand side is false, the right hand side will also be evaluated.

valid?(key) or return nil

or and || have the same relationship as && and and. Only the precedence is different.
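The different binding can be made visible with a sketch in which a hypothetical method show records the argument it actually receives; lookup is likewise a hypothetical method using and as a guard.

```ruby
# SEEN, show and lookup are hypothetical, assumed only for this illustration.
SEEN = []

def show(x)
  SEEN << x
  x
end

symbolic   = (show 1 && 2)     # show(1 && 2): show receives 2
alphabetic = (show 1 and 2)    # show(1) and 2: show receives 1

def lookup(key)
  key.empty? and return nil    # conditional branch written with and
  key.upcase
end

p SEEN, symbolic, alphabetic, lookup(""), lookup("abc")
```

Both expressions evaluate to 2, so only the recorded arguments reveal how each one was grouped.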

The Conditional Operator

There is a conditional operator similar to C:

cond ? iftrue : iffalse

The space between the symbols is important. If they bump together the following weirdness happens.

cond?iftrue:iffalse   # cond?(iftrue(:iffalse))

The value of the conditional operator is the value of the last executed expression: either the value of the true side or the value of the false side.

while until

Here’s a while expression.

while cond do
  ....
end

This is the simplest loop syntax. As long as cond is true the body is executed. The do can be omitted.

until io_ready?(id) do
  sleep 0.5
end

until creates a loop whose condition is the opposite. As long as the condition is false the body is executed. The do can be omitted.

Naturally there is also jump syntax to exit a loop. break as in C/C++/Java is also break, but continue is next. Perhaps next has come from Perl.

i = 1
while true
  if i > 10
    break   # exit the loop
  elsif i % 2 == 0
    i *= 2
    next    # next loop iteration
  end
  i += 1
end

And there is another Perlism: the redo.

while cond
  # (A)
  ....
  redo
  ....
end

It will return to (A) and repeat from there. What differs from next is that it does not check the condition.

I would probably rank in the world’s top 100 by the amount of Ruby code written, yet I have never used redo. It does not seem to be necessary after all, because I’ve lived happily without it.

case

A special form of the if expression. It performs branching on a series of conditions. The following left and right expressions are identical in meaning.

case value
when cond1 then                if cond1 === value
  ....                           ....
when cond2 then                elsif cond2 === value
  ....                           ....
when cond3, cond4 then         elsif cond3 === value or cond4 === value
  ....                           ....
else                           else
  ....                           ....
end                            end

The threefold equals === is, just like ==, actually a method call. Notice that the receiver is the object on the left hand side. Concretely, if it is the === of a Range, it checks whether the value lies within the range. If it is a Class, it tests whether the value is an instance of that class. If it is a regular expression, it tests whether the value matches. And so on. Since case has many grammatical elements, listing them all would be tedious, so we will not cover them in this book.
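As a small illustration of === being dispatched on the object written after when, here is a sketch with a hypothetical method classify that mixes a Range, a Class, a regular expression and plain values.

```ruby
# classify is a hypothetical method, assumed only for this illustration.
def classify(value)
  case value
  when 0..9       then "small integer"    # (0..9) === value
  when Integer    then "integer"          # Integer === value
  when /\A\d+\z/  then "digit string"     # /\A\d+\z/ === value
  when nil, false then "false value"      # nil === value or false === value
  else                 "something else"
  end
end

results = [5, 42, "123", nil, :sym].map {|v| classify(v) }
p results
```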

Exceptions

This is a control structure which can cross method boundaries and transmit errors. Readers who are acquainted with C++ or Java will know about exceptions. Ruby exceptions are basically the same.

In Ruby exceptions come in the form of the function style method raise. raise is not a reserved word.

raise ArgumentError, "wrong number of argument"

In Ruby, exceptions are instances of the Exception class and its subclasses. This form takes an exception class as its first argument and an error message as its second argument. In the above case an instance of ArgumentError is created and “thrown”. The exception object abandons the code after the raise and starts to travel up the method call stack.

def raise_exception
  raise ArgumentError, "wrong number of argument"
  # the code after the exception will not be executed
  puts 'after raise'
end
raise_exception()

If nothing blocks the exception it will move on and on and finally it will reach the top level. When there’s no place to return to any more, ruby gives out a message and ends with a non-zero exit code.

% ruby raise.rb
raise.rb:2:in `raise_exception': wrong number of argument (ArgumentError)
        from raise.rb:7

However, if that were all, an exit would be sufficient; for exceptions there should be a way to set handlers. In Ruby, begin ~ rescue ~ end is used for this. It resembles the try ~ catch in C++ and Java.

def raise_exception
  raise ArgumentError, "wrong number of argument"
end

begin
  raise_exception()
rescue ArgumentError => err then
  puts 'exception catched'
  p err
end

rescue is a control structure which captures exceptions; it catches exception objects of the specified class and its subclasses. In the above example, an instance of ArgumentError comes flying into the place where ArgumentError is targeted, so it matches this rescue. By => err the exception object is assigned to the local variable err, and after that the rescue part is executed.

% ruby rescue.rb
exception catched
#<ArgumentError: wrong number of argument>

When an exception is rescued, execution goes through the rescue clause and then continues with the subsequent code as if nothing had happened, but we can also make it retry from the begin. To do so, retry is used.

begin    # the place to return
  ....
rescue ArgumentError => err then
  retry  # retry your life
end

We can omit the => err and the then after rescue. We can also leave out the exception class. In this case, it means the same as when the StandardError class is specified.

If we want to catch more exception classes, we can just write them in a line. When we want to handle different errors differently, we can specify several rescue clauses.

begin
  raise IOError, 'port not ready'
rescue ArgumentError, TypeError
rescue IOError
rescue NameError
end

When written in this way, a rescue clause that matches the exception class is searched for in order from the top. Only the matched clause will be executed. For instance, only the clause of IOError will be executed in the above case.

On the other hand, when there is an else clause, it is executed only when there is no exception.

begin
  nil    # Of course here will no error occur
rescue ArgumentError
  # This part will not be executed
else
  # This part will be executed
end

Moreover an ensure clause will be executed in every case: when there is no exception, and when there is an exception, rescued or not.

begin
  f = File.open('/etc/passwd')
  # do stuff
ensure   # this part will be executed anyway
  f.close
end

By the way, this begin expression also has a value. The value of the whole begin ~ end expression is the value of the part which was executed last among the begin/rescue/else clauses. That is, the last statement of the clauses other than ensure. The reason why ensure is not counted is probably that ensure is usually used for cleanup (thus it is not the main line).
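The clauses described above can be combined in one sketch. The failure is simulated (the body raises IOError on the first two attempts), and the names attempts, log and result are only for illustration.

```ruby
attempts = 0
log = []

result = begin
  attempts += 1
  raise IOError, 'port not ready' if attempts < 3   # simulated failure, twice
  :connected
rescue IOError => err
  log << err.message
  retry                      # start again from begin
else
  log << 'no exception'
  :ok                        # executed last among begin/rescue/else
ensure
  log << 'cleanup'           # runs in any case; does not affect the value
end

p attempts, result, log
```

Here the value of the whole expression comes from the else clause, since that is the last clause executed apart from ensure.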

Variables and Constants

Referring to a variable or a constant. The value is the object the variable points to. We already talked in too much detail about the various behaviors.

lvar
@ivar
@@cvar
CONST
$gvar

I want to add one more thing. Among the variables starting with $, there are special kinds. They are not necessarily global variables and some have strange names.

First the Perlish variables $_ and $~. $_ saves the return value of gets and other methods, and $~ contains the last match of a regular expression. They are incredible variables which are local variables and simultaneously thread local variables.

And $!, which holds the exception object when an error has occurred, $?, which holds the status of a child process, and $SAFE, which represents the security level, are all thread local.

Assignment

Variable assignments are all performed by =. All variables are typeless. What is saved is a reference to an object. In the implementation, it is a VALUE (pointer).

var = 1
obj = Object.new
@ivar = 'string'
@@cvar = ['array']
PI = 3.1415926535
$gvar = {'key' => 'value'}

However, as mentioned earlier, obj.attr = val is not an assignment but a method call.

Self Assignment

var += 1

This syntax is also in C/C++/Java. In Ruby,

var = var + 1

it is a shortcut for this code. Differing from C, the Ruby + is a method and thus part of the library. In C, the whole meaning of += is built into the language processor itself. And in C++, += and *= can be wholly overridden, but we cannot do this in Ruby. In Ruby += is always defined as the combination of + and assignment.

We can also combine self assignment and an attribute-access-flavor method. The result looks even more like an attribute.

class C
  def i() @i end          # A method definition can be written in one line.
  def i=(n) @i = n end
end

obj = C.new
obj.i = 1
obj.i += 2    # obj.i = obj.i + 2
p obj.i       # 3

If there is += there might also be ++, but this is not the case. Why is that so? In Ruby assignment is dealt with at the language level. But on the other hand methods are in the library. Keeping these two, the world of variables and the world of objects, strictly apart is an important peculiarity of Ruby. If ++ were introduced the separation might easily be broken. That’s why there’s no ++.

Some people don’t want to go without the brevity of ++. It has been proposed again and again on the mailing list but was always turned down. I am also in favor of ++, but not so much that I can’t do without it, and I have not felt much need for ++ in Ruby in the first place, so I’ve kept silent and decided to forget about it.

defined?

defined? is a syntax of a quite different color in Ruby. It tells at runtime whether the value of an expression is “defined” or not.

var = 1
defined?(var)   #=> true (a true value; in practice the string "local-variable")

In other words it tells whether a value can be obtained from the expression received as its argument (is it okay to call it so?) when the expression is evaluated. That said, of course you can’t write an expression which causes a parse error, and it cannot detect whether the expression contains a method call which would raise an error.

I would have loved to tell you more about defined? but it will not appear again in this book. What a pity.
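A short sketch of what defined? reports on a current interpreter. The name no_such_name is deliberately undefined, and Integer("not a number") is chosen because evaluating it raises an error that defined? does not notice.

```ruby
var = 1

results = {
  local:    defined?(var),                       # a local variable
  missing:  defined?(no_such_name),              # nothing by that name
  constant: defined?(String),                    # a constant
  risky:    defined?(Integer("not a number")),   # only checked, never evaluated
}

# Actually evaluating the same call does raise an error.
raised = begin
  Integer("not a number")
  false
rescue ArgumentError
  true
end

p results, raised
```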

Statements

A statement is basically something which cannot be combined with other syntax; in other words, statements are lined up vertically.

But it does not mean there’s no evaluated value. For instance there are return values for class definition statements and method definition statements. However, using them is rarely recommended and isn’t useful, so you’d better regard them lightly in this way. Here we also skip the value of each statement.

The Ending of a statement

Up to now we just said “for now, one line is one statement”. But Ruby’s statement endings aren’t that straightforward.

First, a statement can be ended explicitly with a semicolon as in C. Of course we can then write two or more statements on one line.

puts 'Hello, World!'; puts 'Hello, World once more!'

On the other hand, when the expression apparently continues, such as just after an opening parenthesis, a binary operator, or a comma, the statement continues automatically.

# 1 + 3 * method(6, 7 + 8)
1 +
  3 *
     method(
            6,
            7 + 8)

But it’s also totally no problem to use a backslash to explicitly indicate the continuation.

p 1 + \
  2

The Modifiers if and unless

The if modifier is an irregular version of the normal if. The programs on the left and right mean exactly the same.

on_true() if cond                if cond
                                   on_true()
                                 end

unless is the negative version. Guard statements (statements which exclude exceptional conditions) can be conveniently written with it.

The Modifiers while and until

while and until also have a postfix notation.

process() while have_content?
sleep(1) until ready?

Combining this with begin and end gives a do-while loop like in C.

begin
  res = get_response(id)
end while need_continue?(res)
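The difference between the plain while modifier and the begin ... end while form can be seen in a small sketch. Both helper methods are hypothetical and exist only to count how often the body runs when the condition is false from the start.

```ruby
def body_runs_with_begin_end
  runs = 0
  begin
    runs += 1
  end while false          # the condition is checked after the body
  runs
end

def body_runs_with_plain_modifier
  runs = 0
  runs += 1 while false    # the condition is checked before the body
  runs
end

p body_runs_with_begin_end        # 1
p body_runs_with_plain_modifier   # 0
```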

Class Definition

class C < SuperClass
  ....
end

Defines the class C which inherits from SuperClass.

We talked quite extensively about classes in Part 1. This statement is executed, the class being defined becomes self within the statement, and arbitrary expressions can be written inside. Class definitions can be nested. They form the foundation of the Ruby execution image.

Method Definition

def m(arg)
end

I’ve already written about method definition and won’t add more. This section is here to make it clear that method definitions also belong to statements.

Singleton method definition

We already talked a lot about singleton methods in Part 1. They do not belong to classes but to objects; in fact, they belong to singleton classes. We define singleton methods by putting the receiver in front of the method name. Parameters are declared the same way as for ordinary methods.

def obj.some_method
end

def obj.some_method2( arg1, arg2, darg = nil, *rest, &block )
end

Definition of Singleton methods

class << obj
  ....
end

From the viewpoint of purpose, it is the statement for defining several singleton methods in a bundle. From the viewpoint of mechanism, it is the statement in which the singleton class of obj becomes self when executed. In the whole Ruby program, this is the only place where a singleton class is exposed.

class << obj
  p self  #=> #<Class:#<Object:0x40156fcc>>   # Singleton Class 「(obj)」
  def a() end   # def obj.a
  def b() end   # def obj.b
end
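A short sketch of what "the singleton class becomes self" means in practice. The helper singleton_class_demo is a name invented here. It also uses Object#singleton_class, which exists in later Ruby versions (1.9.2 and up, as far as I know) and is therefore not part of the Ruby the book describes.

```ruby
def singleton_class_demo
  obj = Object.new

  exposed = class << obj
    def a() :a end     # same effect as: def obj.a
    self               # the value of the statement: the singleton class
  end

  [
    exposed.equal?(obj.singleton_class),   # self was obj's singleton class
    obj.singleton_methods,                 # methods defined inside the block
    Object.new.respond_to?(:a)             # other objects are unaffected
  ]
end

p singleton_class_demo   # [true, [:a], false]
```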

Multiple Assignment

With a multiple assignment, several assignments can be done all at once. The following is the simplest case:

a, b, c = 1, 2, 3

It’s exactly the same as the following.

a = 1
b = 2
c = 3

Just being concise is not interesting. In fact, it only starts to become fun once an array is mixed in.

a, b, c = [1, 2, 3]

This also has the same result as the above. Furthermore, the right-hand side does not need to be a grammatical list or a literal. It can also be a variable or a method call.

tmp = [1, 2, 3]
a, b, c = tmp
ret1, ret2 = some_method()   # some_method might probably return several values

Precisely speaking, it is as follows. Here we’ll assume obj is (the object of) the value of the right-hand side:

1. obj if it is an array
2. if its to_ary method is defined, it is used to convert obj to an array.
3. [obj]

Decide the right-hand side by following this procedure and perform the assignments. It means the evaluation of the right-hand side and the operation of assignment are totally independent from each other.
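The three cases of the procedure above can be tried out with a small sketch. Pair and destructure are hypothetical names used only for this illustration.

```ruby
# Hypothetical class used only for this illustration.
class Pair
  def initialize(first, second)
    @first, @second = first, second
  end

  def to_ary
    [@first, @second]
  end
end

def destructure(obj)
  a, b = obj
  [a, b]
end

p destructure([1, 2])          # [1, 2]    case 1: already an array
p destructure(Pair.new(3, 4))  # [3, 4]    case 2: converted with to_ary
p destructure(5)               # [5, nil]  case 3: treated as [5]
```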

And it goes on: both the left and right hand side can be nested to any depth.

a, (b, c, d) = [1, [2, 3, 4]]
a, (b, (c, d)) = [1, [2, [3, 4]]]
(a, b), (c, d) = [[1, 2], [3, 4]]

As the result of executing this program, each line gives a=1 b=2 c=3 d=4.

And it goes on. The left hand side can contain index or attribute assignments.

i = 0
arr = []
arr[i], arr[i+1], arr[i+2] = 0, 2, 4
p arr    # [0, 2, 4]

obj.attr0, obj.attr1, obj.attr2 = "a", "b", "c"

And like with method parameters, * can be used to receive in a bundle.

first, *rest = 0, 1, 2, 3, 4

p first  # 0
p rest   # [1, 2, 3, 4]

When all of them are used at once, it’s extremely confusing.
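To give an idea of how confusing the combination gets, here is a sketch that mixes nesting, an index assignment and * in a single statement. The method name combined_assignment is invented for the example, and the nested * form assumes a reasonably recent Ruby (1.9 or later).

```ruby
def combined_assignment
  store = []
  a, (b, *c), store[0], *d = 1, [2, 3, 4], 5, 6, 7
  [a, b, c, store, d]
end

p combined_assignment   # [1, 2, [3, 4], [5], [6, 7]]
```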

Block parameter and multiple assignment

We brushed over block parameters when we were talking about iterators. But there is a deep relationship between them and multiple assignment. For instance, in the following case:

array.each do |i|
  ....
end

Every time the block is called, the yielded arguments are multi-assigned to i. Here there’s only one variable on the left hand side, so it does not look like multiple assignment. But if there are two or more variables, it looks a little more like it. For instance, Hash#each iterates over the pairs of keys and values, so usually we call it like this:

hash.each do |key, value|
  ....
end

In this case, an array consisting of a key and a value is yielded from the hash each time.

Hence we can also do the following by using nested multiple assignment.

# [[key,value],index] are yielded
hash.each_with_index do |(key, value), index|
  ....
end
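A runnable version of the nested block parameter might look like the following sketch. The method name pairs_with_index is hypothetical, and the expected order assumes a Ruby in which hashes keep insertion order (1.9 or later).

```ruby
def pairs_with_index(hash)
  lines = []
  hash.each_with_index do |(key, value), index|
    lines << "#{index}: #{key}=#{value}"
  end
  lines
end

p pairs_with_index({ "a" => 1, "b" => 2 })   # ["0: a=1", "1: b=2"]
```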

alias

class C
  alias new orig
end

Defines another method new with the same body as the already defined method orig. Aliases are similar to hard links in a Unix file system. They are a means of assigning multiple names to one method body. To put it the other way around, because the names themselves are independent of each other, even if one method name is overridden by a subclass method, the other one still remains with the same behavior.
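The independence of the two names can be demonstrated with a short sketch. The classes Base and Sub and the method names greet and salute are made up for the illustration.

```ruby
class Base
  def greet
    "hello from Base"
  end
  alias salute greet      # salute now names the same method body
end

class Sub < Base
  def greet               # overrides only the name greet
    "hello from Sub"
  end
end

p Sub.new.greet    # "hello from Sub"
p Sub.new.salute   # "hello from Base"
```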

undef

class C
  undef method_name
end

Prohibits the calling of C#method_name. It’s not just a simple revoking of the definition. Even if there were a method in the superclass, it would also be forbidden. In other words, the method is exchanged for a sign which says “This method must not be called”.

undef is extremely powerful; once it is set it cannot be deleted from the Ruby level, because it is used to cover up contradictions in the internal structure. The only remaining measure is to inherit and define a method in the subclass. Even in that case, calling super would raise an error.

The method which corresponds to unlink in a file system is Module#remove_method. While defining a class, self refers to that class, so we can call it as follows. (Remember that Class is a subclass of Module.)

class C
  remove_method(:method_name)
end

But even with remove_method one cannot cancel the undef. It’s because the sign put up by undef prohibits any kind of search.

((errata: It can be redefined by using def))
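The difference between remove_method and undef, and the redefinition mentioned in the errata, can be sketched as follows. All class names and the helper call_hello are hypothetical.

```ruby
class Parent
  def hello
    "parent"
  end
end

class Removed < Parent
  def hello
    "child"
  end
  remove_method :hello   # only the definition in this class goes away
end

class Undefined < Parent
  undef hello            # calls are refused even though Parent#hello exists
end

class Revived < Parent
  undef hello
  def hello              # defining the method again lifts the ban
    "redefined"
  end
end

def call_hello(obj)
  obj.hello
rescue NoMethodError
  :no_method
end

p call_hello(Removed.new)     # "parent"
p call_hello(Undefined.new)   # :no_method
p call_hello(Revived.new)     # "redefined"
```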

Some more small topics

Comments

# examples of bad comments.

1 + 1            # compute 1+1.
alias my_id id   # my_id is an alias of id.

From a # to the end of line is a comment. It doesn’t have a meaning for the program.

Embedded documents

=begin
This is an embedded document.
It's so called because it is embedded in the program.
Plain and simple.
=end

An embedded document stretches from an =begin outside a string at the beginning of a line to a =end. The interior can be arbitrary. The program ignores it as a mere comment.

Multi-byte strings

When the global variable $KCODE is set to either EUC, SJIS or UTF8, strings encoded in euc-jp, shift_jis, or utf8 respectively can be used as string data.

And if the option -Ke, -Ks or -Ku is given to the ruby command, multibyte strings can be used within the Ruby code. String literals, regular expressions and even operator names can contain multibyte characters. Hence it is possible to do something like this:

def 表示( arg )
  puts arg
end

表示 'にほんご'

But I really cannot recommend doing things like that.

The original work is Copyright © 2002 - 2004 Minero AOKI.
Translated by Vincent ISAMBART and Clifford Escobar CAOILE
This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 2.5 License

Ruby Hacking Guide

Translated by Vincent ISAMBART & ocha-

Chapter 9: yacc crash course

Outline

Parser and scanner

How to write parsers for programming languages has been an active area of research for a long time, and there is a quite firmly established tactic for doing it. If we limit ourselves to a grammar that is not too strange (or ambiguous), we can solve this problem by following this method.

The first part consists of splitting a string into a list of words (or tokens). The component that does this is called a scanner or lexer. The term “lexical analyzer” is also used, but it is too much of a mouthful, so we’ll use the name scanner.

When speaking about scanners, common sense first says “there are generally spaces at the end of a word”. And in practice, most programming languages were made like this, because it’s the easiest way.

There can also be exceptions. For example, in old Fortran, whitespace did not have any meaning. This means whitespace did not end a word, and you could put spaces in the name of a variable. However, that made parsing very complicated, so the compiler vendors, one by one, started ignoring that standard. Finally Fortran 90 followed this trend and made significant whitespace the standard.

By the way, it seems the reason whitespace had no meaning in Fortran 77 was that, when writing programs on punch cards, it was easy to make errors in the number of spaces.

List of symbols

I said that the scanner spits out a list of words (tokens), but, to be exact, what the scanner creates is a list of “symbols”, not words.

What are symbols? Let’s take numbers as an example. In a programming language, 1, 2, 3, 99 are all “numbers”. They can all be handled the same way by the grammar. Where we can write 1, we can also write 2 or 3. That’s why the parser does not need to handle them in different ways. For numbers, “number” is enough.

“number”, “identifier” and others can be grouped together as “symbol”. But be careful not to mix this up with the Symbol class.

The scanner first splits the string into words and determines what these symbols are. For example, NUMBER or DIGIT for numbers, IDENTIFIER for names like “name”, IF for the reserved word if. These symbols are then given to the next phase.
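To make the idea of "a list of symbols" concrete, here is a minimal scanner sketch written with Ruby's standard StringScanner. The toy language, the RESERVED_WORDS table and the method name scan are assumptions made for this illustration; this is not how ruby's own scanner is written.

```ruby
require 'strscan'

# Reserved words of a hypothetical toy language.
RESERVED_WORDS = { "if" => :IF, "then" => :THEN, "else" => :ELSE, "end" => :END }.freeze

# Returns a list of [symbol, value] pairs.
def scan(source)
  scanner = StringScanner.new(source)
  tokens = []
  until scanner.eos?
    if scanner.skip(/\s+/)
      next                                   # whitespace only separates words
    elsif (text = scanner.scan(/\d+/))
      tokens << [:NUMBER, text.to_i]
    elsif (text = scanner.scan(/[A-Za-z_]\w*/))
      tokens << [RESERVED_WORDS.fetch(text, :IDENTIFIER), text]
    else
      char = scanner.getch                   # any other single character, e.g. '='
      tokens << [char, char]
    end
  end
  tokens
end

p scan("if x then 99 end")
# [[:IF, "if"], [:IDENTIFIER, "x"], [:THEN, "then"], [:NUMBER, 99], [:END, "end"]]
```

Note that 1, 2 or 99 would all come out as the same symbol NUMBER. Only the attached value differs, which is exactly why the parser does not need to distinguish them.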

Parser generator

The list of words and symbols spat out by the scanner is going to be used to form a tree. This tree is called a syntax tree.

The name “parser” is also sometimes used to include both the scanner and the creation of the syntax tree. However, we will use the narrow sense of “parser”: the creation of the syntax tree. How does this parser make a tree from the list of symbols? In other words, on what should we focus to find the tree corresponding to a piece of code?

The first way is to focus on the meaning of the words. For example, let’s suppose we find the word var. If the definition of the local variable var has been found before this, we’ll understand it’s the reading of a local variable.

Another way is to focus only on what we see. For example, if after an identifier comes a ‘=’, we’ll understand it’s an assignment. If the reserved word if appears, we’ll understand it’s the start of an if statement.

The latter method, focusing only on what we see, is the current trend. In other words, the language must be designed so that it can be analyzed just by looking at the list of symbols. This choice was made because this way is simpler, can be more easily generalized, and can therefore be automated using tools. These tools are called parser generators.

The most used parser generator under UNIX is yacc. Like many others, ruby’s parser is written using yacc. The input file for this tool is parse.y. That’s why, to be able to read ruby’s parser, we need to understand yacc to some extent. (Note: Starting from 1.9, ruby requires bison instead of yacc. However, bison is mainly yacc with additional functionality, so this does not diminish the interest of this chapter.)

This chapter will be a simple presentation of yacc to be able to understand parse.y, and therefore we will limit ourselves to what’s needed to read parse.y. If you want to know more about parsers and parser generators, I recommend a book I wrote called “Rubyを256倍使うための本 無道編” (The book to use 256 times more of Ruby - Unreasonable book). I do not recommend it because I wrote it, but because in this field it’s the easiest book to understand. And besides, it’s cheap, so the stakes will be low.

Nevertheless, if you would like a book from someone else (or can’t read Japanese), I recommend O’Reilly’s “lex & yacc programming” by John R. Levine, Tony Mason and Doug Brown. And if you are still not satisfied, you can also read “Compilers” (also known as the “dragon book” because of the dragon on its cover) by Alfred V. Aho, Ravi Sethi and Jeffrey D. Ullman.

Grammar

Grammar file

The input file for yacc is called the “grammar file”, as it’s the file where the grammar is written. The convention is to name this grammar file *.y. It is given to yacc, which generates C source code. This file can then be compiled as usual (figure 1 shows the full process).

Figure 1: File dependencies

The output file name is y.tab.c, and traditionally it can’t be changed. Recent versions of yacc usually allow changing it on the command line, but for compatibility it was safer to keep y.tab.c. By the way, it seems the tab of y.tab.c comes from table, as lots of huge tables are defined in it. It’s good to have a look at the file once.

The grammar file’s content has the following form:

▼ General form of the grammar file

%{
Header
%}
%union ....
%token ....
%type ....

%%
Rules part
%%
User defined part

yacc’s input file is first divided into 3 parts by %%. The first part is called the definition part, and contains a lot of definitions and setups. Between %{ and %} we can write anything we want in C, for example necessary macros. After that, the instructions starting with % are special yacc instructions. Every time we use one, we’ll explain it.

The middle part of the file is called the rules part, and is the most essential part for yacc. It’s where the grammar we want to parse is written. We’ll explain it in detail in the next section.

The last part of the file, the user defined part, can be used freely by the user. yacc just copies this part verbatim into the output file. It’s used, for example, to put auxiliary routines needed by the parser.

What does yacc do

What yacc takes care of is mainly this rules part in the middle. yacc takes the grammar written there and uses it to make a function called yyparse(). It’s the parser, in the narrow sense of the word.

In the narrow sense, so it means a scanner is needed. However, yacc won’t take care of it; it must be written by the user. The scanner is the function named yylex().

Even though yacc creates yyparse(), it only takes care of its core part. The “actions” we’ll mention later are out of its scope. You might think the part done by yacc is too small, but that’s not the case. This “core part” is so important that yacc has survived to this day even though we keep complaining about it.

But what on earth is this core part? That’s what we’re going to see.

BNF

When we want to write a parser in C, its code will be “cut the string this way, make this an if statement…” When using parser generators, we say the opposite, that is “I would like to parse this grammar.” Doing this creates for us a parser that handles the grammar. This means that telling the specification gives us the implementation. That’s the convenient point of yacc.

But how can we tell the specification? With yacc, the method of description used is BNF (Backus-Naur Form). Let’s look at a very simple example.

if_stmt: IF expr THEN stmt END

Let’s look separately at what’s on the left and on the right of the “:”. What I mean here is that the part on the left side, if_stmt, is equal to the part on the right. In other words, I’m saying that:

if_stmt and IF expr THEN stmt END are equivalent.

Here, if_stmt, IF, expr… are all “symbols”. expr is the abbreviation of expression, stmt of statement. This must surely be the definition of the if statement.

One definition is called a rule. The part on the left of “:” is called the left side and the part on the right is called the right side. This is quite easy to remember.

But something is missing. We do not want an if statement without being able to use else. And even if we could write else, having to always write the else even when it’s useless would be cumbersome. In this case we could do the following:

if_stmt: IF expr THEN stmt END
       | IF expr THEN stmt ELSE stmt END

“|” means “or”.

if_stmt is either “IF expr THEN stmt END” or “IF expr THEN stmt ELSE stmt END”.

That’s it.

Here I would like you to pay attention to the split done with |. With just this, one more rule is added. In fact, punctuating with | is just a shorter way to repeat the left side. The previous example has exactly the same meaning as the following:

if_stmt: IF expr THEN stmt END
if_stmt: IF expr THEN stmt ELSE stmt END

This means two rules are defined in the example.

This is not enough to complete the definition of the if statement. That’s because the symbols expr and stmt are not sent by the scanner, so their rules must be defined. To be closer to Ruby, let’s boldly add some rules.

stmt   : if_stmt
       | IDENTIFIER '=' expr   /* assignment */
       | expr

if_stmt: IF expr THEN stmt END
       | IF expr THEN stmt ELSE stmt END

expr   : IDENTIFIER       /* reading a variable */
       | NUMBER           /* integer constant */
       | funcall          /* FUNction CALL */

funcall: IDENTIFIER '(' args ')'

args   : expr             /* only one parameter */

I used two new elements. First, comments of the same form as in C, and second, characters expressed using '='. This '=' is also of course a symbol. Symbols like “=” are different from numbers as there is only one variety of them. That’s why for such symbols we can also use '='. It would be great to be able to use strings for, for example, reserved words, but due to limitations of the C language this cannot be done.

We add rules like this until we have finished writing the whole grammar. With yacc, the left side of the first written rule is “the whole grammar we want to express”. So in this example, stmt expresses the whole program.

That was a little too abstract. Let’s explain this a little more concretely. By “stmt expresses the whole program”, I mean that stmt and all the rows of symbols expressed as equivalent to it by the rules are recognized as grammatical. For example, stmt and stmt are equivalent. Of course. Then expr is equivalent to stmt. That’s expressed like this in the rule. Then, NUMBER and stmt are equivalent. That’s because NUMBER is expr and expr is stmt.

We can also say that more complicated things are equivalent.

stmt
↓
if_stmt
↓
IF expr THEN stmt END
    ↓         ↓
IF IDENTIFIER THEN expr END
                    ↓
IF IDENTIFIER THEN NUMBER END

When it has been expanded this far, all elements have become symbols sent by the scanner. It means such a sequence of symbols is correct as a program. Or, putting it the other way around, if this sequence of symbols is sent by the scanner, the parser can understand it by going through the expansion in the opposite order.

IF IDENTIFIER THEN NUMBER END
                    ↓
IF IDENTIFIER THEN expr END
    ↓         ↓
IF expr THEN stmt END
↓
if_stmt
↓
stmt

And stmt is a symbol expressing the whole program. That’s why this sequence of symbols is a correct program for the parser. When that is the case, the parsing routine yyparse() ends by returning 0.

By the way, the technical term expressing that the parser succeeded is that it “accepted” the input. The parser is like a government office: if you do not fill in the boxes of the documents exactly as it asked you to, it will refuse them. The accepted sequences of symbols are the ones for which the boxes were filled in correctly. Parsers and government offices are strangely similar, for instance in the fact that they care about details of the specification and that they use complicated terms.

Terminal symbols and nonterminal symbols

Well, in the confusion of the moment I used the expression “symbols coming from the scanner” without explaining it. So let’s explain this. I use the one word “symbol”, but there are two types.

The first type of symbols are the ones sent by the scanner. They are, for example, IF, THEN, END, '=', … They are called terminal symbols. That’s because, as in the quick expansion we did before, we find them lined up at the end. In this chapter terminal symbols are always written in capital letters. However, symbols like '=' between quotes are special. Symbols like this are all terminal symbols, without exception.

The other type of symbols are the ones that never come from the scanner, for example if_stmt, expr or stmt. They are called nonterminal symbols. As they don’t come from the scanner, they only exist in the parser. Nonterminal symbols also always appear at one moment or another as the left side of a rule. In this chapter, nonterminal symbols are always written in lower case letters.

How to test

I’m now going to tell you the way to process the grammar file with yacc.

%token A B C D E
%%
list: A B C
    | de

de  : D E

First, put all the terminal symbols used after %token. However, you do not have to list the symbols written with quotes (like '='). Then, put %% to mark a change of section and write the grammar. That’s all.

Let’s now process this.

% yacc first.y
% ls
first.y  y.tab.c
%

Like most Unix tools, “silence means success”.

There are also implementations of yacc that need semicolons at the end of (groups of) rules. When that is the case we need to do the following:

%token A B C D E
%%
list: A B C
    | de
    ;

de  : D E
    ;

I hate these semicolons, so in this book I’ll never use them.

Void rules

Let’s now look a little more at some of the established ways of describing grammars. I’ll first introduce void rules.

void:

There’s nothing on the right side, so this rule is “void”. For example, the two following targets mean exactly the same thing.

target: A B C

target: A void B void C
void  :

What is the use of such a thing? It’s very useful. For example in the following case.

if_stmt : IF expr THEN stmts opt_else END

opt_else:
        | ELSE stmts

Using void rules, we can cleverly express the fact that “the else section may be omitted”. Compared to the rules made previously using two definitions, this way is shorter and the burden is not spread over two rules.

Recursive definitions

The following example is still a little hard to understand.

list: ITEM         /* rule 1 */
    | list ITEM    /* rule 2 */

This expresses a list of one or more items, in other words any of the following lists of symbols:

ITEM
ITEM ITEM
ITEM ITEM ITEM
ITEM ITEM ITEM ITEM
      :

Do you understand why? First, according to rule 1, list can be read as ITEM. If you merge this with rule 2, list can be ITEM ITEM.

list: list ITEM
    = ITEM ITEM

We now understand that the list of symbols ITEM ITEM is equivalent to list. By applying rule 2 to list again, we can say that 3 ITEMs are also equivalent to list. By continuing this process, the list can grow to any size. This is something like mathematical induction.
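The induction-like growth can be mimicked with a few lines of Ruby. The method expand_list is a hypothetical helper that expands the nonterminal list with rule 2 a given number of times and then closes it with rule 1. It only illustrates the reasoning and has nothing to do with how yacc works internally.

```ruby
# Apply rule 2 (list: list ITEM) the given number of times,
# then finish with rule 1 (list: ITEM).
def expand_list(rule2_applications)
  symbols = [:list]
  rule2_applications.times do
    symbols = symbols.flat_map { |s| s == :list ? [:list, :ITEM] : [s] }
  end
  symbols.flat_map { |s| s == :list ? [:ITEM] : [s] }
end

p expand_list(0)   # [:ITEM]
p expand_list(2)   # [:ITEM, :ITEM, :ITEM]
```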

I’ll now show you the next example. The following example expresses the lists with 0 or more ITEM.

list:
    | list ITEM

First, the first line means “list is equivalent to (void)”. By void I mean the list with 0 ITEM. Then, by looking at rule 2 we can say that “list ITEM” is equivalent to 1 ITEM. That’s because list is equivalent to void.

list: list   ITEM
    = (void) ITEM
    =        ITEM

By applying the same replacement multiple times, we can understand that list expresses a list of 0 or more items.

With this knowledge, “lists of 2 or more ITEM” or “lists of 3 or more ITEM” are easy, and we can even create “lists of an even number of elements”.

list:
    | list ITEM ITEM

Construction of values

This abstract talk has lasted long enough, so in this section I’d really like to go on with a more concrete talk.

Shift and reduce

Up until now, various ways to write grammars have been explained, but what we want is to be able to build a syntax tree. However, I’m afraid to say, as might be expected, just telling yacc the rules is not enough for it to build a syntax tree. Therefore, this time, I’ll tell you the way to build a syntax tree by adding something to the rules.

We’ll first see what the parser does during execution. We’ll use the following simple grammar as an example.

%token A B C
%%
program: A B C

In the parser there is a stack called the semantic stack. The parser pushes onto it all the symbols coming from the scanner. This move is called “shifting the symbols”.

[ A B ] ← C   shift

And when the right side of any rule is equal to the end of the stack, it is “interpreted”. When this happens, the sequence of the right-hand side is replaced by the symbol of the left-hand side.

[ A B C ]
    ↓         reduction
[ program ]

This move is called “reducing A B C to program”. This term is a little pretentious, but in short it is like, when you have enough tiles of haku, hatsu and chu respectively, they become “Big three dragons” in Japanese Mahjong… this might be irrelevant.

And since program expresses the whole program, if there’s only a program on the stack, it probably means the whole program has been found. Therefore, if the input has just finished here, it is accepted.

Let’s try with a little more complicated grammar.

%token IF E S THEN END
%%
program : if

if      : IF expr THEN stmts END

expr    : E

stmts   : S
        | stmts S

The input from the scanner is this.

IF  E  THEN  S  S  S  END

The transitions of the semantic stack in this case are shown below.

Stack                       Move
(empty at first)
IF                          shift IF
IF E                        shift E
IF expr                     reduce E to expr
IF expr THEN                shift THEN
IF expr THEN S              shift S
IF expr THEN stmts          reduce S to stmts
IF expr THEN stmts S        shift S
IF expr THEN stmts          reduce stmts S to stmts
IF expr THEN stmts S        shift S
IF expr THEN stmts          reduce stmts S to stmts
IF expr THEN stmts END      shift END
if                          reduce IF expr THEN stmts END to if
program                     reduce if to program
                            accept.
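The table of transitions can be reproduced with a small Ruby simulation. This is a deliberately naive sketch: after every shift it greedily reduces with the longest right side that matches the end of the stack. A parser generated by yacc instead uses precomputed tables and a look-ahead symbol, so RULES and simulate are illustrative names only and do not reflect yacc's real algorithm.

```ruby
# Rules of the example grammar as [left side, right side].
RULES = [
  [:program, [:if]],
  [:if,      [:IF, :expr, :THEN, :stmts, :END]],
  [:expr,    [:E]],
  [:stmts,   [:S]],
  [:stmts,   [:stmts, :S]]
].freeze

# Greedy simulation: after each shift, reduce while some right side
# matches the end of the stack, preferring the longest match.
def simulate(input)
  stack = []
  moves = []
  input.each do |symbol|
    stack.push(symbol)
    moves << "shift #{symbol}"
    loop do
      lhs, rhs = RULES
        .select { |_, r| stack.last(r.size) == r }
        .max_by { |_, r| r.size }
      break unless lhs
      stack.pop(rhs.size)
      stack.push(lhs)
      moves << "reduce #{rhs.join(' ')} to #{lhs}"
    end
  end
  moves << (stack == [:program] ? "accept" : "error")
  moves
end

puts simulate(%i[IF E THEN S S S END])
```

For this particular grammar the greedy strategy happens to produce the same sequence of moves as the table. For grammars that need look-ahead to choose between shifting and reducing it would not be sufficient.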

To end this section, there’s one thing to be cautious about: a reduction does not always mean decreasing the number of symbols. If there’s a void rule, it’s possible that a symbol is generated out of “void”.

Action

Now, I’ll start to describe the important parts. Whether shifting or reducing, merely moving things around inside the semantic stack is not meaningful. Since our ultimate goal was building a syntax tree, it is not enough unless it leads to that. How does yacc do it for us? The answer yacc gives is that “we let you hook the moment when the parser performs a reduction.” The hooks are called actions of the parser. An action can be written at the end of the rule as follows.

program: A B C { /* Here is an action */ }

The part between { and } is the action. If you write it like this, this action will be executed at the moment A B C is reduced to program. You are free to do whatever you like in an action. Since it is C code, almost anything can be written.

The value of a symbol

This is even more important: each symbol has “its value”. Both terminal and nonterminal symbols do. As for terminal symbols, since they come from the scanner, their values are also given by the scanner. For example, 1 or 9 or maybe 108 for a NUMBER symbol. For an IDENTIFIER symbol, it might be "attr" or "name" or "sym". Anything is fine. Each symbol and its value are pushed together onto the semantic stack. The next figure shows the state at the moment S is shifted with its value.

IF     expr    THEN    stmts   S
value  value   value   value   value

According to the previous rule, stmts S can be reduced to stmts. If an action is written at the rule, it would be executed, and at that moment the values of the symbols corresponding to the right-hand side are passed to the action.

IF    expr   THEN   stmts  S      /* Stack */
v1    v2     v3     v4     v5
                    ↓      ↓
            stmts:  stmts  S      /* Rule */
                    ↓      ↓
                  { $1  +  $2; }  /* Action */

This way an action can take the value of each symbol corresponding to the right-hand side of a rule through $1, $2, $3, … yacc rewrites the likes of $1 and $2 into a notation that points into the stack. Because this is written in C it has to deal with, for instance, types, but as that is tiresome, let’s assume their types are int for the moment.

Next, the symbol of the left-hand side is pushed instead, but because all symbols have values, the left-hand side symbol must also have its value. It is expressed as $$ in actions; the value of $$ when leaving an action will be the value of the left-hand side symbol.

IF    expr   THEN   stmts  S      /* the stack just before reducing */
v1    v2     v3     v4     v5
                    ↓      ↓
            stmts:  stmts  S      /* the rule whose right-hand side matches the end */
              ↑     ↓      ↓
            { $$  = $1  +  $2; }  /* its action */

IF    expr   THEN   stmts         /* the stack after reducing */
v1    v2     v3     (v4+v5)

To end this section, here is just an extra. The value of a symbol is sometimes called its “semantic value”. Therefore the stack that holds them is the “semantic value stack”, and it is called “semantic stack” for short.
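The way values travel through a reduction can be sketched in Ruby. Here each stack entry is a [symbol, value] pair, and the block passed to the hypothetical helper reduce plays the role of the action: its parameters correspond to $1, $2, … and its result to $$. This is only a model of the idea, not the code yacc generates.

```ruby
# Each stack entry is [symbol, value].
def reduce(stack, lhs, rhs_size)
  values = stack.pop(rhs_size).map { |_symbol, value| value }
  stack.push([lhs, yield(*values)])   # the block's result plays the role of $$
end

def reduce_demo
  stack = [[:IF, 1], [:expr, 2], [:THEN, 3], [:stmts, 4], [:S, 5]]
  reduce(stack, :stmts, 2) { |v1, v2| v1 + v2 }   # $$ = $1 + $2
  stack
end

p reduce_demo
# [[:IF, 1], [:expr, 2], [:THEN, 3], [:stmts, 9]]
```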

yacc and types

It’s really cumbersome, but we cannot finish this talk without talking about types. What is the type of the value of a symbol? To give the bottom line first, it is the type named YYSTYPE. This must be the abbreviation of either YY Stack TYPE or Semantic value TYPE. And YYSTYPE is of course a typedef of some other type. That type is the union defined with the instruction named %union in the definition part.

We have not written %union before but it did not cause an error. Why? This is because yacc considerately falls back on a default without asking. The default in C should naturally be int. Therefore, YYSTYPE is int by default.

For an example in a yacc book or for a calculator, int can be used unchanged. But in order to build a syntax tree, we want to use structs, pointers and various other things. Therefore, for instance, we use %union as follows.

%union {
    struct node {
        int type;
        struct node *left;
        struct node *right;
    } *node;
    int num;
    char *str;
}

Because this is not for practical use, arbitrary names are used for types and members. Notice that, unlike in ordinary C, there’s no semicolon at the end of the %union block.

And if this is written, it would look like the following in y.tab.c.

typedef union {
    struct node {
        int type;
        struct node *left;
        struct node *right;
    } *node;
    int num;
    char *str;
} YYSTYPE;

And, as for the semantic stack,

YYSTYPE yyvs[256];       /* the substance of the stack (yyvs = YY Value Stack) */
YYSTYPE *yyvsp = yyvs;   /* the pointer to the end of the stack */

we can expect something like this. Therefore, the values of the symbols appearing in actions would be

/* the action before processed by yacc */
target: A B C { func($1, $2, $3); }

/* after converted, its appearance in y.tab.c */
{ func(yyvsp[-2], yyvsp[-1], yyvsp[0]); ;

naturally like this.

In this case, because the default type int is used, the values can be accessed just by referring to the stack. If YYSTYPE is a union, it is necessary to also specify one of its members. There are two ways to do that: one is associating a member with each symbol, the other is specifying it every time.

Generally, the way of associating a member with each symbol is used. By using %token for terminal symbols and %type for nonterminal symbols, it is written as follows.

%token<num> A B C    /* All of the values of A B C are of type int */
%type<str> target    /* All of the values of target are of type char* */

On the other hand, if you’d like to specify it every time, you can write a member name right after the $ as follows.

%union { char *str; }
%%
target: { $<str>$ = "In short, this is like typecasting"; }

You’d better avoid using this method if possible. Defining a member for each symbol is the basic approach.

Coupling the parser and the scanner together

At last I’ve finished talking about this and that of the values inside the parser. What remains is the protocol connecting it with the scanner, and then the heart of this story will be finished.

First, let’s make sure of what I mentioned: the scanner is the yylex() function. Each (terminal) symbol itself is returned (as int) as the return value of the function. Since constants with the same names as the symbols are defined (#define) by yacc, we can write NUMBER for a NUMBER. And its value is passed by putting it into a global variable named yylval. This yylval is also of type YYSTYPE, and exactly the same things can be said about it as about the parser. In other words, if it is defined with %union it becomes a union. But this time the member is not selected automatically; its member name has to be written manually. A very simple example would look like the following.

static int
yylex()
{
    yylval.str = next_token();
    return STRING;
}

Figure 2 summarizes the relationships described so far. I’d like you to check them one by one. yylval, $$, $1, $2… all of these variables that become the interfaces are of type YYSTYPE.
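The calling protocol between parser and scanner can be imitated in Ruby as a rough model. TinyScanner and drain are hypothetical names. An attribute named yylval stands in for the C global variable, and returning 0 at the end of input follows the usual yacc convention.

```ruby
# Hypothetical stand-in for the yylex()/yylval pair.
class TinyScanner
  attr_reader :yylval            # plays the role of the global yylval

  def initialize(tokens)
    @tokens = tokens.dup
    @yylval = nil
  end

  # Returns the next terminal symbol and stores its value in yylval.
  # Returns 0 at the end of input, as a yacc scanner does.
  def yylex
    symbol, value = @tokens.shift
    return 0 if symbol.nil?
    @yylval = value
    symbol
  end
end

# What a parser loop would do: ask for a symbol, then read its value.
def drain(scanner)
  seen = []
  while (symbol = scanner.yylex) != 0
    seen << [symbol, scanner.yylval]
  end
  seen
end

p drain(TinyScanner.new([[:STRING, "hello"], [:NUMBER, 108]]))
# [[:STRING, "hello"], [:NUMBER, 108]]
```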

Figure 2: Relationships among yacc related variables & functions

Embedded Action

So far it was explained that an action is written at the end of a rule. However, it can actually also be written in the middle of a rule.

target: A B { puts("embedded action"); } C D

This is called an “embedded action”. An embedded action is merely syntactic sugar for the following definition:

target: A B dummy C D

dummy : /* void rule */
        {
            puts("embedded action");
        }

From this example, you might be able to tell everything, including when it is executed. The value of the symbol can also be taken. In other words, in this example, the value of the embedded action will come out as $3.

Practical Topics

Conflicts

I’m not afraid of yacc anymore.

If you thought so, you are too naive. The reason why everyone is so afraid of yacc is about to be revealed.

Up until now, I wrote rather carelessly “when the right-hand side of the rule matches the end of the stack”, but what happens if there’s a rule like this:

target: A B C
      | A B C

When the sequence of symbols A B C actually comes out, it would be hard to determine which rule should match. Such a thing cannot be interpreted even by humans. Therefore yacc also cannot understand it. When yacc finds an odd grammar like this, it complains that a reduce/reduce conflict occurs. It means multiple rules can be reduced at the same time.

% yacc rrconf.y
conflicts:  1 reduce/reduce

But usually, I think you won’t do such things except by accident. But how about the next example? The described symbol sequence is completely the same.

target: abc
      | A bc

abc   : A B C

bc    :   B C

This is relatively likely to happen. Especially when parts are moved around in complicated ways while developing the rules, it is often the case that rules of this kind are made without noticing.

There’s also a similar pattern, as follows:

target: abc
      | ab C

abc   : A B C

ab    : A B

When the symbol sequence A B C comes out, it’s hard to determine whether to choose a single abc or the combination of ab and C. In this case, yacc will complain that a shift/reduce conflict occurs. This means there are both a shift-able rule and a reduce-able rule at the same time.

% yacc srconf.y
conflicts:  1 shift/reduce

The famous example of shift/reduce conflicts is “the hanging else problem”. For example, the if statement of the C language causes this problem. I’ll describe it by simplifying the case:

stmt     : expr ';'
         | if

expr     : IDENTIFIER

if       : IF '(' expr ')' stmt
         | IF '(' expr ')' stmt  ELSE stmt

In this grammar, the expression is only IDENTIFIER (a variable), and the body of if is only one statement. Now, what happens if the next program is parsed with this grammar?

if (cond)
    if (cond)
        true_stmt;
    else
        false_stmt;

If it is written this way, we might feel like it’s quite obvious. But actually, this can also be interpreted as follows.

if (cond) {
    if (cond)
        true_stmt;
}
else {
    false_stmt;
}

The question is “between the two ifs, the inner one or the outer one, which is the one to which the else should be attached?”.

However, shift/reduce conflicts are relatively less harmful than reduce/reduce conflicts, because usually they can be solved by choosing shift. Choosing shift is almost equivalent to “connecting the elements closer to each other”, and it tends to match human instincts. In fact, the hanging else can also be solved by shifting. Hence, yacc follows this trend: it chooses shift by default when a shift/reduce conflict occurs.
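The effect of "choosing shift" on the hanging else can be imitated with a tiny hand-written parser. Note that this sketch uses recursive descent, not the table-driven technique of yacc, and that it works on a simplified token stream without conditions; parse_stmt and the token names are assumptions made for the illustration. Consuming an ELSE as soon as one is visible corresponds to preferring the shift.

```ruby
# Grammar (simplified, conditions omitted):
#   stmt: S | IF stmt | IF stmt ELSE stmt
def parse_stmt(tokens)
  case tokens.shift
  when :S
    :S
  when :IF
    body = parse_stmt(tokens)
    if tokens.first == :ELSE       # take the else right away ("shift")
      tokens.shift
      [:if, body, parse_stmt(tokens)]
    else
      [:if, body, nil]
    end
  end
end

# if (cond) if (cond) true_stmt; else false_stmt;
p parse_stmt([:IF, :IF, :S, :ELSE, :S])
# [:if, [:if, :S, :S], nil]
```

The else ends up attached to the inner if, which is the interpretation most people expect.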

Look-ahead

As an experiment, I’d like you to process the next grammar with yacc.

%token A B C
%%
target: A B C   /* rule 1 */
      | A B     /* rule 2 */

We can’t help expecting there should be a conflict. At the point where it has read up to A B, rule 1 would attempt to shift and rule 2 would attempt to reduce. In other words, this should cause a shift/reduce conflict. However, ….

% yacc conf.y
%

It’s odd, there’s no conflict. Why?

In fact, the parser created with yacc can look ahead by just one symbol. Before actually doing a shift or a reduce, it can decide what to do by peeking at the next symbol.

Therefore, this is also taken into account for us when generating the parser: if the rule can be determined by a single look-ahead, conflicts are avoided. In the previous rules, for instance, if C comes right after A B, only rule 1 is possible and it is chosen (shift). If the input has finished, rule 2 is chosen (reduce).
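The decision made with a single look-ahead symbol for this grammar can be written down as a tiny Ruby function. The name decide and the use of nil for "end of input" are choices made for this sketch. A real yacc parser encodes the same decision in its generated tables.

```ruby
# Decide the move for   target: A B C | A B   once A B is on the stack.
# lookahead is the next terminal symbol, or nil when the input has ended.
def decide(stack, lookahead)
  return :error unless stack == [:A, :B]
  case lookahead
  when :C  then :shift    # only rule 1 can still match
  when nil then :reduce   # only rule 2 can match
  else          :error
  end
end

p decide([:A, :B], :C)    # :shift
p decide([:A, :B], nil)   # :reduce
```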

Notice that the word “look-ahead” has two meanings: one is the look-ahead while processing *.y with yacc. The other is the look-ahead while actually executing the generated parser. The look-ahead during execution is not so difficult, but the look-ahead of yacc itself is pretty complicated. That’s because it needs to predict all possible input patterns and decide its behavior from the grammar rules alone.

However, because “all possible” is actually impossible, it handles “most” patterns. How broad a range of patterns it can cover shows the strength of a look-ahead algorithm. The look-ahead algorithm that yacc uses when processing grammar files is LALR, which is relatively powerful among the existing algorithms for resolving conflicts.

A lot of things have been introduced, but you don’t have to worry so much, because all we do in this book is read, not write. What I wanted to explain here is not the look-ahead of grammars but the look-ahead during execution.

OperatorPrecedenceSinceabstracttalkshavelastedforlong,I’lltalkmoreconcretely.Let’strytodefinetherulesforinfixoperatorssuchas+or*.Therearealsoestablishedtacticsforthis,we’dbettertamelyfollowit.Somethinglikeacalculatorforarithmeticoperationsisdefinedbelow:

expr:expr'+'expr|expr'-'expr|expr'*'expr|expr'/'expr|primary

primary:NUMBER|'('expr')'

primary is the smallest grammar unit. The point is that an expr between parentheses becomes a primary.

Then, if this grammar is written to an arbitrary file and compiled, the result would be this.

% yacc infix.y
16 shift/reduce conflicts

They conflict aggressively. Thinking for 5 minutes is enough to see that this rule causes a problem in the following and similar cases:

1 - 1 - 1

This can be interpreted in both of the next two ways.

(1 - 1) - 1
1 - (1 - 1)

The former is natural as a numerical expression. But what yacc processes is only the sequence in which symbols appear; it attaches no meaning to them. What the - symbol means is not considered at all. In order to correctly reflect a human intention, we have to specify what we want step by step.

Then, what we can do is write this in the definition part.

%left '+' '-'
%left '*' '/'

These directives specify both the precedence and the associativity at the same time. I’ll explain them in order.

I think the term “precedence” often appears when talking about the grammar of a programming language. Describing it logically is complicated, so to put it intuitively, it is about which operator the parentheses are attached to in the following and similar cases.

1 + 2 * 3

If * has higher precedence, it would be this.

1 + (2 * 3)

If + has higher precedence, it would be this.

(1 + 2) * 3

As shown above, operator precedence resolves shift/reduce conflicts by defining which operators are stronger and which are weaker.

However, if the operators have the same precedence, how can it be resolved? Like this, for instance,

1 - 2 - 3

because both operators are -, their precedences are exactly the same. In this case, it is resolved by using associativity. There are three types of associativity: left, right and nonassoc. They are interpreted as follows:

Associativity                   Interpretation
left (left-associative)         (1 - 2) - 3
right (right-associative)       1 - (2 - 3)
nonassoc (non-associative)      parse error

Most of the operators for numerical expressions are left-associative. Right-associativity is used mainly for the = of assignment and the not of negation.

a = b = 1    # (a = (b = 1))
not not a    # (not (not a))

The representatives of non-associative operators are probably the comparison operators.

a == b == c   # parse error
a <= b <= c   # parse error

However, this is not the only possibility. In Python, for instance, comparisons between three terms are possible.

Then, the directives named %left, %right and %nonassoc are used to specify the associativities their names suggest. Precedence is specified by the order of the directives: the lower an operator is written, the higher its precedence. Operators written on the same line have the same level of precedence.

%left  '+' '-'    /* left-associative and third precedence  */
%left  '*' '/'    /* left-associative and second precedence */
%right '!'        /* right-associative and first precedence */
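To make the effect of such a table tangible, here is a minimal Ruby sketch that groups a flat list of tokens according to a precedence and associativity table. Note that it uses precedence climbing, which is a different technique from the table construction yacc performs; the names OPS and group, and the right-associative ** entry, are assumptions made purely for illustration.

```ruby
# Operator table: a higher number binds tighter, like a later line in the
# yacc definition part. The '**' entry is an assumption added to show :right.
OPS = {
  '+'  => [1, :left],  '-' => [1, :left],
  '*'  => [2, :left],  '/' => [2, :left],
  '**' => [3, :right]
}.freeze

# Consumes tokens from the front of the array and returns a fully
# parenthesized string showing how the operators were grouped.
def group(tokens, min_prec = 1)
  left = tokens.shift                      # a primary (a number here)
  while (op = tokens.first) && OPS.key?(op) && OPS[op][0] >= min_prec
    prec, assoc = OPS[tokens.shift]
    # A left-associative operator only accepts tighter-binding operators
    # on its right; a right-associative one also accepts its own level.
    next_min = assoc == :left ? prec + 1 : prec
    right = group(tokens, next_min)
    left = "(#{left} #{op} #{right})"
  end
  left
end

puts group(%w[1 - 2 - 3])     # left-associative grouping
puts group(%w[1 + 2 * 3])     # '*' binds tighter than '+'
puts group(%w[2 ** 3 ** 2])   # right-associative grouping
```

Changing an entry from :left to :right, or swapping two precedence numbers, changes only the table and not the algorithm, which is the same division of labour that %left and %right give to a yacc grammar.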

The original work is Copyright © 2002 - 2004 Minero AOKI.
Translated by Vincent ISAMBART and Clifford Escobar CAOILE
This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 2.5 License

Ruby Hacking Guide

Translated by Robert GRAVINA & ocha-

Chapter 10: Parser

Outline of this chapter

Parser construction

The main source of the parser is parse.y. Because it is *.y, it is the input for yacc, and parse.c is generated from it.

Although one would expect lex.c to contain the scanner, this is not the case. This file is created by gperf, taking the file keywords as input, and defines the reserved word hash table. This tool-generated lex.c is #included in (the also tool-generated) parse.c. The details of this process are somewhat difficult to explain at this point, so we shall return to this later.

Figure 1 shows the parser construction process. For the benefit of readers using Windows who may not be aware, the mv (move) command creates a new copy of a file and removes the original. cc is, of course, the C compiler and cpp the C pre-processor.

Figure 1: Parser construction process

Dissecting parse.y

Let’s now look at parse.y in a bit more detail. The following figure presents a rough outline of the contents of parse.y.

▼ parse.y

%{
header
%}
%union ....
%token ....
%type ....

%%

rules

%%
user code section
    parser interface
    scanner (character stream processing)
    syntax tree construction
    semantic analysis
    local variable management
    ID implementation

As for the rules and definitions part, it is as previously described. Since this part is indeed the heart of the parser, I’ll explain it ahead of the other parts, starting in the next section.

There are a considerable number of support functions defined in the user code section, but roughly speaking, they can be divided into the six parts listed above. The following table shows where each part is explained in this book.

Part                        Chapter                                   Section
Parser interface            This chapter                              Section 3 “Scanning”
Scanner                     This chapter                              Section 3 “Scanning”
Syntax tree construction    Chapter 12 “Syntax tree construction”     Section 2 “Syntax tree construction”
Semantic analysis           Chapter 12 “Syntax tree construction”     Section 3 “Semantic analysis”
Local variable management   Chapter 12 “Syntax tree construction”     Section 4 “Local variables”
ID implementation           Chapter 3 “Names and name tables”         Section 2 “ID and symbols”

General remarks about grammar rules

Coding rules

The grammar of ruby conforms to a coding standard and is thus easy to read once you are familiar with it.

Firstly, regarding symbol names, all non-terminal symbols are written in lowercase characters. Terminal symbols are prefixed by a lowercase character and then written in uppercase. Reserved words (keywords) are prefixed with the character k. Other terminal symbols are prefixed with the character t.

▼ Symbol name examples

Token                    Symbol name
(non-terminal symbol)    bodystmt
if                       kIF
def                      kDEF
rescue                   kRESCUE
varname                  tIDENTIFIER
ConstName                tCONST
1                        tINTEGER

The only exceptions to these rules are klBEGIN and klEND. These symbol names refer to the reserved words for “BEGIN” and “END”, respectively, and the l here stands for large. Since the reserved words begin and end already exist (naturally, with symbol names kBEGIN and kEND), these non-standard symbol names were required.

Important symbols

parse.y contains both grammar rules and actions; however, for now I would like to concentrate on the grammar rules alone. The script sample/exyacc.rb can be used to extract the grammar rules from this file. Aside from this, running yacc -v will create a log file y.output which also contains the grammar rules, however it is rather difficult to read. In this chapter I have used a slightly modified version of exyacc.rb (tools/exyacc2.rb, located on the attached CD-ROM) to extract the grammar rules.

▼ parse.y (rules)

program  : compstmt

bodystmt : compstmt
           opt_rescue
           opt_else
           opt_ensure

compstmt : stmts opt_terms
         :
         :

The output is quite long, over 450 lines of grammar rules, and as such I have only included the most important parts in this chapter.

Which symbols, then, are the most important? Names such as program, expr, stmt, primary and arg are always very important, because they represent the general parts of the grammatical elements of a programming language. The following table outlines the elements we should generally focus on in the syntax of a program.

Syntax element                      Predicted symbol names
Program                             program prog file input stmts whole
Sentence                            statement stmt
Expression                          expression expr exp
Smallest element                    primary prim
Left hand side of an expression     lhs (left hand side)
Right hand side of an expression    rhs (right hand side)
Function call                       funcall function_call call function
Method call                         method method_call call
Argument                            argument arg
Function definition                 defun definition function fndef
Declarations                        declaration decl

In general, programming languages tend to have the following hierarchy structure.

Program element   Properties
Program           Usually a list of statements
Statement         What cannot be combined with the others. A syntax tree trunk.
Expression        What is a combination by itself and can also be a part of another expression. A syntax tree internal node.
Primary           An element which cannot be further decomposed. A syntax tree leaf node.

Statements are things like function definitions in C or class definitions in Java. An expression can be a procedure call, an arithmetic expression etc., while a primary usually refers to a string literal or a number. Some languages do not contain all of these symbol types, however they generally contain some kind of hierarchy of symbols such as program→stmt→expr→primary.

However, a structure at a low level can be contained in a higher-level structure. For example, in C a function call is an expression, but it can also be written on its own. That is, it is an expression but it can also be a statement.

Conversely, when surrounded by parentheses, expressions become primaries. This is because the lower the level of an element, the higher its precedence.

The range of statements differs considerably between programming languages. Let’s consider assignment as an example. In C, because it is an expression, we can use the value of the whole assignment expression. But in Pascal, assignment is a statement, so we cannot do such a thing. Also, function and class definitions are typically statements; however, in languages such as Lisp and Scheme, since everything is an expression, there are no statements in the first place. Ruby is close to Lisp’s design in this regard.

Program structure

Now let’s turn our attention to the grammar rules of ruby. Firstly, in yacc, the left hand side of the first rule represents the entire grammar. Currently, it is program. Following the rules further and further from here, just as in the established tactic, the four symbols program, stmt, expr and primary will be found. Adding arg to them, let’s look at their rules.

▼ ruby grammar (outline)

program : compstmt

compstmt: stmts opt_terms

stmts   : none
        | stmt
        | stmts terms stmt

stmt    : kALIAS fitem fitem
        | kALIAS tGVAR tGVAR
            :
            :
        | expr

expr    : kRETURN call_args
        | kBREAK call_args
            :
            :
        | '!' command_call
        | arg

arg     : lhs '=' arg
        | var_lhs tOP_ASGN arg
        | primary_value '[' aref_args ']' tOP_ASGN arg
            :
            :
        | arg '?' arg ':' arg
        | primary

primary : literal
        | strings
            :
            :
        | tLPAREN_ARG expr ')'
        | tLPAREN compstmt ')'
            :
            :
        | kREDO
        | kRETRY

If we focus on the last rule of each element, we can clearly make out a hierarchy of program→stmt→expr→arg→primary.

Also, we’d like to focus on this rule of primary.

primary : literal
            :
            :
        | tLPAREN_ARG expr ')'    /* here */

The name tLPAREN_ARG comes from t for terminal symbol, L for left and PAREN for parentheses; it is the open parenthesis. Why this isn’t '(' is covered in the next section “Context-dependent scanner”. Anyway, the purpose of this rule is to demote an expr to a primary. This creates a cycle which can be seen in Figure 2, and the arrow shows how this rule is reduced during parsing.

Figure 2: expr demotion

The next rule is also particularly interesting.

primary : literal
            :
            :
        | tLPAREN compstmt ')'    /* here */

A compstmt, which is equivalent to the entire program (program), can be demoted to a primary with this rule. The next figure illustrates this rule in action.

Figure 3: program demotion

This means that for any syntax element in Ruby, if we surround it with parentheses it will become a primary and can be passed as an argument to a function, be used as the right hand side of an expression etc. This is an incredible fact. Let’s actually confirm it.

p((class C; end))
p((def a() end))
p((alias ali gets))
p((if true then nil else nil end))
p((1 + 1 * 1 ** 1 - 1 / 1 ^ 1))

If we invoke ruby with the -c option (syntax check), we get the following output.

% ruby -c primprog.rb
Syntax OK

Indeed, it’s hard to believe, but it actually passes. Apparently, we did not get the wrong idea.

If we care about the details, some things are rejected by the semantic analysis (see also Chapter 12 “Syntax tree construction”), so it is not perfectly possible. For example, passing a return statement as an argument to a function will result in an error. But at least at the level of appearance, the rule “surrounding anything in parentheses means it can be passed as an argument to a function” does hold.
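Going one step past the syntax check, the following small sketch looks at the values such parenthesized constructs yield when they are actually passed as arguments. The helper names show and demoted_values are hypothetical, and the behaviour was reasoned about for a recent Ruby, so details may differ from the ruby version this book reads.

```ruby
# Hypothetical helper: returns its argument unchanged so that we can see
# what value a parenthesized construct produces as an argument.
def show(value)
  value
end

def demoted_values
  [
    show((if true then 'ok' else 'ng' end)),  # value of the taken branch
    show((1 + 1 * 1 ** 1 - 1 / 1 ^ 1)),       # ordinary operator expression
    show((while false do end))                # a loop that never runs
  ]
end

p demoted_values
```

Each element is a “big” construct that became a primary only because of the surrounding parentheses, and each still evaluates to an ordinary value.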

In the next section I will cover the contents of the important elements one by one.

program

▼ program

program : compstmt

compstmt: stmts opt_terms

stmts   : none
        | stmt
        | stmts terms stmt

As mentioned earlier, program represents the entire grammar, that is, the entire program. That program is equal to compstmt, and compstmt is almost equivalent to stmts. That stmts is a list of stmt delimited by terms. Hence, the entire program is a list of stmt delimited by terms.

terms is (of course) an abbreviation for “terminators”, the symbols that terminate sentences, such as semicolons or newlines. opt_terms means “OPTional terms”. The definitions are as follows:

▼ opt_terms

opt_terms :
          | terms

terms     : term
          | terms ';'

term      : ';'
          | '\n'

The initial ; or \n of a terms can be followed by any number of ; only. Based on that, you might start thinking that 2 or more consecutive newlines could cause a problem. Let’s try and see what actually happens.

1 + 1   # first newline
        # second newline
        # third newline
1 + 1

Run that with ruby -c.

% ruby -c optterms.rb
Syntax OK

Strange, it worked! What actually happens is this: consecutive newlines are simply discarded by the scanner, which returns only the first newline in a series.
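The idea can be imitated in a few lines of Ruby. This is only a sketch of the behaviour, not the logic of the real scanner: the method name terminator_tokens is made up, comments are removed in a simplistic way, and everything else on a line is passed through as one opaque chunk.

```ruby
# Splits source text into code chunks and :nl tokens, emitting only the
# first newline of each run of blank or comment-only lines.
def terminator_tokens(src)
  tokens = []
  src.each_line do |line|
    code = line.sub(/#.*/, '').strip   # naive comment removal (assumption)
    tokens << code unless code.empty?
    # Emit a newline token only if the previous token was not one already.
    if line.end_with?("\n") && !tokens.empty? && tokens.last != :nl
      tokens << :nl
    end
  end
  tokens
end

SAMPLE = "1 + 1   # first newline\n" \
         "        # second newline\n" \
         "        # third newline\n" \
         "1 + 1\n"
p terminator_tokens(SAMPLE)
```

Because the parser only ever sees one newline token between the two statements, the terms rule never has to cope with a second '\n'.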

By the way, although we said that program is the same as compstmt, if that were really true, you would question why compstmt exists at all. Actually, the distinction is there only for the execution of semantic actions. program exists to execute any semantic actions which should be done once in the processing of an entire program. If it were only a question of parsing, program could be omitted with no problems at all.

To generalize this point, the grammar rules can be divided into 2 groups: those which are needed for parsing the program structure, and those which are needed for the execution of semantic actions. The none rule which was mentioned earlier when talking about stmts is another one which exists for executing actions; it’s used to return a NULL pointer for an empty list of type NODE*.

stmt

Next is stmt. This one is rather involved, so we’ll look into it a bit at a time.

▼ stmt (1)

stmt : kALIAS fitem fitem
     | kALIAS tGVAR tGVAR
     | kALIAS tGVAR tBACK_REF
     | kALIAS tGVAR tNTH_REF
     | kUNDEF undef_list
     | stmt kIF_MOD expr_value
     | stmt kUNLESS_MOD expr_value
     | stmt kWHILE_MOD expr_value
     | stmt kUNTIL_MOD expr_value
     | stmt kRESCUE_MOD stmt
     | klBEGIN '{' compstmt '}'
     | klEND '{' compstmt '}'

Looking at that, somehow things start to make sense. The first few have alias, then undef, then the next few are all something followed by _MOD; those should be statements with postposition modifiers, as you can imagine.

expr_value and primary_value are grammar rules which exist to execute semantic actions. For example, expr_value represents an expr which has a value. Expressions which don’t have values are return and break, or return/break followed by a postposition modifier, such as an if clause. For a detailed definition of what it means to “have a value”, see chapter 12, “Syntax Tree Construction”. In the same way, primary_value is a primary which has a value.

As explained earlier, klBEGIN and klEND represent BEGIN and END.

▼ stmt (2)

     | lhs '=' command_call
     | mlhs '=' command_call
     | var_lhs tOP_ASGN command_call
     | primary_value '[' aref_args ']' tOP_ASGN command_call
     | primary_value '.' tIDENTIFIER tOP_ASGN command_call
     | primary_value '.' tCONSTANT tOP_ASGN command_call
     | primary_value tCOLON2 tIDENTIFIER tOP_ASGN command_call
     | backref tOP_ASGN command_call

Looking at these rules all at once is the right approach. The common point is that they all have command_call on the right-hand side. command_call represents a method call with the parentheses omitted. The new symbols which are introduced here are explained in the following table. I hope you’ll refer to the table as you check over each grammar rule.

lhs          the left hand side of an assignment (Left Hand Side)
mlhs         the left hand side of a multiple assignment (Multiple Left Hand Side)
var_lhs      the left hand side of an assignment to a kind of variable (VARiable Left Hand Side)
tOP_ASGN     compound assignment operator like += or *= (OPerator ASsiGN)
aref_args    argument to a [] method call (Array REFerence)
tIDENTIFIER  identifier which can be used as a local variable
tCONSTANT    constant identifier (with leading uppercase letter)
tCOLON2      ::
backref      $1 $2 $3 ...

aref is Lisp jargon. There is also aset as its counterpart, which is an abbreviation of “array set”. This abbreviation is used in a lot of places in the source code of ruby.

▼ stmt (3)

     | lhs '=' mrhs_basic
     | mlhs '=' mrhs

These two are multiple assignments. mrhs has the same structure as mlhs and it means multiple rhs (the right hand side). We’ve come to recognize that knowing the meanings of names makes comprehension much easier.

▼ stmt (4)

     | expr

Lastly, it joins to expr.

expr

▼ expr

expr : kRETURN call_args
     | kBREAK call_args
     | kNEXT call_args
     | command_call
     | expr kAND expr
     | expr kOR expr
     | kNOT expr
     | '!' command_call
     | arg

Expression. The expression of ruby is very small in grammar. That’s because most of what would ordinarily be contained in expr has gone into arg. Conversely speaking, what could not go into arg is left here. And what is left is, again, method calls without parentheses. call_args is a bare argument list, and command_call is, as previously mentioned, a method call without parentheses. If this kind of thing were contained in the “small” unit, it would cause tremendous conflicts.

However, these two below are of a different kind.

expr kAND expr
expr kOR expr

kAND is “and”, and kOR is “or”. Since these two have roles as control structures, they must be contained in a “big” syntax unit which is larger than command_call. And since command_call is contained in expr, they need to be at least expr to work well. For example, the following usage is possible ...

  valid_items.include? arg  or raise ArgumentError, 'invalid arg'
# valid_items.include?(arg) or raise(ArgumentError, 'invalid arg')

However, if the rule of kOR existed in arg instead of expr, it would be joined as follows.

valid_items.include?((arg or raise)) ArgumentError, 'invalid arg'

Obviously, this would end up a parse error.
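The consequence of keeping or in expr, above both command_call and assignment, can also be observed from Ruby itself. The sketch below is illustrative only; check, low_or and high_or are hypothetical method names.

```ruby
# 'or' belongs to expr, so each side may be a parenthesis-less method call.
def check(valid_items, arg)
  valid_items.include? arg or raise ArgumentError, 'invalid arg'
  :ok
end

# Assignment lives in arg, which is smaller than expr, so 'or' is applied
# after the assignment: this parses as (x = false) or true.
def low_or
  x = false or true
  x
end

# '||' is an ordinary infix operator inside arg, so it is grouped before
# the assignment: this parses as x = (false || true).
def high_or
  x = false || true
  x
end

p check([1, 2], 1), low_or, high_or
```

The difference between low_or and high_or comes purely from which grammar unit each operator lives in, not from any difference in what the operators compute.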

arg

▼ arg

arg : lhs '=' arg
    | var_lhs tOP_ASGN arg
    | primary_value '[' aref_args ']' tOP_ASGN arg
    | primary_value '.' tIDENTIFIER tOP_ASGN arg
    | primary_value '.' tCONSTANT tOP_ASGN arg
    | primary_value tCOLON2 tIDENTIFIER tOP_ASGN arg
    | backref tOP_ASGN arg
    | arg tDOT2 arg
    | arg tDOT3 arg
    | arg '+' arg
    | arg '-' arg
    | arg '*' arg
    | arg '/' arg
    | arg '%' arg
    | arg tPOW arg
    | tUPLUS arg
    | tUMINUS arg
    | arg '|' arg
    | arg '^' arg
    | arg '&' arg
    | arg tCMP arg
    | arg '>' arg
    | arg tGEQ arg
    | arg '<' arg
    | arg tLEQ arg
    | arg tEQ arg
    | arg tEQQ arg
    | arg tNEQ arg
    | arg tMATCH arg
    | arg tNMATCH arg
    | '!' arg
    | '~' arg
    | arg tLSHFT arg
    | arg tRSHFT arg
    | arg tANDOP arg
    | arg tOROP arg
    | kDEFINED opt_nl arg
    | arg '?' arg ':' arg
    | primary

Although there are many rules here, the complexity of the grammar is not proportionate to the number of rules. A grammar that merely has a lot of cases can be handled very easily by yacc; rather, the depth or recursion of the rules has more influence on the complexity.

Then, it makes us curious that the rules are defined recursively in the form of arg OP arg at the places for operators, but because operator precedence is defined for all of these operators, this is virtually a mere enumeration. Let’s cut the “mere enumeration” out of the arg rule by merging.

arg : lhs '=' arg              /* 1 */
    | primary T_opeq arg       /* 2 */
    | arg T_infix arg          /* 3 */
    | T_pre arg                /* 4 */
    | arg '?' arg ':' arg      /* 5 */
    | primary                  /* 6 */

There’s no point in distinguishing terminal symbols from lists of terminal symbols, so they are all expressed with symbols beginning with T_. opeq is operator + equal, T_pre represents the prefix operators such as '!' and '~', and T_infix represents the infix operators such as '*' and '%'.

To avoid conflicts in this structure, things like those written below become important (but these do not cover everything).

T_infix should not contain '='.

Since arg partially overlaps lhs, if '=' were contained, rule 1 and rule 3 could not be distinguished.

T_opeq and T_infix should not have any common rule.

Since arg contains primary, if they had any common rule, rule 2 and rule 3 could not be distinguished.

T_infix should not contain '?'.

If it did, rules 3 and 5 would produce a shift/reduce conflict.

T_pre should not contain '?' or ':'.

If it did, rules 4 and 5 would conflict in a very complicated way.

The conclusion is that all requirements are met and this grammar does not conflict. We could say it’s a matter of course.

primary

Because primary has a lot of grammar rules, we’ll split them up and show them in parts.

▼ primary (1)

primary : literal
        | strings
        | xstring
        | regexp
        | words
        | qwords

Literals. literal is for Symbol literals (:sym) and numbers.

▼ primary (2)

        | var_ref
        | backref
        | tFID

Variables. var_ref is for local variables, instance variables and so on. backref is for $1 $2 $3 ... tFID is for identifiers with ! or ?, say, include? or reject!. There’s no possibility of a tFID being a local variable, so even if it appears on its own, it becomes a method call at the parser level.

▼ primary (3)

        | kBEGIN
          bodystmt
          kEND

bodystmt contains rescue and ensure. It means this is the begin of exception control.

▼ primary (4)

        | tLPAREN_ARG expr ')'
        | tLPAREN compstmt ')'

This has already been described. Syntax demotion.

▼ primary (5)

        | primary_value tCOLON2 tCONSTANT
        | tCOLON3 cname

Constant references. tCONSTANT is for constant names (capitalized identifiers).

Both tCOLON2 and tCOLON3 are ::, but tCOLON3 represents only the :: which means the top level. In other words, it is the :: of ::Const. The :: of Net::SMTP is tCOLON2.

The reason why different symbols are used for the same token is to deal with method calls without parentheses. For example, it is to distinguish the next two from each other:

p Net::HTTP    # p(Net::HTTP)
p Net  ::HTTP  # p(Net(::HTTP))

If there’s a space or a delimiter character such as an open parenthesis just before it, it becomes tCOLON3. In the other cases, it becomes tCOLON2.
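The same distinction can be seen with constants of our own. In this sketch the names LEVEL, Box, identity and colon_demo are all hypothetical; a lowercase method name is used on purpose so that the example does not depend on how a capitalized name in front of :: is treated.

```ruby
LEVEL = :top

module Box
  LEVEL = :inner
end

# Hypothetical one-argument method, called without parentheses below.
def identity(value)
  value
end

def colon_demo
  scoped   = identity Box::LEVEL   # '::' directly after Box: scope operator
  toplevel = identity ::LEVEL      # '::' after a space: top-level constant
  [scoped, toplevel]
end

p colon_demo
```

In the second call the space before :: is what makes the scanner treat ::LEVEL as the start of an argument rather than as a continuation of identity.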

▼ primary (6)

        | primary_value '[' aref_args ']'

Index-form calls, for instance, arr[i].

▼ primary (7)

        | tLBRACK aref_args ']'
        | tLBRACE assoc_list '}'

Array literals and Hash literals. This tLBRACK also represents a [ character; the plain '[' symbol means a '[' without a space in front of it. The necessity of this differentiation is also a side effect of method calls without parentheses.

The terminal symbols of this rule are very hard to tell apart because they differ in just one character. The following table shows how to read each type of parentheses, so I’d like you to make use of it when reading.

▼ English names for each parentheses

Symbol   English Name
( )      parentheses
{ }      braces
[ ]      brackets

▼ primary (8)

        | kRETURN
        | kYIELD '(' call_args ')'
        | kYIELD '(' ')'
        | kYIELD
        | kDEFINED opt_nl '(' expr ')'

Syntaxes whose forms are similar to method calls. Respectively, return, yield, defined?.

There are arguments for yield, but return does not have any arguments. Why? The fundamental reason is that yield itself has a return value but return does not. However, even if there are no arguments here, it does not mean you cannot pass values, of course. There was the following rule in expr.

kRETURN call_args

call_args is a bare argument list, so it can deal with return 1 or return nil. Things like return(1) are handled as return (1). For this reason, surrounding the multiple arguments of a return with parentheses as in the following code should be impossible.

return(1, 2, 3)   # interpreted as return (1,2,3) and results in parse error

You will understand this area better if you check it again after reading the next chapter “Finite-State Scanner”.

▼ primary (9)

        | operation brace_block
        | method_call
        | method_call brace_block

Method calls. method_call is with arguments (also with parentheses), operation is without both arguments and parentheses, and brace_block is either { ~ } or do ~ end; if it is attached to a method, the method is an iterator. As for the question “Even though it is called brace, why is do ~ end contained in it?”, there’s a reason that is deeper than the Mariana Trench, but again the only way to understand it is to read the next chapter “Finite-State Scanner”.

▼ primary (10)

        | kIF expr_value then compstmt if_tail kEND           # if
        | kUNLESS expr_value then compstmt opt_else kEND      # unless
        | kWHILE expr_value do compstmt kEND                  # while
        | kUNTIL expr_value do compstmt kEND                  # until
        | kCASE expr_value opt_terms case_body kEND           # case
        | kCASE opt_terms case_body kEND                      # case (Form2)
        | kFOR block_var kIN expr_value do compstmt kEND      # for

The basic control structures. A little unexpectedly, things that appear this big are put inside primary, which is “small”. Because a primary is also an arg, we can also do something like this.

p(if true then 'ok' end)   # shows "ok"

I mentioned that “almost all syntax elements are expressions” was one of the traits of Ruby. It is concretely expressed by the fact that if and while are in primary.

Why is there no problem if these “big” elements are contained in primary? That’s because Ruby’s syntax has the trait that “it begins with the terminal symbol A and ends with the terminal symbol B”. In the next section, we’ll think about this point again.

▼ primary (11)

        | kCLASS cname superclass bodystmt kEND        # class definition
        | kCLASS tLSHFT expr term bodystmt kEND        # singleton class definition
        | kMODULE cname bodystmt kEND                  # module definition
        | kDEF fname f_arglist bodystmt kEND           # method definition
        | kDEF singleton dot_or_colon fname f_arglist bodystmt kEND
                                                       # singleton method definition

Definition statements. I’ve called them the class statements and the module statements, but strictly speaking I should probably have called them class primaries. These all fit the pattern “beginning with the terminal symbol A and ending with B”, so even if many more such rules were added, it would never be a problem.

▼ primary (12)

        | kBREAK
        | kNEXT
        | kREDO
        | kRETRY

Various jumps. These are, well, not important from the viewpoint of grammar.

Conflicting Lists

In the previous section, the question “is it all right that if is in such a primary?” was raised. Proving it precisely is not easy, but explaining it intuitively is relatively easy. Here, let’s simulate with a small rule set defined as follows:

%token A B o
%%
element   : A item_list B

item_list :
          | item_list item

item      : element
          | o

element is the element that we are going to examine. For example, if we think about if, it would be if. element is a list that starts with the terminal symbol A and ends with B. As for if, it starts with if and ends with end. The o contents are methods, variable references or literals. Each element of the list is either an o or a nested element.

With the parser based on this grammar, let’s try to parse the following input.

A A o o o B o A o A o o o B o B B

They are nested too many times for humans to comprehend without some help such as indentation. But it becomes relatively easy if you think of it in the following way. Because it’s certain that an A and B which contain only several o between them will appear, replace them with a single o when they appear. All we have to do is repeat this procedure. Figure 4 shows the consequence.

Figure 4: parse a list which starts with A and ends with B
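The “replace the innermost A ... B with a single o and repeat” procedure can be tried out directly. The following Ruby sketch does it with a regular expression; this is only a way to replay the mental procedure, not what a yacc-generated parser does internally, and reduce_lists is a name made up for the example.

```ruby
# Repeatedly replaces every innermost "A o... B" group with a single "o"
# and records each intermediate string. Stops when nothing changes.
def reduce_lists(input)
  steps = []
  current = input.delete(' ')
  loop do
    reduced = current.gsub(/Ao*B/, 'o')
    break if reduced == current
    steps << reduced
    current = reduced
  end
  steps
end

p reduce_lists('A A o o o B o A o A o o o B o B B')
```

When every A has its closing B, the input collapses to a single o after a few rounds. If the B symbols are missing, the pattern never matches and no reduction can even start, which hints at why the closing terminal matters.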

However, if the ending B is missing, ...

%token A o
%%
element   : A item_list    /* B is deleted for an experiment */

item_list :
          | item_list item

item      : element
          | o

I processed this with yacc and got 2 shift/reduce conflicts. It means this grammar is ambiguous. If we simply take B out of the previous input, the input would be as follows.

A A o o o o A o A o o o o

This is hard to interpret in any way. However, there was a rule that says “choose shift if it is a shift/reduce conflict”, so let’s follow it as an experiment and parse the input with shift (meaning interior) taking precedence. (Figure 5)

Figure 5: parse a list of lists which start with A

It could be parsed. However, this is completely different from the intention of the input; there is no longer any way to split the list in the middle.

Actually, the parenthesis-less method calls of Ruby are in a similar situation. It’s not so easy to see, but a pair of a method name and its first argument is A. This is because, since only between those two is there no comma, the pair can be recognized as the start of a new list.

Also, “practical” HTML contains this pattern. It is, for instance, when </p> or </i> is omitted. That’s why yacc could not be used for ordinary HTML at all.

Scanner

Parser Outline

I’ll explain the outline of the parser before moving on to the scanner. Take a look at Figure 6.

Figure 6: Parser Interface (Call Graph)

There are three official interfaces of the parser: rb_compile_cstr(), rb_compile_string() and rb_compile_file(). They read a program from a C string, a Ruby string object and a Ruby IO object, respectively, and compile it.

These functions, directly or indirectly, call yycompile(), and in the end, control is handed over completely to yyparse(), which is generated by yacc. Since the heart of the parser is nothing but yyparse(), it is easiest to understand by placing yyparse() at the center. In other words, the functions before moving on to yyparse() are all preparations, and the functions after yyparse() are merely chore functions being pushed around by yyparse().

The remaining functions in parse.y are auxiliary functions called by yylex(), and these can also be clearly categorized.

First, the input buffer is at the lowest level of the scanner. ruby is designed so that you can input source programs via both Ruby IO objects and strings. The input buffer hides that and makes it look like a single byte stream.

The next level is the token buffer. It reads 1 byte at a time from the input buffer, and keeps them until they form a token.

Therefore, the whole structure of yylex can be depicted as Figure 7.

Figure 7: The whole picture of the scanner

The input buffer

Let’s start with the input buffer. Its interfaces are only these three: nextc(), pushback(), peek().

At the risk of being repetitive, I said the first thing to do is to investigate data structures. The variables used by the input buffer are the following:

▼ the input buffer

static char *lex_pbeg;
static char *lex_p;
static char *lex_pend;

(parse.y)

The beginning, the current position and the end of the buffer. Apparently, this buffer is a simple single-line string buffer (Figure 8).

Figure 8: The input buffer

nextc()

Then, let’s look at the places using them. First, I’ll start with nextc(), which seems the most orthodox.

▼ nextc()

static inline int
nextc()
{
    int c;

    if (lex_p == lex_pend) {
        if (lex_input) {
            VALUE v = lex_getline();

            if (NIL_P(v)) return -1;
            if (heredoc_end > 0) {
                ruby_sourceline = heredoc_end;
                heredoc_end = 0;
            }
            ruby_sourceline++;
            lex_pbeg = lex_p = RSTRING(v)->ptr;
            lex_pend = lex_p + RSTRING(v)->len;
            lex_lastline = v;
        }
        else {
            lex_lastline = 0;
            return -1;
        }
    }
    c = (unsigned char)*lex_p++;
    if (c == '\r' && lex_p <= lex_pend && *lex_p == '\n') {
        lex_p++;
        c = '\n';
    }

    return c;
}

(parse.y)

It seems that the first if tests whether it has reached the end of the input buffer. And the if inside of it seems, since the else returns -1 (EOF), to test for the end of the whole input. Conversely speaking, when the input ends, lex_input becomes 0. ((errata: it does not. lex_input will never become 0 during ordinary scan.))

From this, we can see that strings are coming bit by bit into the input buffer. Since the name of the function which updates the buffer is lex_getline, it’s certain that one line comes in at a time.

Here is the summary:

if ( reached the end of the buffer )
    if ( still there's more input )
        read the next line
    else
        return EOF
move the pointer forward
skip reading CR of CRLF
return c

Let’s also look at the function lex_getline(), which provides lines. The variables used by this function are shown together in the following.

▼ lex_getline()

static VALUE (*lex_gets)();     /* gets function */
static VALUE lex_input;         /* non-nil if File */

static VALUE
lex_getline()
{
    VALUE line = (*lex_gets)(lex_input);
    if (ruby_debug_lines && !NIL_P(line)) {
        rb_ary_push(ruby_debug_lines, line);
    }
    return line;
}

(parse.y)

Except for the first line, this is not important. Apparently, lex_gets should be the pointer to the function to read a line, and lex_input should be the actual input. I searched for the place where lex_gets is set and this is what I found:

▼ set lex_gets

NODE*
rb_compile_string(f, s, line)
    const char *f;
    VALUE s;
    int line;
{
    lex_gets = lex_get_str;
    lex_gets_ptr = 0;
    lex_input = s;

NODE*
rb_compile_file(f, file, start)
    const char *f;
    VALUE file;
    int start;
{
    lex_gets = rb_io_gets;
    lex_input = file;

(parse.y)

rb_io_gets() is not a function exclusive to the parser but one of Ruby’s general-purpose library functions. It is the function to read a line from an IO object.

On the other hand, lex_get_str() is defined as follows:

▼ lex_get_str()

static int lex_gets_ptr;

static VALUE
lex_get_str(s)
    VALUE s;
{
    char *beg, *end, *pend;

    beg = RSTRING(s)->ptr;
    if (lex_gets_ptr) {
        if (RSTRING(s)->len == lex_gets_ptr) return Qnil;
        beg += lex_gets_ptr;
    }
    pend = RSTRING(s)->ptr + RSTRING(s)->len;
    end = beg;
    while (end < pend) {
        if (*end++ == '\n') break;
    }
    lex_gets_ptr = end - RSTRING(s)->ptr;
    return rb_str_new(beg, end - beg);
}

(parse.y)

lex_gets_ptr remembers the place it has already read up to. This function moves it to the next \n, and simultaneously cuts out the text up to that place and returns it.
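For readers who find the pointer arithmetic hard to follow, here is roughly the same idea written as a Ruby sketch. It is a simplified stand-in rather than a translation: the class StringLineSource, its next_line method and the helper all_lines are hypothetical names, and it works in characters where the C code works in bytes.

```ruby
# Hands out one line per call from a string, remembering how far it has
# already read (the role played by lex_gets_ptr in the C code).
class StringLineSource
  def initialize(source)
    @source = source
    @offset = 0
  end

  # Returns the next line including its "\n", or nil when exhausted.
  def next_line
    return nil if @offset >= @source.length
    newline = @source.index("\n", @offset)
    finish  = newline ? newline + 1 : @source.length
    line    = @source[@offset...finish]
    @offset = finish            # remember where the next call starts
    line
  end
end

# Collects every line the source yields, for demonstration.
def all_lines(text)
  source = StringLineSource.new(text)
  lines = []
  while (line = source.next_line)
    lines << line
  end
  lines
end

p all_lines("a = 1\nb = 2\np a + b")
```

A last line without a trailing newline is still returned, just as the while loop in lex_get_str() stops at pend when no \n is found.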

Here, let’s go back to nextc. As described, by preparing two functions with the same interface, it switches the function pointer when initializing the parser, and the other part is used in common. It can also be said that the difference in the code is converted to data and absorbed. A similar method was used in st_table.

pushback()

With the knowledge of the physical structure of the buffer and nextc, we can understand the rest easily. pushback() writes back a character. To put it in C terms, it is ungetc().

▼ pushback()

static void
pushback(c)
    int c;
{
    if (c == -1) return;
    lex_p--;
}

(parse.y)

peek()

peek() checks the next character without moving the pointer forward.

▼ peek()

#define peek(c) (lex_p != lex_pend && (c) == *lex_p)

(parse.y)
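Putting the three interfaces together, the input buffer can be modelled in Ruby as below. This is a sketch under assumptions, not ruby’s implementation: InputBuffer, peek? and read_all are invented names, any object with a gets method stands in for the lex_gets function pointer, and here-document line bookkeeping is left out.

```ruby
require 'stringio'

# Pulls one line at a time from a line source and hands out characters
# one by one, folding CRLF into a single "\n" like nextc() does.
class InputBuffer
  EOF = -1

  def initialize(line_source)
    @source = line_source   # anything responding to #gets (assumption)
    @line = ''              # current line (lex_pbeg .. lex_pend)
    @pos = 0                # current position (lex_p)
  end

  def nextc
    if @pos == @line.length           # reached the end of the buffer
      line = @source.gets
      return EOF if line.nil?         # no more input
      @line = line
      @pos = 0
    end
    c = @line[@pos]
    @pos += 1
    if c == "\r" && @line[@pos] == "\n"
      @pos += 1
      c = "\n"
    end
    c
  end

  # Un-reads the most recently read character (EOF cannot be pushed back).
  def pushback(c)
    return if c == EOF
    @pos -= 1
  end

  # True if the next unread character on the current line equals c.
  def peek?(c)
    @pos != @line.length && @line[@pos] == c
  end
end

def read_all(text)
  buf = InputBuffer.new(StringIO.new(text))
  out = []
  while (c = buf.nextc) != InputBuffer::EOF
    out << c
  end
  out.join
end

p read_all("a\r\nb")
```

Passing a File instead of a StringIO would work the same way, which is the Ruby counterpart of switching lex_gets between lex_get_str and rb_io_gets.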

The Token Buffer

The token buffer is the buffer of the next level. It keeps the characters until a token can be cut out. There are six interfaces, as follows:

newtok    begin a new token
tokadd    add a character to the buffer
tokfix    fix a token
tok       the pointer to the beginning of the buffered string
toklen    the length of the buffered string
toklast   the last byte of the buffered string

Now, we’ll start with the data structures.

▼ The Token Buffer

static char *tokenbuf = NULL;
static int   tokidx, toksiz = 0;

(parse.y)

tokenbuf is the buffer, tokidx is the end of the token (since it is an int, it seems to be an index), and toksiz is probably the buffer length. This is also simply structured. If depicted, it would look like Figure 9.

Figure 9: The token buffer

Let’s move on to the interface and read newtok(), which starts a new token.

▼ newtok()

static char*
newtok()
{
    tokidx = 0;
    if (!tokenbuf) {
        toksiz = 60;
        tokenbuf = ALLOC_N(char, 60);
    }
    if (toksiz > 4096) {
        toksiz = 60;
        REALLOC_N(tokenbuf, char, 60);
    }
    return tokenbuf;
}

(parse.y)

Since there is no interface for initializing the whole buffer, it’s possible that the buffer is not initialized. Therefore, the first if checks for that and initializes it. ALLOC_N() is a macro that ruby defines and is almost the same as calloc.

The initial value of the allocated length is 60, and if it has become too big (> 4096), it is shrunk back to a small size. Since a token becoming this long is unlikely, this size is realistic.

Next, let’s look at tokadd(), which adds a character to the token buffer.

▼ tokadd()

static void
tokadd(c)
    char c;
{
    tokenbuf[tokidx++] = c;
    if (tokidx >= toksiz) {
        toksiz *= 2;
        REALLOC_N(tokenbuf, char, toksiz);
    }
}

(parse.y)

At the first line, a character is added. Then, it checks the token length and if it seems about to exceed the buffer end, it performs REALLOC_N(). REALLOC_N() is a realloc() which has the same way of specifying arguments as calloc().

The remaining interfaces are summarized below.

▼ tokfix() tok() toklen() toklast()

#define tokfix() (tokenbuf[tokidx]='\0')
#define tok() tokenbuf
#define toklen() tokidx
#define toklast() (tokidx>0?tokenbuf[tokidx-1]:0)

(parse.y)

There’s probably no question.
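The growth policy of the token buffer (start at 60, double when full, shrink back once it has exceeded 4096) can be replayed with a small Ruby model. Ruby strings grow by themselves, so the capacity value here only imitates the bookkeeping done by toksiz; the class TokenBuffer and the helper capacity_after are hypothetical.

```ruby
# Model of the token buffer's bookkeeping, not of its memory layout.
class TokenBuffer
  INITIAL_SIZE = 60
  SHRINK_LIMIT = 4096

  attr_reader :capacity

  def initialize
    @chars = String.new
    @capacity = INITIAL_SIZE
  end

  # newtok(): start a new token; shrink if the buffer had grown large.
  def newtok
    @chars.clear
    @capacity = INITIAL_SIZE if @capacity > SHRINK_LIMIT
    self
  end

  # tokadd(): append one character, doubling the capacity when full.
  def tokadd(c)
    @chars << c
    @capacity *= 2 if @chars.length >= @capacity
  end

  # tok(): a copy of the buffered string.
  def tok
    @chars.dup
  end

  # toklen(): the number of buffered characters.
  def toklen
    @chars.length
  end

  # toklast(): the last character, or nil when empty (the C macro gives 0).
  def toklast
    @chars[-1]
  end
end

# Capacity of a fresh buffer after adding the given number of characters.
def capacity_after(count)
  buf = TokenBuffer.new
  count.times { buf.tokadd('x') }
  buf.capacity
end

p capacity_after(59), capacity_after(60)
```

Doubling as soon as the length reaches the capacity, rather than when it exceeds it, keeps one slot free, which is what allows tokfix() to store the terminating '\0' without a further check.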

yylex()

yylex() is very long. Currently, there are more than 1000 lines. Most of them are occupied by a huge switch statement that branches on each character. First, I’ll show the whole structure with some parts left out.

▼ yylex outline

static int
yylex()
{
    static ID last_id = 0;
    register int c;
    int space_seen = 0;
    int cmd_state;

    if (lex_strterm) {
        /* ... string scan ... */
        return token;
    }
    cmd_state = command_start;
    command_start = Qfalse;
  retry:
    switch (c = nextc()) {
      case '\0':                /* NUL */
      case '\004':              /* ^D */
      case '\032':              /* ^Z */
      case -1:                  /* end of script. */
        return 0;

        /* white spaces */
      case ' ': case '\t': case '\f': case '\r':
      case '\13': /* '\v' */
        space_seen++;
        goto retry;

      case '#':         /* it's a comment */
        while ((c = nextc()) != '\n') {
            if (c == -1)
                return 0;
        }
        /* fall through */
      case '\n':
        /* ... omission ... */

      case xxxx:
            :
        break;
            :
        /* branches a lot for each character */
            :
            :
      default:
        if (!is_identchar(c) || ISDIGIT(c)) {
            rb_compile_error("Invalid char `\\%03o' in expression", c);
            goto retry;
        }

        newtok();
        break;
    }

    /* ... deal with ordinary identifiers ... */
}

(parse.y)

As for the return value of yylex(), zero means that the input has finished, non-zero means a symbol.

Be careful that an extremely concise variable named “c” is used all over this function. space_seen++ when reading a space will become helpful later.

All that remains is to keep branching on each character and processing it, but since a long run of monotonous procedures would be boring for readers, we’ll narrow them down to a few points. Not all characters will be explained in this book, but the rest is easy if you extend the same pattern.

'!'

Let's start with something simple.

▼ yylex – '!'

case '!':
  lex_state = EXPR_BEG;
  if ((c = nextc()) == '=') {
      return tNEQ;
  }
  if (c == '~') {
      return tNMATCH;
  }
  pushback(c);
  return '!';

(parse.y)

I wrote out the meaning of the code, so I'd like you to read the two side by side.

case '!':
  move to EXPR_BEG
  if (the next character is '=' then) {
      token is 「!=(tNEQ)」
  }
  if (the next character is '~' then) {
      token is 「!~(tNMATCH)」
  }
  if it is neither, push the read character back
  token is '!'

This case clause is short, but it describes an important rule of the scanner: “the longest match rule”. The two characters "!=" can be interpreted in two ways, “! and =” or “!=”, but in this case "!=" must be selected. The longest match is essential for scanners of programming languages.
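The longest match rule can also be tried outside of ruby. The following is a small sketch that scans from a C string rather than through nextc() and pushback(); scan_bang() and the TOK_* codes are hypothetical names, not the symbols that yacc generates for parse.y.

```c
#include <assert.h>

/* Hypothetical token codes; the real ones come from the yacc-generated header. */
enum { TOK_NOT = '!', TOK_NEQ = 256, TOK_NMATCH = 257 };

/*
 * Scan a token that starts with '!' at *pp and advance *pp past it.
 * The one-character lookahead implements the longest match rule:
 * "!=" and "!~" win over a lone "!".
 */
static int scan_bang(const char **pp)
{
    const char *p = *pp;
    int tok;

    assert(*p == '!');
    p++;
    if (*p == '=')      { tok = TOK_NEQ;    p++; }
    else if (*p == '~') { tok = TOK_NMATCH; p++; }
    else                { tok = TOK_NOT; }  /* nothing consumed: same effect as pushback(c) */
    *pp = p;
    return tok;
}
```

Not advancing the pointer in the last branch has the same effect as pushback(c) in the original: the character after '!' is left for the next call to scan.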

And lex_state is the variable that represents the state of the scanner. This will be discussed at length in the next chapter, “Finite-State Scanner”, so you can ignore it for now. EXPR_BEG indicates “it is clearly at the beginning”. This is because, whether it is the ! of not, or !=, or !~, the next symbol is the beginning of an expression.

'>'

Next, we'll look at '>' as an example of using yylval (the value of a symbol).

▼ yylex − '>'

case '>':
  switch (lex_state) {
    case EXPR_FNAME: case EXPR_DOT:
      lex_state = EXPR_ARG; break;
    default:
      lex_state = EXPR_BEG; break;
  }
  if ((c = nextc()) == '=') {
      return tGEQ;
  }
  if (c == '>') {
      if ((c = nextc()) == '=') {
          yylval.id = tRSHFT;
          lex_state = EXPR_BEG;
          return tOP_ASGN;
      }
      pushback(c);
      return tRSHFT;
  }
  pushback(c);
  return '>';

(parse.y)

Everything except yylval can be ignored. Concentrating on only one point when reading a program is essential.

At this point, for the symbol tOP_ASGN of >>=, it sets tRSHFT as its value. Since the union member used is id, its type is ID. tOP_ASGN is the symbol of self assignment; it represents all of the things like += and -= and *=. In order to distinguish them later, the scanner passes the type of the self assignment as a value.

The reason why the self assignments are bundled is that it makes the rule shorter. Bundling at the scanner as many things as can be bundled makes the rules more concise. Then, why are the binary arithmetic operators not bundled? It is because they differ in their precedences.
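The pattern of “one bundled symbol plus a value that tells the members apart” can be sketched as follows. This is only an illustration under assumptions: scan_gt(), the TOK_* codes and the global tok_value (standing in for yylval.id) are hypothetical names, and the lex_state handling is left out.

```c
#include <assert.h>

/* Hypothetical token codes and value slot, standing in for the yacc symbols and yylval. */
enum { TOK_GT = '>', TOK_GEQ = 256, TOK_RSHFT, TOK_OP_ASGN };
static int tok_value;   /* plays the role of yylval.id */

/* Scan ">", ">=", ">>" or ">>=" at *pp; for ">>=" the operator kind goes into tok_value. */
static int scan_gt(const char **pp)
{
    const char *p = *pp;
    int tok = TOK_GT;

    assert(*p == '>');
    p++;
    if (*p == '=') {
        tok = TOK_GEQ;
        p++;
    }
    else if (*p == '>') {
        p++;
        if (*p == '=') {
            tok_value = TOK_RSHFT;   /* which self assignment this is */
            tok = TOK_OP_ASGN;
            p++;
        }
        else {
            tok = TOK_RSHFT;
        }
    }
    *pp = p;
    return tok;
}
```

A grammar built on this needs only one rule for all self assignments, and its action can look at the value to decide which operator to apply.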

':'

If scanning were completely independent of parsing, this talk would be simple. But in reality, it is not that simple. The Ruby grammar is particularly complex: a token can have a somewhat different meaning when there is a space in front of it, and the way tokens are split changes depending on the surrounding situation. The code for ':' shown below is an example in which a space changes the behavior.

▼ yylex − ':'

case ':':
  c = nextc();
  if (c == ':') {
      if (lex_state == EXPR_BEG ||  lex_state == EXPR_MID ||
          (IS_ARG() && space_seen)) {
          lex_state = EXPR_BEG;
          return tCOLON3;
      }
      lex_state = EXPR_DOT;
      return tCOLON2;
  }
  pushback(c);
  if (lex_state == EXPR_END ||
      lex_state == EXPR_ENDARG ||
      ISSPACE(c)) {
      lex_state = EXPR_BEG;
      return ':';
  }
  lex_state = EXPR_FNAME;
  return tSYMBEG;

(parse.y)

Again, ignoring everything related to lex_state, I'd like you to focus on the part around space_seen.

space_seen is the variable that becomes true when there is a space before a token. If that condition is met, meaning there is a space in front of '::', the token becomes tCOLON3; if not, it seems to become tCOLON2. This is as I explained for primary in the previous section.

Identifier

Until now, since there were only symbols, a token was just one or two characters. This time, we'll look at something a little longer: the scanning pattern of identifiers.

First, the outline of yylex was as follows:

yylex(...)
{
    switch (c = nextc()) {
      case xxxx:
        ....
      case xxxx:
        ....
      default:
    }

    the scanning code of identifiers
}

The next code is an extract from the end of the huge switch. It is relatively long, so I'll show it with comments.

▼ yylex - identifiers

case '@':                 /* an instance variable or a class variable */
  c = nextc();
  newtok();
  tokadd('@');
  if (c == '@') {         /* @@, meaning a class variable */
      tokadd('@');
      c = nextc();
  }
  if (ISDIGIT(c)) {       /* @1 and such */
      if (tokidx == 1) {
          rb_compile_error("`@%c' is not a valid instance variable name", c);
      }
      else {
          rb_compile_error("`@@%c' is not a valid class variable name", c);
      }
  }
  if (!is_identchar(c)) { /* a strange character appears next to @ */
      pushback(c);
      return '@';
  }
  break;

default:
  if (!is_identchar(c) || ISDIGIT(c)) {
      rb_compile_error("Invalid char `\\%03o' in expression", c);
      goto retry;
  }

  newtok();
  break;
}

while (is_identchar(c)) {   /* between characters that can be used as identifiers */
    tokadd(c);
    if (ismbchar(c)) {      /* if it is the head byte of a multi-byte character */
        int i, len = mbclen(c)-1;

        for (i = 0; i < len; i++) {
            c = nextc();
            tokadd(c);
        }
    }
    c = nextc();
}
if ((c == '!' || c == '?') &&
    is_identchar(tok()[0]) &&
    !peek('=')) {      /* the end character of name! or name? */
    tokadd(c);
}
else {
    pushback(c);
}
tokfix();

(parse.y)

Finally, I'd like you to focus on the condition at the place where ! or ? is added. This part is there so that code is interpreted in the following way.

obj.m=1       # obj.m  =   1       (not obj.m=)
obj.m!=1      # obj.m  !=  1       (not obj.m!)

((errata: this code is not related to that condition))

This is “not” longest-match. The “longest-match” is a principle but not a constraint. Sometimes, you can refuse it.
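The exception for a trailing ! or ? can be sketched in isolation. The code below is a simplified illustration, not the code of parse.y: ident_char() is an ASCII-only stand-in for is_identchar(), multi-byte characters are ignored, and scan_ident() is a hypothetical name.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* ASCII-only stand-in for is_identchar(); multi-byte handling is left out. */
static int ident_char(int c)
{
    return isalnum((unsigned char)c) || c == '_';
}

/*
 * Return the length of the identifier at the start of s (0 if there is none).
 * A trailing '!' or '?' is taken as part of the name unless '=' follows it,
 * so "m!=1" yields "m" and leaves "!=" for the operator scanner.
 */
static size_t scan_ident(const char *s)
{
    size_t n = 0;

    assert(s != NULL);
    if (!ident_char(s[0]) || isdigit((unsigned char)s[0])) return 0;
    while (ident_char(s[n])) n++;
    if ((s[n] == '!' || s[n] == '?') && s[n + 1] != '=') n++;
    return n;
}
```

For "gsub!(a)" this returns 5, taking the ! into the name, while for "m!=1" it returns 1, deliberately stopping short of the longest possible match.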

The reserved words

After scanning the identifiers, there are about 100 more lines of code to determine the actual symbols. In the previous code, instance variables, class variables and local variables were all scanned at once, but they are categorized here.

This is OK, but inside it there is a slightly strange part: the part that filters the reserved words. Since the reserved words do not differ from local variables in the kinds of characters they use, scanning them together and categorizing them later is more efficient.

Then, assuming there is str that is a char* string, how can we determine whether it is a reserved word? First, of course, there is the way of comparing many times with if statements and strcmp(). However, this is not smart at all. It is not flexible. The time it takes also increases linearly with the number of words. Usually, only the data would be separated out into a list or a hash in order to keep the code short.

/* convert the code to data */
struct entry {char *name; int symbol;};
struct entry table[] = {
    {"if",     kIF},
    {"unless", kUNLESS},
    {"while",  kWHILE},
    /* …… omission …… */
};

{
    ....
    return lookup_symbol(table, tok());
}
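For reference, a runnable version of this table-driven idea might look like the sketch below. It is an assumption-laden illustration: the numeric values given to kIF, kUNLESS and kWHILE are made up, and lookup_symbol() here takes an explicit element count and does a plain linear search, which the text does not specify.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical symbol codes; in ruby they come from the yacc-generated header. */
enum { kIF = 256, kUNLESS, kWHILE };

struct entry {
    const char *name;
    int symbol;
};

static const struct entry table[] = {
    { "if",     kIF     },
    { "unless", kUNLESS },
    { "while",  kWHILE  },
};

/* Linear search over the table; returns 0 when str is not a reserved word. */
static int lookup_symbol(const struct entry *tbl, size_t n, const char *str)
{
    size_t i;

    assert(str != NULL);
    for (i = 0; i < n; i++) {
        if (strcmp(tbl[i].name, str) == 0) return tbl[i].symbol;
    }
    return 0;
}
```

Adding a reserved word now means adding one table row. The lookup is still linear, though, and that is the part a hash removes.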

Then, how does ruby do it? It uses a hash table. Furthermore, it is a perfect hash. As I said when talking about st_table, if you know the set of possible keys beforehand, you can sometimes create a hash function that never conflicts. As for the reserved words, “the set of possible keys is known beforehand”, so it is likely that we can create a perfect hash function.

But “being able to create” and actually creating are different things. Creating one manually is too cumbersome. Since reserved words can be added or removed, this kind of process must be automated.

Therefore, gperf comes in. gperf is one of the GNU products; it generates a perfect hash function from a set of values. To learn the usage of gperf itself in detail, I recommend running man gperf. Here, I'll only describe how to use the generated result.

In ruby the input file for gperf is keywords and the output is lex.c. parse.y directly #includes it. Basically, #include-ing C files is not good, but performing a non-essential file separation for just one function is worse. In ruby in particular, there is the possibility that extern functions are used by extension libraries without anyone noticing, so a function that does not want to keep its compatibility should be static.

Then, in lex.c, a function named rb_reserved_word() is defined. By calling it with the char* of a reserved word as the key, you can look it up. The return value is NULL if not found, and a struct kwtable* if found (in other words, if the argument is a reserved word). The definition of struct kwtable is as follows:

▼ kwtable

struct kwtable {char *name; int id[2]; enum lex_state state;};

(keywords)

name is the name of the reserved word, id[0] is its symbol, and id[1] is its symbol as a modifier (kIF_MOD and such). state is “the lex_state to move to after reading this reserved word”. lex_state will be explained in the next chapter.

This is the place where the lookup is actually done.

▼ yylex() - identifier - call rb_reserved_word()

struct kwtable *kw;

/* See if it is a reserved word.  */
kw = rb_reserved_word(tok(), toklen());
if (kw) {

(parse.y)

Strings

The double quote (") part of yylex() is this.

▼ yylex − '"'

case '"':
  lex_strterm = NEW_STRTERM(str_dquote, '"', 0);
  return tSTRING_BEG;

(parse.y)

Surprisingly, it finishes after scanning only the first character. Then, this time, when taking a look at the rules, tSTRING_BEG is found in the following part:

▼ rules for strings

string1         : tSTRING_BEG string_contents tSTRING_END

string_contents :
                | string_contents string_content

string_content  : tSTRING_CONTENT
                | tSTRING_DVAR string_dvar
                | tSTRING_DBEG term_push compstmt '}'

string_dvar     : tGVAR
                | tIVAR
                | tCVAR
                | backref

term_push       :

These rules are the part introduced to deal with embedded expressions inside strings. tSTRING_CONTENT is the literal part, and tSTRING_DBEG is "#{". tSTRING_DVAR represents “a # that is in front of a variable”. For example,

".....#$gvar...."

this kind of syntax. I have not explained it, but when the embedded expression is only a variable, { and } can be left out. However, this is often not recommended. The D of DVAR and DBEG seems to be an abbreviation of dynamic.

And backref represents the special variables relating to regular expressions, such as $1 $2 or $& $'.

term_push is “a rule defined for its action”.

Now, we'll go back to yylex() here. If it simply returned to the parser, then, since its context is the “interior” of a string, it would be a problem if a variable, an if and so on were suddenly scanned in the next yylex(). What plays an important role there is …

case '"':
  lex_strterm = NEW_STRTERM(str_dquote, '"', 0);
  return tSTRING_BEG;

… lex_strterm. Let's go back to the beginning of yylex().

▼ the beginning of yylex()

static int
yylex()
{
    static ID last_id = 0;
    register int c;
    int space_seen = 0;
    int cmd_state;

    if (lex_strterm) {
        /* scanning string */
        return token;
    }
    cmd_state = command_start;
    command_start = Qfalse;
  retry:
    switch (c = nextc()) {

(parse.y)

If lex_strterm exists, it enters the string mode without asking. Conversely speaking, this means that whenever lex_strterm is set, a string is being scanned, so when parsing the embedded expressions inside strings, you have to set lex_strterm to 0. And when the embedded expression ends, you have to set it back. This is done in the following part:

▼ string_content

string_content  : ....
                | tSTRING_DBEG term_push
                    {
                        $<num>1 = lex_strnest;
                        $<node>$ = lex_strterm;
                        lex_strterm = 0;
                        lex_state = EXPR_BEG;
                    }
                  compstmt '}'
                    {
                        lex_strnest = $<num>1;
                        quoted_term = $2;
                        lex_strterm = $<node>3;
                        if (($$ = $4) && nd_type($$) == NODE_NEWLINE) {
                            $$ = $$->nd_next;
                            rb_gc_force_recycle((VALUE)$4);
                        }
                        $$ = NEW_EVSTR($$);
                    }

(parse.y)

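In the embedded action, lex_strterm is saved as the value of the embedded action itself ($<node>$, read back later as $<node>3), and lex_strnest is saved as the value of tSTRING_DBEG (virtually, this is a stack push). They are recovered in the ordinary action (pop). This is a fairly smart way.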
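The push and pop can be pictured with an explicit stack. The sketch below is not how ruby stores the values (ruby borrows yacc's own value stack, as shown above); strterm, saved[], embed_begin() and embed_end() are hypothetical names chosen only to show why nesting works.

```c
#include <assert.h>

#define MAX_NEST 16

/* Hypothetical stand-ins: strterm plays lex_strterm (0 = not inside a string). */
static int strterm;
static int saved[MAX_NEST];
static int depth;

/* Entering "#{": remember the current string mode and leave it. */
static void embed_begin(void)
{
    assert(depth < MAX_NEST);
    saved[depth++] = strterm;   /* push */
    strterm = 0;                /* scan the embedded expression as ordinary code */
}

/* Reaching the matching '}': go back to the string that was being scanned. */
static void embed_end(void)
{
    assert(depth > 0);
    strterm = saved[--depth];   /* pop */
}
```

Since each "#{" pushes and each matching '}' pops, a string inside an embedded expression inside another string restores the right terminator at every level.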

But why does it do this tedious thing? Can't it be done by scanning normally and then calling yyparse() recursively at the point where it finds #{? There is actually a problem: yyparse() can't be called recursively. This is a well-known limit of yacc. Since the yyval that is used to receive or pass a value is a global variable, careless recursive calls can destroy the value. With bison (the yacc of GNU), recursive calls are possible by using the %pure_parser directive, but the current ruby decided not to assume bison. In reality, byacc (Berkeley yacc) is often used on BSD-derived OSes, Windows and such, so assuming bison would make things a little cumbersome.

lex_strterm

As we've seen, when you consider lex_strterm as a boolean value, it represents whether or not the scanner is in the string mode. But its contents also have a meaning. First, let's look at its type.

▼ lex_strterm

static NODE *lex_strterm;

(parse.y)

This definition shows that its type is NODE*. This is the type used for the syntax tree and will be discussed in detail in Chapter 12: Syntax tree construction. For the time being, you should remember only these two points: it is a structure which has three elements, and since it is a VALUE you don't have to free() it.

▼ NEW_STRTERM()

#define NEW_STRTERM(func, term, paren) \
        rb_node_newnode(NODE_STRTERM, (func), (term), (paren))

(parse.y)

This is a macro that creates a node to be stored in lex_strterm. First, term is the terminal character of the string. For example, if it is a " string, it is ", and if it is a ' string, it is '.

paren is used to store the corresponding parenthesis when it is a % string. For example,

%Q(..........)

in this case, paren stores '(' and term stores the closing parenthesis ')'. If it is not a % string, paren is 0.

Lastly, func indicates the type of the string. The available types are defined as follows:

▼ func

#define STR_FUNC_ESCAPE 0x01  /* backslash notations such as \n are in effect  */
#define STR_FUNC_EXPAND 0x02  /* embedded expressions are in effect */
#define STR_FUNC_REGEXP 0x04  /* it is a regular expression */
#define STR_FUNC_QWORDS 0x08  /* %w(....) or %W(....) */
#define STR_FUNC_INDENT 0x20  /* <<-EOS(the finishing symbol can be indented) */

enum string_type {
    str_squote = (0),
    str_dquote = (STR_FUNC_EXPAND),
    str_xquote = (STR_FUNC_ESCAPE|STR_FUNC_EXPAND),
    str_regexp = (STR_FUNC_REGEXP|STR_FUNC_ESCAPE|STR_FUNC_EXPAND),
    str_sword  = (STR_FUNC_QWORDS),
    str_dword  = (STR_FUNC_QWORDS|STR_FUNC_EXPAND),
};

(parse.y)

The meaning of each enum string_type value is as follows:

str_squote    ' string / %q
str_dquote    " string / %Q
str_xquote    command string (not explained in this book)
str_regexp    regular expression
str_sword     %w
str_dword     %W
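Because each string type is a combination of bit flags, a scanner can ask about a single property without enumerating the types. The following sketch repeats the definitions from the listing above so that it compiles on its own; the helpers expands() and is_word_list() are hypothetical and are not functions of parse.y.

```c
#include <assert.h>

#define STR_FUNC_ESCAPE 0x01
#define STR_FUNC_EXPAND 0x02
#define STR_FUNC_REGEXP 0x04
#define STR_FUNC_QWORDS 0x08

enum string_type {
    str_squote = (0),
    str_dquote = (STR_FUNC_EXPAND),
    str_xquote = (STR_FUNC_ESCAPE|STR_FUNC_EXPAND),
    str_regexp = (STR_FUNC_REGEXP|STR_FUNC_ESCAPE|STR_FUNC_EXPAND),
    str_sword  = (STR_FUNC_QWORDS),
    str_dword  = (STR_FUNC_QWORDS|STR_FUNC_EXPAND)
};

/* Hypothetical helper: are embedded expressions such as #{...} in effect? */
static int expands(enum string_type t)
{
    return (t & STR_FUNC_EXPAND) != 0;
}

/* Hypothetical helper: is this a %w or %W word list? */
static int is_word_list(enum string_type t)
{
    return (t & STR_FUNC_QWORDS) != 0;
}
```

With this layout, %W differs from %w only in the STR_FUNC_EXPAND bit, which matches the table above.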

String scan function

The rest is reading yylex() in the string mode, in other words, the if at the beginning.

▼ yylex − string

if (lex_strterm) {
    int token;
    if (nd_type(lex_strterm) == NODE_HEREDOC) {
        token = here_document(lex_strterm);
        if (token == tSTRING_END) {
            lex_strterm = 0;
            lex_state = EXPR_END;
        }
    }
    else {
        token = parse_string(lex_strterm);
        if (token == tSTRING_END || token == tREGEXP_END) {
            rb_gc_force_recycle((VALUE)lex_strterm);
            lex_strterm = 0;
            lex_state = EXPR_END;
        }
    }
    return token;
}

(parse.y)

It is divided into two major groups: here documents and everything else. But this time, we won't read parse_string(). As I previously described, there are a lot of conditions, and it is tremendous spaghetti code. If I tried to explain it, odds are high that readers would complain that “it is just as the code is written!”. Furthermore, although it requires a lot of effort, it is not interesting.

But not explaining it at all is also not a good thing to do. A modified version in which functions are defined separately for each target to be scanned is contained in the attached CD-ROM (doc/parse_string.html). I'd like readers who are interested to look it over.

Here Document

In comparison to ordinary strings, here documents are fairly interesting. That may be because, unlike the other elements, they deal with a line at a time. Moreover, it is terrific that the starting symbol can exist in the middle of a program. First, I'll show the code of yylex() that scans the starting symbol of a here document.

▼ yylex − '<'

case '<':
  c = nextc();
  if (c == '<' &&
      lex_state != EXPR_END &&
      lex_state != EXPR_DOT &&
      lex_state != EXPR_ENDARG &&
      lex_state != EXPR_CLASS &&
      (!IS_ARG() || space_seen)) {
      int token = heredoc_identifier();
      if (token) return token;

(parse.y)

As usual, we'll ignore the herd of lex_state. Then, we can see that it reads only “<<” here and the rest is scanned in heredoc_identifier(). Therefore, here is heredoc_identifier().

▼ heredoc_identifier()

static int
heredoc_identifier()
{
    /* ... omission ... reading the starting symbol */
    tokfix();
    len = lex_p - lex_pbeg;   /* (A) */
    lex_p = lex_pend;         /* (B) */
    lex_strterm = rb_node_newnode(NODE_HEREDOC,
                    rb_str_new(tok(), toklen()),  /* nd_lit */
                    len,                          /* nd_nth */
        /* (C) */   lex_lastline);                /* nd_orig */

    return term == '`' ? tXSTRING_BEG : tSTRING_BEG;
}

(parse.y)

The part which reads the starting symbol (<<EOS) is not important, so it is totally left out. By this point, the input buffer has probably become as depicted in Figure 10. Let's recall that the input buffer reads a line at a time.

Figure 10: scanning "printf(<<EOS, n)"

What heredoc_identifier() is doing is as follows: (A) len is the number of bytes read in the current line. (B) Then, suddenly, lex_p is moved to the end of the line. It means that, in the line that was read, the part after the starting symbol has been read but not parsed. When is that remaining part parsed? For this mystery, a hint is that at (C) the lex_lastline (the currently read line) and len (the length that has already been read) are saved.

Then, the dynamic call graph before and after heredoc_identifier is simply shown below:

yyparse
    yylex(case '<')
        heredoc_identifier(lex_strterm = ....)
    yylex(the beginning if)
        here_document

And this here_document() scans the body of the here document. Omitting invalid cases and adding some comments, here_document() is shown below. Notice that lex_strterm remains unchanged after it was set in heredoc_identifier().

▼ here_document() (simplified)

here_document(NODE *here)
{
    VALUE line;                      /* the line currently being scanned */
    VALUE str = rb_str_new("", 0);   /* a string to store the results */

    /* ... handling invalid conditions, omitted ... */

    if (embedded expressions not in effect) {
        do {
            line = lex_lastline;     /* (A) */
            rb_str_cat(str, RSTRING(line)->ptr, RSTRING(line)->len);
            lex_p = lex_pend;        /* (B) */
            if (nextc() == -1) {     /* (C) */
                goto error;
            }
        } while (the currently read line is not equal to the finishing symbol);
    }
    else {
        /* the embedded expressions are available ... omitted */
    }
    heredoc_restore(lex_strterm);
    lex_strterm = NEW_STRTERM(-1, 0, 0);
    yylval.node = NEW_STR(str);
    return tSTRING_CONTENT;
}

rb_str_cat() is the function that appends a char* to the end of a Ruby string. It means that the line currently being read, lex_lastline, is appended to str at (A). After it is appended, the current line is of no further use. At (B), lex_p is suddenly moved to the end of the line. And (C) is the tricky part: here it looks like it is checking whether the input is finished, but actually the next “line” is read. I'd like you to recall that nextc() automatically reads the next line when the current line has been read to the end. So, since the current line is forcibly finished at (B), lex_p moves to the next line at (C).

And finally, leaving the do ~ while loop, it is heredoc_restore().

▼ heredoc_restore()

static void
heredoc_restore(here)
    NODE *here;
{
    VALUE line = here->nd_orig;
    lex_lastline = line;
    lex_pbeg = RSTRING(line)->ptr;
    lex_pend = lex_pbeg + RSTRING(line)->len;
    lex_p = lex_pbeg + here->nd_nth;
    heredoc_end = ruby_sourceline;
    ruby_sourceline = nd_line(here);
    rb_gc_force_recycle(here->nd_lit);
    rb_gc_force_recycle((VALUE)here);
}

(parse.y)

here->nd_orig holds the line which contains the starting symbol. here->nd_nth holds the length already read in the line that contains the starting symbol. It means that it can continue to scan from just after the starting symbol as if nothing had happened. (Figure 11)

Figure 11: The picture of assignation of scanning Here Document
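The save, skip and restore sequence can be modelled with a much smaller input buffer. The sketch below is only an illustration under assumptions: struct input and read_heredoc() are hypothetical, the script is an array of lines instead of a stream, and the body is merely counted rather than collected into a Ruby string.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical miniature of the line-based input buffer. */
struct input {
    const char **lines;  /* the whole script, one string per line */
    size_t nlines;
    size_t next;         /* index of the next unread line (the stream position) */
    const char *line;    /* current line, like lex_lastline */
    size_t pos;          /* offset inside the current line, like lex_p - lex_pbeg */
};

/*
 * Called right after "<<EOS" has been read on the current line.
 * Consumes the body lines up to the terminator, then resumes the original
 * line at the saved offset. Returns the number of body lines, or -1 if the
 * terminator is missing.
 */
static int read_heredoc(struct input *in, const char *term)
{
    const char *saved_line = in->line;  /* like nd_orig */
    size_t saved_pos = in->pos;         /* like nd_nth */
    int nbody = 0;

    assert(saved_line != NULL && term != NULL);
    while (in->next < in->nlines) {
        in->line = in->lines[in->next++];   /* the buffer now holds a body line */
        in->pos = strlen(in->line);         /* and it is consumed whole */
        if (strcmp(in->line, term) == 0) {
            in->line = saved_line;          /* what heredoc_restore() does */
            in->pos = saved_pos;
            return nbody;
        }
        nbody++;
    }
    return -1;
}
```

After the call, the current line is once again the one that contains <<EOS, positioned just after it, while the stream position already points past the terminator. That is why the rest of the first line can be scanned next and the input then continues after the here document.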

The original work is Copyright © 2002-2004 Minero AOKI. Translated by Vincent ISAMBART and Clifford Escobar CAOILE. This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 2.5 License


Translated by Peter Zotov. I'm very grateful to my employer Evil Martians, who sponsored the work, and Nikolay Konovalenko, who put more effort in this translation than I could ever wish for. Without them, I would be still figuring out what COND_LEXPOP() actually does.

Chapter 11: Finite-state scanner

Outline

In theory, the scanner and the parser are completely independent of each other – the scanner is supposed to recognize tokens, while the parser is supposed to process the resulting series of tokens. It would be nice if things were that simple, but in reality they rarely are. Depending on the context of the program, it is often necessary to alter the way tokens are recognized or their symbols. In this chapter we will take a look at the way the scanner and the parser cooperate.

Practical examples

In most programming languages, spaces don't have any specific meaning unless they are used to separate words. However, Ruby is not an ordinary language and meanings can change significantly depending on the presence of spaces. Here is an example:

a[i] = 1      # a[i] = (1)
a [i]         # a([i])

The former is an example of assigning to an index. The latter is an example of omitting the method call parentheses and passing the array [i] as a parameter.

Here is another example.

a  +  1    # (a) + (1)
a  +1      # a(+1)

This seems to be really disliked by some.

However, the above examples might give one the impression that only omitting the method call parentheses can be a source of trouble. Let's look at a different example.

`cvs diff parse.y`          # command call string
obj.`("cvs diff parse.y")   # normal method call

Here, the former is a method call using a literal. In contrast, the latter is a normal method call (with '`' being the method name). Depending on the context, they could be handled quite differently.

Below is another example where the functioning changes dramatically:

print(<<EOS)      # here-document
......
EOS

list = []
list << nil       # list.push(nil)

The former is a method call using a here-document. The latter is a method call using an operator.

As demonstrated, Ruby's grammar contains many parts which are difficult to implement in practice. I couldn't realistically give a thorough description of all of them in just one chapter, so in this one I will look at the basic principles and those parts which present the most difficulty.

lex_state

There is a variable called “lex_state”. “lex”, obviously, stands for “lexer”. Thus, it is a variable which shows the scanner's state.

What states are there? Let's look at the definitions.

▼ enum lex_state

static enum lex_state {
    EXPR_BEG,      /* ignore newline, +/- is a sign. */
    EXPR_END,      /* newline significant, +/- is a operator. */
    EXPR_ARG,      /* newline significant, +/- is a operator. */
    EXPR_CMDARG,   /* newline significant, +/- is a operator. */
    EXPR_ENDARG,   /* newline significant, +/- is a operator. */
    EXPR_MID,      /* newline significant, +/- is a operator. */
    EXPR_FNAME,    /* ignore newline, no reserved words. */
    EXPR_DOT,      /* right after `.' or `::', no reserved words. */
    EXPR_CLASS,    /* immediate after `class', no here document. */
} lex_state;

(parse.y)

The EXPR prefix stands for “expression”. EXPR_BEG is “Beginning of expression” and EXPR_DOT is “inside the expression, after the dot”.

To elaborate, EXPR_BEG denotes “Located at the head of the expression”. EXPR_END denotes “Located at the end of the expression”. EXPR_ARG denotes “Before the method parameter”. EXPR_FNAME denotes “Before the method name (such as def)”. The ones not covered here will be analyzed in detail below.

Incidentally, I am led to believe that lex_state actually denotes “after parentheses”, “head of statement”, so it shows the state of the parser rather than the scanner. However, it's still conventionally referred to as the scanner's state, and here's why.

The meaning of “state” here is actually subtly different from how it's usually understood. The “state” of lex_state is “a state under which the scanner does x”. For example, an accurate description of EXPR_BEG would be “A state under which the scanner, if run, will react as if this is at the head of the expression”.

Technically, this “state” can be described as the state of the scanner if we look at the scanner as a state machine. However, delving there would be veering off topic and too tedious. I would refer any interested readers to any textbook on data structures.

Understanding the finite-state scanner

The trick to reading a finite-state scanner is to not try to grasp everything at once. Someone writing a parser would prefer not to use a finite-state scanner. That is to say, they would prefer not to make it the main part of the process. Scanner state management often ends up being an extra part attached to the main part. In other words, there is no such thing as a clean and concise diagram for state transitions.

What one should do is think toward specific goals: “This part is needed to solve this task”, “This code is for overcoming this problem”. Basically, put out code in accordance with the task at hand. If you start thinking about the mutual relationship between tasks, you'll invariably end up stuck. Like I said, there is simply no such thing.

However, there still needs to be an overarching objective. When reading a finite-state scanner, that objective would undoubtedly be to understand every state. For example, what kind of state is EXPR_BEG? It is a state where the parser is at the head of the expression.

The static approach

So, how can we understand what a state does? There are three basic approaches:

Look at the name of the state

The simplest and most obvious approach. For example, the name EXPR_BEG obviously refers to the head (beginning) of something.

Observe what changes under this state

Look at the way token recognition changes under the state, then test it in comparison to previous examples.

Look at the state from which it transitions

Look at which state it transitions from and which token causes it. For example, if '\n' is always followed by a transition to a HEAD state, it must denote the head of the line.

Let us take EXPR_BEG as an example. In ruby, all state transitions are expressed as assignments to lex_state, so first we need to grep for assignments of EXPR_BEG to find them. Then we need to write out their locations, for example '#' and '*' and '!' of yylex(). Then we need to recall the state prior to the transition and consider which case suits best (see Figure 1).

Figure 1: Transition to EXPR_BEG

((errata: 1. Actually when the state is EXPR_DOT, the state after reading a tIDENTIFIER would be either ARG or CMDARG. However, because the author wanted to roughly group them as FNAME/DOT and the others here, these two are shown together. Therefore, to be precise, EXPR_FNAME and EXPR_DOT should have also been separated. 2. ‘)’ does not cause the transition from “everything else” to EXPR_BEG.))

This does indeed look like the head of a statement, especially the '\n' and the ';'. The open parentheses and the comma also suggest that it's the head not just of the statement, but of the expression as well.

The dynamic approach

There are other easy methods to observe the functioning. For example, you can use a debugger to “hook” yylex() and look at lex_state.

Another way is to rewrite the source code to output state transitions. In the case of lex_state we only have a few patterns for assignment and comparison, so the solution would be to grasp them as text patterns and rewrite the code to output state transitions. The CD that comes with this book contains the rubylex-analyser tool. When necessary, I will refer to it in this text.

The overall process looks like this: use a debugger or the aforementioned tool to observe the functioning of the program. Then look at the source code to confirm the acquired data and use it.

Description of states

Here I will give simple descriptions of lex_state states.

EXPR_BEG

Head of expression. Comes immediately after \n ( { [ ! ? : , or the operator op=. The most general state.

EXPR_MID

Comes immediately after the reserved words return break next rescue. Invalidates binary operators such as * or &. Generally similar in function to EXPR_BEG.

EXPR_ARG

Comes immediately after elements which are likely to be the method name in a method call. Also comes immediately after '['. Except for cases where EXPR_CMDARG is used.

EXPR_CMDARG

Comes before the first parameter of a normal method call. For more information, see the section “The do conflict”.

EXPR_END

Used when there is a possibility that the statement is terminal. For example, after a literal or a closing parenthesis. Except for cases when EXPR_ENDARG is used.

EXPR_ENDARG

Special iteration of EXPR_END. Comes immediately after the closing parenthesis corresponding to tLPAREN_ARG. Refer to the section “First parameter enclosed in parentheses”.

EXPR_FNAME

Comes before the method name, usually after def, alias, undef or the symbol ':'. A single '`' can be a name.

EXPR_DOT

Comes after the dot in a method call. Handled similarly to EXPR_FNAME. Various reserved words are treated as simple identifiers. A single '`' can be a name.

EXPR_CLASS

Comes after the reserved word class. This is a very limited state.

The following states can be grouped together:

BEG MID
END ENDARG
ARG CMDARG
FNAME DOT

They all express similar conditions. EXPR_CLASS is a little different, but only appears in a limited number of places, not warranting any special attention.

Line-break handling

The problem

In Ruby, a statement does not necessarily require a terminator. In C or Java a statement must always end with a semicolon, but Ruby has no such requirement. Statements usually take up only one line, and thus end at the end of the line.

On the other hand, when a statement is clearly continued, this happens automatically. Some conditions for “This statement is clearly continued” are as follows:

After a comma
After an infix operator
Parentheses or brackets are not balanced
Immediately after the reserved word if

Etc.

Implementation

So, what do we need to implement this grammar? Simply having the scanner ignore line-breaks is not sufficient. In a grammar like Ruby's, where statements are delimited by reserved words on both ends, conflicts don't happen as frequently as in C-like languages, but when I tried a simple experiment, I couldn't get it to work until I got rid of return next break and restored the method call parentheses wherever they were omitted. To retain those features we need some kind of terminal symbol for statements' ends. It doesn't matter whether it's \n or ';' but it is necessary.

Two solutions exist – parser-based and scanner-based. For the former, you can just optionally put \n in every place that allows it. For the latter, have the \n passed to the parser only when it has some meaning (ignoring it otherwise).

Which solution to use is up to your preferences, but usually the scanner-based one is used. That way produces more compact code. Moreover, if the rules are overloaded with meaningless symbols, it defeats the purpose of the parser-generator.

To sum up, in Ruby, line-breaks are best handled using the scanner. When a line needs to continue, the \n will be ignored, and when it needs to be terminated, the \n is passed as a token. In yylex() this is found here:

▼ yylex() - '\n'

case '\n':
  switch (lex_state) {
    case EXPR_BEG:
    case EXPR_FNAME:
    case EXPR_DOT:
    case EXPR_CLASS:
      goto retry;
    default:
      break;
  }
  command_start = Qtrue;
  lex_state = EXPR_BEG;
  return '\n';

(parse.y)

With EXPR_BEG, EXPR_FNAME, EXPR_DOT and EXPR_CLASS it will be goto retry. That is to say, the line-break is meaningless and shall be ignored. The label retry is found in front of the large switch in yylex().

In all other instances, line-breaks are meaningful and shall be passed to the parser, after which lex_state is restored to EXPR_BEG. Basically, whenever a line-break is meaningful, it will be the end of expr.
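The decision made in this case clause fits into a single function. The sketch below repeats the enum from the listing earlier in this chapter so that it compiles on its own; newline_is_token() is a hypothetical helper, and command_start is left out.

```c
#include <assert.h>
#include <stddef.h>

enum lex_state {
    EXPR_BEG, EXPR_END, EXPR_ARG, EXPR_CMDARG, EXPR_ENDARG,
    EXPR_MID, EXPR_FNAME, EXPR_DOT, EXPR_CLASS
};

/*
 * Hypothetical helper: returns 1 if a '\n' seen in state *st should be
 * passed to the parser as a terminator, 0 if it should be skipped.
 * A meaningful line-break also moves the state back to EXPR_BEG.
 */
static int newline_is_token(enum lex_state *st)
{
    assert(st != NULL);
    switch (*st) {
      case EXPR_BEG:
      case EXPR_FNAME:
      case EXPR_DOT:
      case EXPR_CLASS:
        return 0;           /* the statement clearly continues */
      default:
        *st = EXPR_BEG;     /* the next token starts a new expression */
        return 1;
    }
}
```

One consequence is visible immediately: after a meaningful line-break the state is EXPR_BEG, so any further blank lines are skipped.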

I recommend leaving command_start alone for the time being. To reiterate, trying to grasp too many things at once will only end in needless confusion.

Let us now take a look at some examples using the rubylex-analyser tool.

% rubylex-analyser -e '
m(a,
  b, c) unless i
'
+EXPR_BEG
EXPR_BEG     C      "\nm"  tIDENTIFIER          EXPR_CMDARG
EXPR_CMDARG           "("  '('                  EXPR_BEG
                                              0:cond push
                                              0:cmd push
EXPR_BEG     C        "a"  tIDENTIFIER          EXPR_CMDARG
EXPR_CMDARG           ","  ','                  EXPR_BEG
EXPR_BEG    S     "\n  b"  tIDENTIFIER          EXPR_ARG
EXPR_ARG              ","  ','                  EXPR_BEG
EXPR_BEG    S         "c"  tIDENTIFIER          EXPR_ARG
EXPR_ARG              ")"  ')'                  EXPR_END
                                              0:cond lexpop
                                              0:cmd lexpop
EXPR_END    S    "unless"  kUNLESS_MOD          EXPR_BEG
EXPR_BEG    S         "i"  tIDENTIFIER          EXPR_ARG
EXPR_ARG             "\n"  \n                   EXPR_BEG
EXPR_BEG     C       "\n"  '                    EXPR_BEG

As you can see, there is a lot of output here, but we only need the left and middle columns. The left column displays the lex_state before it enters yylex() while the middle column displays the tokens and their symbols.

The first token m and the second parameter b are preceded by a line-break, but a \n is appended in front of them and it is not treated as a terminal symbol. That is because the lex_state is EXPR_BEG.

However, in the second to last line \n is used as a terminal symbol. That is because the state is EXPR_ARG.

And that is how it should be used. Let us have another example.

% rubylex-analyser -e 'class
C < Object
end'
+EXPR_BEG
EXPR_BEG     C    "class"  kCLASS               EXPR_CLASS
EXPR_CLASS          "\nC"  tCONSTANT            EXPR_END
EXPR_END    S         "<"  '<'                  EXPR_BEG
+EXPR_BEG
EXPR_BEG    S    "Object"  tCONSTANT            EXPR_ARG
EXPR_ARG             "\n"  \n                   EXPR_BEG
EXPR_BEG     C      "end"  kEND                 EXPR_END
EXPR_END             "\n"  \n                   EXPR_BEG

The reserved word class is followed by EXPR_CLASS so the line-break is ignored. However, the superclass Object is followed by EXPR_ARG, so the \n appears.

% rubylex-analyser -e 'obj.
class'
+EXPR_BEG
EXPR_BEG     C      "obj"  tIDENTIFIER          EXPR_CMDARG
EXPR_CMDARG           "."  '.'                  EXPR_DOT
EXPR_DOT        "\nclass"  tIDENTIFIER          EXPR_ARG
EXPR_ARG             "\n"  \n                   EXPR_BEG

'.' is followed by EXPR_DOT so the \n is ignored.

Note that class becomes tIDENTIFIER despite being a reserved word. This is discussed in the next section.

Reservedwordsandidenticalmethodnames

Theproblem

InRuby,reservedwordscanusedasmethodnames.However,inactualityit’snotassimpleas“itcanbeused”–thereexistthreepossiblecontexts:

Methoddefinition(defxxxx)Call(obj.xxxx)Symbolliteral(:xxxx)

AllthreearepossibleinRuby.Belowwewilltakeacloserlookateach.

First,themethoddefinition.Itisprecededbythereservedworddefsoitshouldwork.

Incaseofthemethodcall,omittingthereceivercanbeasourceofdifficulty.However,thescopeofusehereisevenmorelimited,andomittingthereceiverisactuallyforbidden.Thatis,whenthemethodnameisareservedword,thereceiverabsolutelycannotbeomitted.Perhapsitwouldbemoreaccuratetosaythatitisforbiddeninordertoguaranteethatparsingisalwayspossible.

Finally,incaseofthesymbol,itisprecededbytheterminalsymbol':'soitalsoshouldwork.However,regardlessofreservedwords,the':'hereconflictswiththecolonina?b:cIfthisisavoided,thereshouldbenofurthertrouble.

Foreachofthesecases,similarlytobefore,ascanner-basedsolutionandaparser-basedsolutionexist.FortheformerusetIDENTIFIER(forexample)asthereservedwordthatcomesafterdef

or.or:Forthelatter,makethatintoarule.Rubyallowsforbothsolutionstobeusedineachofthethreecases.

Method definition

The name part of the method definition. This is handled by the parser.

▼ Method definition rule

            | kDEF fname
              f_arglist
              bodystmt
              kEND
            | kDEF singleton dot_or_colon fname
              f_arglist
              bodystmt
              kEND

There exist only two rules for method definition: one for normal methods and one for singleton methods. For both, the name part is fname and it is defined as follows.

▼ fname

    fname   : tIDENTIFIER
            | tCONSTANT
            | tFID
            | op
            | reswords

reswords is a reserved word and op is a binary operator. Both rules consist of simply all terminal symbols lined up, so I won’t go into detail here. Finally, tFID is for names that end with a symbol, such as gsub! and include?.

Method call

Method calls with names identical to reserved words are handled by the scanner. The scan code for reserved words is shown below.

    Scanning the identifier
    result = (tIDENTIFIER or tCONSTANT)

    if (lex_state != EXPR_DOT) {
        struct kwtable *kw;

        /* See if it is a reserved word.  */
        kw = rb_reserved_word(tok(), toklen());
        Reserved word is processed
    }

EXPR_DOT expresses what comes after the method call dot. Under EXPR_DOT reserved words are universally not processed. The symbol for reserved words after the dot becomes either tIDENTIFIER or tCONSTANT.
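The effect of the EXPR_DOT test can be imitated with a toy scanner. This is a minimal sketch, not the real yylex: the RESERVED table is a tiny hypothetical subset, the state handling is reduced to what the example needs, and scan_words is a name invented here.

```ruby
# Hypothetical miniature of the identifier branch of the scanner.
RESERVED = { "class" => :kCLASS, "if" => :kIF, "def" => :kDEF }.freeze

def scan_words(source)
  state = :EXPR_BEG
  source.scan(/[A-Za-z_]\w*|\./).map do |text|
    if text == "."
      state = :EXPR_DOT
      [text, :"."]
    elsif state != :EXPR_DOT && RESERVED.key?(text)
      # Reserved words are only looked up outside EXPR_DOT.
      state = :EXPR_BEG
      [text, RESERVED[text]]
    else
      state = :EXPR_ARG
      [text, text.match?(/\A[A-Z]/) ? :tCONSTANT : :tIDENTIFIER]
    end
  end
end

p scan_words("obj.class")   # "class" after the dot is scanned as tIDENTIFIER
p scan_words("class Foo")   # "class" at the start is scanned as kCLASS
```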

Symbols

Reserved word symbols are handled by both the scanner and the parser. First, the rule.

▼ symbol

    symbol  : tSYMBEG sym

    sym     : fname
            | tIVAR
            | tGVAR
            | tCVAR

    fname   : tIDENTIFIER
            | tCONSTANT
            | tFID
            | op
            | reswords

Reserved words (reswords) are explicitly passed through the parser. This is only possible because the special terminal symbol tSYMBEG is present at the start. If the symbol were, for example, ':', it would conflict with the conditional operator (a?b:c) and stall. Thus, the trick is to recognize tSYMBEG at the scanner level.

But how to cause that recognition? Let’s look at the implementation of the scanner.

▼ yylex - ':'

    case ':':
      c = nextc();
      if (c == ':') {
          if (lex_state == EXPR_BEG ||  lex_state == EXPR_MID ||
              (IS_ARG() && space_seen)) {
              lex_state = EXPR_BEG;
              return tCOLON3;
          }
          lex_state = EXPR_DOT;
          return tCOLON2;
      }
      pushback(c);
      if (lex_state == EXPR_END ||
          lex_state == EXPR_ENDARG ||
          ISSPACE(c)) {
          lex_state = EXPR_BEG;
          return ':';
      }
      lex_state = EXPR_FNAME;
      return tSYMBEG;

(parse.y)

The if in the first half handles the case of two consecutive ':' characters. In this situation, '::' is scanned in accordance with the leftmost-longest-match basic rule.

For the next if, the ':' is the aforementioned conditional operator. Both EXPR_END and EXPR_ENDARG come at the end of the expression, so a parameter does not appear. That is to say, since there can’t be a symbol, the ':' is a conditional operator. Similarly, if the next letter is a space (ISSPACE(c)), a symbol is unlikely so it is again a conditional operator.

When none of the above applies, it is always a symbol. In that case, a transition to EXPR_FNAME occurs to prepare for all method names. There is no particular danger to parsing here, but if this is forgotten, the scanner will not pass values for reserved words and value calculation will be disrupted.
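The branches for ':' can be restated as a small Ruby function. This is a sketch that mirrors the listing above under simplifying assumptions: lex states are plain symbols, the next character is passed in as an argument, and classify_colon is a name invented for this illustration.

```ruby
# Returns [token, next_lex_state] for a ':' seen in the given state.
# next_char is the character following the ':'.
def classify_colon(lex_state, next_char, space_seen: false)
  if next_char == ":"
    if %i[EXPR_BEG EXPR_MID].include?(lex_state) ||
       (%i[EXPR_ARG EXPR_CMDARG].include?(lex_state) && space_seen)
      return [:tCOLON3, :EXPR_BEG]
    end
    return [:tCOLON2, :EXPR_DOT]
  end
  # End of an expression, or ':' followed by whitespace: conditional operator.
  if %i[EXPR_END EXPR_ENDARG].include?(lex_state) || next_char.match?(/\s/)
    return [:":", :EXPR_BEG]
  end
  # Otherwise a symbol starts here; EXPR_FNAME lets reserved words through.
  [:tSYMBEG, :EXPR_FNAME]
end

p classify_colon(:EXPR_BEG, "f")   # start of :foo
p classify_colon(:EXPR_END, "c")   # the ':' in a ? b :c
```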

Modifiers

The problem

For example, for if there exists a normal notation and one for postfix modification.

    # Normal notation
    if cond then
      expr
    end

    # Postfix
    expr if cond

This could cause a conflict. The reason can be guessed: again, it’s because method parentheses have been omitted previously. Observe this example:

    call if cond then a else b end

Reading this expression up to the if gives us two possible interpretations.

    call((if ....))
    call() if ....

When unsure, I recommend simply using trial and error and seeing if a conflict occurs. Let us try to handle it with yacc after changing kIF_MOD to kIF in the grammar.

    % yacc parse.y
    parse.y contains 4 shift/reduce conflicts and 13 reduce/reduce conflicts.

As expected, conflicts are aplenty. If you are interested, you can add the option -v to yacc and build a log. The nature of the conflicts should be shown there in great detail.

Implementation

So, what is there to do? In Ruby, on the symbol level (that is, on the scanner level) the normal if is distinguished from the postfix if by them being kIF and kIF_MOD respectively. This also applies to all other postfix operators. In all, there are five: kUNLESS_MOD, kUNTIL_MOD, kWHILE_MOD, kRESCUE_MOD and kIF_MOD. The distinction is made here:

▼ yylex - Reserved word

    struct kwtable *kw;

    /* See if it is a reserved word.  */
    kw = rb_reserved_word(tok(), toklen());
    if (kw) {
        enum lex_state state = lex_state;
        lex_state = kw->state;
        if (state == EXPR_FNAME) {
            yylval.id = rb_intern(kw->name);
        }
        if (kw->id[0] == kDO) {
            if (COND_P()) return kDO_COND;
            if (CMDARG_P() && state != EXPR_CMDARG)
                return kDO_BLOCK;
            if (state == EXPR_ENDARG)
                return kDO_BLOCK;
            return kDO;
        }
        if (state == EXPR_BEG)  /*** Here ***/
            return kw->id[0];
        else {
            if (kw->id[0] != kw->id[1])
                lex_state = EXPR_BEG;
            return kw->id[1];
        }
    }

(parse.y)

This is located at the end of yylex after the identifiers are scanned. The part that handles modifiers is the last (innermost) if〜else.

Whether the return value is altered can be determined by whether or not the state is EXPR_BEG. This is where a modifier is identified. Basically, the variable kw is the key and if you look far above you will find that it is struct kwtable.

I’ve already described in the previous chapter how struct kwtable is a structure defined in keywords and the hash function rb_reserved_word() is created by gperf. I’ll show the structure here again.

▼ keywords - struct kwtable

    struct kwtable {char *name; int id[2]; enum lex_state state;};

(keywords)

I’ve already explained about name and id[0]: they are the reserved word name and its symbol. Here I will speak about the remaining members.

First, id[1] is a symbol to deal with modifiers. For example, in case of if that would be kIF_MOD. When a reserved word does not have a modifier equivalent, id[0] and id[1] contain the same things.

Because state is enum lex_state it is the state to which a transition should occur after the reserved word is read. Below is a list created in the kwstat.rb tool which I made. The tool can be found on the CD.

    % kwstat.rb ruby/keywords
    ---- EXPR_ARG
    defined?  super     yield

    ---- EXPR_BEG
    and     case    else    ensure  if      module  or      unless  when
    begin   do      elsif   for     in      not     then    until   while

    ---- EXPR_CLASS
    class

    ---- EXPR_END
    BEGIN     __FILE__  end       nil       retry     true
    END       __LINE__  false     redo      self

    ---- EXPR_FNAME
    alias  def    undef

    ---- EXPR_MID
    break   next    rescue  return

    ---- modifiers
    if      rescue  unless  until   while
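How id[0], id[1] and state work together can be sketched in a few lines of Ruby. This is an illustration only: KwEntry, KEYWORD_TABLE and keyword_token are hypothetical names, the table holds just three entries taken from the list above, and the kDO and EXPR_FNAME branches of the real code are left out.

```ruby
# Hypothetical stand-in for struct kwtable: name, id pair, state after the word.
KwEntry = Struct.new(:name, :id, :state)

KEYWORD_TABLE = {
  "if"    => KwEntry.new("if",    [:kIF, :kIF_MOD],       :EXPR_BEG),
  "while" => KwEntry.new("while", [:kWHILE, :kWHILE_MOD], :EXPR_BEG),
  "nil"   => KwEntry.new("nil",   [:kNIL, :kNIL],         :EXPR_END),
}.freeze

# Returns [token, next_lex_state] for a reserved word read in state_before.
def keyword_token(word, state_before)
  kw = KEYWORD_TABLE.fetch(word)
  # At the beginning of an expression the normal form id[0] is returned.
  return [kw.id[0], kw.state] if state_before == :EXPR_BEG
  # Elsewhere id[1] is returned; a real modifier moves the state to EXPR_BEG.
  next_state = kw.id[0] != kw.id[1] ? :EXPR_BEG : kw.state
  [kw.id[1], next_state]
end

p keyword_token("if", :EXPR_BEG)   # normal if
p keyword_token("if", :EXPR_END)   # modifier if, as in "expr if cond"
p keyword_token("nil", :EXPR_ARG)  # no modifier form, so id[1] is the same token
```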

The do conflict

The problem

There are two iterator forms: do〜end and {〜}. Their difference is in priority: {〜} has a much higher priority. A higher priority means that as part of the grammar a unit is “small”, which means it can be put into a smaller rule. For example, it can be put not into stmt but expr or primary. In the past {〜} iterators were in primary while do〜end iterators were in stmt.

By the way, there has been a request for an expression like this:

    m do .... end + m do .... end

To allow for this, put the do〜end iterator in arg or primary. Incidentally, the condition for while is expr, meaning it contains arg and primary, so the do will cause a conflict here. Basically, it looks like this:

    while m do
      ....
    end

At first glance, the do looks like the do of while. However, a closer look reveals that it could be a m do〜end bundling. Something that’s not obvious even to a person will definitely cause yacc to conflict. Let’s try it in practice.

    /* do conflict experiment */
    %token kWHILE kDO tIDENTIFIER kEND
    %%
    expr: kWHILE expr kDO expr kEND
        | tIDENTIFIER
        | tIDENTIFIER kDO expr kEND

I simplified the example to only include while, variable referencing and iterators. This rule causes a shift/reduce conflict if the head of the conditional contains tIDENTIFIER. If tIDENTIFIER is used for variable referencing and do is appended to while, then it’s reduction. If it’s made an iterator do, then it’s a shift.

Unfortunately, in a shift/reduce conflict the shift is prioritized, so if left unchecked, do will become an iterator do. That said, even if a reduction is forced through operator priorities or some other method, do won’t shift at all, becoming unusable. Thus, to solve the problem without any contradictions, we need to either deal with it at the scanner level or write a rule that allows the use of operators without putting the do〜end iterator into expr.

However, not putting do〜end into expr is not a realistic goal. That would require all rules for expr (as well as for arg and primary) to be repeated. This leaves us only the scanner solution.

Rule-level solution

Below is a simplified example of a relevant rule.

▼ do symbol

    primary         : kWHILE expr_value do compstmt kEND

    do              : term
                    | kDO_COND

    primary         : operation brace_block
                    | method_call brace_block

    brace_block     : '{' opt_block_var compstmt '}'
                    | kDO opt_block_var compstmt kEND

As you can see, the terminal symbols for the do of while and for the iterator do are different. For the former it’s kDO_COND while for the latter it’s kDO. Then it’s simply a matter of pointing that distinction out to the scanner.

Symbol-level solution

Below is a partial view of the yylex section that processes reserved words. It’s the only part tasked with processing do, so looking at this code should be enough to understand the criteria for making the distinction.

▼ yylex - Identifier - Reserved word

    if (kw->id[0] == kDO) {
        if (COND_P()) return kDO_COND;
        if (CMDARG_P() && state != EXPR_CMDARG)
            return kDO_BLOCK;
        if (state == EXPR_ENDARG)
            return kDO_BLOCK;
        return kDO;
    }

(parse.y)

It’s a little messy, but you only need the part associated with kDO_COND. That is because only two comparisons are meaningful. The first is the comparison between kDO_COND and kDO/kDO_BLOCK. The second is the comparison between kDO and kDO_BLOCK. The rest are meaningless. Right now we only need to distinguish the conditional do; leave all the other conditions alone.

Basically, COND_P() is the key.

COND_P()

cond_stack

COND_P() is defined close to the head of parse.y

▼ cond_stack

    #ifdef HAVE_LONG_LONG
    typedef unsigned LONG_LONG stack_type;
    #else
    typedef unsigned long stack_type;
    #endif

    static stack_type cond_stack = 0;
    #define COND_PUSH(n) (cond_stack = (cond_stack<<1)|((n)&1))
    #define COND_POP() (cond_stack >>= 1)
    #define COND_LEXPOP() do {\
        int last = COND_P();\
        cond_stack >>= 1;\
        if (last) cond_stack |= 1;\
    } while (0)
    #define COND_P() (cond_stack&1)

(parse.y)

The type stack_type is either unsigned long (at least 32 bits) or unsigned long long (at least 64 bits). cond_stack is initialized by yycompile() at the start of parsing and after that is handled only through macros. All you need, then, is to understand those macros.

If you look at COND_PUSH/POP you will see that these macros use integers as stacks consisting of bits.

    MSB←   →LSB
    ...0000000000   Initial value 0
    ...0000000001   COND_PUSH(1)
    ...0000000010   COND_PUSH(0)
    ...0000000101   COND_PUSH(1)
    ...0000000010   COND_POP()
    ...0000000100   COND_PUSH(0)
    ...0000000010   COND_POP()

As for COND_P(), since it determines whether or not the least significant bit (LSB) is a 1, it effectively determines whether the head of the stack is a 1.

The remaining COND_LEXPOP() is a little weird. It leaves COND_P() at the head of the stack and executes a right shift. Basically, it “crushes” the second bit from the bottom with the lowermost bit.

    MSB←   →LSB
    ...0000000000   Initial value 0
    ...0000000001   COND_PUSH(1)
    ...0000000010   COND_PUSH(0)
    ...0000000101   COND_PUSH(1)
    ...0000000011   COND_LEXPOP()
    ...0000000110   COND_PUSH(0)
    ...0000000011   COND_LEXPOP()

((errata: It leaves COND_P() only when it is 1. When COND_P() is 0 and the second bottom bit is 1, it would become 1 after doing LEXPOP, thus COND_P() is not left in this case.))
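To check the macros by hand, the bit stack can be transcribed into Ruby. This is a sketch that follows the macro definitions literally; the class name BitStack and its method names are invented here, and unlike the C stack_type a Ruby Integer never overflows.

```ruby
# Hypothetical Ruby transcription of the COND_* macros.
class BitStack
  attr_reader :bits

  def initialize
    @bits = 0
  end

  # COND_PUSH(n): shift left and store the lowest bit of n.
  def push(n)
    @bits = (@bits << 1) | (n & 1)
    self
  end

  # COND_POP(): drop the lowest bit.
  def pop
    @bits >>= 1
    self
  end

  # COND_LEXPOP(): shift right, then OR the old top back in if it was 1.
  def lexpop
    last = top
    @bits >>= 1
    @bits |= 1 if last == 1
    self
  end

  # COND_P(): the lowest bit is the top of the stack.
  def top
    @bits & 1
  end

  def to_s
    format("%010b", @bits)
  end
end

stack = BitStack.new
puts stack.push(1).push(0).push(1)  # three pushes
puts stack.lexpop                   # top was 1, so it survives the shift
puts stack.push(0)
puts stack.lexpop                   # top was 0, the bit below it becomes the top
```

The last step is the case described in the errata: the top was 0, the bit below it was 1, and after the shift COND_P() reports 1.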

Now I will explain what that means.

Investigating the function

Let us investigate the function of this stack. To do that I will list up all the parts where COND_PUSH() COND_POP() are used.

            | kWHILE {COND_PUSH(1);} expr_value do {COND_POP();}
    --
            | kUNTIL {COND_PUSH(1);} expr_value do {COND_POP();}
    --
            | kFOR block_var kIN {COND_PUSH(1);} expr_value do {COND_POP();}
    --
          case '(':
                    :
                    :
            COND_PUSH(0);
            CMDARG_PUSH(0);
    --
          case '[':
                    :
                    :
            COND_PUSH(0);
            CMDARG_PUSH(0);
    --
          case '{':
                    :
                    :
            COND_PUSH(0);
            CMDARG_PUSH(0);
    --
          case ']':
          case '}':
          case ')':
            COND_LEXPOP();
            CMDARG_LEXPOP();

From this we can derive the following general rules:

At the start of a conditional expression: PUSH(1)
At opening parenthesis: PUSH(0)
At the end of a conditional expression: POP()
At closing parenthesis: LEXPOP()

With this, you should see how to use it. If you think about it for a minute, judging from the name cond_stack, COND_P() is clearly a macro that determines whether or not we are on the same level as the conditional expression (see Figure 2).

Figure 2: Changes of COND_P()

Using this trick should also make situations like the one shown below easy to deal with.

    while (m do .... end)   # do is an iterator do (kDO)
      ....
    end
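The PUSH/POP rules can be run over a token list to see which do is which. This is a rough sketch under stated assumptions: tokens arrive as plain strings, only while/until, brackets and do are interpreted, and COND_POP() is applied as soon as the conditional do is seen (in the real parser it happens in a rule action). classify_do_tokens is a name invented here.

```ruby
# Returns the kind of each "do" in tokens, using a cond_stack-like integer.
def classify_do_tokens(tokens)
  cond_stack = 0
  tokens.each_with_object([]) do |tok, kinds|
    case tok
    when "while", "until"
      cond_stack = (cond_stack << 1) | 1   # COND_PUSH(1)
    when "(", "[", "{"
      cond_stack <<= 1                     # COND_PUSH(0)
    when ")", "]", "}"
      last = cond_stack & 1                # COND_LEXPOP()
      cond_stack >>= 1
      cond_stack |= last
    when "do"
      if (cond_stack & 1) == 1
        kinds << :kDO_COND
        cond_stack >>= 1                   # COND_POP(): the condition is over
      else
        kinds << :kDO
      end
    end
  end
end

p classify_do_tokens(%w[while m do end])
p classify_do_tokens(%w[while ( m do end ) do end])
```

Inside the parentheses the top of the stack is 0, so the first do of the second input is an iterator do; once the closing parenthesis is scanned the 1 pushed by while is on top again.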

Because the stack is a fixed-width integer, on a 32-bit machine in the absence of long long, if conditional expressions or parentheses are nested at 32 levels, things could get strange. Of course, in reality you won’t need to nest so deep so there’s no actual risk.

Finally, the definition of COND_LEXPOP() looks a bit strange; that seems to be a way of dealing with lookahead. However, the rules now do not allow for lookahead to occur, so there’s no purpose to make the distinction between POP and LEXPOP. Basically, at this time it would be correct to say that COND_LEXPOP() has no meaning.

tLPAREN_ARG(1)

The problem

This one is very complicated. It only became workable in Ruby 1.7, and only fairly recently. The core of the issue is interpreting this:

    call (expr) + 1

As one of the following

    (call(expr)) + 1

    call((expr) + 1)

In the past, it was always interpreted as the former. That is, the parentheses were always treated as “Method parameter parentheses”. But since Ruby 1.7 it became possible to interpret it as the latter: basically, if a space is added, the parentheses become “Parentheses of expr”.

I will also provide an example to explain why the interpretation changed. First, I wrote a statement as follows

    p m() + 1

So far so good. But let’s assume the value returned by m is a fraction and there are too many digits. Then we will have it displayed as an integer.

    p m() + 1 .to_i   # ??

Uh-oh, we need parentheses.

    p (m() + 1).to_i

How to interpret this? Up to 1.6 it will be this

    (p(m() + 1)).to_i

The much-needed to_i is rendered meaningless, which is unacceptable. To counter that, adding a space between the method name and the parentheses will cause the parentheses to be treated specially as expr parentheses.
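The difference a single space makes can be observed on a current Ruby, which follows the 1.7-and-later interpretation. This is a small sketch: wrap, without_space and with_space are helper names made up for the demonstration, and .class stands in for .to_i so that the two readings give visibly different results.

```ruby
# Hypothetical helper: returns its argument wrapped in an array.
def wrap(value)
  [value]
end

def without_space
  wrap(1.5 + 1).class     # read as (wrap(2.5)).class
end

def with_space
  wrap (1.5 + 1).class    # read as wrap((2.5).class)
end

p without_space
p with_space
```

Without the space, .class is applied to the result of the call; with the space, it is applied to the parenthesized expression before the call happens.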

For those eager to test this, this feature was implemented in parse.y revision 1.100 (2001-05-31). Thus, it should be relatively prominent when looking at the differences between it and 1.99. This is the command to find the difference.

    ~/src/ruby % cvs diff -r1.99 -r1.100 parse.y

Investigation

First let us look at how the set-up works in reality. Using the ruby-lexer tool {ruby-lexer: located in tools/ruby-lexer.tar.gz on the CD} we can look at the list of symbols corresponding to the program.

    % ruby-lexer -e 'm(a)'
    tIDENTIFIER
    '('
    tIDENTIFIER
    ')'
    '\n'

Similarly to Ruby, -e is the option to pass the program directly from the command line. With this we can try all kinds of things. Let’s start with the problem at hand: the case where the first parameter is enclosed in parentheses.

    % ruby-lexer -e 'm (a)'
    tIDENTIFIER
    tLPAREN_ARG
    tIDENTIFIER
    ')'
    '\n'

After adding a space, the symbol of the opening parenthesis became tLPAREN_ARG. Now let’s look at normal expression parentheses.

    % ruby-lexer -e '(a)'
    tLPAREN
    tIDENTIFIER
    ')'
    '\n'

For normal expression parentheses it seems to be tLPAREN. To sum up:

    Input    Symbol of opening parenthesis
    m(a)     '('
    m (a)    tLPAREN_ARG
    (a)      tLPAREN

Thus the focus is distinguishing between the three. For now tLPAREN_ARG is the most important.

The case of one parameter

We’ll start by looking at the yylex() section for '('

▼ yylex - '('

    case '(':
      command_start = Qtrue;
      if (lex_state == EXPR_BEG || lex_state == EXPR_MID) {
          c = tLPAREN;
      }
      else if (space_seen) {
          if (lex_state == EXPR_CMDARG) {
              c = tLPAREN_ARG;
          }
          else if (lex_state == EXPR_ARG) {
              c = tLPAREN_ARG;
              yylval.id = last_id;
          }
      }
      COND_PUSH(0);
      CMDARG_PUSH(0);
      lex_state = EXPR_BEG;
      return c;

(parse.y)

Since the first if is tLPAREN we’re looking at a normal expression parenthesis. The distinguishing feature is that lex_state is either BEG or MID: that is, it’s clearly at the beginning of the expression.

The following space_seen shows whether the parenthesis is preceded by a space. If there is a space and lex_state is either ARG or CMDARG, basically if it’s before the first parameter, the symbol is not '(' but tLPAREN_ARG. This way, for example, the following situation can be avoided

    m(              # Parenthesis not preceded by a space. Method parenthesis ('(')
    m arg, (        # Unless first parameter, expression parenthesis (tLPAREN)

When it is neither tLPAREN nor tLPAREN_ARG, the input character c is used as is and becomes '('. This will definitely be a method call parenthesis.
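The three-way decision can be condensed into one small function. This is a sketch of the branch structure only, with lex states passed in as symbols; classify_lparen is a name invented here, and the side effects of the real code (stack pushes, last_id, the state change) are omitted.

```ruby
# Returns the token chosen for '(' given the state before it.
def classify_lparen(lex_state, space_seen)
  if %i[EXPR_BEG EXPR_MID].include?(lex_state)
    :tLPAREN                                   # ordinary expression parenthesis
  elsif space_seen && %i[EXPR_CMDARG EXPR_ARG].include?(lex_state)
    :tLPAREN_ARG                               # space before the first argument
  else
    :"("                                       # method call parenthesis
  end
end

p classify_lparen(:EXPR_CMDARG, false)  # m(a)
p classify_lparen(:EXPR_CMDARG, true)   # m (a)
p classify_lparen(:EXPR_BEG, false)     # (a)
```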

If such a clear distinction is made on the symbol level, no conflict should occur even if rules are written as usual. Simplified, it becomes something like this:

    stmt         : command_call

    method_call  : tIDENTIFIER '(' args ')'    /* Normal method */

    command_call : tIDENTIFIER command_args    /* Method with parentheses omitted */

    command_args : args

    args         : arg
                 | args ',' arg

    arg          : primary

    primary      : tLPAREN compstmt ')'        /* Normal expression parenthesis */
                 | tLPAREN_ARG expr ')'        /* First parameter enclosed in parentheses */
                 | method_call

Now I need you to focus on method_call and command_call. If you leave the '(' without introducing tLPAREN_ARG, then command_args will produce args, args will produce arg, arg will produce primary. Then '(' appears in the place of tLPAREN_ARG and conflicts with method_call (see Figure 3).

Figure 3: method_call and command_call

The case of two parameters and more

One might think that if the parenthesis becomes tLPAREN_ARG all will be well. That is not so. For example, consider the following

    m (a, a, a)

Before now, expressions like this one were treated as method calls and did not produce errors. However, if tLPAREN_ARG is introduced, the opening parenthesis becomes an expr parenthesis, and if two or more parameters are present, that will cause a parse error. This needs to be resolved for the sake of compatibility.

Unfortunately, rushing ahead and just adding a rule like

    command_args : tLPAREN_ARG args ')'

will just cause a conflict. Let’s look at the bigger picture and think carefully.

    stmt         : command_call
                 | expr

    expr         : arg

    command_call : tIDENTIFIER command_args

    command_args : args
                 | tLPAREN_ARG args ')'

    args         : arg
                 | args ',' arg

    arg          : primary

    primary      : tLPAREN compstmt ')'
                 | tLPAREN_ARG expr ')'
                 | method_call

    method_call  : tIDENTIFIER '(' args ')'

Look at the first rule of command_args. Here, args produces arg. Then arg produces primary and out of there comes the tLPAREN_ARG rule. And since expr contains arg, when it is expanded it becomes like this:

    command_args : tLPAREN_ARG arg ')'
                 | tLPAREN_ARG arg ')'

This is a reduce/reduce conflict, which is very bad.

So, how can we deal with only 2+ parameters without causing a conflict? We’ll have to write rules that accommodate that situation specifically. In practice, it’s solved like this:

▼ command_args

    command_args    : open_args

    open_args       : call_args
                    | tLPAREN_ARG   ')'
                    | tLPAREN_ARG call_args2  ')'

    call_args       : command
                    | args opt_block_arg
                    | args ',' tSTAR arg_value opt_block_arg
                    | assocs opt_block_arg
                    | assocs ',' tSTAR arg_value opt_block_arg
                    | args ',' assocs opt_block_arg
                    | args ',' assocs ',' tSTAR arg opt_block_arg
                    | tSTAR arg_value opt_block_arg
                    | block_arg

    call_args2      : arg_value ',' args opt_block_arg
                    | arg_value ',' block_arg
                    | arg_value ',' tSTAR arg_value opt_block_arg
                    | arg_value ',' args ',' tSTAR arg_value opt_block_arg
                    | assocs opt_block_arg
                    | assocs ',' tSTAR arg_value opt_block_arg
                    | arg_value ',' assocs opt_block_arg
                    | arg_value ',' args ',' assocs opt_block_arg
                    | arg_value ',' assocs ',' tSTAR arg_value opt_block_arg
                    | arg_value ',' args ',' assocs ','
                                      tSTAR arg_value opt_block_arg
                    | tSTAR arg_value opt_block_arg
                    | block_arg

    primary         : literal
                    | strings
                    | xstring
                           :
                    | tLPAREN_ARG expr  ')'

Here command_args is followed by another level, open_args, but the grammar would mean the same without it. The key is the second and third rules of this open_args. This form is similar to the recent example, but is actually subtly different. The difference is that call_args2 has been introduced. The defining characteristic of this call_args2 is that the number of parameters is always two or more. This is evidenced by the fact that most rules contain ','. The only exception is assocs, but since assocs does not come out of expr it cannot conflict anyway.

That wasn’t a very good explanation. To put it simply, for input where this:

    command_args    : call_args

doesn’t work, and only for such input, the next rule is used to make an addition. Thus, the best way to think here is “For what kind of input would this rule not work?” Furthermore, since a conflict only occurs when the primary of tLPAREN_ARG appears at the head of call_args, the scope can be limited further and the best way to think is “For what kind of input does this rule not work when a tIDENTIFIER tLPAREN_ARG sequence appears?” Below are a few examples.

    m (a, a)

This is a situation when the tLPAREN_ARG list contains two or more items.

    m ()

Conversely, this is a situation when the tLPAREN_ARG list is empty.

    m (*args)
    m (&block)
    m (k => v)

This is a situation when the tLPAREN_ARG list contains a special expression (one not present in expr).

This should be sufficient for most cases. Now let’s compare the above with a practical implementation.

▼ open_args(1)

    open_args       : call_args
                    | tLPAREN_ARG   ')'

First, the rule deals with empty lists.

▼ open_args(2)

                    | tLPAREN_ARG call_args2  ')'

    call_args2      : arg_value ',' args opt_block_arg
                    | arg_value ',' block_arg
                    | arg_value ',' tSTAR arg_value opt_block_arg
                    | arg_value ',' args ',' tSTAR arg_value opt_block_arg
                    | assocs opt_block_arg
                    | assocs ',' tSTAR arg_value opt_block_arg
                    | arg_value ',' assocs opt_block_arg
                    | arg_value ',' args ',' assocs opt_block_arg
                    | arg_value ',' assocs ',' tSTAR arg_value opt_block_arg
                    | arg_value ',' args ',' assocs ','
                                      tSTAR arg_value opt_block_arg
                    | tSTAR arg_value opt_block_arg
                    | block_arg

And call_args2 deals with elements containing special types such as assocs, passing of arrays or passing of blocks. With this, the scope is now sufficiently broad.

tLPAREN_ARG(2)

The problem

In the previous section I said that the examples provided should be sufficient for “most” special method call expressions. I said “most” because iterators are still not covered. For example, the below statement will not work:

    m (a) {....}
    m (a) do .... end

In this section we will once again look at the previously introduced parts with solving this problem in mind.

Rule-level solution

Let us start with the rules. The first part here is all familiar rules, so focus on the do_block part

▼ command_call

    command_call    : command
                    | block_command

    command         : operation command_args

    command_args    : open_args

    open_args       : call_args
                    | tLPAREN_ARG ')'
                    | tLPAREN_ARG call_args2 ')'

    block_command   : block_call

    block_call      : command do_block

    do_block        : kDO_BLOCK opt_block_var compstmt kEND
                    | tLBRACE_ARG opt_block_var compstmt '}'

Both do and { are completely new symbols kDO_BLOCK and tLBRACE_ARG. Why isn’t it kDO or '{' you ask? In this kind of situation the best answer is an experiment, so we will try replacing kDO_BLOCK with kDO and tLBRACE_ARG with '{' and processing that with yacc

    % yacc parse.y
    conflicts:  2 shift/reduce, 6 reduce/reduce

It conflicts badly. A further investigation reveals that this statement is the cause.

    m (a), b {....}

That is because this kind of statement is already supposed to work. b{....} becomes primary. And now a rule has been added that concatenates the block with m. That results in two possible interpretations:

    m((a), b) {....}
    m((a), (b {....}))

This is the cause of the 2 shift/reduce conflicts.

The other conflict has to do with do〜end

    m((a)) do .... end     # Add do〜end using block_call
    m((a)) do .... end     # Add do〜end using primary

These two conflict. This is the cause of the 6 reduce/reduce conflicts.

{〜} iterator

This is the important part. As shown previously, you can avoid a conflict by changing the do and '{' symbols.

▼ yylex - '{'

    case '{':
      if (IS_ARG() || lex_state == EXPR_END)
          c = '{';          /* block (primary) */
      else if (lex_state == EXPR_ENDARG)
          c = tLBRACE_ARG;  /* block (expr) */
      else
          c = tLBRACE;      /* hash */
      COND_PUSH(0);
      CMDARG_PUSH(0);
      lex_state = EXPR_BEG;
      return c;

(parse.y)

IS_ARG() is defined as

▼ IS_ARG

    #define IS_ARG() (lex_state == EXPR_ARG || lex_state == EXPR_CMDARG)

(parse.y)

Thus, when the state is EXPR_ENDARG it will always be false. In other words, when lex_state is EXPR_ENDARG, it will always become tLBRACE_ARG, so the key to everything is the transition to EXPR_ENDARG.
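The choice among the three brace tokens depends on lex_state alone, which a short function makes explicit. This is only a sketch mirroring the branches of the listing; classify_lbrace is an invented name and the stack pushes and state change are left out.

```ruby
# Returns the token chosen for '{' given the state before it.
def classify_lbrace(lex_state)
  if %i[EXPR_ARG EXPR_CMDARG EXPR_END].include?(lex_state)
    :"{"           # block attached to a primary
  elsif lex_state == :EXPR_ENDARG
    :tLBRACE_ARG   # block after "m (a)"
  else
    :tLBRACE       # hash literal
  end
end

p classify_lbrace(:EXPR_ENDARG)
p classify_lbrace(:EXPR_ARG)
p classify_lbrace(:EXPR_BEG)
```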

EXPR_ENDARG

Now we need to know how to set EXPR_ENDARG. I used grep to find where it is assigned.

▼ Transition to EXPR_ENDARG

    open_args       : call_args
                    | tLPAREN_ARG  {lex_state = EXPR_ENDARG;} ')'
                    | tLPAREN_ARG call_args2 {lex_state = EXPR_ENDARG;} ')'

    primary         : tLPAREN_ARG expr {lex_state = EXPR_ENDARG;} ')'

That’s strange. One would expect the transition to EXPR_ENDARG to occur after the closing parenthesis corresponding to tLPAREN_ARG, but it’s actually assigned before ')'. I ran grep a few more times thinking there might be other parts setting the EXPR_ENDARG but found nothing.

Maybe there’s some mistake. Maybe lex_state is being changed some other way. Let’s use rubylex-analyser to visualize the lex_state transition.

    % rubylex-analyser -e 'm (a) { nil }'
    +EXPR_BEG
    EXPR_BEG     C        "m"  tIDENTIFIER          EXPR_CMDARG
    EXPR_CMDARG S         "("  tLPAREN_ARG          EXPR_BEG
                                                  0:cond push
                                                  0:cmd push
                                                  1:cmd push-
    EXPR_BEG     C        "a"  tIDENTIFIER          EXPR_CMDARG
    EXPR_CMDARG           ")"  ')'                  EXPR_END
                                                  0:cond lexpop
                                                  1:cmd lexpop
    +EXPR_ENDARG
    EXPR_ENDARG S         "{"  tLBRACE_ARG          EXPR_BEG
                                                  0:cond push
                                                 10:cmd push
                                                  0:cmd resume
    EXPR_BEG    S       "nil"  kNIL                 EXPR_END
    EXPR_END    S         "}"  '}'                  EXPR_END
                                                  0:cond lexpop
                                                  0:cmd lexpop
    EXPR_END             "\n"  \n                   EXPR_BEG

The lines divided into three large columns show the state transition caused by yylex(). On the left is the state before yylex(). The middle two are the word text and its symbols. Finally, on the right is the lex_state after yylex().

The lines to note are the single-item lines that come out as +EXPR_ENDARG. These indicate a transition occurring during a parser action. According to this, for some reason an action is executed after reading the ')', a transition to EXPR_ENDARG occurs, and '{' is nicely changed into tLBRACE_ARG. This is actually a pretty high-level technique: generously (ab)using the lookahead, the (1) of LALR(1).

Abusing the lookahead

ruby -y can bring up a detailed display of the yacc parser engine. This time we will use it to more closely trace the parser.

    % ruby -yce 'm (a) {nil}' 2>&1 | egrep '^Reading|Reducing'
    Reducing via rule 1 (line 303),  -> @1
    Reading a token: Next token is 304 (tIDENTIFIER)
    Reading a token: Next token is 340 (tLPAREN_ARG)
    Reducing via rule 446 (line 2234), tIDENTIFIER  -> operation
    Reducing via rule 233 (line 1222),  -> @6
    Reading a token: Next token is 304 (tIDENTIFIER)
    Reading a token: Next token is 41 (')')
    Reducing via rule 392 (line 1993), tIDENTIFIER  -> variable
    Reducing via rule 403 (line 2006), variable  -> var_ref
    Reducing via rule 256 (line 1305), var_ref  -> primary
    Reducing via rule 198 (line 1062), primary  -> arg
    Reducing via rule 42 (line 593), arg  -> expr
    Reducing via rule 260 (line 1317),  -> @9
    Reducing via rule 261 (line 1317), tLPAREN_ARG expr @9 ')'  -> primary
    Reading a token: Next token is 344 (tLBRACE_ARG)
                             :
                             :

Here we’re using the option -c which stops the process at just compiling and -e which allows to give a program from the command line. And we’re using egrep to single out token read and reduction reports.

Start by looking at the middle of the list. ')' is read. Now look at the end: the reduction (execution) of embedding action (@9) finally happens. Indeed, this would allow EXPR_ENDARG to be set after the ')' and before the '{'. But is this always the case? Let’s take another look at the part where it’s set.

    Rule 1    tLPAREN_ARG  {lex_state = EXPR_ENDARG;} ')'
    Rule 2    tLPAREN_ARG call_args2 {lex_state = EXPR_ENDARG;} ')'
    Rule 3    tLPAREN_ARG expr {lex_state = EXPR_ENDARG;} ')'

The embedding action can be substituted with an empty rule. For example, we can rewrite this using rule 1 with no change in meaning whatsoever.

    target  : tLPAREN_ARG tmp ')'
    tmp     :
                {
                    lex_state = EXPR_ENDARG;
                }

Assuming that this is before tmp, it’s possible that one terminal symbol will be read by lookahead. Thus we can skip the (empty) tmp and read the next. And if we are certain that lookahead will occur, the assignment to lex_state is guaranteed to change to EXPR_ENDARG after ')'. But is ')' certain to be read by lookahead in this rule?

Ascertaining lookahead

This is actually pretty clear. Think about the following input.

    m () { nil }        # A
    m (a) { nil }       # B
    m (a,b,c) { nil }   # C

I also took the opportunity to rewrite the rule to make it easier to understand (with no actual changes).

    rule1: tLPAREN_ARG             e1  ')'
    rule2: tLPAREN_ARG  one_arg    e2  ')'
    rule3: tLPAREN_ARG  more_args  e3  ')'

    e1:   /* empty */
    e2:   /* empty */
    e3:   /* empty */

First, the case of input A. Reading up to

    m (         # ... tLPAREN_ARG

we arrive before the e1. If e1 is reduced here, another rule cannot be chosen anymore. Thus, a lookahead occurs to confirm whether to reduce e1 and continue with rule1 to the bitter end or to choose a different rule. Accordingly, if the input matches rule1 it is certain that ')' will be read by lookahead.

On to input B. First, reading up to here

    m (         # ... tLPAREN_ARG

Here a lookahead occurs for the same reason as described above. Further reading up to here

    m (a        # ... tLPAREN_ARG '(' tIDENTIFIER

Another lookahead occurs. It occurs because depending on whether what follows is a ',' or a ')' a decision is made between rule2 and rule3. If what follows is a ',' then it can only be a comma to separate parameters, thus rule3, the rule for two or more parameters, is chosen. This is also true if the input is not a simple a but something like an if or literal. When the input is complete, a lookahead occurs to choose between rule2 and rule3, the rules for one parameter and two or more parameters respectively.

A separate embedding action is present before ')' in every rule. There’s no going back after an action is executed, so the parser will try to postpone executing an action until it is as certain as possible. For that reason, situations when this certainty cannot be gained with a single lookahead should be excluded when building a parser as it is a conflict.

Proceeding to input C.

    m (a, b, c

At this point anything other than rule3 is unlikely so we’re not expecting a lookahead. And yet, that is wrong. If the following is '(' then it’s a method call, but if the following is ',' or ')' it needs to be a variable reference. Basically, this time a lookahead is needed to confirm parameter elements instead of embedding action reduction.

But what about the other inputs? For example, what if the third parameter is a method call?

    m (a, b, c(....)    # ... ',' method_call

Once again a lookahead is necessary because a choice needs to be made between shift and reduction depending on whether what follows is ',' or ')'. Thus, in this rule in all instances the ')' is read before the embedding action is executed. This is quite complicated and more than a little impressive.

But would it be possible to set lex_state using a normal action instead of an embedding action? For example, like this:

                    | tLPAREN_ARG ')' { lex_state = EXPR_ENDARG; }

This won’t do because another lookahead is likely to occur before the action is reduced. This time the lookahead works to our disadvantage. With this it should be clear that abusing the lookahead of a LALR parser is pretty tricky and not something a novice should be doing.

do〜end iterator

So far we’ve dealt with the {〜} iterator, but we still have do〜end left. Since they’re both iterators, one would expect the same solutions to work, but it isn’t so. The priorities are different. For example,

    m a, b {....}          # m(a, (b{....}))
    m a, b do .... end     # m(a, b) do....end
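The difference in priority is easy to confirm by running Ruby itself. This is a small sketch; outer_call, inner_call, braces_form and do_end_form are hypothetical names chosen for the demonstration.

```ruby
# Hypothetical helpers that report whether they received a block.
def outer_call(*args)
  { args: args, block_given: block_given? }
end

def inner_call
  block_given? ? :inner_with_block : :inner_plain
end

def braces_form
  # The brace block binds to the nearest call, inner_call.
  outer_call 1, inner_call { nil }
end

def do_end_form
  # The do block binds to the whole command call, outer_call.
  outer_call 1, inner_call do
    nil
  end
end

p braces_form
p do_end_form
```

In the first form inner_call reports the block and outer_call does not; in the second form it is the other way round.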

Thus it’s only appropriate to deal with them differently.

That said, in some situations the same solutions do apply. The example below is one such situation

    m (a) {....}
    m (a) do .... end

In the end, our only option is to look at the real thing. Since we’re dealing with do here, we should look in the part of yylex() that handles reserved words.

▼ yylex - Identifiers - Reserved words - do

    if (kw->id[0] == kDO) {
        if (COND_P()) return kDO_COND;
        if (CMDARG_P() && state != EXPR_CMDARG)
            return kDO_BLOCK;
        if (state == EXPR_ENDARG)
            return kDO_BLOCK;
        return kDO;
    }

(parse.y)

This time we only need the part that distinguishes between kDO_BLOCK and kDO. Ignore kDO_COND. Only look at what’s always relevant in a finite-state scanner.

The decision-making part using EXPR_ENDARG is the same as tLBRACE_ARG so priorities shouldn’t be an issue here. Similarly to '{' the right course of action is probably to make it kDO_BLOCK

((errata: In the following case, priorities should have an influence. (But it does not in the actual code. It means this is a bug.)

    m m (a) { ... }     # This should be interpreted as m(m(a) {...}),
                        # but is interpreted as m(m(a)) {...}
    m m (a) do ... end  # as the same as this: m(m(a)) do ... end

))

The problem lies with CMDARG_P() and EXPR_CMDARG. Let’s look at both.

CMDARG_P()

▼ cmdarg_stack

    static stack_type cmdarg_stack = 0;
    #define CMDARG_PUSH(n) (cmdarg_stack = (cmdarg_stack<<1)|((n)&1))
    #define CMDARG_POP() (cmdarg_stack >>= 1)
    #define CMDARG_LEXPOP() do {\
        int last = CMDARG_P();\
        cmdarg_stack >>= 1;\
        if (last) cmdarg_stack |= 1;\
    } while (0)
    #define CMDARG_P() (cmdarg_stack&1)

(parse.y)

The structure and interface (macros) of cmdarg_stack are completely identical to those of cond_stack. It’s a stack of bits. Since it’s the same, we can use the same means to investigate it. Let’s list up the places which use it. First, during the action we have this:

    command_args    :  {
                            $<num>$ = cmdarg_stack;
                            CMDARG_PUSH(1);
                        }
                      open_args
                        {
                            /* CMDARG_POP() */
                            cmdarg_stack = $<num>1;
                            $$ = $2;
                        }

$<num>$ represents the left value with a forced casting. In this case it comes out as the value of the embedding action itself, so it can be produced in the next action with $<num>1. Basically, it’s a structure where cmdarg_stack is hidden in $$ before open_args and then restored in the next action.

But why use a hide-restore system instead of a simple push-pop? That will be explained at the end of this section.

Searching yylex() for more CMDARG relations, I found this.

    Token            Relation
    '(' '[' '{'      CMDARG_PUSH(0)
    ')' ']' '}'      CMDARG_LEXPOP()

Basically, as long as it is enclosed in parentheses, CMDARG_P() is false.

Consider both, and it can be said that when command_args, a parameter for a method call with parentheses omitted, is not enclosed in parentheses, CMDARG_P() is true.

EXPR_CMDARG

Now let’s take a look at one more condition: EXPR_CMDARG. Like before, let us look for places where a transition to EXPR_CMDARG occurs.

▼ yylex - Identifiers - State Transitions

    if (lex_state == EXPR_BEG ||
        lex_state == EXPR_MID ||
        lex_state == EXPR_DOT ||
        lex_state == EXPR_ARG ||
        lex_state == EXPR_CMDARG) {
        if (cmd_state)
            lex_state = EXPR_CMDARG;
        else
            lex_state = EXPR_ARG;
    }
    else {
        lex_state = EXPR_END;
    }

(parse.y)

This is code that handles identifiers inside yylex(). Leaving aside that there are a bunch of lex_state tests in here, let’s look first at cmd_state. And what is this?

▼ cmd_state

    static int
    yylex()
    {
        static ID last_id = 0;
        register int c;
        int space_seen = 0;
        int cmd_state;

        if (lex_strterm) {
            /* ……omitted…… */
        }
        cmd_state = command_start;
        command_start = Qfalse;

(parse.y)

Turns out it’s a yylex local variable. Furthermore, an investigation using grep revealed that here is the only place where its value is altered. This means it’s just a temporary variable for storing command_start during a single run of yylex.

When does command_start become true, then?

▼ command_start

    static int command_start = Qtrue;

    static NODE*
    yycompile(f, line)
        char *f;
        int line;
    {
                       :
        command_start = 1;

    static int
    yylex()
    {
                       :
          case '\n':
            /* ……omitted…… */
            command_start = Qtrue;
            lex_state = EXPR_BEG;
            return '\n';

          case ';':
            command_start = Qtrue;

          case '(':
            command_start = Qtrue;

(parse.y)

From this we understand that command_start is a static variable of parse.y that becomes true when one of \n ; ( is scanned.

Summing up what we’ve covered up to now, first, when \n ; ( is read, command_start becomes true and during the next yylex() run cmd_state becomes true.

And here is the code in yylex() that uses cmd_state

▼yylex-Identifiers-Statetransitions

4201if(lex_state==EXPR_BEG||4202lex_state==EXPR_MID||4203lex_state==EXPR_DOT||4204lex_state==EXPR_ARG||4205lex_state==EXPR_CMDARG){4206if(cmd_state)4207lex_state=EXPR_CMDARG;4208else4209lex_state=EXPR_ARG;4210}4211else{4212lex_state=EXPR_END;4213}

(parse.y)

Fromthisweunderstandthefollowing:whenafter\n;(thestateisEXPR_BEGMIDDOTARGCMDARGandanidentifierisread,atransitiontoEXPR_CMDARGoccurs.However,lex_statecanonlybecomeEXPR_BEGfollowinga\n;(sowhenatransitionoccurstoEXPR_CMDARGthelex_statelosesitsmeaning.Thelex_staterestrictionisonlyimportanttotransitionsdealingwithEXPR_ARG

Based on the above we can now think of a situation where the state is EXPR_CMDARG. For example, see the one below. The underscore is the current position.

m _
m(m _
m m _

((errata: The third one “m m _” is not EXPR_CMDARG. (It is EXPR_ARG.)))

Conclusion

Let us now return to the do decision code.

▼yylex - Identifiers - Reserved words - kDO - kDO_BLOCK

if (CMDARG_P() && state != EXPR_CMDARG)
    return kDO_BLOCK;

(parse.y)

Inside the parameter of a method call with parentheses omitted but not before the first parameter. That means from the second parameter of command_call onward. Basically, like this:

m arg, arg do .... end
m (arg), arg do .... end

Why is the case of EXPR_CMDARG excluded? This example should clear it up.

m do .... end

This pattern can already be handled using the do〜end iterator which uses kDO and is defined in primary. Thus, including that case would cause another conflict.

Reality and truth

Did you think we’re done? Not yet. Certainly, the theory is now complete, but only if everything that has been written is correct. As a matter of fact, there is one falsehood in this section. Well, more accurately, it isn’t a falsehood but an inexact statement. It’s in the part about CMDARG_P().

Actually, CMDARG_P() becomes true when inside command_args, that is to say, inside the parameter of a method call with parentheses omitted.

But where exactly is “inside the parameter of a method call with parentheses omitted”? Once again, let us use rubylex-analyser to inspect in detail.

% rubylex-analyser -e 'm a,a,a,a;'
+EXPR_BEG
EXPR_BEG     C        "m"  tIDENTIFIER          EXPR_CMDARG
EXPR_CMDARG S         "a"  tIDENTIFIER          EXPR_ARG
                                              1:cmd push-
EXPR_ARG              ","  ','                  EXPR_BEG
EXPR_BEG              "a"  tIDENTIFIER          EXPR_ARG
EXPR_ARG              ","  ','                  EXPR_BEG
EXPR_BEG              "a"  tIDENTIFIER          EXPR_ARG
EXPR_ARG              ","  ','                  EXPR_BEG
EXPR_BEG              "a"  tIDENTIFIER          EXPR_ARG
EXPR_ARG              ";"  ';'                  EXPR_BEG
                                              0:cmd resume
EXPR_BEG     C       "\n"  '                    EXPR_BEG

The 1:cmd push- in the right column is the push to cmd_stack. When the rightmost digit in that line is 1, CMDARG_P() becomes true. To sum up, the period of CMDARG_P() can be described as:

From immediately after the first parameter of a method call with parentheses omitted
To the terminal symbol following the final parameter

But, very strictly speaking, even this is still not entirely accurate.

% rubylex-analyser -e 'm a(),a,a;'
+EXPR_BEG
EXPR_BEG     C        "m"  tIDENTIFIER          EXPR_CMDARG
EXPR_CMDARG S         "a"  tIDENTIFIER          EXPR_ARG
                                              1:cmd push-
EXPR_ARG              "("  '('                  EXPR_BEG
                                              0:cond push
                                             10:cmd push
EXPR_BEG     C        ")"  ')'                  EXPR_END
                                              0:cond lexpop
                                              1:cmd lexpop
EXPR_END              ","  ','                  EXPR_BEG
EXPR_BEG              "a"  tIDENTIFIER          EXPR_ARG
EXPR_ARG              ","  ','                  EXPR_BEG
EXPR_BEG              "a"  tIDENTIFIER          EXPR_ARG
EXPR_ARG              ";"  ';'                  EXPR_BEG
                                              0:cmd resume
EXPR_BEG     C       "\n"  '                    EXPR_BEG

When the first terminal symbol of the first parameter has been read, CMDARG_P() is true. Therefore, the complete answer would be:

From the first terminal symbol of the first parameter of a method call with parentheses omitted
To the terminal symbol following the final parameter

What repercussions does this fact have? Recall the code that uses CMDARG_P().

▼yylex - Identifiers - Reserved words - kDO - kDO_BLOCK

if (CMDARG_P() && state != EXPR_CMDARG)
    return kDO_BLOCK;

(parse.y)

EXPR_CMDARG stands for “Before the first parameter of command_call” and is excluded. But wait, this meaning is also included in CMDARG_P(). Thus, the final conclusion of this section:

EXPR_CMDARG is completely useless

Truth be told, when I realized this, I almost broke down crying. I was sure it had to mean SOMETHING and spent enormous effort analyzing the source, but couldn’t understand anything. Finally, I ran all kinds of tests on the code using rubylex-analyser and arrived at the conclusion that it has no meaning whatsoever.

I didn’t spend so much time doing something meaningless just to fill up more pages. It was an attempt to simulate a situation likely to happen in reality. No program is perfect, all programs contain their own mistakes. Complicated situations like the one discussed here are where mistakes occur most easily, and when they do, reading the source material with the assumption that it’s flawless can really backfire. In the end, when reading the source code, you can only trust what actually happens.

Hopefully, this will teach you the importance of dynamic analysis. When investigating something, focus on what really happens. The source code will not tell you everything. It can’t tell anything other than what the reader infers.

And with this very useful sermon, I close the chapter.

((errata: This confidently written conclusion was wrong. Without EXPR_CMDARG, for instance, this program “m (m do end)” cannot be parsed. This is an example of the fact that correctness is not proved even if dynamic analyses are done so many times.))

Still not the end

Another thing I forgot. I can’t end the chapter without explaining why CMDARG_P() takes that value. Here’s the problematic part:

▼command_args

command_args    :  {
                        $<num>$ = cmdarg_stack;
                        CMDARG_PUSH(1);
                    }
                  open_args
                    {
                        /* CMDARG_POP() */
                        cmdarg_stack = $<num>1;
                        $$ = $2;
                    }

open_args       : call_args

(parse.y)

All things considered, this looks like another influence from lookahead. command_args is always in the following context:

tIDENTIFIER _

Thus, this looks like a variable reference or a method call. If it’s a variable reference, it needs to be reduced to variable, and if it’s a method call it needs to be reduced to operation. We cannot decide how to proceed without employing lookahead. Thus a lookahead always occurs at the head of command_args, and after the first terminal symbol of the first parameter is read, CMDARG_PUSH() is executed.

The reason why POP and LEXPOP exist separately in cmdarg_stack is also here. Observe the following example:

% rubylex-analyser -e 'm m (a), a'
-e:1: warning: parenthesize argument(s) for future version
+EXPR_BEG
EXPR_BEG     C        "m"  tIDENTIFIER          EXPR_CMDARG
EXPR_CMDARG S         "m"  tIDENTIFIER          EXPR_ARG
                                              1:cmd push-
EXPR_ARG    S         "("  tLPAREN_ARG          EXPR_BEG
                                              0:cond push
                                             10:cmd push
                                            101:cmd push-
EXPR_BEG     C        "a"  tIDENTIFIER          EXPR_CMDARG
EXPR_CMDARG           ")"  ')'                  EXPR_END
                                              0:cond lexpop
                                             11:cmd lexpop
+EXPR_ENDARG
EXPR_ENDARG           ","  ','                  EXPR_BEG
EXPR_BEG    S         "a"  tIDENTIFIER          EXPR_ARG
EXPR_ARG             "\n"  \n                   EXPR_BEG
                                             10:cmd resume
                                              0:cmd resume

Looking only at the parts related to cmd and how they correspond to each other…

  1:cmd push-       parser push(1)
 10:cmd push        scanner push
101:cmd push-       parser push(2)
 11:cmd lexpop      scanner pop
 10:cmd resume      parser pop(2)
  0:cmd resume      parser pop(1)

The cmd push- with a minus sign at the end is a parser push. Basically, push and pop do not correspond. Originally there were supposed to be two consecutive push- and the stack would become 110, but due to the lookahead the stack became 101 instead. CMDARG_LEXPOP() is a last-resort measure to deal with this. The scanner always pushes 0, so normally what it pops should also always be 0. When it isn’t 0, we can only assume that it’s 1 due to the parser push being late. Thus, the value is left.

Conversely, at the time of the parser pop the stack is supposed to be back in normal state, and usually pop shouldn’t cause any trouble. When it doesn’t do that, the reason is basically that it should work right. Whether popping or hiding in $$ and restoring, the process is the same. When you consider all the following alterations, it’s really impossible to tell how lookahead’s behavior will change. Moreover, this problem appears in a grammar that’s going to be forbidden in the future (that’s why there is a warning). To make something like this work, the trick is to consider numerous possible situations and respond to them. And that is why I think this kind of implementation is right for Ruby. Therein lies the real solution.
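The push, lexpop and resume steps of the trace for 'm m (a), a' can be replayed with a few lines of C. This is a sketch under assumptions: cmdarg_stack is modelled as a stack of bits in an unsigned long, and the function names (cmd_push, cmd_lexpop, cmd_resume, replay_trace) are invented here for illustration. They are not the macros of parse.y; only the behavior shown in the trace and in the command_args action is imitated.

```c
/* Hypothetical model of cmdarg_stack as a stack of bits. */
static unsigned long cmd_stack_model = 0;

static void cmd_push(int bit) { cmd_stack_model = (cmd_stack_model << 1) | (bit & 1); }
static int  cmd_top(void)     { return (int)(cmd_stack_model & 1); }

/* Scanner-side pop: drop the top bit, but if it was 1 (a late parser push),
 * carry that 1 over onto the new top so the value is not lost. */
static void cmd_lexpop(void)
{
    int last = cmd_top();
    cmd_stack_model >>= 1;
    if (last) cmd_stack_model |= 1;
}

/* Parser-side "pop": restore the whole stack value saved before the push,
 * as the command_args action does with $<num>1. */
static void cmd_resume(unsigned long saved) { cmd_stack_model = saved; }

/* Replays the cmd lines of the trace and records the stack after each step. */
static void replay_trace(unsigned long out[6])
{
    unsigned long saved1, saved2;

    cmd_stack_model = 0;
    saved1 = cmd_stack_model; cmd_push(1); out[0] = cmd_stack_model; /* binary 1   */
    cmd_push(0);                           out[1] = cmd_stack_model; /* binary 10  */
    saved2 = cmd_stack_model; cmd_push(1); out[2] = cmd_stack_model; /* binary 101 */
    cmd_lexpop();                          out[3] = cmd_stack_model; /* binary 11  */
    cmd_resume(saved2);                    out[4] = cmd_stack_model; /* binary 10  */
    cmd_resume(saved1);                    out[5] = cmd_stack_model; /* binary 0   */
}
```

With a plain pop in place of cmd_lexpop(), step four would leave binary 10 and the 1 pushed late by the parser would disappear. That loss is what the LEXPOP variant avoids.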

The original work is Copyright © 2002 - 2004 Minero AOKI. Translated by Vincent ISAMBART and Clifford Escobar CAOILE. This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 2.5 License


Chapter 12: Syntax tree construction

Node

NODE

As I’ve already described, a Ruby program is first converted to a syntax tree. To be more precise, a syntax tree is a tree structure made of structs called “nodes”. In ruby, all nodes are of type NODE.

▼NODE

typedef struct RNode {
    unsigned long flags;
    char *nd_file;
    union {
        struct RNode *node;
        ID id;
        VALUE value;
        VALUE (*cfunc)(ANYARGS);
        ID *tbl;
    } u1;
    union {
        struct RNode *node;
        ID id;
        int argc;
        VALUE value;
    } u2;
    union {
        struct RNode *node;
        ID id;
        long state;
        struct global_entry *entry;
        long cnt;
        VALUE value;
    } u3;
} NODE;

(node.h)

As you might infer from the struct name RNode, nodes are Ruby objects. This means the creation and release of nodes are taken care of by ruby’s garbage collector.

Therefore, flags naturally has the same role as basic.flags of the object struct. It means that T_NODE, which is the type of a struct, and flags such as FL_FREEZE are stored in it. As for NODE, in addition to these, its node type is stored in flags.

What does it mean? Since a program could contain various elements such as if and while and def and so on, there are also various corresponding node types. The three available unions are complicated, but how these unions are used is fixed to only one specific way for each node. For example, the table below shows the case of NODE_IF, the node of if.

member   union member   role
u1       u1.node        the condition expression
u2       u2.node        the body of true
u3       u3.node        the body of false

And, in node.h, the macros to access each union member are available.

▼the macros to access NODE

#define nd_head  u1.node
#define nd_alen  u2.argc
#define nd_next  u3.node

#define nd_cond  u1.node
#define nd_body  u2.node
#define nd_else  u3.node

#define nd_orig  u3.value
                :
                :

(node.h)

For example, these are used as follows:

NODE *head, *tail;
head->nd_next = tail;    /* head->u3.node = tail */

In the source code, it’s almost certain that these macros are used. The very few exceptions are only two places: where NODE is created in parse.y and where NODE is marked in gc.c.

By the way, what is the reason why such macros are used? For one thing, it might be because it’s cumbersome to remember numbers like u1 that are not meaningful by themselves. But what is more important than that is, there should be no problem if the corresponding number is changed, and it’s possible that it will actually be changed. For example, since a condition clause of if does not have to be stored in u1, someone might want to change it to u2 for some reason. But if u1 is directly used, they need to modify a lot of places all over the source code, which is inconvenient. Since nodes are all declared as NODE, it’s hard to find nodes that represent if. By preparing the macros to access, this kind of trouble can be avoided, and conversely we can determine the node types from the macros.
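The benefit of going through accessor macros can be seen on a cut-down model. In this sketch, MiniNode and mini_if_init are hypothetical names introduced only for illustration; the three macro definitions repeat the ones quoted from node.h above.

```c
/* Cut-down stand-in for NODE: three union slots, in the spirit of node.h. */
typedef struct MiniNode {
    union { struct MiniNode *node; long value; } u1;
    union { struct MiniNode *node; int argc;   } u2;
    union { struct MiniNode *node; long cnt;   } u3;
} MiniNode;

/* Accessor macros, same definitions as in the node.h excerpt. */
#define nd_cond u1.node
#define nd_body u2.node
#define nd_else u3.node

/* Fills an if-like node through the macros only. If the condition were ever
 * moved to another slot, only the #define above would need to change. */
static void
mini_if_init(MiniNode *n, MiniNode *c, MiniNode *t, MiniNode *e)
{
    n->nd_cond = c;
    n->nd_body = t;
    n->nd_else = e;
}
```

Code written this way also documents itself: a line that touches nd_cond is almost certainly dealing with a conditional node, which a bare u1.node would not reveal.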

Node Type

I said that in the flags of a NODE struct its node type is stored. We’ll look at in what form this information is stored. A node type can be set by nd_set_type() and obtained by nd_type().

▼nd_type nd_set_type

#define nd_type(n) (((RNODE(n))->flags>>FL_USHIFT)&0xff)
#define nd_set_type(n,t) \
    RNODE(n)->flags = ((RNODE(n)->flags & ~FL_UMASK) \
                       | (((t)<<FL_USHIFT) & FL_UMASK))

(node.h)

▼FL_USHIFT FL_UMASK

#define FL_USHIFT    11
#define FL_UMASK  (0xff<<FL_USHIFT)

(ruby.h)

It won’t be much trouble if we keep our focus on nd_type. Fig.1 shows what it looks like.

Fig.1: The usage of RNode.flags
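To make the bit layout concrete, the two macros can be rewritten as functions that work on a bare flags word. The names flags_set_type and flags_get_type are assumptions for this sketch (the real macros take a node), and the mask is written with an unsigned long constant to keep the arithmetic unsigned.

```c
/* Values taken from the ruby.h excerpt above. */
#define FL_USHIFT 11
#define FL_UMASK  (0xffUL << FL_USHIFT)

/* Stores an 8-bit node type in bits 11..18, leaving all other bits alone. */
static unsigned long
flags_set_type(unsigned long flags, unsigned int type)
{
    return (flags & ~FL_UMASK) | (((unsigned long)type << FL_USHIFT) & FL_UMASK);
}

/* Reads the 8-bit node type back out of the flags word. */
static unsigned int
flags_get_type(unsigned long flags)
{
    return (unsigned int)((flags >> FL_USHIFT) & 0xff);
}
```

Setting a type on a word whose low 11 bits are all 1 leaves those bits unchanged, and setting a second type fully replaces the first one because the old field is cleared with ~FL_UMASK before the new value is merged in.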

And, since macros cannot be used from debuggers, the nodetype() function is also available.

▼nodetype

static enum node_type
nodetype(node)                  /* for debug */
    NODE *node;
{
    return (enum node_type)nd_type(node);
}

(parse.y)

File Name and Line Number

The nd_file of a NODE holds (the pointer to) the name of the file where the text that corresponds to this node exists. Since there’s the file name, we naturally expect that there’s also the line number, but the corresponding member could not be found around here. Actually, the line number is being embedded to flags by the following macro:

▼nd_line nd_set_line

#define NODE_LSHIFT (FL_USHIFT+8)
#define NODE_LMASK  (((long)1<<(sizeof(NODE*)*CHAR_BIT-NODE_LSHIFT))-1)
#define nd_line(n) \
    ((unsigned int)((RNODE(n)->flags >> NODE_LSHIFT) & NODE_LMASK))
#define nd_set_line(n,l) \
    RNODE(n)->flags = ((RNODE(n)->flags & ~(-1 << NODE_LSHIFT)) \
                       | (((l)&NODE_LMASK) << NODE_LSHIFT))

(node.h)

nd_set_line() is fairly spectacular. However, as the names suggest, it is certain that nd_set_line() and nd_line() work symmetrically. Thus, if we first examine the simpler nd_line() and grasp the relationship between the parameters, there’s no need to analyze nd_set_line() in the first place.

The first thing is NODE_LSHIFT. As you can guess from the description of the node types in the previous section, it is the number of used bits in flags. FL_USHIFT bits are reserved by the ruby system (11 bits, ruby.h), and 8 bits are for the node type.

The next thing is NODE_LMASK.

sizeof(NODE*) * CHAR_BIT - NODE_LSHIFT

This is the number of the rest of the bits. Let’s call it restbits. This makes the code a lot simpler.

#define NODE_LMASK (((long)1 << restbits) - 1)

Fig.2 shows what the above code seems to be doing. Note that a borrow occurs when subtracting 1. We can eventually understand that NODE_LMASK is a sequence filled with 1 whose size is the number of the bits that are still available.

Fig.2: NODE_LMASK

Now, let’s look at nd_line() again.

(RNODE(n)->flags >> NODE_LSHIFT) & NODE_LMASK

By the right shift, the unused space is shifted to the LSB. The bitwise AND leaves only the unused space. Fig.3 shows how flags is used. Since FL_USHIFT is 11, on a 32-bit machine 32-(11+8)=13 bits are available for the line number.

Fig.3: How flags are used at NODE

… This means, if the line number goes beyond 2^13=8192, it should be displayed wrongly. Let’s try.

File.open('overflow.rb', 'w') {|f|
    10000.times { f.puts }
    f.puts 'raise'
}

With my 686 machine, ruby overflow.rb properly displayed 1809 as a line number. I’ve succeeded. However, if you use a 64-bit machine, you need to create a slightly bigger file in order to successfully fail.
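The number 1809 can be checked by hand: the raise sits on line 10001, and 10001 - 8192 = 1809. The following sketch reproduces the same wraparound with a fixed 32-bit flags word. The MODEL_ macros and the model_ functions are names assumed for this illustration; they imitate nd_set_line() and nd_line() for the 32-bit case only and are not the node.h definitions.

```c
#include <stdint.h>

/* Model of a 32-bit flags word: 11 system bits + 8 node-type bits are used,
 * so 32 - 19 = 13 bits remain for the line number. */
#define MODEL_LSHIFT (11 + 8)
#define MODEL_LMASK  ((UINT32_C(1) << (32 - MODEL_LSHIFT)) - 1)

/* Stores a line number in the upper 13 bits; higher bits of `line` are lost. */
static uint32_t
model_set_line(uint32_t flags, uint32_t line)
{
    uint32_t low = flags & ((UINT32_C(1) << MODEL_LSHIFT) - 1); /* keep low 19 bits */
    return low | ((line & MODEL_LMASK) << MODEL_LSHIFT);
}

/* Reads the stored line number back. */
static uint32_t
model_line(uint32_t flags)
{
    return (flags >> MODEL_LSHIFT) & MODEL_LMASK;
}
```

Here MODEL_LMASK is 8191, so model_line(model_set_line(0, 10001)) yields 10001 & 8191, which is 1809, while any line number up to 8191 survives the round trip unchanged.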

rb_node_newnode()

Lastly let’s look at the function rb_node_newnode() that creates a node.

▼rb_node_newnode()

NODE*
rb_node_newnode(type, a0, a1, a2)
    enum node_type type;
    NODE *a0, *a1, *a2;
{
    NODE *n = (NODE*)rb_newobj();

    n->flags |= T_NODE;
    nd_set_type(n, type);
    nd_set_line(n, ruby_sourceline);
    n->nd_file = ruby_sourcefile;

    n->u1.node = a0;
    n->u2.node = a1;
    n->u3.node = a2;

    return n;
}

(parse.y)

We’ve seen rb_newobj() in Chapter 5: Garbage collection. It is the function to get a vacant RVALUE. By attaching the T_NODE struct-type flag to it, the initialization as a VALUE will complete. Of course, it’s possible that some values that are not of type NODE* are passed for u1 u2 u3, but they are received as NODE* for the time being. Since the syntax trees of ruby do not contain double and such, if the values are received as pointers, they will never be too small in size.

For the rest part, you can forget about the details you’ve learned so far, and assume NODE is

flags
nodetype
nd_line
nd_file
u1
u2
u3

a struct type that has the above seven members.

Syntax Tree Construction

The role of the parser is to convert the source code that is a byte sequence to a syntax tree. Passing the grammar does not finish even half of the task, so we have to assemble nodes and create a tree. In this section, we’ll look at the construction process of that syntax tree.

YYSTYPE

Essentially this chapter is about actions, thus YYSTYPE which is the type of $$ or $1 becomes important. Let’s look at the %union of ruby first.

▼%union declaration

%union {
    NODE *node;
    ID id;
    int num;
    struct RVarmap *vars;
}

(parse.y)

struct RVarmap is a struct used by the evaluator and holds a block local variable. You can tell the rest. The most used one is of course node.

Landscape with Syntax Trees

I mentioned that looking at the fact first is a theory of code reading. Since what we want to know this time is how the generated syntax tree is, we should start with looking at the answer (the syntax tree).

It’s also nice using debuggers to observe every time, but you can visualize the syntax tree more handily by using the tool nodedump contained in the attached CD-ROM. This tool is originally the NodeDump made by Pragmatic Programmers and remodeled for this book. The original version shows quite explanatory output, but this remodeled version deeply and directly displays the appearance of the syntax tree.

For example, in order to dump the simple expression m(a), you can do as follows:

% ruby -rnodedump -e 'm(a)'
NODE_NEWLINE
nd_file = "-e"
nd_nth  = 1
nd_next:
    NODE_FCALL
    nd_mid = 9617 (m)
    nd_args:
        NODE_ARRAY
        nd_alen = 1
        nd_head:
            NODE_VCALL
            nd_mid = 9625 (a)
        nd_next = (null)

The -r option is used to specify the library to be loaded, and -e is used to pass a program. Then, the syntax tree expression of the program will be dumped.

I’ll briefly explain how to read the content. NODE_NEWLINE and NODE_FCALL and such are the node types. What are written at the same indent level of each node are the contents of its node members. For example, the root is NODE_NEWLINE, and it has the three members: nd_file nd_nth nd_next. nd_file points to the "-e" string of C, nd_nth points to the 1 integer of C, and nd_next holds the next node NODE_FCALL. But since these explanations in text are probably not intuitive, I recommend you to also check Fig.4 at the same time.

Fig.4: Syntax Tree

I’ll explain the meaning of each node. NODE_FCALL is a Function CALL. NODE_ARRAY is, as its name suggests, the node of array, and here it expresses the list of arguments. NODE_VCALL is a Variable or CALL; a reference to an undefined local variable will become this.

Then, what is NODE_NEWLINE? This is the node to join the name of the currently executed file and the line number at runtime and is set for each stmt. Therefore, when only thinking about the meaning of the execution, this node can be ignored. When you require nodedump-short instead of nodedump, distractions like NODE_NEWLINE are left out in the first place. Since it is easier to see if it is simple, nodedump-short will be used later on except for when particularly written.

Now, we’ll look at the three types of composing elements in order to grasp how the whole syntax tree is. The first one is the leaves of a syntax tree. Next, we’ll look at expressions that are combinations of those leaves, which means they are branches of a syntax tree. The last one is the list to list up the statements, which is the trunk of a syntax tree in other words.

Leaf

First, let’s start with the edges that are the leaves of the syntax tree. Literals and variable references and so on: among the rules, they are what belong to primary and are particularly simple even among the primary rules.

% ruby -rnodedump-short -e '1'
NODE_LIT
nd_lit = 1:Fixnum

1 as a numeric value. There’s not any twist. However, notice that what is stored in the node is not 1 of C but 1 of Ruby (1 of Fixnum). This is because…

% ruby -rnodedump-short -e ':sym'
NODE_LIT
nd_lit = 9617:Symbol

This way, Symbol is represented by the same NODE_LIT when it becomes a syntax tree. As in the above example, VALUE is always stored in nd_lit so it can be handled completely in the same way whether it is a Symbol or a Fixnum when executing. In this way, all we need to do when dealing with it are retrieving the value in nd_lit and returning it. Since we create a syntax tree in order to execute it, designing it so that it becomes convenient when executing is the right thing to do.

% ruby -rnodedump-short -e '"a"'
NODE_STR
nd_lit = "a":String

A string. This is also a Ruby string. String literals are copied when actually used.

% ruby -rnodedump -e '[0,1]'
NODE_NEWLINE
nd_file = "-e"
nd_nth  = 1
nd_next:
    NODE_ARRAY
    nd_alen = 2
    nd_head:
        NODE_LIT
        nd_lit = 0:Fixnum
    nd_next:
        NODE_ARRAY
        nd_alen = 1
        nd_head:
            NODE_LIT
            nd_lit = 1:Fixnum
        nd_next = (null)

Array. I can’t say this is a leaf, but let’s allow this to be here because it’s also a literal. It seems like a list of NODE_ARRAY hung with each element node. The reason why only in this case I didn’t use nodedump-short is… you will understand after finishing to read this section.

Branch

Next, we’ll focus on “combinations” that are branches. if will be taken as an example.

if

I feel like if is always used as an example; that’s because its structure is simple and there’s not any reader who doesn’t know about if, so it is convenient for writers.

Anyway, this is an example of if. For example, let’s convert this code to a syntax tree.

▼The Source Program

if true
  'true expr'
else
  'false expr'
end

▼Its syntax tree expression

NODE_IF
nd_cond:
    NODE_TRUE
nd_body:
    NODE_STR
    nd_lit = "true expr":String
nd_else:
    NODE_STR
    nd_lit = "false expr":String

Here, the previously described nodedump-short is used, so NODE_NEWLINE disappeared. nd_cond is the condition, nd_body is the body of the true case, nd_else is the body of the false case.

Then, let’s look at the code to build this.

▼if rule

                | kIF expr_value then
                  compstmt
                  if_tail
                  kEND
                    {
                        $$ = NEW_IF(cond($2), $4, $5);
                        fixpos($$, $2);
                    }

(parse.y)

It seems that NEW_IF() is the macro to create NODE_IF. Among the values of the symbols, $2 $4 $5 are used, thus the correspondences between the symbols of the rule and $n are:

kIF    expr_value  then  compstmt  if_tail  kEND
 $1          $2      $3        $4       $5    $6
NEW_IF(expr_value,       compstmt, if_tail)

this way. In other words, expr_value is the condition expression, compstmt ($4) is the case of true, if_tail is the case of false.

On the other hand, the macros to create nodes are all named NEW_xxxx, and they are defined in node.h. Let’s look at NEW_IF().

▼NEW_IF()

#define NEW_IF(c,t,e) rb_node_newnode(NODE_IF,c,t,e)

(node.h)

As for the parameters, it seems that c represents condition, t represents then, and e represents else respectively. As described in the previous section, the order of members of a node is not so meaningful, so you don’t need to be careful about parameter names in this kind of place.

And, the cond() which processes the node of the condition expression in the action is a semantic analysis function. This will be described later.

Additionally, fixpos() corrects the line number. NODE is initialized with the file name and the line number of the time when it is “created”. However, for instance, the code of if should already be parsed up to end by the time when creating NODE_IF. Thus, the line number would go wrong if it remains untouched. Therefore, it needs to be corrected by fixpos().

fixpos(dest, src)

This way, the line number of the node dest is set to the one of the node src. As for if, the line number of the condition expression becomes the line number of the whole if expression.

elsif

Subsequently, let’s look at the rule of if_tail.

▼if_tail

if_tail         : opt_else
                | kELSIF expr_value then
                  compstmt
                  if_tail
                    {
                        $$ = NEW_IF(cond($2), $4, $5);
                        fixpos($$, $2);
                    }

opt_else        : none
                | kELSE compstmt
                    {
                        $$ = $2;
                    }

(parse.y)

First, this rule expresses “a list ends with opt_else after zero or more number of elsif clauses”. That’s because if_tail appears again and again while elsif continues, and it disappears when opt_else comes in. We can understand this by expanding the rule an arbitrary number of times.

if_tail: kELSIF .... if_tail
if_tail: kELSIF .... kELSIF .... if_tail
if_tail: kELSIF .... kELSIF .... kELSIF .... if_tail
if_tail: kELSIF .... kELSIF .... kELSIF .... opt_else
if_tail: kELSIF .... kELSIF .... kELSIF .... kELSE compstmt

Next, let’s focus on the actions. Surprisingly, elsif uses the same NEW_IF() as if. It means the two programs below will lose the difference after they become syntax trees.

if cond1                  if cond1
  body1                     body1
elsif cond2               else
  body2                     if cond2
elsif cond3                   body2
  body3                     else
else                          if cond3
  body4                         body3
end                           else
                                body4
                              end
                            end
                          end

Come to think of it, in C language and such, there’s no distinction between the two also at the syntax level. Thus this might be a matter of course. Alternatively, the conditional operator (a?b:c) becomes indistinguishable from if statement after they become syntax trees.

The precedences were very meaningful when they were in the context of grammar, but they become unnecessary any more because the structure of a syntax tree contains that information. And, the difference in appearance such as if and the conditional operator becomes completely meaningless; only its meaning (its behavior) matters. Therefore, there’s perfectly no problem if if and the conditional operator are the same in their syntax tree expression.

I’ll introduce a few more examples. and and && become the same. or and || are also equal to each other. not and !, if and modifier if, and so on. These pairs also become equal to each other.

Left Recursive and Right Recursive

By the way, the symbol of a list was always written at the left side when expressing a list in Chapter 9: yacc crash course. However, have you noticed it becomes opposite in if_tail? I’ll show only the crucial part again.

if_tail: opt_else
       | kELSIF ... if_tail

Surely, it is opposite of the previous examples. if_tail which is the symbol of a list is at the right side.

In fact, there’s another established way of expressing lists,

list: END_ITEM
    | ITEM list

when you write in this way, it becomes the list that contains continuous zero or more number of ITEM and ends with END_ITEM.

As an expression of a list, whichever is used it does not create so much difference, but the way that the actions are executed is fatally different. With the form that list is written at the right, the actions are sequentially executed from the last ITEM. We’ve already learned about the behavior of the stack of when list is at the left, so let’s try the case that list is at the right. The input is 4 ITEMs and END_ITEM.

                                  empty at first
ITEM                              shift ITEM
ITEM ITEM                         shift ITEM
ITEM ITEM ITEM                    shift ITEM
ITEM ITEM ITEM ITEM               shift ITEM
ITEM ITEM ITEM ITEM END_ITEM      shift END_ITEM
ITEM ITEM ITEM ITEM list          reduce END_ITEM to list
ITEM ITEM ITEM list               reduce ITEM list to list
ITEM ITEM list                    reduce ITEM list to list
ITEM list                         reduce ITEM list to list
list                              reduce ITEM list to list
                                  accept.

When list was at the left, shifts and reductions were done in turns. This time, as you see, there are continuous shifts and continuous reductions.

The reason why if_tail places “list at the right” is to create a syntax tree from the bottom up. When creating from the bottom up, the node of if will be left in hand in the end. But if defining if_tail by placing “list at the left”, in order to eventually leave the node of if in hand, it needs to traverse all links of the elsif and every time elsif is found add it to the end. This is cumbersome. And, slow. Thus, if_tail is constructed in the “list at the right” manner.

Finally, the meaning of the headline is, in grammar terms, “the left is list” is called left-recursive, “the right is list” is called right-recursive. These terms are used mainly when reading papers about processing grammars or writing a book of yacc.
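The bottom-up construction that right recursion gives can be imitated without yacc. The sketch below is an assumption-laden toy: IfNode, build_if_chain and free_if_chain are hypothetical names, conditions and bodies are plain integers, and the loop simply walks the clauses in the order a right-recursive rule would reduce them (last clause first).

```c
#include <stdlib.h>

/* Toy node: an if with a condition id, a body id, and an else branch that is
 * either another IfNode (an elsif) or NULL. Purely illustrative. */
typedef struct IfNode {
    int cond;
    int body;
    struct IfNode *else_part;
} IfNode;

static void
free_if_chain(IfNode *n)
{
    while (n) {
        IfNode *next = n->else_part;
        free(n);
        n = next;
    }
}

/* Mimics the reduction order of a right-recursive rule: the last clause is
 * built first, and each earlier clause wraps what has been built so far, so
 * no traversal to the end of the chain is ever needed.
 * Returns NULL on allocation failure. */
static IfNode *
build_if_chain(const int *conds, const int *bodies, int n)
{
    IfNode *built = NULL;   /* plays the role of $5, the already reduced if_tail */
    int i;

    for (i = n - 1; i >= 0; i--) {
        IfNode *node = malloc(sizeof *node);
        if (node == NULL) {
            free_if_chain(built);
            return NULL;
        }
        node->cond = conds[i];
        node->body = bodies[i];
        node->else_part = built;
        built = node;
    }
    return built;
}
```

After the loop, the pointer in hand is the outermost if, which is exactly what the action of the if rule needs as $5.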

Trunk

Leaf, branch, and finally, it’s trunk. Let’s look at how the list of statements are joined.

▼The Source Program

7
8
9

The dump of the corresponding syntax tree is shown below. This is not nodedump-short but in the perfect form.

▼Its Syntax Tree

NODE_BLOCK
nd_head:
    NODE_NEWLINE
    nd_file = "multistmt"
    nd_nth  = 1
    nd_next:
        NODE_LIT
        nd_lit = 7:Fixnum
nd_next:
    NODE_BLOCK
    nd_head:
        NODE_NEWLINE
        nd_file = "multistmt"
        nd_nth  = 2
        nd_next:
            NODE_LIT
            nd_lit = 8:Fixnum
    nd_next:
        NODE_BLOCK
        nd_head:
            NODE_NEWLINE
            nd_file = "multistmt"
            nd_nth  = 3
            nd_next:
                NODE_LIT
                nd_lit = 9:Fixnum
        nd_next = (null)

We can see the list of NODE_BLOCK is created and NODE_NEWLINE are attached as headers. (Fig.5)

Fig.5: NODE_BLOCK and NODE_NEWLINE

It means, for each statement (stmt) NODE_NEWLINE is attached, and when they are multiple, it will be a list of NODE_BLOCK. Let’s also see the code.

▼stmts

stmts           : none
                | stmt
                    {
                        $$ = newline_node($1);
                    }
                | stmts terms stmt
                    {
                        $$ = block_append($1, newline_node($3));
                    }

(parse.y)

newline_node() caps NODE_NEWLINE, block_append() appends it to the list. It’s straightforward. Let’s look at the content only of the block_append().

block_append()

In this function, the error checks are in the very middle and obstructive. Thus I’ll show the code without that part.

▼block_append() (omitted)

static NODE*
block_append(head, tail)
    NODE *head, *tail;
{
    NODE *end;

    if (tail == 0) return head;
    if (head == 0) return tail;

    if (nd_type(head) != NODE_BLOCK) {
        end = NEW_BLOCK(head);
        end->nd_end = end;       /*(A-1)*/
        fixpos(end, head);
        head = end;
    }
    else {
        end = head->nd_end;      /*(A-2)*/
    }

    /* ……omitted…… */

    if (nd_type(tail) != NODE_BLOCK) {
        tail = NEW_BLOCK(tail);
        tail->nd_end = tail;
    }
    end->nd_next = tail;
    head->nd_end = tail->nd_end;   /*(A-3)*/
    return head;
}

(parse.y)

According to the previous syntax tree dump, NODE_BLOCK was a linked list that uses nd_next. Being aware of it while reading, it can be read as “if either head or tail is not NODE_BLOCK, wrap it with NODE_BLOCK and join the lists to each other.”

Additionally, on (A-1~3), the nd_end of the NODE_BLOCK of the head of the list always points to the NODE_BLOCK of the tail of the list. This is probably because in this way we don’t have to traverse all elements when adding an element to the tail (Fig.6). Conversely speaking, when you need to add elements later, NODE_BLOCK is suitable.

Fig.6: Appending is easy.
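The effect of keeping nd_end on the head cell can be tried on a small stand-alone list. This is a sketch, not the parse.y code: Block, block_new and block_join are names made up for illustration, statements are plain integers, and the wrapping of non-block nodes and the omitted error checks are left out.

```c
#include <stdlib.h>

typedef struct Block {
    int stmt;             /* stands in for nd_head */
    struct Block *end;    /* stands in for nd_end; meaningful on the list head */
    struct Block *next;   /* stands in for nd_next */
} Block;

/* Creates a one-element list. Returns NULL on allocation failure. */
static Block *
block_new(int stmt)
{
    Block *b = malloc(sizeof *b);
    if (b == NULL) return NULL;
    b->stmt = stmt;
    b->end = b;           /* a one-element list ends at itself, cf. (A-1) */
    b->next = NULL;
    return b;
}

/* Appends list `tail` to list `head` in constant time. */
static Block *
block_join(Block *head, Block *tail)
{
    if (tail == NULL) return head;
    if (head == NULL) return tail;
    head->end->next = tail;   /* link after the current last cell, cf. (A-2) */
    head->end = tail->end;    /* the head remembers the new last cell, cf. (A-3) */
    return head;
}
```

Joining 7, 8 and 9 one after another never walks the list: each call touches only the head cell and the current last cell, which is the point of Fig.6.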

The two types of lists

Now, I’ve explained the outline so far. Because the structure of syntax tree will also appear in Part 3 in large amounts, we won’t go further as long as we are in Part 2. But before ending, there’s one more thing I’d like to talk about. It is about the two general-purpose lists.

The two general-purpose lists mean BLOCK and LIST. BLOCK is, as previously described, a linked list of NODE_BLOCK to join the statements. LIST is, although it is called LIST, a list of NODE_ARRAY. This is what is used for array literals. LIST is used to store the arguments of a method or the list of multiple assignments.

As for the difference between the two lists, looking at the usage of the nodes is helpful to understand.

NODE_BLOCK   nd_head   holding an element
             nd_end    pointing to the NODE_BLOCK of the end of the list
             nd_next   pointing to the next NODE_BLOCK

NODE_ARRAY   nd_head   holding an element
             nd_alen   the length of the list that follows this node
             nd_next   pointing to the next NODE_ARRAY

The usage differs only in the second elements that are nd_end and nd_alen. And this is exactly the significance of the existence of each type of the two nodes. Since its size can be stored in NODE_ARRAY, we use an ARRAY list when the size of the list will frequently be required. Otherwise, we use a BLOCK list that is very fast to join. I don’t describe this topic in detail because the code that uses them is necessary to understand the significance but is not shown here, but when the code appears in Part 3, I’d like you to recall this and think “Oh, this uses the length”.

Semantic Analysis

As I briefly mentioned at the beginning of Part 2, there are two types of analysis that are appearance analysis and semantic analysis. The appearance analysis is mostly done by yacc, the rest is doing the semantic analysis inside actions.

Errors inside actions

What does the semantic analysis precisely mean? For example, there are type checks in a language that has types. Alternatively, check if variables with the same name are not defined multiple times, and check if variables are not used before their definitions, and check if the procedure being used is defined, and check if return is not used outside of procedures, and so on. These are part of the semantic analysis.

What kind of semantic analysis is done in the current ruby? Since the error checks occupy almost all of semantic analysis in ruby, searching the places where errors are generated seems a good way. In a parser of yacc, yyerror() is supposed to be called when an error occurs. Conversely speaking, there’s an error where yyerror() exists. So, I made a list of the places where yyerror() is called inside the actions.

an expression not having its value (void value expression) at a place where a value is required
an alias of $n
BEGIN inside of a method
END inside of a method
return outside of methods
a local variable at a place where constant is required
a class statement inside of a method
an invalid parameter variable ($gvar and CONST and such)
parameters with the same name appear twice
an invalid receiver of a singleton method (def ().method and such)
a singleton method definition on literals
an odd number of a list for hash literals
an assignment to self/nil/true/false/__FILE__/__LINE__
a constant assignment inside of a method
a multiple assignment inside of a conditional expression

These checks can roughly be categorized by each purpose as follows:

for the better error message
in order not to make the rule too complex
the others (pure semantic analysis)

For example, “return outside of a method” is a check in order not to make the rule too complex. Since this error is a problem of the structure, it can be dealt with by grammar. For example, it’s possible by defining the rules separately for both inside and outside of methods and making the list of all what are allowed and what are not allowed respectively. But this is in any way cumbersome and rejecting it in an action is far more concise.

And, “an assignment to self” seems a check for the better error message. In comparison to “return outside of methods”, rejecting it by grammar is much easier, but if it is rejected by the parser, the output would be just "parse error". Comparing to it, the current

% ruby -e 'self = 1'
-e:1: Can't change the value of self
self = 1
      ^

this error is much more friendly.

Of course, we cannot always say that an arbitrary rule is exactly “for this purpose”. For example, as for “return outside of methods”, this can also be considered that this is a check “for the better error message”. The purposes are overlapping each other.

Now, the problem is “a pure semantic analysis”; in Ruby there are few things that belong to this category. In the case of a typed language, the type analysis is a big event, but because variables are not typed in Ruby, it is meaningless. What is standing out instead is the check of an expression that has its value.

To put “having its value” precisely, it is “you can obtain a value as a result of evaluating it”. return and break do not have values by themselves. Of course, a value is passed to the place where return to, but not any values are left at the place where return is written. Therefore, for example, the next expression is odd,

i = return(1)

Since this kind of expressions are clearly due to misunderstanding or simple mistakes, it’s better to reject when compiling. Next, we’ll look at value_expr which is one of the functions to check if it takes a value.

value_expr()

value_expr() is the function to check if it is an expr that has a value.

▼value_expr()

static int
value_expr(node)
    NODE *node;
{
    while (node) {
        switch (nd_type(node)) {
          case NODE_CLASS:
          case NODE_MODULE:
          case NODE_DEFN:
          case NODE_DEFS:
            rb_warning("void value expression");
            return Qfalse;

          case NODE_RETURN:
          case NODE_BREAK:
          case NODE_NEXT:
          case NODE_REDO:
          case NODE_RETRY:
            yyerror("void value expression");
            /* or "control never reach"? */
            return Qfalse;

          case NODE_BLOCK:
            while (node->nd_next) {
                node = node->nd_next;
            }
            node = node->nd_head;
            break;

          case NODE_BEGIN:
            node = node->nd_body;
            break;

          case NODE_IF:
            if (!value_expr(node->nd_body)) return Qfalse;
            node = node->nd_else;
            break;

          case NODE_AND:
          case NODE_OR:
            node = node->nd_2nd;
            break;

          case NODE_NEWLINE:
            node = node->nd_next;
            break;

          default:
            return Qtrue;
        }
    }

    return Qtrue;
}

(parse.y)

Algorithm

Summary: It sequentially checks the nodes of the tree, and if it hits “an expression certainly not having its value”, it means the tree does not have any value. Then it reports that by using rb_warning() or yyerror() and returns Qfalse. If it finishes traversing the entire tree without hitting any “expression not having its value”, it means the tree does have a value. Thus it returns Qtrue.

Here, notice that it does not always need to check the whole tree. For example, let’s assume value_expr() is called on the argument of a method. Here:

▼check the value of arg by using value_expr()

arg_value       : arg
                    {
                        value_expr($1);
                        $$ = $1;
                    }

(parse.y)

Inside of this argument $1, there can also be other nesting method calls again. But, the argument of the inside method must have been already checked with value_expr(), so you don’t have to check it again.

Let’s think more generally. Assume an arbitrary grammar element A exists, and assume value_expr() is called against all of its composing elements; then the necessity to check the element A again would disappear.

Then, for example, how is if? Is it possible to be handled as if value_expr() has already been called for all elements? If I put only the bottom line, it isn’t. That is because, since if is a statement (which does not use a value), the main body should not have to return a value. For example, in the next case:

defmethodiftruereturn1elsereturn2end5end

Thisifstatementdoesnotneedavalue.Butinthenextcase,itsvalueisnecessary.

defmethod(arg)tmp=ifargthen3else98endtmp*tmp/3.5end

So,inthiscase,theifstatementmustbecheckedwhencheckingtheentireassignmentexpression.Thiskindofthingsarelaidoutintheswitchstatementofvalue_expr().

Removing Tail Recursion

By the way, looking over the whole of value_expr(), we can see that the following pattern appears frequently:

while (node) {
    switch (nd_type(node)) {
      case NODE_XXXX:
        node = node->nd_xxxx;
        break;
         :
         :
    }
}

This code keeps the same meaning if it is rewritten as follows:

return value_expr(node->nd_xxxx)

Code like this, which makes a recursive call just before return, is called tail recursion. It is known that it can in general be converted into a goto. This technique is often used in optimization. In Scheme, the language specification requires language processors to eliminate tail recursion, because recursion is often used instead of loops in Lisp-like languages.
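The conversion can be seen in miniature outside of ruby. The following is only a sketch: struct item and both function names are made up for this illustration and do not come from parse.y. It shows a tail-recursive function next to the same function with the call replaced by a jump.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical singly linked list node, used only for this illustration. */
struct item {
    int value;
    struct item *next;
};

/* Tail-recursive form: the recursive call is the very last action. */
static int last_value_rec(const struct item *p)
{
    if (p->next == NULL) return p->value;
    return last_value_rec(p->next);     /* call just before return */
}

/* The same function with the tail call replaced by a jump. */
static int last_value_goto(const struct item *p)
{
  again:
    if (p->next == NULL) return p->value;
    p = p->next;                        /* rebind the argument ... */
    goto again;                         /* ... and jump instead of calling */
}
```

Both functions expect a non-empty list. The goto version uses no additional stack no matter how long the list is, which is the reason this rewriting counts as an optimization.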

However, note that it is tail recursion only when the call comes “just before return”. For example, take a look at the NODE_IF case of value_expr():

if (!value_expr(node->nd_body)) return Qfalse;
node = node->nd_else;
break;

As shown above, the first call is a recursive call. Rewriting this in a form that uses return gives:

return value_expr(node->nd_body) && value_expr(node->nd_else);

If the left value_expr() is true, the right value_expr() is also executed. In that case the left value_expr() is not “just before” return. Therefore it is not tail recursion, and it cannot be turned into a goto.

The whole picture of the value check

As for value checks, we won't read the functions any further. You might think it is too early to stop, but all of the other functions, like value_expr(), do nothing but traverse and check nodes one by one, so they are not interesting at all. However, I'd like to cover at least the whole picture, so I'll finish this section by just showing the call graph of the relevant functions (Fig.7).

Fig.7: the call graph of the value check functions

Local Variables

Local Variable Definitions

Variable definitions in Ruby come in many varieties. Constants and class variables are defined on their first assignment. Instance variables and global variables can be regarded as though every name were already defined, so you can refer to them without assigning beforehand (although this produces a warning).

The definition of local variables is again completely different from all of the above. A local variable is defined when an assignment to it appears in the program. For example:

lvar = nil
p lvar      # being defined

In this case, because the assignment to lvar is written on the first line, lvar is defined at that point. When it is undefined, the result is a runtime NameError exception, as follows:

% ruby lvar.rb
lvar.rb:1: undefined local variable or method `lvar' for #<Object:0x40163a9c> (NameError)

Why does it say "local variable or method"? Because the parentheses around the arguments of a method call can be omitted, a method call without arguments cannot be distinguished from a local variable. To resolve this, ruby tries to call it as a method when it finds an undefined local variable. If no corresponding method is found, it raises an error like the one above.

By the way, a variable is defined when the assignment “appears”, which means it is defined even if the assignment was never executed. The initial value of a defined variable is nil.

if false
  lvar = "this assignment will never be executed"
end
p lvar   # shows nil

Moreover, since it is defined “when” it “appears”, the definition has to come before the reference in the symbol sequence. For example, in the next case, it is not defined.

p lvar       # not defined!
lvar = nil   # although appearing here ...

Be careful about the phrase “in the symbol sequence”. It has absolutely nothing to do with the order of evaluation. For example, in the following code the condition expression is naturally evaluated first, but in the symbol sequence the assignment to lvar has not yet appeared at the moment when p appears. Therefore, this produces a NameError.

p(lvar) if lvar = true

What we have learned so far is that local variables are strongly influenced by the order of appearance. When a symbol sequence that expresses an assignment appears, the variable is defined, in order of appearance. From this we can infer that ruby defines local variables while parsing, because the order of the symbol sequence no longer exists once we leave the parser. And in fact this is true: in ruby, the parser defines local variables.

Block Local Variables

Local variables newly defined in an iterator block are called block local variables or dynamic variables. In the language specification, block local variables are identical to local variables. However, the two differ in their implementation. We'll now look at how they differ.

The data structure

We'll start with the local variable table, struct local_vars.

▼ struct local_vars

static struct local_vars {
    ID *tbl;                    /* the table of local variable names */
    int nofree;                 /* whether it is used from outside */
    int cnt;                    /* the size of the tbl array */
    int dlev;                   /* the nesting level of dyna_vars */
    struct RVarmap* dyna_vars;  /* block local variable names */
    struct local_vars *prev;
} *lvtbl;

(parse.y)

The member name prev indicates that struct local_vars is a linked list linked in the reverse direction. Based on this, we can expect a stack. The global variable lvtbl, declared at the same time, points to the local_vars that is the top of that stack.
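The “reverse-direction list used as a stack” idea can be sketched as follows. This is not the code of parse.y: struct scope, top and the three functions are simplified stand-ins invented for this illustration, keeping only the prev link and a counter.

```c
#include <assert.h>
#include <stdlib.h>

/* Simplified stand-in for struct local_vars: only the link and a counter. */
struct scope {
    int cnt;                 /* number of variables in this scope */
    struct scope *prev;      /* the scope that was current before this one */
};

static struct scope *top = NULL;    /* plays the role of lvtbl */

static void scope_push(void)
{
    struct scope *s = malloc(sizeof(*s));
    if (s == NULL) abort();
    s->cnt = 0;
    s->prev = top;           /* remember the enclosing scope */
    top = s;                 /* the new scope becomes the top */
}

static void scope_pop(void)
{
    struct scope *s = top;
    top = s->prev;           /* restore the enclosing scope */
    free(s);
}

/* Count the scopes by following prev from the top. */
static int scope_depth(void)
{
    int n = 0;
    for (struct scope *s = top; s != NULL; s = s->prev) n++;
    return n;
}
```

Because each element points only at the element below it, pushing and popping touch nothing but the top pointer, which is all a stack of scopes needs.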

struct RVarmap, on the other hand, is defined in env.h, so it is available to other files and is also used by the evaluator. It is used to store the block local variables.

▼ struct RVarmap

struct RVarmap {
    struct RBasic super;
    ID id;                  /* the variable name */
    VALUE val;              /* its value */
    struct RVarmap *next;
};

(env.h)

Since there is a struct RBasic at the top, this is a Ruby object, which means it is managed by the garbage collector. And since entries are joined by the next member, it is probably a linked list.

Based on these observations and on information that will be explained later, Fig.8 illustrates both structs while the parser is running.

Fig.8: The image of local variable tables at runtime

Local Variable Scope

Looking over the list of function names in parse.y, we find functions such as local_push(), local_pop() and local_cnt(). However you look at it, they appear to be related to local variables. Moreover, since the names are push and pop, this is clearly a stack. So first, let's find the places where these functions are used.

▼ local_push() local_pop() usage examples

        | kDEF fname
            {
                $<id>$ = cur_mid;
                cur_mid = $2;
                in_def++;
                local_push(0);
            }
          f_arglist
          bodystmt
          kEND
            {
                /* NOEX_PRIVATE for toplevel */
                $$ = NEW_DEFN($2, $4, $5,
                              class_nest?NOEX_PUBLIC:NOEX_PRIVATE);
                if (is_attrset_id($2))
                    $$->nd_noex = NOEX_PUBLIC;
                fixpos($$, $4);
                local_pop();
                in_def--;
                cur_mid = $<id>3;
            }

(parse.y)

I found a place where they are used at def. They can also be found in class definitions, singleton class definitions and module definitions. In other words, they are used at the places where the scope of local variables is cut. As for how they are used, push is done where the method definition starts and pop when the definition ends. This means, as we expected, that the functions starting with local_ are almost certainly related to local variables. It also reveals that the part between push and pop is probably one local variable scope.

Moreover, I also searched for local_cnt().

▼ NEW_LASGN()

#define NEW_LASGN(v,val) rb_node_newnode(NODE_LASGN,v,val,local_cnt(v))

(node.h)

This is found in node.h. Even though there are also places in parse.y that use local_cnt(), I found it in another file, which probably shows how desperate my search was.

This NEW_LASGN is “new local assignment”. It should mean the node of an assignment to a local variable. Considering also the place where it is used, the parameter v is apparently the local variable name. val is probably (a syntax tree that represents) the right-hand side value.

Based on the above observations, local_push() is called at the beginning of a local variable scope, local_cnt() is used to add a local variable when a local variable assignment appears along the way, and local_pop() is used when the scope ends. This neat scenario emerges (Fig.9).

Fig.9: the flow of the local variable management

Now let's look at the contents of these functions.

push and pop

▼ local_push()

static void
local_push(top)
    int top;
{
    struct local_vars *local;

    local = ALLOC(struct local_vars);
    local->prev = lvtbl;
    local->nofree = 0;
    local->cnt = 0;
    local->tbl = 0;
    local->dlev = 0;
    local->dyna_vars = ruby_dyna_vars;
    lvtbl = local;
    if (!top) {
        /* preserve the variable table of the previous scope into val */
        rb_dvar_push(0, (VALUE)ruby_dyna_vars);
        ruby_dyna_vars->next = 0;
    }
}

(parse.y)

As we expected, struct local_vars appears to be used as a stack, and we can see that lvtbl points to the top of the stack. The lines relating to rb_dvar_push() will be read later, so we leave them aside for now.

Next, we'll look at local_pop() and local_tbl() together.

▼ local_tbl local_pop

static ID*
local_tbl()
{
    lvtbl->nofree = 1;
    return lvtbl->tbl;
}

static void
local_pop()
{
    struct local_vars *local = lvtbl->prev;

    if (lvtbl->tbl) {
        if (!lvtbl->nofree) free(lvtbl->tbl);
        else lvtbl->tbl[0] = lvtbl->cnt;
    }
    ruby_dyna_vars = lvtbl->dyna_vars;
    free(lvtbl);
    lvtbl = local;
}

(parse.y)

I'd like you to look at local_tbl(). This is the function that obtains the current local variable table (lvtbl->tbl). Calling it sets nofree of the current table to true. The meaning of nofree is naturally “don't free()”. In other words, this works a little like reference counting: “this table will be used, so please don't free() it”. Conversely, if local_tbl() was never called on a table, that table is freed and discarded at the moment it is popped. This probably happens, for example, for a method without any local variables.

However, the “necessary table” here means lvtbl->tbl. As you can see, lvtbl itself is freed at the moment it is popped. This means that only the generated lvtbl->tbl is used in the evaluator. The structure of lvtbl->tbl therefore becomes important. Let's look at local_cnt(), the function that (apparently) adds variables, which should help us understand that structure.

And before that, I'd like you to remember that lvtbl->cnt is stored at index 0 of lvtbl->tbl.

Adding variables

The function that (apparently) adds a local variable is local_cnt().

▼ local_cnt()

static int
local_cnt(id)
    ID id;
{
    int cnt, max;

    if (id == 0) return lvtbl->cnt;

    for (cnt=1, max=lvtbl->cnt+1; cnt<max; cnt++) {
        if (lvtbl->tbl[cnt] == id) return cnt-1;
    }
    return local_append(id);
}

(parse.y)

This scans lvtbl->tbl and searches for an entry equal to id. If one is found, it simply returns cnt-1. If nothing is found, it calls local_append(). As the name append suggests, local_append() must be the procedure that appends. In other words, local_cnt() checks whether the variable is already registered and, if it is not, adds it with local_append() and returns the result.

What does the return value of this function mean? lvtbl->tbl seems to be an array of variable names, so there is a one-to-one correspondence between variable names and “their index - 1 (cnt-1)” (Fig.10).

Fig.10: The correspondences between the variable names and the return values

Moreover, since this return value is calculated so that it starts from 0, the space for local variables is probably an array, and this function returns the index used to access that array. If that were not the case, (the ID of) the variable name could have been used directly as a key, as with instance variables or constants.

You might wonder why index 0 is avoided (the loop starts from cnt=1). It is probably in order to store a value there at local_pop().

Based on what we've learned, we can understand the role of local_append() without actually looking at its content. It registers a local variable and returns “(the index of the variable in lvtbl->tbl) - 1”. It is shown below; let's make sure.

▼ local_append()

static int
local_append(id)
    ID id;
{
    if (lvtbl->tbl == 0) {
        lvtbl->tbl = ALLOC_N(ID, 4);
        lvtbl->tbl[0] = 0;
        lvtbl->tbl[1] = '_';
        lvtbl->tbl[2] = '~';
        lvtbl->cnt = 2;
        if (id == '_') return 0;
        if (id == '~') return 1;
    }
    else {
        REALLOC_N(lvtbl->tbl, ID, lvtbl->cnt+2);
    }

    lvtbl->tbl[lvtbl->cnt+1] = id;
    return lvtbl->cnt++;
}

(parse.y)

It seems definitely true. lvtbl->tbl is an array of local variable names, and the index - 1 is the return value (the local variable ID).

Note that it increases lvtbl->cnt. Since this is the only place where lvtbl->cnt is increased, its meaning can be determined from this code alone. What is that meaning? Since “lvtbl->cnt increases by 1 when a new variable is added”, “lvtbl->cnt holds the number of local variables in this scope”.

Finally, I'll explain tbl[1] and tbl[2]. These '_' and '~' are, as you can guess if you are familiar with Ruby, the special variables named $_ and $~. Though they look identical to global variables, they are actually local variables. Even if you do not use them explicitly, they are implicitly assigned when methods such as Kernel#gets are called, so their space must always be allocated.

Summary of local variables

Since the description of local variables was complicated in various ways, let's summarize it.

First, local variables seem to differ from other variables in that they are not managed with st_table. Then where are they stored? The answer seems to be an array. Moreover, a different array is used for each scope.

The array is lvtbl->tbl, and index 0 holds lvtbl->cnt, which is set at local_pop(). In other words, it holds the number of local variables. Index 1 and above hold the names of the local variables defined in the scope. Fig.11 shows the final appearance we expect.

Fig.11: correspondences between local variable names and the return values
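The layout summarized above can be sketched in a few lines. This is a simplified illustration, not ruby's code: name_id, struct var_table, var_slot() and var_table_seal() are hypothetical names, and the pre-registration of '_' and '~' done by local_append() is deliberately left out.

```c
#include <assert.h>
#include <stdlib.h>

typedef unsigned long name_id;     /* stand-in for ruby's ID type */

struct var_table {
    name_id *tbl;   /* tbl[0] is reserved; names live in tbl[1..cnt] */
    int cnt;        /* number of registered names */
};

/* Return the 0-based slot of id, registering it first if necessary. */
static int var_slot(struct var_table *t, name_id id)
{
    int i;
    name_id *grown;

    for (i = 1; i <= t->cnt; i++) {
        if (t->tbl[i] == id) return i - 1;      /* already registered */
    }
    /* one reserved slot + the existing names + the new name */
    grown = realloc(t->tbl, sizeof(name_id) * (size_t)(t->cnt + 2));
    if (grown == NULL) abort();
    t->tbl = grown;
    t->tbl[t->cnt + 1] = id;
    return t->cnt++;                            /* old cnt == index - 1 */
}

/* At the end of the scope, store the count in the reserved slot. */
static void var_table_seal(struct var_table *t)
{
    if (t->tbl != NULL) t->tbl[0] = (name_id)t->cnt;
}
```

Starting from an empty table (tbl == NULL, cnt == 0), the first name gets slot 0 and sits at tbl[1], the second gets slot 1 and sits at tbl[2], and asking for a known name returns its existing slot.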

Block Local Variables

What remains is dyna_vars, a member of struct local_vars. In other words, this is about block local variables. I thought there must be functions that do something with it, looked over the list of function names, and found them as expected. There are suspicious functions named dyna_push(), dyna_pop() and dyna_in_block(). Here is a place where they are used.

▼ an example using dyna_push dyna_pop

brace_block     : '{'
                    {
                        $<vars>$ = dyna_push();
                    }
                  opt_block_var
                  compstmt '}'
                    {
                        $$ = NEW_ITER($3, 0, $4);
                        fixpos($$, $4);
                        dyna_pop($<vars>2);
                    }

(parse.y)

push at the beginning of an iterator block, pop at the end. This must be the handling of block local variables.

Now we are going to look at the functions.

▼ dyna_push()

static struct RVarmap*
dyna_push()
{
    struct RVarmap* vars = ruby_dyna_vars;

    rb_dvar_push(0, 0);
    lvtbl->dlev++;
    return vars;
}

(parse.y)

Incrementing lvtbl->dlev seems to mark the existence of a block local variable scope. Meanwhile, rb_dvar_push() is ...

▼ rb_dvar_push()

void
rb_dvar_push(id, value)
    ID id;
    VALUE value;
{
    ruby_dyna_vars = new_dvar(id, value, ruby_dyna_vars);
}

(eval.c)

It creates a struct RVarmap that has the variable name id and the value val as its members, and adds it to the top of the list pointed to by the global variable ruby_dyna_vars. This is once again the cons form. In dyna_push(), ruby_dyna_vars is not set aside; it seems to add directly onto the ruby_dyna_vars of the previous scope.

Moreover, the value of the id member of the RVarmap added here is 0. Although it was not discussed in depth in this book, an ID in ruby is never 0 as long as it is created normally by rb_intern(). Thus we can infer that this RVarmap, like NUL or NULL, probably serves as a sentinel. On this assumption, we can explain why a variable holder (RVarmap) is added even though no variable is being added.

Next, dyna_pop().

▼ dyna_pop()

static void
dyna_pop(vars)
    struct RVarmap* vars;
{
    lvtbl->dlev--;
    ruby_dyna_vars = vars;
}

(parse.y)

By decrementing lvtbl->dlev, it records the fact that the block local variable scope has ended. Something also seems to be done with the argument; we'll look at this all together later.

The place where a block local variable is added has not appeared yet. Something like local_cnt() for local variables is missing. So I grepped extensively for dvar and dyna, and found this code.

▼ assignable() (partial)

static NODE*
assignable(id, val)
    ID id;
    NODE *val;
{
            :
            rb_dvar_push(id, Qnil);
            return NEW_DASGN_CURR(id, val);

(parse.y)

assignable() is the function that creates nodes related to assignment; this excerpt is the fragment of that function that deals with block local variables. It seems to add a new variable (to ruby_dyna_vars) using rb_dvar_push(), which we've just seen.

ruby_dyna_vars in the parser

Now, taking all of the above into consideration, let's imagine what ruby_dyna_vars looks like at the moment when a local variable scope has finished being parsed.

First, as I said previously, the RVarmap with id=0 that is added at the beginning of a block scope is a sentinel that represents a break between two block scopes. We'll call this “the header of ruby_dyna_vars”.

Next, among the actions of the iterator block rule shown previously, I'd like you to focus on this part:

$<vars>$ = dyna_push();    /* what is assigned into $<vars>$ is ... */
        :
        :
dyna_pop($<vars>2);        /* ... what appears at $<vars>2 */

dyna_push() returns the ruby_dyna_vars of that moment. dyna_pop() puts its argument into ruby_dyna_vars. This means ruby_dyna_vars is saved and restored for each block local variable scope. Therefore, when parsing the following program,

iter {
    a = nil
    iter {
        b = nil
        iter {
            c = nil
            # nesting level 3
        }
        bb = nil
        # nesting level 2
        iter {
            e = nil
        }
    }
    # nesting level 1
}

Fig.12 shows ruby_dyna_vars in this situation.

Fig.12: ruby_dyna_vars when all scopes are finished to be parsed

This structure is fairly smart, because variables of the outer levels can naturally be reached by traversing the whole list, even when the nesting is deep. This makes searching simpler than it would be if a different table were created for each level.
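The following sketch models this list with sentinel headers. It is an illustration under assumptions, not ruby's code: struct varmap, dyna_vars, block_enter(), block_leave() and the two lookup functions are hypothetical names, values are omitted, and discarded entries are simply leaked where ruby would leave them to the garbage collector.

```c
#include <assert.h>
#include <stdlib.h>

typedef unsigned long name_id;     /* stand-in for ID; 0 marks a sentinel */

struct varmap {
    name_id id;
    struct varmap *next;
};

static struct varmap *dyna_vars = NULL;   /* plays the role of ruby_dyna_vars */

/* Cons a new entry onto the front of the list. */
static void dvar_push(name_id id)
{
    struct varmap *v = malloc(sizeof(*v));
    if (v == NULL) abort();
    v->id = id;
    v->next = dyna_vars;
    dyna_vars = v;
}

/* Enter a block: remember the current list, then add a sentinel header. */
static struct varmap *block_enter(void)
{
    struct varmap *saved = dyna_vars;
    dvar_push(0);
    return saved;
}

/* Leave a block: restore the list as it was on entry. */
static void block_leave(struct varmap *saved)
{
    dyna_vars = saved;
}

/* Is id (non-zero) visible from the current block or an enclosing one? */
static int dvar_defined(name_id id)
{
    for (struct varmap *v = dyna_vars; v != NULL; v = v->next) {
        if (v->id == id) return 1;
    }
    return 0;
}

/* Is id defined in the innermost block?  Stop at the first sentinel. */
static int dvar_defined_here(name_id id)
{
    for (struct varmap *v = dyna_vars; v != NULL && v->id != 0; v = v->next) {
        if (v->id == id) return 1;
    }
    return 0;
}
```

A single walk down the list answers “is this name visible?”, and stopping at the first entry with id 0 answers “is it defined in this block?”, which is what the sentinel is for in this sketch.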

Also, in the figure bb looks as if it hangs at a strange place, but this is correct. When a variable is found at a nesting level that has come back down after having gone up, it is attached after the end of the list of the original level. Moreover, in this way the specification of local variables that “only the variables which have already appeared in the symbol sequence are defined” is expressed in a natural form.

And finally, at each boundary of local variable scopes (not of block local variable scopes), this whole link is saved to or restored from lvtbl->dyna_vars. I'd like you to go back a little and check local_push() and local_pop().

By the way, although creating the ruby_dyna_vars list was a huge task, the list itself is not used by the evaluator. It is used only to check the existence of variables and is garbage collected as soon as parsing is finished. After entering the evaluator, another chain is created all over again. There is quite a deep reason for this, which we'll see again in Part 3.

The original work is Copyright © 2002 - 2004 Minero AOKI. Translated by Vincent ISAMBART and Clifford Escobar CAOILE. This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 2.5 License


Chapter 13: Structure of the evaluator

Outline

Interface

We are not familiar with the word “Hyo-ka-ki” (evaluator). Literally, it must be a “-ki” (device) for “hyo-ka” (evaluating). Then what is “hyo-ka”?

“Hyo-ka” is the standard translation of “evaluate”. However, when the subject is programming languages, it can be considered a mistranslation. It is hard to avoid the word “hyo-ka” giving the impression of judging “whether something is good or bad”.

“Evaluate” in the context of programming languages has nothing to do with “good or bad”; its meaning is closer to “speculating” or “executing”. The origin of “evaluate” is the Latin “ex + value + ate”. Translated directly, it is “turn it into a value”. This may be the simplest way to understand it: to determine the value of an expression written as text.

Very frankly speaking, the bottom line is that evaluating is executing a written expression and getting its result. Then why is it not just called “execute”? Because evaluating is not only executing.

For example, in an ordinary programming language, when we write “3”, it is treated as the integer 3. This situation is sometimes described as “the result of evaluating "3" is 3”. It is hard to say that a constant expression is executed, but it is certainly an evaluation. It would be perfectly all right for a programming language to exist in which the letter “3”, when evaluated, is treated (evaluated) as the integer 6.

I'll introduce another example. When an expression consists of several constants, the constants are sometimes calculated during compilation (constant folding). We usually do not call this “executing”, because executing refers to the process in which the generated binary is running. However, no matter when it is calculated, you get the same result from the same program.

In other words, “evaluating” usually equals “executing”, but in essence “evaluating” is different from “executing”. For now, this is the only point I'd like you to remember.
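Constant folding, mentioned above, can be sketched on a toy expression tree. Everything here is hypothetical and unrelated to ruby's own node types: enum kind, struct expr, mk() and fold() exist only for this illustration, and VAR stands for anything whose value is unknown before execution.

```c
#include <assert.h>
#include <stdlib.h>

enum kind { LIT, ADD, VAR };

struct expr {
    enum kind kind;
    int value;                  /* used by LIT */
    struct expr *lhs, *rhs;     /* used by ADD */
};

static struct expr *mk(enum kind k, int value, struct expr *l, struct expr *r)
{
    struct expr *e = malloc(sizeof(*e));
    if (e == NULL) abort();
    e->kind = k; e->value = value; e->lhs = l; e->rhs = r;
    return e;
}

/* Fold ADD nodes whose operands are both literals into a single LIT node.
   This runs before execution, yet it is still an evaluation of "1 + 2". */
static struct expr *fold(struct expr *e)
{
    if (e->kind != ADD) return e;
    e->lhs = fold(e->lhs);
    e->rhs = fold(e->rhs);
    if (e->lhs->kind == LIT && e->rhs->kind == LIT) {
        int sum = e->lhs->value + e->rhs->value;
        free(e->lhs);
        free(e->rhs);
        e->kind = LIT; e->value = sum; e->lhs = e->rhs = NULL;
    }
    return e;
}
```

An ADD of two literals collapses into one literal before anything is “executed”, while an ADD that involves a VAR is left in place for run time. The value obtained is the same either way, which is the point of the distinction drawn above.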

The characteristics of ruby's evaluator

The biggest characteristic of ruby's evaluator, and of ruby's interpreter as a whole, is that the difference in expressiveness between C-level code (extension libraries) and Ruby-level code is small. In ordinary programming languages, the set of interpreter features that can be used from extension libraries is usually very limited, but in ruby there are remarkably few limits. Defining classes, defining methods and calling methods without restriction can be taken for granted. We can also use exception handling and iterators. And even threads.

But we have to pay for this convenience somewhere. Some code is weirdly hard to implement, some code has a lot of overhead, and there are many places where almost the same thing is implemented twice, once for C and once for Ruby.

Additionally, ruby is a dynamic language, which means that you can construct and evaluate a string at runtime. This is done with eval, a function-like method. As you would expect, it is named after “evaluate”. By using it, you can even do something like this:

lvar = 1
answer = eval("lvar + lvar")    # the answer is 2

There are also Module#module_eval and Object#instance_eval, and each method behaves slightly differently. I'll describe them in detail in Chapter 17: Dynamic evaluation.

eval.c

The evaluator is implemented in eval.c. However, eval.c is a really huge file: it has 9000 lines, its size is 200K bytes, and it contains 309 functions. It is hard to fight against. At this size, it is impossible to figure out its structure just by looking it over.

So what can we do? First, the bigger the file, the less likely it is that its content is not divided up at all. In other words, its inside must be modularized into small portions. Then how can we find the modules? I'll list some ways.

The first way is to print the list of defined functions and look at their prefixes. rb_dvar_, rb_mod_, rb_thread: there are plenty of functions with prefixes like these. Each prefix clearly indicates a group of functions of the same kind.

Alternatively, as we can tell from the code of the class libraries, Init_xxxx() is always placed at the end of a block in ruby. Therefore, Init_xxxx() also indicates a break between modules.

Additionally, the names are obviously important, too. Since eval(), rb_eval() and eval_node() appear close to each other, we naturally think there should be a deep relationship among them.

Finally, in the source code of ruby, definitions of types or variables and declarations of prototypes often indicate a break between modules.

Looking with these points in mind, it seems that eval.c can be divided mainly into the modules listed below:

Safe Level                    already explained in Chapter 7: Security
Method Entry Manipulations    finding or deleting syntax trees which are actual method bodies
Evaluator Core                the heart of the evaluator, with rb_eval() at its center
Exception                     generation of exceptions and creation of backtraces
Method                        the implementation of method calls
Iterator                      the implementation of functions that are related to blocks
Load                          loading and evaluating external files
Proc                          the implementation of Proc
Thread                        the implementation of Ruby threads

Among them, “Load” and “Thread” are parts that essentially should not be in eval.c. They are in eval.c merely because of the restrictions of the C language. To be more precise, they need macros such as PUSH_TAG that are defined in eval.c. So I decided to exclude these two topics from Part 3 and deal with them in Part 4. And it is probably all right not to explain the safe level here, because I have already done so in Part 1.

Excluding the above three, six items are left to be described. The table below shows the chapter corresponding to each of them:

Method Entry Manipulations    the next chapter: Context
Evaluator Core                the entire part of Part 3
Exception                     this chapter
Method                        Chapter 15: Methods
Iterator                      Chapter 16: Blocks
Proc                          Chapter 16: Blocks

From main by way of ruby_run to rb_eval

Call Graph

The true core of the evaluator is a function called rb_eval(). In this chapter, we will follow the path from main() to that rb_eval(). First of all, here is a rough call graph around rb_eval:

main                        ....main.c
    ruby_init                   ....eval.c
        ruby_prog_init              ....ruby.c
    ruby_options                ....eval.c
        ruby_process_options        ....ruby.c
    ruby_run                    ....eval.c
        eval_node
            rb_eval
                *
        ruby_stop

I put the file names on the right side where the call moves to another file. Looking at this carefully, the first thing we notice is that functions in eval.c call back into functions in ruby.c.

I wrote “call back” because main.c and ruby.c are, relatively speaking, the implementation of the ruby command, while eval.c is the implementation of the evaluator itself, which keeps a little distance from the ruby command. In other words, eval.c is supposed to be used by ruby.c, and calling functions of ruby.c from eval.c makes eval.c less independent.

Then why is it done this way? It is mainly because of the restrictions of the C language. Because functions such as ruby_prog_init() and ruby_process_options() start to use the API of the Ruby world, an exception may occur. However, in order to stop a Ruby exception, it is necessary to use the macro named PUSH_TAG(), which can only be used in eval.c. In other words, ruby_init() and ruby_run() essentially should have been defined in ruby.c.

Then why isn't PUSH_TAG an extern function or something else that is available to other files? Actually, PUSH_TAG can only be used as a pair with POP_TAG, as follows:

PUSH_TAG();
/* do lots of things */
POP_TAG();

Because of their implementation, the two macros must be placed in the same function. It would be possible to implement them so that they could be split across different functions, but this is not done because it would be slower.
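One way such a pair of macros can end up tied to a single function is sketched below with setjmp() and longjmp(). This is an assumption-laden miniature, not the definitions in eval.c: struct tag, prot_tag, pending_state, jump_tag(), may_fail() and protect() are written for this illustration, and only the macro names are borrowed from the text.

```c
#include <assert.h>
#include <setjmp.h>
#include <stddef.h>

/* Hypothetical miniature of a tag stack, made up for this illustration. */
struct tag {
    jmp_buf buf;
    struct tag *prev;
};

static struct tag *prot_tag = NULL;   /* innermost active tag */
static int pending_state = 0;         /* state carried across the jump */

/* PUSH_TAG opens a block and declares a local; POP_TAG closes that block.
   This is why the two must appear as a pair within a single function. */
#define PUSH_TAG()  {                       \
    struct tag cur_tag;                     \
    cur_tag.prev = prot_tag;                \
    prot_tag = &cur_tag;

#define EXEC_TAG()  setjmp(prot_tag->buf)

#define POP_TAG()                           \
    prot_tag = cur_tag.prev;                \
}

/* Unwind to the innermost tag; EXEC_TAG() then returns non-zero. */
static void jump_tag(int state)
{
    pending_state = state;
    longjmp(prot_tag->buf, 1);
}

static void may_fail(int fail)
{
    if (fail) jump_tag(6);
}

/* Returns 0 on normal completion, or the state passed to jump_tag(). */
static int protect(int fail)
{
    int state = 0;

    PUSH_TAG();
    if (EXEC_TAG() == 0) {
        may_fail(fail);
    }
    else {
        state = pending_state;
    }
    POP_TAG();
    return state;
}
```

In this sketch the tag lives on the stack of the function that pushed it, so pushing costs no allocation, but the brace opened by PUSH_TAG() can only be closed by a POP_TAG() in the same function.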

The next thing we notice is that calling the functions named ruby_xxxx one after another from main() seems very meaningful. Since they are so obviously symmetric, it would be odd if there were no relationship among them.

Actually, these three functions are deeply related. Simply put, all three are “built-in Ruby interfaces”. That is, they are used only when creating a command that has the ruby interpreter built in, and not when writing extension libraries. Since the ruby command itself can in theory be regarded as one of the programs with Ruby built in, it is natural for it to use these interfaces.

What is the ruby_ prefix?

So far, all of ruby's functions have been prefixed with rb_. Why are there two kinds, rb_ and ruby_? I investigated but could not understand the difference, so I asked directly. The answer was: “ruby_ is for the auxiliary functions of the ruby command, and rb_ is for the official interfaces.”

“Then why do variables like ruby_scope have ruby_?”, I asked further. It seems this is just a coincidence. Variables like ruby_scope were originally named the_xxxx, but in the middle of version 1.3 a change was made to add prefixes to all interfaces. At that time, ruby_ was added to the variables that “may be internal for some reason”.

The bottom line is that ruby_ is attached to things that support the ruby command and to internal variables, and rb_ is attached to the official interfaces of the ruby interpreter.

main()

First, straightforwardly, I'll start with main(). It is nice that this is very short.

▼ main()

int
main(argc, argv, envp)
    int argc;
    char **argv, **envp;
{
#if defined(NT)
    NtInitialize(&argc, &argv);
#endif
#if defined(__MACOS__) && defined(__MWERKS__)
    argc = ccommand(&argv);
#endif

    ruby_init();
    ruby_options(argc, argv);
    ruby_run();
    return 0;
}

(main.c)

The NT in #if defined(NT) is obviously the NT of Windows NT. But somehow NT is also defined on Win9x, so it really means the Win32 environment. NtInitialize() initializes argc, argv and the socket system (WinSock) for Win32. Because this function only does initialization, it is not interesting and not related to the main topic, so I omit it.

And __MACOS__ is not “Ma-Ko-Su” but Mac OS. In this case it means Mac OS 9 and before, and it does not include Mac OS X. Even though such an #ifdef remains, as I wrote at the beginning of this book, the current version cannot run on Mac OS 9 and before. It is just a legacy from the time when ruby was able to run on it. Therefore, I also omit this code.

By the way, as readers familiar with the C language probably know, identifiers starting with an underscore are reserved for the system libraries or the OS. However, although they are called “reserved”, using one almost never results in an error, though with a slightly weird cc it could. The cc of HP-UX is an example. HP-UX is a UNIX made by HP. If anyone holds the opinion that HP-UX is not weird, I would deny it out loud.

Anyway, by convention, we do not define such identifiers in user applications.

Now I'll start to briefly explain the built-in Ruby interfaces.

ruby_init()

ruby_init() initializes the Ruby interpreter. Since currently only a single Ruby interpreter can exist in a process, it needs neither arguments nor a return value. This point is generally considered a “lack of features”.

When there is only a single interpreter, the biggest trouble is probably around development environments, namely applications such as irb, RubyWin and RDE. Even when a rewritten program is reloaded, classes that were supposed to be deleted remain. Countering this with the reflection API is not impossible but requires a lot of effort.

However, it seems that Mr. Matsumoto (Matz) purposely limits the number of interpreters to one. The reason seems to be that “it's impossible to initialize completely”. For instance, “loaded extension libraries cannot be removed” is given as an example.

The code of ruby_init() is omitted because it is unnecessary to read.

ruby_options()

The function that parses command-line options for the Ruby interpreter is ruby_options(). Of course, depending on the command, we do not have to use it.

Inside this function, -r (load a library) and -e (pass a program from the command line) are processed. This is also where the file passed as a command-line argument is parsed as a Ruby program.

The ruby command reads the main program from a file if one was given, otherwise from stdin. After that, using rb_compile_string() or rb_compile_file(), introduced in Part 2, it compiles the text into a syntax tree. The result is stored in the global variable ruby_eval_tree.

I also omit the code of ruby_options(), because it just does the necessary things one by one and is not interesting.

ruby_run()

Finally, ruby_run() starts to evaluate the syntax tree that was stored in ruby_eval_tree. We do not always need to call this function either. Instead of ruby_run() we can, for instance, evaluate a string by using a function named rb_eval_string().

▼ ruby_run()

void
ruby_run()
{
    int state;
    static int ex;
    volatile NODE *tmp;

    if (ruby_nerrs > 0) exit(ruby_nerrs);

    Init_stack((void*)&tmp);
    PUSH_TAG(PROT_NONE);
    PUSH_ITER(ITER_NOT);
    if ((state = EXEC_TAG()) == 0) {
        eval_node(ruby_top_self, ruby_eval_tree);
    }
    POP_ITER();
    POP_TAG();

    if (state && !ex) ex = state;
    ruby_stop(ex);
}

(eval.c)

We can see the macros PUSH_xxxx(), but we can ignore them for now. I'll explain them later when the time comes. The only important thing here is eval_node(). Its content is:

▼ eval_node()

static VALUE
eval_node(self, node)
    VALUE self;
    NODE *node;
{
    NODE *beg_tree = ruby_eval_tree_begin;

    ruby_eval_tree_begin = 0;
    if (beg_tree) {
        rb_eval(self, beg_tree);
    }

    if (!node) return Qnil;
    return rb_eval(self, node);
}

(eval.c)

This calls rb_eval() on ruby_eval_tree. ruby_eval_tree_begin stores the statements registered by BEGIN, but this is not important either.

And ruby_stop(), called inside ruby_run(), terminates all threads, finalizes all objects, checks exceptions and, in the end, calls exit(). This is also not important, so we won't look at it.

rb_eval()

Outline

Now, rb_eval(). This function is the real core of ruby. One call of rb_eval() processes a single NODE, and the whole syntax tree is processed by calling it recursively (Fig.1).

Fig.1: rb_eval

rb_eval() is, like yylex(), made of a huge switch statement that branches on the type of each node. First, let's look at the outline.

▼ rb_eval() Outline

static VALUE
rb_eval(self, n)
    VALUE self;
    NODE *n;
{
    NODE *nodesave = ruby_current_node;
    NODE * volatile node = n;
    int state;
    volatile VALUE result = Qnil;

#define RETURN(v) do { \
    result = (v);      \
    goto finish;       \
} while (0)

  again:
    if (!node) RETURN(Qnil);

    ruby_last_node = ruby_current_node = node;
    switch (nd_type(node)) {
      case NODE_BLOCK:
        .....
      case NODE_POSTEXE:
        .....
      case NODE_BEGIN:
                   :
        (plenty of case statements)
                   :
      default:
        rb_bug("unknown node type %d", nd_type(node));
    }
  finish:
    CHECK_INTS;
    ruby_current_node = nodesave;
    return result;
}

(eval.c)

In the omitted part, the code to process each kind of node is listed. By branching like this, it processes each node. When the code for a node is short, it is processed inside rb_eval(); when it gets long, it becomes a separate function. Most of the functions in eval.c were created in this way.

When returning a value from rb_eval(), the macro RETURN() is used instead of return, in order to always pass through CHECK_INTS. Since this macro is related to threads, you can ignore it until the chapter about them.

And finally, the local variables result and node are volatile for the sake of GC.

NODE_IF

Now, taking the if statement as an example, let's look concretely at how rb_eval() evaluates. From here on, in the description of rb_eval(), these three will be listed at the beginning:

The source code (a Ruby program)
Its corresponding syntax tree
The partial code of rb_eval() to process the node

▼ source program

if true
  'true expr'
else
  'false expr'
end

▼ its corresponding syntax tree (nodedump)

NODE_NEWLINE
nd_file = "if"
nd_nth  = 1
nd_next:
    NODE_IF
    nd_cond:
        NODE_TRUE
    nd_body:
        NODE_NEWLINE
        nd_file = "if"
        nd_nth  = 2
        nd_next:
            NODE_STR
            nd_lit = "true expr":String
    nd_else:
        NODE_NEWLINE
        nd_file = "if"
        nd_nth  = 4
        nd_next:
            NODE_STR
            nd_lit = "false expr":String

As we saw in Part 2, elsif and unless can, by contriving the way they are assembled, be bundled into the single NODE_IF type, so we do not have to treat them specially.

▼ rb_eval() - NODE_IF

      case NODE_IF:
        if (trace_func) {
            call_trace_func("line", node, self,
                            ruby_frame->last_func,
                            ruby_frame->last_class);
        }
        if (RTEST(rb_eval(self, node->nd_cond))) {
            node = node->nd_body;
        }
        else {
            node = node->nd_else;
        }
        goto again;

(eval.c)

Only the last if statement is important. Rewritten without any change in its meaning, it becomes this:

    if (RTEST(rb_eval(self, node->nd_cond))) {      (A)
        RETURN(rb_eval(self, node->nd_body));       (B)
    }
    else {
        RETURN(rb_eval(self, node->nd_else));       (C)
    }

First, at (A), it evaluates (the node of) the Ruby condition expression and tests its value with RTEST(). I’ve mentioned that RTEST() is a macro to test whether or not a VALUE is true in the Ruby sense. If it is true, the then-side clause is evaluated at (B). If false, the else-side clause is evaluated at (C).

In addition, I’ve mentioned that the if statement of Ruby also has its own value, so it’s necessary to return a value. Since the value of an if is the value of whichever side was executed, the then side or the else side, it is returned by using the macro RETURN().

In the original listing, it does not call rb_eval() recursively but just does goto. This is the “conversion from tail recursion to goto” which also appeared in the previous chapter, “Syntax tree construction”.
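To make the shape of this branch easier to see, here is a minimal sketch of a toy evaluator written in Ruby. It is not ruby’s code: the array-based node format ([:if, cond, body, else] and so on) and the name mini_eval are assumptions made only for this illustration. The loop plays the role of the again: label, so the chosen branch is evaluated without another recursive call.

```ruby
# Hypothetical node format, invented for this sketch:
#   [:true], [:false], [:nil], [:str, "text"], [:if, cond, body, else_body]
def mini_eval(node)
  loop do
    return nil if node.nil?            # like: if (!node) RETURN(Qnil);
    case node[0]
    when :true  then return true
    when :false then return false
    when :nil   then return nil
    when :str   then return node[1]
    when :if
      # The condition is evaluated recursively, but the selected branch is
      # handled by going around the loop again (the "goto again" of rb_eval).
      node = mini_eval(node[1]) ? node[2] : node[3]
    else
      raise ArgumentError, "unknown node type #{node[0]}"
    end
  end
end

tree = [:if, [:true], [:str, "true expr"], [:str, "false expr"]]
p mini_eval(tree)
```

The value of the whole if is simply the value of whichever branch the loop ends up returning from, which mirrors the RETURN() rewriting shown above.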

NODE_NEWLINE

Since there was a NODE_NEWLINE in the node dump of the if statement, let’s look at the code for it.

▼ rb_eval() – NODE_NEWLINE

    case NODE_NEWLINE:
      ruby_sourcefile = node->nd_file;
      ruby_sourceline = node->nd_nth;
      if (trace_func) {
          call_trace_func("line", node, self,
                          ruby_frame->last_func,
                          ruby_frame->last_class);
      }
      node = node->nd_next;
      goto again;

(eval.c)

There’s nothing particularly difficult.

call_trace_func() has already appeared at NODE_IF. Here is a simple explanation of what kind of thing it is. This is a feature to trace a Ruby program from the Ruby level. The debugger (debug.rb), the tracer (tracer.rb), the profiler (profile.rb), irb (the interactive ruby command) and more use this feature.

By using the function-like method set_trace_func you can register a Proc object to trace with, and that Proc object is stored into trace_func. If trace_func is not 0, that is, not Qfalse, it is considered to be a Proc object and is executed (at call_trace_func()).
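As a small illustration of the Ruby-level side of this feature, the following sketch registers a Proc with set_trace_func and records the events it receives. The global variable $traced_events and the method traced_work are names made up for this example, and the exact list of events may differ between Ruby versions.

```ruby
$traced_events = []   # hypothetical collection, only for this sketch

def traced_work
  x = 1
  x + 1
end

# The registered Proc receives the event name, file, line, method id,
# a binding and the class for every traced event.
set_trace_func proc { |event, _file, line, _id, _binding, _klass|
  $traced_events << [event, line]
}
traced_work
set_trace_func(nil)   # passing nil turns tracing off again

p $traced_events.map(&:first).uniq
```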

This call_trace_func() has nothing to do with the main topic and is not so interesting either. Therefore in this book, from now on, I’ll completely ignore it. If you are interested in it, I’d like you to take it on after finishing Chapter 16: Blocks.

Pseudo-local Variables

NODE_IF and the like are interior nodes in a syntax tree. Let’s look at the leaves, too.

▼ rb_eval() – Pseudo-Local Variable Nodes

    case NODE_SELF:
      RETURN(self);

    case NODE_NIL:
      RETURN(Qnil);

    case NODE_TRUE:
      RETURN(Qtrue);

    case NODE_FALSE:
      RETURN(Qfalse);

(eval.c)

We’ve seen self as the argument of rb_eval(). I’d like you to confirm it by going back a little. The others probably need no explanation.

Jump Tag

Next, I’d like to explain NODE_WHILE, which corresponds to while, but implementing break or next with only recursive function calls is difficult. Since ruby makes these constructs possible by using what is called a “jump tag”, I’ll start by describing it first.

Simply put, a “jump tag” is a wrapper of setjmp() and longjmp(), which are library functions of the C language. Do you know about setjmp()? This function has already appeared in gc.c, but it is used in a very unusual way there. setjmp() is usually used to jump across functions. I’ll explain by taking the code below as an example. The entry point is parent().

▼ setjmp() and longjmp()

    #include <setjmp.h>
    #include <stdio.h>

    jmp_buf buf;

    void child2(void) {
        longjmp(buf, 34);   /* go back straight to parent:
                               the return value of setjmp becomes 34 */
        puts("This message will never be printed.");
    }

    void child1(void) {
        child2();
        puts("This message will never be printed.");
    }

    void parent(void) {
        int result;
        if ((result = setjmp(buf)) == 0) {
            /* normally returned from setjmp */
            child1();
        }
        else {
            /* returned from child2 via longjmp */
            printf("%d\n", result);   /* shows 34 */
        }
    }

First, when setjmp() is called in parent(), the execution state at that time is saved into the argument buf. To put it a little more directly, the address of the top of the machine stack and the CPU registers are saved. If the return value of setjmp() was 0, it means it returned normally from setjmp(), thus you can write the subsequent code as usual. This is the if side. Here, it calls child1().

Next, control moves to child2(), which calls longjmp(), and then it can go straight back to the place where the argument buf was setjmp-ed.

So in this case, it goes back to the setjmp() in parent(). When coming back via longjmp(), the return value of setjmp() becomes the value of the second argument of longjmp(), so the else side is executed. And even if we pass 0 to longjmp(), it is forced to be another value (setjmp() then returns 1), so passing 0 is fruitless.
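If C’s setjmp() is unfamiliar, a rough Ruby-level analogy may help. The sketch below uses Ruby’s own catch and throw to reproduce the same control flow with the same function names. It is only an analogy: catch does not “return twice” the way setjmp() does, and the symbol :buf merely stands in for the jmp_buf.

```ruby
def child2
  throw :buf, 34                    # roughly: longjmp(buf, 34)
  puts "This message will never be printed."
end

def child1
  child2
  puts "This message will never be printed."
end

def parent
  catch(:buf) do                    # roughly: setjmp(buf)
    child1
    0                               # reached only if nothing was thrown
  end
end

p parent
```

The value given to throw becomes the value of the catch expression, just as the second argument of longjmp() becomes the return value of setjmp().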

Fig.2 shows the state of the machine stack. Ordinary functions return only once for each call. However, setjmp() can return twice. Does it help you grasp the concept if I say that it is something like fork()?

Fig.2: setjmp() longjmp() Image

Now, we’ve learned about setjmp() as a preparation. In eval.c, EXEC_TAG() corresponds to setjmp() and JUMP_TAG() corresponds to longjmp() respectively (Fig.3).

Fig.3: “tag jump” image

Looking at this image, it seems that EXEC_TAG() does not take any arguments. Where has jmp_buf gone? Actually, in ruby, jmp_buf is wrapped in the struct struct tag. Let’s look at it.

▼ struct tag

    struct tag {
        jmp_buf buf;
        struct FRAME *frame;   /* FRAME when PUSH_TAG */
        struct iter *iter;     /* ITER  when PUSH_TAG */
        ID tag;                /* tag type */
        VALUE retval;          /* the return value of this jump */
        struct SCOPE *scope;   /* SCOPE when PUSH_TAG */
        int dst;               /* the destination ID */
        struct tag *prev;
    };

(eval.c)

Because there’s the member prev, we can infer that struct tag is probably a stack structure using a linked list. Moreover, by looking around it, we can find the macros PUSH_TAG() and POP_TAG(), thus it definitely seems to be a stack.

▼ PUSH_TAG() POP_TAG()

    static struct tag *prot_tag;   /* the pointer to the head of the machine stack */

    #define PUSH_TAG(ptag) do {             \
        struct tag _tag;                    \
        _tag.retval = Qnil;                 \
        _tag.frame = ruby_frame;            \
        _tag.iter = ruby_iter;              \
        _tag.prev = prot_tag;               \
        _tag.scope = ruby_scope;            \
        _tag.tag = ptag;                    \
        _tag.dst = 0;                       \
        prot_tag = &_tag

    #define POP_TAG()                       \
        if (_tag.prev)                      \
            _tag.prev->retval = _tag.retval;\
        prot_tag = _tag.prev;               \
    } while (0)

(eval.c)

I’d like you to be flabbergasted here, because the actual tag is allocated entirely on the machine stack as a local variable (Fig.4). Moreover, do ~ while is divided between the two macros. This might be one of the most awful usages of the C preprocessor. Here are the macros PUSH/POP coupled and extracted to make them easy to read.

    do {
        struct tag _tag;
        _tag.prev = prot_tag;   /* save the previous tag */
        prot_tag = &_tag;       /* push a new tag on the stack */
        /* do several things */
        prot_tag = _tag.prev;   /* restore the previous tag */
    } while (0);

This method does not have any overhead of function calls, and its cost of memory allocation is next to nothing. This technique is only possible because the ruby evaluator is made of recursive calls of rb_eval().

Fig.4: the tag stack is embedded in the machine stack

Because of this implementation, it’s necessary that PUSH_TAG() and POP_TAG() are used as a pair within one and the same function. Plus, since they are not supposed to be used carelessly outside of the evaluator, we can’t make them available to other files.

Additionally, let’s also take a look at EXEC_TAG() and JUMP_TAG().

▼ EXEC_TAG() JUMP_TAG()

    #define EXEC_TAG()    setjmp(prot_tag->buf)

    #define JUMP_TAG(st) do {               \
        ruby_frame = prot_tag->frame;       \
        ruby_iter = prot_tag->iter;         \
        longjmp(prot_tag->buf,(st));        \
    } while (0)

(eval.c)

In this way, setjmp() and longjmp() are wrapped by EXEC_TAG() and JUMP_TAG() respectively. The name EXEC_TAG() can look like a wrapper of longjmp() at first sight, but this is the one that executes setjmp().

Based on all of the above, I’ll explain the mechanism of while. First, when starting a while it does EXEC_TAG() (setjmp). After that, it executes the main body by calling rb_eval() recursively. If there’s a break or next, it does JUMP_TAG() (longjmp). Then it can go back to the start point of the while loop (Fig.5).

Fig.5: the implementation of while by using “tag jump”

Though break was taken as an example here, break is not the only thing that cannot be implemented without jumping. Even if we limit the case to while, there are next and redo. Additionally, return from a method and exceptions also have to climb over the wall of rb_eval(). And since it’s cumbersome to use a different tag stack for each case, we want a single stack to handle all cases in one way or another.

All we need to make this possible is to attach information about “what the purpose of this jump is”. Conveniently, the return value of setjmp() can be specified as the argument of longjmp(), thus we can use this. The types are expressed by the following flags:

▼ tag type

    #define TAG_RETURN  0x1    /* return */
    #define TAG_BREAK   0x2    /* break */
    #define TAG_NEXT    0x3    /* next */
    #define TAG_RETRY   0x4    /* retry */
    #define TAG_REDO    0x5    /* redo */
    #define TAG_RAISE   0x6    /* general exceptions */
    #define TAG_THROW   0x7    /* throw (won't be explained in this book) */
    #define TAG_FATAL   0x8    /* fatal : exceptions which are not catchable */
    #define TAG_MASK    0xf

(eval.c)

The meanings are written in each comment. The last one, TAG_MASK, is the bitmask to extract these flags from a return value of setjmp(). This is because the return value of setjmp() can also include information which is not about the “type of jump”.

NODE_WHILE

Now, by examining the code of NODE_WHILE, let’s check the actual usage of tags.

▼ The Source Program

    while true
      'true_expr'
    end

▼ Its corresponding syntax tree (nodedump-short)

    NODE_WHILE
    nd_state = 1 (while)
    nd_cond:
        NODE_TRUE
    nd_body:
        NODE_STR
        nd_lit = "true_expr":String

▼ rb_eval() – NODE_WHILE

    case NODE_WHILE:
      PUSH_TAG(PROT_NONE);
      result = Qnil;
      switch (state = EXEC_TAG()) {
        case 0:
          if (node->nd_state && !RTEST(rb_eval(self, node->nd_cond)))
              goto while_out;
          do {
            while_redo:
              rb_eval(self, node->nd_body);
            while_next:
              ;
          } while (RTEST(rb_eval(self, node->nd_cond)));
          break;

        case TAG_REDO:
          state = 0;
          goto while_redo;
        case TAG_NEXT:
          state = 0;
          goto while_next;
        case TAG_BREAK:
          state = 0;
          result = prot_tag->retval;
        default:
          break;
      }
    while_out:
      POP_TAG();
      if (state) JUMP_TAG(state);
      RETURN(result);

(eval.c)

The idiom which will appear over and over again appeared in the above code.

    PUSH_TAG(PROT_NONE);
    switch (state = EXEC_TAG()) {
      case 0:
        /* process normally */
        break;
      case TAG_a:
        state = 0;    /* clear state because the jump waited for comes */
        /* do the process of when jumped with TAG_a */
        break;
      case TAG_b:
        state = 0;    /* clear state because the jump waited for comes */
        /* do the process of when jumped with TAG_b */
        break;
      default:
        break;        /* this jump is not waited for, then ... */
    }
    POP_TAG();
    if (state) JUMP_TAG(state);   /* .. jump again here */

First, since PUSH_TAG() and POP_TAG() are the mechanism described previously, they always need to be used as a pair. Also, they need to be written outside of EXEC_TAG(). Then EXEC_TAG() is applied to the jmp_buf that was just pushed. This means doing setjmp(). If the return value is 0, it means returning immediately from setjmp(), so it does the normal processing (this usually contains rb_eval()). If the return value of EXEC_TAG() is not 0, it means returning via longjmp(), so it filters out only the jumps it needs by using case and lets the rest (default) pass.
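The “take only the jumps you are waiting for and jump again otherwise” part of this idiom can be imitated at the Ruby level. The following is only a sketch under loud assumptions: the class TagJump and the helpers with_tag and nested_tags_demo are hypothetical names, and Ruby exceptions are used as a stand-in for longjmp(). Nothing here is how eval.c is actually written.

```ruby
# Hypothetical stand-in for a longjmp() that carries a tag type and a value.
class TagJump < StandardError
  attr_reader :state, :retval

  def initialize(state, retval = nil)
    super("tag jump: #{state}")
    @state = state
    @retval = retval
  end
end

# PUSH_TAG / EXEC_TAG / POP_TAG / "jump again" rolled into one helper.
# Returns nil after normal completion, or the TagJump that was waited for.
def with_tag(*waited_for)
  yield
  nil
rescue TagJump => jump
  raise unless waited_for.include?(jump.state)   # not waited for: jump again
  jump
end

def nested_tags_demo
  log = []
  caught = with_tag(:break) do          # like NODE_WHILE: waits for :break
    with_tag(:retry) do                 # an inner tag waiting for something else
      log << :inner_body
      raise TagJump.new(:break, 42)     # like return_value(42); JUMP_TAG(TAG_BREAK)
    end
    log << :after_inner                 # skipped: the jump passes straight through
  end
  [caught.state, caught.retval, log]
end

p nested_tags_demo
```

The inner handler sees the :break jump first, finds that it is not the one it is waiting for, and lets it continue outward, which corresponds to the default: branch followed by if (state) JUMP_TAG(state).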

It might be helpful to also see the code of the jumping side. The code below is the handler of the node of redo.

▼ rb_eval() – NODE_REDO

    case NODE_REDO:
      CHECK_INTS;
      JUMP_TAG(TAG_REDO);
      break;

(eval.c)

As a result of jumping via JUMP_TAG(), it goes back to the last EXEC_TAG(). The return value at that time is the argument TAG_REDO. Being aware of this, I’d like you to look at the code of NODE_WHILE and check what route is taken.

The idiom has been explained enough, so now I’ll explain the code of NODE_WHILE in a little more detail. As mentioned, since the inside of case 0: is the main process, I extracted only that part. Additionally, I moved some labels to enhance readability.

    if (node->nd_state && !RTEST(rb_eval(self, node->nd_cond)))
        goto while_out;
    do {
        rb_eval(self, node->nd_body);
    } while (RTEST(rb_eval(self, node->nd_cond)));
    while_out:

There are two places that call rb_eval() on node->nd_cond, which corresponds to the conditional expression. It seems that only the first test of the condition is separated. This is to deal with both do ~ while and while at once. When node->nd_state is 0 it is a do ~ while, when 1 it is an ordinary while. The rest can be understood by following it step by step, so I won’t particularly explain it.

By the way, I feel like it easily becomes an infinite loop if there is a next or redo in the condition expression. Since that is of course exactly what the code means, it’s the fault of whoever wrote it, but I’m a little curious about it. So, I’ve actually tried it.

    % ruby -e 'while next do nil end'
    -e:1: void value expression

It’s simply rejected at parse time. It’s safe, but not an interesting result. What produces this error is value_expr() of parse.y.

The value of an evaluation of while

while did not have a value for a long time, but it has been able to return a value by using break since ruby 1.7. This time, let’s focus on the flow of the value of an evaluation. Keeping in mind that the value of the local variable result becomes the return value of rb_eval(), I’d like you to look at the following code:

    result = Qnil;
    switch (state = EXEC_TAG()) {
      case 0:
        /* the main process */
      case TAG_REDO:
      case TAG_NEXT:
        /* each jump */

      case TAG_BREAK:
        state = 0;
        result = prot_tag->retval;     (A)
      default:
        break;
    }
    RETURN(result);

What we should focus on is only (A). The return value of the jump seems to be passed via the retval member of prot_tag, which is a struct tag. Here is the passing side:

▼ rb_eval() – NODE_BREAK

    #define return_value(v) prot_tag->retval = (v)

    case NODE_BREAK:
      if (node->nd_stts) {
          return_value(avalue_to_svalue(rb_eval(self, node->nd_stts)));
      }
      else {
          return_value(Qnil);
      }
      JUMP_TAG(TAG_BREAK);
      break;

(eval.c)

In this way, by using the macro return_value(), it assigns the value to the struct at the top of the tag stack.
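The Ruby-level behavior that this code implements can be checked with a short script. The method names below are made up for the example. A while that is left through break with an argument evaluates to that argument, and a while that simply runs out evaluates to nil, which matches result = Qnil in the handler.

```ruby
def while_value_with_break
  i = 0
  while true
    i += 1
    break i * 10 if i == 3    # the argument of break becomes the value of while
  end
end

def while_value_without_break
  i = 0
  while i < 3
    i += 1
  end                          # no break: the while expression is nil
end

p while_value_with_break
p while_value_without_break
```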

The basic flow is this, but in practice there could be another EXEC_TAG() between the EXEC_TAG() of NODE_WHILE and the JUMP_TAG() of NODE_BREAK. For example, the rescue of an exception handler can exist between them.

    while cond       # EXEC_TAG() for NODE_WHILE
      begin          # EXEC_TAG() again for rescue
        break 1
      rescue
      end
    end

Therefore, it’s hard to determine whether or not the struct tag in effect when doing JUMP_TAG() at NODE_BREAK is the one which was pushed at NODE_WHILE. In this case, because retval is propagated in POP_TAG() as shown below, the return value can be passed to the next tag without particular thought.

▼ POP_TAG()

    #define POP_TAG()                       \
        if (_tag.prev)                      \
            _tag.prev->retval = _tag.retval;\
        prot_tag = _tag.prev;               \
    } while (0)

(eval.c)

This can probably be depicted as Fig.6.

Fig.6: Transferring the return value
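From the Ruby side, the effect of this propagation is simply that the value of break survives an intervening begin ~ rescue. A minimal check, with a made-up method name:

```ruby
def break_through_rescue
  while true          # the outer tag, waiting for break
    begin             # an inner tag pushed for rescue
      break 1
    rescue
      :not_reached
    end
  end
end

p break_through_rescue
```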

Exception

As the second example of the usage of “tag jump”, we’ll look at how exceptions are dealt with.

raise

When I explained while, we looked at the setjmp() side first. This time, we’ll look at the longjmp() side first for a change. It’s rb_exc_raise() which is the substance of raise.

▼ rb_exc_raise()

    void
    rb_exc_raise(mesg)
        VALUE mesg;
    {
        rb_longjmp(TAG_RAISE, mesg);
    }

(eval.c)

mesg is an exception object (an instance of Exception or one of its subclasses). Notice that it seems to jump with TAG_RAISE this time. And the code below is a very simplified rb_longjmp().

▼ rb_longjmp() (simplified)

    static void
    rb_longjmp(tag, mesg)
        int tag;
        VALUE mesg;
    {
        if (NIL_P(mesg))
            mesg = ruby_errinfo;
        set_backtrace(mesg, get_backtrace(mesg));
        ruby_errinfo = mesg;
        JUMP_TAG(tag);
    }

Well, though this can be considered a matter of course, this just jumps as usual by using JUMP_TAG().

What is ruby_errinfo? By doing grep a few times, I figured out that this variable is the substance of the global variable $! of Ruby. Since this variable indicates the exception which is currently occurring, naturally its substance ruby_errinfo should have the same meaning as well.

The Big Picture

▼ the source program

    begin
      raise('exception raised')
    rescue
      'rescue clause'
    ensure
      'ensure clause'
    end

▼ the syntax tree (nodedump-short)

    NODE_BEGIN
    nd_body:
        NODE_ENSURE
        nd_head:
            NODE_RESCUE
            nd_head:
                NODE_FCALL
                nd_mid = 3857 (raise)
                nd_args:
                    NODE_ARRAY [
                    0:
                        NODE_STR
                        nd_lit = "exception raised":String
                    ]
            nd_resq:
                NODE_RESBODY
                nd_args = (null)
                nd_body:
                    NODE_STR
                    nd_lit = "rescue clause":String
                nd_head = (null)
            nd_else = (null)
        nd_ensr:
            NODE_STR
            nd_lit = "ensure clause":String

As the right order of rescue and ensure is decided at the parser level, the right order is strictly decided in the syntax tree as well. NODE_ENSURE is always at the “top”, NODE_RESCUE comes next, and the main body (where raise exists) is last. Since NODE_BEGIN is a node that does nothing, you can consider NODE_ENSURE to be virtually at the top.

This means, since NODE_ENSURE and NODE_RESCUE are above the main body which we want to protect, we can stop raise by merely doing EXEC_TAG(). Or rather, it is probably more accurate to say that the two nodes are put above it in the syntax tree for this purpose.

ensure

We are going to look at the handler of NODE_ENSURE, which is the node of ensure.

▼ rb_eval() – NODE_ENSURE

    case NODE_ENSURE:
      PUSH_TAG(PROT_NONE);
      if ((state = EXEC_TAG()) == 0) {
          result = rb_eval(self, node->nd_head);   (A-1)
      }
      POP_TAG();
      if (node->nd_ensr) {
          VALUE retval = prot_tag->retval;   (B-1)
          VALUE errinfo = ruby_errinfo;

          rb_eval(self, node->nd_ensr);            (A-2)
          return_value(retval);      (B-2)
          ruby_errinfo = errinfo;
      }
      if (state) JUMP_TAG(state);            (B-3)
      break;

(eval.c)

This branch using if is another idiom to deal with tags. It intercepts a jump by doing EXEC_TAG(), then evaluates the ensure clause (node->nd_ensr). As for the flow of the process, it’s probably straightforward.

Again, we’ll try to think about the value of an evaluation. To check the specification first,

    begin
      expr0
    ensure
      expr1
    end

for the above statement, the value of the whole begin will be the value of expr0 regardless of whether or not ensure exists. This behavior is reflected in the code (A-1, 2), so the value of the evaluation of an ensure clause is completely discarded.

At (B-1, 3), it deals with the evaluated value when a jump occurred in the main body. I mentioned that the value in this case is stored in prot_tag->retval, so it saves the value to a local variable to prevent it from being carelessly overwritten during the execution of the ensure clause (B-1). After the evaluation of the ensure clause, it restores the value by using return_value() (B-2). When no jump has occurred, that is, when state == 0, prot_tag->retval is not used in the first place.
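Both points can be observed from Ruby. In the sketch below (the method names are invented for the example), the first method shows that the value of the ensure clause is discarded, and the second shows that a value carried by a jump, here break 7, is still intact after the ensure clause has run.

```ruby
def ensure_value
  begin
    :body
  ensure
    :ensure_clause             # evaluated, but its value is thrown away
  end
end

def break_through_ensure(log)
  while true
    begin
      break 7                  # the jump carries 7 in its return value
    ensure
      log << :ensure_ran       # runs on the way out and must not clobber the 7
    end
  end
end

log = []
p ensure_value
p break_through_ensure(log)
p log
```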

rescue

It’s been a little while, so I’ll show the syntax tree of rescue again just in case.

▼ Source Program

    begin
      raise()
    rescue ArgumentError, TypeError
      'error raised'
    end

▼ Its Syntax Tree (nodedump-short)

    NODE_BEGIN
    nd_body:
        NODE_RESCUE
        nd_head:
            NODE_FCALL
            nd_mid = 3857 (raise)
            nd_args = (null)
        nd_resq:
            NODE_RESBODY
            nd_args:
                NODE_ARRAY [
                0:
                    NODE_CONST
                    nd_vid = 4733 (ArgumentError)
                1:
                    NODE_CONST
                    nd_vid = 4725 (TypeError)
                ]
            nd_body:
                NODE_STR
                nd_lit = "error raised":String
            nd_head = (null)
        nd_else = (null)

I’d like you to make sure that (the syntax tree of) the statement to be rescued is “under” NODE_RESCUE.

▼ rb_eval() – NODE_RESCUE

    case NODE_RESCUE:
    retry_entry:
      {
          volatile VALUE e_info = ruby_errinfo;

          PUSH_TAG(PROT_NONE);
          if ((state = EXEC_TAG()) == 0) {
              result = rb_eval(self, node->nd_head); /* evaluate the body */
          }
          POP_TAG();
          if (state == TAG_RAISE) { /* an exception occurred at the body */
              NODE * volatile resq = node->nd_resq;

              while (resq) { /* deal with the rescue clause one by one */
                  ruby_current_node = resq;
                  if (handle_rescue(self, resq)) { /* If dealt with by this clause */
                      state = 0;
                      PUSH_TAG(PROT_NONE);
                      if ((state = EXEC_TAG()) == 0) {
                          result = rb_eval(self, resq->nd_body);
                      }                            /* evaluate the rescue clause */
                      POP_TAG();
                      if (state == TAG_RETRY) { /* Since retry occurred, */
                          state = 0;
                          ruby_errinfo = Qnil;  /* the exception is stopped */
                          goto retry_entry;     /* convert to goto */
                      }
                      if (state != TAG_RAISE) {  /* Also by rescue and such */
                          ruby_errinfo = e_info; /* the exception is stopped */
                      }
                      break;
                  }
                  resq = resq->nd_head; /* move on to the next rescue clause */
              }
          }
          else if (node->nd_else) { /* when there is an else clause, */
              if (!state) { /* evaluate it only when any exception has not occurred. */
                  result = rb_eval(self, node->nd_else);
              }
          }
          if (state) JUMP_TAG(state); /* the jump was not waited for */
      }
      break;

(eval.c)

Even though the size is not small, it’s not difficult because it simply deals with the nodes one by one. This is the first time handle_rescue() has appeared, but for some reasons we cannot look at this function now. I’ll explain only its effects here. Its prototype is this,

    static int handle_rescue(VALUE self, NODE *resq)

and it determines whether the currently occurring exception (ruby_errinfo) is an instance of the class expressed by resq (TypeError, for instance) or of one of its subclasses. The reason for passing self is that it’s necessary to call rb_eval() inside this function in order to evaluate resq.
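The Ruby-level behavior that this handler produces can be tried out as follows. The method names are invented for the example. The first method exercises retry (the goto retry_entry path), and the second shows that a rescue clause also accepts exceptions whose class is a subclass of the listed one.

```ruby
def fetch_with_retry(failures)
  attempts = 0
  begin
    attempts += 1
    raise TypeError, "not yet" if attempts <= failures
    [:ok, attempts]
  rescue ArgumentError, TypeError
    retry if attempts < 3          # start the begin body over again
    [:gave_up, attempts]
  end
end

def rescued_by_superclass?
  begin
    raise EOFError, "eof"          # EOFError is a subclass of IOError
  rescue IOError
    true
  end
end

p fetch_with_retry(2)
p fetch_with_retry(5)
p rescued_by_superclass?
```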

The original work is Copyright © 2002 - 2004 Minero AOKI. Translated by Vincent ISAMBART and Clifford Escobar CAOILE. This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 2.5 License.


Chapter 14: Context

The range covered by this chapter is really broad. First of all, I’ll describe how the internal state of the evaluator is expressed. After that, as an actual example, we’ll read how the state is changed by a class definition statement. Subsequently, we’ll examine how the internal state influences method definition statements. Lastly, we’ll observe how both statements change the behaviors of variable definitions and variable references.

The Ruby stack

Context and Stack

In the image of a typical procedural language, each time a procedure is called, the information necessary to execute the procedure, such as the local variable space and the place to return to, is stored in a struct (a stack frame) and pushed onto the stack. When returning from a procedure, the struct on the top of the stack is popped and the state is returned to that of the previous procedure. The execution image of a C program which was explained in Chapter 5: Garbage collection is a perfect example.

What to be careful about here is that what changes during execution is only the stack; the program, on the contrary, remains unchanged wherever it is. For example, for “a reference to the local variable i”, there’s just an order of “give me i of the current frame”; it is not written as “give me i of that frame”. In other words, “only” the state of the stack influences the outcome. This is why, even if a procedure is called at any time and any number of times, we only have to write its code once (Fig.1).

Fig.1: What is changing is only the stack

The execution of Ruby is also basically nothing but chained calls of methods, which are procedures, so essentially it has the same image as above. In other words, with the same code, the things being accessed, such as the local variable scope and the block local scope, will be changing. And these kinds of scopes are expressed by stacks.

However, in Ruby, for instance, you can temporarily go back to a previously used scope by using iterators or Proc. This cannot be implemented by just simply pushing/popping a stack. Therefore the frames of the Ruby stack will be intricately rearranged during execution. Although I call it a “stack”, it could be better to consider it as a list.

Other than method calls, the local variable scope can also be changed by class definitions. So, method calls do not match the transitions of the local variable scope. Since there are also blocks, it’s necessary to handle them separately. For these various reasons, surprisingly, there are seven stacks.

    Stack Pointer   Stack Frame Type    Description
    ruby_frame      struct FRAME        the records of method calls
    ruby_scope      struct SCOPE        the local variable scope
    ruby_block      struct BLOCK        the block scope
    ruby_iter       struct iter         whether or not the current FRAME is an iterator
    ruby_class      VALUE               the class to define methods on
    ruby_cref       NODE (NODE_CREF)    the class nesting information

C has only one stack and Ruby has seven stacks, so by simple arithmetic, the execution image of Ruby is at least seven times more complicated than C. But it is actually not seven times at all, it’s at least twenty times more complicated.

First, I’ll briefly describe these stacks and their stack frame structs. The file they are defined in is either eval.c or env.h. Basically these stack frames are touched only by eval.c … is what it should be if it were possible, but gc.c needs to know the struct types when marking, so some of them are exposed in env.h.

Of course, marking could be done in a file other than gc.c, but it would require separate functions, which causes slowing down. Ordinary programs had better not care about such things, but both the garbage collector and the core of the evaluator are ruby’s biggest bottlenecks, so it’s well worth optimizing even for just one method call.

ruby_frame

ruby_frame is a stack to record method calls. The stack frame struct is struct FRAME. This terminology is a bit confusing, but please be aware that I’ll distinguish the two by writing just “frame” when it means a “stack frame” as a general noun and FRAME when it means struct FRAME.

▼ ruby_frame

    extern struct FRAME {
        VALUE self;          /* self */
        int argc;            /* the argument count */
        VALUE *argv;         /* the array of argument values */
        ID last_func;        /* the name of this FRAME (when called) */
        ID orig_func;        /* the name of this FRAME (when defined) */
        VALUE last_class;    /* the class of last_func's receiver */
        VALUE cbase;         /* the base point for searching constants and class variables */
        struct FRAME *prev;
        struct FRAME *tmp;   /* to protect from GC. this will be described later */
        struct RNode *node;  /* the file name and the line number of the currently executed line. */
        int iter;            /* is this called with a block? */
        int flags;           /* the below two */
    } *ruby_frame;

    #define FRAME_ALLOCA 0   /* FRAME is allocated on the machine stack */
    #define FRAME_MALLOC 1   /* FRAME is allocated by malloc */

(env.h)

First of all, since there’s the prev member, you can infer that the stack is made of a linked list (Fig.2).

Fig.2: ruby_frame

The fact that ruby_xxxx points to the top stack frame is common to all stacks and won’t be mentioned every time.

The first member of the struct is self. There is also self in the arguments of rb_eval(), so why does this struct remember another self? This is for the C-level functions. More precisely, it’s for rb_call_super(), which corresponds to super. In order to execute super, it requires the receiver of the current method, but the caller side of rb_call_super() may not have such information. However, the chain of rb_eval() is interrupted before the execution of the user-defined C code starts. Therefore, the conclusion is that there needs to be a way to obtain the information of self out of nothing. And FRAME is the right place to store it.

Thinking a little further, it’s mysterious that there are argc and argv. Because parameter variables are local variables after all, it is unnecessary to preserve the given arguments after assigning them into the local variables with the same names at the beginning of the method, isn’t it? Then, what is the use of them? The answer is that this is actually for super again. In Ruby, when calling super without any arguments, the values of the parameter variables of the method will be passed to the method of the superclass. Thus, (the local variable space for) the parameter variables must be reserved.
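The Ruby behavior referred to here, super written without an argument list, looks like this in practice. The class names are made up for the example.

```ruby
class GreeterBase
  def greet(name, mark)
    "#{name}#{mark}"
  end
end

class GreeterDerived < GreeterBase
  def greet(name, mark)
    super                 # no argument list: name and mark are passed along as they are
  end
end

p GreeterDerived.new.greet("hi", "!")
```

Since the superclass method receives the arguments even though the call site names none of them, the evaluator has to be able to find them somewhere, and this is the role the text assigns to argc and argv.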

Additionally, the difference between last_func and orig_func shows up in cases such as when the method is aliased. For instance,

    class C
      def orig() end
      alias ali orig
    end
    C.new.ali

in this case, last_func=ali and orig_func=orig. Not surprisingly, these members also have to do with super.
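Current versions of Ruby expose a similar distinction at the Ruby level through __method__ (the name the method was defined with) and __callee__ (the name it was called with). The sketch below assumes Ruby 2.0 or later for this behavior of __callee__, and the class name is made up. It is offered only as an analogy to orig_func and last_func, not as a statement about how those struct members are used.

```ruby
class AliasDemo
  def orig
    [__method__, __callee__]   # [name at definition, name used for this call]
  end
  alias ali orig
end

p AliasDemo.new.orig
p AliasDemo.new.ali
```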

ruby_scope

ruby_scope is the stack to represent the local variable scope. The method and class definition statements, the module definition statements and the singleton class definition statements all have different scopes. The stack frame struct is struct SCOPE. I’ll call this frame SCOPE.

▼ ruby_scope

    extern struct SCOPE {
        struct RBasic super;
        ID *local_tbl;        /* an array of the local variable names */
        VALUE *local_vars;    /* the space to store local variables */
        int flags;            /* the below four */
    } *ruby_scope;

    #define SCOPE_ALLOCA       0   /* local_vars is allocated by alloca */
    #define SCOPE_MALLOC       1   /* local_vars is allocated by malloc */
    #define SCOPE_NOSTACK      2   /* POP_SCOPE is done */
    #define SCOPE_DONT_RECYCLE 4   /* Proc is created with this SCOPE */

(env.h)

Since the first element is struct RBasic, this is a Ruby object. This is in order to handle Proc objects. For example, let’s try to think about a case like this:

    def make_counter
      lvar = 0
      return Proc.new { lvar += 1 }
    end

    cnt = make_counter()
    p cnt.call    # 1
    p cnt.call    # 2
    p cnt.call    # 3
    cnt = nil     # cut the reference. The created Proc finally becomes unnecessary here.

The Proc object created by this method will persist longer than the method that creates it. And, because the Proc can refer to the local variable lvar, the local variables must be preserved until the Proc disappears. Thus, if it were not handled by the garbage collector, no one could determine the time to free it.

There are two reasons why struct SCOPE is separated from struct FRAME. Firstly, things like class definition statements are not method calls but create distinct local variable scopes. Secondly, when a called method is defined in C, Ruby’s local variable space is unnecessary.

ruby_block

struct BLOCK is the real body of a Ruby iterator block or a Proc object. It is also a kind of snapshot of the evaluator at some point. This frame will also be written briefly as BLOCK, in the same manner as FRAME and SCOPE.

▼ ruby_block

    static struct BLOCK *ruby_block;

    struct BLOCK {
        NODE *var;               /* the block parameters (mlhs) */
        NODE *body;              /* the code of the block body */
        VALUE self;              /* the self when this BLOCK is created */
        struct FRAME frame;      /* the copy of ruby_frame when this BLOCK is created */
        struct SCOPE *scope;     /* the ruby_scope when this BLOCK is created */
        struct BLOCKTAG *tag;    /* the identity of this BLOCK */
        VALUE klass;             /* the ruby_class when this BLOCK is created */
        int iter;                /* the ruby_iter when this BLOCK is created */
        int vmode;               /* the scope_vmode when this BLOCK is created */
        int flags;               /* BLOCK_D_SCOPE, BLOCK_DYNAMIC */
        struct RVarmap *dyna_vars;   /* the block local variable space */
        VALUE orig_thread;       /* the thread that creates this BLOCK */
        VALUE wrapper;           /* the ruby_wrapper when this BLOCK is created */
        struct BLOCK *prev;
    };

    struct BLOCKTAG {
        struct RBasic super;
        long dst;                /* destination, that is, the place to return */
        long flags;              /* BLOCK_DYNAMIC, BLOCK_ORPHAN */
    };

    #define BLOCK_D_SCOPE 1      /* having distinct block local scope */
    #define BLOCK_DYNAMIC 2      /* BLOCK was taken from a Ruby program */
    #define BLOCK_ORPHAN  4      /* the FRAME that creates this BLOCK has finished */

(eval.c)

Note that frame is not a pointer. This is because the entire content of struct FRAME is copied and preserved. The entire struct FRAME is (for better performance) allocated on the machine stack, but a BLOCK could persist longer than the FRAME that creates it, and this preservation is a preparation for that case.

Additionally, struct BLOCKTAG is separated in order to detect the same block when multiple Proc objects are created from that block. The Proc objects which were created from one and the same block have the same BLOCKTAG.

ruby_iter

The stack ruby_iter indicates whether the currently called method is an iterator (whether it is called with a block). The frame is struct iter. But for consistency I’ll call it ITER.

▼ ruby_iter

    static struct iter *ruby_iter;

    struct iter {
        int iter;           /* the below three */
        struct iter *prev;
    };

    #define ITER_NOT 0      /* the currently evaluated method is not an iterator */
    #define ITER_PRE 1      /* the method which is going to be evaluated next is an iterator */
    #define ITER_CUR 2      /* the currently evaluated method is an iterator */

(eval.c)

Although for each method we can determine whether it is an iterator or not, there’s another struct that is distinct from struct FRAME. Why?

It’s obvious that you need to inform the method when “it is an iterator”, but you also need to inform it of the fact when “it is not an iterator”. However, pushing a whole BLOCK just for this is very heavy. It would also needlessly increase the work of procedures such as variable references on the caller side. Thus, it’s better to push the smaller and lighter ITER instead of BLOCK. This will be discussed in detail in Chapter 16: Blocks.
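At the Ruby level, the question “is the current method called with a block?” is what block_given? answers. A tiny example, with a made-up method name, shows that the same method can be told either fact on each call:

```ruby
def called_with_block?
  block_given?        # true only when this particular call was given a block
end

p called_with_block?          # called without a block
p called_with_block? { }      # called with a block
```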

ruby_dyna_vars

The block local variable space. The frame struct is struct RVarmap, which has already been seen in Part 2. From now on, I’ll call it just VARS.

▼ struct RVarmap

    struct RVarmap {
        struct RBasic super;
        ID id;                  /* the name of the variable */
        VALUE val;              /* the value of the variable */
        struct RVarmap *next;
    };

(env.h)

Note that a frame is not a single struct RVarmap but a list of the structs (Fig.3). And each frame corresponds to a local variable scope. Since it corresponds to a “local variable scope” and not a “block local variable scope”, for instance, even if blocks are nested, only a single list is used to express them. The break between blocks is similar to the one in the parser: it is expressed by an RVarmap (header) whose id is 0. Details are deferred again. They will be explained in Chapter 16: Blocks.

Fig.3: ruby_dyna_vars

ruby_class

ruby_class represents the current class on which a method is defined. Since self will be that class when it’s a normal class definition statement, ruby_class == self. But when it is the top level or in the middle of particular methods like eval and instance_eval, self != ruby_class is possible.
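The difference between self and “the class that def defines methods on” can be seen from Ruby without looking at the evaluator. In the sketch below, Widget and WIDGET are names made up for the example. Inside instance_eval, self is the object, yet a def ends up on the object’s singleton class.

```ruby
class Widget
  SELF_IN_BODY = self          # in a class body, self is the class itself
end

WIDGET = Widget.new
WIDGET.instance_eval do
  def hello                    # self is WIDGET here, but the method is defined
    :hello                     # on WIDGET's singleton class, not on self
  end
end

p Widget::SELF_IN_BODY
p WIDGET.hello
p Widget.new.respond_to?(:hello)
```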

The frame of ruby_class is a simple VALUE and there’s no particular frame struct. Then, how could it be like a stack? Moreover, there were many structs without the prev pointer; how could these form a stack? The answer is deferred to the next section.

From now on, I’ll call this frame CLASS.

ruby_cref

ruby_cref represents the information of the nesting of classes. I’ll call this frame CREF, with the same way of naming as before. Its struct is …

▼ ruby_cref

    static NODE *ruby_cref = 0;

(eval.c)

… surprisingly NODE. This is used just as a “defined struct which can be pointed to by a VALUE”. The node type is NODE_CREF and the assignments of its members are shown below:

    Union Member   Macro To Access   Usage
    u1.value       nd_clss           the outer class (VALUE)
    u2             –                 –
    u3.node        nd_next           preserve the previous CREF

Even though the member name is nd_next, the value it actually holds is the “previous (prev)” CREF. Taking the following program as an example, I’ll explain how it actually looks.

    class A
      class B
        class C
          nil   # (A)
        end
      end
    end

Fig.4 shows how ruby_cref is when evaluating the code (A).

Fig.4: ruby_cref

However, illustrating this image every time is tedious and its intention becomes unclear. Therefore, the same state as Fig.4 will be expressed in the following notation:

    A ← B ← C
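Ruby itself lets a program look at its current class nesting through Module.nesting, which lists the enclosing classes from the innermost outward. This is offered only as a Ruby-level view of the same idea, not as a description of how ruby_cref is read internally. The constant NESTING is a name made up for the example.

```ruby
class A
  class B
    class C
      NESTING = Module.nesting   # captured at the position marked (A) in the text
    end
  end
end

p A::B::C::NESTING
```

Read from right to left, the resulting array corresponds to the notation A ← B ← C.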

PUSH / POP Macros

For each stack frame struct, macros to push and pop are available. For instance, PUSH_FRAME and POP_FRAME for FRAME. Because these will appear in a moment, I’ll explain their usage and content then.

The other states

While they are not as important as the main stacks, the evaluator of ruby has several other states. This is a brief list of them. However, some of them are not stacks. Actually, most of them are not.

    Variable Name       Type     Meaning
    scope_vmode         int      the default visibility when a method is defined
    ruby_in_eval        int      whether or not parsing after the evaluation is started
    ruby_current_node   NODE*    the file name and the line number of what is currently being evaluated
    ruby_safe_level     int      $SAFE
    ruby_errinfo        VALUE    the exception currently being handled
    ruby_wrapper        VALUE    the wrapper module to isolate the environment

Module Definition

The class statement, the module statement and the singleton class definition statement are all implemented in similar ways.

Because seeing similar things three times in a row is not interesting, this time let’s examine the module statement, which has the fewest elements (and thus is the simplest).

First of all, what is the module statement? Or, put the other way around, what should happen in a module statement? Let’s try to list several features:

- a new module object should be created
- the created module should be self
- it should have an independent local variable scope
- if you write a constant assignment, a constant should be defined on the module
- if you write a class variable assignment, a class variable should be defined on the module
- if you write a def statement, a method should be defined on the module

What is the way to achieve these things? … is the point of this section. Now, let’s start to look at the code.

Investigation

▼ The Source Program

module M
  a = 1
end

▼ Its Syntax Tree

NODE_MODULE
nd_cname = 9621 (M)
nd_body:
    NODE_SCOPE
    nd_rval = (null)
    nd_tbl = 3 [ _ ~ a ]
    nd_next:
        NODE_LASGN
        nd_cnt = 2
        nd_value:
            NODE_LIT
            nd_lit = 1:Fixnum

nd_cname seems to be the module name. cname is probably either Const NAME or Class NAME. I dumped several things and found that there is always a NODE_SCOPE in nd_body. Since its member nd_tbl holds a local variable table and its name is similar to struct SCOPE, it appears certain that this NODE_SCOPE plays an important role in creating a local variable scope.

NODE_MODULE

Let’s examine the handler of NODE_MODULE in rb_eval(). The parts that are not close to the main line, such as rb_raise() and error handling, were cut drastically. After 200 pages of this kind of cutting, it has already become unnecessary to show the original code.

▼ rb_eval() − NODE_MODULE (simplified)

case NODE_MODULE:
  {
      VALUE module;

      if (rb_const_defined_at(ruby_class, node->nd_cname)) {
          /* just obtain the already created module */
          module = rb_const_get(ruby_class, node->nd_cname);
      }
      else {
          /* create a new module and set it into the constant */
          module = rb_define_module_id(node->nd_cname);
          rb_const_set(ruby_cbase, node->nd_cname, module);
          rb_set_class_path(module, ruby_class, rb_id2name(node->nd_cname));
      }

      result = module_setup(module, node->nd_body);
  }
  break;

First, note that the module is nested and defined under (the module held by) ruby_class. We can understand this from the fact that it calls rb_const_xxxx() on ruby_class. ruby_cbase appears just once, but it is usually identical to ruby_class, so we can ignore it. Even if they are different, it rarely causes a problem.

The first half branches with if because it needs to check whether the module has already been defined. This is because, in Ruby, we can make “additional” definitions on the same module any number of times.

module M
  def a    # M#a is defined
  end
end
module M   # add a definition (not re-defining or overwriting)
  def b    # M#b is defined
  end
end

In this program, the two methods, a and b, will be defined on the module M.

In this case, at the second definition of M the module M has already been set to the constant, so just obtaining and using it is sufficient. If the constant M does not exist yet, it means this is the first definition and the module is created (by rb_define_module_id()).
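The “obtain it if it already exists, otherwise create it” behavior can also be observed from the Ruby side. The following is only a small sketch; the module name Greeter and the constants FIRST_ID and SECOND_ID are hypothetical names introduced for this illustration.

```ruby
# Hypothetical module used only for this illustration.
module Greeter
  FIRST_ID = object_id     # self is the module being defined
  def hello
    "hello"
  end
end

module Greeter             # the constant Greeter already exists, so it is reused
  SECOND_ID = object_id    # the same module object is self again
  def bye
    "bye"
  end
end

# Both methods end up on the one module, and both bodies saw the same object.
Greeter.instance_methods(false).sort   # hello and bye are both defined
Greeter::FIRST_ID == Greeter::SECOND_ID
```

Because the second statement only adds to the existing module, nothing defined by the first statement is lost.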

Lastly, module_setup() is the function executing the body of a module statement. Not only module statements but also class statements and singleton class statements are executed by module_setup(). This is the reason why I said “all of these three types of statements are similar things”. For now, I’d like you to note that node->nd_body (NODE_SCOPE) is passed as an argument.

module_setup

For the module, class, and singleton class statements, module_setup() executes their bodies. Finally, the Ruby stack manipulations will appear in large amounts.

▼ module_setup()

static VALUE
module_setup(module, n)
    VALUE module;
    NODE *n;
{
    NODE * volatile node = n;
    int state;
    struct FRAME frame;
    VALUE result;               /* OK */
    TMP_PROTECT;

    frame = *ruby_frame;
    frame.tmp = ruby_frame;
    ruby_frame = &frame;

    PUSH_CLASS();
    ruby_class = module;
    PUSH_SCOPE();
    PUSH_VARS();
    /* (A) ruby_scope->local_vars initialization */
    if (node->nd_tbl) {
        VALUE *vars = TMP_ALLOC(node->nd_tbl[0]+1);
        *vars++ = (VALUE)node;
        ruby_scope->local_vars = vars;
        rb_mem_clear(ruby_scope->local_vars, node->nd_tbl[0]);
        ruby_scope->local_tbl = node->nd_tbl;
    }
    else {
        ruby_scope->local_vars = 0;
        ruby_scope->local_tbl  = 0;
    }

    PUSH_CREF(module);
    ruby_frame->cbase = (VALUE)ruby_cref;
    PUSH_TAG(PROT_NONE);
    if ((state = EXEC_TAG()) == 0) {
        if (trace_func) {
            call_trace_func("class", ruby_current_node, ruby_class,
                            ruby_frame->last_func,
                            ruby_frame->last_class);
        }
        result = rb_eval(ruby_class, node->nd_next);
    }
    POP_TAG();
    POP_CREF();
    POP_VARS();
    POP_SCOPE();
    POP_CLASS();

    ruby_frame = frame.tmp;
    if (trace_func) {
        call_trace_func("end", ruby_last_node, 0,
                        ruby_frame->last_func, ruby_frame->last_class);
    }
    if (state) JUMP_TAG(state);

    return result;
}

(eval.c)

This is too big to read all in one gulp. Let’s cut the parts that seem unnecessary.

First, the parts around trace_func can be deleted unconditionally.

We can see the idioms related to tags. Let’s simplify them by expressing them with Ruby’s ensure.

Immediately after the start of the function, the argument n is purposefully assigned to the local variable node. Since volatile is attached to node and it is never assigned again after that, this is there to prevent it from being garbage collected. If we assume that the argument was named node from the beginning, the meaning does not change.

In the first half of the function, there is a part that manipulates ruby_frame in a complicated way. It is obviously paired with the ruby_frame = frame.tmp in the last half. We’ll focus on this part later, but for the time being it can be considered a push and pop of ruby_frame.

Plus, it seems that the code at (A) can be summarized, as its comment says, as the initialization of ruby_scope->local_vars. This will be discussed later.

Consequently, it can be summarized as follows:

▼ module_setup (simplified)

static VALUE
module_setup(module, node)
    VALUE module;
    NODE *node;
{
    struct FRAME frame;
    VALUE result;

    push FRAME
    PUSH_CLASS();
    ruby_class = module;
    PUSH_SCOPE();
    PUSH_VARS();
    ruby_scope->local_vars initialization
    PUSH_CREF(module);
    ruby_frame->cbase = (VALUE)ruby_cref;
    begin
        result = rb_eval(ruby_class, node->nd_next);
    ensure
        POP_TAG();
        POP_CREF();
        POP_VARS();
        POP_SCOPE();
        POP_CLASS();
        pop FRAME
    end
    return result;
}

It does rb_eval() with node->nd_next, so it’s certain that this is the code of the module body. The problems are the other parts. There are 5 points to see.

- What happens in PUSH_SCOPE() and PUSH_VARS()
- How the local variable space is allocated
- The effect of PUSH_CLASS
- The relationship between ruby_cref and ruby_frame->cbase
- What is done by manipulating ruby_frame

Let’s investigate them in order.

Creating a local variable scope

PUSH_SCOPE pushes a local variable space and PUSH_VARS() pushes a block local variable space, thus a new local variable scope is created by these two. Let’s examine the contents of these macros and what they do.

▼ PUSH_SCOPE() POP_SCOPE()

#define PUSH_SCOPE() do {               \
    volatile int _vmode = scope_vmode;  \
    struct SCOPE * volatile _old;       \
    NEWOBJ(_scope, struct SCOPE);       \
    OBJSETUP(_scope, 0, T_SCOPE);       \
    _scope->local_tbl = 0;              \
    _scope->local_vars = 0;             \
    _scope->flags = 0;                  \
    _old = ruby_scope;                  \
    ruby_scope = _scope;                \
    scope_vmode = SCOPE_PUBLIC

#define POP_SCOPE()                                      \
    if (ruby_scope->flags & SCOPE_DONT_RECYCLE) {        \
        if (_old) scope_dup(_old);                       \
    }                                                    \
    if (!(ruby_scope->flags & SCOPE_MALLOC)) {           \
        ruby_scope->local_vars = 0;                      \
        ruby_scope->local_tbl  = 0;                      \
        if (!(ruby_scope->flags & SCOPE_DONT_RECYCLE) && \
            ruby_scope != top_scope) {                   \
            rb_gc_force_recycle((VALUE)ruby_scope);      \
        }                                                \
    }                                                    \
    ruby_scope->flags |= SCOPE_NOSTACK;                  \
    ruby_scope = _old;                                   \
    scope_vmode = _vmode;                                \
} while (0)

(eval.c)

As with tags, SCOPEs also form a stack by being synchronized with the machine stack. What differs slightly is that the spaces of the stack frames are allocated in the heap; the machine stack is used only to create the stack structure (Fig. 5).

Fig. 5. The machine stack and the SCOPE Stack

Additionally, the flags like SCOPE_something that appear repeatedly in the macros cannot be explained until I have finished talking about the form in which each stack frame is remembered and about blocks. Thus, these will be discussed all at once in Chapter 16: Blocks.

Allocating the local variable space

As I have mentioned many times, the local variable scope is represented by struct SCOPE. But struct SCOPE is literally a “scope” and it does not have the real body to store local variables. To put it more precisely, it has a pointer to a space, but there is still no array at the place it points to. The following part of module_setup prepares the array.

▼ The preparation of the local variable slots

if (node->nd_tbl) {
    VALUE *vars = TMP_ALLOC(node->nd_tbl[0]+1);
    *vars++ = (VALUE)node;
    ruby_scope->local_vars = vars;
    rb_mem_clear(ruby_scope->local_vars, node->nd_tbl[0]);
    ruby_scope->local_tbl = node->nd_tbl;
}
else {
    ruby_scope->local_vars = 0;
    ruby_scope->local_tbl  = 0;
}

(eval.c)

The TMP_ALLOC() at the beginning will be described in the next section. Put briefly, it is “an alloca that is assured to allocate on the stack (therefore, we do not need to worry about GC)”.

node->nd_tbl in fact holds the local variable name table that appeared in Chapter 12: Syntax tree construction. This means that nd_tbl[0] contains the table size and the rest is an array of IDs. This table is directly stored in local_tbl of SCOPE, and local_vars is allocated to store the local variable values. Because they are confusing, it is a good idea to write comments such as “this is the variable name” and “this is the value”. The one with tbl is for the names.

Fig. 6. ruby_scope->local_vars

Where is this node used? I examined all accesses to the local_vars member but could not find an access to index -1 in eval.c. Expanding the range of files to investigate, I found the access in gc.c.

▼ rb_gc_mark_children() − T_SCOPE

case T_SCOPE:
  if (obj->as.scope.local_vars &&
        (obj->as.scope.flags & SCOPE_MALLOC)) {
      int n = obj->as.scope.local_tbl[0]+1;
      VALUE *vars = &obj->as.scope.local_vars[-1];

      while (n--) {
          rb_gc_mark(*vars);
          vars++;
      }
  }
  break;

(gc.c)

Apparently, this is a mechanism to protect node from GC. But why is it necessary to mark it here? node is purposefully stored in a volatile local variable, so it would not be garbage-collected during the execution of module_setup().

Honestly speaking, for a while I thought it might merely be a mistake, but it turned out to be actually very important. The issue is this line, a little further down:

▼ ruby_scope->local_tbl

ruby_scope->local_tbl = node->nd_tbl;

(eval.c)

The local variable name table prepared by the parser is used directly. When is this table freed? It is when the node is no longer referred to from anywhere. Then, when should node be freed? It is after the SCOPE assigned on this line has disappeared completely. Then, when is that?

A SCOPE sometimes persists longer than the statement that caused its creation. As will be discussed in Chapter 16: Blocks, if a Proc object is created, it refers to the SCOPE. Thus, even when module_setup() has finished, the SCOPE created there is not necessarily unused. That’s why it is not sufficient for node to be referred to only from (the stack frame of) module_setup(). It must be referred to “directly” from the SCOPE.

On the other hand, the volatile node local variable cannot be removed either. Without it, node would be floating in the air until it is assigned to local_vars.

But then, isn’t local_vars of SCOPE itself unsafe? TMP_ALLOC() is, as I mentioned, an allocation on the stack, so it becomes invalid when module_setup() ends. In fact, at the moment a Proc is created, the allocation method is abruptly switched to malloc(). Details will be described in Chapter 16: Blocks.

Lastly, rb_mem_clear() looks like zero-filling, but it actually fills an array of VALUE with Qnil (array.c). By this, all defined local variables are initialized to nil.
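The two facts above (the name table is fixed by the parser, and every slot starts as nil) are visible from Ruby code as well. This is only a sketch observed from the Ruby side, not a view into local_vars itself; the method names unassigned_local and local_names are hypothetical.

```ruby
# The assignment below is never executed, yet the parser has already
# registered `value` in the local variable table of this scope.
def unassigned_local
  if false
    value = 1
  end
  value           # reads the slot, which still holds its initial nil
end

# local_variables lists the names the parser recorded for the scope.
def local_names
  first = 1
  second = 2
  local_variables
end

unassigned_local  # nil, not a NameError
local_names       # the names first and second
```

If the slot were not pre-filled, reading value here would have to fail or return garbage; instead it quietly yields nil.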

TMP_ALLOC

Next, let’s read TMP_ALLOC, which allocates the local variable space. This macro is actually paired with the TMP_PROTECT that sits silently at the beginning of module_setup(). Its typical usage is this:

VALUE *ptr;
TMP_PROTECT;

ptr = TMP_ALLOC(size);

The reason why TMP_PROTECT is placed among the local variable definitions is that… Let’s see its definition.

▼ TMP_ALLOC()

#ifdef C_ALLOCA
# define TMP_PROTECT NODE * volatile tmp__protect_tmp=0
# define TMP_ALLOC(n)                                                \
    (tmp__protect_tmp = rb_node_newnode(NODE_ALLOCA,                 \
                             ALLOC_N(VALUE,n), tmp__protect_tmp, n), \
     (void*)tmp__protect_tmp->nd_head)
#else
# define TMP_PROTECT typedef int foobazzz
# define TMP_ALLOC(n) ALLOCA_N(VALUE,n)
#endif

(eval.c)

…it is because it defines a local variable.

As described in Chapter 5: Garbage collection, in the environment of #ifdef C_ALLOCA (that is, where the native alloca() does not exist), malloc() is used to emulate alloca(). However, the arguments of a method are obviously VALUEs, and the GC could not find a VALUE if it were stored in the heap. Therefore, it is arranged so that the GC can find it through a NODE.

Fig. 7. anchor the space to the stack through NODE

On the contrary, in an environment with the true alloca(), we can naturally use alloca() and there is no need for TMP_PROTECT. Thus, an arbitrary harmless statement is written instead.

By the way, why do they want to use alloca() so badly? It is merely because “alloca() is faster than malloc()”, they said. One might think that such a tiny difference is not worth caring about, but because the core of the evaluator is the biggest bottleneck of ruby, …the same as above.

Changing the place to define methods on

The value of the stack ruby_class is the place to define a method on at the time. Conversely, if one pushes a value to ruby_class, it changes the class to define a method on. This is exactly what is necessary for a class statement. Therefore, it is also necessary to do PUSH_CLASS() in module_setup(). Here is the code for it:

PUSH_CLASS();
ruby_class = module;
     :
     :
POP_CLASS();

Why is there an assignment to ruby_class after doing PUSH_CLASS()? We can understand it surprisingly easily by looking at the definition.

▼ PUSH_CLASS() POP_CLASS()

#define PUSH_CLASS() do { \
    VALUE _class = ruby_class

#define POP_CLASS() ruby_class = _class; \
} while (0)

(eval.c)

Because ruby_class is not modified even though PUSH_CLASS is done, nothing is actually pushed until the value is set by hand. Thus, these two are closer to “save and restore” than to “push and pop”.
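The “save and restore” idiom can be modeled in a few lines of Ruby. This is only a sketch under stated assumptions: the global $current_class stands in for ruby_class, and with_class and class_trace are hypothetical helpers that do not exist in ruby.

```ruby
# Hypothetical stand-in for the evaluator's global ruby_class.
$current_class = :Object

def with_class(klass)
  saved = $current_class      # like PUSH_CLASS(): only remembers the old value
  $current_class = klass      # the new value is set by hand
  yield
ensure
  $current_class = saved      # like POP_CLASS(): puts the old value back
end

# Records what the "current class" is at each nesting level.
def class_trace
  trace = []
  with_class(:M) do
    trace << $current_class
    with_class(:N) { trace << $current_class }
    trace << $current_class
  end
  trace << $current_class
  trace
end

class_trace   # [:M, :N, :M, :Object]
```

The saved value lives in a local variable of each call, just as _class lives in the block opened by the macro, so nesting works without any explicit stack object.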

You might think that it could be a cleaner macro if a class were passed as the argument of PUSH_CLASS()… That is absolutely true, but because there are some places where we cannot obtain the class before pushing, it is done this way.

Nesting Classes

ruby_cref represents the class nesting information at runtime. Therefore, it is naturally predicted that ruby_cref will be pushed on module statements and class statements. In module_setup(), it is pushed as follows:

PUSH_CREF(module);
ruby_frame->cbase = (VALUE)ruby_cref;
   :
   :
POP_CREF();

Here, module is the module being defined. Let’s also see the definitions of PUSH_CREF() and POP_CREF().

▼ PUSH_CREF() POP_CREF()

#define PUSH_CREF(c) \
    ruby_cref = rb_node_newnode(NODE_CREF,(c),0,ruby_cref)
#define POP_CREF() ruby_cref = ruby_cref->nd_next

(eval.c)

Unlike PUSH_SCOPE and the like, there are no complicated techniques here and it is very easy to deal with. It would be no good if there were nothing simple at all.

The problem that remains unsolved is the meaning of ruby_frame->cbase. It is the information used to refer to a class variable or a constant from the current FRAME. Details will be discussed in the last section of this chapter.
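Since PUSH_CREF() and POP_CREF() are just linked-list operations, they are easy to mimic. The following is a rough model, not the real data structure: Cref, push_cref, pop_cref and nesting are hypothetical names, and the bottom of the real chain (the top-level entry) is simplified away.

```ruby
# Hypothetical model: Cref mirrors the nd_clss / nd_next members of NODE_CREF.
Cref = Struct.new(:clss, :next_cref)

def push_cref(cref, klass)
  Cref.new(klass, cref)       # the new head links to the previous CREF
end

def pop_cref(cref)
  cref.next_cref              # popping is just following the link back
end

# Renders a chain in the notation used in this chapter, outermost first.
def nesting(cref)
  names = []
  while cref
    names << cref.clss
    cref = cref.next_cref
  end
  names.reverse.join(" <- ")
end

a = push_cref(nil, "A")
b = push_cref(a, "B")
c = push_cref(b, "C")
nesting(c)             # "A <- B <- C"
nesting(pop_cref(c))   # "A <- B"
```

Note that pushing never modifies an existing element, so anything that kept a reference to c still sees the whole chain after the pop. This property becomes important when methods remember the ruby_cref of their definition.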

Replacing frames

Lastly, let’s focus on the manipulation of ruby_frame. The first thing is its definition:

struct FRAME frame;

It is not a pointer. This means that the entire FRAME is allocated on the stack. Both the management structure of the Ruby stack and the local variable space are on the stack, but in the case of FRAME the entire struct is stored on the stack. The extreme consumption of the machine stack by ruby is the fruit of these “small techniques” piling up.

Next, let’s look at where several things are done with frame.

frame = *ruby_frame;      /* copy the entire struct */
frame.tmp = ruby_frame;   /* protect the original FRAME from GC */
ruby_frame = &frame;      /* replace ruby_frame */
       :
       :
ruby_frame = frame.tmp;   /* restore */

That is, ruby_frame seems to be temporarily replaced (not pushed). Why does it do such a thing?

I described FRAME as being “pushed on method calls”, but to be more precise, it is the stack frame that represents “the main environment to execute a Ruby program”. You can infer this from, for instance, ruby_frame->cbase, which appeared previously. last_func, which is “the last called method name”, also suggests it.

Then, why is FRAME not straightforwardly pushed? It is because this is a place where pushing a FRAME is not allowed. One would like to push a FRAME here, but if a FRAME is pushed, it will appear in the backtrace of the program when an exception occurs. A backtrace is what is displayed like the following:

% ruby t.rb
t.rb:11:in `c': some error occured (ArgumentError)
        from t.rb:7:in `b'
        from t.rb:3:in `a'
        from t.rb:14

But module statements and class statements are not method calls, so it is not desirable for them to appear here. That’s why the frame is “replaced” instead of “pushed”.

The method definition

As the next topic after module definitions, let’s look at method definitions.

Investigation

▼ The Source Program

def m(a, b, c)
  nil
end

▼ Its Syntax Tree

NODE_DEFN
nd_mid  = 9617 (m)
nd_noex = 2 (NOEX_PRIVATE)
nd_defn:
    NODE_SCOPE
    nd_rval = (null)
    nd_tbl = 5 [ _ ~ a b c ]
    nd_next:
        NODE_ARGS
        nd_cnt  = 3
        nd_rest = -1
        nd_opt = (null)
        NODE_NIL

I dumped several things and found that there is always a NODE_SCOPE in nd_defn. NODE_SCOPE is, as we’ve seen with the module statement, the node that stores the information needed to push a local variable scope.

NODE_DEFN

Subsequently, we will examine the corresponding code of rb_eval(). This part contains a lot of error handling and is tedious, so it is all omitted again. The way of omitting is as usual: deleting every part that directly or indirectly calls rb_raise(), rb_warn(), or rb_warning().

▼ rb_eval() − NODE_DEFN (simplified)

NODE *defn;
int noex;

if (SCOPE_TEST(SCOPE_PRIVATE) || node->nd_mid == init) {
    noex = NOEX_PRIVATE;                 (A)
}
else if (SCOPE_TEST(SCOPE_PROTECTED)) {
    noex = NOEX_PROTECTED;               (B)
}
else if (ruby_class == rb_cObject) {
    noex =  node->nd_noex;               (C)
}
else {
    noex = NOEX_PUBLIC;                  (D)
}

defn = copy_node_scope(node->nd_defn, ruby_cref);
rb_add_method(ruby_class, node->nd_mid, defn, noex);
result = Qnil;

In the first half, there are words like private and protected, so it is probably related to visibility. noex, which is used in the names of flags, seems to be NOde EXposure. Let’s examine the if statements in order.

(A) SCOPE_TEST() is a macro to check whether the flag given as its argument is set in scope_vmode. Therefore, the first half of this condition means “is it a private scope?”. The last half means “it is private if this is defining initialize”. The method initialize, which initializes an object, unconditionally becomes private.

(B) It is protected if the scope is protected (not surprisingly). My feeling is that there are few cases where protected is required in Ruby.

(C) This is a bug. I found this just before the submission of this book, so I couldn’t fix it beforehand. In the latest code this part has probably already been removed. The original intention was to force the methods defined at top level to be private.

(D) If none of the above conditions hold, it is public.
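The outcome of branches (A), (B) and (D) can be checked from Ruby through reflection. This is a small sketch; the class Account and its methods are hypothetical and exist only for this illustration. Branch (C) is left out because it depends on how top-level code is run.

```ruby
class Account                      # hypothetical class for illustration
  def initialize(balance)          # (A) initialize always becomes private
    @balance = balance
  end

  def balance                      # (D) nothing special applies: public
    @balance
  end

  protected

  def raw_balance                  # (B) defined while the scope is protected
    @balance
  end

  private

  def audit                        # (A) defined while the scope is private
    "audited"
  end
end

Account.private_instance_methods(false)     # contains initialize and audit
Account.protected_instance_methods(false)   # contains raw_balance
Account.public_instance_methods(false)      # contains balance
```

Writing private or protected without arguments corresponds to changing the current scope's default visibility, which is what scope_vmode holds in the evaluator described here.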

Actually, there is nothing worth caring about up to here. The important part is the next two lines.

defn = copy_node_scope(node->nd_defn, ruby_cref);
rb_add_method(ruby_class, node->nd_mid, defn, noex);

copy_node_scope() is a function to copy (only) the NODE_SCOPE attached to the top of the method body. It is important that ruby_cref is passed… but details will be described soon.

After copying, the definition is finished by adding it with rb_add_method(). The place to define it on is of course ruby_class.

copy_node_scope()

copy_node_scope() is called from only two places: the method definition (NODE_DEFN) and the singleton method definition (NODE_DEFS) in rb_eval(). Therefore, looking at these two is sufficient to see how it is used. Plus, the usages at these two places are almost the same.

▼ copy_node_scope()

static NODE*
copy_node_scope(node, rval)
    NODE *node;
    VALUE rval;
{
    NODE *copy = rb_node_newnode(NODE_SCOPE,0,rval,node->nd_next);

    if (node->nd_tbl) {
        copy->nd_tbl = ALLOC_N(ID, node->nd_tbl[0]+1);
        MEMCPY(copy->nd_tbl, node->nd_tbl, ID, node->nd_tbl[0]+1);
    }
    else {
        copy->nd_tbl = 0;
    }
    return copy;
}

(eval.c)

I mentioned that the argument rval is the class nesting information (ruby_cref) at the time the method is defined. Apparently, it is named rval because it will be set to nd_rval.

The main if statement copies the nd_tbl of NODE_SCOPE. In other words, it is the local variable name table. The +1 at ALLOC_N is to additionally allocate the space for nd_tbl[0]. As we’ve seen in Part 2, nd_tbl[0] holds the local variable count, that is, “the actual length of nd_tbl − 1”.

To summarize, copy_node_scope() makes a copy of the NODE_SCOPE which is the header of the method body. However, nd_rval is additionally set, and it is the ruby_cref (the class nesting information) at the time the method is defined. This information will be used later when referring to constants or class variables.

rb_add_method()

The next thing is rb_add_method(), the function to register a method entry.

▼ rb_add_method()

void
rb_add_method(klass, mid, node, noex)
    VALUE klass;
    ID mid;
    NODE *node;
    int noex;
{
    NODE *body;

    if (NIL_P(klass)) klass = rb_cObject;
    if (ruby_safe_level >= 4 &&
        (klass == rb_cObject || !OBJ_TAINTED(klass))) {
        rb_raise(rb_eSecurityError, "Insecure: can't define method");
    }
    if (OBJ_FROZEN(klass)) rb_error_frozen("class/module");
    rb_clear_cache_by_id(mid);
    body = NEW_METHOD(node, noex);
    st_insert(RCLASS(klass)->m_tbl, mid, body);
}

(eval.c)

NEW_METHOD() is a macro to create a NODE. rb_clear_cache_by_id() is a function to manipulate the method cache. This will be explained in the next chapter, “Method”.

Let’s look at the syntax tree which is eventually stored in m_tbl of a class. I prepared nodedump-method for this kind of purpose. (nodedump-method: comes with nodedump. nodedump is tools/nodedump.tar.gz on the attached CD-ROM)

% ruby -e '
class C
  def m(a)
    puts "ok"
  end
end
require "nodedump-method"
NodeDump.dump C, :m        # dump the method m of the class C
'
NODE_METHOD
nd_noex = 0 (NOEX_PUBLIC)
nd_cnt = 0
nd_body:
    NODE_SCOPE
    nd_rval = Object <- C
    nd_tbl = 3 [ _ ~ a ]
    nd_next:
        NODE_ARGS
        nd_cnt  = 1
        nd_rest = -1
        nd_opt = (null)

**unhandled**

There is NODE_METHOD at the top, followed by the NODE_SCOPE previously copied by copy_node_scope(). These probably represent the header of a method. I dumped several things and there is no NODE_SCOPE for methods defined in C, so it seems to indicate that the method is defined at the Ruby level.

Additionally, the parameter variable name (a) appears in nd_tbl of NODE_SCOPE. I mentioned that parameter variables are equivalent to local variables, and this briefly shows it.

I’ll omit the explanation of NODE_ARGS here because it will be described in the next chapter, “Method”.

Lastly, the nd_cnt of NODE_METHOD is not something we need to care about this time. It is used when dealing with alias.

Assignment and Reference

Come to think of it, most of the stacks are used to realize a variety of variables. We have learned how the various stacks are pushed; this time let’s examine the code that references variables.

Local variable

All the information necessary to assign or refer to local variables has already appeared, so you can probably predict how it works. There are the following two points:

- a local variable scope is an array pointed to by ruby_scope->local_vars
- the correspondence between each local variable name and each array index has already been resolved at the parser level

Therefore, the code for the local variable reference node NODE_LVAR is as follows:

▼ rb_eval() − NODE_LVAR

case NODE_LVAR:
  if (ruby_scope->local_vars == 0) {
      rb_bug("unexpected local variable");
  }
  result = ruby_scope->local_vars[node->nd_cnt];
  break;

(eval.c)

It goes without saying, but node->nd_cnt is the value that local_cnt() of the parser returns.

Constant

Complete Specification

In Chapter 6: Variables and constants, I talked about the form in which constants are stored and about their API. Constants belong to classes and are inherited in the same way as methods. As for their actual appearance, they are registered in iv_tbl of struct RClass together with instance variables and class variables.

The search path of a constant is first the outer classes, second the superclasses; however, rb_const_get() only searches the superclasses. Why? To answer this question, I need to reveal the last specification of constants. Take a look at the following code:

class A
  C = 5
  def A.new
    puts C
    super
  end
end

A.new is a singleton method of A, so its class is the singleton class (A). If it were interpreted by following the rule, it could not obtain the constant C, which belongs to A.

But because it is written so close, wanting to refer to the constant C is human nature. Therefore, such a reference is possible in Ruby. It can be said that this specification reflects the characteristic of Ruby that “the emphasis is on the appearance of the source code”.

If I generalize this rule, when referring to a constant from inside a method, the search for the constant of the outer classes starts from the place where the method definition is “written”.

And “the class where the method is written” depends on its context, so it cannot be handled without information from both the parser and the evaluator. This is why rb_const_get() did not have the search path of the outer classes.
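That the search starts from where the method is “written” can be demonstrated with a short experiment. This is a sketch only; Shape, Palette, Brush and their constants are hypothetical names made up for the illustration.

```ruby
class Shape
  SIDES = 5
  def Shape.sides
    SIDES            # found because the method is written inside class Shape
  end
end

module Palette
  DEFAULT = "red"
  class Brush
    def color
      DEFAULT        # Palette is an outer class of this method body
    end
  end
end

# The same class is reopened, but this time Palette does not enclose the
# definition in the source text, so it is not part of the search path.
class Palette::Brush
  def color_compact
    DEFAULT
  end
end

Shape.sides                 # 5
Palette::Brush.new.color    # "red"
begin
  Palette::Brush.new.color_compact
rescue NameError => e
  e.class                   # NameError: DEFAULT is not reachable from here
end
```

Both color and color_compact belong to the same class, so the difference can only come from information recorded at definition time, which is what the ruby_cref passed to copy_node_scope() provides.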

cbase

Then, let’s look at the code to refer to constants, including the outer classes. Ordinary constant references, to which :: is not attached, become NODE_CONST in the syntax tree. The corresponding code in rb_eval() is…

▼ rb_eval() − NODE_CONST

case NODE_CONST:
  result = ev_const_get(RNODE(ruby_frame->cbase), node->nd_vid, self);
  break;

(eval.c)

First, nd_vid appears to be Variable ID and it probably means a constant name. And ruby_frame->cbase is “the class where the method definition is written”. The value will be set when invoking the method, so the code to set it has not appeared yet. The place the value comes from is the nd_rval that appeared in copy_node_scope() of the method definition. I’d like you to go back a little and check that this member holds the ruby_cref at the time the method is defined.

This means, first, the ruby_cref link is built when defining a class or a module. Assume that the just defined class is C (Fig. 8-1).

Defining the method m (this is probably C#m) here, the current ruby_cref is memorized by the method entry (Fig. 8-2).

After that, when the class statement has finished, ruby_cref would start to point to another node, but node->nd_rval naturally continues to point to the same thing (Fig. 8-3).

Then, when invoking the method C#m, node->nd_rval is fetched and inserted into the just pushed ruby_frame->cbase (Fig. 8-4).

…This is the mechanism. Complicated.

Fig. 8. CREF Transfer

ev_const_get()

Now, let’s go back to the code of NODE_CONST. Since only ev_const_get() is left, we’ll look at it.

▼ ev_const_get()

static VALUE
ev_const_get(cref, id, self)
    NODE *cref;
    ID id;
    VALUE self;
{
    NODE *cbase = cref;
    VALUE result;

    while (cbase && cbase->nd_next) {
        VALUE klass = cbase->nd_clss;

        if (NIL_P(klass)) return rb_const_get(CLASS_OF(self), id);
        if (RCLASS(klass)->iv_tbl &&
            st_lookup(RCLASS(klass)->iv_tbl, id, &result)) {
            return result;
        }
        cbase = cbase->nd_next;
    }
    return rb_const_get(cref->nd_clss, id);
}

(eval.c)

((According to the errata, the description of ev_const_get() was wrong. I omit this part for now.))

Class variable

What class variables refer to is also ruby_cref. Needless to say, unlike constants, which search over the outer classes one after another, a class variable uses only the first element. Let’s look at the code of NODE_CVAR, which is the node to refer to a class variable.

What is cvar_cbase()? Since cbase is in its name, it is probably related to ruby_frame->cbase, but how do they differ? Let’s look at it.

▼ cvar_cbase()

static VALUE
cvar_cbase()
{
    NODE *cref = RNODE(ruby_frame->cbase);

    while (cref && cref->nd_next &&
           FL_TEST(cref->nd_clss, FL_SINGLETON)) {
        cref = cref->nd_next;
        if (!cref->nd_next) {
            rb_warn("class variable access from toplevel singleton method");
        }
    }
    return cref->nd_clss;
}

(eval.c)

It seems to traverse cbase up to the first class that is not a singleton class. This feature was added to cope with the following kind of code:

class C                           class C
  @@cvar = 1                        @@cvar = 1
  class << C                        def C.m
    def m                             @@cvar
      @@cvar                        end
    end                             def C.m2
    def m2                            @@cvar + @@cvar
      @@cvar + @@cvar               end
    end                           end
  end
end

Both the left and right code end up defining the same methods, but if you write it the way of the right side, it is tedious to write the class name repeatedly as the number of methods increases. Therefore, when defining multiple singleton methods, many people choose to write it the way of the left side, using the singleton class definition statement to bundle them.

However, these two differ in the value of ruby_cref. With the singleton class definition, ruby_cref=(C), and with the singleton methods defined separately, ruby_cref=C. This could make class variables refer to different places, which is not convenient.

Therefore, assuming it is rare to define class variables on singleton classes, it skips over singleton classes. This again reflects that the emphasis is more on usability than on consistency.
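The effect of skipping singleton classes can be confirmed by writing both styles in one class. This is a small sketch; the class Counter and the method names bundled and separate are hypothetical.

```ruby
class Counter                    # hypothetical class for illustration
  @@count = 1

  class << Counter               # bundled style: the innermost cref is (Counter)
    def bundled
      @@count                    # still resolves to Counter's class variable
    end
  end

  def Counter.separate           # separate style: the innermost cref is Counter
    @@count
  end
end

Counter.bundled    # 1
Counter.separate   # 1
```

Both methods read the same variable, so switching between the two styles does not change which @@count is meant.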

And, in the case of a constant reference, since it searches all of the outer classes, C is included in the search path either way, so there is no problem. Plus, as for assignment, since it cannot be written inside methods in the first place, it is not relevant either.

Multiple Assignment

If someone asked “where is the most complicated specification of Ruby?”, I would instantly answer that it is multiple assignment. It is even impossible to understand the big picture of multiple assignment, and I have an account of why I think so. In short, the specification of multiple assignment was defined without even a slight intention of making the whole specification well-organized. The basis of the specification is always “the behavior which seems convenient in several typical use cases”. This can be said about the whole of Ruby, but particularly about multiple assignment.

Then, how can we avoid getting lost in the jungle of code? This is similar to reading the stateful scanner: the answer is not to try to see the whole picture. There is no whole picture in the first place, so we could not see it anyway. Cutting the code into blocks, such as “this code is written for this specification, that code is written for that specification”, and understanding the correspondences one by one in this manner is the only way.

But this book is for understanding the overall structure of ruby and is not “Advanced Ruby Programming”. Thus, dealing with very tiny things is not fruitful. So here, we only think about the basic structure of multiple assignment and the very simple “multiple-to-multiple” case.

First, following the standard procedure, let’s start with the syntax tree.

▼ The Source Program

a, b = 7, 8

▼ Its Syntax Tree

NODE_MASGN
nd_head:
    NODE_ARRAY [
    0:
        NODE_LASGN
        nd_cnt = 2
        nd_value:
    1:
        NODE_LASGN
        nd_cnt = 3
        nd_value:
    ]
nd_value:
    NODE_REXPAND
    nd_head:
        NODE_ARRAY [
        0:
            NODE_LIT
            nd_lit = 7:Fixnum
        1:
            NODE_LIT
            nd_lit = 8:Fixnum
        ]

Both the left-hand and right-hand sides are lists of NODE_ARRAY, and there is additionally a NODE_REXPAND on the right side. REXPAND may be “Right value EXPAND”. We are curious about what this node is doing. Let’s see.

▼ rb_eval() − NODE_REXPAND

case NODE_REXPAND:
  result = avalue_to_svalue(rb_eval(self, node->nd_head));
  break;

(eval.c)

You can ignore avalue_to_svalue(). NODE_ARRAY is evaluated by rb_eval() (because it is the node of the array literal), so it is turned into a Ruby array and returned. So, before the left-hand side is handled, everything on the right-hand side is evaluated. This enables even the following code:

a, b = b, a    # swap variables in one line

Let’s look at NODE_MASGN, on the left-hand side.

▼ rb_eval() − NODE_MASGN

case NODE_MASGN:
  result = massign(self, node, rb_eval(self, node->nd_value),0);
  break;

(eval.c)

Only the evaluation of the right-hand side is done here; the rest is delegated to massign().

massign()

▼massi……

static VALUE
massign(self, node, val, pcall)
    VALUE self;
    NODE *node;
    VALUE val;
    int pcall;
{

(eval.c)

I’m sorry this is halfway, but I’d like you to stop and pay attention to the 4th argument. pcall is Proc CALL; it indicates whether or not the function is used to call a Proc object. Between Proc calls and the others there is a little difference in the strictness of the check of multiple assignments, so a flag is received to check it. Obviously, the value is either 0 or 1.

Then, I’d like you to look at the previous code calling massign(): it was pcall=0. Therefore, it is probably fine to assume pcall=0 for the time being and substitute that value for the variable. That is, when there is an argument like pcall that slightly changes the behavior, we always need to consider two patterns of scenarios, which is really cumbersome. Even if there is actually only one function massign(), thinking as if there were two functions, one for pcall=0 and one for pcall=1, makes it much simpler to read.

When writing a program we must avoid duplication as much as possible, but this principle does not apply when reading. If the patterns are limited, copying the code and letting it be redundant is rather the right approach. There are the phrases “optimize for speed” and “optimize for code size”; in this case we’ll “optimize for readability”.

So, assuming pcall=0 and cutting the code as much as possible, the final appearance is as follows:

▼ massign() (simplified)

static VALUE
massign(self, node, val  /* , pcall=0 */)
    VALUE self;
    NODE *node;
    VALUE val;
{
    NODE *list;
    long i = 0, len;

    val = svalue_to_mvalue(val);
    len = RARRAY(val)->len;
    list = node->nd_head;
    /* (A) */
    for (i=0; list && i<len; i++) {
        assign(self, list->nd_head, RARRAY(val)->ptr[i], pcall);
        list = list->nd_next;
    }
    /* (B) */
    if (node->nd_args) {
        if (node->nd_args == (NODE*)-1) {
            /* no check for mere `*' */
        }
        else if (!list && i<len) {
            assign(self, node->nd_args,
                   rb_ary_new4(len-i, RARRAY(val)->ptr+i), pcall);
        }
        else {
            assign(self, node->nd_args, rb_ary_new2(0), pcall);
        }
    }

    /* (C) */
    while (list) {
        i++;
        assign(self, list->nd_head, Qnil, pcall);
        list = list->nd_next;
    }
    return val;
}

val is the right-hand side value. There is also a suspicious conversion called svalue_to_mvalue(); since mvalue_to_svalue() appeared previously and svalue_to_mvalue() appears this time, you can infer that “it must be converting back”. ((errata: it was avalue_to_svalue() in the previous case. Therefore, it is hard to infer “converting back”, but you can ignore them anyway.)) Thus, both are deleted. In the next line, since it uses RARRAY(), you can infer that the right-hand side value is a Ruby Array. Meanwhile, the left-hand side is node->nd_head, which is assigned to the local variable list. This list is also a node (NODE_ARRAY).

We’ll look at the code clause by clause.

(A) assign is, as the name suggests, a function to perform a one-to-one assignment. Since the left-hand side is expressed by a node, if it is, for instance, NODE_IASGN (an assignment to an instance variable), it assigns with rb_ivar_set(). So, what is done here is aligning to whichever of list and val is shorter and doing one-to-one assignments. (Fig. 9)

Fig. 9. assign when corresponded

(B) If there are remainders on the right-hand side, turn them into a Ruby array and assign it to (the left-hand side expressed by) node->nd_args.

(C) If there are remainders on the left-hand side, assign nil to all of them.
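The three clauses can be transcribed into Ruby to see how they cooperate. This is a rough model of the simplified massign() under stated assumptions, not the real implementation: massign_model is a hypothetical function, targets stands for the left-hand side names, rest for the optional `*name`, values for the already evaluated right-hand side, and a Hash plays the role of the variables being assigned. The mere `*` case is not modeled.

```ruby
# Hypothetical model of the simplified massign().
def massign_model(targets, rest, values)
  env = {}
  i = 0
  # (A) one-to-one assignments while both sides still have elements
  while i < targets.length && i < values.length
    env[targets[i]] = values[i]
    i += 1
  end
  # (B) the rest variable takes the right-hand side remainder, or an empty array
  if rest
    left_exhausted = (i == targets.length)
    env[rest] = (left_exhausted && i < values.length) ? values[i..-1] : []
  end
  # (C) left-hand side remainders are assigned nil
  targets[i..-1].each { |name| env[name] = nil }
  env
end

massign_model([:a, :b], nil, [7, 8])       # a is 7, b is 8
massign_model([:a], :r, [1, 2, 3])         # a is 1, r is [2, 3]
massign_model([:a, :b, :c], nil, [1])      # a is 1, b and c are nil
```

These results agree with what `a, b = 7, 8`, `a, *r = 1, 2, 3` and `a, b, c = 1` produce in Ruby itself, which is a quick way to check a reading of the C code.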

By the way, the procedure of assuming pcall=0 and then cutting out code is very similar to the data flow analysis and constant folding used in the optimization phase of compilers. Therefore, we can probably automate it to some extent.

The original work is Copyright © 2002 - 2004 Minero AOKI. Translated by Vincent ISAMBART and Clifford Escobar CAOILE. This work is licensed under a

Creative Commons Attribution-NonCommercial-ShareAlike 2.5 License


Chapter 15: Methods

In this chapter, I’ll talk about method searching and invoking.

Searching methods

Terminology

In this chapter, both method calls and method definitions are discussed, and a wide variety of “arguments” will appear. Therefore, to avoid confusion, let’s strictly define the terms here:

m(a)          # a is a "normal argument"
m(*list)      # list is an "array argument"
m(&block)     # block is a "block argument"

def m(a)      # a is a "normal parameter"
def m(a=nil)  # a is an "option parameter", nil is "its default value".
def m(*rest)  # rest is a "rest parameter"
def m(&block) # block is a "block parameter"

In short, they are all “arguments” when passing and “parameters” when receiving, and each adjective is attached according to its type.

However, among the above, the “block arguments” and the “block parameters” will be discussed in the next chapter.

Investigation

▼ The Source Program

obj.method(7,8)

▼ Its Syntax Tree

NODE_CALL
nd_mid = 9049 (method)
nd_recv:
    NODE_VCALL
    nd_mid = 9617 (obj)
nd_args:
    NODE_ARRAY [
    0:
        NODE_LIT
        nd_lit = 7:Fixnum
    1:
        NODE_LIT
        nd_lit = 8:Fixnum
    ]

The node for a method call is NODE_CALL. Its nd_args holds the arguments as a list of NODE_ARRAY.

Additionally, as nodes for method calls, there are also NODE_FCALL and NODE_VCALL. NODE_FCALL is for the "method(args)" form, and NODE_VCALL corresponds to method calls in the bare "method" form, which looks the same as a local variable. FCALL and VCALL could actually be merged into one, but because there is no need to prepare arguments for VCALL, they are kept separate only to save both time and memory.

Now, let's look at the handler of NODE_CALL in rb_eval().

▼ rb_eval() − NODE_CALL

case NODE_CALL:
  {
      VALUE recv;
      int argc; VALUE *argv; /* used in SETUP_ARGS */
      TMP_PROTECT;

      BEGIN_CALLARGS;
      recv = rb_eval(self, node->nd_recv);
      SETUP_ARGS(node->nd_args);
      END_CALLARGS;

      SET_CURRENT_SOURCE();
      result = rb_call(CLASS_OF(recv),recv,node->nd_mid,argc,argv,0);
  }
  break;

(eval.c)

The problems are probably the three macros BEGIN_CALLARGS, SETUP_ARGS() and END_CALLARGS. Since rb_eval() seems to evaluate the receiver and rb_call() to invoke the method, we can roughly imagine that the arguments are evaluated in these three macros, but what is actually done? BEGIN_CALLARGS and END_CALLARGS are difficult to understand before talking about iterators, so they are explained in the next chapter, "Block". Here, let's investigate only SETUP_ARGS().

SETUP_ARGS()

SETUP_ARGS() is the macro that evaluates the arguments of a method.

Inside this macro, as the comment in the original program says, the variables named argc and argv are used, so they must be defined in advance. And because it uses TMP_ALLOC(), TMP_PROTECT must also be written in advance. Therefore, something like the following is boilerplate:

int argc; VALUE *argv;   /* used in SETUP_ARGS */
TMP_PROTECT;

SETUP_ARGS(args_node);

args_node is (the node representing) the arguments of the method. The macro evaluates it, turns the results into an array of values, and stores that array in argv. Let's look at it:

▼ SETUP_ARGS()

#define SETUP_ARGS(anode) do {\
    NODE *n = anode;\
    if (!n) {\                             no arguments
        argc = 0;\
        argv = 0;\
    }\
    else if (nd_type(n) == NODE_ARRAY) {\  only normal arguments
        argc=n->nd_alen;\
        if (argc > 0) {\   arguments present
            int i;\
            n = anode;\
            argv = TMP_ALLOC(argc);\
            for (i=0;i<argc;i++) {\
                argv[i] = rb_eval(self,n->nd_head);\
                n=n->nd_next;\
            }\
        }\
        else {\            no arguments
            argc = 0;\
            argv = 0;\
        }\
    }\
    else {\                                 both or one of an array argument
        VALUE args = rb_eval(self,n);\      and a block argument
        if (TYPE(args) != T_ARRAY)\
            args = rb_ary_to_ary(args);\
        argc = RARRAY(args)->len;\
        argv = ALLOCA_N(VALUE, argc);\
        MEMCPY(argv, RARRAY(args)->ptr, VALUE, argc);\
    }\
} while (0)

(eval.c)

This is a bit long, but since it clearly branches three ways, it is actually not so terrible. The meaning of each branch is written as comments.

We don't have to care about the case with no arguments; the remaining two branches do similar things. Roughly speaking, what they do consists of three steps:

1. allocate a space to store the arguments
2. evaluate the expressions of the arguments
3. copy the values into the variable space

If I write it in code (and tidy up a little), it becomes as follows.

/***** else if clause, argc!=0 *****/
int i;
n = anode;
argv = TMP_ALLOC(argc);                         /* 1 */
for (i = 0; i < argc; i++) {
    argv[i] = rb_eval(self, n->nd_head);        /* 2,3 */
    n = n->nd_next;
}

/***** else clause *****/
VALUE args = rb_eval(self, n);                  /* 2 */
if (TYPE(args) != T_ARRAY)
    args = rb_ary_to_ary(args);
argc = RARRAY(args)->len;
argv = ALLOCA_N(VALUE, argc);                   /* 1 */
MEMCPY(argv, RARRAY(args)->ptr, VALUE, argc);   /* 3 */

TMP_ALLOC() is used on the else if side, but ALLOCA_N(), which is ordinary alloca(), is used on the else side. Why? Isn't that dangerous in the C_ALLOCA environment, where alloca() is equivalent to malloc()?

The point is that "on the else side the values of the arguments are also stored in args". If I illustrate it, it looks like Figure 1.

Figure 1: Being in the heap is all right.

If at least one VALUE is on the stack, the others can be successively marked through it. This kind of VALUE acts as an anchor that ties the other VALUEs to the stack. Namely, it becomes an "anchor VALUE". On the else side, args is the anchor VALUE.

For your information, "anchor VALUE" is a term I have just coined.

rb_call()

SETUP_ARGS() was a bit of a digression. Let's go back to the main line. The function that invokes a method is rb_call(). In the original there is code such as raising exceptions when nothing could be found; as usual, I'll skip all of it.

▼ rb_call() (simplified)

static VALUE
rb_call(klass, recv, mid, argc, argv, scope)
    VALUE klass, recv;
    ID    mid;
    int argc;
    const VALUE *argv;
    int scope;
{
    NODE  *body;
    int    noex;
    ID     id = mid;
    struct cache_entry *ent;

    /* search over method cache */
    ent = cache + EXPR1(klass, mid);
    if (ent->mid == mid && ent->klass == klass) {
        /* cache hit */
        klass = ent->origin;
        id    = ent->mid0;
        noex  = ent->noex;
        body  = ent->method;
    }
    else {
        /* cache miss, searching step-by-step */
        body = rb_get_method_body(&klass, &id, &noex);
    }

    /* ... check the visibility ... */

    return rb_call0(klass, recv, mid, id,
                    argc, argv, body, noex & NOEX_UNDEF);
}

The basic way of searching methods was discussed in chapter 2, "Object": follow the superclasses and search each m_tbl. This is done by search_method().

That is certainly the principle, but when it comes to actual execution, if it looked up hash tables many times for every method call, it would be too slow. To improve this, in ruby, once a method is called, it is cached. If a method is called once, it is often called again immediately. This is known as an empirical fact, and this cache achieves a high hit rate.

What looks up the cache is the first half of rb_call(). With only this one line,

ent = cache + EXPR1(klass, mid);

the cache is searched. We'll examine its mechanism in detail later.

When the cache misses, the following rb_get_method_body() searches the class tree step by step and caches the result at the same time. Figure 2 shows the entire flow of searching.

Figure 2: Method Search

Method Cache

Next, let's examine the structure of the method cache in detail.

▼ Method Cache

#define CACHE_SIZE 0x800
#define CACHE_MASK 0x7ff
#define EXPR1(c,m) ((((c)>>3)^(m))&CACHE_MASK)

struct cache_entry {            /* method hash table. */
    ID mid;                     /* method's id */
    ID mid0;                    /* method's original id */
    VALUE klass;                /* receiver's class */
    VALUE origin;               /* where method defined  */
    NODE *method;
    int noex;
};

static struct cache_entry cache[CACHE_SIZE];

(eval.c)

To describe the mechanism briefly, it is a hash table. I mentioned that the principle of a hash table is to convert a table search into an array index. Three things are necessary to accomplish this: an array to store the data, a key, and a hash function.

First, the array here is an array of struct cache_entry. And the method is uniquely determined by only the class and the method name, so these two become the key of the hash calculation. The rest is done by creating a hash function that generates the index (0x000 ~ 0x7ff) of the cache array from the key. That is EXPR1(). Among its arguments, c is the class object and m is the method name (ID). (Figure 3)

Figure 3: Method Cache

However, EXPR1() is not a perfect hash function or anything, so a different method can generate the same index by coincidence. But because this is nothing more than a cache, conflicts do not cause a problem. They just slow performance down a little.
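To see why a conflict is harmless, here is a minimal sketch of a direct-mapped cache with the same index calculation as EXPR1(). The names entry_sketch, cache_sketch, cache_lookup and cache_store are hypothetical, an int stands in for the cached NODE*, and the entry is reduced to the two key members and one result. On a conflict the newer entry simply overwrites the older one, and the older key then misses and has to take the slow path again.

```c
#include <stdint.h>
#include <stddef.h>

#define CACHE_SIZE 0x800
#define CACHE_MASK 0x7ff
/* same shape as EXPR1(): mix the class address and the method ID */
#define CACHE_INDEX(c, m) ((((c) >> 3) ^ (m)) & CACHE_MASK)

struct entry_sketch {
    uintptr_t klass;   /* key part 1: receiver's class */
    uintptr_t mid;     /* key part 2: method ID */
    int       method;  /* cached result (a NODE* in ruby) */
};

/* zero-initialized; key (0, 0) is assumed never to be used */
static struct entry_sketch cache_sketch[CACHE_SIZE];

/* Returns 1 and fills *method on a hit, 0 on a miss. */
int cache_lookup(uintptr_t klass, uintptr_t mid, int *method)
{
    struct entry_sketch *ent = cache_sketch + CACHE_INDEX(klass, mid);

    if (ent->klass == klass && ent->mid == mid) {
        *method = ent->method;
        return 1;
    }
    return 0;
}

/* Stores a result, overwriting whatever shared the slot. */
void cache_store(uintptr_t klass, uintptr_t mid, int method)
{
    struct entry_sketch *ent = cache_sketch + CACHE_INDEX(klass, mid);

    ent->klass = klass;
    ent->mid = mid;
    ent->method = method;
}
```

Because both key members are compared on lookup, a conflicting entry can never be returned for the wrong method; the worst case is an extra miss.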

The effect of Method Cache

By the way, how effective is the method cache in practice? We cannot be convinced just by being told "it is known as…". Let's measure it ourselves.

Type                       Program               Hit Rate
generating LALR parser     racc ruby.y           99.9%
generating a mail thread   a mailer              99.1%
generating a document      rd2html rubyrefm.rd   97.8%

Surprisingly, in all three experiments the hit rate is more than 95%. This is awesome. Apparently, the effect of "it is known as…" is outstanding.

Invocation

rb_call0()

A lot has happened, and we have finally arrived at method invocation. However, this rb_call0() is huge. At more than 200 lines, it would come to five or six pages. Laying out the whole thing here would be disastrous, so let's look at it by dividing it into small portions. Starting with the outline:

▼ rb_call0() (Outline)

static VALUE
rb_call0(klass, recv, id, oid, argc, argv, body, nosuper)
    VALUE klass, recv;
    ID    id;
    ID    oid;
    int argc;                   /* OK */
    VALUE *argv;                /* OK */
    NODE *body;                 /* OK */
    int nosuper;
{
    NODE *b2;                   /* OK */
    volatile VALUE result = Qnil;
    int itr;
    static int tick;
    TMP_PROTECT;

    switch (ruby_iter->iter) {
      case ITER_PRE:
        itr = ITER_CUR;
        break;
      case ITER_CUR:
      default:
        itr = ITER_NOT;
        break;
    }

    if ((++tick & 0xff) == 0) {
        CHECK_INTS;             /* better than nothing */
        stack_check();
    }
    PUSH_ITER(itr);
    PUSH_FRAME();

    ruby_frame->last_func = id;
    ruby_frame->orig_func = oid;
    ruby_frame->last_class = nosuper?0:klass;
    ruby_frame->self = recv;
    ruby_frame->argc = argc;
    ruby_frame->argv = argv;

    switch (nd_type(body)) {
        /* ... main process ... */

      default:
        rb_bug("unknown node type %d", nd_type(body));
        break;
    }
    POP_FRAME();
    POP_ITER();
    return result;
}

(eval.c)

First, an ITER is pushed, and whether or not the method is an iterator is finally fixed. As its value is used by the PUSH_FRAME() that comes immediately after it, PUSH_ITER() needs to appear beforehand. PUSH_FRAME() will be discussed soon.

And to describe the "… main process …" part first, it branches on the following node types, and each branch does its own invoking process.

NODE_CFUNC     methods defined in C
NODE_IVAR      attr_reader
NODE_ATTRSET   attr_writer
NODE_SUPER     super
NODE_ZSUPER    super without arguments
NODE_DMETHOD   invoke UnboundMethod
NODE_BMETHOD   invoke Method
NODE_SCOPE     methods defined in Ruby

Some of the above nodes are not explained in this book, but they are not so important and can be ignored. The important ones are only NODE_CFUNC, NODE_SCOPE and NODE_ZSUPER.

PUSH_FRAME()

▼ PUSH_FRAME() POP_FRAME()

#define PUSH_FRAME() do {               \
    struct FRAME _frame;                \
    _frame.prev = ruby_frame;           \
    _frame.tmp  = 0;                    \
    _frame.node = ruby_current_node;    \
    _frame.iter = ruby_iter->iter;      \
    _frame.cbase = ruby_frame->cbase;   \
    _frame.argc = 0;                    \
    _frame.argv = 0;                    \
    _frame.flags = FRAME_ALLOCA;        \
    ruby_frame = &_frame

#define POP_FRAME()                     \
    ruby_current_node = _frame.node;    \
    ruby_frame = _frame.prev;           \
} while (0)

(eval.c)

First, let's make sure that the entire FRAME is allocated on the stack. This is identical to module_setup(). The rest is basically just ordinary initialization.

To add one more point, the flag FRAME_ALLOCA indicates how the FRAME was allocated. FRAME_ALLOCA obviously indicates "it is on the stack".

rb_call0() – NODE_CFUNC

A lot of things are written in this part of the original code, but most of them are related to trace_func, and the substantive code is only the following line:

▼ rb_call0() − NODE_CFUNC (simplified)

case NODE_CFUNC:
  result = call_cfunc(body->nd_cfnc, recv, len, argc, argv);
  break;

Then, as for call_cfunc() …

▼ call_cfunc() (simplified)

static VALUE
call_cfunc(func, recv, len, argc, argv)
    VALUE (*func)();
    VALUE recv;
    int len, argc;
    VALUE *argv;
{
    if (len >= 0 && argc != len) {
        rb_raise(rb_eArgError, "wrong number of arguments(%d for %d)",
                 argc, len);
    }

    switch (len) {
      case -2:
        return (*func)(recv, rb_ary_new4(argc, argv));
        break;
      case -1:
        return (*func)(argc, argv, recv);
        break;
      case 0:
        return (*func)(recv);
        break;
      case 1:
        return (*func)(recv, argv[0]);
        break;
      case 2:
        return (*func)(recv, argv[0], argv[1]);
        break;
                :
                :
      default:
        rb_raise(rb_eArgError, "too many arguments(%d)", len);
        break;
    }
    return Qnil;                /* not reached */
}

(eval.c)

As shown above, it branches based on the argument count. The maximum argument count is 15.
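The same dispatch-by-arity technique can be tried outside ruby. The sketch below is an assumption-laden miniature: value_t, anyfunc_t, call_by_arity, add1 and sum_all are hypothetical names, only arities -1, 0, 1 and 2 are handled, and errors are reported through the return value instead of rb_raise(). Unlike the K&R-style VALUE (*func)() above, it casts the generic pointer back to the exact prototype before each call, which is what modern C requires.

```c
#include <stddef.h>

typedef long value_t;                 /* hypothetical stand-in for VALUE */
typedef value_t (*anyfunc_t)(void);   /* generic function pointer */

/* Returns 0 on success, -1 on an arity mismatch or unsupported arity. */
int call_by_arity(anyfunc_t func, value_t recv, int len,
                  int argc, value_t *argv, value_t *result)
{
    if (len >= 0 && argc != len)
        return -1;                    /* ruby raises ArgumentError here */

    switch (len) {
      case -1:                        /* variable length: pass argc/argv */
        *result = ((value_t (*)(int, value_t *, value_t))func)(argc, argv, recv);
        return 0;
      case 0:
        *result = ((value_t (*)(value_t))func)(recv);
        return 0;
      case 1:
        *result = ((value_t (*)(value_t, value_t))func)(recv, argv[0]);
        return 0;
      case 2:
        *result = ((value_t (*)(value_t, value_t, value_t))func)(recv, argv[0], argv[1]);
        return 0;
      default:
        return -1;                    /* arity not covered by this sketch */
    }
}

/* A fixed-arity "method": one argument besides the receiver. */
value_t add1(value_t recv, value_t a)
{
    return recv + a;
}

/* A variable-arity "method": receives argc/argv and the receiver last. */
value_t sum_all(int argc, value_t *argv, value_t recv)
{
    value_t total = recv;

    for (int i = 0; i < argc; i++)
        total += argv[i];
    return total;
}
```

Registering add1 with len 1 and sum_all with len -1 mirrors how a fixed-arity and a variable-arity C method are told apart by the len argument alone.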

Note that neither SCOPE nor VARS is pushed for NODE_CFUNC. This makes sense because a method defined in C does not use Ruby's local variables. But it also means that if the "current" local variables are accessed from C, they are actually the local variables of the previous FRAME. And in some places, say rb_svar (eval.c), this is actually done.

rb_call0() – NODE_SCOPE

NODE_SCOPE is for invoking a method defined in Ruby. This part forms the foundation of Ruby.

▼ rb_call0() − NODE_SCOPE (outline)

case NODE_SCOPE:
  {
      int state;
      VALUE *local_vars;  /* OK */
      NODE *saved_cref = 0;

      PUSH_SCOPE();
      /* (A) forward CREF */
      if (body->nd_rval) {
          saved_cref = ruby_cref;
          ruby_cref = (NODE*)body->nd_rval;
          ruby_frame->cbase = body->nd_rval;
      }
      /* (B) initialize ruby_scope->local_vars */
      if (body->nd_tbl) {
          local_vars = TMP_ALLOC(body->nd_tbl[0]+1);
          *local_vars++ = (VALUE)body;
          rb_mem_clear(local_vars, body->nd_tbl[0]);
          ruby_scope->local_tbl = body->nd_tbl;
          ruby_scope->local_vars = local_vars;
      }
      else {
          local_vars = ruby_scope->local_vars = 0;
          ruby_scope->local_tbl  = 0;
      }
      b2 = body = body->nd_next;

      PUSH_VARS();
      PUSH_TAG(PROT_FUNC);

      if ((state = EXEC_TAG()) == 0) {
          NODE *node = 0;
          int i;

          /* …… (C) assign the arguments to the local variables …… */

          if (trace_func) {
              call_trace_func("call", b2, recv, id, klass);
          }
          ruby_last_node = b2;
          /* (D) method body */
          result = rb_eval(recv, body);
      }
      else if (state == TAG_RETURN) { /* back via return */
          result = prot_tag->retval;
          state = 0;
      }
      POP_TAG();
      POP_VARS();
      POP_SCOPE();
      ruby_cref = saved_cref;
      if (trace_func) {
          call_trace_func("return", ruby_last_node, recv, id, klass);
      }
      switch (state) {
        case 0:
          break;

        case TAG_RETRY:
          if (rb_block_given_p()) {
              JUMP_TAG(state);
          }
          /* fall through */
        default:
          jump_tag_but_local_jump(state);
          break;
      }
  }
  break;

(eval.c)

(A) CREF forwarding, which was described in the section on constants in the previous chapter. In other words, cbase is transplanted to FRAME from the method entry.

(B) The content here is completely identical to what is done in module_setup(). An array is allocated at local_vars of SCOPE. With this, PUSH_SCOPE() and PUSH_VARS(), the creation of the local variable scope is complete. After this, one can execute rb_eval() in exactly the same environment as the interior of the method.

(C) This sets the received arguments to the parameter variables. The parameter variables are in essence identical to the local variables. Things such as the number of arguments are specified by NODE_ARGS, so all it has to do is set them one by one. Details will be explained soon. And,

(D) this executes the method body. Obviously, the receiver (recv) becomes self. In other words, it becomes the first argument of rb_eval(). With that, the method is completely invoked.

Set Parameters

Then, we'll examine the part that was skipped entirely, which sets the parameters. But before that, I'd like you to first check the syntax tree of the method again.

% ruby -rnodedump -e 'def m(a) nil end'
NODE_SCOPE
nd_rval = (null)
nd_tbl = 3 [ _ ~ a ]
nd_next:
    NODE_BLOCK
    nd_head:
        NODE_ARGS
        nd_cnt  = 1
        nd_rest = -1
        nd_opt = (null)
    nd_next:
        NODE_BLOCK
        nd_head:
            NODE_NEWLINE
            nd_file = "-e"
            nd_nth  = 1
            nd_next:
                NODE_NIL
        nd_next = (null)

NODE_ARGS is the node that specifies the parameters of a method. I aggressively dumped several things, and it seems its members are used as follows:

nd_cnt    the number of the normal parameters
nd_rest   the variable ID of the rest parameter. -1 if the rest parameter is missing
nd_opt    holds the syntax tree to represent the default values of the option parameters. a list of NODE_BLOCK

With this much information, the local variable ID for each parameter variable can be uniquely determined. First, as I mentioned, 0 and 1 are always $_ and $~. From 2 on, the necessary number of ordinary parameters are lined up. The number of option parameters can be determined from the length of the NODE_BLOCK list. Next to them comes the rest parameter.

For example, if you write a definition as below,

def m(a, b, c = nil, *rest)
  lvar1 = nil
end

local variable IDs are assigned as follows.

0    1    2    3    4    5      6
$_   $~   a    b    c    rest   lvar1
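The slot arithmetic behind this table can be written down directly. This is a sketch, not ruby's code: args_sketch and the three helper functions are hypothetical, and nopt stands for the length of the nd_opt list, which ruby obtains by walking the list rather than storing a count.

```c
/* Hypothetical mirror of the three NODE_ARGS members described above. */
struct args_sketch {
    int cnt;    /* nd_cnt: number of normal parameters */
    int nopt;   /* length of the nd_opt list */
    int rest;   /* nd_rest: slot of the rest parameter, or -1 */
};

enum { FIRST_PARAM_SLOT = 2 };   /* slots 0 and 1 hold $_ and $~ */

/* Slot of the k-th (0-based) normal parameter, or -1 if out of range. */
int normal_param_slot(const struct args_sketch *a, int k)
{
    return (k >= 0 && k < a->cnt) ? FIRST_PARAM_SLOT + k : -1;
}

/* Slot of the k-th (0-based) option parameter, or -1 if out of range. */
int option_param_slot(const struct args_sketch *a, int k)
{
    return (k >= 0 && k < a->nopt) ? FIRST_PARAM_SLOT + a->cnt + k : -1;
}

/* Slot of the first local variable that is not a parameter. */
int first_plain_local_slot(const struct args_sketch *a)
{
    return FIRST_PARAM_SLOT + a->cnt + a->nopt + (a->rest >= 0 ? 1 : 0);
}
```

For def m(a, b, c = nil, *rest), that is cnt = 2, nopt = 1 and rest = 5, the helpers give slots 2 and 3 for a and b, 4 for c, and 6 for lvar1, matching the table.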

Are you still with me? Taking this into consideration, let's look at the code.

▼ rb_call0() − NODE_SCOPE − assignments of arguments

if (nd_type(body) == NODE_ARGS) { /* no body */
    node = body;           /* NODE_ARGS */
    body = 0;              /* the method body */
}
else if (nd_type(body) == NODE_BLOCK) { /* has body */
    node = body->nd_head;  /* NODE_ARGS */
    body = body->nd_next;  /* the method body */
}
if (node) {  /* has some parameters */
    if (nd_type(node) != NODE_ARGS) {
        rb_bug("no argument-node");
    }

    i = node->nd_cnt;
    if (i > argc) {
        rb_raise(rb_eArgError, "wrong number of arguments(%d for %d)",
                 argc, i);
    }
    if (node->nd_rest == -1) {  /* no rest parameter */
        /* counting the number of parameters */
        int opt = i;   /* the number of parameters (i is nd_cnt) */
        NODE *optnode = node->nd_opt;

        while (optnode) {
            opt++;
            optnode = optnode->nd_next;
        }
        if (opt < argc) {
            rb_raise(rb_eArgError,
                     "wrong number of arguments(%d for %d)", argc, opt);
        }
        /* assigning for the second time in rb_call0 */
        ruby_frame->argc = opt;
        ruby_frame->argv = local_vars+2;
    }

    if (local_vars) { /* has parameters */
        if (i > 0) {             /* has normal parameters */
            /* +2 to skip the spaces for $_ and $~ */
            MEMCPY(local_vars+2, argv, VALUE, i);
        }
        argv += i; argc -= i;
        if (node->nd_opt) {      /* has option parameters */
            NODE *opt = node->nd_opt;

            while (opt && argc) {
                assign(recv, opt->nd_head, *argv, 1);
                argv++; argc--;
                opt = opt->nd_next;
            }
            if (opt) {
                rb_eval(recv, opt);
            }
        }
        local_vars = ruby_scope->local_vars;
        if (node->nd_rest >= 0) { /* has rest parameter */
            VALUE v;

            /* make an array of the remaining arguments
               and assign it to a variable */
            if (argc > 0)
                v = rb_ary_new4(argc,argv);
            else
                v = rb_ary_new2(0);
            ruby_scope->local_vars[node->nd_rest] = v;
        }
    }
}

(eval.c)

Since more comments have been added than before, you should be able to understand what it is doing by following it step by step.

One thing I'd like to mention is argc and argv of ruby_frame. They seem to be updated only when there is no rest parameter. Why only then?

This point can be understood by thinking about the purpose of argc and argv. These members actually exist for super without arguments. It means the following form:

super

This super has the behavior of directly passing on the parameters of the currently executing method. To make that possible, the arguments are saved in ruby_frame->argv.

Going back to the previous story, if there is a rest parameter, passing the original argument list somehow seems more convenient. If there is not, passing the values after the option parameters have been assigned seems better.

def m(a, b, *rest)
  super     # probably 5, 6, 7, 8 should be passed
end
m(5, 6, 7, 8)

def m(a, b = 6)
  super     # probably 5, 6 should be passed
end
m(5)

This is a question of which is better as a specification rather than "it must be so". If a method has a rest parameter, its superclass method is supposed to have a rest parameter too. Thus, if the values after processing were passed, there is a high possibility of it being inconvenient.

Now, I've said various things, but the story of method invocation is all done. What remains, as the ending of this chapter, is to look at the implementation of super, which was just discussed.

super

What correspond to super are NODE_SUPER and NODE_ZSUPER. NODE_SUPER is ordinary super, and NODE_ZSUPER is super without arguments.

▼ rb_eval() − NODE_SUPER

case NODE_SUPER:
case NODE_ZSUPER:
  {
      int argc; VALUE *argv; /* used in SETUP_ARGS */
      TMP_PROTECT;

      /* (A) case when super is forbidden */
      if (ruby_frame->last_class == 0) {
          if (ruby_frame->orig_func) {
              rb_name_error(ruby_frame->last_func,
                            "superclass method `%s' disabled",
                            rb_id2name(ruby_frame->orig_func));
          }
          else {
              rb_raise(rb_eNoMethodError,
                       "super called outside of method");
          }
      }
      /* (B) set up or evaluate parameters */
      if (nd_type(node) == NODE_ZSUPER) {
          argc = ruby_frame->argc;
          argv = ruby_frame->argv;
      }
      else {
          BEGIN_CALLARGS;
          SETUP_ARGS(node->nd_args);
          END_CALLARGS;
      }

      /* (C) yet mysterious PUSH_ITER() */
      PUSH_ITER(ruby_iter->iter?ITER_PRE:ITER_NOT);
      SET_CURRENT_SOURCE();
      result = rb_call(RCLASS(ruby_frame->last_class)->super,
                       ruby_frame->self, ruby_frame->orig_func,
                       argc, argv, 3);
      POP_ITER();
  }
  break;

(eval.c)

For super without arguments, I said that ruby_frame->argv is directly used as the arguments; this is shown directly at (B).

(C) Just before calling rb_call(), it does PUSH_ITER(). This also cannot be explained in detail yet, but in this way the block passed to the current method can be handed over to the next method (meaning the superclass method that is about to be called).

And finally, (A) when ruby_frame->last_class is 0, calling super seems to be forbidden. Since the error message says "must be enabled by rb_enable_super()", it seems it becomes callable by calling rb_enable_super(). ((errata: The error message "must be enabled by rb_enable_super()" exists not in this list but in rb_call_super().)) Why?

First, if we investigate in what kind of situation last_class becomes 0, it seems that it is while executing a method whose substance is defined in C (NODE_CFUNC). Moreover, it is the same when aliasing or replacing such a method.

I understood that much, but even after reading the source code, I couldn't understand what follows from it. Because I couldn't, I searched the ruby mailing list archives for "rb_enable_super" and found it. According to that mail, the situation is as follows:

For example, there's a method named String.new. Of course, this is a method to create a string. String.new creates a struct of T_STRING. Therefore, you can expect that the receiver is always of T_STRING when writing instance methods of String.

Then, super of String.new is Object.new. Object.new creates a struct of T_OBJECT. What happens if String.new is replaced by a new definition and super is called?

def String.new
  super
end

As a consequence, an object whose struct is of T_OBJECT but whose class is String is created. However, the methods of String are written expecting a struct of T_STRING, so naturally it crashes.

How can we avoid this? The answer is to forbid calling any method that expects a struct of a different struct type. But the information of the "expected struct type" is not attached to the method, nor to the class. For example, if there were a way to obtain T_STRING from the String class, it could be checked before calling, but currently we can't do such a thing. Therefore, as the second-best plan, "super from methods defined in C is forbidden" is adopted. In this way, if the layer of methods at the C level is precisely created, it cannot crash, at least. And when the case is "It's absolutely safe, so allow super", super can be enabled by calling rb_enable_super().

In short, the heart of the problem is a mismatch of struct types. This is the same as the problem that occurs in the allocation framework.

Then, the way to solve this is to solve the root of the problem, that "the class does not know the struct type of its instances". But to resolve this, at least a new API is necessary, and going deeper, compatibility will be lost. Therefore, for the time being, the final solution has not been decided yet.


Chapter 16: Blocks

Iterator

In this chapter, BLOCK, which is the last big name among the seven Ruby stacks, comes in. After finishing this, the internal state of the evaluator is virtually understood.

The Whole Picture

What is the mechanism of iterators? First, let's think about a small program as below:

▼ The Source Program

iter_method() do
  9   # a mark to find this block
end

Let's check the terms just in case. As for this program, iter_method is an iterator method, and do ~ end is an iterator block. Here is the syntax tree of this program being dumped.

▼ Its Syntax Tree

NODE_ITER
nd_iter:
    NODE_FCALL
    nd_mid = 9617 (iter_method)
    nd_args = (null)
nd_var = (null)
nd_body:
    NODE_LIT
    nd_lit = 9:Fixnum

Looking for the block by using the 9 written in the iterator block as a trace, we can understand that NODE_ITER seems to represent the iterator block. And NODE_FCALL, which calls iter_method, is "below" that NODE_ITER. In other words, the node of the iterator block appears earlier than the call of the iterator method. This means that before an iterator method is called, a block is pushed at another node.

And by following the flow of code with a debugger, I found that the invocation of an iterator is separated into 3 steps: NODE_ITER, NODE_CALL and NODE_YIELD. This means,

1. push a block (NODE_ITER)
2. call the method which is an iterator (NODE_CALL)
3. yield (NODE_YIELD)

that's all.

Push a block

First, let's start with the first step, that is NODE_ITER, which is the node to push a block.

▼ rb_eval() − NODE_ITER (simplified)

case NODE_ITER:
  {
    iter_retry:
      PUSH_TAG(PROT_FUNC);
      PUSH_BLOCK(node->nd_var, node->nd_body);

      state = EXEC_TAG();
      if (state == 0) {
          PUSH_ITER(ITER_PRE);
          result = rb_eval(self, node->nd_iter);
          POP_ITER();
      }
      else if (_block.tag->dst == state) {
          state &= TAG_MASK;
          if (state == TAG_RETURN || state == TAG_BREAK) {
              result = prot_tag->retval;
          }
      }
      POP_BLOCK();
      POP_TAG();
      switch (state) {
        case 0:
          break;

        case TAG_RETRY:
          goto iter_retry;

        case TAG_BREAK:
          break;

        case TAG_RETURN:
          return_value(result);
          /* fall through */
        default:
          JUMP_TAG(state);
      }
  }
  break;

Since the original code contains the support for the for statement, that part is deleted. After removing the code relating to tags, only the push/pop of ITER and BLOCK are left. Because the rest is ordinarily doing rb_eval() with NODE_FCALL, these ITER and BLOCK are the necessary conditions to turn a method into an iterator.

The necessity of pushing BLOCK is fairly reasonable, but what's ITER for? Actually, to think about the meaning of ITER, you need to think from the viewpoint of the side that uses BLOCK.

For example, suppose a method has just been called, and ruby_block exists. But since BLOCK is pushed regardless of the boundaries of method calls, the existence of a block does not mean the block was pushed for that method. It's possible that the block was pushed for the previous method. (Figure 1)

Figure 1: no one-to-one correspondence between FRAME and BLOCK

So, in order to determine for which method the block was pushed, ITER is used. BLOCK is not pushed for each FRAME because pushing BLOCK is a little heavy. How heavy is it? Let's check it in practice.

PUSH_BLOCK()

The arguments of PUSH_BLOCK() are (the syntax trees of) the block parameter and the block body.

▼ PUSH_BLOCK() POP_BLOCK()

#define PUSH_BLOCK(v,b) do {                \
    struct BLOCK _block;                    \
    _block.tag = new_blktag();              \
    _block.var = v;                         \
    _block.body = b;                        \
    _block.self = self;                     \
    _block.frame = *ruby_frame;             \
    _block.klass = ruby_class;              \
    _block.frame.node = ruby_current_node;  \
    _block.scope = ruby_scope;              \
    _block.prev = ruby_block;               \
    _block.iter = ruby_iter->iter;          \
    _block.vmode = scope_vmode;             \
    _block.flags = BLOCK_D_SCOPE;           \
    _block.dyna_vars = ruby_dyna_vars;      \
    _block.wrapper = ruby_wrapper;          \
    ruby_block = &_block

#define POP_BLOCK()                                          \
   if (_block.tag->flags & (BLOCK_DYNAMIC))                  \
       _block.tag->flags |= BLOCK_ORPHAN;                    \
   else if (!(_block.scope->flags & SCOPE_DONT_RECYCLE))     \
       rb_gc_force_recycle((VALUE)_block.tag);               \
   ruby_block = _block.prev;                                 \
} while (0)

(eval.c)

Let's make sure that a BLOCK is "the snapshot of the environment at the moment of creation". As proof of this, except for CREF and BLOCK, the six stack frames are saved. CREF can be substituted by ruby_frame->cbase, so there's no need to push it.

And I'd like to check three points about the mechanism of the push. BLOCK is fully allocated on the stack. BLOCK contains a full copy of FRAME at that moment. BLOCK differs from many of the other stack frame structs in having a pointer to the previous BLOCK (prev).

The flags used in various ways at POP_BLOCK() are not explained now because they can only be understood after seeing the implementation of Proc later.

And as for the claim that "BLOCK is heavy", it certainly seems a little heavy. Looking inside new_blktag(), we can see that it does malloc() and stores plenty of members. But let's defer the final judgement until after looking at PUSH_ITER() and comparing.

PUSH_ITER()

▼ PUSH_ITER() POP_ITER()

#define PUSH_ITER(i) do {               \
    struct iter _iter;                  \
    _iter.prev = ruby_iter;             \
    _iter.iter = (i);                   \
    ruby_iter = &_iter

#define POP_ITER()                      \
    ruby_iter = _iter.prev;             \
} while (0)

(eval.c)

In contrast, this is apparently light. It only uses stack space and has only two members. Even if this is pushed for each FRAME, it would probably matter little.
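The trick of opening a do { in the push macro and closing it with } while (0) in the pop macro can be tried in isolation. The sketch below is hypothetical throughout (iter_sketch, top_iter, PUSH_ITER_SKETCH, POP_ITER_SKETCH, iter_depth and nested_depth are illustration names). It shows that each push costs one local struct and two stores, and that a nested push simply shadows the outer _iter inside its own block.

```c
#include <stddef.h>

struct iter_sketch {
    int iter;
    struct iter_sketch *prev;
};

static struct iter_sketch *top_iter = NULL;

/* The push macro opens a block and leaves it open ... */
#define PUSH_ITER_SKETCH(i) do {        \
    struct iter_sketch _iter;           \
    _iter.prev = top_iter;              \
    _iter.iter = (i);                   \
    top_iter = &_iter

/* ... and the pop macro closes it, so the two must be paired. */
#define POP_ITER_SKETCH()               \
    top_iter = _iter.prev;              \
} while (0)

/* Counts the entries currently linked from the top. */
int iter_depth(void)
{
    int n = 0;

    for (struct iter_sketch *p = top_iter; p; p = p->prev)
        n++;
    return n;
}

/* Pushes twice, measures the depth, then pops twice. */
int nested_depth(void)
{
    int depth;

    PUSH_ITER_SKETCH(1);
    PUSH_ITER_SKETCH(2);
    depth = iter_depth();
    POP_ITER_SKETCH();
    POP_ITER_SKETCH();
    return depth;
}
```

Forgetting a pop, or returning between a push and its pop, would leave top_iter pointing at a dead stack slot; in this sketch the compiler at least catches an unpaired macro, because the braces would not balance.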

Iterator Method Call

After pushing a block, the next thing is to call an iterator method (a method which is an iterator). This also needs a little machinery. Do you remember that there's code to modify the value of ruby_iter at the beginning of rb_call0? Here.

▼ rb_call0() − moving to ITER_CUR

switch (ruby_iter->iter) {
  case ITER_PRE:
    itr = ITER_CUR;
    break;
  case ITER_CUR:
  default:
    itr = ITER_NOT;
    break;
}

(eval.c)

Since ITER_PRE was pushed previously at NODE_ITER, this code makes ruby_iter ITER_CUR. At this moment, a method finally "becomes" an iterator. Figure 2 shows the state of the stacks.

Figure 2: the state of the Ruby stacks on an iterator call.

The possible value of ruby_iter is not one of two boolean values (for that method or not) but one of three steps, because there's a little gap between the timing of pushing a block and that of invoking an iterator method. For example, there's the evaluation of the arguments of an iterator method. Since it may contain method calls, there's the possibility that one of those methods mistakenly thinks that the just pushed block is for itself and uses it during the evaluation. Therefore, the moment when a method becomes an iterator, that is, turns into ITER_CUR, has to be inside rb_call(), just before finishing the invocation.

▼ the processing order

method(arg) { block }      # push a block
method(arg) { block }      # evaluate the arguments
method(arg) { block }      # a method call
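The three-step state can be reduced to a tiny transition function. This is a sketch with hypothetical names (iter_state, state_for_callee, block_is_mine); the function body is the same decision as the switch at the top of rb_call0(), and block_is_mine() is only an assumed stand-in for the check a method makes before using a block.

```c
/* Hypothetical mirror of ITER_NOT / ITER_PRE / ITER_CUR. */
enum iter_state { ITER_NOT_S, ITER_PRE_S, ITER_CUR_S };

/* The state a callee gets, given the state on top when it is called:
 * only a block pushed immediately before the call (PRE) belongs to it. */
enum iter_state state_for_callee(enum iter_state current)
{
    return current == ITER_PRE_S ? ITER_CUR_S : ITER_NOT_S;
}

/* Assumed check: a method may use the block only in the CUR state. */
int block_is_mine(enum iter_state s)
{
    return s == ITER_CUR_S;
}
```

A method called while the arguments are being evaluated sees ITER_NOT on top (pushed by BEGIN_CALLARGS), so state_for_callee() gives it ITER_NOT and it does not mistake the pending block for its own.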

For example, in the last chapter, "Method", there's a macro named BEGIN_CALLARGS in the handler of NODE_CALL. This is where the three-step ITER is made use of. Let's go back a little and try to see it.

BEGIN_CALLARGS END_CALLARGS

▼ BEGIN_CALLARGS END_CALLARGS

#define BEGIN_CALLARGS do {\
    struct BLOCK *tmp_block = ruby_block;\
    if (ruby_iter->iter == ITER_PRE) {\
        ruby_block = ruby_block->prev;\
    }\
    PUSH_ITER(ITER_NOT)

#define END_CALLARGS \
    ruby_block = tmp_block;\
    POP_ITER();\
} while (0)

(eval.c)

When ruby_iter is ITER_PRE, ruby_block is set aside. This code is important, for instance, in the case below:

obj.m1 { yield }.m2 { nil }

The evaluation order of this expression is:

1. push the block of m2
2. push the block of m1
3. call the method m1
4. call the method m2

Therefore, if there were no BEGIN_CALLARGS, m1 would call the block of m2.

And if one more iterator is chained, the number of BEGIN_CALLARGS increases along with it, so there's no problem.

Block Invocation

The third and last phase of iterator invocation is block invocation.

▼ rb_eval() − NODE_YIELD

case NODE_YIELD:
  if (node->nd_stts) {
      result = avalue_to_yvalue(rb_eval(self, node->nd_stts));
  }
  else {
      result = Qundef;    /* no arg */
  }
  SET_CURRENT_SOURCE();
  result = rb_yield_0(result, 0, 0, 0);
  break;

(eval.c)

nd_stts is the parameter of yield. avalue_to_yvalue() was mentioned a little at the multiple assignments, but you can ignore this. ((errata: actually, it was not mentioned. You can ignore this anyway.)) The heart of the behavior is not this but rb_yield_0(). Since this function is also very long, I show the code after extremely simplifying it. Most of the methods of simplification have been used previously.

* cut the code relating to trace_func.
* cut errors
* cut the code that exists only to prevent GC
* As with massign(), there's the parameter pcall. This parameter changes the level of restriction of the parameter check, so it is not important here. Therefore, assume pcall=0 and perform constant folding.

And this time, I turn on the "optimize for readability" option as follows.

* when a code branching has equivalent kinds of branches, leave the main one and cut the rest.
* if a condition is true/false in almost all cases, assume it is true/false.
* assume no tag jump occurs, and delete all code relating to tags.

If things are done this far, it becomes much shorter.

▼ rb_yield_0() (simplified)

static VALUE
rb_yield_0(val, self, klass, /* pcall=0 */)
    VALUE val, self, klass;
{
    volatile VALUE result = Qnil;
    volatile VALUE old_cref;
    volatile VALUE old_wrapper;
    struct BLOCK * volatile block;
    struct SCOPE * volatile old_scope;
    struct FRAME frame;
    int state;

    PUSH_VARS();
    PUSH_CLASS();
    block = ruby_block;
    frame = block->frame;
    frame.prev = ruby_frame;
    ruby_frame = &(frame);
    old_cref = (VALUE)ruby_cref;
    ruby_cref = (NODE*)ruby_frame->cbase;
    old_wrapper = ruby_wrapper;
    ruby_wrapper = block->wrapper;
    old_scope = ruby_scope;
    ruby_scope = block->scope;
    ruby_block = block->prev;
    ruby_dyna_vars = new_dvar(0, 0, block->dyna_vars);
    ruby_class = block->klass;
    self = block->self;

    /* set the block arguments */
    massign(self, block->var, val, pcall);

    PUSH_ITER(block->iter);
    /* execute the block body */
    result = rb_eval(self, block->body);
    POP_ITER();

    POP_CLASS();
    /* ……collect ruby_dyna_vars…… */
    POP_VARS();
    ruby_block = block;
    ruby_frame = ruby_frame->prev;
    ruby_cref = (NODE*)old_cref;
    ruby_wrapper = old_wrapper;
    ruby_scope = old_scope;

    return result;
}

As you can see, most of the stack frames are replaced with what was saved in ruby_block. The simple save/restore parts are easy to understand, so let's look at the handling of the other frames that we need to be careful about.

FRAME

struct FRAME frame;

frame = block->frame;     /* copy the entire struct */
frame.prev = ruby_frame;  /* by these two lines…… */
ruby_frame = &(frame);    /* ……frame is pushed */

Differing from the other frames, a FRAME is not used in its saved state; instead, a new FRAME is created by duplicating it. This would look like Figure 3.

Figure 3: push a copied frame

As far as we have seen in the code so far, it seems that a FRAME is never "reused". When pushing a FRAME, a new FRAME is always created.
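The copy-then-push idiom can be isolated as follows. This is a sketch with hypothetical names (frame_sketch, yield_with_copied_frame), where a single int member stands in for everything a real FRAME carries.

```c
struct frame_sketch {
    int argc;                    /* stand-in for the FRAME contents */
    struct frame_sketch *prev;
};

/* Pushes a copy of *saved on top of *top, reads through the new top
 * as a block body would, then pops. Returns what was seen. */
int yield_with_copied_frame(struct frame_sketch **top,
                            const struct frame_sketch *saved)
{
    struct frame_sketch frame = *saved;  /* copy the entire struct */
    int seen;

    frame.prev = *top;                   /* link the copy above the top */
    *top = &frame;                       /* the copy is now pushed */

    seen = (*top)->argc;                 /* contents come from saved */

    *top = frame.prev;                   /* pop before frame goes away */
    return seen;
}
```

Because only the copy's prev is rewritten, the saved frame itself is left untouched, so the same saved frame can serve any number of later yields.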

BLOCK

block = ruby_block;
     :
ruby_block = block->prev;
     :
ruby_block = block;

What is most mysterious is this behavior of BLOCK. We can't easily tell whether it is saving or popping. It's comprehensible that the first statement and the third statement form a pair, and that the state will eventually be restored. However, what is the consequence of the second statement?

To put the conclusion of my long pondering in one phrase: "going back to the ruby_block of the moment when the block was pushed". An iterator is, in short, syntax for going back to the previous frame. Therefore, all we have to do is turn the state of the stack frames into what it was at the moment the block was created. And the value of ruby_block at the moment the block was created was certainly block->prev. Therefore, it is contained in prev.

Additionally, to the question "is it no problem to assume that what is invoked is always the top of ruby_block?", there's no choice but to say "on the rb_yield_0 side, you can assume so". Pushing the block that should be invoked onto the top of ruby_block is the work of the side that prepares the block, not the work of rb_yield_0.

An example of this is BEGIN_CALLARGS, which was discussed earlier in this chapter. When iterator calls cascade, two blocks are pushed and the top of the stack will be the block that should not be used. Therefore, it is purposefully checked and set aside.

VARS

Come to think of it, we have not looked at the contents of PUSH_VARS() and POP_VARS() yet. Let’s see them here.

▼ PUSH_VARS() POP_VARS()

    #define PUSH_VARS() do { \
        struct RVarmap * volatile _old; \
        _old = ruby_dyna_vars; \
        ruby_dyna_vars = 0

    #define POP_VARS() \
        if (_old && (ruby_scope->flags & SCOPE_DONT_RECYCLE)) { \
            if (RBASIC(_old)->flags) /* if were not recycled */ \
                FL_SET(_old, DVAR_DONT_RECYCLE); \
        } \
        ruby_dyna_vars = _old; \
    } while (0)

(eval.c)

This does not push a new struct either; “set aside and restore” is a closer description. In practice, in rb_yield_0, PUSH_VARS() is used only to set the value aside. What actually prepares ruby_dyna_vars is this line.

    ruby_dyna_vars = new_dvar(0, 0, block->dyna_vars);

This takes the dyna_vars saved in BLOCK and sets it, attaching an entry at the same time. Recall the description of the structure of ruby_dyna_vars in Part 2: it said that an RVarmap whose id is 0, such as the one created here, is used as the break between block scopes.

However, the form of the link stored in ruby_dyna_vars actually differs slightly between the parser and the evaluator. Let’s look at the dvar_asgn_curr() function, which assigns a block local variable in the current block.

▼ dvar_asgn_curr()

    static inline void
    dvar_asgn_curr(id, value)
        ID id;
        VALUE value;
    {
        dvar_asgn_internal(id, value, 1);
    }

    static void
    dvar_asgn_internal(id, value, curr)
        ID id;
        VALUE value;
        int curr;
    {
        int n = 0;
        struct RVarmap *vars = ruby_dyna_vars;

        while (vars) {
            if (curr && vars->id == 0) {
                /* first null is a dvar header */
                n++;
                if (n == 2) break;
            }
            if (vars->id == id) {
                vars->val = value;
                return;
            }
            vars = vars->next;
        }
        if (!ruby_dyna_vars) {
            ruby_dyna_vars = new_dvar(id, value, 0);
        }
        else {
            vars = new_dvar(id, value, ruby_dyna_vars->next);
            ruby_dyna_vars->next = vars;
        }
    }

(eval.c)

The last if statement adds a variable. Focusing on it, we can see that a link is always inserted at the “next” of ruby_dyna_vars. This means it would look like Figure 4.

Figure 4: the structure of ruby_dyna_vars

This differs from the case of the parser in one point: the headers (id=0) that indicate the breaks between scopes are attached before the links. If a header were attached after the links, the first entry of the scope could not be inserted properly (Figure 5). ((errata: It was described that ruby_dyna_vars of the evaluator always forms a single straight link. But according to the errata, this was wrong. That part and the relevant descriptions have been removed.))

Figure 5: The entry cannot be inserted properly.
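The header-first layout can be tried out with a small Ruby model. This is only a sketch of the idea, not the evaluator’s code: VarmapSketch, new_block_scope, add_dvar and dvar_ids are hypothetical names invented here, and the model imitates only the else branch of dvar_asgn_internal().

```ruby
# Hypothetical model of an RVarmap link: id 0 marks a scope header.
VarmapSketch = Struct.new(:id, :val, :next_link)

# Imitates new_dvar(0, 0, outer): a header placed in front of the outer chain.
def new_block_scope(outer)
  VarmapSketch.new(0, nil, outer)
end

# Imitates the else branch of dvar_asgn_internal():
# the new entry is always spliced in right after the header.
def add_dvar(header, id, val)
  header.next_link = VarmapSketch.new(id, val, header.next_link)
  header
end

# Walks the chain and collects the ids, header markers included.
def dvar_ids(head)
  ids = []
  link = head
  while link
    ids << link.id
    link = link.next_link
  end
  ids
end

outer = new_block_scope(nil)
add_dvar(outer, :a, 1)
inner = new_block_scope(outer)
add_dvar(inner, :b, 2)
add_dvar(inner, :c, 3)
p dvar_ids(inner)
```

Because every scope begins with its header, the variable holding the head of the chain never has to change when an entry is added; only the header’s next link is rewritten.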

Target Specified Jump

The code related to jump tags was omitted from the code shown previously, but the jump handling in rb_yield_0 involves an effort we have never seen before. Why is this effort necessary? I’ll give the reason in advance. Take a look at the program below:

    [0].each do
      break
    end
    # the place to reach by break

As shown here, when break is executed inside a block, it is necessary to get out of the block and go back to the method that pushed the block. What does that actually mean? Let’s think about it by looking at the (dynamic) call graph when an iterator is invoked.

    rb_eval(NODE_ITER)                   .... catch(TAG_BREAK)
        rb_eval(NODE_CALL)               .... catch(TAG_BREAK)
            rb_eval(NODE_YIELD)
                rb_yield_0
                    rb_eval(NODE_BREAK)  .... throw(TAG_BREAK)

Since it is NODE_ITER that pushed the block, break should go back to a NODE_ITER. However, NODE_CALL is waiting for TAG_BREAK before NODE_ITER, in order to turn a break across methods into an error. This is a problem. We need to somehow go straight back to a NODE_ITER.

And actually, “going back to a NODE_ITER” is still not enough. If iterators are nested, there can be multiple NODE_ITERs, so the one corresponding to the current block is not always the first NODE_ITER. In other words, we need to target only “the NODE_ITER that pushed the block currently being invoked”.
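The requirement can be observed from the Ruby level. The following is a minimal sketch; each_doubled, break_through_helper and nested_break_pairs are hypothetical helpers written for this illustration. It shows that break leaves the whole method call the block was attached to, and that with nested iterators only the iterator owning the block is terminated.

```ruby
# Hypothetical iterator that internally uses another iterator.
def each_doubled(items)
  items.each { |x| yield x * 2 }
  :finished_normally
end

# break inside the block leaves each_doubled itself,
# not merely the Array#each running inside it.
def break_through_helper
  each_doubled([1, 2, 3]) { |v| break v if v > 2 }
end

# With nested blocks, break terminates only the iterator
# that the enclosing block was passed to.
def nested_break_pairs
  pairs = []
  [1, 2].each do |a|
    [10, 20, 30].each do |b|
      break if b == 20
      pairs << [a, b]
    end
  end
  pairs
end

p break_through_helper
p nested_break_pairs
```

In the second method the outer each keeps running after the inner break, which is exactly the “only the corresponding NODE_ITER” behavior.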

Then, let’s see how this is resolved.

▼ rb_yield_0() − the parts related to tags

        PUSH_TAG(PROT_NONE);
        if ((state = EXEC_TAG()) == 0) {
            /* ……evaluate the body…… */
        }
        else {
            switch (state) {
              case TAG_REDO:
                state = 0;
                CHECK_INTS;
                goto redo;
              case TAG_NEXT:
                state = 0;
                result = prot_tag->retval;
                break;
              case TAG_BREAK:
              case TAG_RETURN:
                state |= (serial++ << 8);
                state |= 0x10;
                block->tag->dst = state;
                break;
              default:
                break;
            }
        }
        POP_TAG();

(eval.c)

The TAG_BREAK and TAG_RETURN parts are crucial.

First, serial is a static variable of rb_yield_0(), so its value is different every time rb_yield_0 is called. “serial” is the serial of “serial number”.

The left shift by 8 bits seems to be there to avoid overlapping the values of TAG_xxxx. TAG_xxxx is in the range 0x1 to 0x8, so 4 bits are enough. The bit-or of 0x10 seems to guard against the overflow of serial. On a 32-bit machine, serial can use only 24 bits (about 16 million values), and a recent machine can overflow it in less than 10 seconds. When this happens, those top 24 bits all become 0. Therefore, if 0x10 did not exist, state would have the same value as TAG_xxxx (see also Figure 6).

Figure 6: block->tag->dst
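The bit arithmetic can be checked with a few lines of Ruby. This is a sketch under stated assumptions: the constant values (TAG_RETURN = 0x1, TAG_BREAK = 0x2, TAG_MASK = 0xf) are assumed rather than quoted from the text, the 32-bit truncation is imitated with an explicit mask, and TagSketch with its two methods is a hypothetical name.

```ruby
module TagSketch
  TAG_RETURN = 0x1   # assumed value
  TAG_BREAK  = 0x2   # assumed value
  TAG_MASK   = 0xf   # assumed value
  INT_BITS   = 0xFFFF_FFFF  # imitate a 32-bit int

  module_function

  # Imitates: state |= (serial++ << 8); state |= 0x10;
  def encode_dst(tag, serial)
    state = tag
    state |= (serial << 8) & INT_BITS
    state |= 0x10
    state
  end

  # Imitates: state &= TAG_MASK;
  def decode_tag(dst)
    dst & TAG_MASK
  end
end

dst = TagSketch.encode_dst(TagSketch::TAG_BREAK, 1)
printf("dst = 0x%x, tag = 0x%x\n", dst, TagSketch.decode_tag(dst))

# When the 24 usable bits of serial have wrapped around to 0,
# the 0x10 bit still keeps dst different from the bare tag.
wrapped = TagSketch.encode_dst(TagSketch::TAG_BREAK, 1 << 24)
printf("wrapped dst = 0x%x\n", wrapped)
```

Masking with TAG_MASK drops both the serial part and the 0x10 bit, so the receiver recovers the original TAG_xxxx value.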

Now tag->dst has a value that differs from TAG_xxxx and is unique for each call. In this situation, an ordinary switch like the previous ones cannot receive it, so the side that stops the jump also needs some extra effort. The place that makes this effort is the NODE_ITER branch of rb_eval:

▼ rb_eval() − NODE_ITER (to stop jumps)

    case NODE_ITER:
      {
          state = EXEC_TAG();
          if (state == 0) {
              /* …… invoke an iterator …… */
          }
          else if (_block.tag->dst == state) {
              state &= TAG_MASK;
              if (state == TAG_RETURN || state == TAG_BREAK) {
                  result = prot_tag->retval;
              }
          }
      }

In a corresponding NODE_ITER and rb_yield_0, block should point to the same thing, so the tag->dst that was set in rb_yield_0 arrives here. Because of this, only the corresponding NODE_ITER can properly stop the jump.

Check of a block

Whether or not the method currently being evaluated is an iterator, in other words whether there is a block, can be checked with rb_block_given_p(). After reading all of the above, we can understand its implementation.

▼ rb_block_given_p()

    int
    rb_block_given_p()
    {
        if (ruby_frame->iter && ruby_block)
            return Qtrue;
        return Qfalse;
    }

(eval.c)

I think there is no problem here. What I’d like to talk about this time is actually another checking function, rb_f_block_given_p().

▼ rb_f_block_given_p()

    static VALUE
    rb_f_block_given_p()
    {
        if (ruby_frame->prev && ruby_frame->prev->iter && ruby_block)
            return Qtrue;
        return Qfalse;
    }

(eval.c)

This is the substance of Ruby’s block_given?. Compared to rb_block_given_p(), it differs in that it checks the prev of ruby_frame. Why is this?

Considering the mechanism of pushing a block, checking the current ruby_frame as rb_block_given_p() does is correct. But when block_given? is called from the Ruby level, block_given? itself is a method, so an extra FRAME has been pushed. Hence, we need to check the previous one.
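At the Ruby level, the effect is that block_given? always answers for the method that called it. A minimal sketch (takes_block?, inner_check and outer_check are hypothetical names):

```ruby
# block_given? reports on the frame of the method that calls it.
def takes_block?
  block_given?
end

# The block given to outer_check is not handed on to inner_check,
# so block_given? inside inner_check sees no block.
def inner_check
  block_given?
end

def outer_check
  inner_check
end

p takes_block?            # called without a block
p(takes_block? { nil })   # called with a block
p(outer_check { nil })    # the block stops at outer_check
```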

Proc

From the viewpoint of implementation, a Proc object is “a BLOCK that can be brought out to the Ruby level”. Being able to bring it out to the Ruby level gives more freedom, but it also means that when and where it will be used becomes completely unpredictable. Let’s look at the implementation, focusing on how this fact influences it.

Proc object creation

A Proc object is created with Proc.new. Its substance is proc_new().

▼ proc_new()

    static VALUE
    proc_new(klass)
        VALUE klass;
    {
        volatile VALUE proc;
        struct BLOCK *data, *p;
        struct RVarmap *vars;

        if (!rb_block_given_p() && !rb_f_block_given_p()) {
            rb_raise(rb_eArgError,
                     "tried to create Proc object without a block");
        }
        /* (A) allocate both struct RData and struct BLOCK */
        proc = Data_Make_Struct(klass, struct BLOCK,
                                blk_mark, blk_free, data);
        *data = *ruby_block;

        data->orig_thread = rb_thread_current();
        data->wrapper = ruby_wrapper;
        data->iter = data->prev ? Qtrue : Qfalse;
        /* (B) the essential initialization is finished by here */
        frame_dup(&data->frame);
        if (data->iter) {
            blk_copy_prev(data);
        }
        else {
            data->prev = 0;
        }
        data->flags |= BLOCK_DYNAMIC;
        data->tag->flags |= BLOCK_DYNAMIC;

        for (p = data; p; p = p->prev) {
            for (vars = p->dyna_vars; vars; vars = vars->next) {
                if (FL_TEST(vars, DVAR_DONT_RECYCLE)) break;
                FL_SET(vars, DVAR_DONT_RECYCLE);
            }
        }
        scope_dup(data->scope);
        proc_save_safe_level(proc);

        return proc;
    }

(eval.c)

The creation of a Proc object itself is unexpectedly simple. Between (A) and (B), space for a Proc object is allocated and its initialization completes. Data_Make_Struct() is a simple macro that does both malloc() and Data_Wrap_Struct() at the same time.

The problems come after that:

frame_dup()

blk_copy_prev()

FL_SET(vars, DVAR_DONT_RECYCLE)

scope_dup()

These four have the same purposes. They are:

move everything that was put on the machine stack to the heap

prevent it from being collected even after POP

Here, “everything” means everything including prev. All of the stack frames pushed there are duplicated by doing malloc() and copying. VARS is usually forcibly collected by rb_gc_force_recycle() at the moment of POP, but this behavior is stopped by setting the DVAR_DONT_RECYCLE flag. And so on. Really extreme things are done.

Why are these extreme things necessary? Because, unlike an iterator block, a Proc can live longer than the method that created it. And the end of a method means that the things allocated on the machine stack, such as FRAME, ITER, and the local_vars of SCOPE, are invalidated. It is easy to predict the consequence of using invalidated memory. (An example answer: it becomes troublesome.)
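What “living longer than the method that created it” means can be seen in a few lines of Ruby. This is a minimal sketch; make_counter is a hypothetical method. The local variable count must remain usable after make_counter has returned, which is why its storage cannot stay on the machine stack.

```ruby
# The returned Proc keeps using the local variable count
# after make_counter itself has finished.
def make_counter
  count = 0
  Proc.new { count += 1 }
end

counter = make_counter
counter.call
counter.call
p counter.call

# Each call of make_counter has its own environment.
other = make_counter
p other.call
```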

I tried to contrive a way to at least share the same FRAME among multiple Procs, but since there are places such as old_frame where pointers to local variables are set aside, it does not seem to work well. If it takes a lot of effort either way, another approach, say, allocating all of them with malloc() from the first place, seems more worth a try.

Anyway, I can’t help feeling that it is surprising that it runs at that speed even though it does these extreme things. Indeed, times have become good.

Floating Frame

Previously, I covered this with the single phrase “duplicate all frames”, but since that was unclear, let’s look at it in more detail. There are two points:

How to duplicate all

Why all of them are duplicated

First, let’s start with a summary of how each stack frame is saved.

    Frame        location   has prev pointer?
    FRAME        stack      yes
    SCOPE        stack      no
    local_tbl    heap
    local_vars   stack
    VARS         heap       no
    BLOCK        stack      yes

CLASS, CREF, and ITER are not necessary this time. Since CLASS is a general Ruby object, rb_gc_force_recycle() is never called on it, even by mistake (it is impossible), and both CREF and ITER become unnecessary once their values at that moment are stored in FRAME. The four frames in the above table are important because they will be modified or referred to multiple times later. The remaining three will not.

Next, let’s move on to how to duplicate all. I said “how”, but this is not about something like “by malloc()”. The problem is how to duplicate “all”. As the above table shows, there are some frames without any prev pointer. In other words, we cannot follow links. In this situation, how can we duplicate all of them?

A fairly clever technique is used to counter this. Let’s take SCOPE as an example. A function named scope_dup() was used earlier to duplicate SCOPE, so let’s see it first.

▼ scope_dup() only the beginning

    static void
    scope_dup(scope)
        struct SCOPE *scope;
    {
        ID *tbl;
        VALUE *vars;

        scope->flags |= SCOPE_DONT_RECYCLE;

(eval.c)

As you can see, SCOPE_DONT_RECYCLE is set. Next, take a look at the definition of POP_SCOPE():

▼ POP_SCOPE() only the beginning

    #define POP_SCOPE() \
        if (ruby_scope->flags & SCOPE_DONT_RECYCLE) { \
            if (_old) scope_dup(_old); \
        } \

(eval.c)

When it pops, if the SCOPE_DONT_RECYCLE flag is set on the current SCOPE (ruby_scope), it also does scope_dup() on the previous SCOPE (_old). In other words, SCOPE_DONT_RECYCLE is set on that one too. In this way, the flag is propagated one by one at the time of each pop. (Figure 7)

Figure 7: flag propagation

Since VARS also does not have any prev pointer, the same technique is used to propagate the DVAR_DONT_RECYCLE flag.
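The propagation-on-pop idea can be modelled without any prev pointer. The sketch below is a Ruby imitation, not the macro itself: ScopeSketch and ScopeStackSketch are hypothetical names, and a plain array stands in for the nesting of C stack frames.

```ruby
# Hypothetical stand-in for struct SCOPE with only the flag we care about.
ScopeSketch = Struct.new(:name, :dont_recycle)

class ScopeStackSketch
  def initialize
    @stack = []
  end

  def push(name)
    @stack.push(ScopeSketch.new(name, false))
    @stack.last
  end

  def current
    @stack.last
  end

  # Imitates the beginning of POP_SCOPE(): if the scope being left is
  # marked, the scope being restored (_old) gets marked as well.
  def pop
    leaving = @stack.pop
    restored = @stack.last
    restored.dont_recycle = true if restored && leaving.dont_recycle
    leaving
  end
end

stack = ScopeStackSketch.new
stack.push(:method_a)
stack.push(:method_b)
stack.push(:method_c)
stack.current.dont_recycle = true   # e.g. a Proc was created here
stack.pop
p stack.current                     # method_b is now marked
stack.pop
p stack.current                     # and so is method_a
```

No scope needs to know its predecessor; each pop hands the flag one level further down.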

Next, the second point: let’s think about “why all of them are duplicated”. We can understand that the local variables of a SCOPE can be referred to later if a Proc is created from it. However, is it necessary to copy all of them, including the previous SCOPEs, in order to accomplish that?

Honestly speaking, I could not find the answer to this question and worried for almost three days about how to write this section, and I have only just found the answer. Take a look at the next program:

    def get_proc
      Proc.new { nil }
    end

    env = get_proc { p 'ok' }
    eval("yield", env)

I have not explained this feature, but by passing a Proc object as the second argument of eval, you can evaluate the string in that environment.

This means, as readers who have read this far can probably tell, that it pushes the various environments taken from the Proc (meaning its BLOCK) and evaluates. In this case, it naturally also pushes BLOCK, and you can turn that BLOCK into a Proc again. Then you can use that Proc in another eval… By repeating this, you can access almost all of the information in ruby_block from the Ruby level as you like. This is the reason why the entire stacks need to be fully duplicated. ((errata: we cannot access ruby_block as we like from the Ruby level. The reason why all SCOPEs are duplicated was not understood. It seems all we can do is investigate the mailing list archives of the time when this change was applied. (It is still not certain whether we can find out the reason in this way.)))

Invocation of Proc

Next, we’ll look at the invocation of a created Proc. Since Proc#call can be used from Ruby to invoke it, we can follow its substance.

The substance of Proc#call is proc_call():

▼ proc_call()

    static VALUE
    proc_call(proc, args)
        VALUE proc, args;           /* OK */
    {
        return proc_invoke(proc, args, Qtrue, Qundef);
    }

(eval.c)

It delegates to proc_invoke(). When I looked up “invoke” in a dictionary, it was defined as something like “call on (God, etc.) for help”, but in the context of programming it is often used with almost the same meaning as “activate”.

The prototype of proc_invoke() is:

    proc_invoke(VALUE proc, VALUE args, int pcall, VALUE self)

However, according to the previous code, pcall=Qtrue and self=Qundef in this case, so these two can be removed by constant folding.

▼ proc_invoke (simplified)

    static VALUE
    proc_invoke(proc, args, /* pcall=Qtrue */, /* self=Qundef */)
        VALUE proc, args;
        VALUE self;
    {
        struct BLOCK * volatile old_block;
        struct BLOCK _block;
        struct BLOCK *data;
        volatile VALUE result = Qnil;
        int state;
        volatile int orphan;
        volatile int safe = ruby_safe_level;
        volatile VALUE old_wrapper = ruby_wrapper;
        struct RVarmap * volatile old_dvars = ruby_dyna_vars;

        /* (A) take BLOCK from proc and assign it to data */
        Data_Get_Struct(proc, struct BLOCK, data);
        /* (B) blk_orphan */
        orphan = blk_orphan(data);

        ruby_wrapper = data->wrapper;
        ruby_dyna_vars = data->dyna_vars;
        /* (C) push BLOCK from data */
        old_block = ruby_block;
        _block = *data;
        ruby_block = &_block;

        /* (D) transition to ITER_CUR */
        PUSH_ITER(ITER_CUR);
        ruby_frame->iter = ITER_CUR;

        PUSH_TAG(PROT_NONE);
        state = EXEC_TAG();
        if (state == 0) {
            proc_set_safe_level(proc);
            /* (E) invoke the block */
            result = rb_yield_0(args, self, 0, pcall);
        }
        POP_TAG();

        POP_ITER();
        if (ruby_block->tag->dst == state) {
            state &= TAG_MASK;      /* target specified jump */
        }
        ruby_block = old_block;
        ruby_wrapper = old_wrapper;
        ruby_dyna_vars = old_dvars;
        ruby_safe_level = safe;

        switch (state) {
          case 0:
            break;
          case TAG_BREAK:
            result = prot_tag->retval;
            break;
          case TAG_RETURN:
            if (orphan) {   /* orphan procedure */
                localjump_error("return from proc-closure", prot_tag->retval);
            }
            /* fall through */
          default:
            JUMP_TAG(state);
        }
        return result;
    }

The crucial points are three: C, D, and E.

(C) At NODE_ITER a BLOCK is created from the syntax tree and pushed, but this time a BLOCK is taken from the Proc and pushed.

(D) In rb_call0(), it was ITER_PRE before becoming ITER_CUR, but this time it goes directly to ITER_CUR.

(E) In the case of an ordinary iterator, a method call comes first, then yield occurs and control reaches rb_yield_0, but this time rb_yield_0() is called directly and invokes the block that was just pushed.

In other words, in the case of an iterator, the procedure is split across three places, NODE_ITER ~ rb_call0() ~ NODE_YIELD. But this time, it is all done at once.

Finally, I’ll talk about the meaning of blk_orphan(). As the name suggests, it is a function that determines the state “the method that created the Proc has finished”. For example, if the SCOPE used by a BLOCK has already been popped, you can determine that the method has finished.
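The orphan case is visible from the Ruby level as a LocalJumpError. The sketch below uses hypothetical method names and describes how current Ruby versions behave for return; the error message and other details may differ from the version examined in this book.

```ruby
# The Proc returned here outlives make_returning_proc,
# so a return inside it has no method left to return from.
def make_returning_proc
  Proc.new { return :from_proc }
end

def call_orphan
  make_returning_proc.call
  :not_reached
rescue LocalJumpError => e
  [:orphan, e.class]
end

# While the creating method is still running, return inside the
# Proc simply returns from that method.
def call_live
  pr = Proc.new { return :returned_from_call_live }
  pr.call
  :not_reached
end

p call_orphan
p call_live
```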

Block and Proc

In the previous chapter, various things about the arguments and parameters of methods were discussed, but I have not described block parameters yet. Although it is brief, here I’ll present the final part of that series.

    def m(&block)
    end

This is a “block parameter”. The way to enable this is very simple. If m is an iterator, it is certain that a BLOCK has already been pushed, so turn it into a Proc and assign it to (in this case) the local variable block. Turning a block into a Proc is just a matter of calling proc_new(), which was described previously. The reason why just calling it is enough may be a little hard to see. However, whether it is Proc.new or m, the situation “a method is called and a BLOCK is pushed” is the same. Therefore, from the C level, you can turn a block into a Proc at any time just by calling proc_new().

And if m is not an iterator, all we have to do is assign nil.

Next is the side that passes a block.

    m(&block)

This is a “block argument”. This is also simple: take a BLOCK from (the Proc object stored in) block and push it. It differs from PUSH_BLOCK() only in whether a BLOCK has already been created in advance or not.

The function that does this procedure is block_pass(). If you are curious, check it and confirm. However, it really does only what was described here, so you may be disappointed…
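Both directions can be tried from Ruby. A minimal sketch, with capture and relay as hypothetical method names:

```ruby
# Block parameter: the pushed block arrives as a Proc, or nil if absent.
def capture(&block)
  block
end

# Block argument: an existing Proc is pushed as the block of another call.
def relay(&block)
  [1, 2, 3].map(&block)
end

p capture                          # no block given
p capture { nil }.class            # a block given
p(relay { |x| x * 2 })

doubler = Proc.new { |x| x * 2 }
p [4, 5].map(&doubler)             # a Proc created in advance
```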

The original work is Copyright © 2002-2004 Minero AOKI. Translated by Vincent ISAMBART and Clifford Escobar CAOILE. This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 2.5 License.


Chapter 17: Dynamic evaluation

Overview

By the previous chapter, I have finished describing the mechanism of the evaluator. In this chapter, by adding the parser to it, let’s examine the big picture of “the evaluator in a broad sense”. There are three targets: eval, Module#module_eval and Object#instance_eval.

eval

I’ve already described eval, but I’ll introduce a few more small things about it here.

By using eval, you can compile and evaluate a string at runtime, on the spot. Its return value is the value of the last expression of the program.

    p eval("1 + 1")   # 2

You can also refer to a variable in its scope from inside the string passed to eval.

    lvar = 5
    @ivar = 6
    p eval("lvar + @ivar")   # 11

Readers who have read this far cannot simply read past the words “its scope”. For instance, you are curious about what the “scope” of constants is, aren’t you? I am. To put the bottom line first, basically you can think of it as directly inheriting the environment outside of eval.

You can also define methods and classes.

    def a
      eval('class C;  def test() puts("ok") end   end')
    end

    a()          # define class C and C#test
    C.new.test   # shows ok

Moreover, as mentioned briefly in the previous chapter, when you pass a Proc as the second argument, the string is evaluated in its environment.

    def new_env
      n = 5
      Proc.new { nil }   # turn the environment of this method into an object and return it
    end

    p eval('n * 3', new_env())   # 15
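For readers trying this on a current Ruby, note that the environment object is a Binding there rather than a Proc; as far as I know, passing a Proc as the second argument of eval is no longer accepted. The sketch below is the same experiment under that assumption (new_env_binding is a hypothetical name).

```ruby
# Returns the environment of this method as a Binding object.
def new_env_binding
  n = 5
  binding
end

p eval('n * 3', new_env_binding)

# A Proc still carries its environment; Proc#binding exposes it.
def new_env_proc
  n = 7
  Proc.new { nil }
end

p eval('n * 3', new_env_proc.binding)
```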

module_eval and instance_eval

When a Proc is passed as the second argument of eval, the evaluation is done in its environment. module_eval and instance_eval are limited (or shortcut) versions of this. With module_eval, you can evaluate in an environment that is as if inside a module statement or a class statement.

    lvar = "toplevel lvar"   # a local variable to confirm this scope

    module M
    end
    M.module_eval(<<'EOS')   # a suitable situation to use here-document
        p lvar   # referable
        p self   # shows M
        def ok   # define M#ok
          puts 'ok'
        end
    EOS

With instance_eval, you can evaluate in an environment where self is the object, as if inside its singleton class statement.

    lvar = "toplevel lvar"   # a local variable to confirm this scope

    obj = Object.new
    obj.instance_eval(<<'EOS')
        p lvar   # referable
        p self   # shows #<Object:0x40274f5c>
        def ok   # define obj.ok
          puts 'ok'
        end
    EOS

Additionally, module_eval and instance_eval can also be used as iterators; in that case a block is evaluated in each environment. For instance,

    obj = Object.new
    p obj                 # #<Object:0x40274fac>
    obj.instance_eval {
        p self            # #<Object:0x40274fac>
    }

Like this.

However, the behavior around local variables differs between the case of using a string and the case of using a block. For example, when a block is created in method a and then passed to instance_eval in method b, the block refers to the local variables of a. When a string is created in method a and then passed to instance_eval in method b, the code inside the string refers to the local variables of b. The scope of local variables is decided “at compile time”, and the results differ because a string is compiled every time it is evaluated, whereas a block is compiled when the file is loaded.
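The difference is easy to reproduce. In this sketch, method_a and method_b are hypothetical stand-ins for the a and b of the paragraph above; both define a local variable n, and the block and the string each evaluate just n.

```ruby
# Creates both a block (as a Proc) and a string that refer to n.
def method_a
  n = :local_of_a
  [Proc.new { n }, "n"]
end

# Evaluates both with instance_eval from inside another method.
def method_b(block, string)
  n = :local_of_b
  obj = Object.new
  [obj.instance_eval(&block), obj.instance_eval(string)]
end

p method_b(*method_a)
```

The block answers with the n of method_a, where it was written, while the string is compiled inside method_b and therefore sees that method’s n.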

eval

eval()

Ruby’s eval branches many times based on the presence or absence of its parameters. Let’s assume the form of the call is limited to the below:

    eval(prog_string, some_block)

Then, since this makes the actual interface function rb_f_eval() almost meaningless, we’ll start with the function eval(), which is one step lower. The function prototype of eval() is:

    static VALUE eval(VALUE self, VALUE src, VALUE scope, char *file, int line);

scope is the Proc passed as the second parameter. file and line are the file name and line number where the string to eval is supposed to be located. Then, let’s see the content:

▼ eval() (simplified)

    static VALUE
    eval(self, src, scope, file, line)
        VALUE self, src, scope;
        char *file;
        int line;
    {
        struct BLOCK *data = NULL;
        volatile VALUE result = Qnil;
        struct SCOPE * volatile old_scope;
        struct BLOCK * volatile old_block;
        struct RVarmap * volatile old_dyna_vars;
        VALUE volatile old_cref;
        int volatile old_vmode;
        volatile VALUE old_wrapper;
        struct FRAME frame;
        NODE *nodesave = ruby_current_node;
        volatile int iter = ruby_frame->iter;
        int state;

        if (!NIL_P(scope)) {  /* always true now */
            Data_Get_Struct(scope, struct BLOCK, data);
            /* push BLOCK from data */
            frame = data->frame;
            frame.tmp = ruby_frame; /* to prevent from GC */
            ruby_frame = &(frame);
            old_scope = ruby_scope;
            ruby_scope = data->scope;
            old_block = ruby_block;
            ruby_block = data->prev;
            old_dyna_vars = ruby_dyna_vars;
            ruby_dyna_vars = data->dyna_vars;
            old_vmode = scope_vmode;
            scope_vmode = data->vmode;
            old_cref = (VALUE)ruby_cref;
            ruby_cref = (NODE*)ruby_frame->cbase;
            old_wrapper = ruby_wrapper;
            ruby_wrapper = data->wrapper;
            self = data->self;
            ruby_frame->iter = data->iter;
        }
        PUSH_CLASS();
        ruby_class = ruby_cbase;  /* == ruby_frame->cbase */

        ruby_in_eval++;
        if (TYPE(ruby_class) == T_ICLASS) {
            ruby_class = RBASIC(ruby_class)->klass;
        }
        PUSH_TAG(PROT_NONE);
        if ((state = EXEC_TAG()) == 0) {
            NODE *node;

            result = ruby_errinfo;
            ruby_errinfo = Qnil;
            node = compile(src, file, line);
            if (ruby_nerrs > 0) {
                compile_error(0);
            }
            if (!NIL_P(result)) ruby_errinfo = result;
            result = eval_node(self, node);
        }
        POP_TAG();
        POP_CLASS();
        ruby_in_eval--;
        if (!NIL_P(scope)) {  /* always true now */
            int dont_recycle = ruby_scope->flags & SCOPE_DONT_RECYCLE;

            ruby_wrapper = old_wrapper;
            ruby_cref = (NODE*)old_cref;
            ruby_frame = frame.tmp;
            ruby_scope = old_scope;
            ruby_block = old_block;
            ruby_dyna_vars = old_dyna_vars;
            data->vmode = scope_vmode; /* save the modification of the visibility scope */
            scope_vmode = old_vmode;
            if (dont_recycle) {
                /* ……copy SCOPE BLOCK VARS…… */
            }
        }
        if (state) {
            if (state == TAG_RAISE) {
                /* ……prepare an exception object…… */
                rb_exc_raise(ruby_errinfo);
            }
            JUMP_TAG(state);
        }

        return result;
    }

(eval.c)

If this function were shown without any preamble, you would probably feel “oww!”. But we have defeated many functions of eval.c up to here, so this is no longer much of an enemy. This function just keeps saving and restoring the stacks. The only points we need to care about are the three below:

unusually, FRAME is also replaced (not copied and pushed)

ruby_cref is substituted (?) by ruby_frame->cbase

only scope_vmode is not simply restored but influences data

The main parts are the compile() and eval_node() located around the middle. You may have already forgotten eval_node(): it is the function that starts the evaluation of the node given as its parameter. It was also used in ruby_run().

Here is compile().

▼ compile()

    static NODE*
    compile(src, file, line)
        VALUE src;
        char *file;
        int line;
    {
        NODE *node;

        ruby_nerrs = 0;
        Check_Type(src, T_STRING);
        node = rb_compile_string(file, src, line);

        if (ruby_nerrs == 0) return node;
        return 0;
    }

(eval.c)

ruby_nerrs is the variable incremented in yyerror(). In other words, if this variable is non-zero, at least one parse error happened. And rb_compile_string() was already discussed in Part 2. It is the function that compiles a Ruby string into a syntax tree.

One thing that becomes a problem here is local variables. As we saw in Chapter 12: Syntax tree construction, local variables are managed by using lvtbl. However, since a SCOPE (and possibly also VARS) already exists, we need to parse in a way that overwrites and adds to it. This is in fact the heart of eval(), and the most difficult part. Let’s go back to parse.y again and complete this investigation.

top_local

I’ve mentioned that the functions local_push() and local_pop() are used when pushing struct local_vars, the management table of local variables, but there is actually one more pair of functions that pushes the management table. It is the pair of top_local_init() and top_local_setup(). They are called in this sort of way.

▼ How top_local_init() is called

    program :   { top_local_init(); }
              compstmt
                { top_local_setup(); }

Of course, in reality various other things are also done, but all of them are cut here because they are not important. And this is its content:

▼ top_local_init()

    static void
    top_local_init()
    {
        local_push(1);
        lvtbl->cnt = ruby_scope->local_tbl?ruby_scope->local_tbl[0]:0;
        if (lvtbl->cnt > 0) {
            lvtbl->tbl = ALLOC_N(ID, lvtbl->cnt+3);
            MEMCPY(lvtbl->tbl, ruby_scope->local_tbl, ID, lvtbl->cnt+1);
        }
        else {
            lvtbl->tbl = 0;
        }
        if (ruby_dyna_vars)
            lvtbl->dlev = 1;
        else
            lvtbl->dlev = 0;
    }

(parse.y)

This means that local_tbl is copied from ruby_scope to lvtbl. As for block local variables, since it’s better to see them all at once later, we’ll focus on ordinary local variables for the time being. Next, here is top_local_setup().

▼ top_local_setup()

    static void
    top_local_setup()
    {
        int len = lvtbl->cnt;  /* the number of local variables after parsing */
        int i;                 /* the number of local variables before parsing */

        if (len > 0) {
            i = ruby_scope->local_tbl ? ruby_scope->local_tbl[0] : 0;

            if (i < len) {
                if (i == 0 || (ruby_scope->flags & SCOPE_MALLOC) == 0) {
                    VALUE *vars = ALLOC_N(VALUE, len+1);
                    if (ruby_scope->local_vars) {
                        *vars++ = ruby_scope->local_vars[-1];
                        MEMCPY(vars, ruby_scope->local_vars, VALUE, i);
                        rb_mem_clear(vars+i, len-i);
                    }
                    else {
                        *vars++ = 0;
                        rb_mem_clear(vars, len);
                    }
                    ruby_scope->local_vars = vars;
                    ruby_scope->flags |= SCOPE_MALLOC;
                }
                else {
                    VALUE *vars = ruby_scope->local_vars-1;
                    REALLOC_N(vars, VALUE, len+1);
                    ruby_scope->local_vars = vars+1;
                    rb_mem_clear(ruby_scope->local_vars+i, len-i);
                }
                if (ruby_scope->local_tbl &&
                    ruby_scope->local_vars[-1] == 0) {
                    free(ruby_scope->local_tbl);
                }
                ruby_scope->local_vars[-1] = 0;  /* NODE is not necessary anymore */
                ruby_scope->local_tbl = local_tbl();
            }
        }
        local_pop();
    }

(parse.y)

Since local_vars can be either on the stack or in the heap, the code is somewhat complex. However, all it does is update the local_tbl and local_vars of ruby_scope. (When SCOPE_MALLOC is set, local_vars was allocated by malloc().) And here, because there is no point in using alloca(), the allocation method is forced to change to malloc().
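The idea of “adding to an existing scope” has a visible counterpart at the Ruby level. The following is only an analogy on a current Ruby, whose internals differ from the code above; eval_adds_local is a hypothetical method, and Binding#local_variable_defined? is assumed to be available (it is in recent versions). A variable first introduced inside an evaluated string stays in the environment the string was evaluated in.

```ruby
def eval_adds_local
  env = binding
  env.eval("added = 42")             # introduces a new local variable
  [env.local_variable_defined?(:added), env.eval("added")]
end

p eval_adds_local
```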

Block Local Variable

By the way, how about block local variables? To think about this, we first have to go back to the entry point of the parser, yycompile().

▼ setting ruby_dyna_vars aside

    static NODE*
    yycompile(f, line)
    {
        struct RVarmap *vars = ruby_dyna_vars;
             :
        n = yyparse();
             :
        ruby_dyna_vars = vars;
    }

This looks like a mere save-restore, but the point is that it does not clear ruby_dyna_vars. This means that the parser, too, directly adds elements to the RVarmap link created in the evaluator.

However, according to the previous description, the structure of ruby_dyna_vars differs between the parser and the evaluator. How does it deal with the difference in the way the header (an RVarmap whose id=0) is attached?

What helps here is the “1” of local_push(1) in top_local_init(). When the argument of local_push() is true, it does not attach the first header of ruby_dyna_vars. This means it would look like Figure 1. Now it is assured that we can refer to the block local variables of the outer scope from inside the string passed to eval.

Figure 1: ruby_dyna_vars inside eval

Well, it is certain that we can refer to them, but didn’t you say that ruby_dyna_vars is entirely freed in the parser? What happens if the link created in the evaluator is freed? … I’d like the readers who noticed this to be relieved by reading the next part.

▼ yycompile() − freeing ruby_dyna_vars

        vp = ruby_dyna_vars;
        ruby_dyna_vars = vars;
        lex_strterm = 0;
        while (vp && vp != vars) {
            struct RVarmap *tmp = vp;
            vp = vp->next;
            rb_gc_force_recycle((VALUE)tmp);
        }

(parse.y)

It is designed so that the loop stops when it reaches the link created in the evaluator (vars).

instance_eval

The Whole Picture

The substance of Module#module_eval is rb_mod_module_eval(), and the substance of Object#instance_eval is rb_obj_instance_eval().

▼ rb_mod_module_eval() rb_obj_instance_eval()

    VALUE
    rb_mod_module_eval(argc, argv, mod)
        int argc;
        VALUE *argv;
        VALUE mod;
    {
        return specific_eval(argc, argv, mod, mod);
    }

    VALUE
    rb_obj_instance_eval(argc, argv, self)
        int argc;
        VALUE *argv;
        VALUE self;
    {
        VALUE klass;

        if (rb_special_const_p(self)) {
            klass = Qnil;
        }
        else {
            klass = rb_singleton_class(self);
        }

        return specific_eval(argc, argv, klass, self);
    }

(eval.c)

These two methods have a common part as “a method to replace self with class”, and that part is defined as specific_eval(). Figure 2 shows it and also what will be described next. The names in parentheses are calls made through function pointers.

Figure 2: Call Graph

Whether it is instance_eval or module_eval, it can accept both a block and a string, so it branches into yield and eval respectively. However, most of the two paths are again common, and this part is extracted as exec_under().

But for those reading, having to face 2 times 2 = 4 ways at once is not a good plan. Therefore, here we assume only the case when

1. it is an instance_eval
2. which takes a string as its argument

Then, expanding all functions under rb_obj_instance_eval() in-line and folding constants, we’ll read the result.

After Absorbed

After absorbing, it becomes very comprehensible in comparison to the version before being absorbed.

▼ specific_eval() − instance_eval, eval, string

    static VALUE
    instance_eval_string(self, src, file, line)
        VALUE self, src;
        const char *file;
        int line;
    {
        VALUE sclass;
        VALUE result;
        int state;
        int mode;

        sclass = rb_singleton_class(self);

        PUSH_CLASS();
        ruby_class = sclass;
        PUSH_FRAME();
        ruby_frame->self       = ruby_frame->prev->self;
        ruby_frame->last_func  = ruby_frame->prev->last_func;
        ruby_frame->last_class = ruby_frame->prev->last_class;
        ruby_frame->argc       = ruby_frame->prev->argc;
        ruby_frame->argv       = ruby_frame->prev->argv;
        if (ruby_frame->cbase != sclass) {
            ruby_frame->cbase = rb_node_newnode(NODE_CREF, sclass, 0,
                                                ruby_frame->cbase);
        }
        PUSH_CREF(sclass);

        mode = scope_vmode;
        SCOPE_SET(SCOPE_PUBLIC);
        PUSH_TAG(PROT_NONE);
        if ((state = EXEC_TAG()) == 0) {
            result = eval(self, src, Qnil, file, line);
        }
        POP_TAG();
        SCOPE_SET(mode);

        POP_CREF();
        POP_FRAME();
        POP_CLASS();
        if (state) JUMP_TAG(state);

        return result;
    }

It seems that this pushes the singleton class of the object to CLASS, CREF and ruby_frame->cbase. The main process is a single call to eval(). It is unusual that things such as initializing FRAME by a struct copy are missing, but this does not make much difference either.
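The consequence of pushing the singleton class can be confirmed from Ruby: a def inside the evaluated string defines a method on that one object only. A minimal sketch (define_ok_on is a hypothetical helper):

```ruby
# Evaluates a method definition with instance_eval on the given object.
def define_ok_on(obj)
  obj.instance_eval("def ok; :ok; end")
  obj
end

target = define_ok_on(Object.new)
p target.ok
p target.singleton_methods           # the method lives in the singleton class
p Object.new.respond_to?(:ok)        # other objects are unaffected
p target.instance_eval("self").equal?(target)
```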

Before being absorbed

Though the author said it becomes easier to read, it’s possible it was already simple before being absorbed, so let’s check what was simplified in comparison to the version before absorption.

The first one is specific_eval(). Since this function exists to share the code of the interface to Ruby, almost all of it is parsing the parameters. Here is the result of cutting all of that.

▼ specific_eval() (simplified)

    static VALUE
    specific_eval(argc, argv, klass, self)
        int argc;
        VALUE *argv;
        VALUE klass, self;
    {
        if (rb_block_given_p()) {

            return yield_under(klass, self);
        }
        else {

            return eval_under(klass, self, argv[0], file, line);
        }
    }

(eval.c)

As you can see, this branches cleanly into two ways based on whether there is a block or not, and each route never influences the other. Therefore, when reading, we should read them one at a time. To begin with, the absorbed version is improved in this respect.

Also, file and line are irrelevant when reading yield_under(), so when the yield route is absorbed into the main body, it becomes obvious that we don’t have to think about the parsing of these parameters at all.

Next, we’ll look at eval_under() and eval_under_i().

▼ eval_under()

    static VALUE
    eval_under(under, self, src, file, line)
        VALUE under, self, src;
        const char *file;
        int line;
    {
        VALUE args[4];

        if (ruby_safe_level >= 4) {
            StringValue(src);
        }
        else {
            SafeStringValue(src);
        }
        args[0] = self;
        args[1] = src;
        args[2] = (VALUE)file;
        args[3] = (VALUE)line;
        return exec_under(eval_under_i, under, under, args);
    }

    static VALUE
    eval_under_i(args)
        VALUE *args;
    {
        return eval(args[0], args[1], Qnil, (char*)args[2], (int)args[3]);
    }

(eval.c)

Inthisfunction,inordertomakeitsargumentssingle,itstoresthemintotheargsarrayandpassesit.Wecanimaginethatthisargsexistsasatemporarycontainertopassfromeval_under()toeval_under_i(),butnotsurethatitistrulyso.It’spossiblethatargsismodifiedinsideevec_under().

Asawaytoshareacode,thisisaveryrightwaytodo.Butforthosewhoreadit,thiskindofindirectpassingisincomprehensible.

Particularly,becausethereareextracastingsforfileandlinetofoolthecompiler,itishardtoimaginewhatweretheiractualtypes.Thepartsaroundthisentirelydisappearedintheabsorbedversion,soyoudon’thavetoworryaboutgettinglost.

However, it would be going too far to say that absorbing and extracting always makes things easier to understand. For example, when exec_under() is called, under is passed as both the second and the third argument, but is it really all right for the exec_under() side to fold both parameter variables into under? That is to say, the second and third arguments of exec_under() actually indicate the CLASS and the CREF that should be pushed. CLASS and CREF are "different things", so it might be better to use different variables. In the previous absorbed version too, for this point alone,

VALUE sclass = .....;
VALUE cbase = sclass;

I thought about writing it this way, but I also thought it would give a strange impression if only these variables were abruptly left in, so it was extracted as sclass. In other words, this was done only for the sake of the flow of the text.

By now I have extracted arguments and functions many times, and each time I have explained the reason for extracting. The reasons are:

there are only a few possible patterns
the behavior can change slightly

I am definitely not saying "extracting various things in whatever way always makes things simpler".

In every case, what has first priority is comprehensibility for ourselves, not compliance with a methodology. When extracting makes things simpler, extract. When we feel that not extracting, or conversely bundling things into a procedure, makes things easier to understand, let us do that. As for ruby, I often extracted because the original is written properly, but if source code was written by a poor programmer, aggressively bundling it into functions would often be a good choice.

The original work is Copyright © 2002 - 2004 Minero AOKI. Translated by Vincent ISAMBART and Clifford Escobar CAOILE. This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 2.5 License


Translated by Vincent ISAMBART

Chapter 18: Loading

Outline

Interface

At the Ruby level, there are two procedures that can be used for loading: require and load.

require 'uri'            # load the uri library
load '/home/foo/.myrc'   # read a resource file

They are both normal methods, compiled and evaluated exactly like any other code. This means that loading occurs after compilation has handed control over to the evaluation stage.

These two functions each have their own use: require is for loading libraries, and load is for loading an arbitrary file. Let's look at this in more detail.

require

require has four features:

the file is searched for in the load path
it can load extension libraries
the .rb/.so extension can be omitted
a given file is never loaded more than once

Ruby's load path is in the global variable $:, which contains an array of strings. For example, displaying the content of $: in the environment I usually use would show:

% ruby -e 'puts $:'
/usr/lib/ruby/site_ruby/1.7
/usr/lib/ruby/site_ruby/1.7/i686-linux
/usr/lib/ruby/site_ruby
/usr/lib/ruby/1.7
/usr/lib/ruby/1.7/i686-linux
.

Calling puts on an array displays one element per line, so it's easy to read.

As I ran configure with --prefix=/usr, the library path is /usr/lib/ruby and below, but if you compile normally from the source code, the libraries will be in /usr/local/lib/ruby and below. In a Windows environment, there will also be a drive letter.

Then, let's try to require the standard library nkf.so from the load path.

require 'nkf'

If the required name has no extension, require silently supplies one. First it tries .rb, then .so. On some platforms it also tries the platform's specific extension for extension libraries, for example .dll in a Windows environment or .bundle on Mac OS X.

Let's do a simulation in my environment. ruby checks the following paths in sequential order.

/usr/lib/ruby/site_ruby/1.7/nkf.rb
/usr/lib/ruby/site_ruby/1.7/nkf.so
/usr/lib/ruby/site_ruby/1.7/i686-linux/nkf.rb
/usr/lib/ruby/site_ruby/1.7/i686-linux/nkf.so
/usr/lib/ruby/site_ruby/nkf.rb
/usr/lib/ruby/site_ruby/nkf.so
/usr/lib/ruby/1.7/nkf.rb
/usr/lib/ruby/1.7/nkf.so
/usr/lib/ruby/1.7/i686-linux/nkf.rb
/usr/lib/ruby/1.7/i686-linux/nkf.so    found!
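The simulated search order can be reproduced with a few lines of Ruby. This is only a sketch of the listing as printed, not of the actual C code: the method name candidate_paths, the constant LOAD_PATH_1_7 and the hard-coded extension list are assumptions made for illustration.

```ruby
# Hypothetical helper: enumerate the paths in the order the simulation
# lists them, directory by directory, trying each extension in turn.
def candidate_paths(name, load_path, exts = [".rb", ".so"])
  load_path.flat_map do |dir|
    exts.map { |ext| File.join(dir, name + ext) }
  end
end

# The load path from the author's environment ("." left out for brevity).
LOAD_PATH_1_7 = %w[
  /usr/lib/ruby/site_ruby/1.7
  /usr/lib/ruby/site_ruby/1.7/i686-linux
  /usr/lib/ruby/site_ruby
  /usr/lib/ruby/1.7
  /usr/lib/ruby/1.7/i686-linux
]

paths = candidate_paths("nkf", LOAD_PATH_1_7)
puts paths
```

The last line printed is the one marked "found!" in the simulation; a real search would stop at the first candidate that exists on disk.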

nkf.so has been found in /usr/lib/ruby/1.7/i686-linux. Once the file has been found, require's last feature (not loading the file more than once) locks the file. The locks are strings put in the global variable $". In our case the string "nkf.so" has been put there. Even if the extension was omitted when calling require, the file name in $" has the extension.

require 'nkf'   # after loading nkf...
p $"            # ["nkf.so"]  the file is locked

require 'nkf'   # nothing happens if we require it again
p $"            # ["nkf.so"]  the content of the lock array does not change

There are two reasons for adding the missing extension. The first one is to avoid loading the file twice if the same file is later required with its extension. The second one is to be able to load both nkf.rb and nkf.so. In fact the extensions differ (.so, .dll, .bundle, etc.) depending on the platform, but at locking time they all become .so. That's why, when writing a Ruby program, you can ignore the differences between extensions and assume it's always .so. So you can say that ruby is quite UNIX oriented.

By the way, $" can be freely modified even at the Ruby level, so we cannot say it's a strong lock. You can, for example, load an extension library multiple times if you clear $".
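The weakness of this lock is easy to observe. The following is a minimal sketch using a throwaway Ruby file instead of an extension library; the file name lock_demo.rb and the counter $lock_demo_loads are made up for the demonstration. Note that current Ruby versions record full paths in $" rather than short names such as "nkf.so", which is why the sketch matches on the base name.

```ruby
require "tmpdir"

$lock_demo_loads = 0   # counts how many times the file body really runs
results = []

Dir.mktmpdir do |dir|
  path = File.join(dir, "lock_demo.rb")
  File.write(path, "$lock_demo_loads += 1\n")

  results << require(path)   # loads the file and records it in $"
  results << require(path)   # already recorded, so nothing is loaded

  # Remove the entry from $" by hand: the "lock" is gone.
  $".delete_if { |feature| File.basename(feature) == "lock_demo.rb" }
  results << require(path)   # the file is loaded a second time
end

p results
p $lock_demo_loads
```

The return value of require tells us whether a load actually happened, so the second call and the third call can be told apart without looking inside the file.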

load

load is a lot simpler than require. Like require, it searches for the file in $:. But it can only load Ruby programs. Furthermore, the extension cannot be omitted: the complete file name must always be given.

load 'uri.rb'   # load the URI library that is part of the standard library

In this simple example we try to load a library, but the proper way to use load is, for example, to load a resource file by giving its full path.

Flow of the whole process

Roughly speaking, "loading a file" can be split into:

finding the file
reading the file and mapping it to an internal form
evaluating it

The only difference between require and load is how the file is found. The rest is the same in both.

We will expand a little more on the last part, evaluation. Loaded Ruby programs are basically evaluated at the top level. This means that the constants defined will be top-level constants and the methods defined will be function-style methods.

### mylib.rb
MY_OBJECT = Object.new
def my_p(obj)
  p obj
end

### first.rb
require 'mylib'
my_p MY_OBJECT   # we can use the constants and methods defined in another file

Only the local variable scope of the top level changes when the file changes. In other words, local variables cannot be shared between different files. You can of course share them using, for example, Proc, but this has nothing to do with the load mechanism.

Some people also misunderstand the loading mechanism. Whatever class you are in when you call load, it does not change anything. Even if, as in the following example, you load a file inside a module statement, it serves no purpose, as everything at the top level of the loaded file is put at the Ruby top level.

require 'mylib'     # whatever the place you require from, be it at the top level
module SandBox
  require 'mylib'   # or in a module, the result is the same
end
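These scoping rules can be checked directly. The sketch below writes a small library to a temporary directory and loads it from inside a module; the names DEMO_LIB, DEMO_OBJECT, demo_p and lib_local are invented for the example, and load is used instead of require only so that a full path can be given.

```ruby
require "tmpdir"
require "fileutils"

# Write a small library to a temporary directory (hypothetical file name).
DEMO_LIB = File.join(Dir.mktmpdir, "scope_demo_lib.rb")
at_exit { FileUtils.remove_entry(File.dirname(DEMO_LIB)) }

File.write(DEMO_LIB, <<~LIB)
  lib_local = 42                 # local to the loaded file only
  DEMO_OBJECT = Object.new       # becomes a top-level constant
  def demo_p(obj)                # becomes a function-style (private) method
    obj.inspect
  end
LIB

module SandBox
  load DEMO_LIB                  # loading inside a module changes nothing
end

p Object.const_defined?(:DEMO_OBJECT)          # is it at the top level?
p SandBox.const_defined?(:DEMO_OBJECT, false)  # is it inside SandBox itself?
p defined?(lib_local)                          # did the local leak out?
p demo_p(DEMO_OBJECT).class                    # the method is callable here
```

The second argument false to const_defined? restricts the check to SandBox itself, so the top-level constant is not found through the normal lookup chain.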

Highlights of this chapter

With the above knowledge in mind, we are going to read the code. But this time the specification is defined in great detail, so if we simply read through it, the result could be a mere enumeration of code. Therefore, in this chapter, we are going to narrow the target down to the following 3 points:

loading serialisation
the repartition of the functions in the different source files
how extension libraries are loaded

Regarding the first point, you will understand it when you see it.

For the second point, the functions that appear in this chapter come from 4 different files: eval.c, ruby.c, file.c and dln.c. Why is this so? We'll try to think about the realistic situation behind it.

The third point is just what its name says. We will see how the currently popular trend of execution-time loading, more commonly referred to as plug-ins, works. This is the most interesting part of this chapter, so I'd like to use as many pages as possible to talk about it.

Searching the library

rb_f_require()

The body of require is rb_f_require. First, we will only look at the part concerning the file search. Having many different cases is bothersome, so we will limit ourselves to the case when no file extension is given.

▼ rb_f_require() (simplified version)

VALUE
rb_f_require(obj, fname)
    VALUE obj, fname;
{
    VALUE feature, tmp;
    char *ext, *ftptr; /* OK */
    int state;
    volatile int safe = ruby_safe_level;

    SafeStringValue(fname);
    ext = strrchr(RSTRING(fname)->ptr, '.');
    if (ext) {
        /* ...if the file extension has been given... */
    }
    tmp = fname;
    switch (rb_find_file_ext(&tmp, loadable_ext)) {
      case 0:
        break;

      case 1:
        feature = fname = tmp;
        goto load_rb;

      default:
        feature = tmp;
        fname = rb_find_file(tmp);
        goto load_dyna;
    }
    if (rb_feature_p(RSTRING(fname)->ptr, Qfalse))
        return Qfalse;
    rb_raise(rb_eLoadError, "No such file to load -- %s",
             RSTRING(fname)->ptr);

  load_dyna:
    /* ...load an extension library... */
    return Qtrue;

  load_rb:
    /* ...load a Ruby program... */
    return Qtrue;
}

static const char *const loadable_ext[] = {
    ".rb", DLEXT,    /* DLEXT=".so", ".dll", ".bundle"... */
#ifdef DLEXT2
    DLEXT2,          /* DLEXT2=".dll" on Cygwin, MinGW */
#endif
    0
};

(eval.c)

In this function the goto labels load_rb and load_dyna are actually like subroutines, and the two variables feature and fname are more or less their parameters. These variables have the following meaning.

variable   meaning                                         example
feature    the library file name that will be put in $"   uri.rb, nkf.so
fname      the full path to the library                    /usr/lib/ruby/1.7/uri.rb

The name feature can be found in the function rb_feature_p(). This function checks whether a file has been locked (we will look at it just after).

The functions that actually search for the library are rb_find_file() and rb_find_file_ext(). rb_find_file() searches for a file in the load path $:. rb_find_file_ext() does the same, except that it takes as its second parameter a list of extensions (i.e. loadable_ext) and tries them in sequential order.

Below we will first look at the whole of the file searching code, then we will look at the code of the require lock in load_rb.

rb_find_file()

First, the file search continues in rb_find_file(). This function searches for the file path in the global load path $: (rb_load_path). The string contamination (taint) check is tiresome, so we'll only look at the main part.

▼ rb_find_file() (simplified version)

VALUE
rb_find_file(path)
    VALUE path;
{
    VALUE tmp;
    char *f = RSTRING(path)->ptr;
    char *lpath;

    if (rb_load_path) {
        long i;

        Check_Type(rb_load_path, T_ARRAY);
        tmp = rb_ary_new();
        for (i=0;i<RARRAY(rb_load_path)->len;i++) {
            VALUE str = RARRAY(rb_load_path)->ptr[i];
            SafeStringValue(str);
            if (RSTRING(str)->len > 0) {
                rb_ary_push(tmp, str);
            }
        }
        tmp = rb_ary_join(tmp, rb_str_new2(PATH_SEP));
        if (RSTRING(tmp)->len == 0) {
            lpath = 0;
        }
        else {
            lpath = RSTRING(tmp)->ptr;
        }
    }

    f = dln_find_file(f, lpath);
    if (file_load_ok(f)) {
        return rb_str_new2(f);
    }
    return 0;
}

(file.c)

If we write what happens in Ruby, we get the following:

tmp = []                             # make an array
$:.each do |path|                    # repeat on each element of the load path
  tmp.push path if path.length > 0   # check the path and push it
end
lpath = tmp.join(PATH_SEP)           # concatenate all elements in one string separated by PATH_SEP

dln_find_file(f, lpath)              # main processing

PATH_SEP is the path separator: ':' under UNIX, ';' under Windows. rb_ary_join() creates a string by putting it between the different elements. In other words, the load path that had become an array is back to a string with a separator.

Why? It's only because dln_find_file() takes the paths as a string with PATH_SEP as a separator. But why is dln_find_file() implemented like that? It's simply because dln.c is not a library for ruby. Even though it was written by the same author, it's a general-purpose library. That is precisely why, when I sorted the files by category in the Introduction, I put this file in the Utility category. General-purpose libraries cannot receive Ruby objects as parameters or read ruby global variables.

dln_find_file() also expands, for example, ~ to the home directory, but in fact this is already done in the omitted part of rb_find_file(). So in ruby's case it's not necessary.
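To make the division of labour concrete, here is a runnable sketch in which the search routine only ever sees a separator-delimited string. The function find_in_path_string is a hypothetical stand-in for a general-purpose routine in the spirit of dln_find_file(), not a port of it, and File::PATH_SEPARATOR is used as Ruby's counterpart of PATH_SEP.

```ruby
require "tmpdir"

# Hypothetical stand-in for a general-purpose search routine: it knows
# nothing about Ruby arrays, only a single separator-delimited string.
def find_in_path_string(name, lpath, sep = File::PATH_SEPARATOR)
  lpath.split(sep).each do |dir|
    candidate = File.join(dir, name)
    return candidate if File.file?(candidate)
  end
  nil
end

found_ok = nil
Dir.mktmpdir do |dir|
  empty = File.join(dir, "empty")
  lib   = File.join(dir, "lib")
  Dir.mkdir(empty)
  Dir.mkdir(lib)
  File.write(File.join(lib, "foo.rb"), "")

  # The caller's job, as in rb_find_file(): drop empty entries, then join.
  load_path = [empty, "", lib]
  lpath = load_path.reject(&:empty?).join(File::PATH_SEPARATOR)

  found_ok = (find_in_path_string("foo.rb", lpath) == File.join(lib, "foo.rb"))
end
p found_ok
```

Because the routine takes a plain string, it could be reused by any C program, which is the point the text makes about dln.c being a utility rather than a part of the evaluator.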

Loading wait

Here, file search is finished quickly. Next comes the loading code. Or, more accurately, the code "up to just before the load". The code of rb_f_require()'s load_rb is shown below.

▼ rb_f_require(): load_rb

  load_rb:
    if (rb_feature_p(RSTRING(feature)->ptr, Qtrue))
        return Qfalse;
    ruby_safe_level = 0;
    rb_provide_feature(feature);
    /* the loading of Ruby programs is serialised */
    if (!loading_tbl) {
        loading_tbl = st_init_strtable();
    }
    /* partial state */
    ftptr = ruby_strdup(RSTRING(feature)->ptr);
    st_insert(loading_tbl, ftptr, curr_thread);
    /* ...load the Ruby program and evaluate it... */
    st_delete(loading_tbl, &ftptr, 0); /* loading done */
    free(ftptr);
    ruby_safe_level = safe;

(eval.c)

As mentioned above, rb_feature_p() checks whether a lock has been put in $". And rb_provide_feature() pushes a string into $", in other words locks the file.

The problem comes after. As the comment says, "the loading of Ruby programs is serialised". In other words, a file can only be loaded from one thread, and if another thread tries to load the same file during the loading, that thread will wait for the first loading to finish. If this were not the case:

Thread.fork {
    require 'foo'   # At the beginning of require, foo.rb is added to $"
}                   # However the thread changes during the evaluation of foo.rb
require 'foo'   # foo.rb is already in $" so the function returns immediately
# (A) the classes of foo are used...

By doing something like this, even though the foo library is not really loaded, the code at (A) ends up being executed.

The mechanism to enter the waiting state is very simple. A st_table is created in the loading_tbl global variable, and a "feature => loading thread" association is recorded in it. curr_thread is a variable from eval.c, and its value is the currently running thread. That makes an exclusive lock. And in rb_feature_p(), we wait for the loading thread to end, as follows.

▼ rb_feature_p() (second half)

rb_thread_t th;

while (st_lookup(loading_tbl, f, &th)) {
    if (th == curr_thread) {
        return Qtrue;
    }
    CHECK_INTS;
    rb_thread_schedule();
}

(eval.c)

When rb_thread_schedule() is called, control is transferred to another thread, and this function only returns after control has come back to the thread from which it was called. When the file name disappears from loading_tbl, the loading is finished, so the function can end. The curr_thread check is there so that a thread does not lock itself (figure 1).
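The waiting scheme can be modelled in Ruby itself. The following is a toy sketch, not ruby's implementation: FEATURES stands in for $", LOADING for loading_tbl, and Thread.pass for rb_thread_schedule(). All names are invented for the example. One deliberate difference is the REGISTRY mutex: the C code can rely on threads not being switched in the middle of C code, whereas Ruby-level code may be interrupted between the check and the registration, so the sketch guards that step explicitly.

```ruby
# A toy model of the lock: FEATURES plays the role of $", and LOADING
# plays the role of loading_tbl ("feature => loading thread").
FEATURES = []
LOADING  = {}
REGISTRY = Mutex.new   # assumption: guards the check-and-register step

def demo_feature_p(feature)
  return false unless FEATURES.include?(feature)
  # Wait while some *other* thread is still loading the feature.
  while (th = LOADING[feature])
    return true if th == Thread.current   # do not wait for ourselves
    Thread.pass                           # stands in for rb_thread_schedule()
  end
  true
end

def demo_require(feature)
  claimed = REGISTRY.synchronize do
    next false if FEATURES.include?(feature)
    FEATURES << feature                   # like rb_provide_feature()
    LOADING[feature] = Thread.current     # like st_insert(loading_tbl, ...)
    true
  end
  unless claimed
    demo_feature_p(feature)               # blocks until the loader is done
    return false
  end
  begin
    yield                                 # "load the Ruby program and evaluate it"
  ensure
    LOADING.delete(feature)               # loading done
  end
  true
end

log     = []
started = Queue.new
loader  = Thread.new do
  demo_require("foo") { started << true; sleep 0.05; log << :foo_loaded }
end

started.pop                               # the other thread is now mid-load
second = demo_require("foo") { log << :loaded_twice }
log << :main_continues
p second, log
```

The main thread's call returns only after the loader has finished, so the code that follows it can safely assume that "foo" is fully loaded.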

Figure 1: Serialisation of loads

Loading of Ruby programs

rb_load()

We will now look at the loading process itself. Let's start with the part inside rb_f_require()'s load_rb that loads Ruby programs.

▼ rb_f_require() - load_rb - loading

PUSH_TAG(PROT_NONE);
if ((state = EXEC_TAG()) == 0) {
    rb_load(fname, 0);
}
POP_TAG();

(eval.c)

The rb_load() called here is actually the "meat" of the Ruby-level load. This means it needs to search once again, but looking at the same procedure once more is too much trouble. Therefore, that part is omitted from the code below.

Also, the second argument wrap is folded to 0, because it is 0 in the calling code above.

▼ rb_load() (simplified edition)

void
rb_load(fname, /* wrap=0 */)
    VALUE fname;
{
    int state;
    volatile ID last_func;
    volatile VALUE wrapper = 0;
    volatile VALUE self = ruby_top_self;
    NODE *saved_cref = ruby_cref;

    PUSH_VARS();
    PUSH_CLASS();
    ruby_class = rb_cObject;
    ruby_cref = top_cref;           /* (A-1) change CREF */
    wrapper = ruby_wrapper;
    ruby_wrapper = 0;
    PUSH_FRAME();
    ruby_frame->last_func = 0;
    ruby_frame->last_class = 0;
    ruby_frame->self = self;
    /* (A-2) change ruby_frame->cbase */
    ruby_frame->cbase = (VALUE)rb_node_newnode(NODE_CREF,ruby_class,0,0);
    PUSH_SCOPE();
    /* at the top-level the visibility is private by default */
    SCOPE_SET(SCOPE_PRIVATE);
    PUSH_TAG(PROT_NONE);
    ruby_errinfo = Qnil;            /* make sure it's nil */
    state = EXEC_TAG();
    last_func = ruby_frame->last_func;
    if (state == 0) {
        NODE *node;

        /* (B) this is dealt with as eval for some reasons */
        ruby_in_eval++;
        rb_load_file(RSTRING(fname)->ptr);
        ruby_in_eval--;
        node = ruby_eval_tree;
        if (ruby_nerrs == 0) {      /* no parse error occurred */
            eval_node(self, node);
        }
    }
    ruby_frame->last_func = last_func;
    POP_TAG();
    ruby_cref = saved_cref;
    POP_SCOPE();
    POP_FRAME();
    POP_CLASS();
    POP_VARS();
    ruby_wrapper = wrapper;
    if (ruby_nerrs > 0) {           /* a parse error occurred */
        ruby_nerrs = 0;
        rb_exc_raise(ruby_errinfo);
    }
    if (state) jump_tag_but_local_jump(state);
    if (!NIL_P(ruby_errinfo))       /* an exception was raised during the loading */
        rb_exc_raise(ruby_errinfo);
}

Just when we thought we had made it through the storm of stack manipulations, we are in it again. Although this is tough, let's cheer up and read it.

As is usual with long functions, almost all of the code is taken up by idioms: PUSH/POP, tag protection and re-jumping. Among them, what we want to focus on is the part at (A), which relates to CREF. Since a loaded program is always executed at the top level, it sets ruby_cref aside (without pushing) and brings back top_cref. ruby_frame->cbase also becomes a new one.

And one more place: at (B), ruby_in_eval is turned on for some reason. What is influenced by this variable? I investigated, and it turned out that it seems to be only rb_compile_error(). When ruby_in_eval is true, the message is stored in the exception object, but when it is not true, the message is printed to stderr. In other words, for a parse error in the main program of the command we want to print directly to stderr, but inside the evaluator that is not appropriate, so it refrains from doing so. It seems the "eval" of ruby_in_eval means neither the eval method nor the eval() function but "evaluate" as a general term. Or it's possible that it indicates eval.c.

rb_load_file()

Then, all of a sudden, the source file here is ruby.c. Or, to put it more accurately, ideally the entire loading code would be placed in ruby.c, but rb_load() has no choice but to use PUSH_TAG and the like. Therefore, putting it in eval.c is inevitable. If that were not the case, all of it would have been put in eval.c in the first place.

Then comes rb_load_file().

▼ rb_load_file()

void
rb_load_file(fname)
    char *fname;
{
    load_file(fname, 0);
}

(ruby.c)

It delegates entirely. The second argument of load_file(), script, is a boolean value that indicates whether we are loading the file given as an argument to the ruby command. Now, because we want to assume we are loading a library, let's fold it by substituting script=0. Furthermore, in the code below, non-essential things have already been removed, taking their meaning into account.

▼ load_file() (simplified edition)

static void
load_file(fname, /* script=0 */)
    char *fname;
{
    VALUE f;
    {
        FILE *fp = fopen(fname, "r");   (A)
        if (fp == NULL) {
            rb_load_fail(fname);
        }
        fclose(fp);
    }
    f = rb_file_open(fname, "r");       (B)
    rb_compile_file(fname, f, 1);       (C)
    rb_io_close(f);
}

(A) The call to fopen() is to check whether the file can be opened. If there is no problem, it's immediately closed. It may seem a little wasteful, but it's an extremely simple and yet highly portable and reliable way to do it.

(B) The file is opened once again, this time using the Ruby-level library File.open. The file was not opened with File.open from the beginning so as not to raise any Ruby exception. If any exception occurred here we would like it to be a loading error, but getting the errors related to open, for example Errno::ENOENT or Errno::EACCES, would be problematic. We are in ruby.c, so we cannot stop a tag jump.

(C) Using the parser interface rb_compile_file(), the program is read from an IO object and compiled into a syntax tree. The syntax tree is stored in ruby_eval_tree, so there is no need to get the result.

That's all for the loading code. Finally, since the calls were quite deep, the call graph of rb_f_require() is shown below.

rb_f_require           ....eval.c
    rb_find_file            ....file.c
        dln_find_file           ....dln.c
            dln_find_file_1
    rb_load
        rb_load_file            ....ruby.c
            load_file
                rb_compile_file     ....parse.y
        eval_node

You must bring call graphs on a long trip. It's common knowledge.

The number of open required for loading

Previously, open was used just to check whether a file can be opened, but in fact, during ruby's loading process, other functions such as rb_find_file_ext() also perform checks internally by using open. How many times is open() called in the whole process?

If you're wondering about that, actually counting is the right attitude for a programmer. We can easily count by using a system call tracer. The tool to use would be strace on Linux, truss on Solaris, and ktrace or truss on BSD. As you can see, the name is different for each OS and there's no consistency, but you can find them by googling.

If you're using Windows, your IDE will probably have a tracer built in. Well, as my main environment is Linux, I looked using strace.

The output goes to stderr, so it was redirected using 2>&1.

% strace ruby -e 'require "rational"' 2>&1 | grep '^open'
open("/etc/ld.so.preload", O_RDONLY)    = -1 ENOENT
open("/etc/ld.so.cache", O_RDONLY)      = 3
open("/usr/lib/libruby-1.7.so.1.7", O_RDONLY) = 3
open("/lib/libdl.so.2", O_RDONLY)       = 3
open("/lib/libcrypt.so.1", O_RDONLY)    = 3
open("/lib/libc.so.6", O_RDONLY)        = 3
open("/usr/lib/ruby/1.7/rational.rb", O_RDONLY|O_LARGEFILE) = 3
open("/usr/lib/ruby/1.7/rational.rb", O_RDONLY|O_LARGEFILE) = 3
open("/usr/lib/ruby/1.7/rational.rb", O_RDONLY|O_LARGEFILE) = 3
open("/usr/lib/ruby/1.7/rational.rb", O_RDONLY|O_LARGEFILE) = 3

Up to the open of libc.so.6, these are the opens used in the implementation of dynamic linking; after that come the other four opens. Thus it seems that three of them are useless.

Loading of extension libraries

rb_f_require() - load_dyna

This time we will look at the loading of extension libraries. We will start with rb_f_require()'s load_dyna. However, we no longer need the part about locking, so it has been removed.

▼ rb_f_require() - load_dyna

{
    int volatile old_vmode = scope_vmode;

    PUSH_TAG(PROT_NONE);
    if ((state = EXEC_TAG()) == 0) {
        void *handle;

        SCOPE_SET(SCOPE_PUBLIC);
        handle = dln_load(RSTRING(fname)->ptr);
        rb_ary_push(ruby_dln_librefs, LONG2NUM((long)handle));
    }
    POP_TAG();
    SCOPE_SET(old_vmode);
}
if (state) JUMP_TAG(state);

(eval.c)

By now, there is very little here that is novel. The tags are used only in the idiomatic way, and the visibility scope is saved and restored in the way we are used to seeing. All that remains is dln_load(). What on earth is it doing? For the answer, continue to the next section.

Brush up about links

dln_load() loads an extension library, but what does loading an extension library mean? To talk about it, we need to rewind the discussion dramatically to the physical world and start with linking.

I think compiling C programs is, of course, nothing new to you. Since I'm using gcc on Linux, I can create a runnable program in the following manner.

% gcc hello.c

Judging by the file name, this is probably a "Hello, World!" program. On UNIX, gcc outputs a program into a file named a.out by default, so you can subsequently execute it in the following way:

% ./a.out
Hello, World!

It is created properly.

By the way, what is gcc actually doing here? Usually we just say "compile", but actually there are these four steps:

1. preprocess (cpp)
2. compile C into assembly (cc)
3. assemble the assembly language into machine code (as)
4. link (ld)

Among them, preprocessing, compiling and assembling are described in a lot of places, but the description often ends without clearly covering the linking phase. It is like a history class in school that never reaches the "modern age". Therefore, in this book, to fill in that missing part, I'll briefly summarize what linking is.

A program that has finished the assembling phase becomes an "object file" in some format. The following are some of the major formats.

ELF, Executable and Linking Format (recent UNIX)
a.out, assembler output (relatively old UNIX)
COFF, Common Object File Format (Win32)

It might go without saying that a.out as an object file format and a.out as the default output file name of cc are totally different things. For example, on modern Linux, when we create it in the ordinary way, an a.out file in ELF format is created.

How these object file formats differ from each other is not important now. What we have to recognize now is that all of these object files can be considered as "a set of names": for example, the function names and the variable names that exist in the file.

And the sets of names contained in an object file are of two types:

the set of necessary names (for instance, the external functions called internally, e.g. printf)
the set of providing names (for instance, the functions defined internally, e.g. hello)

Linking means, when gathering multiple object files, checking that "the set of providing names" entirely contains "the set of necessary names", and connecting them to each other. In other words, if we draw lines from all of "the necessary names", each line must be connected to one of "the providing names" of some object file (Figure 2). To put this in technical terms, it is resolving undefined symbols.
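This "sets of names" model is small enough to write down. The sketch below is purely illustrative: ObjectFile and link are hypothetical names, and real linkers deal with far more than name matching.

```ruby
# Hypothetical model: an object file is just two sets of names.
ObjectFile = Struct.new(:name, :needs, :provides)

# Returns [connections, undefined], where connections maps each needed
# name to the object file that provides it.
def link(objects)
  providers = {}
  objects.each do |obj|
    obj.provides.each { |sym| providers[sym] ||= obj.name }
  end

  connections = {}
  undefined   = []
  objects.each do |obj|
    obj.needs.each do |sym|
      if providers.key?(sym)
        connections[sym] = providers[sym]   # draw the line
      else
        undefined << sym                    # an undefined symbol
      end
    end
  end
  [connections, undefined.uniq]
end

main_o  = ObjectFile.new("main.o",  %w[hello printf], %w[main])
hello_o = ObjectFile.new("hello.o", %w[printf],       %w[hello])
libc    = ObjectFile.new("libc",    [],               %w[printf])

ok_connections, ok_undefined = link([main_o, hello_o, libc])
_, missing = link([main_o, hello_o])   # libc left out on purpose

p ok_connections
p ok_undefined
p missing
```

When libc is left out, printf stays in the undefined list, which corresponds to the link error a real linker would report.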

Figure 2: object files and linking

Logically this is how it works, but in reality a program can't run with this alone. At the very least, C programs cannot run without converting the names into addresses (numbers).

So, after the logical conjunctions, physical conjunctions become necessary. We have to map the object files into the real memory space and substitute numbers for all the names. Concretely speaking, for instance, the addresses to jump to on function calls are adjusted here.

Depending on when these two kinds of conjunction are made, linking is divided into two types: static linking and dynamic linking. Static linking finishes all the phases at compile time. Dynamic linking, on the other hand, defers some of the conjunctions until execution time, and linking is finally completed when the program runs.

However, what is explained here is a very simple, idealized model, and in some respects it distorts reality considerably. Logical conjunctions and physical conjunctions are not so completely separated, and "an object file is a set of names" is too naive. But the behavior in this area differs considerably from platform to platform, and describing it seriously would take another whole book. To obtain knowledge at a realistic level, I recommend additionally reading "Expert C Programming: Deep C Secrets" by Peter van der Linden and "Linkers and Loaders" by John R. Levine.

Linking that is truly dynamic

And finally we get to our main topic. The "dynamic" in "dynamic linking" naturally means it "occurs at execution time", but what people usually refer to as "dynamic linking" is pretty much decided already at compile time. For example, the names of the needed functions, and which library they can be found in, are already known. For instance, if you need cos(), you know it's in libm, so you use gcc -lm. If you didn't specify the correct library at compile time, you'd get a link error.

But extension libraries are different. Neither the names of the needed functions nor the name of the library that defines them is known at compile time. We need to construct a string at execution time, then load and link. This means that even "the logical conjunctions" in the sense described earlier must be done entirely at execution time. To do that, another mechanism, a little different from ordinary dynamic linking, is required.

This manipulation, linking that is entirely decided at runtime, is usually called "dynamic load".

Dynamic load API

I've finished explaining the concept. What remains is how to do that dynamic loading. This is not difficult. Usually the system provides a specific API, and we can accomplish it merely by calling it.

For example, what is relatively widespread on UNIX is the API named dlopen. However, I can't say "it is always available on UNIX". For example, HP-UX of a little while ago has a totally different interface, and a NeXT-flavor API is used on Mac OS X. And even for the same dlopen, it is included in libc on BSD-derived OSes, whereas on Linux it is attached from outside as libdl. Therefore, it is desperately unportable. Since it differs even among UNIX-based platforms, it is obviously completely different on other operating systems. It is unlikely that the same API is used.

So, what ruby does in order to absorb the totally different interfaces is to provide a file named dln.c. dln is probably an abbreviation of "dynamic link". dln_load() is one of the functions in dln.c.

While the dynamic loading APIs are totally different from each other, the only saving grace is that the usage pattern of the API is exactly the same. Whichever platform you are on, it consists of these three steps:

1. map the library into the address space of the process
2. take pointers to the functions contained in the library
3. unmap the library

For example, with a dlopen-based API, the correspondences are:

1. dlopen
2. dlsym
3. dlclose

With the Win32 API, they are:

1. LoadLibrary (or LoadLibraryEx)
2. GetProcAddress
3. FreeLibrary
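The three steps can be tried from Ruby through the Fiddle standard library, which wraps the platform's dynamic loading API. This is a sketch under assumptions: it presumes a UNIX-like platform where Fiddle is installed and where opening the running program itself (by passing nil) exposes the C library's strlen. It has nothing to do with how dln.c is written; it only walks through the same pattern.

```ruby
require "fiddle"

# 1. Map a library into the process. Passing nil asks for the symbols
#    already visible in the running program (assumed to include libc).
handle = Fiddle.dlopen(nil)

# 2. Take a pointer to a function by name (the dlsym step) and wrap it
#    so that it can be called with a C signature: size_t strlen(const char *).
address = handle["strlen"]
strlen  = Fiddle::Function.new(address, [Fiddle::TYPE_VOIDP], Fiddle::TYPE_SIZE_T)

length = strlen.call("Hello, World!")
puts length

# 3. Unmap the library (the dlclose step).
handle.close
```

The function pointer must not be used after the last step, which is exactly the concern that keeps ruby from closing extension libraries once they are loaded.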

Lastly, I'll talk about what dln_load() does with these APIs. It is, in fact, calling Init_xxxx(). Having come this far, we are finally able to describe the entire process of ruby from invocation to completion without any gaps. That is, when ruby is invoked, it initializes the evaluator and starts evaluating a program passed in one way or another. If require or load occurs during the process, it loads the library and transfers control. Transferring control means parsing and evaluating if it is a Ruby library, and it means loading, linking and finally calling Init_xxxx() if it is an extension library.

dln_load()

Finally, we've reached the content of dln_load(). dln_load() is also a long function, but its structure is simple for a reason. Take a look at the outline first.

▼ dln_load() (outline)

void*
dln_load(file)
    const char *file;
{
#if defined _WIN32 && !defined __CYGWIN__
    load with Win32 API
#else
    initialization depending on each platform
#ifdef each platform
    ……routines for each platform……
#endif
#endif
#if !defined(_AIX) && !defined(NeXT)
  failed:
    rb_loaderror("%s - %s", error, file);
#endif
    return 0;   /* dummy return */
}

In this way, the part connecting to the main body is completely separated for each platform. When thinking about it, we only have to think about one platform at a time. The supported APIs are as follows:

dlopen (most UNIX)
LoadLibrary (Win32)
shl_load (a bit old HP-UX)
a.out (very old UNIX)
rld_load (before NeXT4)
dyld (NeXT or Mac OS X)
get_image_symbol (BeOS)
GetDiskFragment (Mac OS 9 and before)
load (a bit old AIX)

dln_load() - dlopen()

First, let's start with the API code for the dlopen series.

▼ dln_load() - dlopen()

void*
dln_load(file)
    const char *file;
{
    const char *error = 0;
#define DLN_ERROR() (error = dln_strerror(),\
                     strcpy(ALLOCA_N(char, strlen(error) + 1), error))
    char *buf;
    /* write a string "Init_xxxx" to buf (the space is allocated with alloca) */
    init_funcname(&buf, file);

    {
        void *handle;
        void (*init_fct)();

#ifndef RTLD_LAZY
# define RTLD_LAZY 1
#endif
#ifndef RTLD_GLOBAL
# define RTLD_GLOBAL 0
#endif

        /* (A) load the library */
        if ((handle = (void*)dlopen(file, RTLD_LAZY | RTLD_GLOBAL))
                                                             == NULL) {
            error = dln_strerror();
            goto failed;
        }
        /* (B) get the pointer to Init_xxxx() */
        init_fct = (void(*)())dlsym(handle, buf);
        if (init_fct == NULL) {
            error = DLN_ERROR();
            dlclose(handle);
            goto failed;
        }
        /* (C) call Init_xxxx() */
        (*init_fct)();

        return handle;
    }

  failed:
    rb_loaderror("%s - %s", error, file);
}

(dln.c)

(A) The RTLD_LAZY argument of dlopen() means "resolve the undefined symbols when the functions are actually needed". The return value is a mark (handle) that identifies the library, and we always need to pass it when using the dl*() functions.

(B) dlsym() gets a function pointer from the library specified by the handle. If the return value is NULL, it means failure. Here, the pointer to Init_xxxx() is obtained and called.

dlclose() is not called here. Since pointers to functions of the loaded library may be handed out inside Init_xxxx(), calling dlclose() would cause trouble because the entire library would become unusable. Thus, we can't call dlclose() until the process finishes.
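The string that init_funcname() writes into buf is not shown in the listing. As a hedged illustration, assuming the usual convention that an extension library's entry point is named Init_ followed by the library's base name, the computation could be sketched in Ruby as follows. The Ruby method below is a hypothetical rendering, not the C function.

```ruby
# Hypothetical Ruby rendering of what init_funcname() produces:
# "Init_" followed by the library's base name without its extension.
def init_funcname(file)
  "Init_" + File.basename(file, ".*")
end

puts init_funcname("/usr/lib/ruby/1.7/i686-linux/nkf.so")
```

This is the "string constructed at execution time" that the earlier discussion of dynamic load refers to: the name passed to dlsym() is derived from the file name, so nothing about it needs to be known when ruby itself is compiled.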

dln_load() - Win32

As for Win32, LoadLibrary() and GetProcAddress() are used. These are very ordinary Win32 APIs that are also documented on MSDN.

▼ dln_load() - Win32

void*
dln_load(file)
    const char *file;
{
    HINSTANCE handle;
    char winfile[MAXPATHLEN];
    void (*init_fct)();
    char *buf;

    if (strlen(file) >= MAXPATHLEN) rb_loaderror("filename too long");

    /* write the "Init_xxxx" string to buf (the space is allocated with alloca) */
    init_funcname(&buf, file);

    strcpy(winfile, file);

    /* load the library */
    if ((handle = LoadLibrary(winfile)) == NULL) {
        error = dln_strerror();
        goto failed;
    }

    if ((init_fct = (void(*)())GetProcAddress(handle, buf)) == NULL) {
        rb_loaderror("%s - %s\n%s", dln_strerror(), buf, file);
    }

    /* call Init_xxxx() */
    (*init_fct)();
    return handle;

  failed:
    rb_loaderror("%s - %s", error, file);
}

(dln.c)

We do LoadLibrary() and then GetProcAddress(). The pattern is so similar that there is nothing left to say, so I have decided to end this chapter here.


Chapter 19: Threads

Outline

Ruby Interface

Come to think of it, I feel I have not yet shown any actual code that uses Ruby threads. It is nothing special, but I'll introduce it here just in case.

Thread.fork {
    while true
      puts 'forked thread'
    end
}
while true
  puts 'main thread'
end

When this program is executed, a lot of "forked thread" and "main thread" lines are printed, properly mixed together.

Of course, besides simply creating multiple threads, there are also various ways to control them. There is no synchronize reserved word as in Java, but common primitives such as Mutex, Queue and Monitor are of course available, and the APIs below can be used to control a thread itself.

▼ Thread API

Thread.pass       transfer execution to any other thread
Thread.kill(th)   terminate the thread th
Thread.exit       terminate the current thread itself
Thread.stop       temporarily stop the current thread itself
Thread#join       wait for the thread to finish
Thread#wakeup     wake up a temporarily stopped thread
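As a small illustration of how some of these calls fit together, the sketch below stops a worker thread and resumes it from the main thread. The variable names are arbitrary, and Thread.new is used where the text uses Thread.fork; both start a new thread.

```ruby
log = []

worker = Thread.new do
  log << :started
  Thread.stop          # temporarily stop this thread
  log << :resumed
end

# Thread.pass hands execution to another thread; loop until the worker
# has actually reached Thread.stop and is sleeping.
Thread.pass until worker.status == "sleep"

worker.wakeup          # mark the stopped thread as runnable again
worker.join            # wait for it to finish

p log
```

Checking the status before calling wakeup matters: waking a thread that has not yet stopped would have no lasting effect, and the worker would then sleep forever.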

ruby Thread

Threads are supposed to "all run together", but actually they each run for a little while in turn. To be precise, with some effort on a multi-CPU machine it's possible for, say, two of them to run at the same time. But still, if there are more threads than CPUs, they have to run in turns.

In other words, in order to create threads, someone has to switch between the threads somewhere. There are roughly two ways to do it: kernel-level threads and user-level threads. As the names suggest, they create threads in the kernel or at user level, respectively. If they are kernel-level, multiple threads can run at the same time by making use of multiple CPUs.

Then, how about the threads of ruby? They are user-level threads. And (therefore) the number of threads that can run at the same time is limited to one.

Is it preemptive?

I'll describe the traits of ruby threads in more detail. Another point of view on threads is the question "is it preemptive?".

When we say that a thread (system) is preemptive, we mean that the threads are switched automatically, without being explicitly switched by the user. Looking at this from the opposite direction, the user can't control the timing of thread switches.

On the other hand, in a non-preemptive thread system, threads are never switched until the user explicitly says "I may hand control over to the next thread". Looking at this from the opposite direction, it is obvious when and where threads may be switched.

This distinction also applies to processes, and in that case preemptive is considered "superior". For example, in a non-preemptive system, if a program had a bug and entered an infinite loop, the processes would never be able to switch. This means a user program can halt the whole system, which is not good. Process switching was non-preemptive on Windows 3.1 because its base was MS-DOS, but Windows 95 is preemptive. Thus, the system is more robust. Hence, it is said that Windows 95 is "superior" to 3.1.

Then, how about ruby threads? They are preemptive at the Ruby level and non-preemptive at the C level. In other words, when you are writing C code, you can determine almost with certainty when threads will be switched.

Why is it designed this way? Threads are indeed convenient, but their user also needs a certain mindset: the code must be compatible with threads (it must be multi-thread safe). In other words, to make threads preemptive at C level as well, all the C libraries in use would have to be thread safe.

But in reality there are still a lot of C libraries that are not thread safe. A lot of effort went into making extension libraries easy to write, and it would be a waste if the number of usable libraries decreased because thread safety was required. Therefore, being non-preemptive at C level is a reasonable choice for ruby.

Management System

We've understood that the ruby thread is non-preemptive at C level. It means that after running for a while, a thread voluntarily lets go of the right of control. Now suppose the currently executing thread is about to give up execution. Who receives control next? Before answering, it is impossible to guess without knowing how threads are represented inside ruby in the first place. Let's look at the variables and data types used to manage threads.

▼ the structure to manage threads

    typedef struct thread * rb_thread_t;
    static rb_thread_t curr_thread = 0;
    static rb_thread_t main_thread;

    struct thread {
        struct thread *next, *prev;

(eval.c)

Since struct thread is very huge for some reason, this time I narrowed it down to only the important part, which is why only two members are shown. These next and prev are member names, and their types are rb_thread_t, so we can expect that rb_thread_t values are connected in a doubly linked list. And actually it is not an ordinary doubly linked list: both ends are connected. In other words, it is circular. This is a big point. Adding the static variables main_thread and curr_thread to it, the whole data structure looks like Figure 1.

Figure 1: the data structures to manage threads

main_thread (main thread) means the thread that existed at the time when a program started, meaning the "first" thread. curr_thread is obviously the current thread, meaning the thread currently running. The value of main_thread never changes while the process is running, but the value of curr_thread changes frequently.

In this way, because the list is circular, the procedure to choose "the next thread" becomes easy: it can be done by merely following the next link. This alone lets us run all threads more or less equally.
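To make the shape of this list concrete, here is a minimal sketch of a circular doubly linked list. It is not ruby's code: struct toy_thread and the ring_* functions are hypothetical names invented for this illustration, and only the two links (plus an id for testing) are modeled.

```c
#include <stddef.h>

/* Hypothetical stand-in for struct thread: only the links and an id. */
struct toy_thread {
    struct toy_thread *next, *prev;
    int id;
};

/* A ring of one: both links point back at the node itself. */
void ring_init(struct toy_thread *th, int id)
{
    th->next = th->prev = th;
    th->id = id;
}

/* Insert th just before pos, i.e. at the tail when pos is the head. */
void ring_insert_before(struct toy_thread *pos, struct toy_thread *th)
{
    th->prev = pos->prev;
    th->next = pos;
    pos->prev->next = th;
    pos->prev = th;
}

/* Round-robin choice: the next candidate is simply the next link. */
struct toy_thread *ring_next(struct toy_thread *curr)
{
    return curr->next;
}
```

Because the ends are joined, ring_next() never has to test for the end of the list: starting from any node and calling it repeatedly visits every node and comes back to the start.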

What does switching threads mean?

By the way, what is a thread in the first place? Or, what makes us say that threads have been switched?

These are very difficult questions. As with what a program is or what an object is, things that are usually understood by intuition are hard to answer clearly when asked. In particular, "what is the difference between threads and processes?" is a good question.

Still, within a realistic range, we can describe it to some extent. What a thread needs is an execution context. As for the context of ruby, as we've seen by now, it consists of ruby_frame, ruby_scope, ruby_class and so on. Also, ruby allocates the substance of ruby_frame on the machine stack, and there is also the stack space used by extension libraries, so the machine stack is also necessary as part of the context of a Ruby program. And finally, the CPU registers are indispensable. These various contexts are the elements that make up a thread, and switching them means switching threads. This is also called a "context switch".

The way of context-switching

What remains is how to switch contexts. ruby_scope and ruby_class are easy to replace: allocate space for them somewhere such as the heap and set them aside one by one. For the CPU registers, we can manage because they can be saved and written back by using setjmp(). The space for both purposes is prepared in rb_thread_t.

▼ struct thread (partial)

    struct thread {
        struct thread *next, *prev;
        jmp_buf context;

        struct FRAME *frame;        /* ruby_frame */
        struct SCOPE *scope;        /* ruby_scope */
        struct RVarmap *dyna_vars;  /* ruby_dyna_vars */
        struct BLOCK *block;        /* ruby_block */
        struct iter *iter;          /* ruby_iter */
        struct tag *tag;            /* prot_tag */
        VALUE klass;                /* ruby_class */
        VALUE wrapper;              /* ruby_wrapper */
        NODE *cref;                 /* ruby_cref */

        int flags;  /* scope_vmode / rb_trap_immediate / raised */

        NODE *node;                 /* rb_current_node */

        int tracing;                /* tracing */
        VALUE errinfo;              /* $! */
        VALUE last_status;          /* $? */
        VALUE last_line;            /* $_ */
        VALUE last_match;           /* $~ */

        int safe;                   /* ruby_safe_level */

(eval.c)

As shown above, there are members that seem to correspond to ruby_frame and ruby_scope. There's also a jmp_buf to save the registers.

Then, the problem is the machine stack. How can we swap it?

The most straightforward way, as a mechanism, is to directly overwrite the pointer to the position (end) of the stack. Usually it is held in a CPU register. Sometimes it is a dedicated register, and it is also possible that a general-purpose register is allocated for it. Anyway, it is somewhere. For convenience, we'll call it the stack pointer from now on. Obviously, a different region can be used as the stack by modifying it. But it is also obvious that this way we would have to deal with each CPU and each OS separately, so portability would be really hard to maintain.

Therefore, ruby uses a very violent way to implement the substitution of the machine stack. That is, if we can't modify the stack pointer, let's modify the place the stack pointer points to. We know the stack can be manipulated directly, as we saw in the description of garbage collection; the rest is a slight change in what we do with it. The place to store the stack does exist in struct thread.

▼ struct thread (partial)

    int   stk_len;      /* the stack length */
    int   stk_max;      /* the size of memory allocated for stk_ptr */
    VALUE *stk_ptr;     /* the copy of the stack */
    VALUE *stk_pos;     /* the position of the stack */

(eval.c)

How the explanation goes

So far I've talked about various things, but the important points can be summarized into three:

When
To which thread
How

to switch context. These are also the points of this chapter. Below, I'll describe them using one section for each of the three points.

Trigger

To begin with, the first point: when to switch threads. In other words, what causes a thread switch.

Waiting I/O

For example, when trying to read something by calling IO#gets or IO#read, we can expect the read to take a long time, so it is better to run the other threads in the meantime. In other words, a forcible switch becomes necessary here. Below is the interface of getc.

▼ rb_getc()

    int
    rb_getc(f)
        FILE *f;
    {
        int c;

        if (!READ_DATA_PENDING(f)) {
            rb_thread_wait_fd(fileno(f));
        }
        TRAP_BEG;
        c = getc(f);
        TRAP_END;

        return c;
    }

(io.c)

READ_DATA_PENDING(f) is a macro that checks whether there is still content in the buffer of the file. If the buffer has content, the read can proceed without any waiting, so it reads immediately. If the buffer is empty, the read will take some time, so it calls rb_thread_wait_fd(). This is an indirect cause of switching threads.

If rb_thread_wait_fd() is "indirect", there should also be a "direct" cause. What is it? Let's look inside rb_thread_wait_fd().

▼ rb_thread_wait_fd()

    void
    rb_thread_wait_fd(fd)
        int fd;
    {
        if (rb_thread_critical) return;
        if (curr_thread == curr_thread->next) return;
        if (curr_thread->status == THREAD_TO_KILL) return;

        curr_thread->status = THREAD_STOPPED;
        curr_thread->fd = fd;
        curr_thread->wait_for = WAIT_FD;
        rb_thread_schedule();
    }

(eval.c)

There's rb_thread_schedule() at the last line. This function is the "direct cause". It is the heart of the implementation of ruby threads: it selects the next thread and switches to it.

What told me that this function has such a role is, in my case, that I already knew the term "scheduling" of threads. Even if you didn't know it, now that you do, you'll be able to notice it next time.

And in this case it does not merely pass control to another thread, it also stops itself. Moreover, it has an explicit deadline: "until it becomes readable". Therefore this request has to be communicated to rb_thread_schedule(). That is the part which assigns various things to the members of curr_thread: the reason for stopping is stored in wait_for, and the information to be used when waking up is stored in fd.

Waiting the other thread

Having understood that threads are switched at the time of rb_thread_schedule(), this time we can go the other way: by looking for the places where rb_thread_schedule() appears, we can find the places where threads are switched. Scanning for it, I found it in the function named rb_thread_join().

▼ rb_thread_join() (partial)

    static int
    rb_thread_join(th, limit)
        rb_thread_t th;
        double limit;
    {

        curr_thread->status = THREAD_STOPPED;
        curr_thread->join = th;
        curr_thread->wait_for = WAIT_JOIN;
        curr_thread->delay = timeofday() + limit;
        if (limit < DELAY_INFTY) curr_thread->wait_for |= WAIT_TIME;
        rb_thread_schedule();

(eval.c)

This function is the substance of Thread#join, and Thread#join is a method that waits until the receiver thread ends. Indeed, since there is time to wait, running the other threads is economical. With this, the second reason to switch is found.

Waiting For Time

Moreover, rb_thread_schedule() was also found in the function named rb_thread_wait_for(). This is the substance of (Ruby's) sleep and the like.

▼ rb_thread_wait_for (simplified)

    void
    rb_thread_wait_for(time)
        struct timeval time;
    {
        double date;

        date = timeofday() +
                     (double)time.tv_sec + (double)time.tv_usec*1e-6;
        curr_thread->status = THREAD_STOPPED;
        curr_thread->delay = date;
        curr_thread->wait_for = WAIT_TIME;
        rb_thread_schedule();
    }

(eval.c)

timeofday() returns the current time. Because the value of time is added to it, date indicates the time at which the wait is over. In other words, this is the request "I'd like to stop until this specific time".

Switch by expirations

In all of the above cases, a thread switch happens as a consequence of some operation done at Ruby level. In other words, so far, the Ruby level is also non-preemptive. With this alone, if a program kept single-mindedly calculating, that particular thread would continue to run forever. Therefore, we need to make it voluntarily give up the right of control after running for a while. How long a thread can run before it has to stop is what I'll talk about next.

setitimer

It is the same trick every time, so I feel I lack the skill to entertain, but I searched further for places that call rb_thread_schedule(). And this time it was found in a strange place. Here it is.

▼ catch_timer()

    static void
    catch_timer(sig)
        int sig;
    {
    #if !defined(POSIX_SIGNAL) && !defined(BSD_SIGNAL)
        signal(sig, catch_timer);
    #endif
        if (!rb_thread_critical) {
            if (rb_trap_immediate) {
                rb_thread_schedule();
            }
            else rb_thread_pending = 1;
        }
    }

(eval.c)

This seems to be something related to signals. What is this? I followed the places where this catch_timer() function is used, and it was used around here:

▼ rb_thread_start_0() (partial)

    static VALUE
    rb_thread_start_0(fn, arg, th_arg)
        VALUE (*fn)();
        void *arg;
        rb_thread_t th_arg;
    {

    #if defined(HAVE_SETITIMER)
        if (!thread_init) {
    #ifdef POSIX_SIGNAL
            posix_signal(SIGVTALRM, catch_timer);
    #else
            signal(SIGVTALRM, catch_timer);
    #endif

            thread_init = 1;
            rb_thread_start_timer();
        }
    #endif

(eval.c)

This means catch_timer is the signal handler for SIGVTALRM.

Here, the question becomes "what kind of signal is SIGVTALRM?". It is actually the signal sent when using the system call named setitimer. That's why there is a check for HAVE_SETITIMER just before it. setitimer is an abbreviation of "SET Interval TIMER", a system call that tells the OS to send signals at a certain interval.

Then, where is setitimer called? It is in rb_thread_start_timer(), which coincidentally is located at the end of this listing.

To sum up, the scenario is as follows. setitimer is used to send signals at a certain interval. The signals are caught by catch_timer(). There, rb_thread_schedule() is called and threads are switched. Perfect.

However, signals can occur at any time, so if this were the whole story, it would mean threads are also preemptive at C level. Now I'd like you to look at the code of catch_timer() again.

    if (rb_trap_immediate) {
        rb_thread_schedule();
    }
    else rb_thread_pending = 1;

There is a required condition: rb_thread_schedule() is done only when rb_trap_immediate is set. This is the point. rb_trap_immediate, as the name suggests, expresses "whether or not to process signals immediately", and it is usually false. It becomes true only for limited periods, such as while doing I/O on a single thread. In the source code, that is the part between TRAP_BEG and TRAP_END.

On the other hand, since rb_thread_pending is set when it is false, let's follow that. This variable is used in the following place.

▼ CHECK_INTS − HAVE_SETITIMER

    #if defined(HAVE_SETITIMER) && !defined(__BOW__)
    EXTERN int rb_thread_pending;
    # define CHECK_INTS do {\
        if (!rb_prohibit_interrupt) {\
            if (rb_trap_pending) rb_trap_exec();\
            if (rb_thread_pending && !rb_thread_critical)\
                rb_thread_schedule();\
        }\
    } while (0)

(rubysig.h)

In this way, rb_thread_pending is checked inside CHECK_INTS and rb_thread_schedule() is done. That is, when SIGVTALRM is received, rb_thread_pending becomes true, and then the thread is switched the next time execution passes through CHECK_INTS.
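The "handler only sets a flag, a safe point acts on it" pattern can be sketched in isolation. This is an illustration under assumptions, not ruby's code: every toy_* name is hypothetical, and toy_schedule() only counts how often a switch would have happened.

```c
#include <signal.h>

/* Hypothetical counterparts of rb_thread_pending and rb_trap_immediate. */
volatile sig_atomic_t toy_pending = 0;
volatile sig_atomic_t toy_trap_immediate = 0;
volatile sig_atomic_t toy_switch_count = 0;

/* Stand-in for the scheduler: just record that a switch took place. */
void toy_schedule(void)
{
    toy_pending = 0;
    toy_switch_count++;
}

/* Signal handler: switch at once only when that is known to be safe,
   otherwise leave a note for the next safe point. */
void toy_catch_timer(int sig)
{
    (void)sig;
    if (toy_trap_immediate) toy_schedule();
    else toy_pending = 1;
}

/* The safe point, in the role of CHECK_INTS. */
void toy_check_ints(void)
{
    if (toy_pending) toy_schedule();
}
```

A program would install the handler with something like signal(SIGVTALRM, toy_catch_timer) and call toy_check_ints() at places it passes through often. The switch then happens only at those places, which is what keeps the C level non-preemptive.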

This CHECK_INTS has appeared in various places by now, for example in rb_eval(), rb_call0() and rb_yield_0(). CHECK_INTS would be meaningless if it were not located in places that are passed frequently. Therefore it is natural that it exists in the important functions.

tick

We understood the case when setitimer exists. But what if setitimer does not exist? Actually, the answer is in CHECK_INTS, which we've just seen. It is the definition on the #else side.

▼ CHECK_INTS − not HAVE_SETITIMER

    EXTERN int rb_thread_tick;
    #define THREAD_TICK 500
    #define CHECK_INTS do {\
        if (!rb_prohibit_interrupt) {\
            if (rb_trap_pending) rb_trap_exec();\
            if (!rb_thread_critical) {\
                if (rb_thread_tick-- <= 0) {\
                    rb_thread_tick = THREAD_TICK;\
                    rb_thread_schedule();\
                }\
            }\
        }\
    } while (0)

(rubysig.h)

Every time execution goes through CHECK_INTS, rb_thread_tick is decremented. When it reaches 0, rb_thread_schedule() is done. In other words, the mechanism is that the thread is switched after passing through CHECK_INTS THREAD_TICK (= 500) times.
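The countdown can be tried out on its own. The following is a sketch with hypothetical names (toy_tick, toy_switches, toy_check_ints_tick); it keeps the same test-and-decrement expression as the macro and assumes the counter starts at the tick value.

```c
#define TOY_THREAD_TICK 500

int toy_tick = TOY_THREAD_TICK;  /* assumed starting value for this sketch */
int toy_switches = 0;            /* counts how often a switch would occur */

/* One pass through the check point. */
void toy_check_ints_tick(void)
{
    if (toy_tick-- <= 0) {
        toy_tick = TOY_THREAD_TICK;
        toy_switches++;          /* rb_thread_schedule() would go here */
    }
}
```

Note the post-decrement: the comparison sees the value before it is decremented, so in this sketch a counter that starts at 500 triggers on the 501st pass rather than exactly the 500th. For a time slice this difference hardly matters.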

Scheduling

The second point is which thread to switch to. The function solely responsible for this decision is rb_thread_schedule().

rb_thread_schedule()

The important functions of ruby are always huge. This rb_thread_schedule() has more than 220 lines. Let's thoroughly divide it into portions.

▼ rb_thread_schedule() (outline)

    void
    rb_thread_schedule()
    {
        rb_thread_t next;           /* OK */
        rb_thread_t th;
        rb_thread_t curr;
        int found = 0;

        fd_set readfds;
        fd_set writefds;
        fd_set exceptfds;
        struct timeval delay_tv, *delay_ptr;
        double delay, now;  /* OK */
        int n, max;
        int need_select = 0;
        int select_timeout = 0;

        rb_thread_pending = 0;
        if (curr_thread == curr_thread->next
            && curr_thread->status == THREAD_RUNNABLE)
            return;

        next = 0;
        curr = curr_thread;         /* starting thread */

        while (curr->status == THREAD_KILLED) {
            curr = curr->prev;
        }

        /* ……prepare the variables used at select…… */
        /* ……select if necessary…… */
        /* ……decide the thread to invoke next…… */
        /* ……context-switch…… */
    }

(eval.c)

(A) When there's only one thread, this does nothing and returns immediately. Therefore, the discussion after this can assume that there are always multiple threads.

(B) Next comes the initialization of the variables. We can consider the part up to and including the while to be the initialization. Since curr follows prev, the last living thread (status != THREAD_KILLED) ends up being set. It is not "the first" one because there are a lot of loops of the form "start with the next of curr, then deal with curr last and end".

After that, we can see the statements about select. Since the thread switching of ruby depends considerably on select, let's first study select here in advance.

select

select is a system call that waits until a certain file becomes ready for reading or writing. Its prototype is this:

    int select(int max,
               fd_set *readset, fd_set *writeset, fd_set *exceptset,
               struct timeval *timeout);

In a variable of type fd_set, a set of the fds we want to check is stored. The first argument max is "(the maximum value of the fds in the fd_sets) + 1". timeout is the maximum time select will wait. If timeout is NULL, it waits forever. If the timeout value is 0, it does not wait at all: it only checks and returns immediately. As for the return value, I'll talk about it when we use it.

I'll talk about fd_set in detail. fd_set can be manipulated by using the macros below:

▼ fd_set manipulation

    fd_set set;

    FD_ZERO(&set)        /* initialize */
    FD_SET(fd, &set)     /* add a file descriptor fd to the set */
    FD_ISSET(fd, &set)   /* true if fd is in the set */

fd_set is typically a bit array, and when we want to check the n-th file descriptor, the n-th bit is set (Figure 2).

Figure 2: fd_set

I'll show a simple usage example of select.

▼ a usage example of select

    #include <stdio.h>
    #include <stdlib.h>      /* exit() */
    #include <sys/types.h>
    #include <sys/time.h>
    #include <unistd.h>

    int
    main(int argc, char **argv)
    {
        char buf[1024];      /* a byte buffer, not an array of pointers */
        fd_set readset;

        FD_ZERO(&readset);               /* initialize readset */
        FD_SET(STDIN_FILENO, &readset);  /* put stdin into the set */
        select(STDIN_FILENO + 1, &readset, NULL, NULL, NULL);
        read(STDIN_FILENO, buf, 1024);   /* success without delay */
        exit(0);
    }

This code assumes the system calls always succeed, so there are no error checks at all. I'd like you to look only at the flow FD_ZERO → FD_SET → select. Since the fifth argument timeout of select is NULL here, this select call waits forever for stdin to become readable. And since this select has completed, the following read does not have to wait at all. By putting a print in the middle, you will get a better understanding of its behavior. A slightly more detailed example is included on the attached CD-ROM {see also doc/select.html}.
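As a complement, here is a sketch of the other extreme of timeout: a zero timeout, which turns select into a pure check. The function name fd_readable_now is hypothetical and made up for this illustration; the calls themselves are the ordinary POSIX ones.

```c
#include <stddef.h>
#include <sys/select.h>
#include <sys/time.h>
#include <unistd.h>

/* Returns 1 if fd can be read without blocking, 0 if not, -1 on error.
   A zero timeout makes select() poll instead of wait. */
int fd_readable_now(int fd)
{
    fd_set readset;
    struct timeval tv;
    int n;

    FD_ZERO(&readset);
    FD_SET(fd, &readset);
    tv.tv_sec = 0;
    tv.tv_usec = 0;

    n = select(fd + 1, &readset, NULL, NULL, &tv);
    if (n < 0) return -1;
    return FD_ISSET(fd, &readset) ? 1 : 0;
}
```

For instance, on the read end of a freshly created pipe this returns 0, and after one byte has been written to the write end it returns 1. A scheduler that already has a runnable thread wants exactly this kind of check: find out whether some I/O has become ready, but do not wait for it.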

Preparations for select

Now we'll go back to the code of rb_thread_schedule(). Since this code branches on the reason why each thread is waiting, I'll show the content in shortened form.

▼ rb_thread_schedule() − preparations for select

      again:
        /* initialize the variables relating to select */
        max = -1;
        FD_ZERO(&readfds);
        FD_ZERO(&writefds);
        FD_ZERO(&exceptfds);
        delay = DELAY_INFTY;
        now = -1.0;

        FOREACH_THREAD_FROM(curr, th) {
            if (!found && th->status <= THREAD_RUNNABLE) {
                found = 1;
            }
            if (th->status != THREAD_STOPPED) continue;
            if (th->wait_for & WAIT_JOIN) {
                /* ……join wait…… */
            }
            if (th->wait_for & WAIT_FD) {
                /* ……I/O wait…… */
            }
            if (th->wait_for & WAIT_SELECT) {
                /* ……select wait…… */
            }
            if (th->wait_for & WAIT_TIME) {
                /* ……time wait…… */
            }
        }
        END_FOREACH_FROM(curr, th);

(eval.c)

Whether intended or not, what stands out are the macros named FOREACH-something. These two are defined as follows:

▼ FOREACH_THREAD_FROM

    #define FOREACH_THREAD_FROM(f,x) x = f; do { x = x->next;
    #define END_FOREACH_FROM(f,x) } while (x != f)

(eval.c)

Let's expand them for better understandability.

    th = curr;
    do {
        th = th->next;
        {
            .....
        }
    } while (th != curr);

This means: follow the circular list of threads starting from the one after curr, process curr last, and end; meanwhile the variable th is used. This makes me think of Ruby's iterators... or is that my imagination running too far?
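The visiting order is easy to confirm with a toy version. In this sketch the two macros have the same shape as the ones quoted from eval.c, but struct toy_node, the TOY_ prefixed names and toy_visit_order() are hypothetical and exist only for this illustration.

```c
/* Hypothetical node type: just the next link and an id. */
struct toy_node {
    struct toy_node *next;
    int id;
};

/* Same shape as FOREACH_THREAD_FROM / END_FOREACH_FROM. */
#define TOY_FOREACH_FROM(f, x) x = f; do { x = x->next;
#define TOY_END_FOREACH_FROM(f, x) } while (x != f)

/* Record the ids in visiting order into out[]; returns how many. */
int toy_visit_order(struct toy_node *curr, int *out)
{
    struct toy_node *th;
    int n = 0;

    TOY_FOREACH_FROM(curr, th) {
        out[n++] = th->id;
    }
    TOY_END_FOREACH_FROM(curr, th);
    return n;
}
```

With a ring 1 → 2 → 3 → 1 and curr pointing at node 1, the recorded order is 2, 3, 1: the starting node is handled last, exactly as described.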

Now, going back to the rest of the code: it uses this slightly strange loop to check whether there is any thread that needs select. As we've seen previously, since select can wait for reading, writing, exceptions, and time all at once, you can probably understand that I/O waits and time waits can be centralized in a single select. And though I didn't describe it in the previous section, select waits are also possible. There is a method named IO.select in Ruby's library, and you can use rb_thread_select() at C level. Therefore we need to execute those selects at the same time too. By merging the fd_sets, multiple selects can be done at once.

What remains is only the join wait. Let's look at its code, just in case.

▼ rb_thread_schedule() − select preparation − join wait

    if (th->wait_for & WAIT_JOIN) {
        if (rb_thread_dead(th->join)) {
            th->status = THREAD_RUNNABLE;
            found = 1;
        }
    }

(eval.c)

The meaning of rb_thread_dead() is obvious from its name. It determines whether or not the thread given as the argument has finished.

Calling select

By now we've figured out whether select is necessary or not, and if it is necessary, its fd_sets have already been prepared. Even if there is an immediately invocable thread (THREAD_RUNNABLE), we need to call select beforehand: it is possible that there is a thread whose I/O wait actually finished a while ago and which has a higher priority. But in that case, we tell select to return immediately and let it only check whether I/O has completed.

▼ rb_thread_schedule() − select

    if (need_select) {
        /* convert delay into timeval */
        /* if there are immediately invocable threads, do only I/O checks */
        if (found) {
            delay_tv.tv_sec = 0;
            delay_tv.tv_usec = 0;
            delay_ptr = &delay_tv;
        }
        else if (delay == DELAY_INFTY) {
            delay_ptr = 0;
        }
        else {
            delay_tv.tv_sec = delay;
            delay_tv.tv_usec = (delay - (double)delay_tv.tv_sec)*1e6;
            delay_ptr = &delay_tv;
        }

        n = select(max+1, &readfds, &writefds, &exceptfds, delay_ptr);
        if (n < 0) {
            /* ……being cut in by signal or something…… */
        }
        if (select_timeout && n == 0) {
            /* ……timeout…… */
        }
        if (n > 0) {
            /* ……properly finished…… */
        }
        /* In a somewhere thread, its I/O wait has finished.
           roll the loop again to detect the thread */
        if (!found && delay != DELAY_INFTY)
            goto again;
    }

(eval.c)

The first half of the block is as written in the comments. Since delay is the time (in seconds, judging from how it is split into tv_sec and tv_usec) until some thread next becomes invocable, it is converted into timeval form.
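The conversion itself is only two assignments, but it is worth seeing separately. This is a sketch that assumes, as the code suggests, that delay holds seconds as a double; toy_delay_to_timeval is a hypothetical helper name, not a function in ruby.

```c
#include <sys/time.h>

/* Split a delay in seconds into whole seconds and microseconds. */
struct timeval toy_delay_to_timeval(double delay)
{
    struct timeval tv;

    tv.tv_sec = (long)delay;  /* integer part (truncated) */
    tv.tv_usec = (long)((delay - (double)tv.tv_sec) * 1e6);  /* fraction */
    return tv;
}
```

For example, a delay of 2.5 becomes tv_sec = 2 and tv_usec = 500000. Fractions that are not exactly representable in binary may come out one microsecond short because of truncation, which is harmless for a scheduler timeout.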

In the second half, it actually calls select and branches on the result. Since this code is long, I divided it again. When interrupted by a signal, it either goes back to the beginning and processes again, or becomes an error. The meaningful ones are the remaining two.

Timeout

When select times out, a thread in time wait or select wait may have become invocable. Check for this and search for runnable threads. If one is found, set THREAD_RUNNABLE on it.

Completing normally

If select completes normally, it means either that some I/O is ready or that a select wait has ended. Search for the threads that are no longer waiting by checking the fd_sets. If one is found, set THREAD_RUNNABLE on it.

Decide the next thread

Taking all this information into consideration, finally decide the next thread to invoke. Since everything that was invocable and everything that had finished waiting has become RUNNABLE, you can arbitrarily pick one of them.

▼ rb_thread_schedule() − decide the next thread

    FOREACH_THREAD_FROM(curr, th) {
        if (th->status == THREAD_TO_KILL) {              /* (A) */
            next = th;
            break;
        }
        if (th->status == THREAD_RUNNABLE && th->stk_ptr) {
            if (!next || next->priority < th->priority)  /* (B) */
                next = th;
        }
    }
    END_FOREACH_FROM(curr, th);

(eval.c)

(A) If there is a thread that is about to finish, give it high priority and let it finish.

(B) Find out what seems runnable. However, it seems to take the value of priority into account. This member can also be modified from Ruby level by using Thread#priority and Thread#priority=. ruby itself does not especially modify it.
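The selection rule can be isolated into a small function. This is a simplified sketch, not the real code: struct toy_th, the TOY_ status values and toy_pick_next() are hypothetical, and the th->stk_ptr check of the original is left out.

```c
#include <stddef.h>

enum toy_status { TOY_TO_KILL, TOY_RUNNABLE, TOY_STOPPED };

/* Hypothetical thread record: link, status, priority. */
struct toy_th {
    struct toy_th *next;
    enum toy_status status;
    int priority;
};

/* Walk the ring starting after curr and pick the next thread to run.
   Returns NULL when nothing is runnable. */
struct toy_th *toy_pick_next(struct toy_th *curr)
{
    struct toy_th *th = curr;
    struct toy_th *next = NULL;

    do {
        th = th->next;
        if (th->status == TOY_TO_KILL) {  /* (A) let it finish first */
            next = th;
            break;
        }
        if (th->status == TOY_RUNNABLE) { /* (B) highest priority wins */
            if (!next || next->priority < th->priority)
                next = th;
        }
    } while (th != curr);
    return next;
}
```

Because the comparison is a strict less-than, among runnable threads of equal priority the one met first in the ring is kept. Since the walk starts just after the current thread, equal-priority threads end up taking turns.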

If these are done but the next thread could not be found, in other words if next was not set, what happens? Since select has already been done, at least one of the threads in time wait or I/O wait should have finished waiting. If there is none, what remains is only waits for other threads, and moreover there are no runnable threads, so this wait will never end. This is a deadlock.

Of course, deadlocks can happen for other reasons too, but in general it is very hard to detect a deadlock. Especially in the case of ruby, where Mutex and the like are implemented at Ruby level, perfect detection is nearly impossible.

Switching Threads

The next thread to invoke has been determined. The I/O and select checks have also been done. What remains is to transfer control to the target thread. However, for the last part of rb_thread_schedule() and the code that switches threads, I'll start a new section.

Context Switch

The third and last point is the thread switch, that is, the context switch. This is the most interesting part of the threads of ruby.

The Base Line

We'll start with the tail of rb_thread_schedule(). Since the story of this section is very complex, I'll go with a significantly simplified version.

▼ rb_thread_schedule() (context switch)

    if (THREAD_SAVE_CONTEXT(curr)) {
        return;
    }
    rb_thread_restore_context(next, RESTORE_NORMAL);

As for THREAD_SAVE_CONTEXT(), we need to expand its content in several places in order to understand it.

▼ THREAD_SAVE_CONTEXT()

    #define THREAD_SAVE_CONTEXT(th) \
        (rb_thread_save_context(th),thread_switch(setjmp((th)->context)))

    static int
    thread_switch(n)
        int n;
    {
        switch (n) {
          case 0:
            return 0;
          case RESTORE_FATAL:
            JUMP_TAG(TAG_FATAL);
            break;
          case RESTORE_INTERRUPT:
            rb_interrupt();
            break;
          /* ……process various abnormal things…… */
          case RESTORE_NORMAL:
          default:
            break;
        }
        return 1;
    }

(eval.c)

If I merge the three and expand, here is the result:

    rb_thread_save_context(curr);
    switch (setjmp(curr->context)) {
      case 0:
        break;
      case RESTORE_FATAL:
        ....
      case RESTORE_INTERRUPT:
        ....
      /* ……process abnormals…… */
      case RESTORE_NORMAL:
      default:
        return;
    }
    rb_thread_restore_context(next, RESTORE_NORMAL);

RESTORE_NORMAL appears both as a return value of setjmp() and as an argument of rb_thread_restore_context(); this is clearly suspicious. Since rb_thread_restore_context() does longjmp(), we can expect a correspondence between setjmp() and longjmp(). And if we also guess the meaning from the function names,

    save the context of the current thread
    setjmp
    restore the context of the next thread
    longjmp

the rough main flow would probably look like this. However, what we have to be careful about here is that this pair of setjmp() and longjmp() is not completed within this thread. setjmp() is used to save the context of this thread, and longjmp() is used to restore the context of the next thread. In other words, there is a chain of setjmp()/longjmp() as follows (Figure 3).

Figure 3: the backstitch by chaining of setjmp

We can restore more or less the CPU registers with setjmp()/longjmp(), so the remaining context is the Ruby stacks in addition to the machine stack. rb_thread_save_context() saves it, and rb_thread_restore_context() restores it. Let's look at each of them in order.
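The "returns twice" behavior of setjmp() that this whole mechanism relies on can be seen in a few lines. This sketch stays inside a single function and a single stack, so it shows only the return-value protocol, not an actual thread switch; TOY_RESTORE_NORMAL, toy_context and toy_save_then_resume() are hypothetical names.

```c
#include <setjmp.h>

#define TOY_RESTORE_NORMAL 7  /* arbitrary nonzero code for this sketch */

static jmp_buf toy_context;

/* Returns 1 when resumed through longjmp(), in the manner of
   thread_switch(); -1 would indicate an unexpected path. */
int toy_save_then_resume(void)
{
    switch (setjmp(toy_context)) {
      case 0:
        /* first return: the context has just been saved */
        longjmp(toy_context, TOY_RESTORE_NORMAL);
        return -1;  /* never reached: longjmp() does not return */
      case TOY_RESTORE_NORMAL:
        /* second return: we arrived here via longjmp() */
        return 1;
      default:
        return -1;
    }
}
```

Calling toy_save_then_resume() yields 1. The value 0 means "just saved", and any other value is whatever was passed to longjmp(), which is how a code such as RESTORE_NORMAL can tell the resumed side why it was woken up.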

rb_thread_save_context()

Now, we'll start with rb_thread_save_context(), which saves a context.

▼ rb_thread_save_context() (simplified)

    static void
    rb_thread_save_context(th)
        rb_thread_t th;
    {
        VALUE *pos;
        int len;
        static VALUE tval;

        len = ruby_stack_length(&pos);
        th->stk_len = 0;
        th->stk_pos = (rb_gc_stack_start<pos)?rb_gc_stack_start
                                             :rb_gc_stack_start - len;
        if (len > th->stk_max) {
            REALLOC_N(th->stk_ptr, VALUE, len);
            th->stk_max = len;
        }
        th->stk_len = len;
        FLUSH_REGISTER_WINDOWS;
        MEMCPY(th->stk_ptr, th->stk_pos, VALUE, th->stk_len);

        /* …………omission………… */
    }

(eval.c)

The second half just keeps assigning the global variables such as ruby_scope into th, so it is omitted because it is not interesting. The rest, the part shown above, attempts to copy the entire machine stack into the place th->stk_ptr points to.

First, ruby_stack_length() writes the head address of the stack into the parameter pos and returns its length. The range of the stack is determined by using this value, and the address of the bottom-end side is set to th->stk_pos. We can see a branch; this is because both a stack extending higher and a stack extending lower are possible (Figure 4).

Figure 4: a stack extending above and a stack extending below

After that, the rest is preparing memory at the place th->stk_ptr points to and copying the stack: the buffer is enlarged to len elements if it is smaller than that (th->stk_max records the allocated size), then len elements of the stack are copied.

FLUSH_REGISTER_WINDOWS was described in Chapter 5: Garbage collection, so its explanation might no longer be necessary. It is a macro (whose substance is written in assembler) that writes the cache of the stack space out to memory. It must be called when the target is the entire stack.

rb_thread_restore_context()

And finally, rb_thread_restore_context(), the function that restores a thread.

▼ rb_thread_restore_context()

    static void
    rb_thread_restore_context(th, exit)
        rb_thread_t th;
        int exit;
    {
        VALUE v;
        static rb_thread_t tmp;
        static int ex;
        static VALUE tval;

        if (!th->stk_ptr) rb_bug("unsaved context");

        if (&v < rb_gc_stack_start) {
            /* the machine stack extending lower */
            if (&v > th->stk_pos) stack_extend(th, exit);
        }
        else {
            /* the machine stack extending higher */
            if (&v < th->stk_pos + th->stk_len) stack_extend(th, exit);
        }

        /* omission …… back the global variables */

        tmp = th;
        ex = exit;
        FLUSH_REGISTER_WINDOWS;
        MEMCPY(tmp->stk_pos, tmp->stk_ptr, VALUE, tmp->stk_len);

        tval = rb_lastline_get();
        rb_lastline_set(tmp->last_line);
        tmp->last_line = tval;
        tval = rb_backref_get();
        rb_backref_set(tmp->last_match);
        tmp->last_match = tval;

        longjmp(tmp->context, ex);
    }

(eval.c)

The th parameter is the thread to give execution back to. The MEMCPY() and longjmp() in the second half are the heart. The closer MEMCPY() is to the end, the better, because after this operation the stack is in a destroyed state until longjmp().

Nevertheless, there are rb_lastline_set() and rb_backref_set(). They are the restorations of $_ and $~. Since these two variables are not only local variables but also thread-local variables, even though each is only a single local variable slot, there are as many slots as the number of threads. This has to be done here because the place actually being written back is the stack. Because they are local variables, their slots are allocated with alloca().

That's it for the basics. But if we merely write the stack back, then when the stack of the current thread is shorter than the stack of the thread to switch to, the stack frame of the currently executing function (that is, rb_thread_restore_context) would be overwritten. That means the content of the th parameter would be destroyed. To prevent this, we first need to extend the stack. This is done by the stack_extend() in the first half.

▼ stack_extend()

    static void
    stack_extend(th, exit)
        rb_thread_t th;
        int exit;
    {
        VALUE space[1024];

        memset(space, 0, 1);    /* prevent array from optimization */
        rb_thread_restore_context(th, exit);
    }

(eval.c)

By allocating a local variable (which is placed in the machine stack space) of 1K VALUEs, the stack is forcibly extended. However, as a matter of course, returning from stack_extend() means the extended stack shrinks again immediately. This is why rb_thread_restore_context() is called again right there.

By the way, completing the task of rb_thread_restore_context() means reaching the call to longjmp(), and once that is called it never returns. Obviously, the call to stack_extend() also never returns. Therefore rb_thread_restore_context() does not have to think about what to do after returning from stack_extend().

Issues

This is the implementation of the ruby thread switch. It can hardly be called lightweight. Plenty of malloc() and realloc(), plenty of memcpy(), setjmp() and longjmp(), and on top of that function calls to extend the stack. It is fair to say "it is deadly heavy". But in exchange, there is no system call that depends on a particular OS, and there is just a little assembly, only for the register windows of Sparc. Indeed, this seems to be highly portable.

There's another problem. Because the stacks of all threads are allocated at the same address, code that uses pointers into the stack space may not work. Actually, Tcl/Tk matches this situation exactly, and to work around it Ruby's Tcl/Tk interface reluctantly chose to be accessed only from the main thread.

Of course, this does not go along with native threads. It would be necessary to restrict ruby threads to run only on one particular native thread in order to let them work properly. On UNIX there are still few libraries that use a lot of threads. But on Win32, because threads are running all the time, we need to be careful about it.

The original work is Copyright © 2002-2004 Minero AOKI.
Translated by Vincent ISAMBART and Clifford Escobar CAOILE
This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 2.5 License

Ruby Hacking Guide

Final Chapter: Ruby's future

Issues to be addressed

ruby isn't "completely finished software". It's still being developed, and there are still a lot of issues. First, we want to try removing the inherent problems in the current interpreter.

The topics are mostly in the same order as the chapters of this book.

Performance of GC

The performance of the current GC might be described as "not notably bad, but not notably good". "Not notably bad" means "it won't cause trouble in our daily life", and "not notably good" means "its downside will be exposed under heavy load". For example, in an application that creates plenty of objects and keeps holding them, speed would drop radically. Every time GC runs it needs to mark all of the objects, and furthermore GC needs to be invoked more often because it can't collect them. To counter this problem, Generational GC, which was mentioned in Chapter 5, should be effective. (At least, it is said so in theory.)

Also regarding response speed, there is still room for improvement. With the current GC, the entire interpreter stops while it is running. Thus, when the program is an editor or a GUI application, it sometimes freezes and stops reacting. Even if it's just 0.1 second, stopping while characters are being typed gives a very bad impression. Currently, few such applications have been created, or, even if they exist, they may be small enough not to expose this problem. However, if such applications are actually created in the future, it may become necessary to consider Incremental GC.

ImplementationofparserAswesawinPart2,theimplementationofrubyparserhasalreadyutilized@yacc@’sabilitytoalmostitslimit,thusIcan’tthinkitcanendurefurtherexpansions.It’sallrightifthere’snothingplannedtoexpand,butabigname“keywordargument”isplannednextandit’ssadifwecouldnotexpressanotherdemandedgrammarbecauseofthelimitationofyacc.

Reuseofparser

Ruby’sparserisverycomplex.Inparticular,dealingwitharoundlex_stateseriouslyisveryhard.Duetothis,embeddingaRubyprogramorcreatingaprogramtodealwithaRubyprogramitselfisquitedifficult.

Forexample,I’mdevelopingatoolnamedracc,whichisprefixedwithRbecauseitisaRuby-versionyacc.Withracc,thesyntaxofgrammarfilesarealmostthesameasyaccbutwecanwriteactionsinRuby.Todoso,itcouldnotdeterminetheendofanactionwithoutparsingRubycodeproperly,but“properly”isverydifficult.Sincethere’snootherchoice,currentlyI’vecompromisedatthelevelthatitcanparse“almostall”.

AsanotherexamplewhichrequiresanalyzingRubyprogram,Icanenumeratesometoolslikeindentandlint,butcreatingsuchtoolalsorequiresalotefforts.Itwouldbedesperateifitissomethingcomplexlikearefactoringtool.

Then,whatcanwedo?Ifwecan’trecreatethesamething,whatif@ruby@’soriginalparsercanbeusedasacomponent?Inotherwords,makingtheparseritselfalibrary.Thisisafeaturewewantbyallmeans.

However, the problem here is that, as long as yacc is used, we cannot make the parser reentrant. That means, say, we cannot call yyparse() recursively, and we cannot call it from multiple threads. Therefore, it should be implemented in a way that does not return control to Ruby while parsing.

Hiding Code

With the current ruby, a program cannot run without its source code. Thus, people who don't want others to read their source code might have trouble.

Interpreter Object

Currently each process cannot have multiple ruby interpreters; this was discussed in Chapter 13. If having multiple interpreters is practically possible, it seems better, but is it possible to implement such a thing?

The structure of evaluator

The current eval.c is, above all, too complex. Embedding Ruby's stack frames in the machine stack can occasionally become a source of trouble, and using setjmp() and longjmp() aggressively makes it harder to understand and slows it down. Particularly on RISC machines, which have many registers, using setjmp() aggressively can easily cause slowdowns because setjmp() saves everything held in the registers.

The performance of evaluator

ruby is already fast enough for ordinary use. But aside from that, regarding a language processor, definitely the faster the better. To achieve better performance, in other words to optimize, what can we do? In such a case, the first thing we have to do is profiling. So I profiled.

  %   cumulative   self              self     total
 time   seconds   seconds    calls  ms/call  ms/call  name
 20.25      1.64     1.64  2638359     0.00     0.00  rb_eval
 12.47      2.65     1.01  1113947     0.00     0.00  ruby_re_match
  8.89      3.37     0.72  5519249     0.00     0.00  rb_call0
  6.54      3.90     0.53  2156387     0.00     0.00  st_lookup
  6.30      4.41     0.51  1599096     0.00     0.00  rb_yield_0
  5.43      4.85     0.44  5519249     0.00     0.00  rb_call
  5.19      5.27     0.42   388066     0.00     0.00  st_foreach
  3.46      5.55     0.28  8605866     0.00     0.00  rb_gc_mark
  2.22      5.73     0.18  3819588     0.00     0.00  call_cfunc

This is a profile taken when running a certain application, but it is approximately the profile of a general Ruby program. rb_eval() appears at the top with an overwhelming percentage; after that, in addition to the GC functions and the evaluator core, functions specific to the program are mixed in. For example, in the case of this application, regular expression matching (ruby_re_match) takes a lot of time.

However, even if we understand this, the question is how to improve it. Thinking simply, it can be achieved by making rb_eval() faster. That said, as for the ruby core, there is almost no room left that can be easily optimized. For instance, the "tail recursion → goto conversion" used for NODE_IF and others has apparently already been applied in almost all the places where it can be applied. In other words, without fundamentally changing the way of thinking, there's no room to improve.

The implementation of thread

This was also discussed in Chapter 19. There are really a lot of issues with the implementation of the current ruby's threads. In particular, they mix very badly with native threads. The two great advantages of ruby's threads, (1) high portability and (2) the same behavior everywhere, are definitely incomparable, but that implementation is probably something we cannot continue to use forever, is it?

ruby 2

Next, on the other hand, I'll introduce the direction of the original ruby and how it is trying to counter these issues.

Rite

At present, the latest ruby is 1.6.7 as the stable version and 1.7.3 as the development version, but perhaps the next stable version, 1.8, will come out in the near future. At that point, the next development version, 1.9.0, will start at the same time. After that, and this is a little irregular, 1.9.1 will be the next stable version.

stable    development    when to start
1.6.x     1.7.x          1.6.0 was released on 2000-09-19
1.8.x     1.9.x          it will probably come out within 6 months
1.9.1~    2.0.0          maybe about 2 years later

And the next-to-next generation development version is ruby 2, whose code name is Rite. Apparently this name pays homage to the shortcoming that Japanese speakers cannot distinguish the sounds of L and R.

What will change in 2.0 is, in short, almost the entire core. Threads, evaluator, parser: all of them will be changed. However, no code has been written yet, so what is written here is entirely just a "plan". If you expect too much, you may end up disappointed. Therefore, for now, let's keep our expectations modest.

The language to write

Firstly, the language to use. It will definitely be C. Mr. Matsumoto said on ruby-talk, which is the English mailing list for Ruby,

I hate C++.

So, C++ is most unlikely. Even if all the parts are recreated, it is reasonable to expect that the object system will remain almost the same, so it is necessary to avoid adding extra effort around it. However, chances are good that it will be ANSI C next time.

GC

Regarding the implementation of GC, a good starting point would be Boehm GC (http://www.hpl.hp.com/personal/Hans_Boehm/gc). Boehm GC is a conservative, incremental, and generational GC; furthermore, it can mark the stack spaces of all threads even while native threads are running. It's really an impressive GC. Even if it is introduced once, it's hard to tell whether it will be used permanently, but in any case things will proceed in a direction where we can expect some improvement in speed.

Parser

Regarding the specification, it's very likely that nested method calls without parentheses will be forbidden. As we've seen, command_call has a great influence all over the grammar. If this is simplified, both the parser and the scanner will be simplified a lot. However, the ability to omit parentheses itself will never be removed.

And regarding its implementation, whether we continue to use yacc is still under discussion. If we don't use it, that would mean writing the parser by hand, but is it possible to implement such a complex thing by hand? Such anxiety remains. Whichever way we choose, the path will be thorny.

Evaluator

The evaluator will be completely recreated. Its aims are mainly to improve speed and to simplify the implementation. There are two main viewpoints:

remove recursive calls like rb_eval()
switch to a bytecode interpreter

First, removing recursive calls of rb_eval(). Perhaps the most intuitive way to explain how to remove them is that it's like the "tail recursion → goto conversion": circling around inside a single rb_eval() by using goto. That decreases the number of function calls and removes the need for the setjmp() that is used for return or break. However, when a function defined in C is called, calling a function is inevitable, and at that point setjmp() will still be required.
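The idea of circling around inside one function can be sketched in Ruby itself. This is only an illustration under assumptions: the node shapes (:lit, :add, :if) and both method names are hypothetical and are not ruby's real NODE types, and since Ruby has no goto, a loop that reassigns the current node plays that role.

```ruby
# Toy AST (hypothetical shapes, not ruby's real NODE types):
#   [:lit, value]   [:add, left, right]   [:if, cond, then_node, else_node]

# Fully recursive evaluator: the chosen branch of :if costs one more nested call.
def eval_recursive(node)
  case node[0]
  when :lit then node[1]
  when :add then eval_recursive(node[1]) + eval_recursive(node[2])
  when :if
    eval_recursive(node[1]) ? eval_recursive(node[2]) : eval_recursive(node[3])
  else raise ArgumentError, "unknown node #{node[0].inspect}"
  end
end

# Same evaluator with the tail position turned into a jump: the chosen
# branch replaces `node` and the loop goes around again ("goto again").
def eval_looping(node)
  loop do
    case node[0]
    when :lit then return node[1]
    when :add then return eval_looping(node[1]) + eval_looping(node[2])
    when :if
      node = eval_looping(node[1]) ? node[2] : node[3]
    else raise ArgumentError, "unknown node #{node[0].inspect}"
    end
  end
end

# A chain of 10,000 nested ifs whose innermost branch is the literal 42.
deep = [:lit, 42]
10_000.times { deep = [:if, [:lit, true], deep, [:lit, 0]] }
small = [:if, [:lit, false], [:lit, 1], [:add, [:lit, 2], [:lit, 3]]]
```

With eval_looping, the nesting depth of the chain in deep does not add call frames, because only the condition is evaluated by a nested call. Non-tail positions such as the operands of :add still recurse, which corresponds to the places where this conversion cannot be applied.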

Bytecode is, in short, something like a program written in machine language. It became famous because of the Smalltalk-80 virtual machine, and it is called bytecode because each instruction is one byte. For those who usually work at a more abstract level, a byte would seem a natural unit of size to deal with, but in many cases each instruction in machine languages consists of bits. For example, on Alpha, within a 32-bit instruction code, the first 6 bits represent the instruction type.

The advantage of bytecode interpreters is mainly speed. There are two reasons: Firstly, unlike syntax trees, there's no need to traverse pointers. Secondly, it's easy to do peephole optimization.
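To make these two points concrete, here is a minimal sketch of a stack machine written in Ruby. Everything in it is an assumption made for illustration: the opcodes (PUSH, ADD, MUL, HALT), their byte values, and the methods run and peephole are hypothetical and have nothing to do with whatever instruction set Rite may adopt. The interpreter walks a flat array with a program counter instead of following node pointers, and the optimizer looks at a small window of instructions at a time.

```ruby
# Hypothetical one-byte opcodes for a tiny stack machine.
PUSH = 0x01  # followed by one operand byte
ADD  = 0x02
MUL  = 0x03
HALT = 0xFF

# Interpreter: a flat array and a program counter, no pointers to follow.
def run(code)
  stack = []
  pc = 0
  loop do
    op = code[pc]
    case op
    when PUSH
      stack.push(code[pc + 1])
      pc += 2
    when ADD
      b = stack.pop
      a = stack.pop
      stack.push(a + b)
      pc += 1
    when MUL
      b = stack.pop
      a = stack.pop
      stack.push(a * b)
      pc += 1
    when HALT
      return stack.pop
    else
      raise ArgumentError, "unknown opcode #{op.inspect} at #{pc}"
    end
  end
end

# Peephole optimization: fold "PUSH a, PUSH b, ADD" into "PUSH a+b"
# when the sum still fits in one operand byte. Single pass only.
def peephole(code)
  out = []
  pc = 0
  while pc < code.size
    if code[pc] == PUSH && code[pc + 2] == PUSH && code[pc + 4] == ADD &&
       code[pc + 1] + code[pc + 3] <= 0xFF
      out.push(PUSH, code[pc + 1] + code[pc + 3])
      pc += 5
    elsif code[pc] == PUSH
      out.push(PUSH, code[pc + 1])
      pc += 2
    else
      out << code[pc]
      pc += 1
    end
  end
  out
end

program   = [PUSH, 2, PUSH, 3, ADD, PUSH, 4, MUL, HALT]  # (2 + 3) * 4
optimized = peephole(program)
```

Both run(program) and run(optimized) compute the same value, but the optimized sequence is three bytes shorter. Doing the same folding on a syntax tree would require rewriting linked nodes, whereas here it is a scan over neighboring array elements.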

And in the case where bytecode is saved and read in later, we can naturally expect better performance because there's no need to parse. However, parsing is a procedure done only once at the beginning of a program, and even currently it does not take much time. Therefore, its influence will not be that large.

If you'd like to know what the bytecode evaluator could look like, regex.c is worth a look. For another example, Python is a bytecode interpreter.

Thread

Regarding threads, the main topic is native thread support. The environment around threads has improved significantly compared with the situation in 1994, the year of Ruby's birth. So it might be judged that we can get along with native threads now.

Using native threads means being preemptive also at the C level, so the interpreter itself must be multi-thread safe, but it seems this point is going to be solved by using a global lock for the time being.

Additionally, that somewhat arcane "continuation" seems likely to be removed. ruby's continuations depend heavily on the implementation of threads, so naturally they will disappear if threads are switched to native threads. That feature exists because "it can be implemented", and it is rarely actually used. Therefore there should be no problem.

M17N

In addition, I'd like to mention a few things about class libraries. This is about multilingualization (M17N for short). What it means exactly in the context of programming is being able to deal with multiple character encodings.

ruby with multilingualization support has already been implemented, and you can obtain it from the ruby_m17m branch of the CVS repository. It has not been merged yet because its specification is judged to be immature. If good interfaces are designed, it will be merged at some point in the middle of 1.9.

IO

The IO class in current Ruby is a simple wrapper of stdio, but with this approach,

there are too many slight differences between various platforms.
we'd like to have finer control over buffers.

these two points cause complaints. Therefore, it seems Rite will have its own stdio.

Ruby Hacking Guide

So far, we've always acted as observers who look at ruby from outside. But, of course, ruby is not a product displayed in a showcase. That means we can influence it if we take action. In the last section of this book, I'll introduce the suggestions and activities for ruby from the community, as a farewell gift for Ruby Hackers both present and future.

Generational GC

First, as also mentioned in Chapter 5, the generational GC made by Mr. Kiyama Masato. As described before, with the current patch,

it is less fast than expected.
it needs to be updated to fit the latest ruby.

these points are problems, but here I'd like to value it highly because, more than anything else, it was the first large non-official patch.

Oniguruma

The regular expression engine used by current Ruby is a remodeled version of GNU regex. GNU regex was originally written for Emacs. Then it was remodeled to support multi-byte characters. Then Mr. Matsumoto remodeled it so that it is compatible with Perl. As we can easily imagine from this history, its construction is really intricate and spooky. Furthermore, due to the LGPL license of this GNU regex, the license of ruby is very complicated, so replacing this engine has been an issue for a long time.

What suddenly emerged here is the regular expression engine "Oniguruma" by Mr. K. Kosako. I heard it is written really well, and it is likely to be merged as soon as possible.

You can obtain Oniguruma from ruby's CVS repository in the following way.

% cvs -d :pserver:anonymous@cvs.ruby-lang.org:/src co oniguruma

ripper

Next, ripper is my product. It is an extension library made by remodeling parse.y. It is not a change applied to ruby's main body, but I introduce it here as one possible direction for making the parser a component.

It is implemented with a kind of streaming interface, and it can pick up things such as token scans or the parser's reductions as events. It is included on the attached CD-ROM (ripper: archives/ripper-0.0.5.tar.gz of the attached CD-ROM), so I'd like you to give it a try. Note that the supported grammar is a little different from the current one because this version is based on ruby 1.7 from almost half a year ago.
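As a rough illustration of what "events" means here, the following sketch uses the ripper library that is bundled with later versions of Ruby. That is an assumption on my part about what the reader has at hand: its API is not necessarily the same as that of the 0.0.5 archive described above, and the class name ReductionLogger is made up for this example.

```ruby
require 'ripper'

SOURCE = "x = 1 + 2\n"

# Scanner events: Ripper.lex returns one entry per token,
# shaped like [[line, column], event_name, token_text, ...].
scanner_events = Ripper.lex(SOURCE).map { |entry| [entry[1], entry[2]] }

# Parser events: a subclass receives an on_<event> call for each reduction.
class ReductionLogger < Ripper
  attr_reader :reductions

  def initialize(src)
    super(src)
    @reductions = []
  end

  # Define one handler per parser event that only records the event name.
  PARSER_EVENTS.each do |event|
    define_method("on_#{event}") do |*_args|
      @reductions << event
      nil
    end
  end
end

logger = ReductionLogger.new(SOURCE)
logger.parse
```

After this runs, scanner_events holds pairs such as the identifier x and the integer literals, and logger.reductions holds the names of the grammar rules that were reduced, in order. A tool such as indent or lint could be built by reacting to these events instead of re-implementing the parser.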

I created this just because "I happened to come up with the idea"; taking that into account, I think it is constructed well. It took only three days or so to implement, really just a piece of cake.

A parser alternative

This product has not yet appeared in a clear form, but there's a person who is writing a Ruby parser in C++ which can be used totally independently of ruby ([ruby-talk:50497]).

JRuby

More aggressively, there's an attempt to rewrite the entire interpreter. For example, a Ruby written in Java, JRuby (http://jruby.sourceforge.net), has appeared. It seems it is being implemented by a large group of people, Mr. Jan Arne Petersen and many others.

I tried it a little, and my review is as follows:

the parser is written really well. It precisely handles even fine behaviors such as spaces or here documents.
instance_eval seems not to work (probably it couldn't be helped).
it has just a few built-in libraries yet (couldn't be helped as well).
we can't use extension libraries with it (naturally).
because all of Ruby's UNIX-centric parts are cut out, there's little possibility that we can run already-existing scripts without any change.
slow

perhaps I could say at least these things. Regarding the last one, "slow": the execution time is 20 times longer than that of the original ruby. This is too slow. It cannot be expected to run fast because the Ruby VM runs on the Java VM. Waiting for machines to become 20 times faster seems to be the only way.

However, the overall impression I got was that it's way better than I imagined.

NETRuby

If it can run with Java, it should also run with C#. Therefore, a Ruby written in C# appeared, "NETRuby" (http://sourceforge.jp/projects/netruby/). The author is Mr. arton.

Because I don't have any .NET environment at hand, I checked only the source code, but according to the author,

more than anything, it's slow
it has few class libraries
the compatibility of exception handling is not good

such things are the problems. But instance_eval works (astounding!).

How to join ruby development

ruby's developer is really Mr. Matsumoto as an individual; regarding the final decision about the direction ruby will take, he has the definitive authority. But at the same time, ruby is open source software, and anyone can join the development. Joining means that you can offer your opinions or send patches. The following describes concretely how to join.

In ruby's case, the mailing lists are at the center of development, so it's good to join them. There are currently three mailing lists at the center of the community: ruby-list, ruby-dev, and ruby-talk. ruby-list is a mailing list for "anything relating to Ruby" in Japanese. ruby-dev is for the development version of ruby; this is also in Japanese. ruby-talk is an English mailing list. How to join is shown on the "mailing lists" page at Ruby's official site (http://www.ruby-lang.org/ja/). For these mailing lists, read-only members are also welcome, so I recommend just joining first and watching the discussions to get a feel for them.

Though Ruby's activity started in Japan, recently it is sometimes said that "the main authority now belongs to ruby-talk". But the center of development is still ruby-dev. Because the people who have commit rights to ruby (e.g. core members) are mostly Japanese, the difficulty of and reluctance toward using English naturally lead them to ruby-dev. If there are more core members who prefer to use English, the situation could change, but meanwhile the core of ruby's development will probably remain ruby-dev.

However, it's bad if people who cannot speak Japanese cannot join the development, so currently a summary of ruby-dev is translated once a week and posted to ruby-talk. I also help with that summarizing, but only three people do it in turn now, so the situation is really harsh. Members to help summarize are always in demand. If you think you're a person who can help, I'd like you to say so on ruby-list.

And as a last note, source code alone is not enough for software. It's necessary to prepare various documents and maintain web sites. People who take care of these kinds of things are always in short supply. There's also a mailing list for document-related activities, but as a first step you just have to propose "I'd like to do something" on ruby-list. I'll answer as much as possible, and other people will respond, too.

Finale

The long journey of this book is going to end now. As there was a limit on the number of pages, explaining all of the parts comprehensively was impossible; however, I told everything I could tell about ruby's core. I won't add anything more here. If there are still things you didn't understand, I'd like you to investigate them by reading the source code yourself as much as you want.

The original work is Copyright © 2002-2004 Minero AOKI.

Translated by Vincent ISAMBART and Clifford Escobar CAOILE

This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 2.5 License