CSE 506: Operating Systems
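
Logical Diagram
[Figure: layered diagram with User, Kernel, and Hardware levels. Components shown: Binary Formats, Memory Allocators, Threads, System Calls, RCU, File System, Networking, Sync, Memory Management, Device Drivers, CPU Scheduler, Interrupts, Disk, Net, Consistency. A callout marks "Today’s Lecture".]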

Review
• We’ve seen how paging and segmentation work on x86
  – Maps logical addresses to physical pages
  – These are the low-level hardware tools
• This lecture: build up to higher-level abstractions
• Namely, the process address space

Definitions (can vary)
• Process is a virtual address space
  – 1+ threads of execution work within this address space
• A process is composed of:
  – Memory-mapped files
    • Includes program binary
  – Anonymous pages: no file backing
    • When the process exits, their contents go away

Address Space Layout
• Determined (mostly) by the application
• Determined at compile time
  – Link directives can influence this
    • See kern/kernel.ld in JOS; specifies kernel starting address
• OS usually reserves part of the address space to map itself
  – Upper GB on x86 Linux
• Application can dynamically request new mappings from the OS, or delete mappings

Simple Example
[Figure: virtual address space from 0 to 0xffffffff with regions labeled hello, heap, libc.so, and stk (stack)]
• “Hello world” binary specified load address
• Also specifies where it wants libc
• Dynamically asks kernel for “anonymous” pages for its heap and stack

In practice
• You can see (part of) the requested memory layout of a program using ldd:
    $ ldd /usr/bin/git
    linux-vdso.so.1 => (0x00007fff197be000)
    libz.so.1 => /lib/libz.so.1 (0x00007f31b9d4e000)
    libpthread.so.0 => /lib/libpthread.so.0 (0x00007f31b9b31000)
    libc.so.6 => /lib/libc.so.6 (0x00007f31b97ac000)
    /lib64/ld-linux-x86-64.so.2 (0x00007f31b9f86000)

Problem 1: How to represent in the kernel?
• What is the best way to represent the components of a process?
  – Common question: is anything mapped at address x?
    • Page faults, new memory mappings, etc.
• Hint: a 64-bit address space is seriously huge
• Hint: some programs (like databases) map tons of data
  – Others map very little
• No one size fits all

Sparse representation
• Naïve approach might make a big array of pages
  – Mark empty space as unused
  – But this wastes OS memory
• Better idea: only allocate nodes in a data structure for memory that is mapped to something
  – Kernel data structure memory use proportional to complexity of address space!

Linux: vm_area_struct
• Linux represents portions of a process with a vm_area_struct, or vma
• Includes:
  – Start address (virtual)
  – End address (first address after vma) – why?
    • Memory regions are page aligned
  – Protection (read, write, execute, etc) – implication?
    • Different page protections means new vma
  – Pointer to file (if one)
  – Other bookkeeping
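
The fields listed above can be pictured with a small sketch. This is not the kernel's actual vm_area_struct; the type vma_sketch, its field names, and the 4 KiB page size are assumptions chosen for illustration. It shows one reason an exclusive end address is convenient: containment and size become a single comparison and a single subtraction.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define SKETCH_PAGE_SIZE 4096UL   /* assumption: 4 KiB pages */

/* Hypothetical, simplified stand-in for Linux's vm_area_struct. */
struct vma_sketch {
    unsigned long start;   /* first virtual address in the region */
    unsigned long end;     /* first address AFTER the region (exclusive) */
    unsigned int prot;     /* protection bits (read, write, execute) */
    const char *file;      /* backing file name, or NULL if anonymous */
};

/* Half-open interval test: start <= addr < end. */
static bool vma_contains(const struct vma_sketch *v, unsigned long addr)
{
    return addr >= v->start && addr < v->end;
}

/* With an exclusive, page-aligned end, the page count is a plain subtraction. */
static unsigned long vma_pages(const struct vma_sketch *v)
{
    assert(v->start % SKETCH_PAGE_SIZE == 0);
    assert(v->end % SKETCH_PAGE_SIZE == 0);
    return (v->end - v->start) / SKETCH_PAGE_SIZE;
}
```

For a region 0x20000-0x21000, vma_contains() accepts 0x20fff and rejects 0x21000, and vma_pages() reports one page.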

Simple list representation
[Figure: process address space from 0 to 0xffffffff. An mm_struct (process) points to a linked list of vmas, each with start, end, and next fields: vma /bin/ls, vma anon (data), vma libc.so.]

Simple list
• Linear traversal – O(n)
  – Shouldn’t we use a data structure with the smallest O?
• Practical system building question:
  – What is the common case?
  – Is it past the asymptotic crossover point?
• If tree traversal is O(log n), but adds bookkeeping overhead, which makes sense for:
  – 10 vmas: log2 10 =~ 3; average list walk 10/2 = 5; comparable either way
  – 100 vmas: log2 100 =~ 7; average list walk 100/2 = 50; the tree starts making sense
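
A minimal sketch of the linear lookup discussed above, assuming a singly linked list kept sorted by address. The names vma_node, mm_sketch, and find_vma_linear are hypothetical, not the kernel's.

```c
#include <assert.h>
#include <stddef.h>

struct vma_node {
    unsigned long start, end;   /* [start, end) */
    struct vma_node *next;      /* next vma, sorted by start address */
};

struct mm_sketch {
    struct vma_node *head;      /* lowest-addressed vma */
};

/* O(n) walk: return the vma containing addr, or NULL if addr is unmapped. */
static struct vma_node *find_vma_linear(const struct mm_sketch *mm,
                                        unsigned long addr)
{
    assert(mm != NULL);
    for (struct vma_node *v = mm->head; v != NULL; v = v->next) {
        if (addr < v->start)
            return NULL;        /* sorted list: addr falls in a hole */
        if (addr < v->end)
            return v;
    }
    return NULL;
}
```

Keeping the list sorted lets the walk stop early at the first vma that starts above the address, which is what makes the average cost roughly n/2 rather than n.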

Common cases
• Many programs are simple
  – Only load a few libraries
  – Small amount of data
• Some programs are large and complicated
  – Databases
• Linux splits the difference and uses both a list and a red-black tree

Red-black trees
• (Roughly) balanced tree
• Read the Wikipedia article if you aren’t familiar with them
• Popular in real systems
  – Asymptotic average == worst case behavior
    • Insertion, deletion, search: log n
    • Traversal: n

Optimizations
• Using an RB-tree gets us logarithmic search time
• Other suggestions?
• Locality: If I just accessed region x, there is a reasonably good chance I’ll access it again
  – Linux caches a pointer in each process to the last vma looked up
  – Source code (mm/mmap.c) claims 35% hit rate
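
The last-lookup cache can be sketched as a check in front of the slow path. The field name last_hit and the hit/miss counters are assumptions for illustration, and the slow path here is a list walk standing in for the tree search.

```c
#include <assert.h>
#include <stddef.h>

struct vma_node {
    unsigned long start, end;   /* [start, end) */
    struct vma_node *next;
};

struct mm_sketch {
    struct vma_node *head;
    struct vma_node *last_hit;  /* hypothetical name for the cached pointer */
    unsigned long hits, misses; /* counters, for illustration only */
};

static struct vma_node *find_vma_cached(struct mm_sketch *mm,
                                        unsigned long addr)
{
    assert(mm != NULL);
    struct vma_node *c = mm->last_hit;
    if (c != NULL && addr >= c->start && addr < c->end) {
        mm->hits++;
        return c;               /* fast path: same region as last time */
    }
    mm->misses++;
    for (struct vma_node *v = mm->head; v != NULL; v = v->next) {
        if (addr >= v->start && addr < v->end) {
            mm->last_hit = v;   /* remember for the next lookup */
            return v;
        }
    }
    return NULL;
}
```

A real implementation also has to clear the cached pointer whenever the vma it refers to is removed or resized; the sketch omits that.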

Memory mapping recap
• VM Area structure tracks regions that are mapped
  – Efficiently represent a sparse address space
  – On both a list and an RB-tree
    • Fast linear traversal
    • Efficient lookup in a large address space
  – Cache last lookup to exploit temporal locality

Linux APIs
• void *mmap(void *addr, size_t length, int prot, int flags, int fd, off_t offset);
• int munmap(void *addr, size_t length);
• How to create an anonymous mapping?
• What if you don’t care where a memory region goes (as long as it doesn’t clobber something else)?
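
One possible answer to both questions, as a sketch for Linux with glibc: pass MAP_ANONYMOUS with fd = -1 for an anonymous mapping, and pass NULL as the address to let the kernel pick a free range. The helper names map_anywhere and unmap_region are hypothetical.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE         /* exposes MAP_ANONYMOUS under strict -std modes */
#endif
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>

/* Map `length` bytes of private, zero-filled anonymous memory wherever
 * the kernel chooses. Returns NULL on failure. */
static void *map_anywhere(size_t length)
{
    void *p = mmap(NULL, length, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    return p == MAP_FAILED ? NULL : p;
}

/* Remove a mapping created by map_anywhere(). */
static void unmap_region(void *p, size_t length)
{
    int rc = munmap(p, length);
    assert(rc == 0);
    (void)rc;
}
```

Because no address is requested, the kernel will not place the region on top of an existing mapping.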
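
Example 1:
• Let’s map a 1 page (4k) anonymous region for data, read-write at address 0x40000
• mmap((void *)0x40000, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS|MAP_FIXED, -1, 0);
  – MAP_PRIVATE (or MAP_SHARED) is required; without MAP_FIXED the address is only a hint
  – Why wouldn’t we want exec permission?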

Insert at 0x40000
[Figure: mm_struct (process) pointing to a list of vmas: 0x1000-0x4000, 0x20000-0x21000, 0x100000-0x10f000]
1) Is anything already mapped at 0x40000-0x41000?
2) If not, create a new vma and insert it
3) Recall: pages will be allocated on demand
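
Steps 1 and 2 can be sketched on a sorted list as an overlap test followed by a link-in. The names range_is_mapped and insert_vma are hypothetical, and the sketch simply refuses to insert on overlap rather than handling the overlap cases.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct vma_node {
    unsigned long start, end;   /* [start, end) */
    struct vma_node *next;      /* sorted by start address */
};

/* Step 1: does [start, end) overlap any existing vma? */
static bool range_is_mapped(const struct vma_node *head,
                            unsigned long start, unsigned long end)
{
    for (const struct vma_node *v = head; v != NULL; v = v->next)
        if (start < v->end && v->start < end)
            return true;
    return false;
}

/* Step 2: link n into the sorted list. Returns false and leaves the
 * list unchanged if the range is already (partly) mapped. */
static bool insert_vma(struct vma_node **head, struct vma_node *n)
{
    assert(n->start < n->end);
    if (range_is_mapped(*head, n->start, n->end))
        return false;
    struct vma_node **link = head;
    while (*link != NULL && (*link)->start < n->start)
        link = &(*link)->next;
    n->next = *link;
    *link = n;
    return true;
}
```

No page is touched here: only the bookkeeping changes, which matches step 3.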

Scenario 2
• What if there is something already mapped there with read-only permission?
  – Case 1: Last page overlaps
  – Case 2: First page overlaps
  – Case 3: Our target is in the middle

Case 1: Insert at 0x40000
[Figure: mm_struct (process) pointing to a list of vmas: 0x1000-0x4000, 0x20000-0x41000, 0x100000-0x10f000]
1) Is anything already mapped at 0x40000-0x41000?
2) If at the end and different permissions:
   1) Truncate previous vma
   2) Insert new vma
3) If permissions are the same, one can replace pages and/or extend previous vma

Case 3: Insert at 0x40000
[Figure: mm_struct (process) pointing to a list of vmas: 0x1000-0x4000, 0x20000-0x50000, 0x100000-0x10f000]
1) Is anything already mapped at 0x40000-0x41000?
2) If in the middle and different permissions:
   1) Split previous vma
   2) Insert new vma
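
The truncate and split cases differ only in how many pieces of the old vma survive. A sketch under that view, using a hypothetical helper carve() that works on bare address ranges:

```c
#include <assert.h>

struct range {
    unsigned long start, end;   /* [start, end) */
};

/* Carve [hole.start, hole.end) out of old. Writes the surviving pieces
 * of old to out[0..1] and returns how many there are (0, 1, or 2). */
static int carve(struct range old, struct range hole, struct range out[2])
{
    assert(old.start < old.end && hole.start < hole.end);
    int n = 0;
    if (hole.end <= old.start || old.end <= hole.start) {
        out[n++] = old;             /* no overlap: old survives intact */
        return n;
    }
    if (old.start < hole.start) {   /* piece below the new mapping */
        out[n].start = old.start;
        out[n].end = hole.start;
        n++;
    }
    if (hole.end < old.end) {       /* piece above the new mapping */
        out[n].start = hole.end;
        out[n].end = old.end;
        n++;
    }
    return n;
}
```

With the new mapping at 0x40000-0x41000, an old vma of 0x20000-0x41000 yields one piece (Case 1, truncate), and an old vma of 0x20000-0x50000 yields two (Case 3, split).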

Demand paging
• Creating a memory mapping (vma) doesn’t necessarily allocate physical memory or set up page table entries
  – What mechanism do you use to tell when a page is needed?
• It pays to be lazy!
  – A program may never touch the memory it maps.
    • Examples?
      – Program may not use all code in a library
  – Save work compared to traversing up front
  – Hidden costs? Optimizations?
    • Page faults are expensive; heuristics could help performance
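
A toy user-space model of the lazy policy, not kernel code: each slot of a hypothetical lazy_region plays the role of a page table entry, and touch() plays the role of the page fault handler that allocates a zeroed frame on first access.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define NPAGES 8
#define PAGE_BYTES 4096

/* One slot per virtual page; NULL means "not present". */
struct lazy_region {
    unsigned char *frame[NPAGES];
    unsigned long faults;       /* how many first touches we handled */
};

/* Stand-in for the fault handler: allocate a zeroed frame on first touch. */
static unsigned char *touch(struct lazy_region *r, unsigned page)
{
    assert(page < NPAGES);
    if (r->frame[page] == NULL) {
        r->frame[page] = calloc(1, PAGE_BYTES);
        assert(r->frame[page] != NULL);
        r->faults++;
    }
    return r->frame[page];
}

/* Count pages that currently have a frame behind them. */
static unsigned resident_pages(const struct lazy_region *r)
{
    unsigned n = 0;
    for (unsigned i = 0; i < NPAGES; i++)
        if (r->frame[i] != NULL)
            n++;
    return n;
}

/* Free every frame and mark all pages not present again. */
static void release_region(struct lazy_region *r)
{
    for (unsigned i = 0; i < NPAGES; i++) {
        free(r->frame[i]);
        r->frame[i] = NULL;
    }
}
```

Creating the region costs nothing per page; only pages that are actually touched ever get a frame, and each one pays the fault cost once.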

Unix fork()
• Recall: this function creates and starts a copy of the process; identical except for the return value
• Example:
    pid_t pid = fork();
    if (pid == 0) {
        // child code
    } else if (pid > 0) {
        // parent code
    } else {
        // error: fork() returned -1
    }
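
A fuller sketch of the same three-way branch, with the parent waiting for the child. The helper name fork_and_wait is hypothetical; the calls used (fork, _exit, waitpid) are standard POSIX.

```c
#include <assert.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Fork a child that exits immediately with `code`. The parent waits
 * and returns the child's exit status, or -1 on any error. */
static int fork_and_wait(int code)
{
    assert(code >= 0 && code <= 255);
    pid_t pid = fork();
    if (pid == 0) {
        _exit(code);            /* child: fork() returned 0 */
    } else if (pid > 0) {
        int status = 0;         /* parent: fork() returned the child's pid */
        if (waitpid(pid, &status, 0) != pid)
            return -1;
        return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
    }
    return -1;                  /* fork() failed */
}
```

The child uses _exit() rather than exit() so that it does not flush stdio buffers inherited from the parent.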

Copy-On-Write (COW)
• Naïve approach would march through address space and copy each page
  – Most processes immediately exec() a new binary without using any of these pages
  – Again, lazy is better!

How does COW work?
• Memory regions:
  – New copies of each vma are allocated for child during fork
  – As are page tables
• Pages in memory:
  – In page table (and in-memory representation), clear write bit, set COW bit
    • Is the COW bit hardware specified?
    • No, OS uses one of the available bits in the PTE
  – Make a new, writeable copy on a write fault
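
The effect of COW can be observed from user space, although the mechanism itself cannot: the sketch below would behave the same under eager copying. It only demonstrates the guarantee COW has to preserve, namely that a write in the child never becomes visible in the parent. The variable and function names are hypothetical.

```c
#include <assert.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

static int shared_looking_value = 1;

/* Returns the parent's view of the variable after the child has
 * overwritten its own copy, or -1 on error. */
static int parent_view_after_child_write(void)
{
    assert(shared_looking_value == 1);
    pid_t pid = fork();
    if (pid == 0) {
        /* Child: on a COW kernel this write faults, and the kernel
         * gives the child a private copy of the page. */
        shared_looking_value = 99;
        _exit(shared_looking_value == 99 ? 0 : 1);
    }
    if (pid < 0)
        return -1;
    int status = 0;
    if (waitpid(pid, &status, 0) != pid)
        return -1;
    if (!WIFEXITED(status) || WEXITSTATUS(status) != 0)
        return -1;
    return shared_looking_value;    /* parent's page was never modified */
}
```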
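
Idiosyncrasy 1: Stacks Grow Down
• In Linux/Unix, as you add frames to a stack, they actually decrease in virtual address order
• Example:
[Figure: stack frames for main(), foo(), and bar() stacked downward from the stack “bottom” at 0x13000, with boundaries marked at 0x12600, 0x12300, and 0x11900. The lowest frame exceeds the stack page, so the OS allocates a new page.]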

Problem 1: Expansion
• Recall: OS is free to allocate any free page in the virtual address space if user doesn’t specify an address
• What if the OS allocates the page below the “top” of the stack?
  – You can’t grow the stack any further
  – Out of memory fault with plenty of memory spare
• OS must reserve stack portion of address space
  – Fortunate that memory areas are demand paged

Feed 2 Birds with 1 Scone
• Unix has been around longer than paging
  – Remember data segment abstraction?
  – Unix solution:
    • Stack and heap meet in the middle
      – Out of memory when they meet
[Figure: data segment with the heap at one end and the stack at the other, each growing toward the other]

But now we have paging
• Unix and Linux still have a data segment abstraction
  – Even though they use flat data segmentation!
• sys_brk() adjusts the endpoint of the heap
  – Still used by many memory allocators today
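
From user space the heap endpoint (the "program break") is usually reached through the libc wrapper sbrk(), which sits on top of the brk system call. A minimal sketch, assuming Linux with glibc; the helper name grow_break is hypothetical.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE         /* exposes sbrk() under strict -std modes */
#endif
#include <assert.h>
#include <stdint.h>
#include <unistd.h>

/* Grow the data segment by `bytes` and return how far the program
 * break moved, or -1 if the kernel refused. */
static long grow_break(long bytes)
{
    assert(bytes >= 0);
    void *before = sbrk(0);             /* sbrk(0) reports the current break */
    if (sbrk((intptr_t)bytes) == (void *)-1)
        return -1;
    void *after = sbrk(0);
    return (long)((uintptr_t)after - (uintptr_t)before);
}
```

In a single-threaded program with nothing else moving the break in between, grow_break(4096) returns 4096. Mixing direct sbrk() calls with malloc() in real code is fragile, since the allocator may be using the same break.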

Windows Comparison
• LPVOID VirtualAllocEx(__in HANDLE hProcess, __in_opt LPVOID lpAddress, __in SIZE_T dwSize, __in DWORD flAllocationType, __in DWORD flProtect);
• Library function applications program to
  – Provided by ntdll.dll – the rough equivalent of Unix libc
  – Implemented with an undocumented system call

Windows Comparison (continued)
• Programming environment differences:
  – Parameters annotated (__out, __in_opt, etc), compiler checks
  – Name encodes type, by convention
  – dwSize is rounded up to a whole number of pages (just like the length passed to mmap)

Windows Comparison (continued)
• Different capabilities
  – hProcess doesn’t have to be you! Pros/Cons?
  – flAllocationType – can be reserved or committed
    • And other flags

Reserved memory
• An explicit abstraction for cases where you want to prevent the OS from mapping anything to an address region
• To use the region, it must be remapped in the committed state
• Why?
  – My speculation: Gives the OS more information for advanced heuristics than demand paging
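
Unix has no separate reserved state, but a loosely similar effect is commonly obtained by mapping a range with PROT_NONE and later enabling access with mprotect(). The sketch below is that Unix analogue, not the Windows API; the helper names are hypothetical, and MAP_NORESERVE is a Linux flag.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE         /* exposes MAP_ANONYMOUS and MAP_NORESERVE */
#endif
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/* Page size of the running system. */
static size_t page_size(void)
{
    return (size_t)sysconf(_SC_PAGESIZE);
}

/* "Reserve": claim address space that nothing else can be mapped into,
 * but that cannot be read or written yet. Returns NULL on failure. */
static void *reserve_region(size_t length)
{
    assert(length > 0);
    void *p = mmap(NULL, length, PROT_NONE,
                   MAP_PRIVATE | MAP_ANONYMOUS | MAP_NORESERVE, -1, 0);
    return p == MAP_FAILED ? NULL : p;
}

/* "Commit": make a page-aligned sub-range usable. Returns 0 on success. */
static int commit_pages(void *addr, size_t length)
{
    return mprotect(addr, length, PROT_READ | PROT_WRITE);
}
```

Touching a reserved but uncommitted page faults with SIGSEGV, which is roughly the protection the reserved state provides.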

Part 1 Summary
• Understand what a vma is, how it is manipulated in kernel for calls like mmap
• Demand paging, COW, and other optimizations
• brk and the data segment
• Windows VirtualAllocEx() vs. Unix mmap()

Part 2: Program Binaries
• How are address spaces represented in a binary file?
• How are processes loaded?

Linux: ELF
• Executable and Linkable Format
• Standard on most Unix systems
  – And used in JOS
  – You will implement part of the loader in lab 3
• 2 headers:
  – Program header: 0+ segments (memory layout)
  – Section header: 0+ sections (linking information)
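
Both header tables are located through the ELF file header at the start of the file. A minimal sketch for 64-bit binaries on Linux, using the Elf64_Ehdr type from the system header <elf.h>; the function name read_elf64_header is hypothetical.

```c
#include <assert.h>
#include <elf.h>
#include <stdio.h>
#include <string.h>

/* Read the ELF file header of a 64-bit binary. Returns 0 on success,
 * -1 if the file cannot be read or is not a 64-bit ELF file. */
static int read_elf64_header(const char *path, Elf64_Ehdr *out)
{
    assert(path != NULL && out != NULL);
    FILE *f = fopen(path, "rb");
    if (f == NULL)
        return -1;
    size_t got = fread(out, 1, sizeof *out, f);
    fclose(f);
    if (got != sizeof *out)
        return -1;
    if (memcmp(out->e_ident, ELFMAG, SELFMAG) != 0)
        return -1;              /* magic bytes "\177ELF" missing */
    if (out->e_ident[EI_CLASS] != ELFCLASS64)
        return -1;              /* 32-bit file: different header layout */
    return 0;
}
```

On success, e_phnum and e_shnum give the number of program headers (segments) and section headers, and e_entry gives the entry point address.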

Helpful tools
• readelf – Linux tool that prints part of the ELF headers
• objdump – Linux tool that dumps portions of a binary
  – Includes a disassembler; reads debugging symbols if present

Key ELF Sections
• .text – Where read/execute code goes
  – Can be mapped without write permission
• .data – Programmer initialized read/write data
  – Ex: a global int that starts at 3 goes here
• .bss – Uninitialized data (initially zero by convention)
• Many other sections
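
A small illustration of where a typical toolchain places things. The exact placement is up to the compiler and linker, so the section names in the comments are the usual outcome rather than a guarantee; all identifiers are hypothetical.

```c
#include <assert.h>

int initialized_global = 3;     /* has an initializer: usually .data */
int zeroed_global;              /* no initializer: usually .bss, starts at 0 */

/* The function body itself is code: usually .text. */
int sum_of_globals(void)
{
    return initialized_global + zeroed_global;
}

/* Check the initial values the loader is expected to provide. */
void check_initial_values(void)
{
    assert(initialized_global == 3);
    assert(zeroed_global == 0);
}
```

Compiling this to an object file and running objdump -t on it lists each symbol with the section it landed in.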

How ELF Loading Works
• execve(“foo”, …)
• Kernel parses the file enough to identify whether it is a supported format
  – Kernel loads the text, data, and bss sections
• ELF header also gives first instruction to execute
  – Kernel transfers control to this application instruction

Static vs. Dynamic Linking
• Static Linking:
  – Application binary is self-contained
• Dynamic Linking:
  – Application needs code and/or variables from an external library
• How does dynamic linking work?
  – Each binary includes a “jump table” for external references
  – Jump table is filled in at run time by the loader

Jump table example
• Suppose I want to call foo() in another library
• Compiler allocates an entry in the jump table for foo
  – Say it is index 3, and an entry is 8 bytes
• Compiler generates local code like this (AT&T syntax):
    mov 24(%rbx), %rax   # %rbx points to the jump table; entry 3 is at offset 3 * 8 = 24
    call *%rax
• Loader initializes the jump tables at run time
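
The same indirection can be modeled in C with a table of function pointers. This is an analogy, not how a real dynamic linker lays out its tables; foo_impl, loader_fixup, and call_foo are hypothetical stand-ins for the library function, the loader's fixup pass, and the compiler-generated call site.

```c
#include <assert.h>
#include <stddef.h>

typedef int (*jump_entry)(int);

#define FOO_SLOT 3              /* index from the example above */

/* Starts zeroed: no external reference has been resolved yet. */
static jump_entry jump_table[8];

/* Stand-in for foo() living in another library. */
static int foo_impl(int x)
{
    return x + 1;
}

/* Stand-in for the loader filling in the table at run time. */
static void loader_fixup(void)
{
    jump_table[FOO_SLOT] = foo_impl;
}

/* What the call site does: load the slot, then call through it. */
static int call_foo(int x)
{
    jump_entry target = jump_table[FOO_SLOT];
    assert(target != NULL);     /* the loader must have run first */
    return target(x);
}
```

The call site never needs to know where foo ends up in memory; only the table entry changes from one run to the next.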

Dynamic Linking (Overview)
• Rather than loading the application, load the loader (ld.so), give the loader the actual program as an argument
• Kernel transfers control to loader (in user space)
• Loader:
  – 1) Walks the program’s ELF headers to identify needed libraries
  – 2) Issues mmap() calls to map in said libraries
  – 3) Fixes the jump tables in each binary
  – 4) Calls main()

Recap
• Understand basics of program loading
• OS does preliminary executable parsing, maps in program and maybe dynamic linker
• Linker does needed fixup for the program to work