IO Virtualization on ARM_Part3

  • 8/13/2019 IO Virtualization on ARM_Part3

    1/11

    12/3/13 IO Virtualization on ARM

    www.futurechips.org/chip-design-for-all/arm-virtualization-part-3-iommu.html 1/11

    Future Chips

    ARM Virtualization: I/O Virtualization (Part 3)

    Posted by Ali Hussain on April 1, 2013 at 9:25 am

    In the second part of the series we introduced the memory management and interrupt
    handling support provided by the virtualization hardware extensions. But effective
    virtualization solutions need to reach beyond the core to communicate with peripheral
    devices. In this post we discuss the various techniques used for virtualizing I/O, the
    problems faced, and the hardware solutions that mitigate these problems.

    The Difficulty Of Virtualizing I/O

    Before we talk about the system-level hardware solutions for virtualization, we need to
    establish what is driving these features. To appreciate the problems, we have to
    recognize that in some ways communicating with I/O in a virtualized environment is a
    paradox. We want to run an operating system in a sandboxed environment where it is
    oblivious to the system outside the virtual environment. But I/O cannot be oblivious to
    the outside environment because it is communicating with that environment. So,
    understandably, virtualizing I/O becomes a difficult problem.

    Moving away from the philosophical questions, what is the goal of virtualization and
    how does I/O fit into that goal? In my view it is to provide a managed environment for
    hosting a VM that improves the overall user experience. To achieve this goal, ideally
    we'd like I/O in a VM to have the following properties:

    1. The guest has access to the same I/O devices it would use in a native environment.

    2. The guest OS cannot affect the I/O operations or memory of other guests.

    3. The software changes to the guest OS must be minimal.

    4. The guest OS needs to be able to recover from a failure of the hardware or
       migration of the VM.

    5. The I/O operations on the guest OS should have similar performance to running
       natively.

    Several items on this list compete with one another. For example, routing every access
    through the hypervisor helps isolation but hurts performance. So the final solution will
    require trade-offs based on the particular use-case.

    With these goals in mind, let us look at the various techniques for implementing I/O
    virtualization and the problems each one faces.

    Emulated Or Paravirtualized Devices

    When implementing full virtualization, one of the simplest options is for the hypervisor
    to emulate a virtual device for the guest OS. The guest communicates with this virtual
    device and the hypervisor detects the guest's communication. This can be done by
    trapping device accesses or by restricting permissions on certain pages of memory. The
    hypervisor understands the operations performed by the guest OS on the virtual device
    and performs the corresponding operation on the physical device. This technique is
    called hosted or split I/O.
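
    The trap-and-emulate flow described above can be sketched roughly as follows. This is
    a minimal illustration, not how any particular hypervisor is implemented: the class
    names, the two register offsets, and the guest-physical base address are all
    hypothetical, and the "trap" is modeled as a plain method call.

```python
# Minimal sketch of trap-and-emulate I/O. Every name here is hypothetical.

class PhysicalDisk:
    """Stand-in for the real device, which only the hypervisor touches."""
    def __init__(self):
        self.blocks = {}

    def write_block(self, lba, data):
        self.blocks[lba] = data


class EmulatedDiskController:
    """Virtual device model with two made-up registers."""
    REG_LBA = 0x0    # block address register
    REG_DATA = 0x4   # a write here commits one block

    def __init__(self, backend):
        self.backend = backend
        self.lba = 0
        self.traps = 0

    def on_mmio_write(self, offset, value):
        # Runs in the hypervisor each time a guest store to the device traps.
        self.traps += 1
        if offset == self.REG_LBA:
            self.lba = value
        elif offset == self.REG_DATA:
            self.backend.write_block(self.lba, value)
        else:
            raise ValueError(f"unknown register offset {offset:#x}")


class Hypervisor:
    def __init__(self):
        self.devices = {}  # guest-physical base -> (size, device model)

    def map_device(self, base, size, device):
        self.devices[base] = (size, device)

    def handle_guest_store(self, addr, value):
        # The guest's store faulted because the page is not directly accessible.
        for base, (size, device) in self.devices.items():
            if base <= addr < base + size:
                device.on_mmio_write(addr - base, value)
                return
        raise MemoryError(f"guest store to unmapped address {addr:#x}")


disk = PhysicalDisk()
ctrl = EmulatedDiskController(disk)
hyp = Hypervisor()
hyp.map_device(0x1000_0000, 0x1000, ctrl)
hyp.handle_guest_store(0x1000_0000, 7)         # guest sets the block address
hyp.handle_guest_store(0x1000_0004, b"hello")  # guest writes the data
```

    Note that writing a single block already costs two traps in this toy model, which
    hints at where the CPU overhead of full emulation comes from.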

    The advantage of this technique is that, since every call goes through the hypervisor,
    the hypervisor can provide the desired functionality. For example, the hypervisor can
    track every I/O operation the device is presently waiting on. Similarly, restricting a
    guest from affecting other guests is simpler because all physical device accesses are
    managed by the hypervisor. But this technique has a high CPU overhead: the data must be
    copied multiple times and processed through multiple I/O stacks.

    The performance can be improved by using paravirtualization. In this case the device
    drivers in the guest OS implement an ABI with the hypervisor. The device drivers
    interface with the hypervisor, and the hypervisor communicates directly with the
    physical device
    as shown in the figure below.

    This technique provides better performance with similar control, but there is still a
    significant performance overhead, for example in trapping to the hypervisor. The figure
    below shows the difference observed by IBM between an emulated IDE controller and
    IBM's virtio-blk paravirtualized device drivers in KVM.
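
    To make the paravirtualized interface more concrete, here is a rough sketch of a
    guest driver that batches requests in a shared queue and notifies the hypervisor
    once. It is only loosely in the spirit of virtio and does not follow the virtio ring
    layout; the ParavirtQueue class and its methods are hypothetical, and a Python deque
    stands in for a ring in shared memory.

```python
from collections import deque


class ParavirtQueue:
    """Hypothetical request queue shared by a guest driver and the hypervisor."""

    def __init__(self, backend_write):
        self.pending = deque()              # stands in for a ring in shared memory
        self.backend_write = backend_write  # hypervisor's path to the real device
        self.hypercalls = 0

    # Guest side
    def add_request(self, lba, data):
        # A plain write to shared memory: no trap into the hypervisor.
        self.pending.append((lba, data))

    def kick(self):
        # One explicit call into the hypervisor covers the whole batch.
        self.hypercalls += 1
        self._drain()

    # Hypervisor side
    def _drain(self):
        while self.pending:
            lba, data = self.pending.popleft()
            self.backend_write(lba, data)


blocks = {}                                 # stand-in for the physical disk
queue = ParavirtQueue(blocks.__setitem__)
for lba in range(8):
    queue.add_request(lba, f"block-{lba}".encode())
queue.kick()
```

    In this sketch eight block writes cost a single transition into the hypervisor,
    whereas an emulated register interface would trap on every register access. The
    remaining overhead is the notification itself plus the hypervisor's own I/O stack.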

    When looking at this overhead, it is important to keep in mind that it is very use-case
    dependent. A CPU-bound benchmark will not show much sensitivity to the virtualization
    of I/O. For an I/O-heavy benchmark, in contrast, this overhead can be significant. As an
    example, the conjugate-gradient method for solving a system of linear equations spends
    around 70% of CPU cycles in user mode and the remaining time in the hypervisor kernel
    engaged in disk I/O.
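
    A back-of-the-envelope model shows why the sensitivity depends so strongly on the
    workload. The function below is an illustrative assumption, not a measurement: it
    supposes that only the I/O share of the run time is slowed down, and the 2x slowdown
    factor and the 2% and 30% I/O fractions are made-up inputs.

```python
def virtualized_runtime(native_seconds, io_fraction, io_slowdown):
    """Estimate run time when only the I/O share of the work is slowed down."""
    if not 0.0 <= io_fraction <= 1.0:
        raise ValueError("io_fraction must be between 0 and 1")
    cpu_seconds = native_seconds * (1.0 - io_fraction)
    io_seconds = native_seconds * io_fraction * io_slowdown
    return cpu_seconds + io_seconds


# Hypothetical inputs: the same 2x I/O slowdown applied to two workloads
cpu_bound = virtualized_runtime(100.0, io_fraction=0.02, io_slowdown=2.0)
io_heavy = virtualized_runtime(100.0, io_fraction=0.30, io_slowdown=2.0)
```

    Under these assumptions the CPU-bound run grows by about 2% while the I/O-heavy run
    grows by about 30%, even though the I/O path is equally slow in both cases.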

    Passthrough I/O

    Passthrough I/O greatly improves performance by mapping the physical device through the
    guest's page tables so that the guest can write to the device directly. This eliminates
    most of the overhead of trapping to the hypervisor for every operation and brings the
    bulk of I/O processing to near-native speeds.

    There are several issues that need to be addressed to effectively virtualize I/O using
    this technique. Consider the case of a guest using DMA accesses to communicate with
    a device. In this scenario we need to account for the following issues.

    Isolation

    The goal of virtualization is to sandbox the guest OS to keep it from accessing the
    data of other guest OSes. We do this for accesses made by the guest by adding a
    second-stage translation. However, DMA devices operate on physical addresses and are not
    aware of second-stage translations. So if a guest is given unrestricted access to a DMA
    device, it can read or write any physical address in memory and corrupt the memory
    of other guests. There needs to be a protection mechanism to make sure that DMA
    requests issued on behalf of a particular guest only reach memory associated with
    that guest.

    Furthermore, more than one guest may need to access the same device. The device
    needs to be able to distinguish between the accesses coming from different guests
    and redirect them correctly.
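
    The protection requirement can be illustrated with a small ownership check. This is
    a sketch under simplifying assumptions: the page_owner table, the guest names, and
    the check_dma function are hypothetical, and real hardware enforces this through
    translation tables rather than a lookup performed in software.

```python
PAGE_SIZE = 4096


class DmaProtectionError(Exception):
    """Raised when a DMA request touches memory the guest does not own."""


def pages_touched(pa, length):
    """Return the physical page numbers covered by [pa, pa + length), length > 0."""
    first = pa // PAGE_SIZE
    last = (pa + length - 1) // PAGE_SIZE
    return range(first, last + 1)


def check_dma(page_owner, guest, pa, length):
    """Allow the DMA only if every touched page belongs to `guest`."""
    for page in pages_touched(pa, length):
        if page_owner.get(page) != guest:
            raise DmaProtectionError(
                f"{guest} may not access physical page {page:#x}")
    return True


# Hypothetical ownership table: physical page number -> owning guest
page_owner = {0x100: "guest_a", 0x101: "guest_a", 0x200: "guest_b"}

allowed = check_dma(page_owner, "guest_a", 0x100 * PAGE_SIZE, 2 * PAGE_SIZE)

try:
    check_dma(page_owner, "guest_a", 0x200 * PAGE_SIZE, 64)
    blocked = False
except DmaProtectionError:
    blocked = True
```

    The check has to cover every page in the transfer, because a request that starts in
    the guest's own memory can still run past the end of it.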

    Physical Address

    To complete the DMA transaction, the guest OS needs to provide the device with the
    proper physical address in memory to find the data. But the guest does not know the
    physical address of the data, only the Intermediate Physical Address (IPA), which is in
    essence a virtual address. For the DMA access to work, the device must be able to
    translate the IPA to the correct physical address.

    Contiguous Memory Blocks

    The problem cannot be solved by just providing the device with the correct PA. The
    device expects the DMA target region to be located in a contiguous region of memory.
    In a virtualized environment this is not guaranteed: the hypervisor may back the guest
    with non-contiguous pages in blocks as small as 4 KB. So the translation must be
    performed for every page of the DMA region, not just for its starting address.
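
    The two problems above, translating IPAs and coping with non-contiguous backing
    pages, can be seen together in a short sketch. The stage2 dictionary is a stand-in
    for the hypervisor's second-stage page tables, and the page numbers in it are
    invented for the example.

```python
PAGE_SIZE = 4096


def translate_range(stage2, ipa, length):
    """Split a contiguous IPA range into physically contiguous (pa, length) segments.

    `stage2` maps IPA page numbers to PA page numbers.
    """
    segments = []
    end = ipa + length
    while ipa < end:
        page, offset = divmod(ipa, PAGE_SIZE)
        chunk = min(PAGE_SIZE - offset, end - ipa)
        pa = stage2[page] * PAGE_SIZE + offset   # KeyError means the IPA is unmapped
        if segments and segments[-1][0] + segments[-1][1] == pa:
            # Physically adjacent to the previous segment: extend it.
            segments[-1] = (segments[-1][0], segments[-1][1] + chunk)
        else:
            segments.append((pa, chunk))
        ipa += chunk
    return segments


# Hypothetical mapping: three consecutive guest pages scattered in host memory
stage2 = {0x10: 0x80, 0x11: 0x81, 0x12: 0x40}
segments = translate_range(stage2, 0x10 * PAGE_SIZE + 0x800, 2 * PAGE_SIZE)
```

    Here a buffer that looks like one 8 KB region to the guest turns into two separate
    physical segments, so handing the device only the first physical address would send
    the tail of the transfer to the wrong place.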

    32 Bit Devices In Larger Address Spaces

    This problem is similar to the problem of a 32-bit guest on a 64-bit host discussed in
    the previous post. The system may have older devices that cannot address the whole of
    the larger address spaces of newer systems. An address translation is necessary to let
    these devices perform DMA outside their normal addressable range.

    Hardware Support

    The problems mentioned above are not easily solved in software and need a hardware
    solution that maps device addresses to the memory of the correct guest. Most platforms
    have a hardware solution for this, called an IOMMU (I/O Memory Management Unit). Intel
    calls its implementation VT-d, AMD calls its implementation AMD-Vi, and ARM calls its
    implementation the System MMU.

    The basic idea of the IOMMU is simple. An address translation unit is placed between
    memory and any devices that may be used by a guest OS. When the hypervisor sets up the
    second-stage page tables for a guest OS to access the device, it sets up the IOMMU
    too. As with table walks in the core, address translations are expensive, so TLBs are
    implemented to reduce the overhead of address translations.
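
    As a rough model of this arrangement, the sketch below puts a translation table and
    a small TLB in front of device accesses. The SimpleIommu class, its LRU replacement
    policy, and the page numbers are assumptions made for illustration; real IOMMUs use
    multi-level tables and hardware-specific TLB organizations.

```python
from collections import OrderedDict

PAGE_SIZE = 4096


class SimpleIommu:
    """Toy translation unit: one flat table plus a small LRU TLB."""

    def __init__(self, page_table, tlb_entries=4):
        self.page_table = page_table   # device page number -> physical page number
        self.tlb = OrderedDict()       # cache of recent translations
        self.tlb_entries = tlb_entries
        self.table_walks = 0

    def translate(self, dev_addr):
        page, offset = divmod(dev_addr, PAGE_SIZE)
        if page in self.tlb:
            self.tlb.move_to_end(page)             # TLB hit: no table walk
        else:
            self.table_walks += 1                  # TLB miss: expensive walk
            self.tlb[page] = self.page_table[page]
            if len(self.tlb) > self.tlb_entries:
                self.tlb.popitem(last=False)       # evict least recently used
        return self.tlb[page] * PAGE_SIZE + offset


# Hypothetical table: device page 0 is backed by memory at the 4 GiB boundary
iommu = SimpleIommu({0x0: 0x100000, 0x1: 0x9A0})
addresses = [iommu.translate(a) for a in (0x010, 0x020, 0x1004, 0x030)]
```

    Four accesses need only two table walks here because repeated accesses to the same
    page hit in the TLB. The first table entry also shows how a device that can only
    emit 32-bit addresses can still reach memory above 4 GiB once a translation sits
    between it and memory.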

    An example system showing where the System MMU can be located. Transactions with the
    device are translated through the System MMU.

    System MMU

    The ARM System MMU is programmed with different translation contexts. It maps each
    transaction to the corresponding context by matching against expected streams. Based
    on the context, the System MMU may either bypass the translation, cause a fault, or
    perform a translation. The System MMU in the ARM architecture provides full two-stage
    translation support (as described in the previous post), and depending on the context
    it may perform either a first-stage translation or a second-stage translation. To
    perform the translation, the System MMU has registers analogous to the TTBRs and other
    control registers for each context.
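
    The stream-to-context matching can be pictured with the following sketch. It is a
    simplified model, not the System MMU programming interface: the SystemMmuModel and
    Context classes are hypothetical, stream matching is reduced to a dictionary lookup
    on an integer stream ID, and a flat per-context dictionary plays the role of the
    tables that the TTBR-like registers would point to.

```python
from enum import Enum

PAGE_SIZE = 4096


class Action(Enum):
    BYPASS = "bypass"
    FAULT = "fault"
    TRANSLATE = "translate"


class TranslationFault(Exception):
    pass


class Context:
    def __init__(self, action, page_table=None):
        self.action = action
        self.page_table = page_table or {}   # stands in for the per-context tables


class SystemMmuModel:
    def __init__(self):
        self.stream_to_context = {}          # stream ID -> Context

    def attach(self, stream_id, context):
        self.stream_to_context[stream_id] = context

    def handle(self, stream_id, addr):
        context = self.stream_to_context.get(stream_id)
        if context is None or context.action is Action.FAULT:
            raise TranslationFault(f"stream {stream_id}: no usable context")
        if context.action is Action.BYPASS:
            return addr                      # address passes through untranslated
        page, offset = divmod(addr, PAGE_SIZE)
        if page not in context.page_table:
            raise TranslationFault(f"stream {stream_id}: page {page:#x} unmapped")
        return context.page_table[page] * PAGE_SIZE + offset


smmu = SystemMmuModel()
smmu.attach(1, Context(Action.TRANSLATE, {0x2: 0x7F}))
smmu.attach(2, Context(Action.BYPASS))
translated = smmu.handle(1, 0x2010)
bypassed = smmu.handle(2, 0x2010)
```

    Two devices issuing the same address end up at different physical locations because
    their streams select different contexts, which is how one translation unit can serve
    several guests at once.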

    The System MMU may also encounter faults during its translation process or if a context
    is not set up. Depending on the type of fault and how the System MMU is configured, it
    may take different actions. A translation fault can trigger an interrupt. This gives the
    hypervisor an opportunity to service the interrupt and restart the translation so it
    can come to completion. The System MMU may also return a bus error to the appropriate
    requestor. Syndrome registers are present to ease the process of diagnosing and fixing
    the problem.
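
    The fault path can be sketched in the same style. Everything here is a hypothetical
    model: the interrupt is a direct callback, the syndrome register is a dictionary
    holding the faulting stream and address, and the bus error is represented by a
    Python exception.

```python
PAGE_SIZE = 4096


class FaultingMmu:
    def __init__(self, fault_handler):
        self.page_table = {}                 # device page -> physical page
        self.fault_handler = fault_handler   # stands in for the interrupt
        self.syndrome = None                 # last fault record

    def translate(self, stream_id, addr):
        page, offset = divmod(addr, PAGE_SIZE)
        if page not in self.page_table:
            # Record what went wrong, then give the hypervisor a chance to fix it.
            self.syndrome = {"stream_id": stream_id, "address": addr}
            handled = self.fault_handler(self, self.syndrome)
            if not handled or page not in self.page_table:
                raise RuntimeError("bus error returned to the requestor")
        # Either the mapping existed or the handler installed it: complete the access.
        return self.page_table[page] * PAGE_SIZE + offset


def demand_map(mmu, syndrome):
    """Hypothetical hypervisor handler that backs a faulting page on demand."""
    backing = {0x3: 0x123}                   # pages the hypervisor agrees to map
    page = syndrome["address"] // PAGE_SIZE
    if page in backing:
        mmu.page_table[page] = backing[page]
        return True
    return False


mmu = FaultingMmu(demand_map)
resolved = mmu.translate(stream_id=5, addr=0x3040)
```

    The first access to page 0x3 faults, the handler installs the mapping, and the
    restarted translation completes. An access to a page the handler refuses to map ends
    in the error path instead.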

    Some advantages of the System MMU don't even need virtualization. Since the System
    MMU enables every device to perform VA to PA translations, I/O operations can be
    performed by drivers in user space using VAs. The permission checking and translation
    maps can ensure that one user application does not corrupt the memory of another
    application. This would eliminate the traps to the kernel presently required, further
    reducing I/O overhead. Another problem is dealing with contiguous memory. Many
    operations result in very large DMA accesses for which the OS cannot allocate a single
    chunk of memory. Presently they need to either be split into multiple DMA requests or
    be performed with complex DMA scatter-gather operations. The System MMU enables the
    device to perform DMA on a contiguous VA range instead of fragmented PAs. This both
    reduces the CPU overhead and simplifies the software and the device.
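
    The difference between the two approaches to fragmented buffers can be shown side
    by side. Both functions are illustrative assumptions rather than a real driver API:
    descriptors are plain (address, length) tuples, and the I/O page table is a
    dictionary from device page numbers to physical page numbers.

```python
PAGE_SIZE = 4096


def scatter_gather_list(pa_pages):
    """Without an IOMMU: one descriptor per physically contiguous run of pages."""
    descriptors = []
    for page in pa_pages:
        pa = page * PAGE_SIZE
        if descriptors and descriptors[-1][0] + descriptors[-1][1] == pa:
            descriptors[-1] = (descriptors[-1][0], descriptors[-1][1] + PAGE_SIZE)
        else:
            descriptors.append((pa, PAGE_SIZE))
    return descriptors


def map_contiguous(io_page_table, io_base_page, pa_pages):
    """With an IOMMU: map the pages back to back in the device's address space."""
    for i, page in enumerate(pa_pages):
        io_page_table[io_base_page + i] = page
    return (io_base_page * PAGE_SIZE, len(pa_pages) * PAGE_SIZE)


pa_pages = [0x90, 0x12, 0x13, 0x55]          # hypothetical fragmented buffer
sg = scatter_gather_list(pa_pages)
io_page_table = {}
single = map_contiguous(io_page_table, 0x1000, pa_pages)
```

    For this four-page buffer the scatter-gather path needs three descriptors, while the
    mapped path hands the device a single address and length. The cost moves into
    setting up the translation entries, which the unit then resolves on the device's
    behalf.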

    It should be noted that the System MMU is a part of the platform rather than a part of
    the core architecture. This means it only affects the drivers. Because of this, many
    features are implementation defined. For example, the bits used to match a stream and
    map it to a context are implementation defined. Since no user code is aware of this
    part of the system, changes to the System MMU architecture wouldn't raise as many
    legacy-code issues.

    So, using these techniques, the hypervisor can provide an appropriate implementation of
    virtualized I/O according to the use-case. This concludes the third installment of this
    series on virtualization. The series continues in the next post, which discusses the
    use-cases for virtualization, especially those targeted by ARM in the mobile space.
