


WHITE PAPER

Businesses considering Linux for any aspect of their operations benefit from a coordinated cadence of hardware and software advances that support very large workloads in a cost-effective, forward-looking manner. Intel is working toward making Linux* run best on Intel® architecture, which benefits the community as a whole, including IT organizations across industries.

With each processor generation, factors such as core count and memory capacity increase, leading to ongoing decreases in the cost per unit of compute capacity available from commercial-off-the-shelf (COTS) servers. IT decision makers rightly consider whether the software solutions they depend on to power their businesses can scale to take advantage of the increasing capabilities of those servers.

As Linux on Intel architecture-based servers continues to become the environment of choice for large-scale compute infrastructure, this question takes on growing importance: How well does Linux scale to enable workloads to take advantage of the growing resources available from today’s Intel architecture-based servers? Scalability in this context refers to the ability of the Linux kernel and applications to take optimal advantage of increases in various types of resources:

• As more processor cores are added, the system should perform more work, which corresponds, for example, to more simultaneous users, or more operations in a given timeframe.

• As more memory is added, the system should run faster, particularly on data-intensive tasks and as the data stores in use by applications become larger.

• As larger numbers of software threads are introduced by parallel applications, the operating system must be able to efficiently divide work among them.

As a member of the Linux community, Intel monitors this issue with each successive release of the kernel—and its own platform hardware—identifying and resolving issues to ensure the ongoing scalability of the Linux kernel on Intel architecture. This paper describes those efforts and demonstrates how they help to ensure ongoing value to IT decision makers.

LINUX* KERNEL SCALABILITY: ADVANCES FOR LARGE-CAPACITY SERVERS

OPEN SOURCE ON INTEL

Intel’s Kernel Commitment

Intel’s long history as a member of the Linux* community attests to the value it places on helping make the kernel more robust. Today, Intel is one of the leading contributors to the Linux kernel.

EXECUTIVE SUMMARY

As enterprise servers based on Intel® architecture have grown in performance, core count, and memory capacity, developers have innovated on the Linux* kernel to take advantage of them. As a result, data center managers can have confidence that Linux will scale to take solid advantage of the latest servers and those yet to come. Intel, along with others in the open-source community, will continue to identify opportunities and address challenges to enhance kernel scalability through the Linux Kernel Scalability Project as ongoing development of the kernel proceeds.


KEEPING UP WITH GROWTH IN SERVER CAPACITY

Advances in server performance and capacity clearly offer businesses substantial gains in areas such as the ability to handle larger data sets, add new usage models, and control operating expense. At the same time, however, those deciding what upgrades to make must be assured that their software environments can truly benefit as expected from server refreshes. Server growth in areas such as hardware parallelism and memory capacity has become very rapid, making it vital for the community to enable Linux and other software for emerging hardware.

As shown in Figure 1, the levels of hardware parallelism and memory capacity in COTS servers have increased dramatically over the past several years. For example, a four-socket server based on the Intel® Xeon® processor 7400 series can accommodate up to 6 cores per socket, for a total of 24 logical processors.

Three years later, in 2011, four-socket servers based on the Intel® Xeon® processor E7 family could have as many as 10 cores each, for a total of 40 cores; because of the re-introduction of Intel® Hyper-Threading Technology (Intel® HT Technology), each of those cores can handle two simultaneous software threads (although the OS can support more than that), for a total of 80 logical processors per four-way server. Simultaneously, each core also became more powerful, and this more than 3x increase in parallelism is accompanied by an 8x increase in server memory capacity, from 256 gigabytes (GB) to 2 terabytes (TB).

The corresponding increases in the two-socket segment, while not as dramatic, are also substantial. From the Intel® Xeon® processor 5500 series in 2008 to the Intel® Xeon® processor E5 family in 2012, logical processor count per server doubled, from 16 to 32, and memory capacity increased by more than 5x, from 144 GB to 768 GB.

Figure 1. Dramatic growth of hardware parallelism and memory capacity of servers. [Charts: logical processors per four-socket server grew from 24 (Intel® Xeon® processor 7400 series, 2008) to 64 (Intel® Xeon® processor 7500 series, 2010) to 80 (Intel® Xeon® processor E7 family, 2011); logical processors per two-socket server grew from 16 (Intel® Xeon® processor 5500 series, 2008) to 24 (Intel® Xeon® processor 5600 series, 2010) to 32 (Intel® Xeon® processor E5 family, 2012); memory capacity per four-socket server grew from 256 GB to 1 TB to 2 TB; memory capacity per two-socket server grew from 144 GB to 288 GB to 768 GB.]


In addition to these easily quantifiable measures of increased server capacity, a wide range of microarchitectural features and capabilities is introduced in each hardware-platform generation. OSs and other software must be enabled to take advantage of many of these advancements; in other cases, software environments must be validated with the hardware features to ensure proper operation. A few examples of hardware features that specifically require software enablement or validation in the kernel include the following:

Figure 3. Intel® Hyper-Threading Technology (Intel® HT Technology) enables a single processor core to maintain two architectural states, each of which can support its own thread. Many of the internal microarchitectural hardware resources are shared between the two threads. [Diagram: without Intel HT Technology, a core's single architectural state runs thread 0; with Intel HT Technology, architectural states 0 and 1 run threads 0 and 1.]

• Intel® Advanced Encryption Standard New Instructions (Intel® AES-NI).1 Software enablement gives processors the ability to accelerate encrypt and decrypt operations using Intel AES-NI (Figure 2).

• Intel® Hyper-Threading Technology (Intel® HT Technology).2 Validation and enablement for this feature plays a vital role in delivering benefit from enabling each processor to simultaneously handle two software threads3 (Figure 3).
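On Linux, features such as Intel AES-NI and Intel HT Technology are reported to software through the flags line of /proc/cpuinfo (as the entries "aes" and "ht", respectively). The sketch below shows one way to check for them; the parsing helper is illustrative, not part of any Intel tool, and a canned sample string stands in for the live file:

```python
def cpu_flags(cpuinfo_text):
    """Return the set of feature flags from /proc/cpuinfo-style text."""
    for line in cpuinfo_text.splitlines():
        if line.startswith("flags"):
            return set(line.split(":", 1)[1].split())
    return set()

# On a live system, the text would come from open("/proc/cpuinfo").read();
# a canned sample keeps the sketch self-contained.
sample = "processor\t: 0\nflags\t\t: fpu aes ht sse2\n"
flags = cpu_flags(sample)
print("aes" in flags)  # -> True: Intel AES-NI instructions reported
print("ht" in flags)   # -> True: Hyper-Threading reported
```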

While the engineering work being done at Intel to enable software for hardware platform features extends far beyond Linux, enablement of the kernel is a key imperative. Contributors employed at Intel continue to play a significant role4 in the ongoing development of the kernel. The importance Intel places on its involvement with Linux development is also illustrated by Intel’s initiation of the Linux Kernel Performance Project in 2005. Together with other members of the open-source community, Intel established this ongoing project specifically to monitor kernel performance on a regular basis, evaluating every dot release with key workloads.

Figure 2. Intel® Advanced Encryption Standard New Instructions (Intel® AES-NI), a set of six instructions, is designed to accelerate processor-intensive parts of the AES algorithm. [Diagram: AES engine accelerating whole-disk encryption, file-storage encryption, conditional access of HD content, Internet security, and Voice over IP (VoIP).]

Scaling to More Memory

Each generation of server hardware increases the upper limits of system memory that may be available, as shown in Figure 1. To keep pace with those advances, the Linux* kernel must be optimized to scale to larger memory sizes. Intel has been involved with this effort for years, including work on increasing the size of memory pages to enhance overall memory scalability.

Scaling to More Cores

As various kernel subsystems become more highly parallelized to take advantage of the growing number of execution cores available on execution platforms (see Figure 1), they grow more complicated. As a result, individual operations tend to run slower and need more memory. Ongoing tuning effort in this area by Intel and others identifies subsystems for tuning that are likely bottlenecks.

Page 4: linux* kernel scalability linux* kernel scalability

4

IDENTIFYING SCALABILITY OPPORTUNITIES AND CHALLENGES

Linux is very scalable indeed, and it runs effectively on the largest systems in use today. Quantifying exact scalability is a complex undertaking, because assessing how many processors Linux can scale to is highly dependent on the workload and other factors. To understand the factors behind scalability, the kernel can be conceived as a large library of components that operate on data in parallel to provide services to applications.

A key opportunity in terms of scalability, therefore, is for parallel operations to access and handle shared data efficiently. To avoid interference between kernel processes, a data structure is protected using a lock or other mechanism while a process is operating on it. This approach ensures control over data so the correct result is reached, although unnecessary locks create inefficiency.

Because the optimal number of locks is immensely variable depending on circumstances, tuning in this area is of ongoing importance. Refinements in the logic around locks enhance scalability by making it possible for larger numbers of simultaneous processes to proceed on a given body of data. The increasing sophistication of the Linux kernel in its ability to divide work efficiently between hundreds or thousands of software threads is a significant contributing factor to the kernel’s ability to scale to servers with rapidly increasing levels of hardware parallelism.
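The lock-splitting idea behind these refinements can be illustrated in userspace. The sketch below (a hypothetical ShardedCounterTable class, not kernel code) guards each hash bucket with its own lock rather than protecting the whole table with one global lock, so updates to unrelated keys proceed concurrently—the same trade the kernel makes when it splits a coarse lock into finer-grained ones:

```python
import threading

class ShardedCounterTable:
    """Toy analogue of kernel lock splitting: one lock per bucket
    instead of one global lock, so unrelated updates can proceed
    concurrently."""
    def __init__(self, nbuckets=16):
        self.buckets = [dict() for _ in range(nbuckets)]
        self.locks = [threading.Lock() for _ in range(nbuckets)]

    def _index(self, key):
        return hash(key) % len(self.buckets)

    def incr(self, key):
        i = self._index(key)
        with self.locks[i]:  # contends only with keys in the same bucket
            self.buckets[i][key] = self.buckets[i].get(key, 0) + 1

    def get(self, key):
        i = self._index(key)
        with self.locks[i]:
            return self.buckets[i].get(key, 0)

table = ShardedCounterTable()
threads = [threading.Thread(target=lambda: [table.incr("hits") for _ in range(1000)])
           for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(table.get("hits"))  # -> 4000
```

More shards mean less contention but more memory and bookkeeping—the same functionality-versus-complexity balance described above.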

Fine-tuning the number of locks is also an example of the ongoing effort by Intel, along with others in the Linux kernel community, to strike a balance between functionality and code complexity. While the system demands associated with a larger kernel code base are addressed to some degree by increasing system-memory footprints in servers, research in this area is an important factor in optimizing Linux scalability by applying system resources to meaningful work. Moreover, prioritizing opportunities for enhancement in the kernel is an important factor in improving overall quality, scalability, and performance.

OPPORTUNITIES

Helping Enable Transparent Hugepage Support

Conventionally, Linux* systems divide memory into 4-kilobyte pages, which represent the smallest unit of memory the system can manipulate. Because this size is very small compared to the usual amount of system memory present and the sizes of the working data sets that most applications use, many of the small pages must be used simultaneously. Increasing the size of individual pages leads to a smaller number of pages, which can create significant performance increases, because fewer translation-lookaside buffer slots—a highly contended resource—are required.

Andrea Arcangeli of Red Hat developed a set of kernel patches5 to provide transparent hugepage support, which allocates memory into 2-megabyte pages where possible, without requiring changes to the application. Initialization of larger pages takes more time. This challenge is complicated further by complexities in non-uniform memory access (NUMA) systems. Intel’s ongoing code patch contributions help enhance NUMA scalability and other scalability factors for transparent hugepage support.
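On kernels with this support, the active transparent hugepage policy is visible in /sys/kernel/mm/transparent_hugepage/enabled, where brackets mark the mode in effect. A small sketch of reading it; the helper function is illustrative, and a sample string stands in for the live sysfs file:

```python
def thp_mode(enabled_text):
    """Return the active policy from text like 'always [madvise] never',
    the format used by /sys/kernel/mm/transparent_hugepage/enabled."""
    for word in enabled_text.split():
        if word.startswith("[") and word.endswith("]"):
            return word[1:-1]
    return None

# On a live system:
#   enabled_text = open("/sys/kernel/mm/transparent_hugepage/enabled").read()
print(thp_mode("[always] madvise never"))  # -> always
print(thp_mode("always [madvise] never"))  # -> madvise
```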

Enhancing Scalability of lseek() Calls

A series of patches created by Andi Kleen of Intel that were committed to the kernel for the release of Linux* 3.2 resolved scalability issues with the lseek() function. An example of the value of this change is evident in the frequent calls to lseek by PostgreSQL. Performance improvements in PostgreSQL 9.2 led to increased contention issues associated with these calls under prior versions of Linux. PostgreSQL committer Robert Haas reports6 that the lseek changes7 in Linux 3.2 are a major factor in the essentially linear scalability of PostgreSQL to 64 cores for read-heavy workloads.
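The pattern at issue is simple and hot: PostgreSQL determines a relation file's size by seeking to its end, so every probe lands on the kernel's file-position code. A minimal reproduction of that pattern, using a throwaway temporary file:

```python
import os
import tempfile

# PostgreSQL-style size probe: seek to the end, read back the offset.
with tempfile.NamedTemporaryFile(delete=False) as f:
    f.write(b"x" * 4096)
    path = f.name

fd = os.open(path, os.O_RDONLY)
size = os.lseek(fd, 0, os.SEEK_END)  # each call exercises the kernel's lseek path
os.close(fd)
os.remove(path)
print(size)  # -> 4096
```

Under pre-3.2 kernels, many backends issuing this call concurrently against the same file contended on shared state; the Linux 3.2 changes let such calls proceed largely in parallel.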

Resolving Cache-Line Bouncing from Global Counter Updates

When critical sections move from one processor core to another, cache lines of shared data must be transferred, introducing a slow-down due to communication latency. The scalability limitations associated with this phenomenon, known as “cache-line bouncing,” are exemplified by updates to global counters. Scalability is limited by the number of cores in the system and especially by the number of sockets, because communication latency is higher between sockets than between cores on the same die. Intel’s work to resolve such issues includes converting the tmpfs’s bytes-in-use counter from a single global one to a per-processor one.
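A userspace analogue of that per-processor conversion: instead of every thread bouncing a single shared counter between caches, each thread increments its own slot, and the (rarer) read sums the slots. The class below is a hypothetical sketch of the idea, not the kernel's actual per-CPU counter API:

```python
import threading

class PerThreadCounter:
    """Analogue of the kernel's per-processor counters: each thread
    updates its own slot, so hot updates never share state; reads
    sum the slots."""
    def __init__(self):
        self.local = threading.local()
        self.slots = []
        self.slots_lock = threading.Lock()

    def add(self, n):
        slot = getattr(self.local, "slot", None)
        if slot is None:
            slot = [0]
            with self.slots_lock:  # rare: only on a thread's first update
                self.slots.append(slot)
            self.local.slot = slot
        slot[0] += n  # touched only by this thread on the hot path

    def total(self):
        with self.slots_lock:
            return sum(s[0] for s in self.slots)

counter = PerThreadCounter()

def work():
    for _ in range(1000):
        counter.add(1)

threads = [threading.Thread(target=work) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(counter.total())  # -> 4000
```

The design trades a slightly more expensive read (summing the slots) for updates that never contend, which is the right trade when updates vastly outnumber reads, as with a bytes-in-use counter.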


KERNEL-LEVEL ENGINEERING TO ENHANCE SCALABILITY

Systematic Identification of Scalability and Performance Issues

Through the Linux Kernel Performance Project, Intel participates in the larger community effort to ensure that such issues are identified and addressed when they do arise. In turn, this effort contributes to continued scalability of the kernel on Intel architecture, helping ensure, for example, that defects in the kernel from interactions between patches are detected before they make their way into Linux distributions and are widely deployed on production systems.

The requirement to keep track of potential limitations from interactions between kernel components highlights the value of the systematic, structured approach that entities such as the Linux Kernel Performance Project are taking (as shown below).

Testing uses benchmarks that cover a broad set of core components of the Linux kernel, including I/O, the process scheduler, the file system, and the network stack. The apparatus tests patches on various versions of the kernel, so it can identify issues that arise only from specific combinations, and it uses various profiling tools to take advantage of their different strengths. Automated processes to isolate the root causes of identified issues are combined with manual investigation to balance thoroughness and efficiency. For more information, see kernel-perf.sourceforge.net.

Community-Based Enhancements to Kernel Scalability

The broad Linux community brings a variety of expertise to bear on enhancing scalability of the kernel. Intel’s involvement—from both software and hardware engineers—enables testing on emerging hardware platforms, even before they are available to the general public. By working with others in the community, Intel engineers help identify and address kernel issues before those issues make their way downstream into production distributions.

The community has developed considerable expertise and best practices to deal with various testing anomalies and complexities. For example, variations in test results are a confounding factor throughout the performance-tuning process. Having to continually analyze hardware and software combinations to reveal scalability and performance regressions can be particularly problematic.


New kernel release candidate. When a new kernel is published on kernel.org, it is tested using a benchmark test suite.

Review of test results. Test results are reviewed weekly, anomalies are examined, appropriate test scenarios are rerun, and coding corrections are pursued.

Publish discussion of findings. The test group publishes all outcomes and initiates discussions of relevant issues on the Linux* kernel mailing list.

[Flow: Test → Review → Publish]

Beyond Scalability

Contributions that improve scalability of the kernel are just one of Intel’s areas of focus related to Linux*; others include the following:

• Power management and energy efficiency: see www.lesswatts.org

• Graphics: see intellinuxgraphics.org

• Wired and wireless networking: see sourceforge.net/projects/e1000 and intellinuxwireless.org

• Firmware and platform integration: see biosbits.org and acpica.org

In addition to enabling Linux for hardware features, significant effort is also dedicated to ensuring that the kernel continues to perform as expected on existing servers as Linux develops. Because of the complexity of the kernel and the rapid pace of additions to it, it is vital to ensure that performance isn’t degraded by interactions among the kernel’s many discrete components.

This effort has met with excellent results, identifying and resolving many performance issues, and contributing in a productive way to the overall discussion on this topic in the community. Ultimately, contributions to the ongoing, consistent improvements in scalability and performance of the kernel optimize its suitability for use on large-scale server systems.

Page 6: linux* kernel scalability linux* kernel scalability

6

“[Intel was] one of the first big companies to take action around open source. They helped build the ISV ecosystem, which was critical to get all the various ISVs that were running their applications on RISC/UNIX* migrated to an Intel/Red Hat Enterprise Linux* combination.”
– Mike Evans, VP of Business Development, Red Hat

“Intel and Black Duck have collaborated for many years to both build awareness in the market and actually help drive the growth and vitality of the open-source ecosystem.”
– Tim Yeaton, President and CEO, Black Duck Software

Testing takes a multi-faceted approach to minimize the incidental variations in results among test cases, including the following:

• Start tests with a known system state. Best practices call for rebooting each system, reformatting the disk, and re-installing test files before each run.

• Use a warm-up workload before testing. Before the test starts, this practice brings system components such as processor caches, buffers, and I/O caches to a steady state.

• Use a long runtime and average results from multiple runs. This approach reduces the effects of individual idiosyncrasies that arise during any single run.
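The practices above translate directly into a measurement harness. The sketch below (an illustrative measure helper, not the project's actual test apparatus) performs a warm-up pass first and then averages several timed runs:

```python
import time

def measure(workload, warmup=1, runs=5):
    """Warm up caches and buffers first, then average several timed
    runs to damp run-to-run variation."""
    for _ in range(warmup):
        workload()          # warm-up pass: results discarded
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        workload()
        timings.append(time.perf_counter() - start)
    return sum(timings) / len(timings)

avg_seconds = measure(lambda: sum(i * i for i in range(50000)))
print(avg_seconds > 0.0)  # -> True
```

A real harness would also reset system state between runs (reboot, reformat, reinstall), which no userspace helper can capture; the averaging and warm-up logic is the portable part.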

Other complementary refinements that have arisen over time include improved automation and more sophisticated profiling and performance indexing to summarize results across various benchmarks. Together, these advances have created a sophisticated approach to regression testing for the Linux kernel that helps to optimize scalability and performance as hardware and software technology continue to progress.

Adding Kernel Support for Multiple Packet Queues

Prior to Linux* 2.6.21, the kernel was limited to receiving and processing network packets in a serial manner. This single-lane data path to and from the kernel represented an I/O limitation that ultimately had an effect on overall scalability.

Intel® Ethernet Controllers implement parallel data paths by means of multiple network queues, including as many as 64 transmit/receive pairs in the Intel® Ethernet Controller X540. Intel developed multi-queue support for Intel® network adapter drivers, adding receive-side multi-queue support in Linux 2.6.21 and transmit-side support in Linux 2.6.23. (Additional multi-queue support is being added as an ongoing community effort led by Red Hat.) Network-traffic optimization through multi-queue awareness has dramatically increased kernel scalability.
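On a running system, the queues a multi-queue driver exposes are visible as rx-* and tx-* directories under /sys/class/net/&lt;interface&gt;/queues. A small sketch of counting them; the helper function and the sample listing are illustrative:

```python
def count_queues(entries):
    """Count rx-* and tx-* entries, as listed under
    /sys/class/net/<interface>/queues for a multi-queue NIC."""
    rx = sum(1 for name in entries if name.startswith("rx-"))
    tx = sum(1 for name in entries if name.startswith("tx-"))
    return rx, tx

# On a live system: entries = os.listdir("/sys/class/net/eth0/queues")
sample = ["rx-0", "rx-1", "rx-2", "rx-3", "tx-0", "tx-1", "tx-2", "tx-3"]
print(count_queues(sample))  # -> (4, 4)
```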

Collaborating within the Ecosystem

As an integral part of making its contributed kernel enhancements available to customers, Intel performs extensive enabling work in collaboration with other members of the ecosystem.



CONCLUSION

Scalability of the Linux kernel is expected to be an ongoing area of work within the community. This work will become even more vital as time passes and as the 80-core systems that we regard as high-end servers today become the mainstream servers of tomorrow. More cores, more memory, and more application parallelism will certainly arise, and the kernel community will strive to keep pace with these advances.

Intel, along with others in the open-source community, will continue to identify opportunities and address challenges to enhance kernel scalability through the Linux Kernel Scalability Project as ongoing development of the kernel proceeds. Intel remains committed to providing Linux support for new hardware features and to driving up scalability along with performance, security, energy efficiency, and other improvements. These efforts are an extension of Intel’s history of consistent contributions over time and dedication to open source as a preeminent force of innovation, helping deliver choice to its customers.

Learn more about how Intel is helping deliver on the promise of open source

www.intel.com/opensource

www.intel.com/opensource/linux

“Intel Capital began investing in open source back in the mid ‘90s. We started at the operating system level...but then we looked at the rest of the stack. The LAMP stack was shifting big workloads from a mainframe to a commodity x86-based server, [and] part of our objective was to make sure that no matter what the application...it would run best on Intel.”
– Lisa Lambert, Managing Director, Intel Capital

Supporting Scalability Throughout the Stack

Intel enablement of open-source software encompasses not just the Linux* kernel, but every layer of the software stack. Intel’s code contributions and optimizations of open-source software, along with investments by Intel Capital across the stack, have helped ensure that open-source-based solutions on Intel® architecture deliver a viable alternative to legacy proprietary platforms.


[Open Source on Intel: Linux contributions, building blocks, industry standards, commercial ecosystem, academic research, tools and resources, customer solutions]

1 Intel® Advanced Encryption Standard New Instructions (Intel® AES-NI) requires a computer system with an Intel AES-NI enabled processor, as well as non-Intel software to execute the instructions in the correct sequence. Intel AES-NI is available on select Intel® processors. For availability, consult your reseller or system manufacturer. For more information, see http://software.intel.com/en-us/articles/intel-advanced-encryption-standard-instructions-aes-ni/.

2 Intel® Hyper-Threading Technology (Intel® HT Technology) requires a computer system with an Intel® processor supporting Intel HT Technology and an Intel HT Technology enabled chipset, BIOS and operating system. Performance will vary depending on the specific hardware and software you use. See www.intel.com/products/ht/hyperthreading_more.htm for more information including details on which processors support Intel HT Technology.

3 For a discussion of how this performance-enabling feature may actually limit the performance of software that is not well suited to it, as well as the complexities associated with determining the effects accurately, see the papers “Performance Insights to Intel® Hyper-Threading Technology” at http://software.intel.com/en-us/articles/performance-insights-to-intel-hyper-threading-technology/ and “Intel® Hyper-Threading Technology: Analysis of the HT Effects on a Server Transactional Workload” at http://software.intel.com/en-us/articles/intel-hyper-threading-technology-analysis-of-the-ht-effects-on-a-server-transactional-workload/.

4 http://lwn.net/Articles/451243/.
5 http://lwn.net/Articles/358904/.

6 http://rhaas.blogspot.com/2012/04/did-i-say-32-cores-how-about-64.html.
7 http://rhaas.blogspot.com/2011/11/linux-lseek-scalability.html.

INFORMATION IN THIS DOCUMENT IS PROVIDED IN CONNECTION WITH INTEL® PRODUCTS. NO LICENSE, EXPRESS OR IMPLIED, BY ESTOPPEL OR OTHERWISE, TO ANY INTELLECTUAL PROPERTY RIGHTS IS GRANTED BY THIS DOCUMENT. EXCEPT AS PROVIDED IN INTEL’S TERMS AND CONDITIONS OF SALE FOR SUCH PRODUCTS, INTEL ASSUMES NO LIABILITY WHATSOEVER, AND INTEL DISCLAIMS ANY EXPRESS OR IMPLIED WARRANTY, RELATING TO SALE AND/OR USE OF INTEL PRODUCTS INCLUDING LIABILITY OR WARRANTIES RELATING TO FITNESS FOR A PARTICULAR PURPOSE, MERCHANTABILITY, OR INFRINGEMENT OF ANY PATENT, COPYRIGHT OR OTHER INTELLECTUAL PROPERTY RIGHT. UNLESS OTHERWISE AGREED IN WRITING BY INTEL, THE INTEL PRODUCTS ARE NOT DESIGNED NOR INTENDED FOR ANY APPLICATION IN WHICH THE FAILURE OF THE INTEL PRODUCT COULD CREATE A SITUATION WHERE PERSONAL INJURY OR DEATH MAY OCCUR.

Intel may make changes to specifications and product descriptions at any time, without notice. Designers must not rely on the absence or characteristics of any features or instructions marked “reserved” or “undefined.” Intel reserves these for future definition and shall have no responsibility whatsoever for conflicts or incompatibilities arising from future changes to them. The information here is subject to change without notice. Do not finalize a design with this information.

The products described in this document may contain design defects or errors known as errata which may cause the product to deviate from published specifications. Current characterized errata are available on request. Contact your local Intel sales office or your distributor to obtain the latest specifications and before placing your product order. Copies of documents which have an order number and are referenced in this document, or other Intel literature, may be obtained by calling 1-800-548-4725, or by visiting Intel’s Web Site http://www.intel.com/.

Software and workloads used in performance tests may have been optimized for performance only on Intel microprocessors. Performance tests, such as SYSmark* and MobileMark*, are measured using specific computer systems, components, software, operations and functions. Any change to any of those factors may cause the results to vary. You should consult other information and performance tests to assist you in fully evaluating your contemplated purchases, including the performance of that product when combined with other products. For more information, go to http://www.intel.com/performance.

*Other names and brands may be claimed as the property of others.

Copyright © 2012 Intel Corporation. All rights reserved. Intel, the Intel logo, and Xeon are trademarks of Intel Corporation in the U.S. and other countries. 1112/NKR/MESH/PDF 327459-001US


www.intel.com/opensource

Intel takes pride in being a long-standing member of the open-source community. We believe in open-source development as a means to create rich business opportunities, advance promising technologies, and bring together top talent from diverse fields to solve computing challenges. Our contributions to the community include reliable hardware architectures, professional development tools, work on essential open-source components, collaboration and co-engineering with leading companies, investment in academic research and commercial businesses, and helping to build a thriving ecosystem around open source.