Author
shimosawa
Bootstrapping Code in Linux Kernel
Initialization (1)
Taku Shimosawa
For the new Linux kernel book

Agenda
- Initialization phase of the Linux kernel
- Turning on the paging feature
- Calling *init functions
- And miscellaneous things related to initialization

1. vmlinux
This is the Linux kernel

vmlinux
- Main kernel binary
- Runs with the final CPU state
  - Protected Mode in x86_32 (i386)
  - Long Mode in x86_64
  - And so on
- Runs in the virtual memory space
  - Above PAGE_OFFSET (default: 0xC0000000) (32-bit)
  - Above __START_KERNEL_map (default: 0xFFFFFFFF80000000) (64-bit)
  - i.e. all the absolute addresses in the binary are virtual ones

Entry points

Architecture  Name        Location                        Name (secondary)
x86_32        startup_32  arch/x86/kernel/head_32.S       startup_32_smp
x86_64        startup_64  arch/x86/kernel/head_64.S       secondary_startup_64
ARM           stext       arch/arm/kernel/head[_nommu].S  secondary_startup
ARM64         stext       arch/arm64/kernel/head.S        secondary_holding_pen, secondary_entry
PPC           _stext      arch/powerpc/kernel/head_32.S   *(__secondary_start)

Virtual memory mapping
(Figure: physical memory as mapped into the i386 and x86_64 virtual address spaces)
- i386 virtual (0x00000000 to 0xFFFFFFFF): LOWMEM, up to ~896 MB, is mapped at PAGE_OFFSET (0xC0000000)
- x86_64 virtual (0x0000000000000000 to 0xFFFFFFFFFFFFFFFF): physical memory is mapped at PAGE_OFFSET (0xFFFF880000000000); text/data are mapped at __START_KERNEL_map (0xFFFFFFFF80000000), in the last 2 GB

Why different mapping in 64-bit?
- The kernel code, data, and BSS reside in the last 2 GB of the virtual address space
  => Addressable by 32-bit!
- -mcmodel option in GCC
  - Specifies the assumptions for the size of code/data sections

-mcmodel option (x86)

Model    text               data
small    within 2GB         within 2GB
kernel   within -2GB        within -2GB
medium   within 2GB         can be > 2GB
large    anywhere in 64bit  anywhere in 64bit

Column: -mcmodel in gcc

    int g_data = 4;

    int main(void)
    {
        g_data += 7;
        ...
    }

    8b 05 c6 0b 20 00               mov    0x200bc6(%rip),%eax   # 601040 ...
    bf 01 00 00 00                  mov    $0x1,%edi
    8d 50 07                        lea    0x7(%rax),%edx

large:

    48 b8 40 10 60 00 00 00 00 00   movabs $0x601040,%rax
    bf 01 00 00 00                  mov    $0x1,%edi
    8b 30                           mov    (%rax),%esi
    ...
    8d 56 07                        lea    0x7(%rsi),%edx

#define SZ (1 =4.6)

    #if GCC_VERSION >= 40600
    /*
     * Tell the optimizer that something else uses this function or variable.
     */
    #define __visible __attribute__((externally_visible))
    #endif
    (include/linux/compiler-gcc4.h)

commit    9a858dc7cebce01a7bb616bebb85087fa2b40871
author    Andi Kleen, Mon Sep 17 21:09:15 2012
committer Linus Torvalds, Mon Sep 17 22:00:38 2012

compiler.h: add __visible

gcc 4.6+ has support for a externally_visible attribute that prevents the
optimizer from optimizing unused symbols away. Add a __visible macro to
use it with that compiler version or later.

This is used (at least) by the "Link Time Optimization" patchset.

__init (1)
- To mark code (text) and data as only necessary during initialization

    #define __init       __section(.init.text) __cold notrace
    #define __initdata   __section(.init.data)
    #define __initconst  __constsection(.init.rodata)
    #define __exitdata   __section(.exit.data)
    #define __exit_call  __used __section(.exitcall.exit)
    (include/linux/init.h)

    #ifndef __cold
    #define __cold  __attribute__((__cold__))
    #endif
    (include/linux/compiler-gcc4.h)

    #ifndef __section
    # define __section(S) __attribute__ ((__section__(#S)))
    #endif
    ...
    #define notrace __attribute__((no_instrument_function))
    (include/linux/compiler.h)

__init (2)
- The init* sections are gathered into a contiguous memory area

    . = ALIGN(PAGE_SIZE);
    .init.begin : AT(ADDR(.init.begin) - LOAD_OFFSET) {
        __init_begin = .; /* paired with __init_end */
    }
    ...
    INIT_TEXT_SECTION(PAGE_SIZE)
    #ifdef CONFIG_X86_64
    :init
    #endif
    INIT_DATA_SECTION(16)
    ...
    . = ALIGN(PAGE_SIZE);
    ...
    .init.end : AT(ADDR(.init.end) - LOAD_OFFSET) {
        __init_end = .;
    }
    (arch/x86/kernel/vmlinux.lds.S)

(Figure: init.text and init.data laid out contiguously between __init_begin and __init_end)

__init (3)
- And, they are discarded (freed) after initialization
  - Called from kernel_init

    void free_initmem(void)
    {
        free_init_pages("unused kernel",
                        (unsigned long)(&__init_begin),
                        (unsigned long)(&__init_end));
    }
    (arch/x86/mm/init.c)

    void free_initmem(void)
    {
        ...
        poison_init_mem(__init_begin, __init_end - __init_begin);
        if (!machine_is_integrator() && !machine_is_cintegrator())
            free_initmem_default(-1);
    }
    (arch/arm/mm/init.c)

head32.c, head64.c
- Before calling start_kernel, i386_start_kernel or x86_64_start_kernel is called on x86
  - Located in arch/x86/kernel/head{32,64}.c
  - No underscore between head and 32!
- x86 (32-bit)
  - Reserve BIOS memory (in conventional memory)
- x86 (64-bit)
  - Erase the identity map
  - Clear BSS, copy boot information from the low memory
  - And reserve BIOS memory

Reserve? But how?
- This is a very early stage. No complicated memory management is working yet.
- memblock (Logical memory blocks) is working!
  - memblock simply manages memory blocks
  - And on some architectures, the information is handed over to another mechanism, and discarded after initialization

    #define BIOS_LOWMEM_KILOBYTES 0x413
    lowmem = *(unsigned short *)__va(BIOS_LOWMEM_KILOBYTES);
    lowmem size);
    }
    memblock_trim_memory(PAGE_SIZE);
    ...
    }

BTW, what's this?
- Resizing, or reallocation.
  - memblock uses slab for resizing if available
  - The number of e820 entries may be more than 128
- However, slab only becomes available at kmem_cache_init, called by mm_init (25/80), so not at this time.
- memblock tries to allocate by itself, by finding an area that is memory && !reserved.

    static int __init_memblock memblock_double_array(struct memblock_type *type,
                                                     phys_addr_t new_area_start,
                                                     phys_addr_t new_area_size)
    {
        addr = memblock_find_in_range(new_area_start + new_area_size,
                                      memblock.current_limit,
                                      new_alloc_size, PAGE_SIZE);

memblock: Debug options
- memblock=debug

    static int __init early_memblock(char *p)
    {
        if (p && strstr(p, "debug"))
            memblock_debug = 1;
        return 0;
    }
    early_param("memblock", early_memblock);

    static int __init_memblock memblock_reserve_region(...)
    {
        ...
        memblock_dbg("memblock_reserve: [%#016llx-%#016llx] flags %#02lx %pF\n",
                     (unsigned long long)base,
                     (unsigned long long)base + size - 1,
                     flags, (void *)_RET_IP_);

3. Initialization
Okay, okay.

start_kernel
- What's the first initialization function called?
  - smp_setup_processor_id() ((at least) 2.6.18 ~ 3.2)
  - lockdep_init() (3.3 ~)

commit 73839c5b2eacc15cb0aa79c69b285fc659fa8851
Author: Ming Lei
Date:   Thu Nov 17 13:34:31 2011 +0800

    init/main.c: Execute lockdep_init() as early as possible

    This patch fixes a lockdep warning on ARM platforms:

    [ 0.000000] WARNING: lockdep init error! Arch code didn't call lockdep_init() early enough?
    [ 0.000000] Call stack leading to lockdep invocation was:
    [ 0.000000] [] save_stack_trace_tsk+0x0/0x90
    [ 0.000000] [] 0xffffffff

    The warning is caused by printk inside smp_setup_processor_id().

init (1/80) : lockdep_init
- Initializes lockdep (lock validator)
  - Runtime locking correctness validator
  - Detects
    - Lock inversion
    - Circular lock dependencies
- When enabled, lockdep is called whenever any spinlock or mutex is acquired.
  - Thus, the initialization for lockdep must come first.
- Initialization is simple (just initializing the list_heads of the hashes)
- Config: CONFIG_LOCKDEP, selected by PROVE_LOCKING or DEBUG_LOCK_ALLOC or LOCK_STAT

    void lockdep_init(void)
    {
        ...
        for (i = 0; i < CLASSHASH_SIZE; i++)
            INIT_LIST_HEAD(classhash_table + i);

        for (i = 0; i < CHAINHASH_SIZE; i++)
            INIT_LIST_HEAD(chainhash_table + i);
        ...
    }
    (kernel/locking/lockdep.c)

init (2/80) : smp_setup_processor_id
- Only effective on some architectures
  - ARM, s390, SPARC

    u32 __cpu_logical_map[NR_CPUS] = { [0 ... NR_CPUS-1] = MPIDR_INVALID };

    void __init smp_setup_processor_id(void)
    {
        int i;
        u32 mpidr = is_smp() ? read_cpuid_mpidr() & MPIDR_HWID_BITMASK : 0;
        u32 cpu = MPIDR_AFFINITY_LEVEL(mpidr, 0);

        cpu_logical_map(0) = cpu;
        for (i = 1; i < nr_cpu_ids; ++i)
            cpu_logical_map(i) = i == cpu ? 0 : i;
        set_my_cpu_offset(0);

        pr_info("Booting Linux on physical CPU 0x%x\n", mpidr);
    }
    (arch/arm/kernel/setup.c)

- Hardware CPU (core) ID
- Exchange the logical ID for the boot CPU and the logical ID for the CPU 0.
(Figure: cpu_logical_map, with example entries 1, 2, 0, 3)

init (3/80) : debug_objects_early_init
- Initializes debugobjects
  - Lifetime debugging facility for objects
  - Seems to be used by timer, hrtimer, workqueue, per_cpu_counter and rcu
- Again, this function initializes locks and list heads
- Config: CONFIG_DEBUG_OBJECTS

    void __init debug_objects_early_init(void)
    {
        int i;

        for (i = 0; i < ODEBUG_HASH_SIZE; i++)
            raw_spin_lock_init(&obj_hash[i].lock);

        for (i = 0; i < ODEBUG_POOL_SIZE; i++)
            hlist_add_head(&obj_static_pool[i].node, &obj_pool);
    }
    (lib/debugobjects.c)

init (4/80) : boot_init_stack_canary
- Sets up the stack protector
  - include/asm/stackprotector.h
- Decides the canary value based on a random value and the TSC

    static __always_inline void boot_init_stack_canary(void)
    {
        u64 canary;
        u64 tsc;
    #ifdef CONFIG_X86_64
        BUILD_BUG_ON(offsetof(union irq_stack_union, stack_canary) != 40);
    #endif
        get_random_bytes(&canary, sizeof(canary));
        tsc = __native_read_tsc();
        canary += tsc + (tsc stack_canary = canary;
    #ifdef CONFIG_X86_64
        this_cpu_write(irq_stack_union.stack_canary, canary);
    #else
        this_cpu_write(stack_canary.canary, canary);
    #endif
    }

init (5/80) : cgroup_init_early
- Initializes cgroups
  - For subsystems that have early_init set, initialize the subsystem.
    - cpu, cpuacct, cpuset
  - The rest of the subsystems are initialized in cgroup_init (71/80)
- Initializes the structure, and the names for the subsystems

init (6/80) : boot_cpu_init
- Initializes various cpumasks for the boot CPU
  - online : available to scheduler
  - active : available to migration
  - present : cpu is populated
  - possible : cpu is populatable
- !HOTPLUG_CPU => online and active are the same
- !HOTPLUG_CPU => present and possible are the same
- set_cpu_online adds the cpu to active
- set_cpu_present does not add the cpu to possible

    static void __init boot_cpu_init(void)
    {
        int cpu = smp_processor_id();
        /* Mark the boot cpu "present", "online" etc for SMP and UP case */
        set_cpu_online(cpu, true);
        set_cpu_active(cpu, true);
        set_cpu_present(cpu, true);
        set_cpu_possible(cpu, true);
    }
    (init/main.c)

cpumask
- A bit map

    typedef struct cpumask { DECLARE_BITMAP(bits, NR_CPUS); } cpumask_t;
    (include/linux/cpumask.h)

    #define DECLARE_BITMAP(name,bits) \
        unsigned long name[BITS_TO_LONGS(bits)]
    (include/linux/types.h)

    #define BITS_TO_LONGS(nr) DIV_ROUND_UP(nr, BITS_PER_BYTE * sizeof(long))
    (include/linux/bitops.h)

(Figure: bits is an array of long (4 / 8 bytes each) holding NR_CPUS bits)

Set bit! (x86)
- The register bit-offset operand for bts is -2^31 ~ 2^31-1 or -2^63 ~ 2^63-1

    #define IS_IMMEDIATE(nr) (__builtin_constant_p(nr))
    ...
    static __always_inline void
    set_bit(long nr, volatile unsigned long *addr)
    {
        if (IS_IMMEDIATE(nr)) {
            asm volatile(LOCK_PREFIX "orb %1,%0"
                : CONST_MASK_ADDR(nr, addr)
                : "iq" ((u8)CONST_MASK(nr))
                : "memory");
        } else {
            asm volatile(LOCK_PREFIX "bts %1,%0"
                : BITOP_ADDR(addr) : "Ir" (nr) : "memory");
        }
    }
    (arch/x86/include/asm/bitops.h)

Set bit! (ARM)

    #if __LINUX_ARM_ARCH__ >= 6
        .macro  bitop, name, instr
    ENTRY(\name)
    UNWIND(.fnstart)
        ands    ip, r1, #3
        strneb  r1, [ip]            @ assert word-aligned
        mov     r2, #1
        and     r3, r0, #31         @ Get bit offset
        mov     r0, r0, lsr #5
        add     r1, r1, r0, lsl #2  @ Get word offset
        ...
        mov     r3, r2, lsl r3
    1:  ldrex   r2, [r1]
        \instr  r2, r2, r3
        strex   r0, r2, [r1]
        cmp     r0, #0
        bne     1b
        bx      lr
    UNWIND(.fnend)
    ENDPROC(\name)
        .endm

    bitop   _set_bit, orr

smp_processor_id
- Returns the core ID (in the kernel)
- On ARM (and in the old days on x86)
  - Located in the current thread_info
  - Which sits at the base (lowest address) of the current kernel stack
- On x86
  - Located in the per-cpu area.

    #define raw_smp_processor_id() (this_cpu_read(cpu_number))
    (arch/x86/include/asm/smp.h)

    #define raw_smp_processor_id() (current_thread_info()->cpu)
    (arch/arm/include/asm/smp.h)

    static inline struct thread_info *current_thread_info(void)
    {
        register unsigned long sp asm ("sp");
        return (struct thread_info *)(sp & ~(THREAD_SIZE - 1));
    }
    (arch/arm/include/asm/thread_info.h)

Next
- Topics and the rest of initialization
  - Setup parameters (early_param() etc.)
  - Initcalls
  - Multiprocessor support
    - Per-cpus
    - SMP boot (secondary boot)
    - SMP alternatives
      - And other alternatives
  - And Others?
    - Modules?