Funny statistics (updated)

Well, I just had fun gathering some lines-of-code statistics about the project. These are very raw stats: the number of lines in each file, blank lines and comments included, although some people prefer to remove those.

Anyway, the total number of lines of code in the tree is 14557, spread across 86 files. This gives a mean of 169 LOC/file, with a maximum of 2679 LOC and a minimum of 19 LOC. The root mean square deviation from the mean is 342 LOC.
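
For anyone who wants to reproduce this kind of summary, here is a minimal sketch in C++. The `LocSummary` struct and the `summarize` function are hypothetical names made up for this example, not part of the project, and the per-file counts are assumed to have been collected beforehand. It reports both the plain root mean square and the RMS deviation from the mean, because the 342 LOC figure is most plausibly the latter: the ten largest files listed in this post already push a plain root mean square to about 370 LOC.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Summary of a set of per-file line counts (hypothetical type for this sketch).
struct LocSummary {
    std::size_t files;     // number of files
    long total;            // sum of all line counts
    long min;              // smallest file
    long max;              // largest file
    double mean;           // total / files
    double rms;            // sqrt(mean of squared counts)
    double rms_deviation;  // sqrt(mean of squared deviations from the mean)
};

// Compute the summary for a non-empty list of line counts.
inline LocSummary summarize(const std::vector<long>& counts) {
    assert(!counts.empty());
    LocSummary s{counts.size(), 0, counts.front(), counts.front(), 0.0, 0.0, 0.0};

    // First pass: total, extremes and sum of squares.
    double sum_sq = 0.0;
    for (long c : counts) {
        s.total += c;
        s.min = std::min(s.min, c);
        s.max = std::max(s.max, c);
        sum_sq += static_cast<double>(c) * static_cast<double>(c);
    }
    const double n = static_cast<double>(s.files);
    s.mean = static_cast<double>(s.total) / n;
    s.rms = std::sqrt(sum_sq / n);

    // Second pass: spread around the mean (population standard deviation).
    double dev_sq = 0.0;
    for (long c : counts) {
        const double d = static_cast<double>(c) - s.mean;
        dev_sq += d * d;
    }
    s.rms_deviation = std::sqrt(dev_sq / n);
    return s;
}
```

Calling `summarize` on a vector holding the line count of each file gives every figure quoted above in one go; the two-pass approach keeps the deviation numerically well behaved even when one file dwarfs the others.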

It’s interesting to see how the code is spread across the various components. Let’s first set aside the code written by third parties, then split the rest among the components it belongs to.

  • Code originating from other projects : 2826 LOC
  • Bootstrap code : 3582 LOC
  • Kernel debug code : 1553 LOC
  • Testing code : 1468 LOC
  • Rest of kernel code : 5128 LOC

Sorted by file type, we get…

  • C++ files : 6026 LOC
  • Headers : 5598 LOC
  • C files : 2699 LOC
  • Assembly : 234 LOC

And here are the 10 heaviest source files:

  • include/elf.h : 2679 LOC – Full description of the ELF format (from the Linux kernel source)
  • memory/kmem_allocator.cpp : 1322 LOC – Memory allocator
  • arch/x86_64/debug/dbgstream.cpp : 912 LOC – Debug stream and other debug output facilities
  • arch/x86_64/memory/physmem.cpp : 776 LOC – Physical memory management
  • arch/x86_64/bootstrap/lib/kinfo_handling.c : 760 LOC – Retrieves various information for the kernel
  • arch/x86_64/memory/virtmem.cpp : 578 LOC – Virtual memory management
  • arch/x86_64/bootstrap/lib/txt_videomem.c : 488 LOC – The C equivalent of dbgstream.cpp
  • arch/x86_64/bootstrap/lib/paging.c : 340 LOC – Sets up early paging before the kernel starts
  • arch/x86_64/tests/memory/phymem_test_arch.cpp : 280 LOC – Physical memory management testing
  • tests/memory/phymem_test.cpp : 273 LOC – Physical memory management testing (the part which is guaranteed to be arch-independent)

I think it’s safe to say that I won’t ever match Tanenbaum’s achievement of having a modern kernel which weighs around 4000 lines of executable code, even taking into account that such a count would exclude all the C and header files… I still wonder how he did that, though.

EDIT : New statistics, about object and binary size this time ;)

The final binary size is 82KB for the kernel and 37KB for the bootstrap part. Considering that the combined size of all intermediate object files is 306KB (spread across 40 files, mean 7.7KB, max 38.8KB, min 0.7KB, RMS deviation 8.7KB), we can safely say that the linker did its job very well.

Separating object files using categories as before…

  • Bootstrap code : 54KB
  • Kernel debug code : 60KB
  • Testing code : 75KB
  • Rest of kernel code : 117KB

When sorting by source file type…

  • C++ : 250KB
  • C : 51KB
  • Assembly : 4.8KB

And here are the 10 heaviest object files:

  • arch/x86_64/debug/dbgstream.knlcpp.o : 39KB
  • memory/kmem_allocator.knlcpp.o : 34KB
  • arch/x86_64/memory/physmem.knlcpp.o : 26KB
  • arch/x86_64/memory/virtmem.knlcpp.o : 19KB
  • arch/x86_64/tests/memory/phymem_test_arch.knlcpp.o : 16KB
  • tests/memory/phymem_test.knlcpp.o : 15KB
  • arch/x86_64/debug/display_paging.knlcpp.o : 14KB
  • tests/memory/malloc_test.knlcpp.o : 13KB
  • arch/x86_64/bootstrap/lib/kinfo_handling.bsc.o : 13KB
  • tests/memory/virmem_test.knlcpp.o : 11KB

Interestingly enough, object files generated from C++ code seem much heavier than their C counterparts, as this time most of the weight comes from the microkernel. Another interesting result is the size of the object code produced from the assembly files: 4.8KB for 234 lines.

2 thoughts on “Funny statistics (updated)”

  1. Amenel January 19, 2011 / 7:47 pm

    What kind of things do you write C code or assembly code for? I’m curious to know which tasks “demand” a lower-level language.

  2. Hadrien January 19, 2011 / 9:34 pm

    Assembly is necessary for tasks that are arch-specific by their very nature, because they use CPU features directly. Examples: context switching, interrupt handlers, enabling paging or long (64-bit) mode on x86… I don’t use it for anything else.

    I use C for the bootstrap part. This is something which starts before the kernel and does some dirty work like setting up 64-bit mode and basic paging/segmentation, making a map of RAM, enumerating CPU capabilities, and loading the kernel’s ELF64 binary*. This way, the kernel can be written fully in 64-bit C++ with (almost) no hacks needed before the main function.
    At this stage of the boot process, I want something dirty, predictable, and with no runtime requirements: C is perfect for this. C++, being higher-level, is less predictable. For example, you can’t simply call a C++ function from assembly code without tweaking the function’s prototype first. It also has more runtime requirements (e.g. you need to call the constructors of global objects before running the main function).
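
The two C++ constraints mentioned above can be sketched as follows. This is only an illustration under stated assumptions, not the project's actual code: `kernel_entry`, `g_boot_info_addr`, `Constructor` and `run_global_constructors` are hypothetical names, and the constructor table is passed in explicitly so that the sketch also runs in a hosted environment.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical storage for whatever the bootstrap code hands over.
static std::uint64_t g_boot_info_addr = 0;

// Hypothetical entry point (name and parameter are assumptions for this sketch).
// extern "C" is the prototype tweak: it gives the function an unmangled symbol,
// so assembly code can refer to it simply as `kernel_entry`.
extern "C" int kernel_entry(std::uint64_t boot_info_addr) {
    g_boot_info_addr = boot_info_addr;  // remember where the bootstrap left its data
    return 0;
}

// One global-constructor entry: a function taking and returning nothing.
using Constructor = void (*)();

// Run every constructor in [begin, end) before the main function is entered.
// In a freestanding kernel the range would typically come from symbols defined
// in the linker script; here the caller supplies it.
inline void run_global_constructors(const Constructor* begin, const Constructor* end) {
    for (const Constructor* ctor = begin; ctor != end; ++ctor) {
        assert(*ctor != nullptr);  // a real kernel would call its own panic routine
        (*ctor)();
    }
}
```

Without `extern "C"`, the compiler emits a mangled symbol name that encodes the parameter types, which is why assembly code cannot call an ordinary C++ function by its source name. C code needs neither step, which is what makes it convenient this early in the boot process.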

    *I need to do this myself because GRUB 1 cannot, and I don’t want to use the mess that GRUB 2 is unless they at least take the time to write a proper, final multiboot2 spec and reconsider some of their current decisions, like loading kernel modules at nonsensically high memory locations and being able to load an ELF64 binary without being able to switch the processor to 64-bit mode.
