The Bug that Forced Me to Understand Memory Compaction, by Emily Samp

Abstract

Did you know that Ruby 2.7 introduces a new method for manual memory compaction?

Neither did I.

Then a user reported a bug on a gem I maintain, and well...
In this talk, I’ll tell you a story about how one bug forced me to learn all about memory management in Ruby. By the end of this talk, you should understand how memory is allocated on the heap, how Ruby implements garbage collection, and what memory compaction is all about!

Details

Intended Audience

This talk is intended for Rubyists of all experience levels who would like to gain a deeper understanding of memory management in Ruby MRI and learn about the new GC.compact method introduced in Ruby 2.7.

Outcomes

By the end of this talk, attendees should be able to understand the following concepts at a high level:

  • The structure and use of the memory heap
  • Ruby MRI garbage collection
  • Memory bloat and how it affects Ruby MRI
  • Memory compaction, its implementation in Ruby 2.7, and how it impacts C extensions

Outline

  1. Introduction

    • As part of my job, I maintain a Ruby gem that has a C extension. One day, a user reported that the gem raised an exception when they ran GC.compact with Ruby 2.7.
    • At the time, I didn’t even know what GC.compact was, let alone why it was causing a bug in my gem. I had a lot to learn before I would be able to fix this bug.
    • This bug taught me a lot about Ruby internals, including how Ruby MRI does garbage collection, how the new memory compaction feature is implemented, and how I could update my C extension to allow for memory compaction.
    • In this talk, my goal is to share everything I learned while fixing this bug. By the end of the talk, the audience should have a high-level understanding of memory management in Ruby MRI and Ruby 2.7’s new memory compaction feature.
  2. What the heck is GC.compact anyway?

    • The gem was raising an exception when the user called the GC.compact method -- but what is GC.compact anyway?
    • GC.compact is a method introduced in MRI 2.7 that allows users to manually run memory compaction.
    • In order to understand what memory compaction is and why it’s important, you first have to understand what memory is and how Ruby manages it.
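
    As a minimal sketch of what the user's call might have looked like, the snippet below runs compaction manually. The helper name compact_if_supported is hypothetical; the guard is there because GC.compact only exists on Ruby MRI 2.7 and later, and some platforms do not implement it.

```ruby
# Hypothetical helper: run manual compaction only where it is available.
def compact_if_supported
  return :unsupported unless GC.respond_to?(:compact)

  GC.compact # asks MRI to compact its object heap
  :compacted
rescue NotImplementedError
  # Some platforms define the method but do not implement compaction.
  :unsupported
end

result = compact_if_supported
puts "compaction: #{result}"
```
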
  3. Memory and Ruby

    • When writing a C program (Ruby MRI is implemented in C), any memory you allocate dynamically lives in a region called the “heap,” and your program reads from and writes to it.
    • The heap is part of RAM, your computer’s “short-term” memory, which stores program instructions and anything else your computer will need to access quickly in the near future.
    • You might hear the phrase “the stack and the heap.” The stack is another type of memory where C programs store information about control flow as well as any variables that will automatically be deleted once they go out of scope.
    • The heap is where your program can dynamically allocate memory -- this means it doesn’t necessarily know how much memory it will need ahead of time, and it can allocate a block of unused memory of the correct size while your program is running.
    • You can think of the heap as being made up of “slots,” where every piece of memory takes up a certain number of slots. [1]
    • A C program gets the value of an object in the heap using its “pointer,” or its address in memory. This will be important later.
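
    To make the idea of slots a little more concrete, here is a small sketch that asks MRI how many slots its object heap currently has. It assumes the default MRI garbage collector, whose GC.stat exposes the keys used below; the exact numbers vary from run to run.

```ruby
# Count allocations before and after creating many small objects.
before = GC.stat(:total_allocated_objects)
strings = Array.new(10_000) { |i| "object #{i}" }
after = GC.stat(:total_allocated_objects)

puts "strings created: #{strings.size}"
puts "objects allocated meanwhile: #{after - before}"
puts "live slots: #{GC.stat(:heap_live_slots)}"
puts "free slots: #{GC.stat(:heap_free_slots)}"
```
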
  4. Garbage collection!

    • When you allocate memory on the heap, you have to de-allocate it as well. Luckily, Rubyists don’t have to think about this because Ruby implements “garbage collection” -- this means that the language keeps track of what memory is no longer being used and frees it for us.
    • Ruby does a type of garbage collection called “mark and sweep”; first, it goes through every object and marks the ones that are still in use. Then, it goes through them again, and “sweeps” all the objects that haven’t been marked (i.e. the ones that are no longer in use). Ruby clears up the slots those objects were taking up on the heap. [2]
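
    The two phases can be sketched with a toy model in plain Ruby. This is not how MRI implements its collector; ToyObject, mark, and sweep are hypothetical names used only to illustrate the idea of marking from a set of roots and then discarding whatever was not reached.

```ruby
# A toy object: a name, references to other toy objects, and a mark bit.
ToyObject = Struct.new(:name, :refs, :marked)

# Mark phase: flag everything reachable from the given object.
def mark(object)
  return if object.marked

  object.marked = true
  object.refs.each { |ref| mark(ref) }
end

# Sweep phase: keep marked objects, drop the rest, reset the mark bits.
def sweep(heap)
  survivors = heap.select(&:marked)
  survivors.each { |object| object.marked = false }
  survivors
end

a = ToyObject.new("a", [], false)
b = ToyObject.new("b", [a], false) # b keeps a alive
c = ToyObject.new("c", [], false)  # nothing refers to c
heap = [a, b, c]
roots = [b]

roots.each { |root| mark(root) }
heap = sweep(heap)
puts heap.map(&:name).inspect # prints ["a", "b"]
```
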
  5. Memory bloat

    • Ruby garbage collection does its job, in that it frees up unused memory, but this doesn’t always mean that the freed memory is available to use. This causes something called “memory bloat.”
    • When you allocate memory, you have to find enough contiguous slots to fit the object you’re allocating. If the empty slots are not next to each other, they can’t be used to store larger objects.
    • Often, Ruby frees up slots, but those slots are not next to each other, so they cannot be used to allocate new objects. Thus, Ruby has to continue using more and more memory. This is what causes memory bloat. [3]
    • Memory bloat is a problem because it can cause performance issues and even cause applications to fail if too much memory is allocated.
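
    A toy illustration of the contiguity problem described above, using an array as a stand-in for the heap (nil means an empty slot). The helper largest_free_run is hypothetical and the model is far simpler than MRI's real heap.

```ruby
# Toy heap: each element is one slot; nil means the slot is free.
heap = [:a, nil, :b, nil, :c, nil, :d, nil]

# Length of the longest run of adjacent free slots.
def largest_free_run(heap)
  heap.chunk_while { |left, right| left.nil? && right.nil? }
      .select { |run| run.first.nil? }
      .map(&:size)
      .max || 0
end

free_total = heap.count(nil)
largest = largest_free_run(heap)
puts "free slots: #{free_total}"           # prints free slots: 4
puts "largest contiguous run: #{largest}"  # prints largest contiguous run: 1
```

    Half of this toy heap is free, yet nothing that needs two adjacent slots can fit.
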
  6. Memory compaction to the rescue!

    • Now we can finally get back to talking about GC.compact.
    • GC.compact is a new memory compaction method in Ruby MRI 2.7 that is meant to address memory bloat.
    • At a high level, GC.compact will take empty slots in the middle of the heap and swap them with full slots at the end of the heap. That way, all the empty slots are located at the end of the heap, and there are enough contiguous empty slots for new objects to be allocated! [4]
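
    The swap described above can be sketched on a toy array heap with two cursors, one looking for empty slots from the front and one looking for full slots from the back. This is an illustrative model only, not MRI's implementation, and the method name compact is hypothetical.

```ruby
# Toy compaction: fill empty slots near the front with objects from the end.
def compact(heap)
  heap = heap.dup
  free = 0             # scans forward for an empty slot
  used = heap.size - 1 # scans backward for a full slot

  loop do
    free += 1 while free < used && !heap[free].nil?
    used -= 1 while free < used && heap[used].nil?
    break if free >= used

    heap[free], heap[used] = heap[used], heap[free]
  end
  heap
end

heap = [:a, nil, :b, nil, :c, nil, :d, nil]
compacted = compact(heap)
puts compacted.inspect # prints [:a, :d, :b, :c, nil, nil, nil, nil]
```
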
  7. Explaining the bug in my gem

    • Memory compaction is great, but not every piece of memory can or should move.
    • Remember how we said that C programs use pointers to get objects out of the heap? Well, if you move everything in the heap around, the pointer referenced in your program might now be pointing to a different value.
    • This is what was happening with our C extension -- we had a variable that was created when the extension was initialized, but after calling GC.compact, the program raised an exception because the value of that variable changed!
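
    The failure mode can be imitated in a toy model where an array index plays the role of a pointer. The variable saved_address is a hypothetical stand-in for the C extension's variable; a real extension holds a VALUE, not an array index.

```ruby
heap = [:config, nil, :cache, nil, :logger]

# The "extension" remembers only where the object lived at initialization.
saved_address = heap.index(:logger)

# Compaction moves :logger into the first empty slot.
first_free = heap.index(nil)
heap[first_free], heap[saved_address] = heap[saved_address], nil

puts heap.inspect # prints [:config, :logger, :cache, nil, nil]
puts "stale address now holds: #{heap[saved_address].inspect}" # prints stale address now holds: nil
```
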
  8. How do you fix it?

    • In order to understand how to fix this, you have to understand how GC.compact is implemented.
    • First, Ruby performs the mark-and-sweep garbage collection that I described earlier in the talk. This step matters for compaction because the mark function that C extensions call, rb_gc_mark, has been modified in Ruby 2.7 so that any object marked with it also gets “pinned,” meaning it stays in its spot on the heap and will not move.
    • Then, empty slots in the middle of the heap are swapped with full slots at the end of the heap so that all empty slots are at the end of the heap.
    • Garbage collection is done again as clean up.
    • By this logic, the only reason the object my variable pointed to would be moving around is if it wasn’t properly marked during garbage collection! To test out my theory, I called the function rb_gc_register_mark_object on the variable during initialization of the C extension, and sure enough, this fixed the issue.
    • At this point, you might ask -- how was the C extension working in the first place if the variable was never marked? I believe this is because the variable was a constant, and constants are never garbage collected in Ruby, so this variable did not need to be marked for the purposes of garbage collection.
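
    Pinning can be added to a toy array heap as a list of objects the compactor must leave alone. The sketch below is a simplified model with hypothetical names (compact_with_pins, :ext_global); it only illustrates why a pinned object keeps its address while unpinned ones may move.

```ruby
# Move objects from the end of the toy heap into earlier empty slots,
# but never move an object listed in `pinned`.
def compact_with_pins(heap, pinned)
  heap = heap.dup
  (heap.size - 1).downto(0) do |from|
    object = heap[from]
    next if object.nil? || pinned.include?(object)

    to = heap.index(nil)
    next if to.nil? || to > from # no earlier empty slot to move into

    heap[to] = object
    heap[from] = nil
  end
  heap
end

heap = [:a, nil, :b, nil, :ext_global]
without_pin = compact_with_pins(heap, [])
with_pin = compact_with_pins(heap, [:ext_global])

puts without_pin.inspect # prints [:a, :ext_global, :b, nil, nil]
puts with_pin.inspect    # prints [:a, :b, nil, nil, :ext_global]
```
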
  9. How this impacts you

    • If you don’t write C extensions, then it probably doesn’t!
    • If you do, then there are a few ways to make your extension compatible with this new feature.
    • If you’re writing a brand new extension and don’t have to support versions of Ruby MRI older than 2.7 [4], you can use the new rb_gc_mark_movable function (called rb_gc_mark_no_pin in the original proposal) in your mark callbacks. This marks objects during garbage collection, but does not pin them in memory, allowing them to be moved around during compaction. Then, you can implement a new compaction callback [show a code example] and use the function rb_gc_location (rb_gc_new_location in the original proposal) to look up an object’s new address after it has moved in memory.
    • If you still have to support older versions of Ruby [4], you can continue to use rb_gc_mark in your mark callbacks. As a reminder, this function will “pin” objects in memory, preventing them from moving during memory compaction. This may make memory compaction less effective, but it is an easy way to keep your C extension compatible with newer versions of Ruby.
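
    The "mark without pinning, then ask where the object went" approach can be imitated on a toy array heap. The sketch below is an analogy in plain Ruby, not the C API: compact_with_forwarding and new_location are hypothetical names, and the forwarding table stands in for whatever bookkeeping the real collector keeps.

```ruby
# Returns the compacted heap plus a table mapping old addresses to new ones.
def compact_with_forwarding(heap)
  heap = heap.dup
  forwarding = {}
  (heap.size - 1).downto(0) do |from|
    next if heap[from].nil?

    to = heap.index(nil)
    break if to.nil? || to > from # nothing earlier left to fill

    heap[to], heap[from] = heap[from], nil
    forwarding[from] = to
  end
  [heap, forwarding]
end

# Toy analogue of looking up an object's new location after compaction.
def new_location(forwarding, address)
  forwarding.fetch(address, address)
end

heap = [:a, nil, :b, nil, :ext_global]
address = heap.index(:ext_global)

heap, forwarding = compact_with_forwarding(heap)
address = new_location(forwarding, address) # update the saved reference

puts heap.inspect          # prints [:a, :ext_global, :b, nil, nil]
puts heap[address].inspect # prints :ext_global
```

    The object is free to move, and the saved reference stays valid because it is refreshed after compaction.
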
  10. Conclusion

    • This was a really hard bug to fix, and one I thought I would never be able to solve, but I’m ultimately glad that I took it on. Throughout the process, I gained a deeper understanding of memory management in Ruby MRI and learned all about the new GC.compact feature.

Author’s note: Two things I plan to do for the actual version of the talk are:

  1. Provide visualizations of the heap in order to illustrate the concepts I discuss in this talk.
  2. Write a small C extension and give code examples to explain how one would modify their C extension to be compatible with GC.compact.

Sources:

  1. https://en.wikipedia.org/wiki/Memory_management#HEAP
  2. https://stackify.com/how-does-ruby-garbage-collection-work-a-simple-tutorial/
  3. https://www.joyfulbikeshedding.com/blog/2019-03-14-what-causes-ruby-memory-bloat.html#fragmentation-at-the-ruby-level
  4. https://bugs.ruby-lang.org/issues/15626

Pitch

Though I have more than 5 years of experience writing Ruby code, I have never had to understand any internal details of Ruby’s implementation. Until recently, I thought concepts like garbage collection would be too technical and complicated for me to understand. However, at my current job, I maintain a fairly popular gem with a C extension, and this has pushed me to learn more about memory management in Ruby than I ever have before. I want to demystify concepts like garbage collection and memory compaction for other members of the Ruby community by telling the story of how I came to understand them myself.

Submissions

RubyConf 2020 - Waitlisted

RubyConf 2020 - Accepted
