Introduction

MMTk is a memory management toolkit providing language implementers with a powerful memory management framework and researchers with a multi-runtime platform for memory management research. It is a complete re-write of the original MMTk, which was written in Java as part of Jikes RVM.

MMTk Tutorial

In this tutorial, you will build multiple garbage collectors from scratch using MMTk. You will start with an incredibly simple 'collector' called NoGC, and through a series of additions and refinements end up with a generational copying garbage collector.

This tutorial is aimed at GC implementors who would like to implement new GC algorithms/plans with MMTk. If you are a language implementor interested in porting your runtime to MMTk, you should refer to the porting guide instead.

This tutorial is a work in progress. Some sections may be rough, and others may be missing information (especially about import statements). If something is missing or inaccurate, refer to the relevant completed garbage collector if possible. Please also raise an issue, or create a pull request addressing the problem.

What is MMTk?

The Memory Management Toolkit (MMTk) is a framework for designing and implementing memory managers. It has a runtime-neutral core (mmtk-core) written in Rust, and bindings that allow it to work with OpenJDK, V8, and JikesRVM, with more bindings currently in development. MMTk was originally written in Java as part of the JikesRVM Java runtime. The current version is similar in purpose, but is designed to be runtime-neutral and portable to many different VMs.

The principal idea of MMTk is that it can be used as a toolkit, allowing new GC algorithms to be rapidly developed using common components. It also allows different GC algorithms to be compared on an apples-to-apples basis, since they share common mechanisms.

What will this tutorial cover?

This tutorial is intended to get you comfortable constructing new plans in MMTk.

You will first be guided through building a semispace collector. After that, you will extend this collector to be a generational collector, to further familiarise you with different concepts in MMTk. There will also be questions and exercises at various points in the tutorial, intended to encourage you to think about what the code is doing, increase your general understanding of MMTk, and motivate further research.

Where possible, there will be links to finished, functioning code after each section so that you can check that your code is correct. Note, however, that these will be full collectors. Therefore, there may be some differences between these files and your collector due to your position in the tutorial. By the end of each major section, your code should be functionally identical to the finished code provided.

Furthermore, please note that this code may not be identical to the main MMTk code: it is deliberately kept separate as a simpler, stable version. Make sure to refer to the provided tutorial code, and not the main collector code, during the tutorial.

Glossary

allocator: Code that allocates new objects into memory.

collector: Finds and frees memory occupied by 'dead' objects.

dead: An object that is not live.

GC work (unit), GC packet: A schedulable unit of collection work.

GC worker: A worker thread that performs garbage collection operations (as required by GC work units).

live: An object that is reachable, and thus can still be accessed by other objects, is live/alive.

mutator: Something that 'mutates', or changes, the objects stored in memory. This is the term that is traditionally used in the garbage collection literature to describe the running program (because it 'mutates' the object graph).

plan: A garbage collection algorithm expressed as a configuration of policies. See also Plans and policies below.

policy: A specific garbage collection algorithm, such as marksweep, copying, immix, etc. Plans are made up of an arrangement of one or more policies. See also Plans and policies below.

scheduler: Dynamically dispatches units of GC work to workers.

zeroing, zero initialization: Initializing and resetting unused memory bits to have a value of 0. Required by most memory-safe programming languages.

See also: Further Reading

Plans and Policies

In MMTk, collectors are instantiated as plans, which can be thought of as configurations of collector policies. In practice, most production collectors, and almost all collectors in MMTk, are composed of multiple algorithms/policies. For example, the gencopy plan describes a configuration that combines a copying nursery with a semispace mature space. In MMTk we think of these as three spaces, each of which happens to use the copyspace policy, and whose relationship is defined by the gencopy plan. Under the hood, gencopy builds upon a common plan which may also contain other policies, including a space for code, a read-only space, etc.
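
Concretely, a plan is just a Rust struct that composes policy instances. As a rough conceptual sketch only (the real gencopy definition in mmtk-core is more involved), it might look like:

// Conceptual sketch only -- simplified from the real mmtk-core definition.
pub struct GenCopy<VM: VMBinding> {
    pub nursery: CopySpace<VM>,    // copyspace policy, used as the nursery
    pub copyspace0: CopySpace<VM>, // copyspace policy, one half of the mature semispace
    pub copyspace1: CopySpace<VM>, // copyspace policy, the other half
    pub common: CommonPlan<VM>,    // further spaces: code space, read-only space, etc.
}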

Thus, someone wishing to construct a new collector based entirely on existing policies may be able to do so in MMTk by simply writing a new plan, which is what this tutorial covers.

On the other hand, someone wishing to introduce an entirely new garbage collection policy (such as Immix, for example), would need to first create a policy which specifies that algorithm, before creating a plan which defines how the GC algorithm fits together and utilizes that policy.

Set up MMTk and OpenJDK

This tutorial can be completed with any binding. However, for the sake of simplicity, only the setup for the OpenJDK binding will be described in detail here. If you would like to use another binding, you will need to follow the README files in their respective repositories (JikesRVM, V8) to set them up, and find appropriate benchmarks for testing. Also, while it may be useful to fork the relevant repositories to your own account, it is not required for this tutorial.

First, set up OpenJDK, MMTk, and the binding:

  1. Clone the OpenJDK binding and mmtk-core repository, and install any relevant dependencies by following the instructions in the OpenJDK binding repository.
  2. Ensure you can build OpenJDK according to the instructions in the READMEs of the mmtk-core repository and the OpenJDK binding repository.
    • Use the slowdebug option when building the OpenJDK binding. This is the fastest debug variant to build, and allows for easier debugging and better testing. The rest of the tutorial will assume you are using slowdebug.
    • You can use the env var MMTK_PLAN=[PlanName] to choose a plan to use at run-time. The plans that are relevant to this tutorial are NoGC and SemiSpace.
    • Make sure you only use the env var MMTK_PLAN=[PlanName] when you run the generated java binary (./build/linux-x86_64-normal-server-$DEBUG_LEVEL/jdk/bin/java). Do not set MMTK_PLAN when you build OpenJDK (if you already have set the env var MMTK_PLAN, you would need to do export MMTK_PLAN= or unset MMTK_PLAN to clear the env var before building).

The MMTk OpenJDK binding ships with a fixed version of mmtk-core, specified in mmtk-openjdk/mmtk/Cargo.toml. For local development, you need to build the binding against a local copy of the mmtk-core repo that you can modify, which means pointing the mmtk dependency to a local path.

  1. Find mmtk under [dependencies] in mmtk-openjdk/mmtk/Cargo.toml. It should point to the mmtk-core git path with a specific revision.
  2. Comment out the line for the git dependency, and uncomment the following line for a local dependency (see the sketch after this list).
  3. The local dependency points to mmtk-openjdk/repos/mmtk-core by default. If your local mmtk-core path is not mmtk-openjdk/repos/mmtk-core, modify the path to point to your local mmtk-core.
  4. Rebuild the OpenJDK binding.
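
For reference, the relevant part of mmtk-openjdk/mmtk/Cargo.toml will look something like the sketch below. The git URL and revision here are placeholders - keep whatever your copy contains, and only swap which line is commented out.

[dependencies]
# Step 2: comment out the git dependency...
# mmtk = { git = "https://github.com/mmtk/mmtk-core.git", rev = "<pinned-revision>" }
# ...and uncomment the local path dependency (adjust the path in step 3 if needed):
mmtk = { path = "repos/mmtk-core" }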

Test the build

A few benchmarks of varying size will be used throughout the tutorial. If you haven't already, set them up now. All of the following commands should be entered in repos/openjdk.

  1. HelloWorld (simplest, will never trigger GC):

    1. Copy the following code into a new Java file titled "HelloWorld.java" in mmtk-openjdk/repos/openjdk:
      class HelloWorld {
         public static void main(String[] args) {
            System.out.println("Hello World!");
         }
      }
      
    2. Use the command ./build/linux-x86_64-normal-server-$DEBUG_LEVEL/jdk/bin/javac HelloWorld.java.
    3. Then, run ./build/linux-x86_64-normal-server-$DEBUG_LEVEL/jdk/bin/java -XX:+UseThirdPartyHeap HelloWorld to run HelloWorld.
    4. If your program printed out Hello World! as expected, then congratulations, you have MMTk working with OpenJDK!
  2. The Computer Language Benchmarks Game fannkuchredux (micro benchmark, allocates a small amount of memory but - depending on heap size and the GC plan - may not trigger a collection):

    1. Copy this code into a new file named "fannkuchredux.java" in mmtk-openjdk/repos/openjdk.
    2. Use the command ./build/linux-x86_64-normal-server-$DEBUG_LEVEL/jdk/bin/javac fannkuchredux.java.
    3. Then, run ./build/linux-x86_64-normal-server-$DEBUG_LEVEL/jdk/bin/java -XX:+UseThirdPartyHeap fannkuchredux to run fannkuchredux.
  3. DaCapo benchmark suite (most complex, will likely trigger multiple collections):

    1. Fetch using wget https://sourceforge.net/projects/dacapobench/files/9.12-bach-MR1/dacapo-9.12-MR1-bach.jar/download -O ./dacapo-9.12-MR1-bach.jar.
    2. DaCapo contains a variety of benchmarks, but this tutorial will only be using lusearch. Run the lusearch benchmark using the command ./build/linux-x86_64-normal-server-$DEBUG_LEVEL/jdk/bin/java -XX:+UseThirdPartyHeap -Xms512M -Xmx512M -jar ./dacapo-9.12-MR1-bach.jar lusearch in repos/openjdk.

Rust Logs

By using one of the debug builds, you gain access to the Rust logs - a useful tool when testing a plan and observing the general behaviour of MMTk. There are two log levels that are useful when working with MMTk - debug and trace. Generally, debug logs information about the slow paths (allocation through MMTk, rather than fast-path allocation through the binding). trace includes all the information from debug, plus more information about both slow and fast paths and about garbage collection activities. You can set which level to view the logs at by setting the environment variable RUST_LOG. For more information, see the env_logger crate documentation.
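
For example, to see debug-level logs while running HelloWorld with the build from the previous section:

RUST_LOG=debug ./build/linux-x86_64-normal-server-$DEBUG_LEVEL/jdk/bin/java -XX:+UseThirdPartyHeap HelloWorld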

Working with different GC plans

You will be using multiple GC plans in this tutorial. You should familiarise yourself with how to do this now.

  1. The OpenJDK build will always be generated in mmtk-openjdk/repos/openjdk/build. From the same build, you can run different GC plans by using the environment variable MMTK_PLAN=[PlanName]. Generally you won't need multiple VM builds. However, if you do need to keep a build (for instance, to make quick performance comparisons), you can rename either the build folder or the folder generated within it (eg linux-x86_64-normal-server-$DEBUG_LEVEL).
    1. Renaming the build folder is the safest method for this.
    2. If you rename the internal folder, there is a possibility that the next build will be generated incorrectly. If a build appears to complete suspiciously quickly, it was probably generated badly.
    3. A renamed build folder can be tested by changing the file path in commands as appropriate.
    4. If you plan to completely overwrite a build, deleting the folder you are writing over will help prevent errors.
  2. Try running your build with NoGC. Both HelloWorld and the fannkuchredux benchmark should run without issue. If you then run lusearch, it should fail when a collection is triggered. It is possible to increase the heap size enough that no collections will be triggered, but it is okay to let it fail for now. When we build using a proper GC, it will be able to pass. The messages and errors produced should look identical or nearly identical to the log below.
    $ MMTK_PLAN=NoGC ./build/linux-x86_64-normal-server-$DEBUG_LEVEL/jdk/bin/java -XX:+UseThirdPartyHeap -Xms512M -Xmx512M -jar ./dacapo-9.12-MR1-bach.jar lusearch
    Using scaled threading model. 24 processors detected, 24 threads used to drive the workload, in a possible range of [1,64]
    Warning: User attempted a collection request, but it is not supported in NoGC. The request is ignored.
    ===== DaCapo 9.12-MR1 lusearch starting =====
    [2020-12-18T00:27:49Z INFO  mmtk::plan::global]   [POLL] nogc_space: Triggering collection
    [2020-12-18T00:27:49Z INFO  mmtk::plan::global]   [POLL] nogc_space: Triggering collection
    thread '<unnamed>' panicked at 'internal error: entered unreachable code: GC triggered in nogc', /opt/rust/toolchains/nightly-2020-07-08-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/src/libstd/macros.rs:16:9
    note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
    [2020-12-18T00:27:49Z INFO  mmtk::plan::global]   [POLL] nogc_space: Triggering collection
    [2020-12-18T00:27:49Z INFO  mmtk::plan::global]   [POLL] nogc_space: Triggering collection
    [2020-12-18T00:27:49Z INFO  mmtk::plan::global]   [POLL] nogc_space: Triggering collection
    [2020-12-18T00:27:49Z INFO  mmtk::plan::global]   [POLL] nogc_space: Triggering collection
    [2020-12-18T00:27:49Z INFO  mmtk::plan::global]   [POLL] nogc_space: Triggering collection
    [2020-12-18T00:27:49Z INFO  mmtk::plan::global]   [POLL] nogc_space: Triggering collection
    [2020-12-18T00:27:49Z INFO  mmtk::plan::global]   [POLL] nogc_space: Triggering collection
    [2020-12-18T00:27:49Z INFO  mmtk::plan::global]   [POLL] nogc_space: Triggering collection
    [2020-12-18T00:27:49Z INFO  mmtk::plan::global]   [POLL] nogc_space: Triggering collection
    [2020-12-18T00:27:49Z INFO  mmtk::plan::global]   [POLL] nogc_space: Triggering collection
    [2020-12-18T00:27:49Z INFO  mmtk::plan::global]   [POLL] nogc_space: Triggering collection
    [2020-12-18T00:27:49Z INFO  mmtk::plan::global]   [POLL] nogc_space: Triggering collection
    [2020-12-18T00:27:49Z INFO  mmtk::plan::global]   [POLL] nogc_space: Triggering collection
    [2020-12-18T00:27:49Z INFO  mmtk::plan::global]   [POLL] nogc_space: Triggering collection
    [2020-12-18T00:27:49Z INFO  mmtk::plan::global]   [POLL] nogc_space: Triggering collection
    [2020-12-18T00:27:49Z INFO  mmtk::plan::global]   [POLL] nogc_space: Triggering collection
    [2020-12-18T00:27:49Z INFO  mmtk::plan::global]   [POLL] nogc_space: Triggering collection
    [2020-12-18T00:27:49Z INFO  mmtk::plan::global]   [POLL] nogc_space: Triggering collection
    [2020-12-18T00:27:49Z INFO  mmtk::plan::global]   [POLL] nogc_space: Triggering collection
    [2020-12-18T00:27:49Z INFO  mmtk::plan::global]   [POLL] nogc_space: Triggering collection
    [2020-12-18T00:27:49Z INFO  mmtk::plan::global]   [POLL] nogc_space: Triggering collection
    [2020-12-18T00:27:49Z INFO  mmtk::plan::global]   [POLL] nogc_space: Triggering collection
    fatal runtime error: failed to initiate panic, error 5
    Aborted (core dumped)
    
  3. Try running your build with SemiSpace. lusearch should now pass, as garbage will be collected, and the smaller benchmarks should run the same as they did while using NoGC.
    MMTK_PLAN=SemiSpace ./build/linux-x86_64-normal-server-$DEBUG_LEVEL/jdk/bin/java -XX:+UseThirdPartyHeap -Xms512M -Xmx512M -jar ./dacapo-9.12-MR1-bach.jar lusearch
    

Create MyGC

NoGC is a GC plan that only allocates memory, and does not have a collector. We're going to use it as a base for building a new garbage collector.

Recall that this tutorial will take you through the steps of building a collector from basic principles. To do that, you'll create your own plan called MyGC which you'll gradually refine and improve upon through the course of this tutorial. At the beginning MyGC will resemble the very simple NoGC plan.

  1. Each plan is stored in mmtk-openjdk/repos/mmtk-core/src/plan. Navigate there and create a copy of the folder nogc. Rename it to mygc.
  2. In each file within mygc, rename any reference to nogc to mygc. You will also have to separately rename any reference to NoGC to MyGC.
    • For example, in Visual Studio Code, you can (making sure case sensitivity is selected in the search function) select one instance of nogc and either right click and select "Change all instances" or use the CTRL-F2 shortcut, and then type mygc, and repeat for NoGC.
  3. In order to use MyGC, you will need to make some changes to the following files.
    1. mmtk-core/src/plan/mod.rs, add:
      pub mod mygc;
      This adds mygc as a module.
    2. mmtk-core/src/util/options.rs, add MyGC to the enum PlanSelector. This allows MMTk to accept MyGC as a command line option for plan, or an environment variable for MMTK_PLAN:
      #[derive(Copy, Clone, EnumFromStr, Debug)]
      pub enum PlanSelector {
          NoGC,
          SemiSpace,
          GenCopy,
          GenImmix,
          MarkSweep,
          PageProtect,
          Immix,
          // Add this!
          MyGC,
      }
    3. mmtk-core/src/plan/global.rs, add new expressions to create_mutator() and create_plan() for MyGC, following the pattern of the existing plans. These define the location of the mutator and plan's constructors.
      // NOTE: Sections of this code snippet not relevant to this step of the 
      // tutorial (marked by "// ...") have been omitted.
      pub fn create_mutator<VM: VMBinding>(
          tls: VMMutatorThread,
          mmtk: &'static MMTK<VM>,
      ) -> Box<Mutator<VM>> {
          Box::new(match mmtk.options.plan {
              PlanSelector::NoGC => crate::plan::nogc::mutator::create_nogc_mutator(tls, &*mmtk.plan),
              PlanSelector::SemiSpace => {
                  crate::plan::semispace::mutator::create_ss_mutator(tls, &*mmtk.plan)
              }
      
              // ...
      
              // Create MyGC mutator based on selector
        PlanSelector::MyGC => crate::plan::mygc::mutator::create_mygc_mutator(tls, &*mmtk.plan),
    })
      }
      
pub fn create_plan<VM: VMBinding>(
    plan: PlanSelector,
    args: CreateGeneralPlanArgs<VM>,
) -> Box<dyn Plan<VM = VM>> {
          match plan {
              PlanSelector::NoGC => Box::new(crate::plan::nogc::NoGC::new(args)),
              PlanSelector::SemiSpace => Box::new(crate::plan::semispace::SemiSpace::new(args)),
      
              // ...
      
              // Create MyGC plan based on selector
              PlanSelector::MyGC => Box::new(crate::plan::mygc::MyGC::new(args))
          }
      }       

Note that all of the above changes almost exactly copy the NoGC entries in each of these files. However, NoGC has some variants, such as a lock-free variant. For simplicity, those are not needed for this tutorial. Remove references to them in the MyGC plan now.

  1. Within mygc/global.rs, find any use of #[cfg(feature = "mygc_lock_free")] and delete both it and the line below it.
  2. Then, delete any use of the above line's negation, #[cfg(not(feature = "mygc_lock_free"))], this time without changing the line below it (see the sketch below).
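
For instance, the feature-gated imports in your copied mygc/global.rs will look something like the sketch below (the exact items guarded by the feature may differ in your copy):

// Delete both of these lines:
#[cfg(feature = "mygc_lock_free")]
use crate::policy::lockfreeimmortalspace::LockFreeImmortalSpace;

// Here, delete only the attribute, keeping the line below it:
#[cfg(not(feature = "mygc_lock_free"))]
use crate::policy::immortalspace::ImmortalSpace;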

After you rebuild OpenJDK (and mmtk-core), you can run MyGC with your new build (MMTK_PLAN=MyGC). Try testing it with each of the three benchmarks. It should behave identically to NoGC.

If you've got to this point, then congratulations! You have created your first working MMTk collector!

At this point, you should familiarise yourself with the MyGC plan if you haven't already. Try answering the following questions by looking at the code and Further Reading:

  • Where is the allocator defined?
  • How many memory spaces are there?
  • What kind of memory space policy is used?
  • What happens if garbage has to be collected?

Building a semispace collector

In a semispace collector, the heap is divided into two equally-sized spaces, called 'semispaces'. One of these is defined as a 'fromspace', and the other a 'tospace'. The allocator allocates to the tospace until it is full.

When the tospace is full, a stop-the-world GC is triggered. The mutator is paused, and the definitions of the spaces are flipped (the 'tospace' becomes a 'fromspace', and vice versa). Then, the collector scans each object in what is now the fromspace. If a live object is found, a copy of it is made in the tospace. That is to say, live objects are copied from the fromspace to the tospace. After every object is scanned, the fromspace is cleared. The GC finishes, and the mutator is resumed.
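
The following pseudocode sketches the algorithm. None of these names are MMTk APIs - this is purely illustrative.

// Illustrative pseudocode only -- not MMTk API.
fn semispace_collect(heap: &mut Heap) {
    stop_all_mutators();
    heap.flip(); // swap the roles of the tospace and the fromspace
    for root in heap.roots() {
        // Copy each reachable object into the new tospace, transitively,
        // leaving forwarding information so each object is copied only once.
        copy_transitive_closure(root, heap);
    }
    heap.fromspace().release(); // reclaim the old semispace wholesale
    resume_mutators();
}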

Allocation: Add copyspaces

We will now change your MyGC plan from one that cannot collect garbage into one that implements the semispace algorithm. The first step is to add the two copyspaces and to let mutators allocate into them. This involves adding the two copyspaces, the code to properly initialise and prepare the new spaces, and a copy context.

Change the plan constraints

Firstly, change the plan constraints. Some of these constraints are not used at the moment, but it's good to set them properly regardless.

Look in plan/plan_constraints.rs. PlanConstraints lists all the possible options for plan-specific constraints. At the moment, MYGC_CONSTRAINTS in mygc/global.rs should be using the default value for PlanConstraints. We will make the following changes:

  1. Initialize moves_objects to true, since a semispace collector moves objects during collection.

Finished code:

pub const MYGC_CONSTRAINTS: PlanConstraints = PlanConstraints {
    moves_objects: true,
    ..PlanConstraints::default()
};

Change the plan implementation

Next, in mygc/global.rs, replace the old immortal (nogc) space with two copyspaces.

Imports

To the import statement block:

  1. Replace crate::plan::global::{BasePlan, NoCopy}; with use crate::plan::global::BasePlan;. This collector is going to use copying, so there's no longer any point in importing NoCopy.
  2. Add use crate::plan::global::CommonPlan;. Semispace uses the common plan, which includes an immortal space and a large object space, rather than the base plan. Any garbage collected plan should use CommonPlan.
  3. Add use std::sync::atomic::{AtomicBool, Ordering};. These are going to be used to store an indicator of which copyspace is the tospace.
  4. Delete #[allow(unused_imports)].

Finished code (step 1):

use crate::plan::global::BasePlan; //Modify
use crate::plan::global::CommonPlan; // Add
use crate::plan::global::{CreateGeneralPlanArgs, CreateSpecificPlanArgs};
use crate::plan::mygc::mutator::ALLOCATOR_MAPPING;
use crate::plan::mygc::gc_work::MyGCWorkContext;
use crate::plan::AllocationSemantics;
use crate::plan::Plan;
use crate::plan::PlanConstraints;
use crate::policy::copyspace::CopySpace; // Add
use crate::policy::space::Space;
use crate::scheduler::*; // Modify
use crate::util::alloc::allocators::AllocatorSelector;
use crate::util::copy::*;
use crate::util::heap::VMRequest;
use crate::util::heap::gc_trigger::SpaceStats;
use crate::util::metadata::side_metadata::SideMetadataContext;
use crate::util::opaque_pointer::*;
use crate::vm::VMBinding;
use enum_map::EnumMap;
use std::sync::atomic::{AtomicBool, Ordering}; // Add

Struct MyGC

Change pub struct MyGC<VM: VMBinding> to add new instance variables.

  1. Delete the existing fields of the struct.
  2. Add pub hi: AtomicBool,. This is a thread-safe bool, indicating which copyspace is the tospace.
  3. Add pub copyspace0: CopySpace<VM>, and pub copyspace1: CopySpace<VM>,. These are the two copyspaces.
  4. Add pub common: CommonPlan<VM>,. This holds an instance of the common plan.

Finished code (step 2):

#[derive(HasSpaces, PlanTraceObject)]
pub struct MyGC<VM: VMBinding> {
    pub hi: AtomicBool,
    #[space]
    #[copy_semantics(CopySemantics::DefaultCopy)]
    pub copyspace0: CopySpace<VM>,
    #[space]
    #[copy_semantics(CopySemantics::DefaultCopy)]
    pub copyspace1: CopySpace<VM>,
    #[parent]
    pub common: CommonPlan<VM>,
}

Note that MyGC now also derives PlanTraceObject besides HasSpaces, and we have attributes on some fields. These attributes tell MMTk's macros how to generate code to visit each space of this plan and to trace objects in this plan. Although there are other ways to implement object tracing, in this tutorial we use the macros, as they are the simplest. Make sure you import the macros. We will discuss what these attributes mean in later sections.

use mmtk_macros::{HasSpaces, PlanTraceObject};

Implement the Plan trait for MyGC

Constructor

Change fn new(). This section initialises and prepares the objects in MyGC that you just defined.

  1. Delete the definition of mygc_space. Instead, we will define the two copyspaces here.
  2. Define one of the copyspaces by adding the following code:
            copyspace0: CopySpace::new(plan_args.get_space_args("copyspace0", true, VMRequest::discontiguous()), false),
  3. Create another copyspace, called copyspace1, defining it as a fromspace instead of a tospace. (Hint: the definitions for copyspaces are in src/policy/copyspace.rs.)
  4. Finally, replace the old MyGC initializer with the following:
    fn new(args: CreateGeneralPlanArgs<VM>) -> Self {
        // Modify
        let mut plan_args = CreateSpecificPlanArgs {
            global_args: args,
            constraints: &MYGC_CONSTRAINTS,
            global_side_metadata_specs: SideMetadataContext::new_global_specs(&[]),
        };

        let res = MyGC {
            hi: AtomicBool::new(false),
            copyspace0: CopySpace::new(plan_args.get_space_args("copyspace0", true, VMRequest::discontiguous()), false),
            copyspace1: CopySpace::new(plan_args.get_space_args("copyspace1", true, VMRequest::discontiguous()), true),
            common: CommonPlan::new(plan_args),
        };

        res.verify_side_metadata_sanity();

        res
    }

Access MyGC spaces

Add a new section of methods for MyGC:

impl<VM: VMBinding> MyGC<VM> {
}

To this, add two helper methods, tospace(&self) and fromspace(&self). Both have the return type &CopySpace<VM>; tospace() returns a reference to the tospace, and fromspace() returns a reference to the fromspace.

We also add another two helper methods to get tospace_mut(&mut self) and fromspace_mut(&mut self). Those will be used later when we implement collection for our GC plan.

    pub fn tospace(&self) -> &CopySpace<VM> {
        if self.hi.load(Ordering::SeqCst) {
            &self.copyspace1
        } else {
            &self.copyspace0
        }
    }

    pub fn fromspace(&self) -> &CopySpace<VM> {
        if self.hi.load(Ordering::SeqCst) {
            &self.copyspace0
        } else {
            &self.copyspace1
        }
    }

    pub fn tospace_mut(&mut self) -> &mut CopySpace<VM> {
        if self.hi.load(Ordering::SeqCst) {
            &mut self.copyspace1
        } else {
            &mut self.copyspace0
        }
    }

    pub fn fromspace_mut(&mut self) -> &mut CopySpace<VM> {
        if self.hi.load(Ordering::SeqCst) {
            &mut self.copyspace0
        } else {
            &mut self.copyspace1
        }
    }

Other methods in the Plan trait

The trait Plan requires a common() method that should return a reference to the common plan. Implement this method now.

    fn common(&self) -> &CommonPlan<VM> {
        &self.common
    }

Find the helper method base and change it so that it calls the base plan through the common plan.

    fn base(&self) -> &BasePlan<VM> {
        &self.common.base
    }

    fn base_mut(&mut self) -> &mut BasePlan<Self::VM> {
        &mut self.common.base
    }

The trait Plan requires a collection_required() method to decide when we should trigger a collection. We can simply use the implementation in BasePlan.

    fn collection_required(&self, space_full: bool, _space: Option<SpaceStats<Self::VM>>) -> bool {
        self.base().collection_required(self, space_full)
    }

Find the method get_used_pages. Replace the current body with self.tospace().reserved_pages() + self.common.get_used_pages(), to correctly count the pages contained in the tospace and the common plan spaces (which will be explained later).

    fn get_used_pages(&self) -> usize {
        self.tospace().reserved_pages() + self.common.get_used_pages()
    }

Add and override the following helper function:

    fn get_collection_reserved_pages(&self) -> usize {
        self.tospace().reserved_pages()
    }

Change the mutator definition

Next, we need to change the mutator, in mutator.rs, to allocate to the tospace, and to the two spaces controlled by the common plan.

Imports

Change the following import statements:

  1. Add use super::MyGC;.
  2. Add use crate::util::alloc::BumpAllocator;.
  3. Delete use crate::plan::mygc::MyGC;.
use super::MyGC; // Add
use crate::MMTK;
use crate::plan::barriers::NoBarrier;
use crate::plan::mutator_context::Mutator;
use crate::plan::mutator_context::MutatorConfig;
use crate::plan::AllocationSemantics;
use crate::util::alloc::allocators::{AllocatorSelector, Allocators};
use crate::util::alloc::BumpAllocator;
use crate::util::opaque_pointer::*;
use crate::vm::VMBinding;
use crate::plan::mutator_context::{
    create_allocator_mapping, create_space_mapping, ReservedAllocators,
};
use enum_map::EnumMap;
// Remove crate::plan::mygc::MyGC
// Remove mygc_mutator_noop

Allocator mapping

In lazy_static!, make the following changes to ALLOCATOR_MAPPING, which maps the required allocation semantics to the corresponding allocators. For example, for Default, we allocate using the first bump pointer allocator (BumpPointer(0)):

  1. Define a ReservedAllocators instance to declare that we need one bump allocator.
  2. Map the common plan allocators using create_allocator_mapping.
  3. Map Default to BumpPointer(0).
const RESERVED_ALLOCATORS: ReservedAllocators = ReservedAllocators {
    n_bump_pointer: 1,
    ..ReservedAllocators::DEFAULT
};

lazy_static! {
    pub static ref ALLOCATOR_MAPPING: EnumMap<AllocationSemantics, AllocatorSelector> = {
        let mut map = create_allocator_mapping(RESERVED_ALLOCATORS, true);
        map[AllocationSemantics::Default] = AllocatorSelector::BumpPointer(0);
        map
    };
}

Space mapping

Next, in create_mygc_mutator, change the space each allocator is bound to in space_mapping. Note that the space mapping is formatted as a list of tuples. For example, the first bump pointer allocator (BumpPointer(0)) is bound with the tospace.

Downcast the dynamic Plan type to MyGC so we can access specific spaces in MyGC.

    let mygc = mmtk.get_plan().downcast_ref::<MyGC<VM>>().unwrap();

Then, use mygc to access the spaces in MyGC.

  1. BumpPointer(0) should map to the tospace.
  2. Other common plan allocators should be mapped using create_space_mapping.
  3. None of the above needs an extra & prefix - the helper methods already return references.
        space_mapping: Box::new({
            let mut vec = create_space_mapping(RESERVED_ALLOCATORS, true, mygc);
            vec.push((AllocatorSelector::BumpPointer(0), mygc.tospace()));
            vec
        }),

The create_space_mapping and create_allocator_mapping calls that appeared in the past two steps are part of the MMTk common plan itself. They are used to construct allocator-space mappings for the spaces defined by the common plan:

  1. The immortal space is used for objects that the virtual machine or a library never expects to die.
  2. The large object space is needed because MMTk handles particularly large objects differently to normal objects, as the space overhead of copying large objects is very high. Instead, this space is used by a free list allocator in the common plan to avoid having to copy them.
  3. The read-only space is used to store all the immutable objects.
  4. The code spaces are used for VM generated code objects.

With this, you should have allocation working, but not garbage collection. Try building again. If you run HelloWorld or fannkuchredux, they should work. DaCapo's lusearch should fail, as it requires garbage to be collected.
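
For example, after rebuilding, HelloWorld should run successfully with your new plan:

MMTK_PLAN=MyGC ./build/linux-x86_64-normal-server-$DEBUG_LEVEL/jdk/bin/java -XX:+UseThirdPartyHeap HelloWorld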

Collection: Implement garbage collection

We need to add a few more things to get garbage collection working. Specifically, we need to configure the GCWorkerCopyContext, which a GC worker uses for copying objects, and the GC work packets that will be scheduled for a collection.

CopyConfig

CopyConfig defines how a GC plan copies objects. Similar to the MutatorConfig struct, you would need to define CopyConfig for your plan.

In impl<VM: VMBinding> Plan for MyGC<VM>, override the method create_copy_config(). The default implementation provides a default CopyConfig for non-copying plans. So for non-copying plans, you do not need to override the method. But for copying plans, you would have to provide a proper copy configuration.

In a semispace GC, objects will be copied between the two copy spaces. We will use one CopySpaceCopyContext for the copying, and will rebind the copy context to the proper tospace in the preparation step of a GC (which will be discussed later when we talk about preparing for collections).

We use CopySemantics::DefaultCopy for our copy operation, and bind it with the first CopySpaceCopyContext (CopySemantics::DefaultCopy => CopySelector::CopySpace(0)). Other copy semantics are unused in this plan. We also provide an initial space binding for CopySpaceCopyContext. However, we will flip tospace in every GC, and rebind the copy context to the new tospace in each GC, so it does not matter which space we use as the initial space here.

    fn create_copy_config(&'static self) -> CopyConfig<Self::VM> {
        use enum_map::enum_map;
        CopyConfig {
            copy_mapping: enum_map! {
                CopySemantics::DefaultCopy => CopySelector::CopySpace(0),
                _ => CopySelector::Unused,
            },
            space_mapping: vec![
                // The tospace argument doesn't matter, we will rebind before a GC anyway.
                (CopySelector::CopySpace(0), &self.copyspace0)
            ],
            constraints: &MYGC_CONSTRAINTS,
        }
    }

Introduce collection to MyGC plan

Add a new method to Plan for MyGC, schedule_collection(). This function runs when a collection is triggered. It schedules GC work for the plan, i.e., it stops all mutators, runs the scheduler's prepare stage, and resumes the mutators. The StopMutators work will invoke code from the bindings to scan threads and other roots, and that scanning work will push further work packets to compute the transitive closure.

Though you can add those work packets by yourself, GCWorkScheduler provides a method schedule_common_work() that will add common work packets for you.

To use schedule_common_work(), first we need to create a type MyGCWorkContext and implement the trait GCWorkContext for it. We create gc_work.rs and add the following implementation. Note that we will use the default SFTProcessEdges, which is a general work packet that a plan can use to trace objects. For plans like semispace, SFTProcessEdges is sufficient. For more complex GC plans, one can write their own work packet that implements the ProcessEdgesWork trait. We will discuss this, and the alternatives, later.

pub struct MyGCWorkContext<VM: VMBinding>(std::marker::PhantomData<VM>);
impl<VM: VMBinding> crate::scheduler::GCWorkContext for MyGCWorkContext<VM> {
    type VM = VM;
    type PlanType = MyGC<VM>;
    type DefaultProcessEdges = SFTProcessEdges<Self::VM>;
    type PinningProcessEdges = UnsupportedProcessEdges<Self::VM>;
}

Then we implement schedule_collection() using MyGCWorkContext and schedule_common_work().

    fn schedule_collection(&'static self, scheduler: &GCWorkScheduler<VM>) {
        scheduler.schedule_common_work::<MyGCWorkContext<VM>>(self);
    }

Delete handle_user_collection_request(). This function overrode a common plan function so that user-requested collections were ignored in NoGC. Now we remove it and allow user-requested collections.
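
For reference, the override you are deleting looks something like this (the exact signature and message may differ in your copy; the earlier rename step will have changed NoGC to MyGC in the string):

// Delete this override: the default implementation in the Plan trait
// will then honour user-triggered collection requests.
fn handle_user_collection_request(&self, _tls: VMMutatorThread, _force: bool) {
    println!("Warning: User attempted a collection request, but it is not supported in MyGC. The request is ignored.");
}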

Prepare for collection

The collector has a number of steps it needs to perform before each collection. We'll add these now.

Prepare plan

In mygc/global.rs, find the method prepare. Delete the unreachable!() call, and add the following code:

    fn prepare(&mut self, tls: VMWorkerThread) {
        self.common.prepare(tls, true);

        self.hi
            .store(!self.hi.load(Ordering::SeqCst), Ordering::SeqCst);
        // Flips 'hi' to flip space definitions
        let hi = self.hi.load(Ordering::SeqCst);
        self.copyspace0.prepare(hi);
        self.copyspace1.prepare(!hi);

        self.fromspace_mut()
            .set_copy_for_sft_trace(Some(CopySemantics::DefaultCopy));
        self.tospace_mut().set_copy_for_sft_trace(None);
    }

This function is called at the start of a collection. It prepares the two spaces in the common plan, flips the definitions for which space is 'to' and which is 'from', then prepares the copyspaces with the new definition.

Note that we call set_copy_for_sft_trace() for both spaces. This step is required when using SFTProcessEdges to tell the spaces which copy semantic to use for copying. For fromspace, we use the DefaultCopy semantic, which we have defined earlier in our CopyConfig. So for objects in fromspace that need to be copied, the policy will use the copy context that binds with DefaultCopy (which allocates to the tospace) in the GC worker. For tospace, we set its copy semantics to None, as we do not expect to copy objects from tospace, and if that ever happens, we will simply panic.

Prepare worker

As we flip tospace for the plan, we also need to rebind the copy context to the new tospace. We will override prepare_worker() in our Plan implementation. Plan.prepare_worker() is executed by each GC worker in the preparation phase of a GC. The code is straightforward -- we get the first CopySpaceCopyContext, and call rebind() on it with the new tospace.

    fn prepare_worker(&self, worker: &mut GCWorker<VM>) {
        unsafe { worker.get_copy_context_mut().copy[0].assume_init_mut() }.rebind(self.tospace());
    }

Prepare mutator

Going back to mutator.rs, create a new function called mygc_mutator_prepare<VM: VMBinding>(_mutator: &mut Mutator<VM>, _tls: VMWorkerThread). This function will be called at the preparation stage of a collection (at the start of a collection) for each mutator. Its body can stay empty, as there aren't any preparation steps for the mutator in this GC. In create_mygc_mutator(), find the field prepare_func and change it from &unreachable_prepare_func to &mygc_mutator_prepare.
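
A minimal sketch of the empty prepare hook:

pub fn mygc_mutator_prepare<VM: VMBinding>(
    _mutator: &mut Mutator<VM>,
    _tls: VMWorkerThread,
) {
    // Nothing to do: MyGC rebinds the mutator's allocator in the release
    // phase instead (see below).
}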

💡 Hint: If your plan does nothing when preparing mutators, there is an optimization you can do. You may set the plan constraints field PlanConstraints::needs_prepare_mutator to false so that the PrepareMutator work packets which call prepare_func will not be created in the first place. This optimization is helpful for VMs that run with a large number of mutator threads. If you do this optimization, you may also leave the MutatorConfig::prepare_func field as &unreachable_prepare_func to indicate it should not be called.
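
If you take up this hint, the change is a one-line addition to the constraints defined earlier, sketched below. Leave it out if you keep the prepare work packets.

pub const MYGC_CONSTRAINTS: PlanConstraints = PlanConstraints {
    moves_objects: true,
    needs_prepare_mutator: false, // skip creating PrepareMutator work packets
    ..PlanConstraints::default()
};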

Release

Finally, we need to fill out the functions that are, roughly speaking, run after each collection.

Release in plan

Find the method release() in mygc/global.rs. Replace the unreachable!() call with the following code.

    fn release(&mut self, tls: VMWorkerThread) {
        self.common.release(tls, true);
        self.fromspace().release();
    }

This function is called at the end of a collection. It calls the release routines for the common plan spaces and the fromspace.

Release in mutator

Go back to mutator.rs. Create a new function called mygc_mutator_release() that takes the same inputs as the mygc_mutator_prepare() function above.

pub fn mygc_mutator_release<VM: VMBinding>(
    mutator: &mut Mutator<VM>,
    _tls: VMWorkerThread,
) {
    // rebind the allocation bump pointer to the appropriate semispace
    let bump_allocator = unsafe {
        mutator
            .allocators
            .get_allocator_mut(mutator.config.allocator_mapping[AllocationSemantics::Default])
    }
    .downcast_mut::<BumpAllocator<VM>>()
    .unwrap();
    bump_allocator.rebind(
        mutator
            .plan
            .downcast_ref::<MyGC<VM>>()
            .unwrap()
            .tospace(),
    );
}

Then go to create_mygc_mutator(), replace &unreachable_release_func in the release_func field with &mygc_mutator_release. This function will be called at the release stage of a collection (at the end of a collection) for each mutator. It rebinds the allocator for the Default allocation semantics to the new tospace. When the mutator threads resume, any new allocations for Default will then go to the new tospace.
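
Putting the two hooks together, the relevant fields of the MutatorConfig built in create_mygc_mutator() should now look roughly like this (other fields omitted):

    config: MutatorConfig {
        allocator_mapping: &ALLOCATOR_MAPPING,
        // space_mapping as defined earlier ...
        prepare_func: &mygc_mutator_prepare,
        release_func: &mygc_mutator_release,
    },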

ProcessEdgesWork for MyGC

ProcessEdgesWork is the key work packet for tracing objects in a GC. A ProcessEdgesWork implementation defines how to trace objects, and how to generate more work packets based on the current tracing to finish the object closure.

GCWorkContext specifies a type that implements ProcessEdgesWork, and we used SFTProcessEdges earlier. In this section, we discuss what SFTProcessEdges does, and what the alternatives are.

Approach 1: Use SFTProcessEdges

SFTProcessEdges dispatches the tracing of objects to their respective spaces through the Space Function Table (SFT). As long as all the policies in a plan provide an implementation of sft_trace_object() in their SFT implementations, the plan can use SFTProcessEdges. Currently most policies provide an implementation for sft_trace_object(), except mark compact and immix. Those two policies use multiple GC traces, and due to the limitations of SFT, a policy cannot provide more than one sft_trace_object().

SFTProcessEdges is the simplest approach when all the policies support it. Fortunately, we can use it for our GC, semispace.

Approach 2: Derive PlanTraceObject and use PlanProcessEdges

PlanProcessEdges is another general ProcessEdgesWork implementation that can be used by most plans. When a plan implements the PlanTraceObject trait, it can use PlanProcessEdges.

You can manually provide an implementation of PlanTraceObject for MyGC. But you can also use the derive macro MMTK provides, and the macro will generate an implementation of PlanTraceObject:

  • Make sure MyGC already has the #[derive(HasSpaces)] attribute because all plans need to implement the HasSpaces trait anyway. (import the macro properly: use mmtk_macros::HasSpaces)
  • Add #[derive(PlanTraceObject)] for MyGC (import the macro properly: use mmtk_macros::PlanTraceObject)
  • Add both #[space] and #[copy_semantics(CopySemantics::DefaultCopy)] to both copyspace fields, copyspace0 and copyspace1. #[space] tells the macro that both copyspace0 and copyspace1 are spaces in the MyGC plan, and the generated trace code will check both spaces. #[copy_semantics(CopySemantics::DefaultCopy)] specifies the copy semantics to use when tracing objects in the corresponding space.
  • Add #[parent] to common. This tells the macro that there are more spaces defined in common and its nested structs. If an object is not found in any space with #[space] in this plan, the trace code will try to find the space for the object in the 'parent' plan. In our case, the trace code will proceed by checking spaces in the CommonPlan, as the object may be in large object space or immortal space in the common plan. CommonPlan also implements PlanTraceObject, so it knows how to find a space for the object and trace it in the same way.

With the derive macro, your MyGC struct should look like this:

#[derive(HasSpaces, PlanTraceObject)]
pub struct MyGC<VM: VMBinding> {
    pub hi: AtomicBool,
    #[space]
    #[copy_semantics(CopySemantics::DefaultCopy)]
    pub copyspace0: CopySpace<VM>,
    #[space]
    #[copy_semantics(CopySemantics::DefaultCopy)]
    pub copyspace1: CopySpace<VM>,
    #[parent]
    pub common: CommonPlan<VM>,
}

Once this is done, you can specify PlanProcessEdges as the DefaultProcessEdges in your GC work context:

use crate::policy::gc_work::DEFAULT_TRACE;
use crate::scheduler::gc_work::PlanProcessEdges;
pub struct MyGCWorkContext2<VM: VMBinding>(std::marker::PhantomData<VM>);
impl<VM: VMBinding> crate::scheduler::GCWorkContext for MyGCWorkContext2<VM> {
    type VM = VM;
    type PlanType = MyGC<VM>;
    type DefaultProcessEdges = PlanProcessEdges<Self::VM, MyGC<VM>, DEFAULT_TRACE>;
    type PinningProcessEdges = UnsupportedProcessEdges<Self::VM>;
}

Approach 3: Implement your own ProcessEdgesWork

Apart from the two approaches above, you can always implement your own ProcessEdgesWork. This is overkill for simple plans like semispace, but it might be necessary for more complex plans. We discuss how to implement it for MyGC.

Create a struct MyGCProcessEdges<VM: VMBinding> in the gc_work module. It includes a reference back to the plan, and a ProcessEdgesBase field:

pub struct MyGCProcessEdges<VM: VMBinding> {
    plan: &'static MyGC<VM>,
    base: ProcessEdgesBase<VM>,
}

Implement ProcessEdgesWork for MyGCProcessEdges. As most methods in the trait have a default implementation, we only need to implement new() and trace_object() for our plan. However, this may not be true when you implement ProcessEdgesWork for other GC plans; it is worth checking the default implementations in the trait.

For trace_object(), what we do is similar to the approach above, except that we write the code ourselves rather than letting the macro generate it for us. We try to figure out which space the object is in, and invoke trace_object() for the object on that space. If the object is not in either of the semispaces in the plan, we forward the call to CommonPlan.

impl<VM: VMBinding> ProcessEdgesWork for MyGCProcessEdges<VM> {
    type VM = VM;
    type ScanObjectsWorkType = ScanObjects<Self>;

    fn new(
        edges: Vec<EdgeOf<Self>>,
        roots: bool,
        mmtk: &'static MMTK<VM>,
        bucket: WorkBucketStage,
    ) -> Self {
        let base = ProcessEdgesBase::new(edges, roots, mmtk, bucket);
        let plan = base.plan().downcast_ref::<MyGC<VM>>().unwrap();
        Self { base, plan }
    }

    fn trace_object(&mut self, object: ObjectReference) -> ObjectReference {
        let worker = self.worker();
        let queue = &mut self.base.nodes;
        if self.plan.tospace().in_space(object) {
            self.plan.tospace().trace_object(
                queue,
                object,
                Some(CopySemantics::DefaultCopy),
                worker,
            )
        } else if self.plan.fromspace().in_space(object) {
            self.plan.fromspace().trace_object(
                queue,
                object,
                Some(CopySemantics::DefaultCopy),
                worker,
            )
        } else {
            self.plan.common.trace_object(queue, object, worker)
        }
    }

    fn create_scan_work(&self, nodes: Vec<ObjectReference>) -> ScanObjects<Self> {
        ScanObjects::<Self>::new(nodes, false, self.bucket)
    }
}

We also need to implement Deref and DerefMut for MyGCProcessEdges so that it can be dereferenced as ProcessEdgesBase.

impl<VM: VMBinding> Deref for MyGCProcessEdges<VM> {
    type Target = ProcessEdgesBase<VM>;
    fn deref(&self) -> &Self::Target {
        &self.base
    }
}

impl<VM: VMBinding> DerefMut for MyGCProcessEdges<VM> {
    fn deref_mut(&mut self) -> &mut Self::Target {
        &mut self.base
    }
}

In the end, use MyGCProcessEdges as DefaultProcessEdges in the GCWorkContext:

pub struct MyGCWorkContext3<VM: VMBinding>(std::marker::PhantomData<VM>);
impl<VM: VMBinding> crate::scheduler::GCWorkContext for MyGCWorkContext3<VM> {
    type VM = VM;
    type PlanType = MyGC<VM>;
    type DefaultProcessEdges = MyGCProcessEdges<Self::VM>;
    type PinningProcessEdges = UnsupportedProcessEdges<Self::VM>;
}

Summary

You should now have MyGC working and able to collect garbage. All three benchmarks should be able to pass now.

If the benchmarks pass - good job! You have built a functional copying collector!

If you get particularly stuck, the code for the completed MyGC plan is available here.

Exercise: Adding another copyspace

Now that you have a working semispace collector, you should be familiar enough with the code to start writing some yourself. The intention of this exercise is to reinforce the information from the semispace section, rather than to create a useful new collector.

  1. Create a copy of your semispace collector, called triplespace.
  2. Add a new copyspace to the collector, called the youngspace, with the following traits:
    • New objects are allocated to the youngspace (rather than the fromspace).
    • During a collection, live objects in the youngspace are moved to the tospace.
    • Garbage is still collected at the same time for all spaces.

Triplespace is a sort of generational garbage collector. These collectors separate out old objects and new objects into separate spaces. Newly allocated objects should be scanned far more often than old objects, which minimises the time spent repeatedly re-scanning long-lived objects.

Of course, this means that the Triplespace is incredibly inefficient for a generational collector, because the older objects are still being scanned every collection. It wouldn't be very useful in a real-life scenario. The next thing to do is to make this collector into a more efficient proper generational collector.

When you are finished, try running the benchmarks and see how the performance of this collector compares to MyGC. Great work!

Triplespace backup instructions

This is one possible implementation of the Triplespace collector, provided in case you are stuck on the exercise.

Attempt the exercise yourself before reading this.

First, rename all instances of mygc to triplespace, and add it as a module by following the instructions in Create MyGC.

In triplespace/global.rs:

  1. Add a youngspace field to pub struct TripleSpace:

    pub struct TripleSpace<VM: VMBinding> {
       pub hi: AtomicBool,
       pub copyspace0: CopySpace<VM>,
       pub copyspace1: CopySpace<VM>,
       pub youngspace: CopySpace<VM>, // Add this!
       pub common: CommonPlan<VM>,
    }
  2. Define the parameters for the youngspace in new() in Plan for TripleSpace:

    fn new(
       vm_map: &'static VMMap,
       mmapper: &'static Mmapper,
       options: Arc<UnsafeOptionsWrapper>,
       _scheduler: &'static MMTkScheduler<Self::VM>,
    ) -> Self {
       //change - again, completely changed.
       let mut heap = HeapMeta::new(HEAP_START, HEAP_END);
    
       TripleSpace {
           hi: AtomicBool::new(false),
           copyspace0: CopySpace::new(
               "copyspace0",
               false,
               true,
               VMRequest::discontiguous(),
               vm_map,
               mmapper,
               &mut heap,
           ),
           copyspace1: CopySpace::new(
               "copyspace1",
               true,
               true,
               VMRequest::discontiguous(),
               vm_map,
               mmapper,
               &mut heap,
           ),
    
           // Add this!
           youngspace: CopySpace::new(
               "youngspace",
               true,
               true,
               VMRequest::discontiguous(),
               vm_map,
               mmapper,
               &mut heap,
           ),
           common: CommonPlan::new(vm_map, mmapper, options, heap, &TRIPLESPACE_CONSTRAINTS, &[]),
       }
    }
  3. Initialise the youngspace in gc_init():

     fn gc_init(
        &mut self,
        heap_size: usize,
        vm_map: &'static VMMap,
        scheduler: &Arc<MMTkScheduler<VM>>,
    ) {
        self.common.gc_init(heap_size, vm_map, scheduler);
        self.copyspace0.init(&vm_map);
        self.copyspace1.init(&vm_map);
        self.youngspace.init(&vm_map); // Add this!
    }
  4. Prepare the youngspace (as a fromspace) in prepare():

    fn prepare(&self, tls: OpaquePointer) {
       self.common.prepare(tls, true);
       self.hi
           .store(!self.hi.load(Ordering::SeqCst), Ordering::SeqCst);
       let hi = self.hi.load(Ordering::SeqCst);
       self.copyspace0.prepare(hi);
       self.copyspace1.prepare(!hi);
       self.youngspace.prepare(true); // Add this!
    }
  5. Release the youngspace in release():

    fn release(&self, tls: OpaquePointer) {
       self.common.release(tls, true);
       self.fromspace().release();
       self.youngspace().release(); // Add this!
    }
  6. Under the reference functions tospace() and fromspace(), add a similar reference function youngspace():

    pub fn youngspace(&self) -> &CopySpace<VM> {
       &self.youngspace
    }

In mutator.rs:

  1. Map a bump pointer to the youngspace (replacing the one mapped to the tospace) in space_mapping in create_triplespace_mutator():
    space_mapping: Box::new(vec![
        (AllocatorSelector::BumpPointer(0), plan.youngspace()), // Change this!
        (
            AllocatorSelector::BumpPointer(1),
            plan.common.get_immortal(),
        ),
        (AllocatorSelector::LargeObject(0), plan.common.get_los()),
    ]),
  2. Rebind the bump pointer to youngspace (rather than the tospace) in triplespace_mutator_release():
    pub fn triplespace_mutator_release<VM: VMBinding>(
        mutator: &mut Mutator<VM>,
        _tls: OpaquePointer,
    ) {
        let bump_allocator = unsafe {
            mutator
                .allocators
                .get_allocator_mut(
                    mutator.config.allocator_mapping[AllocationType::Default]
                )
        }
        .downcast_mut::<BumpAllocator<VM>>()
        .unwrap();
        bump_allocator.rebind(Some(mutator.plan.youngspace())); // Change this!
    }

In gc_work.rs:

  1. Add the youngspace to trace_object, following the same format as the tospace and fromspace:
        fn trace_object(&mut self, object: ObjectReference) -> ObjectReference {
            // Add this!
            else if self.plan().youngspace().in_space(object) {
                self.plan().youngspace().trace_object::<Self, TripleSpaceCopyContext<VM>>(
                    self,
                    object,
                    super::global::ALLOC_TripleSpace,
                    unsafe { self.worker().local::<TripleSpaceCopyContext<VM>>() },
                )
            }
    
            else if self.plan().tospace().in_space(object) {
                self.plan().tospace().trace_object::<Self, TripleSpaceCopyContext<VM>>(
                    self,
                    object,
                    super::global::ALLOC_TripleSpace,
                    unsafe { self.worker().local::<TripleSpaceCopyContext<VM>>() },
                )
            } else if self.plan().fromspace().in_space(object) {
                self.plan().fromspace().trace_object::<Self, TripleSpaceCopyContext<VM>>(
                    self,
                    object,
                    super::global::ALLOC_TripleSpace,
                    unsafe { self.worker().local::<TripleSpaceCopyContext<VM>>() },
                )
            } else {
                self.plan().common.trace_object::<Self, TripleSpaceCopyContext<VM>>(self, object)
            }
        }

Building a generational copying collector

Note: This part is a work in progress.

What is a generational collector?

The weak generational hypothesis states that most of the objects allocated to a heap after one collection will die before the next collection. Therefore, it is worth separating out 'young' and 'old' objects and only scanning each as needed, to minimise the number of times old live objects are scanned. New objects are allocated to a 'nursery', and after one collection they move to the 'mature' space. In triplespace, youngspace is a proto-nursery, and the tospace and fromspace are the mature spaces.

This collector fixes one of the major problems with semispace - namely, that any long-lived objects are repeatedly copied back and forth. By separating these objects into a separate 'mature' space, the number of full heap collections needed is greatly reduced.

This section is currently incomplete. Instructions for building a generational copying (gencopy) collector will be added in the future.

Further Reading

Porting Guide

Note: This guide is a work in progress.

This guide is designed to get you started on porting MMTk to a new runtime. We start with an overview of the MMTk approach to porting and then step through recommended strategies for implementing a port.

There’s no fixed way to implement a new port. What we outline here is a distillation of best practices that have emerged from the community as it has worked through many ports (each at various levels of maturity).

Overview of MMTk’s Approach to Portability

MMTk is designed from the outset to be both high performance and portable. The core of MMTk is entirely runtime-neutral, and is written in Rust. Runtimes that wish to use MMTk may be written in any language so long as they have a means to call into MMTk’s API, which presents itself as a shared library.

MMTk uses the concept of bindings to create high performance impedance matching between runtimes and MMTk.

MMTk’s approach to portability follows these principles:

  1. The MMTk core must remain entirely runtime-agnostic and free of any runtime-specific code.
  2. The runtime’s code base should be entirely garbage-collector agnostic and free of any MMTk-specific code.
  3. The semantics of all MMTk functionality is strictly defined within the MMTk core.

Those principles have the following important implications:

  • Each port of a runtime is supported by a binding that has two components: one which is a logical extension of the runtime, written in the same language as the runtime, but which is MMTk-specific, and one which is a logical extension of MMTk, written in Rust, but which is runtime-specific (see diagram below).
  • A fully-correct but non-performant port will simply implement calls from the runtime to MMTk (to allocate an object, for example), and from MMTk to the runtime (to enumerate pointers, for example).
  • A performant port will likely replicate and lift MMTk functionality into the runtime portion of the port, and conversely replicate runtime functionality in Rust for performant access by MMTk.

A diagram with four boxes, left to right: OpenJDK, MMTk-specific mutator code, OpenJDK-specific MMTk code, MMTk

The diagram above illustrates a port of MMTk to OpenJDK with the binding in the center. The code coloured brown is logically part of MMTk and is written in Rust. The code coloured white is logically part of OpenJDK and is written in C++. The rightmost box is entirely free of any OpenJDK-specific code. The leftmost box should be entirely free of any MMTk-specific code.

Note: we do currently maintain a fork of OpenJDK which includes some necessary changes to their code base, but this is not MMTk-specific and ideally this will be upstreamed. Our port to V8 is a cleaner example, where we’ve managed to work closely with the V8 team to upstream all of the refactoring of the V8 code base that was necessary for it to support a third party heap.

We structure the code into three repos. Taking the example of the OpenJDK port, the three repos are: the MMTk core, the binding repo containing both parts of the binding, and the OpenJDK repo, which is currently a fork we maintain.

Things to Consider Before Starting a Port

In principle, a port to MMTk is not particularly difficult. MMTk can present itself as a standard library and the core of the API is relatively simple.

However, porting a runtime to a different GC (any GC) can be difficult and time consuming. Key questions include:

  • How well encapsulated is the runtime's existing collector?
  • Does the runtime make tacit assumptions about the underlying collector's implementation?
  • How many places in the runtime codebase reference some part of the GC?
  • If the runtime has a JIT, how good is the interface between the JIT and the GC (for write barriers and allocations, for example)?
  • Does the runtime support precise stack scanning?
  • etc.

Thinking through these questions should give you a sense for how big a task a GC port will be.

How to Undertake a Port

We recommend a highly incremental approach to implementing a port. The broad idea is:

  • Start with the NoGC plan and gradually move to more advanced collectors
  • Focus on simplicity and correctness.
  • Optimize the port later.

In MMTk’s language, a plan is essentially a configuration which specifies a GC algorithm. Plans can be selected at run time. Not all plans will be suitable for all runtimes. For example, a runtime that for some reason cannot support object movement won’t be able to use plans that use copying garbage collection.

Starting a Port: NoGC

We always start a port with NoGC. It is the simplest possible plan: it simply allocates memory and never collects. Although this appears trivial, depending on the complexity of the runtime and how well factored (or not) its internal GC interfaces are, just getting this working may be a major undertaking. In the case of V8, the refactoring within V8 required to get a simple NoGC plan working was substantial, touching over 100 files. So it’s a good idea not to underestimate the difficulty of a NoGC port!

At a high level, in order to implement NoGC, we need to handle MMTk initialization, mutator initialization, and memory allocation.

If you're ever stuck at any point, feel free to send a message in the #Porting channel of our Zulip!

Set up

You want to set up the binding repository/directory structure before starting the port. For the sake of this guide we assume you have a directory structure similar to the one below. Note that such a directory structure is not a requirement [1] but a recommendation. We assume you are using some form of version control system (such as git or mercurial) in this guide.

[1] In fact some bindings may not be able to have such a directory structure due to the build tools used by the runtime.

  • mmtk-X/mmtk: The MMTk side of the binding. This includes the implementation of the VMBinding trait, and any necessary Rust code to integrate MMTk with the VM code (e.g. exposing MMTk functions to native code, allowing up-calls from the MMTk binding to the runtime, etc). To start with, you can take a look at one of our officially maintained language bindings as an example: OpenJDK, JikesRVM, V8, Julia.
  • mmtk-X/X: Runtime-specific code for integrating with MMTk. This should act as a bridge between the generic GC interface offered by the runtime and the MMTk side of the binding. This is implemented in the runtime's implementation language. Often this will be one of C or C++.
  • You can place your runtime repository at any path. For the sake of this guide, we assume you will place the runtime repo as a sibling of the binding repo. You can also clone mmtk-core to a local path. Using a local repo of mmtk-core can be beneficial to your development in case you need to make certain changes to the core (though this is unlikely).

Your working directory may look like this (assuming your runtime is named as X):

Your working directory/
├─ mmtk-X/
│  ├─ X/
│  └─ mmtk/
├─ X/
└─ mmtk-core/ (optional)

You may also find it helpful to take inspiration from the OpenJDK binding, particularly for a more complete example of the relevant Cargo.toml files.

For this guide, we will assume your runtime is implemented in C or C++ as they are the most common implementation languages. However note that your runtime does not need to be implemented in C/C++ to work with MMTk.

Adding a Rust library to the runtime

We recommend learning the ins and outs of your runtime's build system. You should try and add a simple Rust "hello world" library to your runtime's code and build system to investigate how easy it will be to add MMTk. Unfortunately this step is highly dependent on the runtime build system. We recommend taking a look at what other bindings do, but keep in mind that no two runtime build systems are the same even if they are using the same build tools.

If the build system is too complex and you just want to get hacking, a quick and dirty way to add MMTk is to build a static and/or dynamic binary for MMTk and link it to the runtime directly, manually building new binaries as necessary, like so:

  1. cd mmtk-X/mmtk
  2. cargo build to build in debug mode or add --release for release mode
  3. Copy the shared or static [2] library from target/debug or target/release to your desired location

[2] You would have to change the crate-type in mmtk-X/mmtk/Cargo.toml from cdylib to staticlib to build a static library.

Later, you can edit the runtime build process to build MMTk at the same time automatically.

Note: If the runtime you are targeting already links some Rust FFI libraries, then you may notice "multiple definition" linker errors for Rust stdlib functions. Unfortunately this is a current limitation of Rust FFI wherein all symbols are bundled together in the final C library, which causes multiple definition errors when two or more Rust FFI libraries are linked together. There is ongoing work to stabilize the Rust package format that will hopefully make this easier in the future. A current workaround is to use the -Wl,--allow-multiple-definition linker flag, but this unfortunately isn't ideal as it increases code size. See here and here for more details.

Note: It is highly recommended to also check-in the generated Cargo.lock file into your version control. This improves the reproducibility of the build and ensures the same package versions are used in future builds, preventing random breakages.

We recommend using the debug build when doing development work as it has helpful logging statements and assertions that will make catching bugs in your implementation easier.

The VMBinding trait

Now let's actually start implementing the binding. Here we take a look at the Rust side of the binding first (i.e. mmtk-X/mmtk). What we want to do is implement the VMBinding trait.

The VMBinding trait is a "meta-trait" (i.e. a trait that encapsulates other traits) that we expect every binding to implement. In essence, it is the contract established between MMTk and the runtime. We discuss each of its seven key traits briefly:

  1. ActivePlan: This trait implements functions related to mutators such as how many mutators exist, getting an iterator for all mutators, etc.
  2. Collection: This trait implements functions related to garbage collection such as starting and stopping mutators, blocking current mutator thread for GC, etc.
  3. ObjectModel: This trait implements the runtime's object model. The object model includes object metadata such as mark-bits, forwarding-bits, etc.; constants regarding assumptions about object addresses; and functions to implement copying objects, querying object sizes, etc. You should carefully implement and understand this as it is a key trait on which many things depend. We will go into more detail about this trait in the object model section.
  4. ReferenceGlue: This trait implements runtime-specific finalization and weak reference processing methods. Note that each runtime has its own way of dealing with finalization and reference processing, so this is often one of the trickiest traits to implement.
  5. Scanning: This trait implements object scanning functions such as scanning mutator threads for root pointers, scanning a particular object for reference fields, etc.
  6. Edge: This trait implements what an edge in the object graph looks like in the runtime. This is useful as it can abstract over compressed or tagged pointers. If an edge in your runtime is indistinguishable from an arbitrary address, you may set it to the Address type.
  7. MemorySlice: This trait implements functions related to memory slices such as arrays. This is mainly used by generational collectors.

For the time being we can implement all the above traits via unimplemented!() stubs. If you are using the Dummy VM binding as a starting point, you will have to edit some of the concrete implementations to unimplemented!(). Note that you should change the type that implements VMBinding from DummyVM to an appropriately named type for your runtime. For example, the OpenJDK binding defines the zero-struct OpenJDK which implements the VMBinding trait.
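As a rough illustration, the skeleton below shows what such a type might look like. This is a sketch only: MyVM and the module paths are placeholders, each referenced type must implement the corresponding trait (initially with unimplemented!() stubs), and the exact trait items (particularly the edge-related types) vary between mmtk-core versions, so check the documentation for your version.

use mmtk::vm::VMBinding;

#[derive(Default)]
pub struct MyVM;

impl VMBinding for MyVM {
    // Each of these types lives in the binding and implements the
    // corresponding trait, initially with unimplemented!() stubs.
    type VMObjectModel = crate::object_model::MyObjectModel;
    type VMScanning = crate::scanning::MyScanning;
    type VMCollection = crate::collection::MyCollection;
    type VMActivePlan = crate::active_plan::MyActivePlan;
    type VMReferenceGlue = crate::reference_glue::MyReferenceGlue;
    // If edges in your runtime are indistinguishable from plain addresses,
    // the simple types provided by mmtk-core may be sufficient.
    type VMEdge = mmtk::vm::edge_shape::SimpleEdge;
    type VMMemorySlice = mmtk::vm::edge_shape::UnimplementedMemorySlice;
}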

Object model

The ObjectModel trait is a fundamental trait describing the layout of an object to MMTk. This is important as MMTk's core doesn't know what objects look like internally, since each runtime will be different. There are certain key aspects you need to be aware of while implementing the ObjectModel trait. We discuss them in this section.

Header vs Side metadata

Per-object metadata can live in one of two places: in the object header or in a separate space used just for metadata. Each one has its pros and cons.

Header metadata sits in close proximity to the actual object address, but does not easily support bulk operations. On the other hand, side metadata sits in a dedicated metadata space where each possible object address is assigned some metadata. This makes bulk operations easy and does not require stealing bits from the object header (there may in fact be no bits to steal for certain runtimes), but can result in a larger heap footprint, given the metadata space is counted as part of the heap.

The choice of metadata location depends on the runtime and its object model and header layout. For example, the JikesRVM runtime reserves extra space at the start of each object for GC-related metadata. Such space may not be available in your runtime, in which case you can use side metadata for per-object metadata.

Local vs Global metadata

MMTk uses multiple GC policies, and each policy may use a different set of object metadata. A moving policy, for example, may require extra metadata (in comparison to a non-moving policy) to store the forwarding bits and forwarding pointer. Metadata that is local to a policy is referred to as "local" metadata.

However, in certain cases, we may need metadata that covers the entire heap. The classic example is the valid-object bit metadata, which tells us whether an arbitrary address is allocated/managed by MMTk. Metadata that spans multiple policies is referred to as "global" metadata.

For example, the forwarding bits and forwarding pointer are local metadata used by copying policies, storing two forwarding bits and a word-sized forwarding pointer respectively. Runtimes often require word-aligned addresses, which means the last two bits of the object header word are free (due to alignment) and can hold the forwarding bits, while the entire object header word can hold the forwarding pointer. This metadata is almost always in the header.

We recommend going through the list of metadata specifications that are defined by MMTk. You should set them to locations that are appropriate for your runtime.
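For illustration only, choosing a location for each piece of metadata might look like the sketch below, which would live inside your ObjectModel implementation. The spec types come from mmtk::vm, but the specific constants, constructors, and offsets shown here are assumptions; consult the specifications available in your version of mmtk-core.

// Inside your ObjectModel implementation, e.g.
// impl ObjectModel<MyVM> for MyObjectModel { ... }:

// Keep the forwarding metadata in the object header: the forwarding pointer
// overwrites the first word of the header, and the forwarding bits use the
// low bits of that word, which are free due to word alignment.
const LOCAL_FORWARDING_POINTER_SPEC: VMLocalForwardingPointerSpec =
    VMLocalForwardingPointerSpec::in_header(0);
const LOCAL_FORWARDING_BITS_SPEC: VMLocalForwardingBitsSpec =
    VMLocalForwardingBitsSpec::in_header(0);

// Keep the mark bit on the side, in a dedicated metadata space.
const LOCAL_MARK_BIT_SPEC: VMLocalMarkBitSpec = VMLocalMarkBitSpec::side_first();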

ObjectReference vs Address

A key principle in MMTk is the distinction between ObjectReference and Address. The idea is that very few operations are allowed on an ObjectReference. For example, MMTk does not allow address arithmetic on ObjectReferences. This allows us to preserve memory-safety, only performing unsafe operations when required, and gives us a cleaner and more flexible abstraction to work with, since an ObjectReference need not be a raw address: it could be an object handle or an offset, for example. An Address, on the other hand, represents an arbitrary machine address.

You might be interested in reading the Demystifying Magic: High-level Low-level Programming paper [3], which describes the above in more detail.

[3] https://users.cecs.anu.edu.au/~steveb/pubs/papers/vmmagic-vee-2009.pdf

Miscellaneous configuration options

There are many constants in the ObjectModel trait that can be overridden in your binding in order to meet your runtime's requirements. For example, the OBJECT_REF_OFFSET_LOWER_BOUND constant defines the minimum offset between the allocation result start (i.e. the address that MMTk will return to the runtime) and the actual start of the object, i.e. the ObjectReference. In other words, the constant is a lower bound on the distance from the allocation result start to the ObjectReference, such that the following invariant always holds:

OBJECT_REFERENCE >= ALLOCATION_RESULT_START + OBJECT_REF_OFFSET_LOWER_BOUND

We recommend going through the list of constants in the documentation and seeing if the default values suit your runtime's semantics, changing them if required.

MMTk initialization

Now that we have most of the boilerplate set up, the next step is to initialize MMTk so that we can start allocating objects.

Runtime-side changes

Create a mmtk.h header file in the runtime folder of the binding (i.e. mmtk-X/X) which exposes the functions required to implement NoGC and #include it in the relevant runtime code. You can use the example mmtk.h header file as an example.

Note: It is convention to prefix all MMTk API functions exposed with mmtk_ in order to avoid name clashes. It is highly recommended that you follow this convention.

Having a clean heap API for MMTk to implement makes life easier. Some runtimes may already have a sufficiently clean abstraction, such as OpenJDK after the merging of JEP 304. In (most) other cases, the runtime doesn't provide a clean enough heap API for MMTk to implement. In such cases, it is recommended to create a class (or equivalent) that abstracts allocation and other heap functions, as the V8 and ART bindings do. This allows making minimal changes to the actual runtime and having a concrete implementation of the exposed heap API in the binding, reducing MMTk-specific code in the runtime. Ideally these changes are upstreamed, as was the case for V8.

It is also recommended that any changes you make in the runtime be guarded by build-time flags, as this helps maintain a clean port.

At this step, your mmtk.h file may look something like this:

#ifndef MMTK_H
#define MMTK_H

#include <stddef.h>
#include <sys/types.h>

// The extern "C" is only required if the runtime
// implementation language is C++
extern "C" {

// An arbitrary address
typedef void* Address;
// MmtkMutator should be an opaque pointer for the VM
typedef void* MmtkMutator;
// An opaque pointer to a VMThread
typedef void* VMThread;

/**
 * Initialize MMTk instance
 */
void mmtk_init();

/**
 * Set the heap size
 *
 * @param min minimum heap size
 * @param max maximum heap size
 */
void mmtk_set_heap_size(size_t min, size_t max);

} // extern "C"

#endif // MMTK_H

Now we can initialize MMTk in the runtime. Note that MMTk should ideally be initialized around the time the runtime's default heap is initialized. You will have to figure out the best location to initialize MMTk in your runtime.

Initializing MMTk requires two steps. First, we set the heap size by calling mmtk_set_heap_size with the initial heap size and the maximum heap size. Then, we initialize MMTk by calling mmtk_init. In the future, you may wish to make the heap size configurable via a command line argument or environment variable (See setting options for MMTk).

MMTk-side changes

On the Rust side of the binding, we want to implement the two functions exposed by the mmtk.h file above. We use an MMTKBuilder instance to actually create our concrete MMTK instance. We recommend following the paradigm used by all our bindings wherein we have a static single MMTK instance and an MMTKBuilder instance that we can use to set relevant options. See the OpenJDK binding for an example.

Note: MMTk currently assumes that there is only one MMTK instance in your runtime process. Multiple MMTK instances are currently not supported.

The mmtk_set_heap_size function is fairly straightforward. We recommend using the implementation in the OpenJDK binding. The mmtk_init function is straightforward as well. It should simply force initialization of the MMTK static variable using lazy_static, as in the OpenJDK binding.
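A minimal sketch of the Rust side is shown below. It assumes the hypothetical MyVM binding type from earlier and the lazy_static crate as a dependency; GCTriggerSelector is the option type used to set the heap size at the time of writing, so check your version of mmtk-core for the exact API.

use std::sync::Mutex;

use lazy_static::lazy_static;
use mmtk::util::options::GCTriggerSelector;
use mmtk::{MMTKBuilder, MMTK};

lazy_static! {
    // A builder for setting options before the MMTk instance is created.
    pub static ref BUILDER: Mutex<MMTKBuilder> = Mutex::new(MMTKBuilder::new());
    // The single MMTk instance, created lazily from the builder.
    pub static ref SINGLETON: MMTK<MyVM> = {
        let builder = BUILDER.lock().unwrap();
        *mmtk::memory_manager::mmtk_init::<MyVM>(&builder)
    };
}

#[no_mangle]
pub extern "C" fn mmtk_set_heap_size(min: usize, max: usize) {
    let mut builder = BUILDER.lock().unwrap();
    let policy = if min == max {
        GCTriggerSelector::FixedHeapSize(min)
    } else {
        GCTriggerSelector::DynamicHeapSize(min, max)
    };
    builder.options.gc_trigger.set(policy);
}

#[no_mangle]
pub extern "C" fn mmtk_init() {
    // Force evaluation of the lazy static, creating the MMTK instance.
    lazy_static::initialize(&SINGLETON);
}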

By this point, you should have MMTk initialized. If you are using a debug build (which is recommended) and have logging turned on, a message similar to the one below will be printed:

[...]
[INFO  mmtk::memory_manager] Initialized MMTk with NoGC (FixedHeapSize(10485760))
[...]

Binding mutator threads to MMTk

For MMTk to allocate objects, it needs to be aware of mutator threads. MMTk only allows mutator threads to allocate objects. We do this by "binding" a mutator thread to MMTk when it is initialized in the runtime.

Runtime-side changes

Add the following function to the mmtk.h file:

[...]

/**
 * Bind a mutator thread in MMTk
 *
 * @param tls pointer to mutator thread
 * @return an instance of an MMTk mutator
 */
MmtkMutator mmtk_bind_mutator(VMThread tls);

[...]

The mmtk_bind_mutator function takes in an opaque pointer representing an instance of the runtime's mutator thread and returns an opaque pointer to a Mutator instance back to the runtime. The runtime must store this pointer somewhere, preferably in its runtime thread local storage implementation, as MMTk requires a Mutator instance to allocate and perform other actions.

The placement of the mmtk_bind_mutator call in the runtime depends on the runtime's implementation of its thread system. It is recommended to call mmtk_bind_mutator when the runtime initializes the thread local storage of a newly created thread. This ensures that the thread can allocate from MMTk immediately after initialization.

MMTk-side changes

The Rust side of the binding should simply defer the actual implementation to mmtk::memory_manager::bind_mutator. See the OpenJDK binding for an example.
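For example, a minimal sketch of the Rust-side function, assuming the hypothetical MyVM binding type and the SINGLETON instance from the initialization sketch above (the tls wrapper types live in mmtk::util::opaque_pointer; check your version for the exact signatures):

use mmtk::util::opaque_pointer::{VMMutatorThread, VMThread};
use mmtk::Mutator;

#[no_mangle]
pub extern "C" fn mmtk_bind_mutator(tls: VMThread) -> *mut Mutator<MyVM> {
    // bind_mutator returns a Box<Mutator<MyVM>>; hand the runtime a raw
    // pointer that it can store in its thread local storage.
    Box::into_raw(mmtk::memory_manager::bind_mutator(
        &SINGLETON,
        VMMutatorThread(tls),
    ))
}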

Allocation

Now we can finally implement the allocation functions.

Runtime-side changes

Add the following two functions to the mmtk.h file:

[...]

/**
 * Allocate an object
 *
 * @param mutator the mutator instance that is requesting the allocation
 * @param size the size of the requested object
 * @param align the alignment requirement for the object
 * @param offset the allocation offset for the object
 * @param allocator the allocation semantics to use for the allocation
 * @return the address of the newly allocated object
 */
void *mmtk_alloc(MmtkMutator mutator, size_t size, size_t align,
        ssize_t offset, int allocator);

/**
 * Set relevant object metadata
 *
 * @param mutator the mutator instance that is requesting the allocation
 * @param object the returned address of the allocated object
 * @param size the size of the allocated object
 * @param allocator the allocation semantics to use for the allocation
 */
void mmtk_post_alloc(MmtkMutator mutator, void* object, size_t size, int allocator);

[...]

In order to perform allocations, you will need to know what object alignment the runtime expects. Runtimes often align allocations at word boundaries (i.e. 4 or 8 bytes) as it allows the CPU to access the data faster at execution time. Additionally, the runtime may use the unused lowest-order bits to store flags (e.g. type information), so it is important that MMTk respects these expectations. Once you have figured out the alignment requirements for your runtime, you should update the MIN_ALIGNMENT constant in VMBinding to the correct value, as sketched below.
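For example, extending the hypothetical VMBinding sketch from earlier (the value 8 is an assumption for illustration; use whatever your runtime actually requires):

impl VMBinding for MyVM {
    // ... associated types as in the earlier sketch ...

    // Assumption for illustration: the runtime aligns all allocations
    // to 8-byte boundaries.
    const MIN_ALIGNMENT: usize = 8;
}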

Now that MMTk is aware of each mutator thread, you have to change the runtime's allocation functions to call into MMTk to allocate using mmtk_alloc and set object metadata using mmtk_post_alloc. Note that there may be multiple allocation functions in the runtime so make sure that you edit them all!

You should use the saved Mutator pointer as the first parameter, the requested object size as the next parameter, and any alignment requirements the runtime has as the third parameter.

If your runtime requires a non-zero allocation offset (i.e. the alignment requirements are for the offset address, not the returned address) then you have to provide the required value as the fourth parameter. Note that you must also update the USE_ALLOCATION_OFFSET constant in the VMBinding implementation if your runtime requires a non-zero allocation offset.

For the time being, you can ignore the allocator parameter in both of these functions and always pass a value of 0, which means MMTk will pick the default allocator for your collector (a bump-pointer allocator in the case of NoGC).

Finally, you need to call mmtk_post_alloc with the object address returned from the previous mmtk_alloc call in order to initialize object metadata.

Note: Currently MMTk assumes object sizes are multiples of the MIN_ALIGNMENT. If you encounter errors with alignment, a simple workaround would be to align the requested object size up to the MIN_ALIGNMENT. See here for the tracking issue to fix this bug.

MMTk-side changes

The Rust side of the binding should simply defer the actual implementation to mmtk::memory_manager::alloc and mmtk::memory_manager::post_alloc respectively. See the OpenJDK binding for an example.
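A sketch of the Rust side is shown below, again assuming the hypothetical MyVM type and the helpers from the earlier sketches. The parameter types are paraphrased (the offset type in particular has changed across MMTk releases), and the size round-up implements the workaround from the note above, assuming MIN_ALIGNMENT is a power of two; check mmtk::memory_manager in your version for the exact signatures.

use mmtk::util::{Address, ObjectReference};
use mmtk::vm::VMBinding;
use mmtk::{AllocationSemantics, Mutator};

#[no_mangle]
pub extern "C" fn mmtk_alloc(
    mutator: *mut Mutator<MyVM>,
    size: usize,
    align: usize,
    offset: usize,
    allocator: i32,
) -> Address {
    // For NoGC we only use the default allocation semantics (allocator == 0).
    debug_assert_eq!(allocator, 0);
    // Workaround from the note above: round the size up to MIN_ALIGNMENT
    // (assumed to be a power of two).
    let min = <MyVM as VMBinding>::MIN_ALIGNMENT;
    let size = (size + min - 1) & !(min - 1);
    mmtk::memory_manager::alloc(
        unsafe { &mut *mutator },
        size,
        align,
        offset,
        AllocationSemantics::Default,
    )
}

#[no_mangle]
pub extern "C" fn mmtk_post_alloc(
    mutator: *mut Mutator<MyVM>,
    object: ObjectReference,
    size: usize,
    allocator: i32,
) {
    debug_assert_eq!(allocator, 0);
    mmtk::memory_manager::post_alloc(
        unsafe { &mut *mutator },
        object,
        size,
        AllocationSemantics::Default,
    )
}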

Congratulations! At this point, you hopefully have object allocation working and can run simple programs with your runtime using MMTk!

Miscellaneous implementation steps

Setting options for MMTk

The preferred method of setting options for MMTk is by setting them via the MMTKBuilder instance. See here for an example in the OpenJDK binding.

The process function can also be used to pass options. If you want to set multiple options at the same time, you can use the process_bulk function, as sketched below.
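For example, a sketch using the BUILDER from the earlier initialization sketch ("plan" and "threads" are real MMTk option names, but check the options list linked below for what your version supports):

use mmtk::memory_manager;

fn set_mmtk_options() {
    let mut builder = BUILDER.lock().unwrap();
    // Set a single option by name.
    let ok = memory_manager::process(&mut builder, "plan", "NoGC");
    // Or set several options in one go.
    let ok_bulk = memory_manager::process_bulk(&mut builder, "plan=NoGC threads=4");
    assert!(ok && ok_bulk);
}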

MMTk also supports setting options via environment variables. This is generally only recommended at early stages of the porting process in order for quick development. For example, to use the NoGC plan, you can set the environment variable MMTK_PLAN=NoGC.

A full list of available options that you can set can be found here.

Runtime-specific steps

Often it is the case that the above changes are not enough to allow a runtime to work with MMTk. For example, for the ART binding, the runtime required that all inflated locks be deflated prior to writing the boot image. In order to fix this, we had to implement a heap visitor that visited each allocated object, checked whether it had an inflated lock, and deflated it if so.

Unfortunately there is no real magic bullet here. If you come across a runtime-specific idiosyncrasy (and you almost certainly will), you will have to understand what the underlying bug is and either fix or work around it.

If you have any confusion or questions, please feel free to reach out to us on our Zulip! We would be glad to help.

Next Steps

Your choice of the next GC plan to implement depends on your situation. If you’re developing a new VM from scratch, or if you are intimately familiar with the internals of your target VM, then implementing a SemiSpace collector is probably the best course of action. Although the GC itself is rather simplistic, it stresses many of the key components of the MMTk <-> VM binding that will be required for later (and more powerful) GCs. In particular, since it always moves objects, it is an excellent stress test.

An alternative route is to implement MarkSweep. This may be necessary in scenarios where the target VM doesn’t support object movement, or would require significant refactoring to do so. This can then serve as a stepping stone for future, moving GCs such as SemiSpace.

We hope to have an Immix implementation available soon, which provides a nice middle ground between moving and non-moving (since it copies opportunistically, and can cope with a strictly non-moving requirement if needs be).

Debugging Tips

In this section, we discuss debugging tips that will be useful to debug MMTk in a binding implementation. We will focus on discussing strategies and tools in the specific context of debugging MMTk.

We aim to enhance the debugging capabilities for MMTk. However, as of October 2023, our debugging tools remain limited. We welcome your feedback and suggestions. If there's a specific debugging tool you'd like to see or have recommendations, please reach out to us on Zulip or submit your thoughts via issues.

Enabling debug assertions

MMTk is implemented with an extensive amount of assertions to ensure correctness. We strongly recommend using a debug build of MMTk, which includes all the debugging assertions, when developing an MMTk binding. The assertions are normal Rust debug_assert!, and they can be turned on in a release build with Rust flags (https://doc.rust-lang.org/cargo/reference/profiles.html#debug-assertions), as shown below.
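For example, to keep debug assertions in an otherwise optimized build, you could add something like this to the binding's Cargo.toml:

[profile.release]
debug-assertions = true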

Extreme debugging assertions

In addition to the normal debugging assertions, MMTk also has a set of optional runtime checks that can be turned on by enabling the feature extreme_assertions. These usually include checks that are too expensive (even in a debug build) to enable by default.

You should make sure your MMTk binding can pass all the assertions (including extreme_assertions).
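Assuming your binding depends on mmtk as a cargo dependency, the feature can be enabled in the binding's Cargo.toml along these lines (the git source shown is a placeholder; pin whatever revision or version you actually use):

[dependencies]
mmtk = { git = "https://github.com/mmtk/mmtk-core.git", features = ["extreme_assertions"] }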

Performance Tuning for Bindings

In this section, we discuss how to achieve the best performance with MMTk in a binding implementation. MMTk is a high performance GC library, but there are some key things that need to be done correctly to achieve optimal performance.

Enabling Link Time Optimization (LTO) with MMTk

MMTk's API is designed with an assumption that LTO will be enabled for a performant build. It is essential to allow the Rust compiler to optimize across the crate boundary between the binding crate and mmtk-core. LTO allows inlining for both directions (from mmtk-core to the binding, and from the binding to mmtk-core), and allows further optimization such as specializing and constant folding for the VMBinding trait.

We suggest enabling LTO for the release build in the binding's manifest (Cargo.toml) by adding a profile for the release build, so LTO is always enabled for a release build.

[profile.release]
lto = true

If your binding project is a Rust binary (e.g. the VM is written in Rust), this should be enough. However, if your binding project is a library, there are some limitations with cargo that you should be aware of.

Binding as a library

Cargo only allows LTO for certain crate types. You will need to specify the crate type properly, otherwise cargo may skip LTO without any warning or error.

[lib]
...
# be careful - LTO is only allowed for certain crate types
crate-type = ["cdylib"]

At the time of writing, cargo has some limitations about LTO with different crate types:

  1. LTO is only allowed for the cdylib and staticlib crate types (besides bin). Check the code of can_lto for your Rust version to confirm.
  2. If the crate-type field includes any type for which LTO is not allowed, LTO will be skipped for all the libraries generated (https://github.com/rust-lang/rust/issues/51009). For example, if you have crate-type = ["cdylib", "rlib"] and cargo cannot do LTO for rlib, LTO will be skipped for cdylib as well. So only keep the crate types that you actually need in the crate-type field.

Optimizing Allocation

MMTk provides alloc() and post_alloc(), to allocate a piece of memory, and finalize the memory as an object. Calling them is sufficient for a functional implementation, and we recommend doing so in the early development of an MMTk integration. However, as allocation is performance critical, runtimes generally want to optimize allocation to make it as fast as possible, at which point simply invoking alloc() and post_alloc() becomes inadequate.

The following discusses a few design decisions and optimizations related to allocation. The discussion mainly focuses on alloc(). post_alloc() works in a similar way, and the discussion can also be applied to post_alloc(). For concrete examples, you can refer to any of our supported bindings, and check the implementation in the bindings.

Note: Some of the optimizations need to make assumptions about MMTk's internal implementation and may make the code less maintainable. We recommend adding assertions in the binding code to make sure the assumptions are not broken across versions.

Efficient access to MMTk mutators

An MMTk mutator context (created by bind_mutator()) is a thread-local data structure of type Mutator. MMTk expects the binding to provide efficient access to the mutator structure in their thread-local storage (TLS). Usually one of the following approaches is used to store MMTk mutators.

Option 1: Storing the pointer

The Box<Mutator<VM>> returned from mmtk::memory_manager::bind_mutator is actually a pointer to a Mutator<VM> instance allocated in the Rust heap. It is simple to store it in the TLS. This approach does not make any assumption about the internals of an MMTk Mutator. However, it requires an extra pointer dereference when accessing a value in the mutator. This may not sound too bad; however, it degrades the performance of a carefully implemented, inlined fast-path allocation sequence, which is normally just a few (assembly) instructions. This approach can be a simple start in early development, but we do not recommend it for an efficient implementation.

If the VM is not implemented in Rust, the binding needs to turn the boxed pointer into a raw pointer before storing it.

struct MutatorInTLS {
    // Store the mutator as a boxed pointer.
    // Accessing any value in the mutator requires dereferencing the boxed pointer.
    ptr: Box<Mutator<MockVM>>,
}

// Bind an MMTk mutator
let mutator = memory_manager::bind_mutator(fixture.get_mmtk(), tls_opaque_pointer);
// Store the pointer in TLS
let mut storage = MutatorInTLS { ptr: mutator };

// Allocate
let addr = memory_manager::alloc(&mut storage.ptr, 8, 8, 0, AllocationSemantics::Default);

Option 2: Embed the Mutator struct

To remove the extra pointer dereference, the binding can embed the Mutator type into their TLS type. This saves the extra dereference.

If the implementation language is not Rust, the developer needs to create a type that has the same layout as Mutator. It is recommended to have an assertion to ensure that the native type has the exact same layout as the Rust type Mutator.

struct MutatorInTLS {
    embed: Mutator<MockVM>,
}

// Bind an MMTk mutator
let mutator = memory_manager::bind_mutator(fixture.get_mmtk(), tls_opaque_pointer);
// Store the struct (or use memcpy for non-Rust code)
let mut storage = MutatorInTLS { embed: *mutator };
// Allocate
let addr = memory_manager::alloc(&mut storage.embed, 8, 8, 0, AllocationSemantics::Default);

Option 3: Embed the fast-path struct

The size of Mutator is a few hundred bytes, which could be considered too large to store in the TLS in some language implementations. Embedding Mutator also requires duplicating a native type for the Mutator struct if the implementation language is not Rust. Sometimes it is undesirable to embed the Mutator type. One can instead choose to embed only the fast-path struct that is in use.

Unlike the Mutator type, the fast-path struct has a C-compatible layout, and it is simple and primitive enough so it is unlikely to change. For example, MMTk provides BumpPointer, which simply includes a cursor and a limit.

In the following example, we embed one BumpPointer struct in the TLS. The BumpPointer is used in the fast-path, and carefully synchronized with the allocator in the Mutator struct in the slow-path. We also need to revoke (i.e. reset) all the cached BumpPointer values for all mutators if a GC occurs. Currently, we recommend implementing this in the resume_mutators API call, however there is work in progress that would make it an explicit API call instead.

Note that the allocate_default closure in the example below assumes the allocation semantics is AllocationSemantics::Default and its selected allocator uses bump-pointer allocation. Real-world fast-path implementations for high-performance VMs are usually JIT-compiled, inlined, and specialized for the current plan and allocation site. Hence, the allocation semantics of the concrete allocation site (and therefore the selected allocator) is known to the JIT compiler.

For the sake of simplicity, the example stores only one BumpPointer in the TLS. In MMTk, each plan has multiple allocators, and allocation semantics are mapped to those allocators by the GC plan you choose. Depending on how many allocation semantics a binding uses, the binding may use multiple allocators as well. In practice, a binding may embed one fast-path struct for each allocator it uses if it wants more efficient allocation.

Also for simplicity, the example assumes the default allocator for the plan in use is a bump-pointer allocator. Many plans in MMTk use a bump-pointer allocator for their default allocation semantics (AllocationSemantics::Default), including (but not limited to) NoGC, SemiSpace, Immix, and the generational plans. If a plan does not do bump-pointer allocation, we may still implement fast-paths, but we need to embed different data structures instead of BumpPointer.

use crate::util::alloc::BumpPointer;

struct MutatorInTLS {
    default_bump_pointer: BumpPointer,
    mutator: Box<Mutator<MockVM>>,
}

// Bind an MMTk mutator
let mutator = memory_manager::bind_mutator(fixture.get_mmtk(), tls_opaque_pointer);
// Create a fastpath BumpPointer with default(). The BumpPointer from default() is guaranteed
// to fail on the first allocation, so the allocation goes to the slowpath and we will get an
// allocation buffer from MMTk.
let default_bump_pointer = BumpPointer::default();
// Store the fastpath BumpPointer along with the mutator
let mut storage = MutatorInTLS {
    default_bump_pointer,
    mutator,
};

// Allocate
let mut allocate_default = |size: usize| -> Address {
    // Alignment code is omitted here to make the code simpler to read.
    // In an actual implementation, alignment and offset need to be considered by the bindings.
    let new_cursor = storage.default_bump_pointer.cursor + size;
    if new_cursor < storage.default_bump_pointer.limit {
        let addr = storage.default_bump_pointer.cursor;
        storage.default_bump_pointer.cursor = new_cursor;
        addr
    } else {
        use crate::util::alloc::Allocator;
        let selector = memory_manager::get_allocator_mapping(
            fixture.get_mmtk(),
            AllocationSemantics::Default,
        );
        let default_allocator = unsafe {
            storage
                .mutator
                .allocator_impl_mut::<crate::util::alloc::BumpAllocator<MockVM>>(selector)
        };
        // Copy bump pointer values to the allocator in the mutator
        default_allocator.bump_pointer = storage.default_bump_pointer;
        // Do slow path allocation with MMTk
        let addr = default_allocator.alloc_slow(size, 8, 0);
        // Copy bump pointer values back to the fastpath BumpPointer so we have an allocation buffer.
        storage.default_bump_pointer = default_allocator.bump_pointer;
        addr
    }
};

// Allocate: this will fail in the fastpath, and will get an allocation buffer from the slowpath
let addr1 = allocate_default(8);
// Allocate: this will allocate from the fastpath
let addr2 = allocate_default(8);

The following is pseudo-code for resetting the BumpPointers of all mutators in resume_mutators. Note that these mutators are the runtime's actual mutator threads (i.e. where the cached bump pointers are stored) and are different from MMTk's Mutator struct.

impl Collection<RtName> for RtNameCollection {
  ...
  fn resume_mutators(tls: VMWorkerThread) {
    // Reset the cached bump pointers of each mutator (setting both cursor and limit to 0) after a GC to
    // ensure that the VM sees a cohesive state
    for mutator in mutators {
      mutator.storage.default_bump_pointer = BumpPointer::default();
    }
    // Actually resume all the mutators now
    ...
  }
  ...
}

Avoid resolving the allocator at run time

For a simple and general API of alloc(), MMTk requires AllocationSemantics as an argument in an allocation request, and resolves it at run-time. The following is roughly what alloc() does internally.

  1. Resolving the allocator
    1. Find the Allocator for the required AllocationSemantics. It is defined by the plan in use.
    2. Dynamically dispatch the call to Allocator::alloc().
  2. Allocator::alloc() executes the allocation fast-path.
  3. If the fast-path fails, it executes the allocation slow-path Allocator::alloc_slow().
  4. The slow-path will further attempt to allocate memory, and may trigger a GC.

Resolving to a specific allocator and doing dynamic dispatch is expensive for an allocation. With build-time or JIT-time knowledge about the object that will be allocated, an MMTk binding can possibly skip the first step at run time.

If you implement an efficient fast-path allocation in the binding side (like Option 3 above, or generating allocation code in a JIT), that naturally avoids this problem. If you do not want to implement the fast-path allocation, the following is another example of how to avoid resolving the allocator.

Once MMTk is initialized, a binding can get the memory offset for the default allocator, and save it somewhere. When we know an object should be allocated with the default allocation semantics, we can use the offset to get a reference to the actual allocator (with unsafe code), and allocate with the allocator.

// At boot time
let selector = memory_manager::get_allocator_mapping(
    fixture.get_mmtk(),
    AllocationSemantics::Default,
);
let default_allocator_offset =
    crate::plan::Mutator::<MockVM>::get_allocator_base_offset(selector);
let mutator = memory_manager::bind_mutator(fixture.get_mmtk(), tls_opaque_pointer);

// At run time: allocate with the default semantics without resolving the allocator
let default_allocator: &mut BumpAllocator<MockVM> = {
    let mutator_addr = Address::from_ref(&*mutator);
    unsafe {
        (mutator_addr + default_allocator_offset).as_mut_ref::<BumpAllocator<MockVM>>()
    }
};
let addr = default_allocator.alloc(8, 8, 0);

Emitting Allocation Sequence in a JIT Compiler

If the language has a JIT compiler, it is generally desirable to generate the code sequence for the allocation fast-path, rather than simply emitting a call instruction to the allocation function. The optimizations discussed above are relevant here as well: (i) the compiler needs to be able to access the mutator, and (ii) the compiler needs to be able to resolve to a specific allocator at JIT time. The actual implementation depends heavily on the compiler implementation.

The following are some examples from our bindings (at the time of writing):

Contributors

This is a list of the contributors who helped write and improve this document. We deeply appreciate your commitment to making this work the best it can be. Thank you for your contributions.