Luís Pina

Archive | RSS

Usages of sun.misc.Unsafe within Rubah

01/Dec/2014

Rubah is a Dynamic Software Updating system for Java that works on the stock Oracle HotSpot JVM, does not add any measurable overhead when running a program, and performs dynamic software updates efficiently.

In this post, I explain how Rubah uses the low-level unsafe operations available in class sun.misc.Unsafe (which I shall refer to as the unsafe API) to implement the optimizations that make it so efficient. I start by explaining what the unsafe API is and how to use it, then I describe the object memory layout of the Oracle HotSpot JVM and how to explore it with the unsafe API, then I discuss how Rubah does that to improve its performance, and finally I propose how to make some of the unsafe operations safer in a way that is compatible with Rubah.

What is sun.misc.Unsafe?

The class sun.misc.Unsafe is a proprietary API that enables a Java program to escape the control of the JVM and perform potentially unsafe operations, like direct memory manipulation. Here is a list of interesting methods that this API has (more documentation available here):


public class Unsafe {
	// use reflection instead, this method throws an exception if the calling
	// class is not on the bootstrap classpath
	public static Unsafe getUnsafe();

	// static field/Object field/array manipulation utilities
	public Object staticFieldBase    (Field f);
	public long   staticFieldOffset  (Field f);
	public long   objectFieldOffset  (Field f);
	public int    arrayBaseOffset    (Class c);
	public int    arrayIndexScale    (Class c);

	// reference-type field manipulation
	public Object getObject (Object o, long offset);
	public void   putObject (Object o, long offset, Object x);

	// int-type field manipulation (repeated for every other primitive type)
	public int    getInt    (Object o, long offset);
	public void   putInt    (Object o, long offset, int x);

	// compare-and-swap object/int/long
	public boolean compareAndSwapObject (Object o, long offset, Object expected, Object x);
	public boolean compareAndSwapInt    (Object o, long offset, int expected,    int x);
	public boolean compareAndSwapLong   (Object o, long offset, long expected,   long x);

	// allocate instance without running constructors
	public Object allocateInstance (Class cls);

	// try entering a monitor, fail with false instead of blocking if not able to
	public boolean tryMonitorEnter (Object o);

}

For instance, the following code sets an entire integer array to 1 using the unsafe API to avoid any bounds check:


// For this to work, add the class to the bootstrap classloader by passing
// the option -Xbootclasspath/a:. to the java command
Unsafe u    = Unsafe.getUnsafe();
int   size  = 1 << 16;
int[] array = new int[size];
int   base  = u.arrayBaseOffset(Integer.class);
int   scale = u.arrayIndexScale(Integer.class);

for (int i = 0 ; i < size ; i++)
  u.putInt(array, (base + i * scale), 1);

// Nothing prevents me writting out of bounds:
// u.putInt(array, (base + size * scale), 1);
// Or reading:
// u.getInt(array, (base + size * scale));

While this is faster than regular Java array manipulation, it may lead to out-of-bounds accesses that can be exploited maliciously.

The unsafe API is used extensively throughout the java.util.concurrent package to perform atomic compare-and-swap operations and volatile read/write on arrays. Currently, that is the only way to do those operations. However, the Oracle JVM development team is looking into ways to make this API safe and part of the Java API.

Most of the low-level memory operations can also be made through JNI native code. However, using the unsafe API is more efficient because it avoids the cost of switching context from Java to JNI and back. Besides, the JIT compiles some calls to the unsafe API directly to JVM intrinsics1.

HotSpot JVM object memory layout

There is a code pattern when using the unsafe API to manipulate data in object fields or arrays. The code above shows an example of that pattern: First, it gets a base pointer with method arrayBaseOffset, then, it gets the array scale factor with method arrayIndexScale, and ,finally, it computes the formula base + i * scale to manipulate position i on the array with methods getInt or putInt.

A similar pattern applies to manipulating objects, the following example shows:


// For this to work, add the class to the bootstrap classloader by passing
// the option -Xbootclasspath/a:. to the java command
Unsafe u       = Unsafe.getUnsafe();
LinkedList obj = new LinkedList();
Class c        = LinkedList.class;
Field f        = c.getDeclaredField("first");
long  offset   = u.objectFieldOffset(f);

u.getObject(obj, offset);
u.putObject(obj, offset, null);

// Nothing prevents me from writting a wrong type:
// u.putObject(obj, offset, c);

For manipulating object fields, we do not get a base pointer and a scale factor. We get, instead, the offset of each individual field with method objectFieldOffset. We can then manipulate object fields using methods getObject and putObject. Note that these are the same methods as in the previous example: If field first had type int, we would use methods getInt and putInt.

Both these code patterns map directly to how the JVM lays objects in memory. The following figure shows how objects look in memory (memory addresses of variables/fields from the code above are writen in a C-like style with a preceding &):

Given the similarity between the HotSpot and the OpenJDK, I shall use the OpenJDK names in the description of the object memory layout, with links to the relevant header files in the OpenJDK source. Every object starts with a fixed sized header that has two fields, as defined by the header file src/share/vm/oops/oop.hpp:

Arrays also have a header that starts with the same two fields, followed by an extra field, as defined by header file src/share/vm/oops/arrayOop.hpp:

Manipulating the object model

Rubah uses the unsafe API to manipulate the metadata on the object header. This is extremely unsafe3 and getting it right was itself an important implementation challenge. This is also brittle because the unsafe API is not part of the standard Java API. Future versions of the HotSpot JVM are free to change how objects are layed out in memory.

In the following, I list all the different ways Rubah manipulates the header using the unsafe API:


// For this to work, add the class to the bootstrap classloader by passing
// the option -Xbootclasspath/a:. to the java command
Unsafe u     = Unsafe.getUnsafe();
Object o     = new Object();

// Valid for a 64 bit JVM with compressed oops
// Might not work for different architectures
long offset  = 1L;

int newHash  = 42;

int identityHashCode = System.identityHashCode(o);
int unsafeHashCode   = u.getInt(o, offset);

assert (identityHashCode == unsafeHashCode != newHash); 
u.putInt(o, offset, newHash);

int identityHashCode = System.identityHashCode(o);
int unsafeHashCode   = u.getInt(o, offset);

assert (identityHashCode == unsafeHashCode == newHash);

class A { /* empty */ }
class B { /* empty */ }

// For this to work, add the class to the bootstrap classloader by passing
// the option -Xbootclasspath/a:. to the java command
Unsafe u     = Unsafe.getUnsafe();
Object a     = new A();
Object b     = new B();

// Valid for a 64 bit JVM with compressed oops
// Might not work for different architectures
long offset  = 8L;

// Do not keep this around for long, it changes over time
int klass    = u.getInt(a, offset);

assert (a instanceof A);
assert (b instanceof B);

u.putInt(b, offset, klass);

assert (a instanceof A);
// If this code gets JITed, the following line may terminate the JVM with SIGSEGV
assert (b instanceof A);

How to make sun.misc.Unsafe safer

Useful as it may be, sun.misc.Unsafe is a proprietary API that will disappear or be modified in future releases of the HotSpot JVM. Rubah relies on features of the unsafe API that might disappear. So, in this section, I propose some alternatives to make those features safe so that they can be made part of the standard Java API.


  1. From wikipedia:

    In compiler theory, an intrinsic function is a function available for use in a given programming language whose implementation is handled specially by the compiler. Typically, it substitutes a sequence of automatically generated instructions for the original function call, similar to an inline function. Unlike an inline function though, the compiler has an intimate knowledge of the intrinsic function and can therefore better integrate it and optimize it for the situation. This is also called builtin function in many languages.

    For instance, calling System.identityHashCode is compiled to an intrinsic operation that just reads the hash code from the object header in a few native instructions, rather than to a method call.

  2. Locking in the JVM is complex, using both biased and thin locks, which I do not discuss in this post for the sake of simplicity. From the JVM documentation:

    -XX:+UseBiasedLocking Enables a technique for improving the performance of uncontended synchronization. An object is “biased” toward the thread which first acquires its monitor via a monitorenter bytecode or synchronized method invocation; subsequent monitor-related operations performed by that thread are relatively much faster on multiprocessor machines. Some applications with significant amounts of uncontended synchronization may attain significant speedups with this flag enabled; some applications with certain patterns of locking may see slowdowns, though attempts have been made to minimize the negative impact.

    Thin locks are explained in the paper Thin locks: featherweight Synchronization for Java

  3. To work with a different JVM, Rubah needs to be ported to use the memory layout of objects on that JVM. For instance, Rubah can be ported to the Jikes RVM , which has a similar object memory layout, but all the sun.misc.Unsafe operations have to be rewritten to use instead VM Magic.

    We argue that Rubah is portable between JVMs because it does not require any modification to the JVM itself. Other DSU systems for Java require a custom GC algorithm and are much harder to port to different JVMs.

  4. The only way to perform compare-and-swap in a Java program without using the unsafe API is to use class java.util.concurrent.AtomicReference ,which is internally implemented using the unsafe API.

  5. The steady-state overhead of the type-erasure AND hash-code field together was around 8%, and not 10% as the reader might expect. 2

Presenter software for PDF slideshows

24/Sep/2011

Hello!

I use Latex to write all my technical documents. I use it also to make slides that I show at my presentations. So, I end up with a PDF that contains the slides.

The problem

To show a PDF slideshow, we require some support from the PDF reader software. At the very minimum, it needs to support fullscreen, which the vast majority does. Some go a little further. For instance,Okular shows the current slide number and a visual representation of how many slides are still left.

But that’s it. If we take a look at other presentation software, such as as Libre Office’s Impress, we see that it takes advantage of a second monitor to show a presenter’s console that features: The current and next slide, the time elapsed and remaining, the notes that we wrote for each slide, and some other information.

Using two monitors to present your slides is a very common usage scenario (laptop connected to a overhead projector) and the presenter screen is really, really useful. I can’t stress that enough.

The (incomplete) solutions

Let’s take a look at what software is there available specifically for presenting PDF slides, splitted into three classes:

The first class of programs bring some “eye-candy” to your presentations. I have used impressive, and it works really well. It pre-loads all the slides, so changing slides is really fast. It also animates the slide transition. A fast crossfading between slides looks great, more elaborate transitions just looks like showing off. I have not tried any of the others, but I guess they all should do more or less the same.

Pdf-presenter-console and pdf-presenter support a presenter console. However, none has all the features that I value. One supports notes but does not support time information, the other supports time information but does not support notes.

My solution

I could not find any PDF presenter program with all the features that I would like to have. So I made my own!

I present you the brand new open-pdf-presenter. Version 0.1 was release yesterday! You can find on the wiki instructions about building and running it.

Here are some screenshots:

 

Currently, it displays a presenter console with the current slide, the next slide, the elapsed/remaining time, and how many slides are left. In the near future, it will support notes on a separate file and some other nice features from existing software:

It is open-source, so feel free to inspect and modify my code. It uses git as the revision control system. If you make an useful and working patch, just use git to format that patch and email it to me.

I welcome all feedback (bug reports, feature requests, patches, trolling, rantsdeath threats) so feel free to leave your own. I’ll keep you posted when I make another release.