Saturday 23 February 2013

How Will Oracle Monetise Java?

Jane Seymour - maybe she can tell Java's future?
This image is Creative Commons.
Sun made money from Java by accident - Oracle is much more systematic than that.

Sun made money from Java because people used Java, and if you were running Java then Solaris was the way to go - yes? That made sense then; it does not now. This is for many reasons, not least of which is Linux; the key reason, however, is that Oracle expects each business unit to stand on its own feet (or at least that is what an ex senior Oracle manager once told me).

 Oracle has End Of Life'd Java 6 - all support and patches must now be paid for. 

How can Java make money for Oracle? 
"They give it away for free' surely that is no way to run a business". Some people feared that Oracle would stop giving it away for free - but that is not their game at all (though I will go back to this later). The are acting much more like Dr. Kananga by giving the product away, destroying the competition and then charging for it. 

Does Oracle charge for Java? Yes - rather a lot by all accounts. Now that they have managed to get Java security front and center, with thousands of free column inches and countless Reddit rants discussing it FOR FREE, they have created a license to print money.

Oracle is running a protection racket - but no-one minds!

Don't get me wrong, I am not dissing Oracle here. If you believe doctors should work for free and the common good, then you can go die somewhere else. Equally, if you think Oracle should support your software forever, for free, just go look at what happened to Sun. Nope - now that support is center stage as a Java must-have, why not use it to fund the whole project?

Customers who do not upgrade need to pay maintenance for security patches, simples.

It gets better though. There is a positive feedback loop which makes Java better.

Oracle's plan is to use support revenue to fund the whole Java project and so make Java better - quite clever really.

The plan is simple:

  1. Make sure customers feel worried about not having support, especially security patches.
  2. Release a new version of Java every year.
  3. End Of Life older versions after the new version has been out for a year. This means - no free security or performance patches.
  4. Then, if you do not upgrade your version of Java every year, you will need to pay for support - and having support is the only sensible way to go, security-wise.
  5. If you are a large organisation, upgrading Java is actually very expensive. Much better to pay Oracle and not have to do it. If it works - don't fix it.
Hey, Oracle have realised that fear is the ultimate method of control. From control comes compliance and obedience. Much as the 'terrorist threat' has been used by governments the world over to strip agency from the populace and erode democratic freedoms, Oracle is using fear to drive commercial Java users to their checkbooks like so many willing sheep to the slaughter.

Does it matter?

No - it is brilliant! Oracle could have started to charge for Java. That would have caused the whole ecosystem to collapse like a house of cards in a stiff breeze. This approach is so much better.
  1. It funds Java development.
  2. It gives Oracle a great reason to release new versions every year; they need the revenue that brings in to fund making Java better.
  3. It only targets large organisations which are reluctant (for good reasons) to keep upgrading. Hey - they should be getting proper support anyhow. Would you want your bank account to run on an unsupported platform?
  4. It makes Java stand on its own as a capitalist project rather than some waffly 'community project for the greater good' nonsense.

Where does this leave Oracle's Java tooling?
Selling seat licenses for development tools has not been a good money spinner for a while now. The real place to make money is runtime licensing and maintenance.

This might explain why Oracle seem to be struggling to port JRockit Mission Control to HotSpot. It is still languishing in the Java 6 world of JRockit. They cannot make money out of it and it does not form part of a new Java version. No reason to spend money on it.

Similarly, NetBeans is not likely to move very far very fast either. Yes, Oracle need to keep it alive because otherwise Eclipse would have too much power over the Java ecosystem; but when embedding Chrome into an IDE is news, one can tell the IDE has little new to offer.

Conclusions:
Oracle have learned from MySQL how to monetise a product they give away. MySQL was a low end product; Java is a high end product, equivalent to COBOL in many ways, sitting in the beating heart of the world's largest banks and companies. Applying the same techniques to Java will be a much better way to make money than selling addictive drugs to poor Americans in the 1970s.



Saturday 16 February 2013

Automated Large Object Swapping Part II


 

Very Large Breasts
Very large objects can be hard to contain.
This image is public domain.
Here I carry on from Part I to discuss fast serialisation.

Note - this is a cross post from Nerds-Central

Writing large objects to disk takes a lot of time, so we might think that efficient serialisation is not required. However, this does not seem to be the case in practice. Java does not allow us to treat a block of memory which stores an array of floats (or any other non-byte type) as though it were a block of bytes. Java's read and write routines use byte arrays, which results in copying to and from byte arrays. The approach I took was to move the contents of Sonic Field's float arrays into a byte array as quickly as possible and then set the references to the float array to null, making it eligible for garbage collection.

Now - it is possible (I did try it) to serialise the data from the large objects to the swap file 'in place' and avoid the byte array intermediate. However, methods like RandomAccessFile.writeFloat and DataOutputStream.writeFloat (even ObjectOutputStream) are so slow that on a really fast drive the CPU becomes the limiting factor. This is really a consequence of drives becoming so very fast these days. The SSD in my MacBook Pro Retina can easily write 300 megabytes per second, and that is not even especially quick by modern standards. Calling DataOutputStream.writeFloat (with a buffering system between it and the drive) takes around a third of a CPU core to write out at 80 megabytes per second to my external USB3 drive. So, if I were to use the SSD in my machine for swap (which I don't, as that is just SSD abuse), the CPU would be the limiting factor.
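
For a sense of what that slow path looks like, here is a minimal sketch of per-float writing through DataOutputStream (the method and file name are purely illustrative, not Sonic Field code):

import java.io.BufferedOutputStream;
import java.io.DataOutputStream;
import java.io.FileOutputStream;
import java.io.IOException;

// Naive per-float serialisation: each writeFloat call converts and writes
// the value a few bytes at a time, which is what eats the CPU once the
// drive itself is no longer the bottleneck.
static void writeFloatsSlowly(String fileName, float[] data) throws IOException
{
    try (DataOutputStream out = new DataOutputStream(
            new BufferedOutputStream(new FileOutputStream(fileName))))
    {
        for (float f : data)
        {
            out.writeFloat(f);
        }
    }
}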

We need much faster serialisation than Java provides by default!

What is the fastest way to move several megabytes of float data into a byte array? The fastest way I have found is to use the (slightly naughty) sun.misc.Unsafe class. Here is an example piece of code:

setUpUnsafe();
payload = allocateBytes(getReference().getLength() * 4);
unsafe.copyMemory(
    getReference().getDataInternalOnly(), 
    floatArrayOffset, 
    payload, 
    byteArrayOffset,
    payload.length
);

What copyMemory is doing is copying bytes - raw memory with no type information - from one place to another. The first argument is the float array, the second is the position within the in-memory layout of a float[] class where the float data sits. The third is a byte array and the fourth the data offset within a byte array class. The final argument is the number of bytes to copy. The Unsafe class itself works out all the tricky stuff like memory pinning; so from the Java point of view, the raw data in the float array just turns up in the byte array very, very quickly indeed.

It is worth noting that this is nothing like using byte buffers to move the information over. There is no attempt to change endianness or any other bit twiddling; this is just raw memory copying. Do not expect this sort of trick to work if the resulting byte array is going to be serialised and read into a different architecture (x86 to Itanium for example).
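
If the byte array did need to survive a move between architectures, the conventional route would be something like the following sketch using java.nio, which fixes the byte order explicitly (and is correspondingly slower):

import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Portable (but slower) conversion: an explicit byte order means the
// result can be read back on a machine of different endianness.
static byte[] floatsToBytesPortable(float[] data)
{
    ByteBuffer buffer = ByteBuffer.allocate(data.length * 4).order(ByteOrder.BIG_ENDIAN);
    buffer.asFloatBuffer().put(data);
    return buffer.array();
}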

In Sonic Field, the byte array thus loaded with float data is stored in a RandomAccessFile:

ra.seek(position);
ra.writeInt(payload.length);
ra.writeLong(uniqueId);
ra.write(payload);
ra.writeLong(uniqueId);

Performing this call on my external drive at around 80 megabytes per second uses about 4 percent of one core. Scaled up, that is roughly 2 gigabytes per second before a single core saturates - which is more like it!

Reading the data back in is just the reverse.


ra.seek(position);
int payloadLen = ra.readInt();
long unid = ra.readLong();
if (unid != this.uniqueId) throw new IOException(/* message goes here*/);
payload = allocateBytes(payloadLen);
if (ra.read(payload) != payloadLen)
{
    throw new IOException(/* Message Goes Here */);
}
unid = ra.readLong();
if (unid != this.uniqueId) throw new IOException(/* message goes here*/);
ret = SFData.build(payload.length / 4);
unsafe.copyMemory(
    payload,
    byteArrayOffset,
    ret.getDataInternalOnly(),
    floatArrayOffset,
    payload.length
);

Note that the length of the data is recorded as an integer before the actual data block. I record the unique ID for the object the data came from both before and after the serialised data. This is a safeguard against corruption or algorithm failure elsewhere in the memory manager.

Setting Up Unsafe
Unsafe is not that easy to get hold of: Unsafe.getUnsafe() refuses to hand the instance to ordinary application code, so a little reflection is needed. Here is the code I use:
// Grab the singleton instance via reflection, since getUnsafe() will not give it to us.
java.lang.reflect.Field theUnsafeInstance = Unsafe.class.getDeclaredField("theUnsafe"); //$NON-NLS-1$
theUnsafeInstance.setAccessible(true);
Unsafe unsafe = (Unsafe) theUnsafeInstance.get(Unsafe.class);

Also we need to get those offsets within classes:
// Lazy - eventually consistent initialization
    private static void setUpUnsafe()
    {
        if (byteArrayOffset == 0)
        {
            byteArrayOffset = unsafe.arrayBaseOffset(byte[].class);
        }
        if (longArrayOffset == 0)
        {
            longArrayOffset = unsafe.arrayBaseOffset(long[].class);
        }
        if (floatArrayOffset == 0)
        {
            floatArrayOffset = unsafe.arrayBaseOffset(float[].class);
        }
    }

Note that all code from Sonic Field (including all the code on this page) is AGPL3.0 licensed.

Automated Large Object Swapping Part I


 

I don't like the title - says a bird
I don't like that title!
Creative Commons - see here
I hate that title! But it does describe what has been keeping me up nights for a week now. The challenge - make swapping large objects to disk completely automatic, not use too much disk and be efficient.


I have a system which hits these goals now; it is far too complex for a single Nerds Central post so I will discuss it over a few posts. This is an introduction and a description of the algorithms used.

The challenge comes when Sonic Field is performing very large renders. However, I believe the solution is general purpose. 

The particular piece which caused all the trouble was 'Further Into The Caverns'. You see, I have made a new memory management system to deal with vocoding. The idea I used for that was to keep track of all the SFData objects (audio data, in effect) via weak references. When the total number of live SFData objects went above a particular level, some were written to disk. This simple approach worked on the idea that 'disks are big and cheap, so we will just keep adding to the end of the file'. That worked fine for small renders with big memory requirements, but for long running renders it was not so good. I would have needed over a terabyte of storage for Further Into The Caverns.

Whilst all this is about audio in my case, I suspect the problem is a more general one. I suspect that any large scale computation could hit similar issues and may well benefit from my work. Here is a general description of the challenge.

  1. A program is performing very large computations on very large (multi-megabyte) in memory objects.
  2. Reading and writing objects from disk is very slow compared to storing in memory so should be avoided.
  3. Memory is insufficient to store the peak level of live large objects the program uses.
  4. Disk is large, but the amount of data used by the system will consume all the disk if disk space is not re-used.
  5. The program/system should handle the memory management automatically.
  6. Using normal operating system supplied swap technology does not work well.
This latter one is something I tend to see a lot with Java. Because the JVM is one process and it has a huge heap which has objects constantly being moved around in it, regular OS style swapping just cannot cope. A JVM has to fit in RAM or it will thrash the machine.

Trial And Error
I wish I could say I designed this system from scratch and it worked first time. The reality is that many different ideas and designs went by the wayside before I got one which worked well. I will not bore you with all the failures; it is sufficient to say that the approach I took is born of hard kicks and a lot of stress testing.

Object Life Cycle
Sonic Field's model has a particular object life cycle which makes it possible to perform the disk swap technique without a significant performance impact where swapping is not occurring. This is achieved by wrapping large objects in memory manager objects when they are not actively involved in an algorithm's tight loop. Thus the overhead of the extra indirection (of the memory manager) is not incurred inside tight loops:
  1. Creation
  2. Filling with data
  3. Wrapping in memory manager object
  4. In RAM Storage
  5. Retrieval
  6. Unwrapping - retain wrapper
  7. Algorithmic manipulation (read only)
  8. Return to 3 or
  9. Garbage collected
  10. Wrapper garbage collected
The key is that large objects are unwrapped when in use. However, when not actively being used in an algorithm, they are referenced only from the wrapper. The memory manager can store the large object on disk and replace the reference in the wrapper with information on where the object is stored on disk. When the object is requested for an algorithm again, it can be retrieved from disk.
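
To make the idea concrete, here is a minimal sketch of such a wrapper. The class and method names are hypothetical, not Sonic Field's real API; the actual file format is the subject of Part II:

// Hypothetical memory-manager wrapper: it holds either the live float data
// or the position of that data in the swap file, never both.
abstract class SwappableAudio
{
    private float[] data;         // null once the data has been swapped out
    private long    filePosition; // only meaningful while data == null

    // Hand the algorithm the raw array so tight loops pay no indirection cost.
    synchronized float[] unwrap() throws java.io.IOException
    {
        if (data == null)
        {
            data = readFromSwapFile(filePosition);
        }
        return data;
    }

    // Called by the memory manager when the heap is getting tight.
    synchronized void swapOut() throws java.io.IOException
    {
        if (data != null)
        {
            filePosition = writeToSwapFile(data);
            data = null; // the large array is now eligible for garbage collection
        }
    }

    abstract long writeToSwapFile(float[] payload) throws java.io.IOException;
    abstract float[] readFromSwapFile(long position) throws java.io.IOException;
}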

When To Store
This was a tricky one! It turned out that the difference between the maximum heap and the available heap (as reported by the Java Runtime object) is the best way of triggering the swapping out of objects. Just before a new large object is allocated, the memory manager assesses whether the maximum heap, less the current heap use, less the (approximate) size of the object, is below a threshold. If it is, all currently in-memory large objects are scheduled for swapping out. The details of this scheduling are complex and I will cover them in another post. There are other points at which the threshold is checked as well, though the pre-allocation one is the most important.
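
A minimal sketch of that pre-allocation check using the standard Runtime calls (the threshold figure and method name here are made up for illustration):

// Rough headroom check made just before allocating a large object.
static boolean shouldSwapOut(long bytesWanted)
{
    Runtime rt = Runtime.getRuntime();
    long used      = rt.totalMemory() - rt.freeMemory();  // heap currently in use
    long headroom  = rt.maxMemory() - used - bytesWanted; // what would be left afterwards
    long threshold = 256L * 1024 * 1024;                  // e.g. keep 256 MB spare
    return headroom < threshold;
}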

Cute Woman Posing In Torn Top
I'm all sort of torn up and fragmentary.
Creative Commons - See Here
Non Fragmenting Swap File Algorithm
All this sounds fine, but it really is not enough. A simple application of this idea was more than capable of consuming an entire 1 terabyte disk! The problem is fragmentation (if you are interested in the subject, here is a good place to start looking).

Let me explain in a little bit more detail:
Consider that I write out ten 1 gigabyte objects. Now my swap file contains 10 slots, each of which is approximately 1 gig long (there are a few header bytes as well). Now - the JVM garbage collector may eventually collect the wrapper object which contains the information about one of those objects. The memory manager detects this when a weak reference to that wrapper returns null from its .get() method. OK! Now we have a slot that is free to be used again.

Unfortunately, this 1 gigabyte slot might well become filled with a 10 megabyte object. We could then create a new slot out of the remaining 990 megabytes, but we have now made sure that if we were to need another full gigabyte it would have to go on the end of the swap file, thus increasing swap usage.

It turns out that this process of fragmentation just keeps happening. Rather than (as I had originally thought) levelling off so that the swap file grows asymptotically to a maximum, the reality is that the file just grows and grows until whatever medium it is on fills up. Clearly a better algorithm is required.

The next obvious step is slot merging, whereby if two slots next to each other are found to be empty (their weak references to the file info objects have cleared) they can be merged to create a bigger slot. There is no doubt this helped, but not enough to run Further Into The Caverns in less than 250G of file space. I became ambitious and wanted to run that render on my MacBook without an external drive, which meant getting the swap down below 100G or so (it has a 250G SSD, and it is not good for SSDs to be filled completely).
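
A sketch of that adjacent-slot merge, over a deliberately simplified slot list (the Slot class here is illustrative only, standing in for the real file info objects):

import java.util.List;

// Illustrative slot record: a position and length within the swap file,
// plus whether the slot is free for reuse.
final class Slot
{
    long position;
    long length;
    boolean free;
}

// One pass over slots sorted by file position, merging each run of free
// slots into a single larger one.
static void mergeAdjacentFreeSlots(List<Slot> slots)
{
    for (int i = 0; i < slots.size() - 1;)
    {
        Slot current = slots.get(i);
        Slot next = slots.get(i + 1);
        if (current.free && next.free)
        {
            current.length += next.length; // grow the first free slot
            slots.remove(i + 1);           // and drop the one merged into it
        }
        else
        {
            ++i;
        }
    }
}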

So, an even better algorithm was still needed. I could have used something like the buddy algorithm (linked above) or something based on the idea that Sonic Field objects come in similar sizes. However, thanks to the point of indirection between wrapper objects and the swap file, a near perfect option exists.

Compaction - Simple and Effective
A rubbish compacting lorry
Squash it, move it, dump it.
Creative Commons see here.
Yep - garbage is better off squashed - makes it easier to deal with and move around.

Let us think of the swap file as a set of slots; those with stuff in I will represent as [*] and empty ones (i.e. space which can be reused) as [ ].

[*][ ][*][ ][ ][*]
 1  2  3  4  5  6

If we swap 3 and 2 we get:

[*][*][ ][ ][ ][*]
 1  3  2  4  5  6


Now we can merge 2, 4 and 5:

[*][*][       ][*]
 1  3  2  4  5  6


Finally, we swap the new large empty slot with 6:

[*][*][*][       ]
 1  3  6  2  4  5 


With a single pass of the swap file we have moved all the empty space to one large slot at the end. Next time something is written to the file the big slot at the end can be split into two.
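
Re-using the illustrative Slot list from the merge sketch above, a single compaction pass might look like this (again just a sketch, not the real Sonic Field code; copySlotData stands in for the actual disk copy):

// Single-pass compaction: slide each occupied slot down over the free space
// before it, then leave one large free slot at the end of the file.
static void compact(List<Slot> slots)
{
    long writePosition = 0;
    long totalLength = 0;
    for (Slot slot : slots)
    {
        totalLength += slot.length;
        if (!slot.free)
        {
            if (slot.position != writePosition)
            {
                copySlotData(slot.position, writePosition, slot.length); // the expensive bit
                slot.position = writePosition;
            }
            writePosition += slot.length;
        }
    }
    // Drop the old free slots and add a single big one at the end.
    for (java.util.Iterator<Slot> it = slots.iterator(); it.hasNext();)
    {
        if (it.next().free) it.remove();
    }
    Slot tail = new Slot();
    tail.position = writePosition;
    tail.length = totalLength - writePosition;
    tail.free = true;
    slots.add(tail);
}

// Stands in for copying length bytes of slot data from one file position to another.
static void copySlotData(long from, long to, long length)
{
    // disk I/O elided in this sketch
}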

The final result
Moving slots is an expensive operation because it means copying data from one part of the disk to another. As a result, running the full compaction algorithm every time we need to find some space in the swap file is not such a good idea. It turns out that merging adjacent free slots, and splitting slots when not all of one is used, can keep the swap file from growing for a while. Eventually, though, it gets badly fragmented and starts growing quickly. The compromise I have found works OK (but I am sure could be tweaked further) is to use the fast merge/split approach most of the time but, at random intervals which average one in every 100 times the memory manager looks for space, to perform a full compaction. I went for random intervals to ensure that no cycles develop between loops in a patch and the memory algorithms.
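
In code that policy is tiny - something like this sketch, with the one-in-100 figure taken from the paragraph above and the method names borrowed from the earlier sketches:

// Called whenever the memory manager needs space in the swap file:
// a cheap merge most of the time, a full compaction about 1 time in 100,
// at a random interval so no cycle with a patch's own loops can develop.
static final java.util.Random COMPACTION_DICE = new java.util.Random();

static void reclaimSpace(java.util.List<Slot> slots)
{
    if (COMPACTION_DICE.nextInt(100) == 0)
    {
        compact(slots);                // full single-pass compaction
    }
    else
    {
        mergeAdjacentFreeSlots(slots); // quick adjacent-slot merge
    }
}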

Did it work? Yes - Further Into The Caverns renders with just 5.8 gigabytes of swap file!

Sonic Field Memory Manager

My Mac using an external hard drive for swap space
during a large render

Sonic Field uses a lot of memory. Simple approaches to memory management and swapping data to disk reached a point late last year where they were no longer good enough for me - now a completely new system is in place.

The old system relied on operators in a patch telling Sonic Field when to swap audio signals out to disk. This was a pain to use and meant that some opportunities for swapping were missed. More complex patches using more resources meant that this simple approach was no longer cutting it for Sonic Field. Unfortunately, automating something like memory management is very challenging - in some ways more so in a Java based system than in a more traditional C++ based system, because of the automated memory system of the JVM.

Now SFData objects (those which store audio signals) are made available for swapping out to disk as soon as they emerge from a processor. The memory manager is hooked into the main interpreter loop of SFPL:

    /*
     * (non-Javadoc)
     * @see com.nerdscentral.sfpl.SFPL_Runnable#Invoke(java.lang.Object, com.nerdscentral.sfpl.SFPL_Context)
     */
    @Override
    public Object Invoke(final Object input, final SFPL_Context context) throws SFPL_RuntimeException
    {
        Object ret = input;
        int i = 0;
        // This implementation might seem a bit clunky but it works out
        // a bit faster than using a ForEach over the array its self
        // context_ = context;
        long[] counter = new long[this.chainEnd + 1];
        try
        {
            for (i = 0; i <= this.chainEnd; ++i)
            {
                long t0 = System.nanoTime();
                ret = this.interpArray_[i].Interpret(ret, context);
                if (ret instanceof SFData)
                {
                    ret = SFTempDataFile.registerForSwap((SFData) ret);
                }
                long t1 = System.nanoTime();
                counter[i] += (t1 - t0);
            }
        }
        catch (final SFPL_StopException e)
        {
            if (e.rethrow())
            {
                throw e;
            }
            return e.getOperand();
        }
        catch (final Throwable e)
        {
            final Object op = ret == null ? Messages.getString("cSFPL_Runner.0") : ret;   //$NON-NLS-1$
            throw new SFPL_RuntimeException(this.objChain.get(i).obj, op, this.objChain.get(i).line, this.objChain.get(i).colm,
                            this.objChain.get(i).file, e);
        }
        for (i = 0; i <= this.chainEnd; ++i)
        {
            timer.addNanoCount(this.interpArray_[i].Word(), counter[i]);
        }
        return ret;
    }

When the memory manager detects that the JVM (Java Virtual Machine) is running low on memory, it starts to write the SFData objects' audio data to disk. 

The up side is that this approach allows Sonic Field to perform very much bigger calculations. The down side is that some tweaking of launch parameters is required by the user. I hope to make the default/automatic parameters good enough for nearly all conditions, but that will take a bit more work.