When you use an ObjectOutputStream to write out lots of data, the data doesn't go away after
OOS.writeObject() returns - instead each object gets cached in a
handle map within the OOS itself, so that should it be told to write that same object instance again it can write a small handle reference instead (thus saving space in the stream and coping with circular references). Kewl, huh? Well, not entirely...
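Here's a minimal sketch of the handle map in action (the class names are mine, just for illustration): write the same instance twice and the second write is only a back-reference, so readObject() hands you back the very same object.

```java
import java.io.*;

// Toy demo: writing the same instance twice emits a handle reference the
// second time, and reading the stream back yields the identical instance.
public class HandleShareDemo {
    static class Point implements Serializable {
        int x, y;
        Point(int x, int y) { this.x = x; this.y = y; }
    }

    // Returns true when both reads yield the very same instance.
    public static boolean sharedOnReadBack() throws Exception {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        ObjectOutputStream oos = new ObjectOutputStream(bytes);
        Point p = new Point(1, 2);
        oos.writeObject(p);
        oos.writeObject(p);   // second write is just a handle reference
        oos.close();

        ObjectInputStream ois = new ObjectInputStream(
                new ByteArrayInputStream(bytes.toByteArray()));
        Object a = ois.readObject();
        Object b = ois.readObject();
        ois.close();
        return a == b;        // same handle -> same object identity
    }

    public static void main(String[] args) throws Exception {
        System.out.println("shared on read-back: " + sharedOnReadBack());
    }
}
```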
The downside to this approach is that if you're writing LOTS of objects (say 250,000), then even though you've designed your code to avoid holding all those objects in memory, every one of them will be held in memory until you're finished with the OOS entirely - because of that handle cache! (This explains why my simple little app eats up over 2GB of RAM even though it's supposed to be processing data objects one at a time.) The fix?
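You can see the retention directly with a WeakReference (a toy sketch of mine, not the app from this post): after writeObject() returns, the stream's handle table still holds a strong reference, so the object survives a GC.

```java
import java.io.*;
import java.lang.ref.WeakReference;

// Toy demo: even after writeObject() returns, the stream's handle table
// keeps a strong reference to the object, so the GC cannot reclaim it.
public class HandleRetentionDemo {
    static class Blob implements Serializable {
        byte[] payload = new byte[1024];
    }

    public static boolean stillReachableAfterWrite() throws Exception {
        ObjectOutputStream oos =
                new ObjectOutputStream(new ByteArrayOutputStream());
        Blob b = new Blob();
        WeakReference<Blob> ref = new WeakReference<>(b);
        oos.writeObject(b);
        b = null;                          // drop our own reference
        System.gc();                       // would clear ref if the blob were only weakly reachable
        boolean held = ref.get() != null;  // still pinned by the handle cache
        oos.close();
        return held;
    }

    public static void main(String[] args) throws Exception {
        System.out.println("held by handle cache: " + stillReachableAfterWrite());
    }
}
```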
Two came up, and I'm not 100% certain which is better all the time, so I'll discuss them both (in the order in which I thought of them). :)
Solution #1 - Unshared Externalizable/Serializable
There's a new interface in town,
java.io.Externalizable, which is the cousin of
java.io.Serializable, but with a few new quirks:
- First, unlike
Serializable,
Externalizable has methods (imagine that! methods in an interface!) -
writeExternal(ObjectOutput) and readExternal(ObjectInput). Typically the actual args are Object(Output|Input)Streams, which implement Object(Output|Input).
- The second (and rather weird) quirk is that your
Externalizable object must have a PUBLIC no-arg constructor. Now, considering that the ctor is invoked via reflection, I'm a little puzzled why it has to be public, but... whatever.
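Putting those two quirks together, an Externalizable DAO looks something like this (a sketch with made-up field names, not my actual DAO):

```java
import java.io.*;

// Sketch of a DAO implementing Externalizable. Note the mandatory
// PUBLIC no-arg constructor and the hand-written field I/O.
public class UserDao implements Externalizable {
    private String name;
    private int age;

    public UserDao() { }    // required: public no-arg ctor, called via reflection

    public UserDao(String name, int age) { this.name = name; this.age = age; }

    @Override
    public void writeExternal(ObjectOutput out) throws IOException {
        out.writeUTF(name);   // we decide exactly what gets written...
        out.writeInt(age);
    }

    @Override
    public void readExternal(ObjectInput in) throws IOException {
        name = in.readUTF();  // ...and must read it back in the same order
        age = in.readInt();
    }

    // Serialize and deserialize one DAO through a byte buffer.
    public static UserDao roundTrip(UserDao dao) throws Exception {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream oos = new ObjectOutputStream(bytes)) {
            oos.writeObject(dao);
        }
        try (ObjectInputStream ois = new ObjectInputStream(
                new ByteArrayInputStream(bytes.toByteArray()))) {
            return (UserDao) ois.readObject();
        }
    }

    public String getName() { return name; }
    public int getAge() { return age; }

    public static void main(String[] args) throws Exception {
        UserDao copy = roundTrip(new UserDao("alice", 30));
        System.out.println(copy.getName() + " " + copy.getAge());
    }
}
```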
So first you make your DAO implement
Externalizable rather than
Serializable (which is a parent of
Externalizable) and then you replace all your
ObjectOutputStream.writeObject() calls with
writeUnshared() (and on the read side
OIS.readUnshared()) which tells the streams to just ignore that whole handle map thing.
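A quick sketch of the unshared calls (toy class, not my DAO): unlike the writeObject() demo earlier, writing the same instance twice with writeUnshared() stores two full copies, and the reads come back as two distinct objects.

```java
import java.io.*;

// Toy demo: writeUnshared() skips the handle cache for the top-level
// object, so two writes of the same instance read back as two copies.
public class UnsharedDemo {
    static class Record implements Serializable {
        int id;
        Record(int id) { this.id = id; }
    }

    public static boolean distinctOnReadBack() throws Exception {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        ObjectOutputStream oos = new ObjectOutputStream(bytes);
        Record r = new Record(42);
        oos.writeUnshared(r);
        oos.writeUnshared(r);   // a full second copy, no back-reference
        oos.close();

        ObjectInputStream ois = new ObjectInputStream(
                new ByteArrayInputStream(bytes.toByteArray()));
        Record a = (Record) ois.readUnshared();
        Record b = (Record) ois.readUnshared();
        ois.close();
        return a != b && a.id == 42 && b.id == 42;   // equal data, distinct objects
    }

    public static void main(String[] args) throws Exception {
        System.out.println("distinct copies: " + distinctOnReadBack());
    }
}
```

Note that this is shallow: only the top-level object is unshared, which is exactly the catch discussed next.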
Now you might think just making your DAO unshared is enough, but NO! (Unless your DAO is the data itself, which is nearly impossible, since all DAOs are, at some level, composed of primitives, primitive wrappers, or arrays/collections of primitives and primitive wrappers.) So while your DAO itself might be unshared (not cached), its component parts are probably still shared (cached). That's why you have to implement the external reads and writes of those fields yourself. As for Externalizable vs.
Serializable - after a bit of testing I'd just go with
Serializable - no funky public ctor and no casting of
ObjectInput back to ObjectInputStream and so forth. Six of one, half a baker's dozen of the other... :)
Solution #2 - ObjectOutputStream with amnesia
There is a method on OOS called
reset(), which clears the handle cache and drops memory use to nearly nothing (beyond what you were using without the OOS). It also injects a TC_RESET marker into the output stream, which, in my DAO's case, increased the output file size by 50% (e.g., what was a 4.9M file became a 7.2M file). Basically, if you look in the serialized file you will see your DAOs' and component fields' class names repeated between every single instance (normally they're declared once at the top). So if the size of your serialized data is an issue, this might not be a great fix. But if RAM usage is a concern, this approach is huge: my app, which used to max out at over 400M (in testing) with regular serialization, dropped to 100M with unshared, but then dropped all the way to less than 1M of Java heap with
reset() between each
writeUnshared() (hard to say exactly 'cause there were so few Full GCs, but I'd guess it was about 400-500K). THAT's amazing (to me).
I didn't see any performance impact from calling
reset() between each
writeUnshared(); in fact, I'd say the
reset() version runs faster because it avoids all those Full GCs.
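Here's a sketch of the size trade-off (toy class, arbitrary counts): the same records are written with and without reset() between writes, and the reset stream comes out noticeably bigger because the class descriptors get re-emitted after every reset.

```java
import java.io.*;

// Toy demo: measure stream size with and without reset() between writes.
// reset() clears the handle cache (freeing the cached objects) but forces
// class descriptors to be re-written for every subsequent instance.
public class ResetDemo {
    static class Record implements Serializable {
        long value;
        Record(long value) { this.value = value; }
    }

    // Serializes `count` records, optionally calling reset() after each
    // write, and returns the resulting stream size in bytes.
    public static int streamSize(int count, boolean resetEachTime) throws Exception {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        ObjectOutputStream oos = new ObjectOutputStream(bytes);
        for (int i = 0; i < count; i++) {
            oos.writeUnshared(new Record(i));
            if (resetEachTime) {
                oos.reset();   // forget every handle; emits a TC_RESET marker
            }
        }
        oos.close();
        return bytes.size();
    }

    public static void main(String[] args) throws Exception {
        System.out.println("without reset: " + streamSize(100, false) + " bytes");
        System.out.println("with reset:    " + streamSize(100, true) + " bytes");
    }
}
```

On the read side nothing changes: readObject()/readUnshared() handle the TC_RESET markers transparently.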
Hope this helps folks as much as learning it helped me. :)