Imagine you’re cooking dinner. You pull out pots and pans, wooden spoons, spatulas, knives, cutting boards, whatever. As each item is used, it’s tossed in the sink to be dealt with later. As food is served, you pull out plates, bowls and platters, flatware, glasses, and so forth. When the meal is over, these too go into the sink. Finally, you wash the dishes, put everything away, and are ready to do it all again later.
This turns out to be an apt metaphor for the problem of using memory in a computer program. Every little task reserves a bit of memory for this or that, uses it, then tosses it into an autorelease pool. When everything is done, the event loop cycles, the autorelease pool is drained, and the memory is recovered to be used another time.
At any given moment, the amount of memory sitting around unavailable for use, like the dishes in the sink, is larger than the actual amount of memory being actively used. You can think of the pot full of boiling pasta or the plates bearing food as active memory, while the sum total of items being used, including those sitting around dirty, represents virtual memory.
What happens if we decide to throw a dinner party, requiring more cooking vessels and utensils, more courses, more plates and flatware? Even while the amount of active memory remains the same or slightly higher, the amount of virtual memory continues to grow and grow. Finally you reach for a pan or a plate and find there are simply no more available. The whole system grinds to a halt. You have just blown up virtual memory.
This temporary memory pressure is not quite the same thing as leaking memory. A true memory leak will not be resolved until your process quits. This is more like a short-term memory leak, but it can be just as deadly. Even if your application never reaches the 32-bit world’s hard limit of 4 GB, the more memory you’ve got allocated, the more likely you are to start paging memory, which will drag down not only your application but every other process on the system.
In other words, memory pressure is a performance issue, and like any performance issue, we can assume two things: 1) we don’t know where the problem is, 2) but Shark does. Of course, I say all this in hindsight. Finding myself stumped by this exact situation, I turned to the Shark team for help. What I discovered is that relieving memory pressure is easy, but not obvious. Here’s the part where the non-programmers out there can go read something else.
For the sake of this discussion, let’s say we have a particular operation. This can be an actual NSOperation, or it can be some other logical unit of work. The point is, it’s an infinitely replicable task that will suck up some amount of memory, do its thing, then release its memory. Do one such operation and you might not even notice the bump in memory. Do 10,000 and it’s another story altogether.
Speaking of NSOperation, don’t be fooled by its easygoing nature. You’re still responsible for the base-level autorelease pool in your operation’s main method, just as if you were splitting off a thread. This is an easy mistake to make, and a tricky bug to catch. Memory will be released upon the death of the thread, but due to thread-pooling, the thread might not die until it has run thousands of operations. It’s an implementation detail, but one you should keep in mind.
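To make that concrete, here is a minimal sketch of an operation that sets up its own base-level pool. The class name ThumbnailOperation and its two properties are hypothetical, invented purely for illustration. The sketch also uses the @autoreleasepool block syntax, which arrived in later compilers; with the tools of the Shark era you would spell the same thing out as an NSAutoreleasePool created at the top of the method and drained at the bottom.

```objc
#import <Foundation/Foundation.h>

// Hypothetical operation, used only to show where the pool goes.
@interface ThumbnailOperation : NSOperation
@property (nonatomic, assign) NSUInteger itemIndex;    // input (assumed)
@property (nonatomic, assign) NSUInteger resultLength; // output (assumed)
@end

@implementation ThumbnailOperation

- (void)main {
    // The operation's own base-level pool. Anything autoreleased inside
    // this block is released when the block exits, rather than whenever
    // the pooled worker thread eventually goes away.
    @autoreleasepool {
        if ([self isCancelled]) {
            return; // leaving the block by return still pops the pool
        }

        // Stand-in for real work: creates an autoreleased temporary.
        NSString *label = [NSString stringWithFormat:@"item-%lu",
                           (unsigned long)self.itemIndex];

        // Keep only a scalar result so nothing outlives the pool.
        self.resultLength = [label length];
    }
}

@end
```

The one design point worth noting is that the result leaving the pool is a plain integer. If an operation needs to hand back an object, that object has to be retained by something outside the pool before the block ends.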
Shark’s memory optimization is not unlike its CPU optimization. Instead of selecting “Time Profile” from the standard pop-up, select “Malloc Trace.” If you’re running mini-config (the standard) you’ll notice the options are slightly different. There’s no WTF mode, and there’s no sample interval. The lack of a sample interval makes sense, since an allocation is what it is; there’s no sampling involved. The lack of WTF mode is a bit less clear without realizing the central gotcha of Malloc Trace: it is tremendously expensive.
The mistake I would always make was running it for the standard 30 seconds. Memory allocation is an extremely quick operation, and a busy program performs an enormous number of them every second, so the amount of data generated in 30 allocation-intense seconds will take until the heat death of the universe to process, especially if the system is already paging memory. Unlike the carefree run-forever world of time profiling, you need your memory trace runs to be as short as possible. Anything more than a few seconds and you’re asking for pain.
That goes double (or, rather, square) if you’re running with “Record Only Active Blocks” checked. Not only did I not appreciate the processing time necessary to apply the active block option, I didn’t even know what it was, which is a shame, because it’s fantastic. In a nutshell, “Record Only Active Blocks” means “match malloc and free.” In other words, if I create an object and release it before the sample ends, it doesn’t show up at all, which makes finding leaks and pressure points so much easier.
A little Shark hint you might not be aware of: the option+esc keystroke to start and stop Shark is caught at a very high level, so you don’t have to actually switch to Shark to avail yourself of it. This makes it really easy to start the trace, perform the operation you’d like to record, and stop tracing when it’s done (or right before it’s done, if you want to see peak pressures).
After a few seconds of processing time, you’ll get a standard Shark sample. You can use heavy, tree, or combo mode, as well as data mining and charting to figure out where the pressure is coming from. A well-placed autorelease pool, and suddenly performance increases across the board. Unlike CPU optimizations, however, you’re not just helping your own application, you’re being a good citizen of the system.
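What a well-placed pool looks like depends entirely on what the trace shows, but the most common shape is a pool wrapped around the body of a hot loop. The following is only a sketch under that assumption. ProcessItem and ProcessAllItems are hypothetical names standing in for whatever unit of work the profile points at, and the @autoreleasepool syntax is the later shorthand for creating an NSAutoreleasePool and draining it.

```objc
#import <Foundation/Foundation.h>

// Hypothetical unit of work: creates a couple of autoreleased
// temporaries and returns a small scalar result.
static NSUInteger ProcessItem(NSUInteger index) {
    NSString *name = [NSString stringWithFormat:@"record-%lu",
                      (unsigned long)index];
    NSArray *parts = [name componentsSeparatedByString:@"-"];
    return [parts count];
}

// Runs the unit of work `count` times. The pool inside the loop means
// each iteration's temporaries are released before the next iteration
// begins, instead of piling up until the event loop cycles.
static NSUInteger ProcessAllItems(NSUInteger count) {
    NSUInteger total = 0;
    for (NSUInteger i = 0; i < count; i++) {
        @autoreleasepool {
            total += ProcessItem(i);
        }
    }
    return total;
}
```

Without the inner pool the function returns the same answer. The difference only shows up in how much memory is sitting in the sink at the peak, which is exactly what a trace stopped just before the operation finishes would reveal.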
This all raises the question of whether autorelease pools should be considered an optimization in the first place. Couldn’t you just make it a habit to drop one in whenever you have a big loop, or never call autorelease, instead opting for manual alloc and release? The trouble with that strategy is the same as the trouble with all premature optimizations.
First, you simply can’t know what’s an expensive operation. One little API call might trigger a flurry of autoreleased memory allocation in the most unexpected of places. Second, ease and readability should always be your primary concerns. Autoreleased memory is memory that will never leak, and a hell of a lot easier to keep an eye on than trying to go through matching every alloc to its release.
On the flip side of that, draining an autorelease pool is not free. Indeed, autorelease pools, like kitchen sinks, are real damned-if-you-do, damned-if-you-don’t situations. Drain it too early and you’re wasting cycles. Drain it too late and it becomes inexplicably slow. I don’t know the ideal size of an autorelease pool. Do you? Luckily, Shark means never having to guess.
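One way to treat pool size as something to measure rather than guess is to make it a parameter. This sketch assumes a hypothetical function, ProcessInBatches, whose batchSize argument controls how many iterations share a single pool. The name, the parameter, and the stand-in work inside the loop are all invented for illustration. The @autoreleasepool syntax is the later shorthand for an NSAutoreleasePool that is drained at the end of the block.

```objc
#import <Foundation/Foundation.h>

// Processes `count` items, draining one pool per `batchSize` iterations.
// A batchSize of 1 drains after every iteration; a batchSize equal to
// `count` behaves like a single pool around the whole loop.
static NSUInteger ProcessInBatches(NSUInteger count, NSUInteger batchSize) {
    if (batchSize == 0) {
        batchSize = 1; // guard: a zero step would never advance the loop
    }

    NSUInteger processed = 0;
    for (NSUInteger start = 0; start < count; start += batchSize) {
        @autoreleasepool {
            NSUInteger end = MIN(start + batchSize, count);
            for (NSUInteger i = start; i < end; i++) {
                // Stand-in for real work: one autoreleased temporary.
                NSString *scratch = [NSString stringWithFormat:@"%lu",
                                     (unsigned long)i];
                if ([scratch length] > 0) {
                    processed++;
                }
            }
        } // everything autoreleased in this batch is released here
    }
    return processed;
}
```

With the batch size exposed like this, you can run the same operation under a few different values and let the trace, not intuition, pick the winner.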
A couple of other minor points: One nicety of Shark is you can start sampling at any point in your program. You don’t have to attach at launch like you do with Instruments. Given the expense of allocation profiling, that’s a very nice feature. Also, when looking for memory pressure problems, run through your operation one time before you run Shark. This will initialize any statics that might be hanging around so they don’t distract you from the results.
Finally, I would be remiss if I didn’t point out that this is exactly the kind of crap garbage collection is meant to save us from. Still, I’d be willing to bet even under GC, Shark could help you figure out where to position your collector hints.
Addenda
Ian Baird
Even with GC, depending on your algorithm you might need to hint the collector every once in a while to “collect if needed.” This is done either with NSGarbageCollector’s collectIfNeeded method or with NSAutoreleasePool’s drain method. You’re right, Shark’s probably good for this as well.
Chris Suter
I’ve never had a problem with releasing an autorelease pool too early. In loops where an autorelease pool is necessary, I now release it after every iteration of the loop. That way, problems show up immediately. I’ve had loops that only release the autorelease pool every 1000, say, iterations and that has masked a problem where I was releasing an object I didn’t want to.
I really doubt that releasing your autorelease pool every iteration is an issue. Whilst it’s not a free operation, it’s pretty close to being free in my experience, so I kind of disagree with what you say about it.
Mike Lee
You’re right, autorelease pools are not so expensive you’d worry about them. Certainly premature optimization is just as evil in that sense as any other.
A bigger concern is lines of code. I wouldn’t want to just add an autorelease pool to every loop as a matter of course*. What if you had a nice single-line loop? Adding an autorelease pool not only triples the lines of code, it also necessitates adding braces, making the code that much longer and that much harder to understand. It’s a little thing, but little things add up.
If only for elegance, the proper number of autorelease pools is “just the right amount.” Too few and you run into memory pressure. Too many and you’ve got unnecessary lines of code.
The ultimate lesson here is that autorelease pools are an optimization, and any optimization should be applied as a result of profiling, not intuition.
* Not to misquote you, of course. You clearly said, “in loops where an autorelease pool is necessary.” Once necessity is proved, I too would drain after every iteration, since it would require more logic to release every n iterations, and I’ve never been so hard up for CPU to make that level of optimization necessary.
Andre
That’s a great metaphor for memory management.
And those are some good tips for Shark. Didn’t know about option-esc...