Jimmy Zhang hits the nail on the head. The real issue is object allocation.
The real culprit for all the object allocation, ultimately, is XML's variable-width-everything : element types, attributes & text. This results in a boatload of discrete malloc() operations and a whole lot of "pointer pointing" to link parents to children, nodes to siblings etc. when the graph structure is read into memory. Memory managers hate lots-and-lots of little objects.
I'm fine with this. XML's variable-width-everything is the key feature as far as I am concerned. Not a bug. The 3 biggies to keep in mind in my experience are:
1 - Zip the XML for transmission and zip it for storage too if you like. Modern compressions algorithms are so beastly good that the can do better than you could yourself if you hand-crafted your own "efficient" storage/transmission format. Besides, you have better things to be doing than writing compressors/de-compressors and cunningly devilishy-difficult-to-debug-and-maintain custom notations. (Note that http groks gzip. I frequently encounter developers who don't know that.)
2 - Bend over backwards to avoid repeated loading of the XML from scratch -with all the malloc operations it entails. Don't fall for the common misconception that saving/transmitting a binary/marshaled/pickeled form will lead to fast re-loading. (These things end up calling malloc() too you know :-) Memory-based caches of "cooked" data structures are your friend.
3 - if you know for sure that every bit of every byte is precious bandwidth on the wire or on disk; and if you are a happy that this truly is the bottlekneck in your application, then perhaps XML is not right for you. But beware that CSV or JSON or any other format with unpredictable variabilities in "record" length will have the same malloc issue at the end of the day.
In a perfect world what would I do? I'd introduce a "profile" of XML 1.0 that allowed XML data to signal to XML parsers/processors key stats about the data such as maximum required #PCDATA node size, that sort of thing. It could be done with a PI or an attribute or a