I agree up to a point. Knowing for sure that you can "hook" into the conversation at the HTTP level is a powerful thing in some cases. It is directly analogous to hooking software interrupts on PCs, which was all the rage, oh, 20 years ago :-) (Anybody out there remember TSRs?)
However, the usefulness of such hooks is completely dependent on how thin or thick the client side of the conversation is. In the simplest case, the client doesn't hold any state and just relays user events to the server. Such apps were highly automatable in the TSR days and are highly automatable on the Web today.
But as soon as you put some sort of state-holding Turing machine on the client side... Simply put, you get a non-trivial client-side state vector which is not hookable. As soon as this happens, automation is in trouble.
A really simple example: consider a Web UI with two buttons, Debit and Credit, on the client side. The Debit button is only enabled at UI level if Credit has been pressed at least once. Now where does that logic, that "business rule", live? If it lives client side (highly likely), then the behaviour of the application and the semantics of the hookable HTTP events are dictated by a state machine you cannot see at the HTTP level.
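To make the Debit/Credit point concrete, here is a rough sketch in Python. Everything in it is made up for illustration: the `Server` and `Client` classes, the `credit_pressed` flag, and the `handle` method stand in for a real web app, and I am assuming the server applies whatever request reaches it. The thing to notice is that the rule never appears on the wire.

```python
from dataclasses import dataclass, field


@dataclass
class Server:
    """Hypothetical server that applies whatever request arrives."""
    balance: int = 0
    log: list = field(default_factory=list)

    def handle(self, action: str, amount: int) -> None:
        # The log is all an HTTP-level hook would ever get to observe.
        self.log.append((action, amount))
        self.balance += amount if action == "credit" else -amount


@dataclass
class Client:
    """Hypothetical thick client: the business rule lives here."""
    server: Server
    credit_pressed: bool = False  # client-side state, invisible at HTTP level

    def press_credit(self, amount: int) -> bool:
        self.credit_pressed = True
        self.server.handle("credit", amount)
        return True

    def press_debit(self, amount: int) -> bool:
        if not self.credit_pressed:
            return False  # button is disabled, so no request is sent at all
        self.server.handle("debit", amount)
        return True


# Driving the app through the UI: the rule is enforced.
server = Server()
ui = Client(server)
debit_before_credit = ui.press_debit(10)  # refused by the client
ui.press_credit(50)
debit_after_credit = ui.press_debit(10)   # now allowed

# Driving the app at the HTTP level: the rule is silently bypassed.
raw = Server()
raw.handle("debit", 10)
```

An automation script that only sees `server.log` has no way to learn why a debit was never sent before the first credit, and a script that talks to the server directly can send one that the UI would never have allowed.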
Daniel Fisher's comment to Jon's post gives a common example of the problem: forms that do not work unless the form data is processed by an onSubmit handler. There are really, really good UI/User Experience reasons for doing such things. Hence the problem :-/
Daniel also talks about driving the browser directly to avoid such problems. Again, this speaks to the heart of the problem. Will we end up having to drive client side browsers using OLE Automation or (shudder) event queue poking, in order to automate web applications?
This isn't a pleasant picture from my perspective.
It reminds me too much of my TSR days, when another favourite trick in automating applications was poking keys into the keyboard buffer using (if memory serves) INT 16H.