Seems to me that if I have 4000 1Hz-wide bins, I should sum them to give
me the total power in a single bin that
"represents" the same amount of bandwidth. But is it more subtle than
that?
As usual, yes and no.
If you're concerned with statistical hygiene, then the mean (averaging over multiple bins) is defensible. And if you wanted to add a dimension to each reduced output bin, like color, you might want to throw in the variance within each set of sub-bins contributing to the average.
The most robust estimator is probably the median, though -- the middle value when the sub-bins in each set are sorted, so a few outliers can't drag it around the way they drag the mean.
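A minimal NumPy sketch of these options, assuming linear power values (not dB) and a 4000-bin spectrum reduced to 40 display bins; the split and the random data are just placeholders:

```python
import numpy as np

# Placeholder data: 4000 1-Hz power bins, linear units (not dB).
rng = np.random.default_rng(0)
power = rng.exponential(scale=1.0, size=4000)

n_out = 40
groups = power.reshape(n_out, -1)   # 40 groups of 100 sub-bins each

total = groups.sum(axis=1)          # total power per display bin
mean = groups.mean(axis=1)          # averaged power, statistically defensible
var = groups.var(axis=1)            # spread within each group, could drive a color axis
med = np.median(groups, axis=1)     # robust central estimate per group
```

Note that summing and averaging differ only by the constant group size here, so they carry the same information; the choice mostly affects how you label the y-axis.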
However, it sounds like what you're going for is a kind of compression that's lossy but optimized for visual properties, not statistical robustness. That's usually highly non-linear and very subjective. In that case, why not just pick whatever looks good on your data? The algorithm that John Ackermann suggests is likely to be as good as anything else.
Frank