I see it says that it may throw bad_alloc, but it's not clear why, since the algorithm itself (e.g see "Possible implementation" below) can easily be done in-place.
I'm wondering if the bad_alloc might be because a single temporary element (of whatever type the iterators point to) is going to be needed to swap each pair of elements, or maybe to allow for an inefficient implementation that chose not to do it in-place?