Most of the technical reasons we didn't do this originally have since been overcome (e.g. handling a nested contenteditable=true region inside a contenteditable=false one, as we've since done for tables), and it would make for a more intuitive user experience.
We could either keep the media dialog as is, or get rid of the caption sub-surface and move everything onto one page.