I think you're right. Word adds its own special sauce, and when you convert Word text into HTML (for InDesign, for example), it always, always results in layers of unnecessary crap (which I would call "code," except that I've been told that I use the term incorrectly; hence "crap"). Best to just build the HTML around the text (or, in your case, photos) yourself.
Not that I learned this the hard way, or anything.
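For what it's worth, here's a rough sketch of what "build the HTML around the text yourself" could look like if you'd rather script it than type tags by hand. It assumes you've already pasted the text out of Word as plain text, and the function name `lines_to_html` is just something I made up for illustration. It wraps each non-empty line in a plain `<p>` tag and nothing else:

```python
from html import escape


def lines_to_html(text):
    """Wrap each non-empty line of plain text in its own <p> tag.

    Hypothetical helper: assumes the input is plain text (already
    stripped of Word formatting), one paragraph per line.
    """
    paragraphs = []
    for line in text.splitlines():
        line = line.strip()
        if line:
            # escape() converts &, <, and > so stray characters
            # in the text can't break the markup
            paragraphs.append(f"<p>{escape(line)}</p>")
    return "\n".join(paragraphs)


if __name__ == "__main__":
    sample = "First paragraph.\n\nSecond paragraph, with an & in it."
    print(lines_to_html(sample))
```

You end up with only the tags you put there yourself, which is the whole point.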