convert everything to UTF-8, part 1: large groundwork