they are multimodal (able to handle text, images, and other inputs), have become more context-aware, and are capable of handling more complex tasks. For image generators, that meant they could process ...