Meta has introduced Make-A-Scene, a multimodal generative AI method that lets people "describe and illustrate their vision" both in the physical world and in the metaverse. Many state-of-the-art AI systems generate images from text prompts, but their compositions are difficult to predict and control effectively. As a result, a creative vision is rarely fulfilled exactly: the generated image might face the wrong direction, or come out too small or too large. Make-A-Scene aims to solve this problem. By combining user-created sketches with text prompts, it enables the creation of images with greater specificity, giving users control over elements, forms, arrangements, depth, composition, and structure. In tests with human evaluators, images generated from both text and a sketch were almost always (99.54%) rated as better aligned with the original sketch, and often (66.3%) as better aligned with the text prompt.

"It's not enough for an AI system to just generate content, though. To realize AI's potential to push creative expression forward, people should be able to shape and control the content a system generates," Meta argued in a press release. This approach could allow users to create digital worlds in the metaverse and produce high-quality art regardless of their personal artistic skills.
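To make the idea of conditioning on both a sketch and a text prompt more concrete, the following is a minimal illustrative sketch in PyTorch, not Meta's actual implementation. It assumes, as one plausible design, that the text prompt and the user's sketch are each converted into discrete tokens and concatenated into a single sequence that a transformer attends over when predicting image tokens. All class names, vocabulary sizes, dimensions, and tokenizers here are hypothetical placeholders.

```python
# Illustrative only: NOT Meta's Make-A-Scene code. A toy model showing how
# image-token generation might be conditioned on both a text prompt and a
# user sketch. All vocab sizes, dimensions, and tokenizers are hypothetical.
import torch
import torch.nn as nn

class SketchAndTextConditionedModel(nn.Module):
    def __init__(self, text_vocab=1000, scene_vocab=256, image_vocab=1024, dim=128):
        super().__init__()
        self.text_emb = nn.Embedding(text_vocab, dim)
        self.scene_emb = nn.Embedding(scene_vocab, dim)   # tokens derived from the user's sketch/layout
        self.image_emb = nn.Embedding(image_vocab, dim)   # discrete image tokens (e.g., from a VQ codebook)
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=4, batch_first=True)
        self.backbone = nn.TransformerEncoder(layer, num_layers=2)
        self.to_logits = nn.Linear(dim, image_vocab)

    def forward(self, text_tokens, scene_tokens, image_tokens):
        # Concatenate text, sketch-derived, and image tokens into one sequence,
        # so the model attends jointly to both conditioning signals.
        seq = torch.cat([
            self.text_emb(text_tokens),
            self.scene_emb(scene_tokens),
            self.image_emb(image_tokens),
        ], dim=1)
        h = self.backbone(seq)
        # Predict logits over the image-token vocabulary at the image positions.
        return self.to_logits(h[:, -image_tokens.size(1):])

model = SketchAndTextConditionedModel()
text = torch.randint(0, 1000, (1, 16))   # tokenized prompt (hypothetical tokenizer)
scene = torch.randint(0, 256, (1, 64))   # tokenized sketch/segmentation (hypothetical)
image = torch.randint(0, 1024, (1, 64))  # image tokens generated so far
logits = model(text, scene, image)       # shape: (1, 64, 1024)
```

The design choice this toy example highlights is that the sketch is treated as a first-class conditioning input alongside the text, rather than as a post-hoc constraint, which is what would let a user's layout directly shape the generated composition.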