The next creative suite
Technology breakthroughs are followed by breakout product companies. One of the most exciting technology shifts right now is the rapid improvement in neural networks. Specifically, models like DALL·E are quickly becoming good at generating images from text. You enter a prompt for anything imaginable, and within seconds you see AI-generated images.
Image synthesis
Tools like Figma have sidebars filled with property controls (sizing, positioning, alignment, fonts, colors, etc.). Completing any complex task means performing a long sequence of steps. But with natural language input and powerful image synthesis, that need not be the case. Imagine opening a Figma file and entering commands like these:
Recreate the stripe.com/payments page and replace all logos with the selected image.
For the selected frame that shows a web view, create an equivalent mobile page. Show me 5 variants.
Generate an illustration that matches the selected text. Ensure it's consistent with illustrations A and B.
These capabilities wouldn't just provide 10% time savings. They could turn a 100-hour project into a 10-minute one. And of course a shift in capabilities of this magnitude opens the door for startups that reimagine how creative tools should work.
Video synthesis
Powerful image synthesis will precede powerful video synthesis because video is more complex. But eventually similar capabilities will exist for video. Writers will be able to write stories and see them come to life in video. Initially the technique will only work for simple stories (maybe children's books will be the first application!), but eventually we will have movie-quality video generated from text instructions. Imagine the possibilities:
Creating characters
You'll be able to describe a character in text – how they look, how they speak, their accent – and see that character come to life in video. You'll even be able to upload a picture of your own face and make yourself the main character of your movie.
Generating speech
You'll be able to input text dialogue and watch your character say those words, with mouth movements that precisely match the sounds and intonation of the speech.
High-abstraction controls
Imagine a video editor where instead of adjusting the brightness, you can adjust the personality of characters or the weather in the background.
Infinite paths
As the cost of video synthesis approaches zero, it will become much easier to create choose-your-own-adventure games and videos.
The future
We shape our tools, and thereafter our tools shape us.
- John Culkin
GPT-3 and similar models are powerful enabling technologies. So far, most products built on top of them do text synthesis – GitHub Copilot and Copy.ai, for example. Many more are coming, and some will attempt to build the next creative suite.
These new creative tools will lead to a new branch of art. If you are interested in this space, I highly recommend this blog post to see examples of what is already possible. I'll leave you with a quote from it that summarizes what creativity might look like in this new world.
And despite the fact that the model does most of the work in actually generating the image, I still feel creative – I feel like an artist – when working with these models. There’s a real element of creativity to figuring out what to prompt the model for. The natural language input is a total open sandbox, and if you can wield words to the model’s liking, you can create almost anything.
- Charlie Snell