AI for Images, Audio & Video
AI isn't just about text. Explore how DALL-E, Midjourney, Sora, and ElevenLabs are changing creative work across all media types.
Beyond Text: AI Goes Multimedia
While chatbots get the most attention, some of the most impressive AI breakthroughs are happening in images, audio, and video.
AI Image Generation
The Big Players
| Tool | Best For | Access |
|---|---|---|
| DALL-E 3 (OpenAI) | Precise instruction-following, text in images | Built into ChatGPT |
| Midjourney | Artistic, aesthetic, beautiful imagery | Discord bot or web app |
| Stable Diffusion | Free, open-source, customisable | Self-hosted or web apps |
| Adobe Firefly | Professional editing, commercial-safe | Adobe Creative Cloud |
How to Get Great Images
The same prompting skills you use with chatbots apply to image generators, and specific detail beats vagueness:
Vague: “A cat” → generic cat picture
Specific: “A fluffy orange tabby cat sitting on a windowsill at sunset, watercolour painting style, warm tones, cosy atmosphere” → a far more distinctive, controlled result
Tips for image prompts:
- Describe the subject, setting, style, and mood
- Mention specific art styles: “oil painting,” “minimalist,” “photorealistic”
- Include lighting details: “golden hour,” “dramatic shadows,” “soft diffused light”
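- Specify what you DON’T want: “no text,” “no people in background” (some tools have a dedicated option for this, such as Midjourney’s --no parameter or Stable Diffusion’s negative prompt)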
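To make the checklist above concrete, here is a minimal Python sketch that assembles a prompt from those parts. The `ImagePrompt` class and its field names are hypothetical, invented for this illustration; no image tool requires this format, and the result is simply text you would paste into whichever generator you use.

```python
from dataclasses import dataclass


@dataclass
class ImagePrompt:
    """Hypothetical helper: holds the parts of an image prompt from the tips above."""
    subject: str
    setting: str = ""
    style: str = ""
    lighting: str = ""
    mood: str = ""
    exclude: tuple = ()  # things you do NOT want in the image

    def render(self) -> str:
        # Keep only the parts that were filled in, in a fixed order.
        parts = [self.subject, self.setting, self.style, self.lighting, self.mood]
        text = ", ".join(p.strip() for p in parts if p.strip())
        # Exclusions are appended as plain "no ..." phrases.
        if self.exclude:
            text += ", " + ", ".join(f"no {item}" for item in self.exclude)
        return text


prompt = ImagePrompt(
    subject="A fluffy orange tabby cat sitting on a windowsill",
    setting="at sunset",
    style="watercolour painting style",
    lighting="golden hour",
    mood="warm tones, cosy atmosphere",
    exclude=("text", "people in background"),
)
print(prompt.render())
```

Filling in the fields one by one is a quick way to notice which part of a prompt you have left out. If a tool offers its own exclusion option, you would probably use that instead of the plain "no ..." phrases.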
AI Audio
Text-to-Speech (TTS)
AI voices have become remarkably human-like:
- ElevenLabs — Industry leader for natural voices and voice cloning
- OpenAI TTS — Good quality, built into the API
- Google Cloud TTS — Hundreds of voices and languages
Voice Cloning
You can now clone a voice from just a few minutes of audio, and some tools need even less. This has valuable uses (audiobooks, accessibility) and obvious risks (fraud, impersonation).
AI Music
- Suno — Generate full songs from text descriptions
- Udio — Create music in any genre from prompts
AI Video
This is the newest frontier:
- Sora (OpenAI) — Generate videos from text descriptions
- Runway — AI video editing and generation
- Pika — Quick video clips from text or images
- HeyGen — AI avatars for presentations and marketing
Video AI is improving fast but still struggles with realistic physics, keeping characters and objects consistent from shot to shot, and generating longer clips.
Creative Workflow Integration
The real power is combining these tools:
- Brainstorm with ChatGPT → script idea
- Generate images with Midjourney → visuals
- Create voiceover with ElevenLabs → narration
- Compile video with Runway → final product
What used to require a full production team can now be prototyped by one person.
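One way to picture this workflow is as a chain of stages, each adding one piece to the project. The Python sketch below is purely illustrative: every function is a hypothetical placeholder that returns made-up file names, and none of them calls ChatGPT, Midjourney, ElevenLabs, or Runway. In practice you would do each step by hand in the tool itself, or through whatever interface that tool provides.

```python
from typing import Callable

# A stage takes the project so far and returns it with one new piece added.
Stage = Callable[[dict], dict]


def brainstorm(project: dict) -> dict:
    # Placeholder: in practice this text would come from a chatbot session.
    return {**project, "script": f"Script about {project['idea']}"}


def generate_visuals(project: dict) -> dict:
    # Placeholder: stands in for images exported from an image generator.
    return {**project, "visuals": ["scene_1.png", "scene_2.png"]}


def create_voiceover(project: dict) -> dict:
    # Placeholder: stands in for narration exported from a text-to-speech tool.
    return {**project, "narration": "narration.mp3"}


def compile_video(project: dict) -> dict:
    # Placeholder: stands in for the clip assembled in a video tool.
    return {**project, "video": "final.mp4"}


def run_pipeline(idea: str, stages: list[Stage]) -> dict:
    # Run the stages in order, passing the growing project along.
    project = {"idea": idea}
    for stage in stages:
        project = stage(project)
    return project


result = run_pipeline(
    "a cat who learns to paint",
    [brainstorm, generate_visuals, create_voiceover, compile_video],
)
print(result)
```

Each stage only needs what the earlier stages produced, which is why a single person can swap one tool for another without redoing the whole project.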
The Ethics Discussion
These tools raise important questions:
- Copyright: Who owns AI-generated art? Can AI train on artists’ work without permission?
- Deepfakes: Realistic fake videos of real people
- Job impact: How does this affect creative professionals?
- Misinformation: AI-generated media that looks completely real
Several major platforms, including YouTube and TikTok, now require creators to label realistic AI-generated content, and watermarking technology is improving.
Try it: Give DALL-E (in ChatGPT) a detailed image prompt using the tips above. Then try the same prompt in Midjourney or another tool. Notice how each interprets your description differently.
Quick Quiz
Test what you just learned by answering each question.
Q1 How do AI image generators create pictures?
Q2 What is the key difference between Midjourney and DALL-E?
Q3 What is ElevenLabs known for?
Q4 What's the main ethical concern with AI-generated media?