Talk Dirty to AI
You do more with AI when you are more human
Published: Oct 2, 2025
Topic: Artificial Intelligence
Perhaps your best work happens when you are more human with the tools you use. And multimodal AI tools are the unlock we’ve been waiting for.
For decades, we’ve had to compress ourselves to fit the rigid structures of technology: learning platforms, memorizing commands, formatting documents a certain way. But multimodal AI tools flip that script. For the first time, tools are adapting to us: our voices, our sketches, our messy streams of thought.
Voice is a Gateway
A while ago I came across a short clip of a young woman speaking into her phone, rambling without much coherence. At first it seemed like just another casual voice note, but then I realized she was talking with ChatGPT. Despite her tangents and half-finished sentences, the bot understood her intent and responded meaningfully.
That’s exactly how I think: raw, nonlinear, more feeling than structure. And here was a tool that could keep up.
I’ve been voice-first with AI since I started with Audiopen over two years ago, and today 90% of my ideas begin as spoken notes. From there, I sketch on paper, feed the notes into ChatGPT, and refine into writing. It’s a workflow that feels less like fighting a system, and more like being supported in how I naturally think.
The Research Behind Multimodality
What’s happening here is the natural evolution of generative AI. Early systems were built around text, but today’s frontier models can handle multiple modalities (voice, text, visuals, structured data) all at once.
This matters because work is rarely one-dimensional. When you’re preparing a presentation, you don’t just write text; you reference spreadsheets, pull in images, and brainstorm out loud. Multimodal AI is catching up to the way humans already work.
The pace of progress is also staggering. Research from METR shows that the length of tasks AI can handle successfully has been doubling roughly every seven months. Meanwhile, OpenAI’s GDPval benchmark finds that frontier models are approaching expert-level performance across 44 occupations, from marketing and HR to engineering.
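To get a feel for what a seven-month doubling time implies, here is a rough back-of-the-envelope sketch. It assumes a clean exponential trend with a fixed doubling period, which is a simplification of my own and not something the METR research guarantees; the function name and constant are purely illustrative.

```python
# Illustrative only: assumes a perfectly steady seven-month doubling time.
# Real progress is noisier, so treat the numbers as rough intuition.
DOUBLING_MONTHS = 7.0


def growth_factor(months: float, doubling_months: float = DOUBLING_MONTHS) -> float:
    """Return how many times longer manageable tasks become after `months`."""
    return 2.0 ** (months / doubling_months)


if __name__ == "__main__":
    for months in (7, 14, 28):
        # Prints 2x, 4x, and 16x for 7, 14, and 28 months respectively.
        print(f"after {months} months: {growth_factor(months):.0f}x")
```

Under that assumption, a little over two years of progress means tasks roughly sixteen times longer than today's.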
In other words: these tools aren’t just good at parsing your rambling thoughts. They’re rapidly becoming competent enough to turn them into real, valuable work.
Learning AI Fluency
But tools alone aren’t enough. Just as literacy was once a differentiator in the workplace, AI fluency is becoming the next one. For me, that means building competencies like:
Delegation: knowing what goal to hand over to the AI
Description: providing context so it can approach the work effectively
Discernment: evaluating whether the output is actually good
Diligence: holding myself accountable for how I use it
All four concepts are borrowed from the AI Fluency course hosted by Anthropic.
Voice input is a great example. If I simply dump words, I may get something back, but if I describe the problem with enough context, ChatGPT can shape my messy thinking into something useful. Fluency is the bridge between experimenting with AI and integrating it into your everyday workflows.
Why This Matters for Business Teams
The same freedom I feel to think out loud with AI can help whole teams communicate more naturally, collaborate with less friction, and focus on the work that actually matters:
People don’t need to “sound smart” to use AI; they can just talk
AI carries the structure, so teams can focus on substance
When tools fit natural human behaviours, AI stops feeling like extra work and starts feeling like a co-pilot
Instead of compressing ourselves to fit a tool, we finally have tools that expand to fit us. And maybe, just maybe, that’s how we rediscover joy in our work.
PS: If you’ve never tried it, Google Docs has voice typing too. CMD + SHIFT + S on a Mac (CTRL + SHIFT + S on Windows). You’re welcome.