Today, there are dozens of transcription companies.
Just off the top of my head, in alphabetical order:
Clarify integrates with my CRM.
Dialpad does a great job with phone calls.
Fathom quickly and automatically creates video clips from the call so you can easily reflect on them later with Voice of the Customer.
Google Meet does live transcriptions and language translations.
Granola does it without recording the audio.
Krisp does audio recording and noise canceling without a meeting bot.
Otter does a great job on mobile.
Recall allows anyone to easily build a meeting bot.
Zoom even has it built in now.
I absolutely love them because they make me feel superhuman in my workflow.
And while they all supposedly deliver “summary reports,” something is wrong, and none of them is at fault. It’s the underlying technology: the math behind the language models.
Summarization is more than a weighted average of how often words appear.
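To make that concrete, here is a minimal sketch of a purely frequency-based extractive summarizer. It is a toy for illustration only, not how any of the products above or any particular LLM works; the function names and the sample meeting text are made up for this example. It scores each sentence by the average count of its words across the whole text and keeps the top scorers.

```python
import re
from collections import Counter


def split_sentences(text):
    # Naive splitter: break on whitespace that follows ., ! or ?
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def tokenize(sentence):
    # Lowercase words only; punctuation is discarded.
    return re.findall(r"[a-z']+", sentence.lower())


def frequency_summary(text, n=1):
    """Return the n sentences whose words are, on average, the most common."""
    sentences = split_sentences(text)
    counts = Counter(w for s in sentences for w in tokenize(s))

    def score(sentence):
        words = tokenize(sentence)
        return sum(counts[w] for w in words) / len(words) if words else 0.0

    return sorted(sentences, key=score, reverse=True)[:n]


# Hypothetical meeting notes: the one sentence that matters uses rare words.
meeting = (
    "The launch plan looks great. "
    "The launch plan covers the launch date. "
    "The launch plan covers the launch budget. "
    "Legal says we cannot ship."
)

print(frequency_summary(meeting, n=3))
```

Even when asked for three of the four sentences, this scorer drops "Legal says we cannot ship." because none of its words repeat elsewhere. The most important sentence in the meeting is the least "common" one.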
Human verbal and non-verbal communication implies a fuller experience we’ve yet to encode in models. To understand what this means, let’s take a super specific example. Let’s assume just verbal communication. Let’s assume English only. Let’s assume that we can capture every word that is said, perfectly, with just transcription, and use the example:
"You're going to wear that?"
The sentence "You're going to wear that?" can mean a genuine question, a surprised inquiry, a skeptical comment, a critical observation, or a playful tease, all depending on the intonation and where the emphasis is placed.
Rising intonation on "that": "You're going to wear that?" - This sounds like a genuine question, asking for clarification about the clothing choice.
Falling intonation on "that": "You're going to wear that?" - This could sound critical or skeptical, implying the speaker disapproves of the outfit.
Emphasis on "you": "You're going to wear that?" - This could suggest the speaker is surprised or questioning the person's decision-making.
Emphasis on "wear": "You're going to wear that?" - This might imply the speaker is questioning the practicality or suitability of the clothing item.
Playful intonation: "You're going to wear that?" - With a lighter, teasing tone, this could be a playful comment about someone's outfit choice.
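The five readings above can be sketched as data. This is a toy model under stated assumptions: the `Utterance` class, its `stressed_word` and `contour` fields, and the `transcribe` function are hypothetical names invented for illustration, not part of any real transcription API. The point is only that a words-only transcript maps five distinct spoken utterances to one identical string.

```python
from dataclasses import dataclass
from typing import Optional

SENTENCE = "You're going to wear that?"


@dataclass(frozen=True)
class Utterance:
    words: str
    stressed_word: Optional[str]  # hypothetical annotation: which word is emphasized
    contour: Optional[str]        # hypothetical annotation: overall intonation
    likely_meaning: str


def transcribe(utterance):
    # A perfect words-only transcript: keeps every word, drops everything else.
    return utterance.words


readings = [
    Utterance(SENTENCE, "that", "rising", "genuine question"),
    Utterance(SENTENCE, "that", "falling", "critical or skeptical"),
    Utterance(SENTENCE, "you", None, "surprised at the person's decision"),
    Utterance(SENTENCE, "wear", None, "doubting practicality or suitability"),
    Utterance(SENTENCE, None, "playful", "teasing"),
]

transcripts = {transcribe(u) for u in readings}

print(len(set(readings)), "distinct utterances")
print(len(transcripts), "distinct transcript(s)")
```

Anything downstream that only sees the transcript, including a summarizer, starts from one string and has no way to tell which of the five meanings was intended.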
This is also easy to see when you ask an LLM to summarize a negative proof research paper or a satirical news article, where LLMs run into limits similar to the human limitation described by Poe’s law. The paper may spend dozens of pages proposing a counter-proof dripping in satire and a single paragraph on the actual result, so a summary can easily miss the entire meaning of the work.
This is one of the foundational reasons why we’re a long way off from a fully agentic world.
Further Reading:
The Future Will Be Brief, John Herrman
The LLMentalist Effect, Baldur Bjarnason
When ChatGPT summarises, it actually does nothing of the kind, R&A IT Strategy & Architecture