Topic
Voice & Video AI
Text-to-speech, voice cloning, video generation, and audio AI
All Stories

Amazon Consolidates Rufus Into Alexa for Shopping
Amazon is rebranding its Rufus shopping chatbot to Alexa for Shopping, consolidating its AI assistant strategy around…

Perceptron Mk1 undercuts rivals 80-90% on video AI pricing
Perceptron Inc., a two-year-old startup led by former Meta and Microsoft researchers, released Mk1, a video analysis AI…

Google Embeds Gemini Dictation in Gboard, Pressuring Startups
Google is integrating Gemini-powered dictation directly into Gboard, its keyboard app, with an initial rollout to…

Kuaishou Pursues External Funding for Kling AI Video Unit
Kuaishou Technology confirmed via regulatory filing that its board is evaluating a restructuring of Kling, its AI video…

Vapi hits $500M valuation with Amazon Ring win
Vapi, an AI voice platform startup, has reached a $500M valuation after winning Amazon Ring as a customer, beating out…

Kuaishou to Spin Off Kling AI Video Unit at $20B Valuation
Kuaishou Technology, a Chinese social media platform, is planning to spin off its Kling AI video generation unit ahead…

Wispr Flow's Hinglish bet pays off in India voice AI market
Wispr Flow reports accelerated growth in India following its Hinglish language rollout, demonstrating early traction in…

The Voice-First Office: How AI Will Reshape Workplace Design
As voice interaction with AI systems becomes more prevalent in workplace settings, the nature of office environments…

OpenAI Adds Reasoning to Realtime Voice Models
OpenAI has released new realtime voice models available through its API that can reason, translate, and transcribe…

AI Agents Can Now Publish Podcasts Directly to Spotify
A new command-line tool called Save to Spotify enables AI agents like Claude and OpenAI Codex to directly publish…

Uber Deploys OpenAI to Optimize Driver Earnings and Rider Booking
Uber has integrated OpenAI technology to power AI assistants and voice features for both drivers and riders on its…

Parloa brings voice AI agents to enterprise customer service
Parloa has built a platform that uses OpenAI models to power voice-driven AI customer service agents for enterprises.…

Google Upgrades Gemini for Home to Handle Complex Multi-Step Tasks
Google has upgraded Gemini for Home to version 3.1, enabling the smart home assistant to handle more complex,…

Visual AI Now Drives App Growth, But Revenue Lags Downloads
According to Appfigures data, app launches featuring visual AI models are generating 6.5 times more downloads than…

ElevenLabs Hits $500M ARR With BlackRock, Foxx Backing
ElevenLabs announced a new funding round that includes BlackRock, actor Jamie Foxx, and actress Eva Longoria among its…

OpenAI Rebuilds WebRTC for Low-Latency Voice AI
OpenAI has rebuilt its WebRTC infrastructure to deliver real-time voice AI with low latency and global scale, enabling…

Google TV Adds Gemini Photo and Video Tools
Google TV is integrating additional Gemini AI features, including photo and video transformation capabilities powered…

Text Agents and Voice Agents Are Different Problems
AWS published guidance on migrating text-based AI agents to voice assistants using Amazon Nova 2 Sonic, emphasizing…

Audio-Omni Unifies Generation and Editing Across Sound, Music, Speech
Researchers have introduced Audio-Omni, a unified framework that combines audio understanding, generation, and editing…

ActorMind Brings Emotional Speech Role-Playing to AI
Researchers have introduced ActorMind, a reasoning framework that enables AI models to perform speech role-playing by…

Lightweight Model Beats GPT-4o at Robot Gesture Prediction
Researchers have developed a lightweight transformer model that generates co-speech gestures for robots by predicting…

Open-source speech recognition cuts transcription costs to fractions of a cent
AWS and NVIDIA have published a guide for cost-effective multilingual audio transcription using the open-source…

NVIDIA Cosmos Reason 2 Tops Physical AI Leaderboards
NVIDIA released Cosmos Reason 2, an open-source reasoning vision-language model designed to improve how robots and AI…

ByteDance Launches Seedance 2.0 Globally, Excluding U.S.
ByteDance's cloud unit BytePlus has launched Seedance 2.0, its AI video generation model, to enterprise customers…

Runway CEO: AI Could Fund 50 Films for Cost of One Blockbuster
Runway's CEO argues that AI video generation could fundamentally reshape Hollywood's economics by enabling studios to…

Adobe Embeds GPU-Accelerated Color Grading Into Premiere Pro
Adobe is launching a new Color Mode for Premiere Pro in beta, a dedicated color grading environment that runs natively…

Google Releases Gemini 3.1 Flash Live with Lower Latency Voice AI
Google DeepMind has released Gemini 3.1 Flash Live, a voice model that improves upon previous iterations with better…