Topic

Voice & Video AI

Text-to-speech, voice cloning, video generation, and audio AI

Featured

Google DeepMind

Google DeepMind Releases Gemini 3.1 Flash TTS with Granular Audio Control

about 1 month ago· Google Deepmind

All Stories

AI Agents

Amazon Consolidates Rufus Into Alexa for Shopping

Amazon is rebranding its Rufus shopping chatbot to Alexa for Shopping, consolidating its AI assistant strategy around…

4 days ago· The Information

AI for Business

Perceptron Mk1 undercuts rivals 80-90% on video AI pricing

Perceptron Inc., a two-year-old startup led by former Meta and Microsoft researchers, released Mk1, a video analysis AI…

4 days ago· VentureBeat AI

AI for Business

Google Embeds Gemini Dictation in Gboard, Pressuring Startups

Google is integrating Gemini-powered dictation directly into Gboard, its keyboard app, with an initial rollout to…

5 days ago· TechCrunch AI

Voice & Video AI

Kuaishou Pursues External Funding for Kling AI Video Unit

Kuaishou Technology confirmed via regulatory filing that its board is evaluating a restructuring of Kling, its AI video…

5 days ago· The Information

AI Agents

Vapi hits $500M valuation with Amazon Ring win

Vapi, an AI voice platform startup, has reached a $500M valuation after winning Amazon Ring as a customer, beating out…

5 days ago· TechCrunch AI

Voice & Video AI

Kuaishou to Spin Off Kling AI Video Unit at $20B Valuation

Kuaishou Technology, a Chinese social media platform, is planning to spin off its Kling AI video generation unit ahead…

6 days ago· The Information

AI for Business

Wispr Flow's Hinglish bet pays off in India voice AI market

Wispr Flow reports accelerated growth in India following its Hinglish language rollout, demonstrating early traction in…

6 days ago· TechCrunch AI

AI for Business

The Voice-First Office: How AI Will Reshape Workplace Design

As voice interaction with AI systems becomes more prevalent in workplace settings, the nature of office environments…

6 days ago· TechCrunch AI

Voice & Video AITrending

OpenAI Adds Reasoning to Realtime Voice Models

OpenAI has released new realtime voice models available through its API that can reason, translate, and transcribe…

9 days ago· OpenAI

AI Agents

AI Agents Can Now Publish Podcasts Directly to Spotify

A new command-line tool called Save to Spotify enables AI agents like Claude and OpenAI Codex to directly publish…

10 days ago· The Verge AI

AI Agents

Uber Deploys OpenAI to Optimize Driver Earnings and Rider Booking

Uber has integrated OpenAI technology to power AI assistants and voice features for both drivers and riders on its…

10 days ago· OpenAI

AI Agents

Parloa brings voice AI agents to enterprise customer service

Parloa has built a platform that uses OpenAI models to power voice-driven AI customer service agents for enterprises.…

10 days ago· OpenAI

AI AgentsTrending

Google Upgrades Gemini for Home to Handle Complex Multi-Step Tasks

Google has upgraded Gemini for Home to version 3.1, enabling the smart home assistant to handle more complex,…

11 days ago· The Verge AI

AI for Business

Visual AI Now Drives App Growth, But Revenue Lags Downloads

According to Appfigures data, app launches featuring visual AI models are generating 6.5 times more downloads than…

12 days ago· TechCrunch AI

AI for Business

ElevenLabs Hits $500M ARR With BlackRock, Foxx Backing

ElevenLabs announced a new funding round that includes BlackRock, actor Jamie Foxx, and actress Eva Longoria among its…

12 days ago· TechCrunch AI

InfrastructureTrending

OpenAI Rebuilds WebRTC for Low-Latency Voice AI

OpenAI has rebuilt its WebRTC infrastructure to deliver real-time voice AI with low latency and global scale, enabling…

12 days ago· OpenAI

Generative AITrending

Google TV Adds Gemini Photo and Video Tools

Google TV is integrating additional Gemini AI features, including photo and video transformation capabilities powered…

17 days ago· TechCrunch AI

AI Agents

Text Agents and Voice Agents Are Different Problems

AWS published guidance on migrating text-based AI agents to voice assistants using Amazon Nova 2 Sonic, emphasizing…

18 days ago· AWS Machine Learning Blog

Generative AI

Audio-Omni Unifies Generation and Editing Across Sound, Music, Speech

Researchers have introduced Audio-Omni, a unified framework that combines audio understanding, generation, and editing…

19 days ago· ArXiv (cs.AI)

AI Agents

ActorMind Brings Emotional Speech Role-Playing to AI

Researchers have introduced ActorMind, a reasoning framework that enables AI models to perform speech role-playing by…

20 days ago· ArXiv (cs.AI)

AI Agents

Lightweight Model Beats GPT-4o at Robot Gesture Prediction

Researchers have developed a lightweight transformer model that generates co-speech gestures for robots by predicting…

23 days ago· ArXiv (cs.AI)

$Open-source speech recognition cuts transcription costs to fractions of a cent$

AI for Business

Open-source speech recognition cuts transcription costs to fractions of a cent

AWS and NVIDIA have published a guide for cost-effective multilingual audio transcription using the open-source…

24 days ago· AWS Machine Learning Blog

Voice & Video AI

NVIDIA Cosmos Reason 2 Tops Physical AI Leaderboards

NVIDIA released Cosmos Reason 2, an open-source reasoning vision-language model designed to improve how robots and AI…

29 days ago· Hugging Face Blog

Voice & Video AI

ByteDance Launches Seedance 2.0 Globally, Excluding U.S.

ByteDance's cloud unit BytePlus has launched Seedance 2.0, its AI video generation model, to enterprise customers…

29 days ago· The Information

AI for Business

Runway CEO: AI Could Fund 50 Films for Cost of One Blockbuster

Runway's CEO argues that AI video generation could fundamentally reshape Hollywood's economics by enabling studios to…

about 1 month ago· TechCrunch AI

AI Hardware

Adobe Embeds GPU-Accelerated Color Grading Into Premiere Pro

Adobe is launching a new Color Mode for Premiere Pro in beta, a dedicated color grading environment that runs natively…

about 1 month ago· NVIDIA Blog (AI)

Google DeepMind

Google Releases Gemini 3.1 Flash Live with Lower Latency Voice AI

Google DeepMind has released Gemini 3.1 Flash Live, a voice model that improves upon previous iterations with better…

about 1 month ago· Google Deepmind