
Research
Academic papers, technical breakthroughs, and lab research findings
All Stories

AI IQ Launches Model Scorecard, Sparks Precision vs. Simplicity Debate
A new site called AI IQ has launched a framework for scoring frontier language models on a single intelligence…

Anthropic's Mythos AI Shows Sharper Hacking Skills, U.K. Researchers Find
Researchers at the U.K.'s AI Security Institute reported Wednesday that Anthropic's latest version of Mythos AI…

Frontier LLMs Silently Corrupt 25% of Documents in Iterative Workflows
Microsoft researchers developed a benchmark showing that frontier LLMs silently corrupt an average of 25% of document…

Sakana trains 7B model to orchestrate GPT, Claude, Gemini
Sakana AI has developed RL Conductor, a 7-billion-parameter language model trained via reinforcement learning to…

Mamba Proves Viable for Time Series Classification
Researchers propose MambaSL, a minimally modified single-layer Mamba architecture designed specifically for time series…

Magnitude Beats Phase in Hybrid Quantum ML for SAR
Researchers tested five different encoding strategies for using Synthetic Aperture Radar data in quantum machine…

Faithful Reasoning Emerges from Multi-Move Training, Not Direct Prediction
Researchers studied how reasoning develops in language models across supervised fine-tuning and reinforcement learning…

SUDP: A Protocol to Keep Agent Secrets Secret
Researchers propose SUDP, a three-role protocol that lets AI agents perform secret-backed operations (API calls, cloud…

Safety Routing Circuits Found Across Models, Vulnerable to Encoding Attacks
Researchers have localized the policy routing mechanism in alignment-trained language models, identifying specific…

GAN Synthesizes Missing Brain MRI Scans While Preserving Tumors
Researchers propose 3D-MC-SAGAN, a generative model that synthesizes missing MRI brain scan modalities from a single…

Harvard Study: AI Outperforms Doctors on ER Diagnoses
A Harvard study evaluating large language models across medical contexts found that at least one AI model delivered…

GPT-5.5 matches Mythos Preview on cybersecurity tests
OpenAI's newly released GPT-5.5 performs at parity with Anthropic's restricted Mythos Preview model on cybersecurity…

Warmer AI Models Trade Accuracy for Empathy
Researchers at Oxford University's Internet Institute found that large language models fine-tuned to appear warmer and…

MIT's Physics-Based Violin Simulator Offers Luthiers a New Design Tool
MIT engineers have developed a physics-based virtual violin simulation tool that models the fundamental acoustics of…

Google Brings Gemini to Connected Vehicles via Software Update
Google is rolling out its Gemini AI assistant to vehicles equipped with Google built-in, replacing the current Google…

Data Sovereignty Becomes AI Strategy for Enterprises and Governments
A panel discussion from MIT Technology Review's EmTech AI conference explored how enterprises and governments are…

Alibaba cuts AI agent tool calls 49x with decoupled optimization
Alibaba researchers introduced Hierarchical Decoupled Policy Optimization (HDPO), a reinforcement learning framework…

Goodfire's Silico Brings Mechanistic Interpretability to Model Development
Goodfire, a San Francisco startup, released Silico, a tool that lets developers inspect and adjust AI model parameters…

Aggregating Zero-Shot LLMs Beats Single Models for Financial Disclosure Analysis
A new paper demonstrates that a lightweight supervised aggregator can effectively combine outputs from multiple…

NanoKnow: Mapping How LLMs Encode Knowledge
Researchers have released NanoKnow, a benchmark dataset that maps questions from Natural Questions and SQuAD to whether…

Personalized Calibration Makes Conformal Prediction Work in Clinical Settings
Researchers at the University of Illinois and collaborators demonstrate that personalized calibration strategies can…

DeepMind Pursues AI Co-Clinician Model for Healthcare
Google DeepMind is researching an AI co-clinician model designed to augment healthcare delivery by working alongside…

GPU Rental Performance Varies Wildly Within Same Model
Research from the College of William & Mary, Jefferson Lab, and Silicon Data reveals significant performance…

Evaluation costs now rival training costs for AI models
AI evaluation costs have become a major bottleneck as benchmarking has shifted from static LLM tests to agent-based…

Multi-Task EEG Model Cuts Costs with Low-Rank Adaptation
Researchers propose MTEEG, a multi-task learning framework that adapts pre-trained EEG models to multiple downstream…

Frontier Agents Now Autonomously Implement ML Pipelines, With Claude Outpacing Rivals
Researchers benchmarked frontier coding agents on their ability to autonomously implement an AlphaZero-style machine…

Poly-DPO and ViPO: Scaling Visual Preference Optimization
Researchers introduced Poly-DPO, an algorithmic extension to preference optimization that adds a polynomial term to…

Scaling Multi-Anchor Embeddings to LLMs with 40x Compression
Researchers introduce Adaptive Dictionary Embeddings (ADE), a framework that scales multi-anchor word representations…

New Training Method Cuts Reasoning Model Costs for Enterprises
Researchers at JD.com and academic partners introduced RLSD (Reinforcement Learning with Verifiable Rewards with…