Ward: Newsss

321

MACL: Multi-Label Adaptive Contrastive Learning Loss for Remote Sensing Image Retrieval

arXiv:2512.16294v1 Announce Type: new
Abstract: Semantic overlap among land-cover categories, highly imbalanced label distributions, and complex inter-class co-occurrence patterns constitute significant challenges for multi-label remote-sensing image retrieval. In this article, Multi-Label Adaptive…

330

Finding Flawed Fictions: Evaluating Complex Reasoning in Language Models via Plot Hole Detection

arXiv:2504.11900v3 Announce Type: replace
Abstract: Stories are a fundamental aspect of human experience. Engaging deeply with stories and spotting plot holes -- inconsistencies in a storyline that break the internal logic or rules of a story's world -- requires nuanced reasoning skills, including …

319

VLG-Loc: Vision-Language Global Localization from Labeled Footprint Maps

arXiv:2512.12793v2 Announce Type: replace
Abstract: This paper presents Vision-Language Global Localization (VLG-Loc), a novel global localization method that uses human-readable labeled footprint maps containing only names and areas of distinctive visual landmarks in an environment. While humans n…

321

CitySeeker: How Do VLMS Explore Embodied Urban Navigation With Implicit Human Needs?

arXiv:2512.16755v1 Announce Type: new
Abstract: Vision-Language Models (VLMs) have made significant progress in explicit instruction-based navigation; however, their ability to interpret implicit human needs (e.g., "I am thirsty") in dynamic urban environments remains underexplored. This paper intr…

322

OpenAI adds new teen safety rules to ChatGPT as lawmakers weigh AI standards for minors

OpenAI updated its guidelines for how its AI models should behave with users under 18, and published new AI literacy resources for teens and parents. Still, questions remain about how well policies translate into practice.

321

Depth Any Panoramas: A Foundation Model for Panoramic Depth Estimation

arXiv:2512.16913v1 Announce Type: new
Abstract: In this work, we present a panoramic metric depth foundation model that generalizes across diverse scene distances. We explore a data-in-the-loop paradigm from the view of both data construction and framework design. We collect a large-scale dataset b…

242

Inside enterprise AI’s turning point: 31 interviews on theCUBE that defined 2025

TheCUBE spent 2025 talking directly with top tech executives as they worked through how computing is being rebuilt inside their organizations, driven by real-world constraints rather than abstract roadmaps. Across those conversations, a consistent shift emerged away from theory and toward execution.…

222

Detailed balance in large language model-driven agents

230

MoHoBench: Assessing Honesty of Multimodal Large Language Models via Unanswerable Visual Questions

arXiv:2507.21503v3 Announce Type: replace
Abstract: Recently Multimodal Large Language Models (MLLMs) have achieved considerable advancements in vision-language tasks, yet produce potentially harmful or untrustworthy content. Despite substantial work investigating the trustworthiness of language mo…

218

Roomba rival bets on AI to clean up market for robot vacuum cleaners

Roborock chief says innovations such as robotic arms and dog excrement recognition are key to survival

211

Provable optimal transport with transformers: The essence of depth and prompt engineering

arXiv:2410.19931v3 Announce Type: replace
Abstract: Despite their empirical success, the internal mechanism by which transformer models align tokens during language processing remains poorly understood. This paper provides a mechanistic and theoretical explanation of token alignment in LLMs. We fir…

219

Generation is Required for Data-Efficient Perception

arXiv:2512.08854v2 Announce Type: replace
Abstract: It has been hypothesized that human-level visual perception requires a generative approach in which internal representations result from inverting a decoder. Yet today's most successful vision models are non-generative, relying on an encoder that …

210

Preparing Future-Ready Learners: K12 Skills Shift and GenAI EdTech Innovation Direction

arXiv:2512.16428v1 Announce Type: new
Abstract: Since Generative AI came out, it has quickly embedded itself in our social fabric, triggering lots of discussions, predictions, and efforts from research, industry, government and capital market to experiment and embrace the technology. The question fo…

243

DAG Learning from Zero-Inflated Count Data Using Continuous Optimization

arXiv:2512.16233v1 Announce Type: cross
Abstract: We address network structure learning from zero-inflated count data by casting each node as a zero-inflated generalized linear model and optimizing a smooth, score-based objective under a directed acyclic graph constraint. Our Zero-Inflated Continuo…

211

A Network Arena for Benchmarking AI Agents on Network Troubleshooting

arXiv:2512.16381v1 Announce Type: new
Abstract: Agentic systems, powered by Large Language Models (LLMs), assist network engineers with network configuration synthesis and network troubleshooting tasks. For network troubleshooting, progress is hindered by the absence of standardized and accessible …

210

Radiology Report Generation with Layer-Wise Anatomical Attention

arXiv:2512.16841v1 Announce Type: new
Abstract: Automatic radiology report generation is a promising application of multimodal deep learning, aiming to reduce reporting workload and improve consistency. However, current state-of-the-art (SOTA) systems - such as Multimodal AI for Radiology Applicati…

233

Learning to Wait: Synchronizing Agents with the Physical World

arXiv:2512.16262v1 Announce Type: new
Abstract: Real-world agentic tasks, unlike synchronous Markov Decision Processes (MDPs), often involve non-blocking actions with variable latencies, creating a fundamental Temporal Gap between action initiation and completion. Existing environment-side…

222

Hybrid Quantum-Classical Ensemble Learning for S&P 500 Directional Prediction

arXiv:2512.15738v1 Announce Type: new
Abstract: Financial market prediction is a challenging application of machine learning, where even small improvements in directional accuracy can yield substantial value. Most models struggle to exceed 55-57% accuracy due to high noise, non-stationarity, and …

120

Evaluation of AI Ethics Tools in Language Models: A Developers' Perspective Case Study

arXiv:2512.15791v1 Announce Type: new
Abstract: In Artificial Intelligence (AI), language models have gained significant importance due to the widespread adoption of systems capable of simulating realistic conversations with humans through text generation. Because of their impact on society, develo…

111

Live Avatar: Streaming Real-time Audio-Driven Avatar Generation with Infinite Length

arXiv:2512.04677v3 Announce Type: replace
Abstract: Existing diffusion-based video generation methods are fundamentally constrained by sequential computation and long-horizon inconsistency, limiting their practical adoption in real-time, streaming audio-driven avatar synthesis. We present Live Avat…