Role-Aware Virtual Agents for Navigational Interaction guided by a Multimodal Large Language Model

1George Mason University, 2Goertek Alpha Labs, 3Adobe Research
ACM Transactions on Graphics (Proceedings of SIGGRAPH 2026)


Our approach enables adaptive human-agent interaction in the form of role-aware navigation in augmented reality. This example shows a hide-and-seek game with the agent acting as the hider.

Abstract


We present a role-aware navigational interaction approach for virtual agents that generates consistent, role-aligned movement behaviors.

Our approach leverages Multimodal Large Language Models (MLLMs) to interpret multimodal inputs, including scene information, user state, and high-level language role instructions, producing discrete navigation decisions and stylized planned paths. This enables virtual agents to behave consistently with narrative roles and to respond to dynamic user actions, for example playing hide-and-seek while taking into account the agent's role and the user's likely intention.

Our approach demonstrates how MLLMs can go beyond language-based interaction to support embodied, spatial, and role-aware agent behaviors in immersive environments such as augmented reality.
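
To make this concrete, the sketch below shows what the goal-query step could look like, assuming a text-plus-image prompt and a JSON reply; the prompt format, the query_mllm stub, and the response schema are illustrative assumptions, not the interface used in the paper.

    import json

    ROLE_INSTRUCTION = ("You are the hider in a hide-and-seek game. "
                        "Stay out of the user's view.")

    def build_prompt(scene_summary, user_state, role):
        """Assemble the text half of the multimodal prompt; the scene image
        would be attached alongside it."""
        return (
            f"Role: {role}\n"
            f"Scene: {scene_summary}\n"
            f"User position: {user_state['position']}, facing: {user_state['facing']}\n"
            "Choose one navigation goal from the labeled candidates and reply as JSON: "
            '{"goal_id": <int>, "reason": <str>}'
        )

    def query_mllm(prompt, image_bytes):
        # Placeholder for a real vision-language chat API call.
        return '{"goal_id": 3, "reason": "Behind the sofa, outside the user\'s view cone."}'

    prompt = build_prompt("living room with a sofa, a table, and a doorway",
                          {"position": (1.2, 0.0), "facing": (0.0, 1.0)},
                          ROLE_INSTRUCTION)
    decision = json.loads(query_mllm(prompt, image_bytes=b""))
    print(decision["goal_id"], "-", decision["reason"])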

Contributions

  • We introduce a holistic multimodal approach that integrates learning-based temporal decision filtering, MLLM-driven scene- and role-aware goal reasoning, and optimization-guided, intent-driven path generation (a schematic sketch follows this list).
  • We enable virtual agents to navigate with role-conditioned goals and paths, taking into account scene semantics and user interaction dynamics via an MLLM.
  • We demonstrate consistent role-conditioned behaviors across hide-and-seek, mixed-role shepherding, and ablation scenarios.
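
As referenced in the first item, the sketch below shows one way the three components could fit together in a single loop. Every function body is a placeholder (a trivial re-query rule, a stubbed MLLM call, straight-line interpolation) standing in for the learned filter, the MLLM reasoning, and the stylized optimizer, so names and signatures are our assumptions.

    from dataclasses import dataclass

    @dataclass
    class NavDecision:
        goal: tuple   # (x, y) target position in the scene
        style: str    # movement style hint, e.g. "stealthy" or "direct"

    def should_requery(history, user_event):
        """Stage 1: temporal decision filter gating expensive MLLM calls.
        A trivial rule stands in for the learned filter here."""
        return user_event or not history

    def reason_goal(scene, user_state, role):
        """Stage 2: MLLM scene- and role-aware goal reasoning (stubbed)."""
        return NavDecision(goal=(4.0, 2.5), style="stealthy")

    def plan_path(start, decision):
        """Stage 3: optimization-guided, intent-driven path generation.
        Straight-line interpolation stands in for the stylized planner."""
        (sx, sy), (gx, gy) = start, decision.goal
        return [(sx + t * (gx - sx) / 10, sy + t * (gy - sy) / 10) for t in range(11)]

    history = []
    if should_requery(history, user_event=True):
        decision = reason_goal(scene=None, user_state=None, role="hostile")
        history.append(decision)
        path = plan_path((0.0, 0.0), decision)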

Role-Conditioned Navigation Results

Playful

The agent keeps the interaction fun and discoverable, encouraging the user to find it without frustration.

Hostile

The agent prioritizes stealth, distance, and cover to avoid being detected by the user.

Supportive

The agent guides the user toward the goal using safe and understandable navigation behavior.

Our virtual agent generates different navigation behaviors depending on the assigned role, even within the same environment.
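
One way to picture this conditioning is as a role-specific instruction paired with role-specific path-cost weights, as in the minimal sketch below; the presets, weight names, and values are illustrative assumptions rather than the paper's parameters.

    # Hypothetical role presets: each role conditions both the MLLM instruction
    # and the path-cost weights. Values are illustrative, not from the paper.
    ROLE_PRESETS = {
        "playful":    {"instruction": "Stay findable; favor nearby, discoverable spots.",
                       "w_visibility": -0.3, "w_distance": +0.2},
        "hostile":    {"instruction": "Avoid detection; maximize cover and distance.",
                       "w_visibility": +1.0, "w_distance": -0.8},
        "supportive": {"instruction": "Lead the user along safe, legible routes.",
                       "w_visibility": -1.0, "w_distance": +0.3},
    }

    def path_cost(visibility, dist_to_user, role):
        """Score a candidate path under a role; lower cost is preferred."""
        p = ROLE_PRESETS[role]
        return p["w_visibility"] * visibility + p["w_distance"] * dist_to_user

    # The same candidate path receives a different score under each role.
    for role in ROLE_PRESETS:
        print(role, round(path_cost(visibility=0.6, dist_to_user=3.0, role=role), 2))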

Shepherding Examples with Mixed Roles

Active hazard

The dog tracks the wolf, blocks likely flanking routes, and pushes the threat away from the herd while keeping the sheep grouped.

Passive hazard

The dog guides the herd around environmental hazards, rescues isolated sheep, and recenters the flock to restore cohesion.

Mixed hazard

The dog handles both environmental hazards and predator pressure at once, balancing defense, rerouting, and herd cohesion in a single behavior.

The MLLM-controlled dog dynamically balances sheep support, hazard mitigation, and predator blocking in response to changing environmental conditions.
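
For intuition only, the toy arbitration rule below mimics the kind of trade-off the dog makes among these sub-goals; the scoring formula and thresholds are invented for illustration, since the actual balancing is delegated to the MLLM rather than to fixed rules.

    # Illustrative only: a hand-written urgency score over the three shepherding
    # sub-goals named above.
    def pick_task(wolf_dist, hazard_dist, herd_spread):
        urgency = {
            "block_predator": max(0.0, 1.0 - wolf_dist / 10.0),   # closer wolf, more urgent
            "reroute_hazard": max(0.0, 1.0 - hazard_dist / 8.0),  # nearer hazard, more urgent
            "regroup_herd":   min(1.0, herd_spread / 6.0),        # wider spread, more urgent
        }
        return max(urgency, key=urgency.get)

    print(pick_task(wolf_dist=3.0, hazard_dist=7.0, herd_spread=2.0))  # block_predator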

Ablation Example

Visual Prompt / Role-Aware Navigation

  • Without visual inputs, the agent shows limited strategic navigation under the hostile role.
  • With visual inputs, it makes more strategic, role-consistent decisions, fleeing to farther and more appropriate destinations.
  • Under the neutral condition (no contextual instruction), the agent behaves repetitively, selecting similar destinations and lingering near the start (see the prompt-assembly sketch after this list).
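
For reference, the three ablation conditions differ only in what enters the query. A hypothetical assembly, with illustrative field names and condition labels, might look like this: "visual" attaches the scene image, "text_only" drops it, and "neutral" also drops the role instruction.

    from typing import Optional

    def make_query(condition: str, role_instruction: str, scene_image: Optional[bytes]):
        text = "Select the agent's next navigation goal."
        if condition != "neutral":
            text = role_instruction + "\n" + text
        image = scene_image if condition == "visual" else None
        return {"text": text, "image": image}

    full    = make_query("visual",    "Role: hostile hider.", scene_image=b"<png>")
    blind   = make_query("text_only", "Role: hostile hider.", scene_image=b"<png>")
    neutral = make_query("neutral",   "Role: hostile hider.", scene_image=None)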

BibTeX


        @article{kim2026roleagent,
          title     = {Role-Aware Virtual Agents for Navigational Interaction guided by a Multimodal Large Language Model},
          author    = {Kim, Minyoung and Li, Changyang and Nguyen, Cuong and Yu, Lap-Fai},
          journal   = {ACM Transactions on Graphics (TOG)},
          volume    = {45},
          number    = {4},
          articleno = {126},
          year      = {2026}
        }