Research

My research sits at the intersection of control, optimization, and machine learning. I develop principled decision-making frameworks for Physical AI systems, aiming to bridge the gap between rigorous theoretical guarantees and high-performance adaptive intelligence. By addressing the fundamental trade-offs among efficiency, safety, and generalizability, my work seeks to enable autonomous systems, including robots, autonomous vehicles, and large-scale mobility networks, to operate reliably in complex, uncertain, and dynamic environments.

Research Directions

Reinforcement Learning

This line of research explores the theoretical and algorithmic foundations of offline and safe multi-agent reinforcement learning (MARL).

  • Objective: Enabling reliable policy learning from static datasets with formal guarantees on safety and robustness.

  • Approach: Integrating Bayesian inference and uncertainty quantification to bridge the gap between reinforcement learning, optimization, and statistical decision theory.
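As a toy illustration of the pessimism principle that underlies much of offline RL, the sketch below scores each action in a static dataset by a lower confidence bound, so that actions with little supporting data are penalized. The dataset, the penalty coefficient `beta`, and the `pessimistic_value` helper are all hypothetical, chosen only to make the idea concrete.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy static dataset: observed rewards for each of 3 discrete actions.
# Action 1 looks best on average but is supported by only 5 samples.
dataset = {a: rng.normal(loc=mu, scale=1.0, size=n)
           for a, (mu, n) in enumerate([(1.0, 50), (1.2, 5), (0.8, 100)])}

def pessimistic_value(rewards, beta=1.0):
    """Lower confidence bound: sample mean minus an uncertainty
    penalty that shrinks as more data supports the action."""
    r = np.asarray(rewards)
    return r.mean() - beta * r.std(ddof=1) / np.sqrt(len(r))

values = {a: pessimistic_value(r) for a, r in dataset.items()}
best = max(values, key=values.get)
```

Selecting `best` by the penalized value rather than the raw mean is what keeps a policy from exploiting actions that merely look good due to sparse data.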

World Foundation Models

I investigate the synergy between world models and RL for scalable decision-making under uncertainty.

  • Focus: Developing foundation-level models trained on multimodal data to simulate latent dynamics of agents and environments.

  • Goal: Combining generative modeling with model-based planning to create systems that generalize across diverse domains and complex tasks.
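A minimal sketch of model-based planning with a world model: random-shooting model-predictive control, where candidate action sequences are rolled out inside the model and the first action of the best sequence is executed. The `dynamics` and `reward` functions here are hypothetical stand-ins for a learned latent model.

```python
import numpy as np

rng = np.random.default_rng(1)

def dynamics(state, action):
    # Stand-in for a learned latent dynamics model (hypothetical).
    return 0.9 * state + action

def reward(state):
    # Toy objective: prefer states near 1.0.
    return -abs(state - 1.0)

def plan(state, horizon=5, n_candidates=64):
    """Random-shooting MPC: simulate candidate action sequences
    in the model and return the first action of the best one."""
    best_ret, best_a0 = -np.inf, 0.0
    for _ in range(n_candidates):
        seq = rng.uniform(-1.0, 1.0, size=horizon)
        s, ret = state, 0.0
        for a in seq:
            s = dynamics(s, a)
            ret += reward(s)
        if ret > best_ret:
            best_ret, best_a0 = ret, seq[0]
    return best_a0

a0 = plan(0.0)
```

The same loop scales conceptually to foundation-level world models: only `dynamics` and `reward` change, while the planner queries the model instead of the real environment.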

Dynamic Programming & Optimal Control

I advance the analytical foundations of dynamic programming and network optimization for multi-agent systems.

  • Focus: Studying structural properties, stability, and coordination across temporal and spatial scales.

  • Impact: Applications include intelligent transportation networks, distributed control, and the optimization of autonomous fleets.
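The analytical core of this direction can be illustrated by value iteration on a toy MDP: a Bellman backup applied until it reaches its fixed point, whose contraction property is exactly the kind of structural result studied here. The 3-state, 2-action MDP below is hypothetical, purely for illustration.

```python
import numpy as np

# Toy MDP: P[a, s, s'] transition probabilities, R[a, s] expected reward.
P = np.array([
    [[0.9, 0.1, 0.0],
     [0.1, 0.8, 0.1],
     [0.0, 0.1, 0.9]],
    [[0.1, 0.9, 0.0],
     [0.0, 0.1, 0.9],
     [0.0, 0.0, 1.0]],
])
R = np.array([[0.0, 0.0, 1.0],
              [0.0, 0.0, 1.0]])  # reward only in state 2
gamma = 0.9

V = np.zeros(3)
for _ in range(500):
    Q = R + gamma * (P @ V)      # Bellman backup: Q[a, s]
    V_new = Q.max(axis=0)        # greedy improvement
    delta = np.max(np.abs(V_new - V))
    V = V_new
    if delta < 1e-8:             # contraction guarantees convergence
        break
policy = Q.argmax(axis=0)
```

Because the backup is a gamma-contraction in the sup norm, the loop converges geometrically regardless of initialization, which is the kind of guarantee that carries over to network-scale coordination problems.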

Bandit Algorithms & Online Learning

This area focuses on adaptive decision-making within contextual bandits and streaming data environments.

  • Focus: Designing algorithms that efficiently balance exploration and exploitation.

  • Rigor: Non-asymptotic regret analyses provide theoretical guarantees for high-dimensional sequential decision problems.
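The exploration-exploitation balance above can be sketched with the classical UCB1 rule, which pulls the arm with the highest optimistic index and enjoys logarithmic regret. The Bernoulli arm means and horizon below are hypothetical, chosen only to make the trade-off visible.

```python
import numpy as np

rng = np.random.default_rng(2)
true_means = np.array([0.2, 0.5, 0.8])  # hypothetical Bernoulli arms
n_arms, T = 3, 2000
counts = np.zeros(n_arms)
sums = np.zeros(n_arms)

for t in range(T):
    if t < n_arms:
        arm = t  # initialize: pull each arm once
    else:
        # Optimism in the face of uncertainty: mean + confidence bonus.
        ucb = sums / counts + np.sqrt(2.0 * np.log(t) / counts)
        arm = int(np.argmax(ucb))
    r = rng.binomial(1, true_means[arm])
    counts[arm] += 1
    sums[arm] += r
```

Since the confidence bonus shrinks as an arm accumulates pulls, suboptimal arms are sampled only O(log T) times, which is the source of the non-asymptotic regret bounds mentioned above.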