Overview

My research focuses on control, optimization, and machine learning for decision-making in societal systems, with applications in robotics, autonomy, transportation, logistics, and finance. This research is supported by the National Natural Science Foundation of China and the Fundamental Research Funds for the Central Universities. I gratefully acknowledge this support and these collaborations.

Research Directions

Reinforcement Learning

This line of research explores the theoretical and algorithmic foundations of offline and safe multi-agent reinforcement learning (MARL). The objective is to enable reliable policy learning from static datasets and to provide formal guarantees on safety, robustness, and performance. By incorporating Bayesian inference, uncertainty quantification, and posterior information analysis, this work establishes principled connections between reinforcement learning, optimization, and statistical decision theory.

World Foundation Models

This direction investigates the integration of world models and reinforcement learning (RL) for scalable and robust decision-making under uncertainty. World models, trained on multimodal and interactive data, simulate the latent dynamics of agents, environments, and infrastructure, allowing intelligent systems to plan and adapt safely across diverse conditions. By combining generative modeling, model-based planning, and policy optimization, this research aims to develop foundation-level decision models that generalize across domains and tasks.

Bandit Algorithms

This research area focuses on adaptive decision-making under uncertainty, particularly within the frameworks of contextual bandits, online learning, and streaming data. It seeks to design algorithms that balance exploration and exploitation efficiently in dynamic environments, supported by rigorous non-asymptotic and regret analyses. These methods provide theoretical insights and practical tools for high-dimensional reinforcement learning and sequential decision problems.

Dynamic Programming

This direction advances the analytical foundations of dynamic programming, optimal control, and network optimization for multi-agent and large-scale systems. It studies the structural properties, stability, and coordination of decision processes across temporal and spatial scales, with applications to intelligent transportation networks, distributed control, and autonomous system optimization.