I am a final-year PhD student in Electrical Engineering at Harvard SEAS and MIT LIDS. I am currently working on Neuro-Symbolic Foundation Models for Planning under the guidance of Prof. Chuchu Fan and Prof. Nicholas Roy at MIT, co-advised by Prof. Na Li at Harvard. I also do research in AI for Physics, Mechanics, and Materials, and I am particularly interested in applying robotics and foundation models to AI4Science.
I received my bachelor's degree from the University of Science and Technology of China (USTC) in 2021, majoring in Theoretical and Applied Mechanics with a minor in Applied Mathematics. Before moving into robotics, I did research on Applied Physics, Solid Mechanics, and AI for Science under the guidance of Prof. Ju Li at MIT, Prof. Joost Vlassak at Harvard, Prof. Ting Zhu at Georgia Tech, and Prof. Hailong Wang at USTC.
I received the 40th Guo Moruo Award (the highest undergraduate honor at USTC) and the Harvard SEAS PhD Fellowship.
We propose Tool-Use-Mixture (TUMIX), which leverages diverse tool-use strategies to improve reasoning. This work shows how to get better reasoning from LLMs by running a set of diverse agents (text-only, code, search, etc.) in parallel and letting them share notes across a few rounds. Instead of brute-forcing more samples, it mixes strategies, stops early once the agents are confident, and ends up both more accurate and cheaper.
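A minimal sketch of this agent-mixture loop, assuming a generic `call_llm` interface and a simple agreement-based stopping rule (both illustrative, not the paper's exact design):

```python
# Sketch of a TUMIX-style mixture of tool-use agents. The agent styles,
# round count, and agreement threshold below are illustrative assumptions.
from collections import Counter
from typing import Callable

AGENT_STYLES = ["text-only", "code", "search"]  # diverse tool-use strategies

def tumix(question: str,
          call_llm: Callable[[str, str, list[str]], str],  # (style, question, notes) -> answer
          max_rounds: int = 3,
          agree_frac: float = 0.8) -> str:
    shared_notes: list[str] = []
    for _ in range(max_rounds):
        # Run the diverse agents (in parallel in practice; sequential here),
        # each seeing the other agents' answers from the previous round.
        answers = [call_llm(style, question, shared_notes) for style in AGENT_STYLES]
        shared_notes = answers
        # Stop early once enough agents agree, instead of adding more samples.
        top_answer, count = Counter(answers).most_common(1)[0]
        if count / len(answers) >= agree_frac:
            return top_answer
    return Counter(shared_notes).most_common(1)[0][0]
```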
We introduce R1-Code-Interpreter, a framework that enables LLMs to autonomously use a Code Interpreter for diverse reasoning and planning tasks through multi-turn supervised and reinforcement learning. Using a curriculum that prioritizes high-potential samples, the model achieves significantly improved training efficiency and performance, raising accuracy from 44.1% to 72.4% on benchmark tasks. The resulting model, R1-CI-14B, surpasses GPT-4o without a Code Interpreter and slightly exceeds GPT-4o with one, demonstrating emergent self-verification through code generation.
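As a hedged sketch of what "prioritizing high-potential samples" can look like (my illustrative weighting, not necessarily the paper's exact criterion): problems the model sometimes, but not always, solves carry the most training signal.

```python
# Curriculum sampling sketch: weight problems by how "borderline" they are.
# The variance-style weight below is an assumption for illustration.
import random

def sample_batch(problems: list[str], success_rate: dict[str, float],
                 batch_size: int = 32) -> list[str]:
    # success_rate[p] in [0, 1]: fraction of recent rollouts that solved p.
    # The weight peaks at 0.5; already-solved or hopeless problems teach little.
    weights = [success_rate[p] * (1.0 - success_rate[p]) + 1e-3 for p in problems]
    return random.choices(problems, weights=weights, k=batch_size)
```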
While LLMs have shown promise in robot control tasks, they often produce physically invalid actions due to a lack of constraint awareness. To address this, we propose a reinforcement learning with verifiable rewards (RLVR) framework that trains LLMs to generate only constraint-compliant action plans, rewarding successful task completion.
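A minimal sketch of such a verifiable reward, where `violates_constraints` and `completes_task` stand in for the environment's checkers (names assumed for illustration):

```python
# Verifiable reward sketch for constraint-aware planning: physically
# invalid plans earn nothing, and only verified task completion is rewarded.
def verifiable_reward(plan, env) -> float:
    if env.violates_constraints(plan):  # e.g. joint limits, collisions
        return 0.0
    return 1.0 if env.completes_task(plan) else 0.0
```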
We construct SymBench, a suite of 37 tasks involving symbolic computing, and train an 8B CodeSteer model with multi-round SFT and DPO to guide GPT-4o toward more appropriate code logic, better integrating symbolic computation with textual reasoning. Experiments show that GPT-4o + CodeSteer outperforms o1 and R1 across these 37 tasks overall while also reducing token and runtime costs.
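A rough sketch of the guide-then-generate interaction, assuming generic `codesteer` (the 8B guide) and `gpt4o` call interfaces; the message format and stopping signal are illustrative:

```python
# CodeSteer-style steering loop sketch: a small guide model decides whether
# the large model should answer with code or text and when to stop.
def steer(question: str, codesteer, gpt4o, max_rounds: int = 5) -> str:
    history: list[tuple[str, str]] = []
    answer = ""
    for _ in range(max_rounds):
        guidance = codesteer(question, history)   # e.g. "write code", "revise logic"
        if guidance == "finalize" and answer:
            break
        answer = gpt4o(question, guidance, history)  # code or textual reasoning
        history.append((guidance, answer))
    return answer
```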
Code-as-Symbolic-Planner: We steer LLMs to generate code as solvers, planners, and checkers for TAMP tasks requiring symbolic computing, across discrete and continuous environments, 2D/3D simulations and real-world settings, and single- and multi-robot tasks with diverse requirements.
Our research highlights the limitations of textual reasoning in LLMs for tasks involving math, logic, and optimization, where code generation offers a more scalable solution. Despite advances such as OpenAI's GPT Code Interpreter and AutoGen, no reliable method exists to steer LLMs between code and text generation. This study identifies key patterns in how LLMs choose between code and text under various factors and proposes three methods to improve steering.
We introduce PROMST, an automatic prompt optimization framework for complex, multi-step agent tasks. To handle task complexity, the difficulty of judging the long-horizon correctness of individual actions, the high cost of prompt exploration, and human preference alignment, we integrate human feedback, a learned score-prediction model, and modified task score functions.
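A condensed sketch of the resulting search loop, assuming hypothetical helpers `propose_prompts` (LLM rewrites guided by human feedback rules), `score_model` (the learned predictor), and `evaluate` (an expensive task rollout):

```python
# PROMST-style loop sketch: propose many prompt variants, but spend real
# rollouts only on the candidates the score model predicts to be strongest.
def promst(seed_prompt, propose_prompts, score_model, evaluate,
           iters: int = 10, k_tested: int = 4):
    best_prompt, best_score = seed_prompt, evaluate(seed_prompt)
    for _ in range(iters):
        candidates = propose_prompts(best_prompt)          # feedback-guided rewrites
        candidates.sort(key=score_model.predict, reverse=True)
        for prompt in candidates[:k_tested]:               # filter by predicted score
            score = evaluate(prompt)                       # expensive real rollout
            score_model.update(prompt, score)              # refine the predictor
            if score > best_score:
                best_prompt, best_score = prompt, score
    return best_prompt
```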
Xie et al. (2024) introduced TravelPlanner, revealing that LLMs alone had a low success rate of 0.6%. In response, this work proposes a framework that uses LLMs with satisfiability modulo theory (SMT) solvers to interactively and automatically generate valid travel plans, achieving a 97% success rate on TravelPlanner and over 78% on a newly created international travel dataset.
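To make the idea concrete, here is a toy illustration (not the paper's actual encoding) of handing travel constraints to the Z3 SMT solver: the LLM translates user requirements into assertions, and the solver either returns a feasible assignment or signals infeasibility so the LLM can revise the constraints.

```python
# Toy SMT encoding of a daily travel budget with Z3 (pip install z3-solver).
from z3 import Ints, Solver, sat

hotel, food, transport = Ints("hotel food transport")  # daily costs in dollars
s = Solver()
s.add(hotel >= 50, food >= 20, transport >= 10)  # minimum realistic costs
s.add(hotel + food + transport <= 120)           # user's daily budget

if s.check() == sat:
    print(s.model())  # a feasible cost breakdown
else:
    print("Infeasible: ask the user to raise the budget or relax a constraint.")
```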
We compare the task success rate and token efficiency of four multi-agent communication frameworks (centralized, decentralized, and two hybrid variants) and three step-history methods (full history, no history, and state-action pairs) on four coordination-dependent multi-agent 2D task scenarios with increasing numbers of agents.
We propose a framework that achieves accurate and generalizable natural-language-to-temporal-logic (NL-to-TL) transformation with the assistance of LLMs, addressing both data generation and model training.
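As an illustrative example of the task (mine, not drawn from the paper's dataset): the instruction "eventually visit the kitchen and then the bedroom, while always avoiding the hallway" corresponds to an LTL formula along the lines of F(kitchen ∧ F bedroom) ∧ G ¬hallway, where F means "eventually" and G means "always".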
Fundamental Science and AI for Science
I have also done substantial work on fundamental physical sciences and AI for science. I believe that integrating robotics and foundation models to help explore new science will be a general trend.
We apply active learning and multi-fidelity neural networks to solve inverse problems, mitigate the sim-to-real gap, and automate the materials discovery process.
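A bare-bones sketch of such a loop, where `cheap_sim`, `experiment`, and the ensemble-disagreement acquisition rule are illustrative stand-ins for the actual pipeline:

```python
# Active-learning sketch over a multi-fidelity surrogate: fit an ensemble
# anchored on cheap simulations, then query the most uncertain candidates
# with expensive high-fidelity experiments.
import numpy as np

def active_learning(candidates, fit_ensemble, cheap_sim, experiment,
                    rounds: int = 10, per_round: int = 5):
    X, y = [], []
    for _ in range(rounds):
        models = fit_ensemble(X, y, prior=cheap_sim)     # low-fidelity prior + data
        preds = np.stack([m(candidates) for m in models])
        uncertainty = preds.std(axis=0)                  # ensemble disagreement
        for i in np.argsort(uncertainty)[-per_round:]:   # most uncertain points
            X.append(candidates[i])
            y.append(experiment(candidates[i]))          # costly ground truth
    return fit_ensemble(X, y, prior=cheap_sim)
```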
We revealed an anomalous layer-dependent frictional behavior that originates from the interplay among interfacial adhesion, wrinkling of the topmost graphene layer, contact roughness, and plastic deformation of the substrate.
The influence of adhesion between the bare substrate and the indenter tip can be significantly reduced by decreasing the adhesion strength and range between the substrate and indenter atoms, or by increasing the substrate stiffness.
Hobbies
Sports: Soccer (I played in the Ivy Cup with Harvard twice, though we were knocked out in the group stage both times… sad), Basketball, Swimming, Table Tennis, Badminton, Snooker, and 5K runs.
Singing: I cannot sing professionally, but I have a great interest in country music, such as ‘Take Me Home, Country Roads’ by John Denver and ‘The Girl from the South’ by Lei Zhao.