Research Projects

Interested local and international students can apply through A*STAR scholarships such as AGS (PhD), SINGA (PhD), and SIPGA (Master's and undergraduate), or explore other attachment and internship opportunities by contacting me directly at zhu_haiyue@simtech.a-star.edu.sg with your CV.

Data-Efficient Robotics Foundation Models for Intelligent Perception and Reasoning

  • This research aims to develop data-efficient vision perception and reasoning for smart robotics using foundation models and advanced learning techniques. Autonomous robots require robust perception and reasoning to understand complex environments and make real-time decisions in autonomous and semi-autonomous operations. However, real-world robotics applications face data scarcity, domain shifts, and the need for flexible deployment, making traditional big-data-driven deep learning approaches impractical. This research will therefore focus on enhancing sample efficiency in robotics AI through techniques such as:

      • Foundation Models for Robotics – leveraging pretrained vision-language models to improve generalization across diverse tasks.

      • Few-Shot and Self-Supervised Learning – reducing labeled-data dependency while enabling robots to adapt to new environments (a minimal sketch of this idea is given at the end of this project description).

      • Neuro-Symbolic AI for Perception and Reasoning – integrating deep learning with symbolic reasoning for better interpretability and decision-making.

      • Physics-Informed AI and Sim-to-Real Learning – improving model robustness and adaptability in real-world scenarios.

    Applications include robotic perception and reasoning for complex scene understanding in support of autonomous robot manipulation. The research outcomes will significantly benefit areas such as autonomous manufacturing and human-robot collaboration in unstructured environments, aligning with Singapore’s vision of a Smart Nation.
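    As a concrete illustration of the few-shot direction above, the minimal sketch below (in PyTorch) builds class prototypes from a handful of labeled support images using a frozen pretrained vision-language image encoder and labels new observations by cosine similarity. The encode_image callable is a hypothetical stand-in for any pretrained encoder (e.g. a CLIP-style image tower), not a specific library API; the sketch is an assumption-based outline, not the project's actual method.

      import torch
      import torch.nn.functional as F

      def build_prototypes(encode_image, support_images, support_labels, num_classes):
          # Average the embeddings of the few labeled support images per class.
          with torch.no_grad():
              feats = F.normalize(encode_image(support_images), dim=-1)   # (N, D)
          protos = torch.stack([
              feats[support_labels == c].mean(dim=0) for c in range(num_classes)
          ])                                                              # (C, D)
          return F.normalize(protos, dim=-1)

      def classify(encode_image, query_images, prototypes, temperature=0.07):
          # Label each query image by cosine similarity to the class prototypes.
          with torch.no_grad():
              feats = F.normalize(encode_image(query_images), dim=-1)     # (M, D)
          logits = feats @ prototypes.t() / temperature                   # (M, C)
          return logits.argmax(dim=-1)

    Because the encoder stays frozen and only prototypes are computed, new object categories can be added at deployment time from a few examples without any gradient updates, which is one route to the sample efficiency this project targets.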

Spatial-Aware Vision-Language-Action Models for Embodied AI in Robotics

  • Spatial awareness is crucial for Vision-Language-Action (VLA) models in robotics, enabling intelligent agents to understand, navigate, and interact with their environment effectively. Traditional VLA models focus primarily on multimodal understanding but often lack precise spatial reasoning, which limits their ability to ground language in 3D space and execute physical actions. This research therefore focuses on developing spatial-aware VLA models to enhance embodied AI for intelligent robotics: future autonomous systems require multimodal reasoning to understand their environment, process natural language instructions, and execute complex actions with spatial awareness.

    Key challenges include aligning spatial perception with language grounding, understanding geometric and semantic relationships, and adapting to novel environments with minimal supervision. This research will explore:

      • Vision-Language-Action Foundation Models – leveraging large-scale pretrained models for cross-modal understanding.

      • Spatial-Temporal Scene Understanding – integrating 3D spatial reasoning and video-language models for dynamic environments.

      • 3D Spatial Grounding – linking natural language commands to 3D world representations for precise execution (a minimal grounding sketch is given at the end of this project description).

      • Spatial-Temporal Reasoning – understanding motion, occlusion, and depth for dynamic interactions.

      • Sim-to-Real Transfer for Spatially Aware Interaction – improving real-world adaptability through simulation-driven learning.

    Applications include autonomous and assistive robots as well as human-robot collaboration in industrial and service environments. The research outcomes will contribute to next-generation robotic agents capable of understanding spatial context and executing human-like actions in unstructured settings.
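    To make the 3D grounding challenge concrete, the minimal sketch below (in PyTorch) scores segmented scene objects against a natural-language command in a shared embedding space and returns the 3D centroid of the best match. The encode_text and encode_object callables, the objects list format, and the example command are hypothetical placeholders that assume an upstream instance-segmentation and depth pipeline; the sketch illustrates the idea, not a specific model.

      import torch
      import torch.nn.functional as F

      def ground_instruction(encode_text, encode_object, instruction, objects):
          # objects: list of dicts with keys 'crop' (an image patch of one
          # segmented object) and 'centroid' (its xyz position in the robot
          # frame), e.g. produced by a hypothetical segmentation + depth stage.
          with torch.no_grad():
              text_emb = F.normalize(encode_text(instruction), dim=-1)     # (D,)
              obj_embs = F.normalize(
                  torch.stack([encode_object(o["crop"]) for o in objects]), dim=-1
              )                                                             # (K, D)
          # Cosine similarity between the command and every candidate object.
          scores = obj_embs @ text_emb                                      # (K,)
          best = int(scores.argmax())
          return objects[best]["centroid"], float(scores[best])

      # Hypothetical usage:
      # target_xyz, score = ground_instruction(
      #     encode_text, encode_object, "place the cup on the tray", scene_objects)

    A full spatial-aware VLA model would replace this retrieval step with learned policies that also reason about geometry, occlusion, and motion over time, but a shared language-vision embedding of this kind is one plausible grounding primitive for the directions listed above.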