Research Projects
Interested local and international students can apply through A*STAR scholarships such as AGS (PhD), SINGA (PhD), and SIPGA (Master's and undergraduate), or explore other attachment and internship opportunities by contacting me directly at zhu_haiyue@simtech.a-star.edu.sg with your CV.
Data-Efficient Robotics Foundation Models for Intelligent Perception and Reasoning
This research aims to develop data-efficient vision perception and reasoning for smart robotics using foundation models and advanced learning techniques. Autonomous robots require robust perception and reasoning capabilities to understand complex environments and make real-time decisions for autonomous and semi-autonomous operations. However, real-world robotics applications face data scarcity, domain shifts, and the need for flexible deployment, making traditional big-data-driven deep learning approaches impractical. This research will focus on enhancing sample efficiency in robotics AI using techniques such as:
• Foundation Models for Robotics – leveraging pretrained vision-language models to improve generalization across diverse tasks (illustrated in the first sketch below).
• Few-Shot and Self-Supervised Learning – reducing labeled data dependency while enabling robots to adapt to new environments (illustrated in the second sketch below).
• Neuro-Symbolic AI for Perception and Reasoning – integrating deep learning with symbolic reasoning for better interpretability and decision-making.
• Physics-Informed AI and Sim-to-Real Learning – improving model robustness and adaptability in real-world scenarios.
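As a minimal illustration of the foundation-model item above (a sketch, not the project's actual method), the code below uses a pretrained CLIP vision-language model for zero-shot recognition of workcell objects in a robot camera frame, with no task-specific training data. It assumes PyTorch, Pillow, and Hugging Face transformers are installed; the checkpoint name, image path, and prompt list are placeholders for illustration only.

# A minimal sketch, not the project's method: zero-shot recognition of workcell
# objects with a pretrained CLIP vision-language model, showing how a foundation
# model can provide perception without task-specific training data.
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

# Candidate object descriptions the robot should recognize without new labels (illustrative).
labels = ["a metal bracket", "a plastic housing", "a cable connector", "an empty tray"]
prompts = [f"a photo of {label} on a workbench" for label in labels]

image = Image.open("workcell_frame.jpg")  # hypothetical frame from the robot's camera
inputs = processor(text=prompts, images=image, return_tensors="pt", padding=True)

with torch.no_grad():
    outputs = model(**inputs)
    probs = outputs.logits_per_image.softmax(dim=-1)  # image-text similarity -> class probabilities

for label, p in zip(labels, probs[0].tolist()):
    print(f"{label}: {p:.3f}")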
The applications include robotic perception and reasoning tasks for complex scene understanding to enable autonomous robot manipulation. The research outcomes will significantly benefit areas such as autonomous manufacturing and human-robot collaboration in unstructured environments, aligning with Singapore’s vision of a Smart Nation.
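The second sketch, referenced in the few-shot item above, shows one common data-efficient pattern (assumed here for illustration, not prescribed by the project): keeping a pretrained vision backbone frozen and training only a small linear probe on a handful of labeled images from a new workcell. PyTorch and torchvision are assumed; the class count, image batch, and loop sizes are placeholders.

# A minimal few-shot sketch, assumed for illustration: adapt a frozen pretrained
# vision backbone to a new task by training only a small linear probe on a
# handful of labeled images, leaving the backbone untouched.
import torch
import torch.nn as nn
from torchvision import models

# Frozen pretrained backbone used purely as a feature extractor.
backbone = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
backbone.fc = nn.Identity()               # expose the 512-d penultimate features
backbone.eval()
for p in backbone.parameters():
    p.requires_grad = False

num_classes = 4                           # e.g. four part types in a new workcell (placeholder)
probe = nn.Linear(512, num_classes)       # the only trainable parameters
optimizer = torch.optim.Adam(probe.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

# Placeholder few-shot support set; in practice, a few labeled camera images per class.
images = torch.randn(8, 3, 224, 224)
labels = torch.randint(0, num_classes, (8,))

with torch.no_grad():
    feats = backbone(images)              # frozen 512-d features, computed once

for _ in range(20):                       # a few passes over the small support set
    logits = probe(feats)
    loss = loss_fn(logits, labels)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

In the same spirit, self-supervised objectives could pretrain or further adapt the backbone on unlabeled robot data before the probe is fitted.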
Spatial-Aware Vision-Language-Action Models for Embodied AI in Robotics
Spatial awareness is crucial for Vision-Language-Action (VLA) models in robotics, enabling
intelligent agents to understand, navigate, and interact with their environment effectively.
Traditional VLA models focus primarily on multimodal understanding but often lack precise spatial reasoning, limiting their ability to ground language in 3D space and execute physical actions. This research focuses on developing spatial-aware VLA models to enhance embodied AI for intelligent robotics. Future autonomous systems require multimodal reasoning to understand their environment, process natural language instructions, and execute complex actions with spatial awareness.
Key challenges include aligning spatial perception with language grounding, understanding geometric and semantic relationships, and adapting to novel environments with minimal supervision. This research will explore:
• Vision-Language-Action Foundation Models – leveraging large-scale pretrained models for cross-modal understanding.
• Spatial-Temporal Scene Understanding – integrating 3D spatial reasoning and video-language models for dynamic environments.
• 3D Spatial Grounding – linking natural language commands to 3D world representations for precise execution (illustrated in the sketch after this list).
• Spatial-Temporal Reasoning – understanding motion, occlusion, and depth for dynamic interactions.
• Sim-to-Real Transfer for Spatially-Aware Interaction – improving real-world adaptability through simulation-driven learning.
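To make the 3D grounding item above concrete, the sketch below (a simplified illustration under assumed inputs, not the project's pipeline) selects the detected object that best matches a language command and back-projects its pixel location and depth into a 3D target point in the camera frame using a pinhole model. The detections, language-match scores, and camera intrinsics are placeholders that a vision-language or VLA model and a calibrated RGB-D camera would normally provide.

# A minimal sketch, assuming object detections and language-match scores are
# already available: pick the detection that best matches the command and
# back-project its pixel centre and depth into a 3D grasp target (camera frame).
import numpy as np

def backproject(u: float, v: float, depth_m: float, K: np.ndarray) -> np.ndarray:
    """Pinhole back-projection of pixel (u, v) with metric depth to a 3D point."""
    fx, fy = K[0, 0], K[1, 1]
    cx, cy = K[0, 2], K[1, 2]
    return np.array([(u - cx) * depth_m / fx, (v - cy) * depth_m / fy, depth_m])

command = "pick up the red mug"

# Hypothetical detections: (label, pixel centre, depth in metres, language-match score).
detections = [
    ("red mug",    (412.0, 230.0), 0.62, 0.91),
    ("blue plate", (150.0, 310.0), 0.75, 0.18),
]

# Example pinhole intrinsics for a 640x480 sensor (placeholder values).
K = np.array([[615.0,   0.0, 320.0],
              [  0.0, 615.0, 240.0],
              [  0.0,   0.0,   1.0]])

# Ground the command by choosing the highest-scoring detection, then lift it to 3D.
label, (u, v), depth, _score = max(detections, key=lambda d: d[3])
target_xyz = backproject(u, v, depth, K)
print(f"'{command}' -> target '{label}' at {np.round(target_xyz, 3)} m (camera frame)")

In a full system, this target would be transformed into the robot base frame via the camera extrinsics and refined with spatial-temporal reasoning about motion and occlusion before action execution.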
Applications include autonomous robots, assistive robotics, and human-robot collaboration
in industrial and service environments. The research outcomes will contribute to next-generation
robotic agents capable of understanding spatial contexts and executing human-like actions in
unstructured settings.