DUM-E: Vision-Guided Robotic Fencing System for Human-Robot Interaction

Timeline: 2024–2025
Role: Perception Lead
Team: 4 members
Focus: Vision-based pose estimation and reactive motion planning
Tags: Franka Emika Panda · RGB-D Perception · HSV Segmentation · ROS · Finite State Machine · Human-Robot Interaction
[Figures: system overview · hardware setup · tracking visualization]

Research Objective

The DUM-E system (Dynamic Utility Manipulator – Experimental) aims to enable interactive fencing between a human and a robot by combining real-time vision tracking with reactive motion generation. Using an external Intel RealSense RGB-D camera and a Franka Emika Panda arm, the robot detects a human opponent’s baton through color-based segmentation and generates counter-trajectories using a finite state machine and a mirroring strategy. The objective is to demonstrate fast, safe, and intelligent human-robot interaction under dynamic motion, bridging perception and control in adversarial tasks.

Key Features Achieved

  • Implemented real-time 3D pose estimation of a human-held baton using HSV color segmentation and depth fusion from an Intel RealSense D435i camera (a perception sketch follows this list).
  • Developed an extrinsic calibration pipeline to transform detected poses from the camera frame to the robot base frame with sub-centimeter accuracy (see the frame-transform sketch below).
  • Designed a finite state machine (FSM) for passive–active state transitions, triggering defense actions when the baton breaches a virtual defense plane (see the FSM sketch below).
  • Built a reactive trajectory planner that mirrors the opponent’s baton motion across the camera plane, achieving smooth and natural defensive movements (see the mirroring sketch below).
  • Integrated quaternion-based orientation control via FrankaPy for end-effector stability and a consistent blocking posture (see the FrankaPy sketch below).
  • Achieved a minimum viable product capable of low-latency visual tracking, smooth imitation, and responsive defense in human–robot fencing scenarios.
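
Below is a minimal sketch of the kind of HSV-plus-depth perception step described above, assuming pyrealsense2 and OpenCV; the color thresholds and helper name are illustrative, not the project’s actual values.

```python
import cv2
import numpy as np
import pyrealsense2 as rs

# Illustrative HSV band for the baton's marker color; the real thresholds
# would be tuned to the baton and the lab's lighting.
HSV_LOW = np.array([100, 120, 60])
HSV_HIGH = np.array([130, 255, 255])

def baton_centroid_3d(color_bgr, depth_frame, intrinsics):
    """Segment the baton by color and back-project its centroid to 3-D."""
    hsv = cv2.cvtColor(color_bgr, cv2.COLOR_BGR2HSV)
    mask = cv2.inRange(hsv, HSV_LOW, HSV_HIGH)
    # Morphological opening suppresses small speckle in the mask.
    mask = cv2.morphologyEx(mask, cv2.MORPH_OPEN, np.ones((5, 5), np.uint8))

    m = cv2.moments(mask)
    if m["m00"] < 1e-3:               # no baton pixels this frame
        return None
    u = int(m["m10"] / m["m00"])      # pixel centroid (u, v)
    v = int(m["m01"] / m["m00"])

    z = depth_frame.get_distance(u, v)  # depth in metres at the centroid
    if z == 0.0:                        # invalid depth reading
        return None
    # Fuse color and depth: deproject the pixel into a camera-frame 3-D point.
    return rs.rs2_deproject_pixel_to_point(intrinsics, [u, v], z)
```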
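Once a point is recovered in the camera frame, applying the calibrated extrinsic reduces to one homogeneous transform. A minimal sketch, with a made-up matrix standing in for the calibrated one:

```python
import numpy as np

# Hypothetical camera-to-base extrinsic (4x4 homogeneous transform); the
# real matrix would come from the project's calibration pipeline.
T_BASE_CAMERA = np.array([
    [ 0.0, -1.0,  0.0, 0.45],
    [-1.0,  0.0,  0.0, 0.10],
    [ 0.0,  0.0, -1.0, 0.80],
    [ 0.0,  0.0,  0.0, 1.00],
])

def camera_to_base(p_camera):
    """Map a 3-D point from the camera frame into the robot base frame."""
    p_h = np.append(np.asarray(p_camera, dtype=float), 1.0)  # homogeneous
    return (T_BASE_CAMERA @ p_h)[:3]
```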
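The passive–active FSM can be expressed as a small state-transition function. A sketch assuming the defense plane is a fixed x-plane in the base frame, with a hysteresis margin (both values invented here) to avoid chattering at the boundary:

```python
from enum import Enum, auto

DEFENSE_PLANE_X = 0.55   # hypothetical plane location in the base frame (m)
HYSTERESIS = 0.05        # margin before switching back to passive (m)

class State(Enum):
    PASSIVE = auto()     # watch the baton, hold a ready pose
    ACTIVE = auto()      # baton breached the plane: execute a block

def step(state, baton_xyz):
    """One FSM update: transition based on the virtual defense plane."""
    x = baton_xyz[0]
    if state is State.PASSIVE and x < DEFENSE_PLANE_X:
        return State.ACTIVE                       # breach: go defend
    if state is State.ACTIVE and x > DEFENSE_PLANE_X + HYSTERESIS:
        return State.PASSIVE                      # baton retreated
    return state
```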
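Mirroring the baton’s position reduces to a single reflection across a plane. The plane point and normal below are illustrative stand-ins; the project mirrors across the camera plane, while this sketch writes the reflection for an arbitrary plane in the base frame:

```python
import numpy as np

# Hypothetical mirror plane between human and robot: a point on the
# plane and its unit normal, both in the robot base frame.
PLANE_POINT = np.array([0.55, 0.0, 0.40])
PLANE_NORMAL = np.array([1.0, 0.0, 0.0])

def mirror_target(baton_xyz):
    """Reflect the baton position across the plane to get a block target."""
    d = np.dot(baton_xyz - PLANE_POINT, PLANE_NORMAL)
    return baton_xyz - 2.0 * d * PLANE_NORMAL
```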
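Commanding the block pose with a fixed orientation might look like the sketch below. It assumes FrankaPy’s FrankaArm.goto_pose and the autolab_core RigidTransform it takes; the quaternion and duration are illustrative, not the project’s tuned values.

```python
import numpy as np
from autolab_core import RigidTransform
from frankapy import FrankaArm
from scipy.spatial.transform import Rotation as R

# Hypothetical fixed blocking orientation (quaternion, xyzw order):
# purely illustrative values for a consistent blade posture.
BLOCK_QUAT_XYZW = [0.0, 0.7071, 0.0, 0.7071]

def goto_block(fa: FrankaArm, target_xyz):
    """Move the end-effector to the block target with a fixed orientation."""
    pose = RigidTransform(
        rotation=R.from_quat(BLOCK_QUAT_XYZW).as_matrix(),
        translation=np.asarray(target_xyz, dtype=float),
        from_frame="franka_tool",
        to_frame="world",
    )
    fa.goto_pose(pose, duration=0.5)  # short duration for a reactive block
```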

Challenges & Learnings

  • Perception noise: HSV segmentation exhibited sensitivity to lighting and occlusion, causing centroid jitter of ±1.1 cm (x–y) and ±1.7 cm (z); a smoothing sketch follows this list.
  • Latency optimization: The 165 ms response latency highlighted trade-offs between perception refresh rate and motion planning overhead.
  • Safety vs. workspace limits: Virtual wall constraints ensured safe interaction but restricted arm reach during aggressive attacks.
  • Hardware adaptation: Switched from Kinect to RealSense due to compatibility issues, requiring rapid recalibration of the visual pipeline.
  • Key learning: Maintaining real-time responsiveness in perception-driven systems requires balancing sensor stability, control frequency, and motion smoothness.
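
One standard mitigation for centroid jitter of this kind is a low-pass filter on the 3-D estimate. The exponential moving average below is an illustrative sketch, not the project’s actual filter, and the smoothing factor is invented:

```python
import numpy as np

ALPHA = 0.3  # illustrative smoothing factor: lower = smoother but laggier

class CentroidFilter:
    """Exponential moving average over successive 3-D centroid estimates."""

    def __init__(self):
        self._state = None

    def update(self, xyz):
        xyz = np.asarray(xyz, dtype=float)
        if self._state is None:
            self._state = xyz              # initialize on first measurement
        else:
            self._state = ALPHA * xyz + (1.0 - ALPHA) * self._state
        return self._state
```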

Quantitative Results