DroneTrap: Drone Catching in Midair by Soft Robotic Hand with Color-Based Force Detection and Hand Gesture Recognition
Aleksey Fedoseev, Valerii Serpiva, Ekaterina Karmanova, Miguel Altamirano Cabrera, Vladimir Shirokun, Iakov Vasilev, Stanislav Savushkin, Dzmitry Tsetserukou
Abstract — The paper proposes a novel concept of docking drones to make this process as safe and fast as possible. The idea behind the project is that a robot with a gripper grasps the drone in midair. The human operator navigates the robotic arm with the ML-based gesture recognition interface. The 3-finger robot hand with soft fingers and integrated touch sensors is pneumatically actuated. This allows the drone to be caught safely, without destroying its mechanical structure, fragile propellers, and motors. Additionally, the soft hand has a unique technology of providing force information through the color of the fingers to the remote computer vision (CV) system. In this case, not only the control system but also the human operator can understand the applied force. The operator has full control of robot motion and task execution without additional programming by wearing a mocap glove with gesture recognition, which was developed and applied for the high-level control of DroneTrap. The experimental results revealed that the developed color-based force estimation can be applied for rigid object capturing with high precision (95.3%). The proposed technology can potentially revolutionize the landing and deployment of drones for parcel delivery on uneven ground, structure inspections, risky operations, etc.
I. INTRODUCTION
Most accidents of Unmanned Aerial Vehicles (UAVs) happen during landing, specifically in high winds or when the terrain is uneven. The concept of landing a swarm of drones on the arms of a human for fast and safe deployment is proposed in [1]. However, it is expected that drones will be able to safely land in autonomous mode. Sarkisov et al. invented a robotic landing gear with robotic legs that allows drones to adapt to the surface [2]. The legs are equipped with optical torque sensors, making them responsive to the contact force with the ground. Another approach to drone landing was presented by Miron et al. [3], proposing a drone-catching gripper based on sleeved bending actuators with a silicone membrane and an external two-material sleeve. One of the estimated approaches for this technology is drone perching with a behavior algorithm similar to landing on a tree branch. A drone landing system with 3 soft landing gears attached to the drone and a neural network developed to achieve the desired landing behavior was introduced by Luo et al. [4].
The authors are with the Space Center, Skolkovo Institute of Science and Technology (Skoltech), 121205 Bolshoy Boulevard 30, bld. 1, Moscow, Russia. {aleksey.fedoseev, valerii.serpiva, ekaterina.karmanova, miguel.altamirano, vladimir.shirokun, iakov.vasilev, stanislav.savushkin, d.tsetserukou}@skoltech.ru

Fig. 1. DroneTrap system performs a drone catching task by the soft gripper.
Such an approach allows high mobility for the UAV and operations on non-flat surfaces, yet it has a limited field of implementation as it decreases the weight of the possible payload for the drone. Also, a manipulator can be positioned on a mobile platform for access to hard-to-reach areas with uneven surfaces. Capturing a flying drone in midair with a net carried by cooperative UAVs and soft landing on the surface is proposed by Rothe et al. [5]. A high-speed UAV parcel handover system with a high-speed vision system for the supply station control has been introduced by Tanaka et al. [6]. This research also suggests a concept of non-stop UAV delivery based on high-speed visual control with a six-axis robot.
Autonomous robotic systems were suggested in several researches on mobile robots [7] and robotic hands [8] for capturing thrown objects with precise tracking. Typical examples of such systems are focused on the precision of catching, which is demonstrated by the robots for juggling proposed by Kober et al. [9] or the ping-pong game proposed by Rapp et al. [10]. However, such systems in general do not take into account the fragility of the operated objects. Several researches propose either position-based impedance controllers suggested for soft robotic catching of a falling object by Uchiyama et al. [11] or a joint impedance tracking controller based on torque sensors, proposed by Bäuml et al. [12]. A soft robotic gripper composed of flat silicone strips and flexible polymer nanofibers has been developed by Sinatra et al. for the manipulation of delicate structures [13].
Another challenge for fragile flying vehicles is the docking of spacecraft with the International Space Station (ISS). The Canadarm with 7 DoF (Degrees of Freedom) captures and docks the unmanned cargo ships [14]. It has a unique gripper with a three-wire crossover design featuring gentle grasping of such payloads as satellites. The key advantage of the Canadarm is that it significantly reduces docking time and makes the process safer by avoiding crashes with the ISS. To achieve safe manipulation of fragile objects, several approaches of embedding sensors into silicone and 3D-printed robotic grippers were introduced [15], [16], allowing sufficient and safe robotic manipulations with fragile objects. E.g., soft somatosensitive actuators providing haptic, proprioceptive, and thermoceptive sensing to the user via embedded 3D printing were developed by Truby et al. [17].
Apart from the fully autonomous systems, there has been an increasing interest in applying Human-Robot Interaction (HRI) for controlling robotic arms. This approach allows users to have adjustable control of robot motion with minimum delay and to modify the task without additional programming, which plays a significant role in the dynamic catching task. One of the possible approaches for such operation implements wearable body motion capturing (mocap) systems, like exoskeletons [18], inertial measurement unit-based suits [19], and camera-based systems [20], [21]. Wearable devices have the merit of naturally capturing human gestures. Additionally, the hand serves as the most natural human interface for object manipulation. Therefore, a glove-controller could be the most successful device for transmitting human commands. Furthermore, a glove-controller has been successfully used in human-drone interaction [22].
We propose a novel technology, DroneTrap, to catch drones in midair with a soft robotic hand.
Such technology can make the landing and deployment process of a swarm of drones safe and fast. It consists of a robotic manipulator (UR3 robot with 6 DoF), a soft robotic hand, a mocap glove, and a computer vision system for force detection.

II. SYSTEM OVERVIEW
The drone grasping experiments are conducted in a room facilitated with the Vicon motion capturing system with 12 cameras. This system provides position and orientation information of the marked objects (Crazyflie nano-quadrotors and the reference frame of the UR3 collaborative manipulator). The control PC with the Unity framework receives positions from the Mocap PC via Robot Operating System (ROS) Kinetic. Both the manipulator and the gripper are controlled by gesture recognition (Fig. 2).
Fig. 2. DroneTrap system architecture. Both the robotic arm and the soft gripper are controlled with gesture recognition, allowing the operator to choose the desired sequence of swarm catching.
A. Soft gripper control
The flexible gripper consists of three SPAs (Soft Pneumatic Actuators). Each actuator carries a force sensor and a flex sensor, and the grip angle can be adjusted for different types of objects. A separate pneumatic hose is connected to each of the actuators. The compression and release of the gripper and each of the three pneumatic actuators are controlled by an Air Supply Module (ASM) with a pneumatic cylinder, pumps, and pneumatic valves (Fig. 3).
Fig. 3. Design of the air supply module for soft gripper control: AC-DC converter, vacuum pump, compressor, Arduino Mega with a Seeed Grove shield, air receiver, pressure sensors, solenoid valves, and relay modules.
The ASM performs control over the gripper: it provides the required amount of compressed air to drive the gripper to grab the object and executes the discharge of air and vacuumization of the SPAs. The component connection of the air generating and distribution subsystem is shown in Fig. 4.
Fig. 4. Control scheme for pneumatic actuators.
Such architecture enables inflating and deflating each SPA independently using a minimum quantity of solenoid valves and pressure sensors. In this scheme, arrows indicate the directions in which air can flow (blue – supply, red – exhaust, black – bidirectional).
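The per-actuator inflate/hold/deflate behavior can be summarized as a simple bang-bang rule on the measured chamber pressure. The Python sketch below only illustrates that logic under assumed interfaces; the actual ASM firmware runs on the Arduino Mega, and the function names, valve labels, and pressure thresholds are hypothetical.

```python
# Illustrative bang-bang control of one SPA chamber (hypothetical interfaces).
# The real ASM firmware runs on an Arduino Mega with solenoid valves and
# pressure sensors; names and thresholds below are assumptions.

SUPPLY, EXHAUST = "supply", "exhaust"   # solenoid valves per SPA

def control_spa(target_kpa, read_pressure, set_valve, tolerance_kpa=2.0):
    """Open the supply valve below the band, the exhaust valve above it,
    and close both valves inside the band to hold the grasp."""
    p = read_pressure()                   # current chamber pressure, kPa
    if p < target_kpa - tolerance_kpa:    # under-inflated: add air
        set_valve(SUPPLY, True)
        set_valve(EXHAUST, False)
    elif p > target_kpa + tolerance_kpa:  # over-inflated: release air
        set_valve(SUPPLY, False)
        set_valve(EXHAUST, True)
    else:                                 # within band: hold
        set_valve(SUPPLY, False)
        set_valve(EXHAUST, False)
```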
B. Gripper force estimation based on CV
A color-based CV module is used for the gripper force evaluation, allowing the applied force to be naturally recognized and controlled both autonomously and under supervised control by an operator. This module is responsible for real-time camera color recognition and data representation and consists of two main components: a single-board computer Raspberry Pi 4B for data handling and an RPi Camera module v2.1. The SPA gripper, sensor node module, camera, and data handling module are assembled together in the experimental setup, which is demonstrated in Fig. 5. The camera is mounted 30 cm from the gripper, focused, and placed at the same altitude as the gripper.
Fig. 5. Soft gripper design: SPAs with force and flex sensors, a sensor node with an ESP32 dev kit, power supply, Raspberry Pi 4B with a MicroSD card module, and an RPi camera v2.1.
The computation framework for the force estimation consists of the ESP32 WROOM dev kit v1 and the Arduino Mega 2560 microcontroller firmware. The ESP32 microcontroller is implemented for data acquisition from sensors, data processing, and transmission. In order to control the RGB LED array, we applied the FastLED library, providing the ability to change the LED color by adjusting only one value. However, instead of the RGB color space, the HSV model needs to be utilized:
• Hue – the attribute of a visual sensation according to which an area appears to be similar to one of the perceived colors: red, yellow, green, and blue, or to a combination of two of them;
• Saturation – the colorfulness of a stimulus relative to its own brightness;
• Brightness – the attribute of a visual sensation according to which the perceived color of an area appears to be more or less chromatic.
Traditionally, for HSV color models, hue is represented as a number of degrees from 0 to 360. Saturation and brightness values are often represented as numbers (percentages) from 0 to 100. But neither 360 nor 100 is a particularly computer-native number. To make the code smaller, faster, and more efficient, the FastLED library uses simple one-byte values (0-255) for hue, saturation, and brightness independently. To improve the estimation accuracy and avoid color duplication in the FastLED hue spectrum, the hue range was limited to 20-210, which corresponds to orange – purple colors. A core algorithm of the remote color-based applied force detection has been developed in Python 3 using the OpenCV library (Fig. 6).
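As a rough illustration of this encoding, the sketch below maps a normalized force reading onto the restricted 20-210 one-byte hue range and back. The actual mapping runs in the ESP32 firmware with the FastLED library; the Python form, the full-scale force constant, and the assumption of a linear mapping are ours, introduced only for clarity.

```python
# Illustrative linear mapping between measured force and a one-byte hue,
# restricted to 20 (orange) .. 210 (purple) as described above.
# FORCE_MAX and the linearity of the mapping are assumptions for this sketch.

HUE_MIN, HUE_MAX = 20, 210      # usable FastLED hue range (one-byte HSV)
FORCE_MAX = 10.0                # assumed full-scale force of the sensor, N

def force_to_hue(force_n: float) -> int:
    """Encode a force reading as the hue byte shown by the finger's LEDs."""
    ratio = min(max(force_n / FORCE_MAX, 0.0), 1.0)
    return int(round(HUE_MIN + ratio * (HUE_MAX - HUE_MIN)))

def hue_to_force(hue: float) -> float:
    """Inverse transform used on the CV side to recover the force estimate."""
    ratio = (hue - HUE_MIN) / (HUE_MAX - HUE_MIN)
    return min(max(ratio, 0.0), 1.0) * FORCE_MAX
```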
Fig. 6. Force estimation algorithm with OpenCV: the obtained RGB image is converted to grayscale and to HSV color spaces; Gaussian blur, thresholding, and masking are applied; contours are detected and overlaid on the original image; the mean color value inside the contours is acquired; and the inverse transform with filtering yields the estimated force value.
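A minimal sketch of this pipeline is given below, reusing the hue-to-force inverse transform from the previous sketch. The threshold value, blur kernel, and segmentation strategy are illustrative assumptions rather than the authors' exact parameters.

```python
# Minimal sketch of the color-based force estimation pipeline (Fig. 6).
# Threshold, blur kernel, and segmentation details are illustrative assumptions.
import cv2
import numpy as np

def estimate_force(frame_bgr, hue_to_force):
    gray = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2GRAY)       # for segmentation
    hsv = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2HSV)         # for hue readout
    blurred = cv2.GaussianBlur(gray, (5, 5), 0)              # suppress noise
    _, mask = cv2.threshold(blurred, 60, 255, cv2.THRESH_BINARY)
    contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL,
                                   cv2.CHAIN_APPROX_SIMPLE)
    if not contours:
        return None, frame_bgr
    # Keep only the detected LED regions and average their hue.
    led_mask = np.zeros_like(mask)
    cv2.drawContours(led_mask, contours, -1, 255, thickness=cv2.FILLED)
    mean_hue_cv = cv2.mean(hsv[:, :, 0], mask=led_mask)[0]   # OpenCV hue: 0-179
    mean_hue = mean_hue_cv * 255.0 / 179.0                   # rescale to one byte
    force = hue_to_force(mean_hue)                           # inverse transform
    overlay = frame_bgr.copy()
    cv2.drawContours(overlay, contours, -1, (0, 255, 0), 2)  # visual feedback
    return force, overlay
```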
C. Gesture recognition with NN

To provide natural remote control during the drone catching task, a custom mocap device, V-Arm, was developed and integrated into the DroneTrap system. The V-Arm is designed in the form of a glove with 5 flexure sensors for tracking finger clenching. Data processing from the sensors is performed with the Arduino Mega and then transferred to a control PC for gesture recognition via the UART interface. To detect gestures, we implemented a neural network that was pretrained with gestures visualized from the V-Arm data in the Unity3D engine (Fig. 7).
Fig. 7. Algorithm for gesture recognition: the V-Arm glove streams sensor data through the microcontroller to the visualization module developed in Unity3D, where the hand model and the pre-trained NN map the processed 6-DoF data (parameters) to the output vector of the current gesture.
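Reading the glove data on the control PC can be sketched as below. The serial port name, baud rate, and comma-separated frame format are assumptions, since the paper does not specify the exact UART protocol used between the Arduino Mega and the control PC.

```python
# Illustrative reading of the V-Arm flexure values over UART (pyserial).
# Port, baud rate, and the comma-separated frame format are assumptions.
import serial

def read_finger_bending(port="/dev/ttyUSB0", baud=115200):
    """Yield [thumb, index, middle, ring, pinkie] bending values as floats."""
    with serial.Serial(port, baud, timeout=1.0) as link:
        while True:
            line = link.readline().decode("ascii", errors="ignore").strip()
            values = line.split(",")
            if len(values) == 5:                 # one value per finger
                try:
                    yield [float(v) for v in values]
                except ValueError:
                    continue                     # skip malformed frames
```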
Eight static gestures were chosen for the training process (Fig. 8).
Fig. 8. Gestures for NN training: (a) Palm, (b) Ok, (c) Thumb up, (d) Index up, (e) Rock, (f) Call me, (g) Gun, (h) Fist.
For the V-Arm hand tracking system, 5 input values corresponding to the thumb, index, middle, ring, and pinkie finger bending are received as raw data in float format. To calculate the bending angle from these values, the Vector3.Angle approach was used, which estimates the relative position of two vectors, in our case the directions of the bones matching each finger on the palm and the last phalanx of the finger. An example of such estimation is shown in Fig. 9.
Fig. 9. Vectors represent a) each joint local frame and b) the direction used to calculate the finger bending angle.
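The same computation can be written outside Unity as the unsigned angle between two bone direction vectors. The NumPy version below is only an illustration of the Vector3.Angle call used in the C# visualization module.

```python
# Angle between a palm bone direction and the last phalanx direction,
# mirroring Unity's Vector3.Angle; a NumPy illustration, not the C# code.
import numpy as np

def finger_bending_angle(palm_dir, phalanx_dir):
    """Return the unsigned angle in degrees between the two bone directions."""
    a = np.asarray(palm_dir, dtype=float)
    b = np.asarray(phalanx_dir, dtype=float)
    cos_angle = np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))
    return float(np.degrees(np.arccos(np.clip(cos_angle, -1.0, 1.0))))

# Example: a straight finger (parallel directions) gives ~0 degrees,
# a fully clenched finger (opposite directions) approaches 180 degrees.
print(finger_bending_angle([1, 0, 0], [1, 0, 0]))   # 0.0
print(finger_bending_angle([1, 0, 0], [-1, 0, 0]))  # 180.0
```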
The LeapMotion Controller was implemented as the ground truth device in dataset collection. The dataset has been collected in a dark environment with a clear background, which provides the best conditions for infrared camera implementation. To make the training data, raw data of the finger bending was collected for all 8 gestures 3 times in a loop. Examples of the same real and virtual hands during the dataset collection are shown in Fig. 10.
Fig. 10. Examples of real gestures and gestures for NN training
After the training process, the matching between the set of finger angles and gestures was obtained. The dataset consists of 24 gesture samples. The NN is trained using the backpropagation method. It consists of an input vector with 5 values (angles between the palm and each finger), 1 hidden layer with 20 neurons, and an output vector of 8 values (the number of gestures), where each set of values corresponds to one gesture. The 100,000 training loops with a learning rate of 0.0005 allowed us to achieve 98.3% classification accuracy. During the training, every 10 loops the average gesture classification accuracy was checked, with the output vector being weighted as maximum in case of correct classification and zero in case of a classification error. The classification accuracy of the trained NN is shown in Fig. 11.
Fig. 11. The effect of the number of training loops on the classification accuracy.
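A compact sketch of such a 5-20-8 network trained by backpropagation with a 0.0005 learning rate is given below in NumPy. The activation functions, weight initialization, and cross-entropy loss are assumptions, as the paper does not detail them, and the dataset variable names in the usage comment are hypothetical.

```python
# Minimal 5-20-8 MLP trained by backpropagation, as described above.
# Activations, initialization, and loss are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)
W1 = rng.normal(0, 0.5, (5, 20));  b1 = np.zeros(20)   # input -> hidden
W2 = rng.normal(0, 0.5, (20, 8));  b2 = np.zeros(8)    # hidden -> output
LR = 0.0005                                            # learning rate

def forward(x):
    h = np.tanh(x @ W1 + b1)                            # hidden activations
    logits = h @ W2 + b2
    e = np.exp(logits - logits.max(axis=1, keepdims=True))
    return h, e / e.sum(axis=1, keepdims=True)          # softmax probabilities

def train_step(x, y_onehot):
    """One backpropagation update on a batch of finger-angle vectors."""
    global W1, b1, W2, b2
    h, p = forward(x)
    d_logits = (p - y_onehot) / len(x)                  # cross-entropy gradient
    dW2 = h.T @ d_logits;  db2 = d_logits.sum(axis=0)
    d_h = (d_logits @ W2.T) * (1.0 - h ** 2)            # tanh derivative
    dW1 = x.T @ d_h;       db1 = d_h.sum(axis=0)
    W1 -= LR * dW1;  b1 -= LR * db1
    W2 -= LR * dW2;  b2 -= LR * db2

# Usage with a (hypothetical) dataset of finger-angle vectors and labels:
# for _ in range(100_000):
#     train_step(angles_batch, labels_onehot)
# gesture = forward(angles_batch)[1].argmax(axis=1)
```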
III. EVALUATION
A. Force estimation

1) Research Methodology:
The experiment was conducted to evaluate the sensor behavior while grasping two objects of different stiffness: a compliant gauze fabric as the soft object and a solid bowl as the rigid object (Fig. 12).
Fig. 12. Force detection by the remote color-based system when grasping: a) a soft object (hue 146); b) a rigid object (hue 203).
2) Experimental results:
The experimental results revealed that the exerted force in the case of the soft object is less than in the case of the rigid object. A soft body grab is performed with a 54% lower mean force in comparison with rigid object grasping (Fig. 13). The estimation error of color and force for the soft object (15.7%) is much higher than for the rigid one (4.7%) (Fig. 14), which was caused by the soft object covering one of the pneumatic actuators and preventing correct light reflection. Therefore, the impact of objects with inconsistent shape should be taken into consideration when applying the camera color detection algorithm. Since a drone is a small object with a low mass that is easy to break, the gripper handles drones like soft bodies to prevent the drone from breaking down.
Fig. 13. Force estimation by the remote system when grasping: a) a soft object; b) a rigid object.

Fig. 14. Comparison of soft body with rigid body manipulation in terms of color evaluation.
B. Experiments with Gesture Recognition using NN

1) Experiment 1:
To evaluate the performance of the developed mocap device (V-Arm), a comparison experiment was conducted with 2 systems: LeapMotion and HTC Vive. Several lighting conditions and hand positions were selected for this experiment:
• Well-lightened room (WL) vs. dim-lightened (DL).
• Hands showing gestures in front of the headset (0 deg) vs. turned by 45 degrees (45 deg) vs. turned by 90 degrees (90 deg).
• Hands placed one above another (HP) vs. placed coherently (HNP).
For this experiment, 8 participants (mean age = 22) were invited to repeat 8 gestures displayed on the screen with both hands in the 5 different environmental conditions described above. All volunteers previously had experience with motion capture systems. The average recognition rate of the performed gesture samples by the same NN and three different tracking devices is displayed in Table I.
TABLE I
COMPARISON EXPERIMENT FOR MOCAP SYSTEMS IN VARIOUS ENVIRONMENTAL CONDITIONS

Conditions          Vive     Leap     V-Arm
WL, front, HNP      90.6%    84.4%    93.8%
DL, front, HNP      68.8%    50%      90.6%
WL, 45 deg, HNP     50%      37.5%    93.8%
WL, profile, HNP    71.9%    75%      90.6%
WL, front, HP       68.8%    37.5%    93.8%

The experimental results revealed that in a well-lightened room with frontal hand positioning and an absence of hand occlusion, which were considered the best environmental conditions due to the highest accuracy rate, the V-Arm gesture tracking performed on average 3.2% better than Vive visual tracking and 9.4% better than the LeapMotion system. However, in the worst condition with 45 deg hand rotation, the V-Arm significantly outperformed both camera-based systems (by 43.8% in the case of Vive and 56.3% in the case of LeapMotion) even in a well-lightened room. The V-Arm showed insensitivity to the lighting conditions of the environment, as expected, while the LeapMotion accuracy decreased by 34.4% in dim-lightened conditions.
2) Experiment 2:
To evaluate the recognition rate of all the proposed gestures with the V-Arm system, an additional comparison experiment was performed. In this scenario, the best environment parameters were chosen based on the results of the previous experiment: a well-lightened room, hands placed in front of the headset, and hands located at the same distance from the headset. For this experiment, 22 participants (mean age = 23) were invited. Each participant was asked to perform 8 gesture samples with both hands 3 times in random order, 24 gestures in total. The combined dataset consists of 1824 gesture samples from the three devices.
Fig. 15. Experimental setup with the VIVE headset, LeapMotion camera, and V-Arm for the comparison of gesture recognition rate.
The results of the recognition rate of various gestures in a similar environment by the 3 mocap systems are listed in Table II.
TABLE II
COMPARISON EXPERIMENT FOR GESTURE RECOGNITION RATE

Gesture     Vive     Leap     V-Arm    Average
Gesture a   53.90%   77.60%   92.10%   74.60%
Gesture b   93.40%   71.00%   97.40%   87.30%
Gesture c   81.60%   97.40%   97.40%   92.10%
Gesture d   75.90%   81.60%   67.10%   68.90%
Gesture e   86.80%   77.60%   89.50%   84.60%
Gesture f   91.10%   68.40%   97.40%   86.00%
Gesture g   86.80%   86.80%   96.00%   89.90%
Gesture h   73.70%   90.80%   82.90%   82.50%
Average     78.30%   81.40%   90.00%   83.20%
The experimental results revealed that about 83% of all gestures were successfully recognized. On average, the V-Arm device recognized 90% of the performed gestures, which is 11.7% higher than Vive and 8.6% higher than the LeapMotion systems managed to achieve. According to the obtained data, for the V-Arm device the most recognizable were the b — "Ok", c — "Thumb up", and f — "Call me" gestures, with a 97.4% success rate. Overall, the c gesture has the highest recognition accuracy across all the evaluated systems. At the same time, the least recognizable was the d — "Index up" gesture; therefore, it was considered inappropriate for further application in robot-arm control.

IV. CONCLUSION
In this paper, a novel method of safe drone catching is proposed based on the developed remote color-based force detection system, hand gesture recognition, and a soft robotic gripper with multiple embedded sensors. Our experimental results show that the developed color-based force detection approach can be successfully implemented in catching both rigid (force estimation error 4.7%) and soft (force estimation error 15.7%) objects with high precision, and can provide convenient force estimation either for the autonomous system or for the human operator. The operator then has the ability to adjust the applied forces and the sequence of swarm catching with the developed gesture recognition system based on the developed mocap glove V-Arm. The experimental results revealed that the V-Arm allows us to achieve a high recognition rate (3.2-56.3% higher than camera-based systems, depending on the environment parameters) with various hand gestures (on average 90% recognition).
Therefore, the proposed technology can potentially be implemented for swarm docking on non-flat surfaces in harsh conditions, providing both a precise and adjustable grasping process to the operator. The remote force detection and system control with gesture recognition allow the operator to be fully separated from the task, ensuring the interactivity and safety of the semi-autonomous drone catching. In the future, we are going to explore more dynamic swarm catching scenarios by applying high-speed and complex trajectories to the drones. The interaction of multiple robotic arms with the swarm will be experimentally evaluated.

REFERENCES
[1] E. Tsykunov, R. Agishev, R. Ibrahimov, L. Labazanova, T. Moriyama, H. Kajimoto, and D. Tsetserukou, "Swarmcloak: Landing of a swarm of nano-quadrotors on human arms," in SIGGRAPH Asia 2019 Emerging Technologies, ser. SA '19. New York, NY, USA: Association for Computing Machinery, 2019, pp. 46–47. [Online]. Available: https://doi.org/10.1145/3355049.3360542
[2] Y. S. Sarkisov, G. A. Yashin, E. V. Tsykunov, and D. Tsetserukou, "Dronegear: A novel robotic landing gear with embedded optical torque sensors for safe multicopter landing on an uneven surface," IEEE Robotics and Automation Letters, vol. 3, no. 3, pp. 1912–1917, 2018.
[3] G. Miron, B. Bédard, and J.-S. Plante, "Sleeved bending actuators for soft grippers: A durable solution for high force-to-weight applications," Actuators.
[4] Luo et al., Applied Sciences, vol. 9, p. 2976, 07 2019.
[5] J. Rothe, M. Strohmeier, and S. Montenegro, "A concept for catching drones with a net carried by cooperative UAVs," 2019, pp. 126–132.
[6] S. Tanaka, T. Senoo, and M. Ishikawa, "High-speed UAV delivery system with non-stop parcel handover using high-speed visual control," 2019, pp. 4449–4455.
[7] S. Kao, Y. Wang, and M. Ho, "Ball catching with omni-directional wheeled mobile robot and active stereo vision," 2017, pp. 1073–1080.
[8] S. Kim, A. Shukla, and A. Billard, "Catching objects in flight," IEEE Transactions on Robotics, vol. 30, no. 5, pp. 1049–1065, 2014.
[9] J. Kober, M. Glisson, and M. Mistry, "Playing catch and juggling with a humanoid robot," 2012, pp. 875–881.
[10] H. H. Rapp, "A ping-pong ball catching and juggling robot: A real-time framework for vision guided acting of an industrial robot arm," in The 5th International Conference on Automation, Robotics and Applications, 2011, pp. 430–435.
[11] N. Uchiyama, S. Sano, and K. Ryuman, "Control of a robotic manipulator for catching a falling raw egg to achieve human-robot soft physical interaction," 2012, pp. 777–784.
[12] B. Bäuml, T. Wimböck, and G. Hirzinger, "Kinematically optimal catching a flying ball with a hand-arm-system," 2010, pp. 2592–2599.
[13] N. R. Sinatra, C. B. Teeple, D. M. Vogt, K. K. Parker, D. F. Gruber, and R. J. Wood, "Ultragentle manipulation of delicate structures using a soft robotic gripper," Science Robotics, vol. 4, no. 33, 2019. [Online]. Available: https://robotics.sciencemag.org/content/4/33/eaax5425
[14] D. Lydia and D. Steel, A Heritage of Excellence: 25 Years at Spar Aerospace Limited. Canada: Spar Aerospace Limited, 1992, pp. 41–42.
[15] T. G. Thuruthel, B. Shih, C. Laschi, and M. T. Tolley, "Soft robot perception using embedded soft sensors and recurrent neural networks," Science Robotics, vol. 4, no. 26, 2019. [Online]. Available: https://robotics.sciencemag.org/content/4/26/eaav1488
[16] B. Shih, C. Christianson, K. Gillespie, S. Lee, J. Mayeda, Z. Huo, and M. T. Tolley, "Design considerations for 3D printed, soft, multimaterial resistive sensors for soft robotics," Frontiers in Robotics and AI.
[17] R. L. Truby et al., "Soft somatosensitive actuators via embedded 3D printing," Advanced Materials, vol. 30, no. 15, p. 1706383, 2018. [Online]. Available: https://onlinelibrary.wiley.com/doi/abs/10.1002/adma.201706383
[18] M. Hamaya, T. Matsubara, T. Noda, T. Teramae, and J. Morimoto, "User-robot collaborative excitation for PAM model identification in exoskeleton robots," 2017, pp. 3063–3068.
[19] Y. Wu, P. Balatti, M. Lorenzini, F. Zhao, W. Kim, and A. Ajoudani, "A teleoperation interface for loco-manipulation control of mobile collaborative robotic assistant," IEEE Robotics and Automation Letters, vol. 4, no. 4, pp. 3593–3600, 2019.
[20] P. S. Lengare and M. E. Rane, "Human hand tracking using Matlab to control Arduino based robotic arm," 2015, pp. 1–4.
[21] I. Ajili, M. Mallem, and J. Didier, "Gesture recognition for humanoid robot teleoperation," 2017, pp. 1115–1120.
[22] R. Ibrahimov, E. Tsykunov, V. Shirokun, A. Somov, and D. Tsetserukou, "Dronepick: Object picking and delivery teleoperation with the drone controlled by a wearable tactile display," in 2019 28th IEEE International Conference on Robot and Human Interactive Communication (RO-MAN).