A Sweet Pepper Harvesting Robot for Protected Cropping Environments
Chris Lehnert∗
Electrical Engineering and Computer Science
Queensland University of Technology
Brisbane 4000, Australia
[email protected]

Chris McCool
Electrical Engineering and Computer Science
Queensland University of Technology
Brisbane 4000, Australia
[email protected]

Inkyu Sa
Mechanical and Process Engineering
ETH Zurich
Zurich 8092, Switzerland
[email protected]

Tristan Perez
Electrical Engineering and Computer Science
Queensland University of Technology
Brisbane 4000, Australia
[email protected]
Abstract
Using robots to harvest sweet peppers in protected cropping environments has remained unsolved despite considerable effort by the research community over several decades. In this paper, we present the robotic harvester, Harvey, designed for sweet peppers in protected cropping environments, which achieved a 76.5% success rate (within a modified scenario), improving upon our prior work which achieved 58% and related sweet pepper harvesting work which achieved 33%. This improvement was primarily achieved through the introduction of a novel peduncle segmentation system using an efficient deep convolutional neural network, in conjunction with 3D post-filtering to detect the critical cutting location. We benchmark the peduncle segmentation against prior art, demonstrating a considerable improvement in performance with an F1 score of 0.564 compared to 0.302. The robotic harvester uses a perception pipeline to detect a target sweet pepper and an appropriate grasp and cutting pose used to determine the trajectory of a multi-modal harvesting tool to grasp the sweet pepper and cut it from the plant. A novel decoupling mechanism enables the gripping and cutting operations to be performed independently. We perform an in-depth analysis of the full robotic harvesting system to highlight bottlenecks and failure points that future work could address.

∗ Direct correspondence to: Chris Lehnert: [email protected]

1 Introduction

The horticulture industry is heavily reliant on manual labour. For instance, in Australia, labour hire is between 20% and 30% of total cash costs (ABARE, 2014). These costs, along with other pressures such as the high cost of inputs (energy, water, agrochemicals, etc.), variable production due to uncertain weather events, and labour scarcity, are putting profit margins for horticulture farms under tremendous pressure.

Robotic harvesting offers a potentially attractive solution by not only reducing the costs of labour but also lowering risks associated with obtaining labour and food safety. Robot harvesting also enables capitalising on opportunities for extended selective harvesting that maximises quality: optimal scheduling for harvesting different parts of the farm with required quality thresholds. For these reasons, there has been increasing interest in the use of agricultural robots for harvesting crops and vegetables over the past three decades (Kondo et al., 2011).

The task of developing a robotic harvester is particularly challenging and requires the integration of numerous subsystems such as crop detection, motion planning, and dexterous manipulation. The underlying functional requirements share those of manufacturing, but there are additional challenges: uncontrolled and changing lighting, variability in crop size and shape, occlusions, and the delicate nature of the crop being manipulated. A survey of robotic harvesting of horticulture crops reviewed 50 projects over the past 30 years (Bac et al., 2014). The review highlights that over this period the performance of automated harvesting has not improved substantially despite advances in sensors, computers, and machine intelligence. If robotic crop harvesting is to become a reality, two key challenges must be addressed:

1. perception of the crop and environment, and
2. manipulation of the crop.

Perception relates to being able to locate or segment the crop, determine its location in 3D and locate key points for attaching and detaching the crop from the plant.
Crop manipulation involves attaching and detaching the crop without harming the crop or plant; this involves the development of physically appropriate end-effectors combined with algorithms to effectively and efficiently utilise them. Both perception and crop manipulation are challenging tasks due to the presence of occluding obstacles such as leaves and branches, as well as natural variability in crop size, shape, and pose.

Figure 1: The Harvey platform, an autonomous sweet pepper harvester operating in a protected cropping system.

In this paper, we present a new robotic harvester (Harvey) designed for sweet peppers (also known as capsicum or bell pepper) in protected cropping environments that improves upon prior work (Lehnert et al., 2017). In principle, the robotic harvester uses a perception pipeline to detect a target sweet pepper and an appropriate grasp and cutting pose used to determine the trajectory of a multi-modal harvesting tool. The harvesting tool features a suction cup to grasp the sweet pepper and an oscillating blade to cut the pepper from the plant. A novel decoupling mechanism enables the gripping and cutting operations to be performed serially with independently chosen grasping and cutting trajectories. This combination of robotic-vision techniques and crop manipulation tools is a key enabling factor for the successful harvesting of high-value crops, in particular, sweet peppers. Fig 1 shows an example of the crop setup and characteristics as expected in a protected cropping environment.

This work improves upon (Lehnert et al., 2017) through the introduction of an accurate peduncle segmentation system (in the perception pipeline) and improved integration of the perception-to-action system. We perform an in-depth analysis of the full robotic harvesting system to highlight bottlenecks and failure points that future work could address. These improvements in the perception and action methods considerably increase the harvesting success rate from 58% to 76.5%, under a modified scenario in a real protected cropping system. Analysis of the full robotic harvesting system highlights that the integration of an active vision method would likely improve both sweet pepper and peduncle segmentation within highly occluded scenarios. The presented harvesting system achieved a 76.5% success rate (within a modified scenario), which improves upon our prior work which achieved 58% and related sweet pepper harvesting work which achieved 33% (Bac et al., 2017) (which has some differences in the cropping system).

Central to the improved harvesting performance of Harvey is the novel use of an efficient deep convolutional neural network (McCool et al., 2017), referred to as MiniInception, in conjunction with 3D post-filtering to detect the cutting location (for sweet peppers this is the peduncle, the part of the fruit which is attached to the plant). We benchmark the
MiniInception approach for peduncle segmentation against prior art (Sa et al., 2017), demonstrating a considerable improvement in performance with an F1 score of 0.564 compared to 0.302. This improvement is possible not only due to the increased accuracy of the deep convolutional neural network but also the novel use of 3D post-filtering.

The key contributions of this paper are:

• a proven in-field robotic harvesting system that achieves a harvesting success rate of 76.5% in a modified scenario,
• an in-depth analysis of the perception and harvesting field trials of the robotic harvester, and
• a novel method for peduncle segmentation using an efficient deep convolutional neural network in conjunction with 3D post-filtering.

The remainder of the paper is structured as follows. A review of the current state-of-the-art methods for autonomous harvesting of horticultural crops is presented in Section 2. The design of the autonomous harvesting platform is then presented in Section 3, outlining the harvesting environment, platform and design of the multi-modal end-effector tool. The methods for perception and planning are then presented in Section 4, outlining our novel techniques for segmentation and 3D localisation of sweet peppers. Results of three experiments are presented in Section 5, presenting the performance of our segmentation and peduncle localisation methods. The last experiment presents the results for our end-to-end autonomous harvesting system in a real protected cropping environment. Section 6 discusses key challenges and future work for improving the performance of autonomous harvesting systems for horticulture.

2 Related Work

Current literature contains examples of various robots which are capable of autonomous harvesting under certain environmental conditions and crops, including: sweet peppers (Bac et al., 2017), including our previous work (Lehnert et al., 2017), cucumbers (van Henten and Hemming, 2002), citrus (Mehta and Burks, 2014), strawberries (Hayashi et al., 2010) and apples (Bulanon and Kataoka, 2010b; De-An et al., 2011). Despite this, the commercial application of such robots for horticulture is very limited. Some of the factors behind this lack of commercial uptake, as reviewed in Bac et al. (2014) and Shamshiri et al. (2018), include the complexity of agricultural environments and the different configuration of crops within it (poses, sizes, shapes and colours). In addition, the highly occluded nature of the scene, combined with the requirements of high efficiency, accuracy, and robustness of the manipulation process, has led to very few systems being commercially viable. The above factors have received great attention in the recent literature and can be divided into two categories: perception and manipulation.

The most related to our work is the sweet pepper harvesting robot developed within the Clever Robots for Crops (CROPS) project (Bac et al., 2017; Hemming et al., 2014; Bontsema et al., 2014). This robot was developed for harvesting sweet peppers using a 9DOF manipulator within a greenhouse environment. In this work, a colour camera and a time-of-flight camera are used in an eye-in-hand configuration to detect and localise the crop. Using depth information, the position of the sweet peppers and the orientation of the stem are estimated. In the work of Bac et al. (2017), two different end effector designs were field tested, and the best design was shown to achieve a harvesting success of 6% in an unmodified crop.
This result led the developers to simplify the crop configuration by removing crop clusters and occluding leaves. This led to an improvement in the harvesting success of the robot up to 33% for the simplified scenario. An average harvesting time per sweet pepper was reported as 94 seconds in the work of Bac et al. (2017).
Crop perception includes detection, segmentation and 3D localisation, and has been investigated for a variety of different crops. The key challenges include detection and segmentation in challenging outdoor environments. 3D crop localisation refers to the process of determining the position and orientation of the crop (Van Henten et al., 2003; Kitamura et al., 2008; Hemming et al., 2014; Bulanon and Kataoka, 2010a). One of the challenges with localisation is fusing multiple modalities of sensing technology, such as visual (colour or texture) information with depth information, to obtain an accurate 3D localisation of the crop.
For detection and segmentation using standard RGB cameras, researchers have explored the use of either traditional features or deep learnt features. Examples of traditional features include the use of a radial symmetry transform to perform grape detection (Nuske et al., 2011, 2014), the detection of a distinctive specular reflective pattern to detect apples in Wang et al. (2012), or combining colour and shape features to perform semantic segmentation, into four classes, for tomato detection in Yamamoto et al. (2014).

More recently, feature learning approaches have been explored. One of the earliest examples of this was in 2013, when Hung et al. (2013) proposed to learn features using an auto-encoder. These features were then used within a conditional random field (CRF) framework to perform almond segmentation. This approach achieved impressive segmentation performance but did not perform object detection.

In 2016, Sa et al. (2016) proposed the use of deep learning systems for detection of nine different crops (e.g. sweet pepper, melons, apples and avocados), explored different methods for combining multi-modal information (i.e., early or late fusion of multispectral images) and explored some of the limits of such an approach. For crop counting, Rahnemoonfar and Sheppard (2016) proposed to learn deep convolutional neural networks using simulated training data to count apples.

The above methods have addressed issues such as crop segmentation (Hung et al., 2013), detection (Sa et al., 2016) or estimating the number of crops in a sub-region of the image (Rahnemoonfar and Sheppard, 2016). However, to perform harvesting, it is important to find other attributes of a plant such as the peduncle; this is the part of the fruit which attaches it to the stem or branch of the plant.

In terms of peduncle detection, Cubero et al. (2014) demonstrated the detection of various fruit peduncles using radius and curvature signatures. The Euclidean distance and the angle rate of change between each of the points on the contour and the fruit centroid are calculated. The presence of peduncles yields rapid changes in these metrics and can be detected using a specified threshold. Blasco et al. (2003) and Ruiz et al. (1996) presented peduncle detection of oranges, peaches, and apples using a Bayesian discriminant model of RGB colour information. The size of a colour segmented area was calculated and assigned to pre-defined classes. The above methods are more likely suitable for the quality control and inspection of crop peduncles after the crop has been harvested rather than for harvesting automation, as they require an inspection chamber that provides ideal lighting conditions with a clean background, no occlusions, good viewpoints, and high-quality static imagery.

In our previous work, Sa et al. (2017) proposed the use of point feature histograms and colour features to detect sweet pepper peduncles. This approach was evaluated on data from a real-world cropping environment and achieved impressive results. A downside of this approach was the requirement to annotate 3D imagery, which can be time-consuming. An alternative approach would be to make use of just the 2D imagery and employ a deep convolutional neural network (DCNN).

Recent work has demonstrated the potential for the use of deep learning approaches to address agricultural computer vision problems. McCool et al.
(2017) proposed an approach for deploying efficient deep convolutional neural networks for crop vs weed classification by distilling the information from a high-performance but computationally heavy neural network to efficient smaller, student, networks. Semantic segmentation making use of synthetic imagery was proposed by Barth et al. (2017) for plant-part segmentation, and Milioto and Stachniss (2018) presented an efficient framework for semantic segmentation of weeds.

Given the success of the above work (McCool et al., 2017), we considered its application in this paper and describe it in more detail in Section 4.2. We note that at the time of the experiments the prior work of Barth et al. (2017) and Milioto and Stachniss (2018) was unavailable.
In most cases in the literature, a vision system is used to detect and segment the target crop, and depth information is used to determine its position. Methods for estimating depth include the use of stereo images (Van Henten et al., 2003; Kitamura et al., 2008), time-of-flight cameras (Hemming et al., 2014) and laser range finders (Bulanon and Kataoka, 2010a).

Colour and depth sensors have been used by Nguyen et al. (2014) to segment bushels of apples using Euclidean clustering techniques. Furthermore, random sample consensus was used to fit a spherical model to each apple in order to estimate their centroids.

In some cases, the orientation of the crop is estimated for use in the grasping and detachment stages of the harvesting process (van Henten and Hemming, 2002; Han et al., 2012). For instance, suction cup grippers (commonly used in harvesting) have the disadvantage of failing if there is no complete seal on the crop. Estimating the orientation of an asymmetrical crop such as sweet peppers or strawberries can aid in the alignment of the suction cup gripper, improving the attachment success rates (Hayashi et al., 2010; Lehnert et al., 2016).

Studies on 3D crop localisation in the presence of occlusions, such as the work by Gongal et al. (2015), have been shown to improve localisation accuracy. Partially visible crops are highly challenging to localise since only a portion of the information is available. To address this issue, multiple views, either spatial or temporal, and their registration have been utilised. Gongal et al. (2016) employed a dual-sided imaging system that consists of 5 pairs of colour and 3D cameras for each side (10 pairs in total) for apple localisation.
A survey of autonomous harvesting projects for horticulture by Bac et al. (2014) found that more than half of the reviewed projects do not report the motion planning techniques used, and this can account for the lack of progress in motion planning techniques for horticulture. Two standard methods for motion planning include open loop planning and visual servoing.

Open loop planning methods, which do not simultaneously localise the crop and plan the motion, are a common approach for autonomous harvesting. Open loop methods can suffer from problems when the robot interacts with the scene, inducing changes to the crop location and thereby reducing the accuracy of the current estimate. If the robot only interacts minimally with the scene and if the scene is static, open loop planning methods can be successful (Hemming et al., 2014; Baur et al., 2014; Van Henten et al., 2003; Scarfe et al., 2009). Other improvements over standard motion planners have been attempted, such as using optimal path planning to determine the best motion of the manipulator for harvesting crop, seen in the work by Schuetz et al. (2015).

Image-based visual servoing has been used to control the motion of a robot manipulator to a target crop using an eye-in-hand camera for autonomous harvesting of citrus (Mehta and Burks, 2014; Hannan and Burks, 2004), apples (Baeten et al., 2008; Bulanon and Kataoka, 2010a), tomatoes (Kondo et al., 1996) and strawberries (Han et al., 2012; Hayashi et al., 2010), but also with a fixed point of view camera for sweet peppers (Kitamura and Oka, 2005). The approach by Mehta and Burks (2014) uses a perspective transformation to estimate the position of the crop in Euclidean space to determine the control policy of the manipulator. Visual servoing can be useful for motion planning within dense vegetation, where crop localisation can perform poorly due to occlusions (Barth et al., 2016).

Successful grasping and detachment in a dense and cluttered environment is still an ongoing research problem and is currently an active area of research (Bhattacharjee et al., 2014; Jain et al., 2013; Killpack et al., 2015), often requiring tactile sensing to discern between rigid and deformable objects. As advocated in Bac et al. (2014), simplifying the workspace or developing harvesting tools which simplify the harvesting operation are potential solutions to the motion planning problem in dense and cluttered horticultural environments.
The most common manipulators used for autonomous harvesting over the past 50 years have been 3DOF Cartesian and anthropomorphic arms, followed by 6DOF manipulators (Bac et al., 2014). Optimal design of manipulators for different horticulture tasks such as cucumbers (Van Henten et al., 2009) and sweet peppers (Lehnert et al., 2015) has been used to aid in the selection of the joint type and number of DOF for the manipulator. This work has shown that a potential optimal design for harvesting within a sweet pepper environment is a 6DOF manipulator with two extra linear DOF at the base, adding vertical and horizontal freedom (Lehnert et al., 2015).

A variety of different harvesting tools have been developed for grasping and manipulating crop. These range from suction cups, contact grippers and soft robotic fingers to under-actuated anthropomorphic grippers. One of the most widely used gripper technologies to handle crop is based on suction cups (Blanes et al., 2011) and the use of vacuum pressure to grasp the crop. Suction cups have been used for a large range of crops such as tomatoes (Ling et al., 2004), apples (Baeten et al., 2008), cucumbers (Van Henten et al., 2003), sweet peppers (Hemming et al., 2014; Bontsema et al., 2014) and strawberries (Hayashi et al., 2010). A suction cup has the advantage of requiring less workspace to complete the grasp (i.e. the mechanism is not required to envelop the crop but only come in contact with a smaller patch of its surface).

Contact grippers use friction to hold onto a crop, where the most common is a two finger jaw gripper (Monkman et al., 2007). Contact-based grippers, which use mostly two or three fingers, have been used for harvesting apples (Bulanon and Kataoka, 2010a; De-An et al., 2011), tomatoes (Ling et al., 2004), oranges (De-An et al., 2011), and kiwifruit (Scarfe et al., 2009). Using soft robotic fingers instead of rigid fingers has been shown to have potential for crop harvesting (Inc., 2016; Ilievski et al., 2011), as they have the advantage of reducing grasping damage via compliant and soft interaction with the crop. However, soft fingers can be difficult to implement and require further refinement before they can be used practically.

Often within horticulture, a detachment tool is required to remove the crop from the plant. These often require specific designs for the target crop, depending on how the crop is attached to the plant. The most common type of detachment tool is a mechanism that cuts or severs the peduncle of the crop; such tools include thermal cutters, scissors or a custom cutting mechanism.

Thermal cutters have been used for a variety of crops such as sweet peppers (Bachche and Oka, 2013) and cucumbers (van Henten and Hemming, 2002), and have the advantage of sealing the cut area, preventing the spread of diseases. Scissor type cutters have been used for different crops such as sweet peppers (Kitamura and Oka, 2005; Hemming et al., 2014) and strawberries (Hayashi et al., 2010; Han et al., 2012). In these approaches, the gripper and the cutter are mounted at fixed offsets from each other and require the crop and peduncle to fit within the fixed offset for a successful detachment. An added DOF between the gripper and cutter can be included to tackle this problem (Kondo et al., 2010), but this adds to the complexity of the end effector design. A disadvantage of a scissor mechanism is the potential to damage surrounding parts of the plant (Hemming et al., 2013).
Furthermore, scissors cut in a plane, and if the curvature of the peduncle or stem is irregular, then the cutting plane may not completely sever it.

A custom cutting tool for sweet peppers was developed in Hemming et al. (2014) which used the concept of enveloping the sweet pepper with a hinged jaw mechanism. This type of mechanism was more successful than a scissor mechanism developed within the same project. The main disadvantage of this mechanism was the size and geometry constraints required to get the mechanism around the back of the sweet pepper. This was problematic as the mechanism would get stuck on parts of the plant surrounding the sweet pepper (Hemming et al., 2013).
3 System Design

This section describes the system design for our autonomous sweet pepper harvester and includes an overview of the robotic platform, harvesting tool, software design and a description of the harvesting environment. The overall procedure for harvesting sweet peppers is shown in Fig 2 and can be broken down into five stages:

1. Sweet Pepper Segmentation
2. Peduncle Segmentation
3. Grasp Selection
4. Attachment
5. Detachment

The first three steps (sweet pepper localisation, peduncle localisation and grasp selection) form the perception system described in Section 4. During the sweet pepper localisation stage, the robot arm is moved to a long-range perspective to capture a 3D colour image of the whole scene using an eye-in-hand RGB-D camera. A target sweet pepper is localised at the long-range perspective and used to move the camera to a close-range perspective of the targeted sweet pepper in order to improve the performance of the peduncle localisation. The next stage localises the peduncle of the target sweet pepper using a convolutional neural network and a 3D filtering method to estimate the centroid of the peduncle. A grasp selection is then performed on the segmented 3D points of the targeted sweet pepper. The grasp selection uses a heuristic to rank possible grasp poses on the target sweet pepper using surface and position metrics.

The harvesting method involves two stages, crop attachment and crop detachment. The harvesting method uses a custom harvesting tool, described in Section 3.3, comprised of a suction cup to grasp the sweet pepper (attachment) and an oscillating blade to cut the sweet pepper from the plant (detachment). The grasp pose and peduncle pose from the preceding perception system are used to plan the motion of the robot arm. The grasp pose is used to plan the attachment stage, whereas the peduncle pose is used to plan the detachment. During the attachment step, the suction cup is magnetically attached to the cutting blade. For the subsequent detachment phase, the cutting blade is separated from the suction cup, which remains attached to the end-effector via a flexible tether. This design enables the robot arm to perform the attachment and detachment steps sequentially at independently chosen locations (grasp and peduncle poses) to maximise the success of both phases.

Figure 2: The five stages of the autonomous harvesting cycle. The robot firstly detects a target pepper at a wide viewing angle and then moves to a close-range perspective. The sweet pepper is then segmented using colour information. Thirdly, the peduncle of the sweet pepper is estimated using a deep learning segmentation method. The fourth stage determines the optimal grasp location for attachment. The final stage uses the estimated peduncle and grasp pose to execute the attachment and detachment tools to remove the sweet pepper from the plant.
3.1 Harvesting Environment

Commercially, sweet peppers are grown in both outdoor fields and indoor protected cropping environments. In this work, we focus on the task of picking sweet peppers within a protected cropping environment. Protected cropping environments comprise a glass or semi-permanent plastic enclosure (poly-tunnel) designed to prevent damage to the crop from pests, heat, cold, rain and wind. Protected cropping systems in tropical climates such as Northern Australia can differ from other international greenhouse systems with respect to trellising and potting methods. However, the underlying plant structure, such as leaves, stems and sweet peppers, is very similar, including its physical and visual appearance. Within the enclosure, sweet peppers are typically grown hydroponically with support trellises or wiring, allowing them to grow up to 4 m tall in a relatively 2D planar structure.

Protected cropping environments are designed to provide significantly increased yields compared to field-grown crops. The layout also provides three significant advantages for autonomous harvesting. First, the crop is presented on a two-dimensional planar surface. This planar structure significantly reduces occlusion from other branches and leaves, making visually detecting and locating crops much easier than field sweet peppers that grow within a low, three-dimensional bush/shrub. Second, the planar presentation of the crop
simplifies collision avoidance and motion planning by providing relatively open access to both the stem and side face of the crop. Third, the protective cropping environment presents a more forgiving environment for computer vision as the plastic and/or mesh roof and walls diffuse incoming sunlight. Diffused sunlight means the environment is lit relatively evenly, with virtually no sharp shadows.

The protected cropping system in which our studies were conducted grew two cultivars of sweet pepper, Mercuno and Ducati, with a row spacing of approximately 1 m and a plant spacing of 0.5 m. The layout of this system is illustrated in Fig 3 along with a view of a typical section of the sweet pepper crop in a commercial protected cropping system.

Figure 3: (a) A section of a crop row in a protective cropping environment. Plants are trained onto a trellis so that the crop is presented on a two-dimensional planar surface. Some occlusions from leaves can be seen. (b) Plant layout within the protected cropping facility, including typical row and plant spacing.
3.2 Robotic Platform

The harvesting robot “Harvey” is shown in Fig 4. Harvey is comprised of a 7DOF manipulator, a custom end-effector and a mobile base that houses batteries, a water-cooled computer with dedicated graphics and the appropriate control hardware for operating the manipulator, end-effector and drive system. Fig 4 highlights each component, showing the setup of the robotic platform within the protected cropping facility. Harvey was designed to work within a protected cropping environment and manoeuvre within each crop row, where plants can grow up to 2 m tall. The vertical lift axis and mobile base were selected specifically to allow the end-effector to reach all of the sweet peppers along each row.

To choose a robot arm suitable for the sweet pepper harvesting task, a range of arms were compared with respect to their cost, base weight, payload, workspace, IP rating and speed. A range of industrial arms were surveyed, including the Kuka LBR, Kinova Jaco/Mico, Barrett WAM, Universal Robots (UR3/5/10) and Schunk Powerball. A UR5, from Universal Robots, was selected as it satisfied all of the specifications for the harvesting task. In particular, the UR5 is IP54 rated for water and dust, has a reach of 0.85 m and a payload of 5 kg, suitable for the weight of the harvesting tool (2 kg) and the weight of a sweet pepper, which can be up to 0.5 kg.

Figure 4: The Harvey platform with each component highlighted. The components consist of a 6DOF UR5 robot arm with a harvesting tool attached to its end effector, a custom mobile base platform with PC and control box, and a prismatic lift joint.
3.3 Harvesting Tool

The end-effector of the robotic arm is a custom designed harvesting tool. This is the means by which the robot interacts with the crop, and its design is critical to reliable attachment and detachment of the sweet pepper. This harvesting tool serves a dual purpose, performing both gripping and cutting operations sequentially. The tool is designed to remove a sweet pepper by first gripping it with a vacuum gripper, then cutting through the stem with an oscillating blade. The individual components of the harvesting tool are mounted on the tool point of the robot arm and are shown in Fig 6. An RGB-D sensor is mounted near the front of the end effector body and is used to identify the shape and location of each sweet pepper. The body of the end effector contains a hand-held oscillating multi-tool with a metal blade for cutting stems.

Grasping and cutting a sweet pepper at the same time with a single harvesting tool can be challenging. The natural variation of sweet peppers and peduncles (size, shape and orientation) imposes challenging constraints on the path of the end effector if attempting to grasp and cut at the same time. To overcome this difficulty, our end-effector has a decoupling mechanism (using passive magnets) which separates the grasping and cutting tools by a flexible tether. Separating the grasping and cutting tool relaxes the constraint of simultaneous grasping and cutting, enabling these operations to occur sequentially and at different locations (i.e. not at fixed offsets between gripper and cutter). This allows a wider set of grasping and cutting poses to be used within the planning algorithm.

The decoupling mechanism is composed of two distinct components. The first component is a flat, flexible polymer strip that attaches the suction cup to the body of the end-effector. The second component is a magnet that allows the base of the suction cup to attach to the underside of the cutting blade (see Fig 6). During the gripping operation, the suction cup is magnetically attached to the cutting blade, allowing the robot arm to control the position of the suction cup to grasp the sweet pepper. During the cutting operation, the suction cup passively detaches from the cutting blade, while remaining attached to the body of the end effector by the flexible strip, allowing the suction cup to move independently of the cutting blade. This simple and passive decoupling method requires no additional active elements such as electronics, actuators or sensors, and allows independent gripping and cutting locations to be chosen for each sweet pepper, which in turn enables more reliable harvesting.

Figure 5: A high level schematic diagram of the autonomous harvesting “Harvey” platform. The system is comprised of the electronics, actuators, harvesting tool and air system. The diagram illustrates how each component is interconnected to create the full autonomous platform.

Two feedback sensors are also integrated into the harvesting tool. The first is a separation sensor (SPDT lever switch) which indicates whether the suction cup and cutting tool are coupled or separated. The second is a vacuum pressure sensor which provides feedback on whether the suction cup has grasped a sweet pepper, as the vacuum pressure is directly related to the holding force.

The procedure for harvesting the sweet pepper with the harvesting tool, illustrated in Fig 7, is as follows:

1. Attach to crop: The arm is moved to allow the suction cup to attach to the surface of the sweet pepper.
The attachment point is chosen as a smooth flat area on the face of the sweet pepper (Fig 7a).

2. Separate suction cup from cutting blade: The end effector is moved upwards, which causes the magnets holding the suction cup to the underside of the cutting blade to separate. The flexible strip now allows the cutting blade to move independently of the suction cup/sweet pepper so that the cutting blade can target the optimum stem-cutting location (Fig 7b).

3. Peduncle cutting: The oscillating cutting blade is moved forward to cut the sweet pepper peduncle, detaching it from the plant. After the peduncle is cut, the sweet pepper falls away from the plant whilst remaining attached to the suction cup, in turn attached to the end-effector by the flexible strip (Fig 7c).

4. Magnet re-attachment and release crop: The robot arm is moved so the end effector points downwards over a collection crate. This passively re-attaches the suction cup magnetically to the cutting blade under the force of gravity, ready to harvest another sweet pepper. The vacuum is then released from the suction cup, causing the crop to drop from a small height safely into the collection crate (Fig 7e).

Figure 6: Harvesting tool. The harvesting tool attached to the end effector of the robot manipulator. The tool is comprised of a suction cup for grasping which can separate from a cutting tool (oscillating cutting blade) via a passive magnetic coupling. The suction cup is attached to a flexible rubber strip allowing the suction cup to move freely when separated from the cutting tool. The harvesting tool has a separation sensor as feedback to the system if the tool separates accidentally. A vacuum pressure sensor is also used to measure whether the suction cup is attached to a sweet pepper. An RGB-D camera is also used as input for the perception system (segmentation of sweet peppers and peduncles).
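The two feedback sensors make this sequence straightforward to supervise in software. Below is a minimal, hypothetical sketch of how the sensors could gate the attachment and cutting steps. The arm/tool interfaces and the vacuum threshold are illustrative assumptions rather than Harvey's actual control code; only the retry limit of five attempts is taken from the paper (Table 1).

```python
# Hypothetical sketch of sensor-gated attachment logic. The arm/tool
# interfaces and the vacuum threshold are illustrative assumptions,
# not Harvey's firmware.

VACUUM_ATTACHED_KPA = -20.0   # assumed gauge pressure indicating a good seal
MAX_ATTEMPTS = 5              # attachment retry limit used in the field trial

def attempt_attachment(arm, tool, ranked_grasp_poses):
    """Try ranked grasp poses until the vacuum sensor confirms a seal."""
    for pose in ranked_grasp_poses[:MAX_ATTEMPTS]:
        arm.move_linear(pose)                 # approach along the grasp normal
        if tool.vacuum_pressure() < VACUUM_ATTACHED_KPA:
            return pose                       # seal made: attachment succeeded
        arm.retreat()                         # no seal: back off and retry
    return None

def safe_to_cut(tool):
    """The cutter should only advance once the suction cup has decoupled."""
    # The SPDT lever switch reports whether cup and blade are still coupled.
    return not tool.cup_coupled_to_blade()
```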
Figure 7: Indicative images of the harvesting phases, from attachment and separation of the cutter to detachment of the sweet pepper. (a) Image of the harvesting tool with the suction cup coupled to the tool during the attachment stage. (b) A vertical motion is used to decouple the suction cup and cutting tool. (c) Once separated, the cutting tool is used to sever the peduncle from the plant and is shown decoupled from the suction cup. (d) Post detachment, the sweet pepper remains attached to the suction cup but falls away from the plant. (e) As the sweet pepper is placed into a crate, the suction cup and cutting tool passively re-attach magnetically.

3.4 Software Design
The software system was designed within the Robot Operating System (ROS) framework. The system contains eight customised ROS nodes, as illustrated in Fig 8a. Communication among the nodes was primarily performed with custom messages sent via ROS actions and services. Central to the software design was a state machine implemented with the ROS SMACH package (Bohren and Cousins, 2010).

Fig 8b shows the logic flow diagram internal to the state machine for harvesting sweet peppers along a crop row. At each stage the system checks if a failure occurred, such as a sweet pepper not being detected or a suction cup attachment failing. If a failure occurred with attachment of the suction cup, a different grasp pose was attempted until a maximum number of attempts was reached. If no sweet peppers, grasp poses or peduncle poses were detected, or the attachment attempts were exceeded, the system moved into the move platform state. After all sweet peppers in a scene were attempted or harvested, the system was free to move to the next region of sweet peppers.

The ROS MoveIt! package (Sucan and Chitta, 2016) was used for executing the motion planning operations. Within the MoveIt! framework, the Open Motion Planning Library (Şucan et al., 2012) implementation of RRT* (Karaman and Frazzoli, 2011) was selected, along with the TRAC-IK (Beeson and Ames, 2015) inverse kinematics (IK) solver. The distance optimisation setting of TRAC-IK was used to minimise the joint-space traversed through movement operations. We found the combination of RRT* and TRAC-IK
produced consistent plans more often than alternative readily available options (Beeson and Ames, 2015). Collisions were handled within the motion planning framework, which accounted for self-collisions of the robot platform in addition to custom vertical crop row boundaries to ensure the robot planned safely within each crop row.

Control of Harvey was implemented with readily available ROS control packages. A joint trajectory action server was used which accepted a 7DOF joint trajectory from the ROS MoveIt! motion planner and split the joint trajectories between the Universal Robots ROS joint controller and a custom ROS trajectory controller for the lift axis.

Figure 8: Software architecture diagrams. (a) Diagram illustrating how each software subsystem is connected. The state machine is central to the Harvey system and contains the decision making logic. The Scene Registration system reconstructs the 3D scene from RGB and depth images. The Sweet Pepper Segmentation, Peduncle Segmentation and Grasp Selection Fitter are used to generate information required to harvest each sweet pepper within a scanned scene. Physical harvesting operations are handled by the Path Planner, Robot Arm Controller and End Effector Control subsystems. (b) Logic flow diagram which is implemented within the state machine subsystem and illustrates the decision making steps for the autonomous harvesting operation.
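As a concrete illustration of the Fig 8b logic, the skeleton below wires two states together with SMACH-style retry transitions. The state names, outcomes and (empty) bodies are paraphrased from the flow diagram for illustration only; they are not Harvey's actual nodes.

```python
import smach

class DetectSweetPepper(smach.State):
    """Scan the scene and select a target sweet pepper."""
    def __init__(self):
        smach.State.__init__(self, outcomes=['found', 'none'])

    def execute(self, userdata):
        # Run sweet pepper segmentation here; 'none' triggers a platform move.
        return 'none'

class Attach(smach.State):
    """Attempt suction-cup attachment, retrying alternative grasp poses."""
    def __init__(self, max_attempts=5):
        smach.State.__init__(self, outcomes=['attached', 'failed'])
        self.max_attempts = max_attempts

    def execute(self, userdata):
        # Try ranked grasp poses until the vacuum sensor reports a seal.
        return 'failed'

def build_state_machine():
    sm = smach.StateMachine(outcomes=['move_platform'])
    with sm:
        smach.StateMachine.add('DETECT', DetectSweetPepper(),
                               transitions={'found': 'ATTACH',
                                            'none': 'move_platform'})
        # After a successful harvest (or exhausted retries) the cycle
        # returns to DETECT for the next sweet pepper in the scene.
        smach.StateMachine.add('ATTACH', Attach(),
                               transitions={'attached': 'DETECT',
                                            'failed': 'DETECT'})
    return sm
```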
4 Perception and Planning

In this section we present our perception system for determining grasp and cutting poses through the segmentation of sweet peppers and peduncles. To do this, we first capture a view of the sweet pepper with an RGB-D camera. We then segment the sweet pepper from its surrounding environment using colour information (Section 4.1). The pose of the peduncle is then estimated by using a deep peduncle segmentor (Section 4.2) to perform per-pixel segmentation. Grasp poses are then selected using surface information from the 3D segmentation of the sweet pepper (Section 4.3). The grasp pose and peduncle pose are then used by the subsequent motion planning method (Section 4.4) to perform the harvesting operation.

The first stage of the perception pipeline involves two steps. First, an image is captured from a long-range perspective in order to detect the initial location of a target sweet pepper. A target sweet pepper is selected based on distance to the origin of the robot arm workspace. The camera is then moved into a close-range perspective based on the position of the target sweet pepper, but offset vertically in order to maximise the view of the peduncle. In this work the RGB-D sensor is an Intel® RealSense SR300 RGB-D camera, which provides RGB colour and depth information used to create a colour point cloud of the scene.
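The long-range to close-range repositioning reduces to a simple offset computation on the target centroid. The sketch below assumes a z-up world frame; the stand-off distance and upward offset are illustrative placeholders, not Harvey's calibrated values.

```python
import numpy as np

STAND_OFF_M = 0.25        # assumed camera-to-crop distance at close range
VERTICAL_OFFSET_M = 0.05  # assumed upward shift to keep the peduncle in view

def close_range_position(pepper_centroid, approach_dir):
    """Back off along the approach direction and shift up towards the peduncle."""
    approach_dir = approach_dir / np.linalg.norm(approach_dir)
    position = pepper_centroid - STAND_OFF_M * approach_dir
    position = position + np.array([0.0, 0.0, VERTICAL_OFFSET_M])  # z-up world
    return position

# Example: a pepper 0.6 m in front of the robot base, approached along +x.
print(close_range_position(np.array([0.6, 0.0, 1.1]), np.array([1.0, 0.0, 0.0])))
```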
4.1 Sweet Pepper Segmentation

Sweet pepper segmentation is necessary to differentiate the sweet pepper from the background (leaves and stems). Per-pixel segmentation is necessary for the later stage of grasp pose selection. This task is challenging due to variation in crop colour and illumination as well as high levels of occlusion. An ad-hoc combination of features (local binary patterns, histogram of gradients, stacked auto-encoders and HSV) was used in our prior work (McCool et al., 2016) to train a conditional random field that was capable of detecting both green and red sweet pepper.

In this work, we are only interested in segmenting red (ripe) sweet pepper and so we make use of a simpler, computationally efficient method based purely on colour, described in Lehnert et al. (2016). An HSV colour segmentation algorithm is trained to detect ripe (in our case red) sweet peppers. We convert the RGB information into the more consistent rotated hue, saturation, and value (rotated-HSV) colour space. The rotated-HSV colour space is chosen as its dependence upon intensity is assigned to a single dimension (V) rather than across all three RGB dimensions. The hue component is rotated by 90° to avoid red values from lying on the border at 0° and 360°.

The distribution of red sweet pepper pixels is then modelled using a multivariate Gaussian. The likelihood that a pixel is from a red sweet pepper is then evaluated for each pixel,

$$p(\mathbf{x} \mid \boldsymbol{\mu}, \boldsymbol{\Sigma}) = (2\pi)^{-\frac{3}{2}} \det|\boldsymbol{\Sigma}|^{-\frac{1}{2}} \exp\left[ -\frac{1}{2} (\mathbf{x} - \boldsymbol{\mu})^{T} \boldsymbol{\Sigma}^{-1} (\mathbf{x} - \boldsymbol{\mu}) \right]. \quad (1)$$

The model parameters (µ and Σ) are learnt on a training set of manually annotated images; Σ is assumed to be diagonal. This is a computationally efficient model to use as the log-likelihood reduces to

$$\log\left[p(\mathbf{x} \mid \boldsymbol{\mu}, \boldsymbol{\Sigma})\right] = -\frac{3}{2}\log(2\pi) - \frac{1}{2}\log\det|\boldsymbol{\Sigma}| - \frac{1}{2}(\mathbf{x} - \boldsymbol{\mu})^{T} \boldsymbol{\Sigma}^{-1} (\mathbf{x} - \boldsymbol{\mu}) \quad (2)$$

and the first two terms, $-\frac{3}{2}\log(2\pi)$ and $-\frac{1}{2}\log\det|\boldsymbol{\Sigma}|$, can be pre-computed.

The output of the image segmentation is then used to segment the 3D points of the sweet pepper from the colour point cloud of the scene. The resultant 3D point cloud, containing only segmented sweet pepper points, is then clustered and filtered. A Euclidean clustering step is used to separate the points of multiple sweet peppers in the scene based on a minimum distance threshold. If multiple sweet peppers are in view, this clustering step also determines the best candidate sweet pepper based on the cluster which has the largest number of segmented 3D points (most information available) and the closest centroid to the end effector. Lastly, smoothing and outlier removal are performed on the points to filter noise from the segmentation step. The resulting output of this stage is the segmented point cloud of a single target sweet pepper. The stages of the sweet pepper segmentation method are illustrated in Fig 9.

Figure 9: Diagram illustrating the steps performed to obtain a segmentation of a single target sweet pepper. (a) First, a long-range perspective point cloud is captured. (b) An initial sweet pepper segmentation is then performed using colour, clustering and noise filtering to estimate a target sweet pepper which is closest to the workspace of the robot. (c) The RGB-D camera is then moved to a close-range perspective centred on the target sweet pepper. (d) A final sweet pepper segmentation is performed to produce a coloured point cloud of the sweet pepper used for grasp selection.
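Equations (1) and (2) admit a compact vectorised implementation. The sketch below follows the description above (rotated hue, diagonal covariance, pre-computed constant terms); the model parameters and the decision threshold are placeholders, since in practice they are learnt from the annotated training images.

```python
import cv2
import numpy as np

def rotated_hsv(bgr):
    """Convert BGR to HSV and rotate hue so red does not wrap at 0/360 deg."""
    hsv = cv2.cvtColor(bgr, cv2.COLOR_BGR2HSV).astype(np.float32)
    # A 90 deg rotation of hue corresponds to +45 in OpenCV's [0, 180) range.
    hsv[..., 0] = (hsv[..., 0] + 45.0) % 180.0
    return hsv

def log_likelihood(hsv, mu, sigma_diag):
    """Per-pixel log N(x | mu, diag(sigma)); constant terms are pre-computed."""
    const = -1.5 * np.log(2.0 * np.pi) - 0.5 * np.sum(np.log(sigma_diag))
    diff = hsv - mu
    return const - 0.5 * np.sum(diff ** 2 / sigma_diag, axis=-1)

def segment_red_pepper(bgr, mu, sigma_diag, threshold=-12.0):
    """Binary mask of pixels likely to belong to a ripe (red) sweet pepper.

    mu, sigma_diag and threshold are placeholder values, not the deployed ones.
    """
    return log_likelihood(rotated_hsv(bgr), mu, sigma_diag) > threshold
```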
4.2 Peduncle Segmentation

It is highly desirable to be able to precisely segment and localise the peduncle before performing any crop cutting, because retaining the proper peduncle maximises the storage life and market value of each crop. In addition, accurate peduncle segmentation can lead to higher success rates for crop detachment, which in turn yields more harvested crops. It is, however, a challenging task due to multiple factors: the presence of occlusions by leaves or other crops, varying lighting conditions in production environments, visual similarity to other parts of the plant, and the natural variance in peduncle shape (e.g., flat or highly curved).

Our previous work, Sa et al. (2017), proposed a peduncle segmentation system to address these challenges based on hand-crafted colour and geometry features (point feature histograms (Rusu et al., 2009)) used with a support vector machine (SVM), referred to as PFH-SVM. Although this conventional approach is straightforward and efficient in feature extraction and prediction, it suffers from two downsides. First, it is sensitive to variations in environmental conditions (e.g., varying lighting), which implies that the trained model can overfit to the environments where the dataset was collected and might not be generalisable. Second, annotating the data is challenging and time-consuming as it requires the visualisation and selection of regions in a 3D viewer.

Inspired by the recent advances in computer vision, we consider the efficient deep learning method proposed by McCool et al. (2017) to train a peduncle segmentation system using primarily the colour image. This system has the potential to provide a highly accurate and efficient system which can be rapidly deployed to different environments and tasks, due to the ease with which data can be collected and annotated. However, due to the importance of the 3D structure of the plant, we employ a secondary filtering process. This allows us to enforce structural constraints based on reasonable assumptions about the crop structure and is applied to both the
PFH-SVM and efficient deep learning approaches for experimental comparison. Below we describe the efficient deep learning approach and the 3D filtering process.

Figure 10: Steps for CNN peduncle localisation. Firstly, a region of interest is computed given the previously detected sweet pepper points (Compute Region of Interest). The ROI-masked image is then passed into the CNN, which returns detected peduncle points (2D Detection; blue and red denote high and low confidence regions respectively). The detected points are then projected to Euclidean space using the corresponding depth image from the RGB-D camera (Project to 3D). The projected points are then filtered using 3D constraints, colour information and cluster size (3D Constraint and Colour Filtering). The filtered peduncle points are then used to estimate a cutting pose aligned with the centroid of the points (Pose Estimation).
4.2.1 Efficient Deep Learning

McCool et al. (2017) proposed an approach for training deep convolutional neural networks (DCNNs) that allows complexity (e.g. memory size and speed) to be traded off against accuracy while still training an effective model from limited data. We use the
MiniInception approach of McCool et al. (2017) to train a lightweight DCNN for efficient peduncle segmentation. When training a model like this, it is normal to define the positive region, in this case the peduncle, and then consider everything else to be a negative example. However, due to scene complexity this is not appropriate for our work, as some parts of the scene may contain other peduncles, as can be seen in Fig 11 (a). As such, for each image we annotate the positive region, see Fig 11 (b), as well as the negative region, see Fig 11 (c).
Figure 11: Example of manually annotated ground truth for peduncles. The original image is shown in (a), the annotated peduncle in (b), followed by the regions which do not represent the peduncle in (c).
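A practical consequence of annotating both positive and negative regions is that unlabelled pixels are simply excluded from the training loss. The sketch below illustrates this with a masked binary cross-entropy; it is our illustration of the annotation scheme, not the authors' training code.

```python
import numpy as np

def masked_cross_entropy(prob, positive_mask, negative_mask, eps=1e-7):
    """Mean binary cross-entropy over labelled pixels only.

    prob: per-pixel peduncle probability from the DCNN, shape (H, W).
    positive_mask / negative_mask: boolean annotation masks (Fig 11 b and c);
    pixels outside both masks carry no label and contribute no gradient.
    """
    prob = np.clip(prob, eps, 1.0 - eps)
    loss_pos = -np.log(prob[positive_mask])        # annotated peduncle pixels
    loss_neg = -np.log(1.0 - prob[negative_mask])  # annotated non-peduncle pixels
    n = loss_pos.size + loss_neg.size
    return (loss_pos.sum() + loss_neg.sum()) / max(n, 1)
```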
4.2.2 Region of Interest Selection and 3D Filtering

For a deployed system, the task of peduncle detection is performed once the sweet pepper has been detected. Therefore, we employ assumptions based on the structure of the plant to improve the accuracy of the localisation. We make use of two assumptions. First, we can improve the efficiency of the two algorithms by pre-computing a 2D region of interest (RoI) so that only the region in the image above the detected sweet pepper is considered to contain the peduncle. Similarly, 3D constraints, such that the peduncle cannot be too distant from the detected sweet pepper, are enforced using a 3D bounding box before finally declaring the position and pose of the peduncle. An example of this process is given in Fig 10; it is applied to both algorithms.

The 2D RoI is defined to contain the region within the image above the detected sweet pepper. Given a bounding box of the detected sweet pepper of height h_b and width w_b and central position (c_x, c_y), the region of interest for the peduncle is defined to have the same width, w_b, and height, h_b. The central location of the peduncle RoI is then given by shifting up by half the height of the sweet pepper bounding box, (c_x, c_y + h_b/2).

The 3D bounding box (BB) is used to delete peduncle outliers using the maximum and minimum Euclidean points from the detected sweet pepper points. The definitions of the width, length and height of the BB are given in Fig 12, which shows that the length, l_p, of the peduncle BB is given by the maximum of either the width, w_sp, or length, l_sp, of the sweet pepper BB. The reason for selecting the maximum of width or length is that the depth points of the sweet pepper are measured from one side (view) only, and it is assumed that the sweet pepper is symmetrical about this axis. Therefore, the largest measure, width or length, gives the maximum BB of the sweet pepper in those axes. Furthermore, the height h_p of the peduncle BB is defined by the maximum height of the sweet pepper, max_h, and a predefined height offset parameter, h_offset. For this work we defined the height offset parameter as 50 mm, which is the average length (N = 25) of a peduncle for the varieties in the field tests.

Figure 12: Definition of the peduncle 3D bounding box, where w_sp, l_sp and h_sp correspond to the width, length and height of the sweet pepper point cloud and w_p, l_p and h_p correspond to the width, length and height of the calculated peduncle 3D BB. max_h corresponds to the maximum height of the sweet pepper and h_offset is an offset parameter for defining the upper and lower heights of the peduncle 3D BB.
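Both structural filters reduce to a few array operations. The sketch below mirrors the variable names of Fig 12, assuming a z-up frame for the point clouds and image y increasing downwards for the 2D RoI; the exact frame conventions on Harvey may differ.

```python
import numpy as np

def peduncle_roi(cx, cy, wb, hb):
    """2D RoI of size (wb, hb) whose centre is the pepper box centre shifted
    up by half the pepper box height (image y increases downwards)."""
    centre_y = cy - hb / 2.0
    top_left = (cx - wb / 2.0, centre_y - hb / 2.0)
    return top_left, wb, hb

def filter_peduncle_points(peduncle_pts, pepper_pts, h_offset=0.05):
    """Keep 3D peduncle points inside the bounding box defined in Fig 12."""
    w_sp = np.ptp(pepper_pts[:, 0])         # pepper width
    l_sp = np.ptp(pepper_pts[:, 1])         # pepper length (depth)
    l_p = max(w_sp, l_sp)                   # single-view symmetry assumption
    cx, cy = pepper_pts[:, 0].mean(), pepper_pts[:, 1].mean()
    max_h = pepper_pts[:, 2].max()          # top of the sweet pepper

    keep = ((np.abs(peduncle_pts[:, 0] - cx) <= l_p / 2.0)
            & (np.abs(peduncle_pts[:, 1] - cy) <= l_p / 2.0)
            & (peduncle_pts[:, 2] >= max_h - h_offset)
            & (peduncle_pts[:, 2] <= max_h + h_offset))
    return peduncle_pts[keep]
```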
4.3 Grasp Selection

Grasp poses for each sweet pepper are calculated using the segmented 3D point cloud of the sweet pepper. This work uses the method presented in Lehnert et al. (2017) for selecting grasp poses; a summary of this method is presented in this section.

The grasp selection method finds multiple grasp poses from point cloud data by computing the surface normals over the points using a fixed patch size. These surface normals are used as initial candidate grasp poses and subsequently ranked based on a utility function. The utility function is the weighted average of three normalised scores S_{i1}, S_{i2} and S_{i3}, based on the surface curvature, distance to the point cloud boundary and angle with respect to the horizontal world axis, respectively, where i is the current candidate pose. This utility function favours grasp poses that are close to the centre of the sweet pepper, on planar surfaces, aligned with the horizontal world axis and away from discontinuities caused by occlusion. The utility, U_i, of grasp pose i is calculated according to

$$U_i = \sum_{j=1}^{3} W_j S_{ij}, \quad \text{given} \quad \sum_{j=1}^{3} W_j = 1, \quad (3)$$

where S_{ij} is the j-th normalised score of grasp pose i and W_j are weighting coefficients that describe the importance of each score.

An example of the grasp selection method applied to real sweet pepper point clouds is shown in Fig 13, where the utility of each grasp pose is represented as a gradient from red to black, where black has the lowest utility and the blue pose indicates the grasp pose with the highest utility. An added advantage of this grasping method is that if the grasp pose with the highest utility is unsuccessful, the next candidate pose can be used.

Figure 13: Computed grasp poses. The utility of each grasp pose is shown as a gradient from red to black, where black is the lowest utility. The blue arrow indicates the best pose based on its utility.
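Equation (3) amounts to a weighted dot product followed by a sort. A minimal sketch, assuming equal weights for illustration (the deployed weights are not restated here):

```python
import numpy as np

def rank_grasp_poses(scores, weights=(1/3, 1/3, 1/3)):
    """scores: (N, 3) array of normalised scores S_ij for N candidate poses.
    Returns candidate indices sorted from highest to lowest utility U_i."""
    weights = np.asarray(weights, dtype=float)
    assert np.isclose(weights.sum(), 1.0), "Eq. (3) requires sum(W_j) = 1"
    utility = scores @ weights          # U_i = sum_j W_j * S_ij
    return np.argsort(utility)[::-1], utility

# Example with three candidate poses.
order, u = rank_grasp_poses(np.array([[0.9, 0.4, 0.8],
                                      [0.5, 0.9, 0.6],
                                      [0.2, 0.3, 0.1]]))
print(order, u)
```

Because the full ranking is retained, the retry behaviour described above falls out naturally: if the top-ranked pose fails, the planner advances to the next index.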
4.4 Motion Planning

Motion planning is performed sequentially for attachment and then detachment. The attachment trajectory starts at a fixed offset from the sweet pepper, determined by the close-range image capture location, and moves from the close-range pose to a pre-grasp pose which has a fixed offset along the approach direction. The trajectory then makes a linear movement towards the selected grasp pose, causing the suction cup to attach to the sweet pepper. This motion computed by the planner can be seen within Fig 14, depicted by the green line. A fixed offset is also applied to the final grasp pose of the attachment trajectory in order to ensure the suction cup makes a seal (indicated by the green line drawn within the sweet pepper).

Once the attachment trajectory has been executed, attaching the suction cup, the end effector is moved vertically along the z axis of the world frame from the final grasp pose in order to decouple the suction cup from the cutting tool (as described in Section 3.3).

The final step for the motion planner is to compute a cutting trajectory. It was found that using a cutting trajectory aligned with the x axis of the world frame performed better than one aligned with the orientation of the peduncle. This method worked better because estimating the orientation of the peduncle was sensitive to the number of 3D points detected. The orientation of the cutting tool is kept level during the trajectory. An example detachment trajectory is shown as the yellow line in Fig 14 and includes both the inwards and outwards motion of the cutting tool, which has a fixed orientation (red arrows) along the world x axis.

The resulting end effector trajectory for the attachment, separation and detachment stages of one harvesting cycle is shown in Fig 14. This figure depicts the trajectory of the end effector relative to the estimated pose of the target sweet pepper, highlighting the steps of the harvesting process: capture image, attachment, separation and detachment.

As discussed in Section 3.1, operating in a protected cropping environment with a relatively planar row structure simplifies the planning and manipulation tasks, so that we do not need to perform complex obstacle avoidance to move between the branches of a 3D plant structure. Our motion planner takes into account self-collisions with the robot arm and base platform, and assumes a simple planar obstacle closely behind the crop row to reduce the probability of collisions with the plant. Trajectories that interact with the plant are generally “in and out” motions that reduce the chance of collisions.

Figure 14: End effector trajectory of a single harvesting trial relative to the estimated pose of a sweet pepper. The position of the end effector is indicated by the coloured lines, whereas the orientation is represented by the red arrows. The trajectory begins (start pose) with the attachment stage indicated by the yellow line, transitioning into the separation stage as the blue line corresponding to a vertical motion, and finishing with the detachment stage illustrated as the green line.
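The attachment trajectory is essentially three waypoints: a pre-grasp pose offset along the approach direction, the grasp pose itself, and a small overdrive past the surface so the suction cup seals. A minimal sketch with assumed offset magnitudes (the paper does not restate the actual values):

```python
import numpy as np

PRE_GRASP_OFFSET_M = 0.10  # assumed stand-off before the linear approach
SEAL_OVERDRIVE_M = 0.01    # assumed extra travel so the cup makes a seal

def attachment_waypoints(grasp_position, approach_dir):
    """Pre-grasp, grasp and seal waypoints along the approach direction."""
    approach_dir = approach_dir / np.linalg.norm(approach_dir)
    pre_grasp = grasp_position - PRE_GRASP_OFFSET_M * approach_dir
    seal_pose = grasp_position + SEAL_OVERDRIVE_M * approach_dir
    return [pre_grasp, grasp_position, seal_pose]

for p in attachment_waypoints(np.array([0.7, 0.1, 1.2]),
                              np.array([1.0, 0.0, 0.0])):
    print(p)
```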
5 Experiments and Results

A field trial was conducted within a protected cropping research facility in Cleveland, Queensland (Australia) over a 5-day period. Overall, the robot platform was tested on a total of 68 sweet peppers in a real protected cropping system. Within this work, two different sweet pepper cultivars were trialled, Mercuno and Ducati.

Three experiments are presented in the following section, aimed at validating the perception system and overall harvesting method. First, we present the accuracy of the sweet pepper segmentation system, to demonstrate that we can accurately segment the ripe (red) sweet pepper of interest from the background. Second, we present experimental results for the peduncle segmentation system, comparing the performance of the deep learning method to a previous hand-crafted 3D feature method. Finally, we present a field experiment of the full harvesting platform in a real protected cropping system, demonstrating the harvesting performance of the final integrated system.
Three crop rows were selected for the experimental work, including a total of 68 sweet peppers. Every sweet pepper present within the crop rows was included in the experiment. An image of the crop rows within the protected cropping system for this experimental work is shown in Fig 15a, and example images of the platform within the environment are shown in Fig 15b.

The methodology for the field trial is as follows. The robotic platform was positioned at the beginning of a crop row. The robot was then commanded to perform a single harvest cycle. If the robot failed to detach the sweet pepper, the robot arm moved back to its start position and the attempt was retried. If obstructions or occlusions caused multiple failed attempts, such that any further attempts would likely continue to fail, the scene was modified by either removing leaves or by adjusting the position of the sweet pepper. Fig 16 shows examples of how the modifications to the experiment were performed. The robot was then commanded to repeat the attempt from the same starting position. The results of each attempt were recorded, along with details such as whether a modification was performed, whether a failure occurred, the cause of the failure if any, and whether damage to the sweet pepper or plant occurred.
Figure 15: Setup for field experiments in protected cropping system. (a) An image of the protected cropping environment used within the field experiments. The cropping environment includes a horizontal trellis system that was replicated from real protected cropping systems in Queensland. (b) Image of the platform within the protected cropping environment, illustrating the workspace of the robotic arm and harvesting tool.

Once no more sweet peppers could be detected from the robot's current position, the platform was moved forward via remote control by 0.5 m (approximately half of the width of the camera's field of view). This distance was selected as it had sufficient overlap to detect sweet peppers from different perspectives if occluded in the previous view.

Each attempt was broken down into four parts where the measurement for success was:

• Sweet Pepper Detection—a sweet pepper was detected and a grasp pose was found;
• Peduncle Detection—a peduncle was detected for the targeted sweet pepper;
• Attachment—the suction cup attached to the sweet pepper sufficiently to maintain a grasp;
• Harvesting—the peduncle was cut and the sweet pepper was successfully placed into a storage bin.

Additional notes were also recorded during the experiment on whether any damage to the sweet pepper or plant was caused during the harvesting process, categorised as follows (a sketch of this per-attempt bookkeeping is given after the lists):

• Major or minor damage to the sweet pepper;
• Major or minor damage to the plant or stem.
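As referenced above, a minimal sketch of a record structure for this per-attempt bookkeeping is shown below. The class and field names are ours and purely illustrative, not taken from the experiment software:

    from dataclasses import dataclass, field
    from enum import Enum
    from typing import List, Optional

    class Stage(Enum):
        PEPPER_DETECTION = "sweet pepper detection"
        PEDUNCLE_DETECTION = "peduncle detection"
        ATTACHMENT = "attachment"
        HARVESTING = "harvesting"

    @dataclass
    class AttemptRecord:
        pepper_id: int
        scene_modified: bool = False          # leaves removed or pose adjusted
        stages_succeeded: List[Stage] = field(default_factory=list)
        failure_cause: Optional[str] = None   # noted when an attempt failed
        pepper_damage: Optional[str] = None   # "minor" or "major", if any
        plant_damage: Optional[str] = None    # "minor" or "major", if any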
Figure 16: (a) Unmodified versus (b) modified sweet pepper, where leaves have been removed. (c) Unmodified versus (d) modified sweet pepper (no. 6), where leaves have been removed and the pose of the sweet pepper has been adjusted. For this case, the sweet pepper was adjusted to be in front of the trellis string and the main stem of the plant.

The parameters for the sweet pepper segmentation, grasp selection, peduncle segmentation and attachment subsystems described in previous sections are given in Table 1 and were determined empirically through testing of the robotic harvesting system. For this work the HSV model and threshold were the same as those used in previous work (Lehnert et al., 2017).
Figure 17: Example sweet pepper and plant damage. (a) Sweet pepper with damage highlighted in blue from a previous attempt. (b) Example plant damage highlighted in blue where the cutting blade was placed incorrectly. It can be seen that the peduncle of this sweet pepper is short and behind the plant stem, making it difficult for peduncle detection.

Table 1: Parameters for harvesting experiment.

    Subsystem                  Parameter              Value
    Sweet Pepper Segmentation  HSV model parameters   µ = [180, …, …], Σ = [255, …, …]
    …                          …                      3, 25
    …                          …                      π/…
    …                          h offset               0.05 m
    Attachment                 Max no. attempts       5

Accurate segmentation of sweet pepper is a precursor for automated harvesting. In Section 4.1 we presented a method, based purely on colour, to detect ripe (red) sweet peppers pixel-wise. We evaluate the performance of this system by comparing to prior work (McCool et al., 2016) on sweet pepper detection and quantitatively evaluate the performance using the area under the curve (AUC) of the precision-recall curve. Precision (P) and recall (R) are given by

P = \frac{T_p}{T_p + F_p} \quad \text{and} \quad R = \frac{T_p}{T_p + F_n}, \qquad (4)

where T_p is the number of true positives (correct detections), F_p is the number of false positives (false alarms), and F_n is the number of false negatives (mis-detections). Ideally, precision and recall should both be 1, as this means there are no false positives and no false negatives.

For evaluation, we manually annotated a set of red sweet pepper images and divided this into a training and test set using a 2:1 split. The training set consists of 20 images and the test set consists of 10 images. Using this, we trained both the CRF-based (McCool et al., 2016) and our proposed colour-based approaches.

The results in Fig 18 show that for the majority of the precision-recall curve the CRF-based and colour-based approaches have similar performance. It can be seen that once the recall exceeds 0.7, the performance of the colour-based approach drops considerably. By comparison, the performance of the CRF-based approach degrades consistently. This leads to the CRF-based approach achieving an AUC of 0.789 compared to the colour-based approach which achieves an AUC of 0.735. We attribute this performance difference to the reliance on a single feature for the colour-based approach, compared to the CRF-based approach which makes use of 4 features as well as taking into account neighbouring information during inference.

From these results we conclude that for the task of detecting ripe (red) sweet pepper it is sufficient to use the computationally efficient colour-based detector. However, if we were to change the task to also pick green sweet pepper then the CRF-based detector should be used; future work will explore the potential of such a system.
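The exact colour model is given in our earlier work (Lehnert et al., 2017); purely as an illustration, a per-pixel detector of this kind can be sketched as a Gaussian colour model in HSV space with a Mahalanobis distance threshold. The parameter names mirror Table 1, while the threshold value and function name are assumptions:

    import cv2
    import numpy as np

    def detect_red_pepper(bgr_image, mu, sigma_inv, max_sq_dist=9.0):
        # Pixel-wise colour detection sketch: flag pixels whose HSV value
        # lies within a Mahalanobis distance threshold of a ripe-pepper
        # colour model (mean mu, inverse covariance sigma_inv, cf. Table 1).
        # Hue wrap-around at red is ignored here for simplicity.
        hsv = cv2.cvtColor(bgr_image, cv2.COLOR_BGR2HSV)
        flat = hsv.reshape(-1, 3).astype(float)
        diff = flat - np.asarray(mu, dtype=float)
        sigma_inv = np.asarray(sigma_inv, dtype=float)
        sq_dist = np.einsum('ij,jk,ik->i', diff, sigma_inv, diff)
        return (sq_dist < max_sq_dist).reshape(bgr_image.shape[:2])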
Figure 18: Precision-recall curve for detecting red sweet pepper using the CRF-based approach (red) and the colour-based approach (blue).
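A minimal sketch of how Equation (4) and the AUC of the precision-recall curve can be computed from per-pixel detector scores is given below; the threshold sweep and trapezoidal integration are standard choices, not necessarily the exact evaluation code used here:

    import numpy as np

    def precision_recall_auc(scores, labels, thresholds):
        # Sweep a detection threshold and evaluate Eq. (4) at each point.
        # scores: per-pixel detector confidences; labels: binary ground truth.
        P, R = [], []
        pos = labels == 1
        for t in thresholds:
            pred = scores >= t
            tp = np.sum(pred & pos)
            fp = np.sum(pred & ~pos)
            fn = np.sum(~pred & pos)
            P.append(tp / max(tp + fp, 1))  # precision
            R.append(tp / max(tp + fn, 1))  # recall
        P, R = np.array(P), np.array(R)
        order = np.argsort(R)               # integrate over increasing recall
        auc = np.trapz(P[order], R[order])  # area under the PR curve
        return P, R, auc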
We present results for the two algorithms, PFH-SVM and MiniInception, executed on Harvey for detecting and segmenting peduncles. A small form factor GeForce 1070 was used for inference of the MiniInception model. The system was deployed in a glasshouse facility in Cleveland (Australia) which contained two cultivars, Ducati and Mercuno. To train the MiniInception approach, 41 annotated images were used. These images came from two sites: 20 images were obtained from the same site in Cleveland several weeks prior to deploying the robot, which included a different set of crop on the plant, and 21 were obtained from another site in Giru, North Queensland.
The performance of the two algorithms, PFH-SVM and MiniInception, is summarised in Fig 19a. It can be seen that the performance of the MiniInception model is consistently superior to that of the PFH-SVM approach. However, both approaches have relatively low performance, with F scores of 0.313 and 0.132 for the MiniInception and PFH-SVM systems respectively.

On average the execution time of the two algorithms is similar, with the MiniInception approach processing an average of 1704 points per second while the PFH-SVM approach processes an average of 1248 points per second. This measurement is reported in points per second because the two methods receive a different number of points (3D points vs 2D pixels) for the same data.
Figure 19: (a) Precision-recall results for the MiniInception (DeepNet) and PFH-SVM (PFH) algorithms before and after the filtering step and (b) a comparison between using MiniInception with and without extended training data.

Introducing the filtering step described in Section 4.2 provides a considerable improvement in performance for both algorithms. The F scores for the MiniInception and PFH-SVM systems improve to 0.564 and 0.302 respectively.

For both algorithms, introducing the filtering step leads to odd behaviour in the precision-recall curve. This is expected because we are altering the threshold on an algorithm, either MiniInception or PFH-SVM, whose performance is dependent on another step that greatly impacts its final result. An example of this is illustrated in Fig 20, where introducing the filtering step at different thresholds leads to different points of the point cloud being considered as peduncles and other points being suppressed. At low precision with high recall (low threshold), based on the assumption of selecting the maximum cluster size, a cluster that is a separate leaf or plant which fits within the 3D constraints may be selected (see Fig 20a). Once the precision becomes higher, leaves and plant are thresholded out and the maximum cluster assumption becomes valid, resulting in the cluster of the peduncle being selected (see Fig 20b). Therefore, in order for a peduncle cluster to be selected, a minimum level of precision (threshold) is required.
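The post-filtering behaviour described above can be sketched as: threshold the per-point confidences, cluster the surviving 3D points by Euclidean proximity, and keep the largest cluster. The clustering radius and the flood-fill implementation below are illustrative assumptions; the actual filtering step is the one described in Section 4.2:

    import numpy as np
    from scipy.spatial import cKDTree

    def filter_peduncle_points(points, scores, threshold, radius=0.01):
        # Keep points classified as peduncle above the confidence threshold,
        # cluster them by Euclidean proximity (single-linkage flood fill on
        # a KD-tree), and return the largest cluster. The maximum-cluster
        # assumption is only valid once the threshold is high enough to
        # suppress leaf and stem points, as discussed in the text.
        pts = points[scores >= threshold]
        if len(pts) == 0:
            return pts
        tree = cKDTree(pts)
        labels = -np.ones(len(pts), dtype=int)
        n_clusters = 0
        for seed in range(len(pts)):
            if labels[seed] != -1:
                continue
            stack = [seed]
            labels[seed] = n_clusters
            while stack:
                i = stack.pop()
                for j in tree.query_ball_point(pts[i], radius):
                    if labels[j] == -1:
                        labels[j] = n_clusters
                        stack.append(j)
            n_clusters += 1
        largest = np.argmax(np.bincount(labels))
        return pts[labels == largest]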
Qualitative results for the MiniInception segmentation algorithm are presented in Fig 21. From these results it can be seen that the deep network approach provides consistent results across multiple poses. Also, it can be seen that the regions with high scores surround the peduncle region.

Figure 20: Example behaviour of the post-filtering results where blue represents the classified peduncle points. (a) Example segmentation at low precision (low threshold) where the maximum cluster size has selected a cluster that is a separate leaf or plant and fits within the 3D constraints. (b) Same example with higher precision—leaves and plant are thresholded out, resulting in the cluster of the peduncle being selected.

We believe this, in part, explains the poor precision-recall curve for the MiniInception algorithm, as these points will be considered false positives and greatly reduce the precision value, despite their proximity to the peduncle. This also explains the considerable gain achieved by introducing the filtering step, as many of these points correspond to background regions and are discarded.

In Fig 22 we present example results of the entire procedure for the MiniInception algorithm, with filtering, at varying thresholds. It can be seen that as the threshold is increased the erroneous points, such as those belonging to the stem, are removed. Even at higher threshold values a large number of points on the peduncle of the fruit are still chosen.
One of the advantages of the MiniInception approach is that it is much easier to annotate training data than for the PFH-SVM approach. To determine if this can be beneficial, we extended the training set of MiniInception with an extra 33 images to a total of 74 annotated images. The extra images came from the Cleveland test site after performing the final harvesting experiment. Another purpose of the additional images is to investigate the potential improvement in performance when using additional domain-specific images. This system is referred to as MiniInception-Extended.

In Fig 19b it can be seen that the MiniInception approach benefits considerably from increasing the training set size. The F score improves from 0.313 for MiniInception to 0.452 for MiniInception-Extended, a relative improvement of 31%. Including the filtering step again provides a boost in performance, leading to an F score of 0.631. This is a relative improvement of 10.8% and demonstrates one of the key potential advantages of this deep learning approach: it benefits from increasing the training set size, and annotating the training data is relatively easy as it requires labelling a 2D image rather than a 3D point cloud (as is the case for the PFH-SVM approach). Another approach would be to consider the use of synthetic data, similar to Barth et al. (2017).
Results of the final harvesting experiment are presented in the following section. A video of the robotic harvester demonstrating the final experiment by performing autonomous harvesting of sweet peppers in a protected cropping environment is available at http://bit.ly/experimental_results .

Figure 21: Example outputs from the MiniInception model. Images are overlaid with normalised confidence scores from the CNN. It can be seen that the majority of confidence scores are high on peduncle pixels. Some high confidence scores can also be seen on the stem and leaves of the plant but are mostly sparse. The majority of false positives from stem and leaf segmentations can be filtered out using further Euclidean clustering and constraints on the resulting point cloud of all segmented points.

The success rates for the experiment are presented in Fig 24 for the unmodified and modified scenarios and are broken down into four different stages: sweet pepper segmentation, peduncle segmentation, attachment and overall harvest success rates. Out of the total sweet peppers, 76.5% and 47% were successfully harvested under the modified and unmodified scenarios respectively. Modifications included removing leaves or adjusting the sweet pepper pose. The overall harvesting success rate reflects the performance of the detection, attachment and detachment stages. Table 2 presents additional details for the harvesting experiment, including the average number of attempts for each sweet pepper (1.9 for unmodified and 2.5 for modified) and the performance for the two different sweet pepper varieties. Arguably, one of the most challenging aspects of the harvesting process is the segmentation of peduncles—directly influencing whether a harvesting detachment can be executed. The results show that 84% of peduncles were detected within the modified scenario and 65% of the peduncles for the unmodified scenario.

Examples of images taken from the robot during the experiment are shown in Fig 23. These images show the perspective of the camera used to detect the sweet pepper and peduncle. It can be seen that the perspective is selected such that the peduncle is in the centre of the image, maximising the number of peduncle pixels whilst still keeping the sweet pepper within the field of view. This camera perspective is determined using a previous estimate of the location of the target sweet pepper, as described in Section 4.1.

Within a protected cropping system, damage to the plant can lead to a loss in yield of that plant due to disease, reduced growth or, in the worst case, death from the damage. It is therefore critical that the harvesting method causes as little damage to the plant as possible. During the final experiment, any damage to sweet peppers or the plant was recorded. Fig 25a shows the damage rates for the modified and unmodified experiments, separated into major and minor damage to either the sweet pepper or plant.

Figure 22: Example peduncle segmentation after filtering from CNN responses with varying threshold values, projected to 3D points using a corresponding depth map. The threshold value increases from left to right. The segmented peduncle points are highlighted in blue.

Figure 23: Example images of sweet peppers taken during the final experiment from the robot's camera. A vertical offset is applied which shifts the sweet pepper down, effectively centering the peduncle within the image, aimed at improving the peduncle segmentation.
Results show that 5 (7%) sweet peppers suffered major damage (see Fig 17 for an example) and 3 (4%) plants/stems suffered major damage in the modified scenario. Alternative designs for the cutting tool, such as the addition of a guard, could reduce damage to both the plant and fruit.

An analysis of the timing for each stage of the harvesting process is illustrated in Fig 25b and presented in Table 3. The average time to detach a sweet pepper from the plant was 36.9 (± .4) seconds, where the most time was spent on the detachment stage (14.5 (± .9) seconds). The detachment stage was the most time consuming as it involved the cutting motion, which was executed with a slow end effector velocity to ensure the oscillating cutting blade successfully severed the peduncle. This detachment time could be reduced if a more powerful oscillating tool was used. The other major time-consuming stages are the attachment and placement stages, as these also require motion of the robot arm.

Unfortunately, the placement stage was very inefficient in time (9.2 (± .4) seconds) as it was discovered after the experiment that the path planning algorithm was taking a large amount of time (including multiple planning attempts) to find a path to the packing crate due to a poor choice of predefined waypoints. This problem could easily be resolved in future work by choosing better waypoints for the packing crate, ensuring the path planning algorithm doesn't waste a significant amount of time re-planning.
It is important for future work to understand the major causes of harvesting failures within the proposed system. Therefore, a failure analysis was conducted to highlight some of the recurring failure cases, enumerated below.

Figure 24: Success rates for each stage of the harvesting process under the modified and unmodified scenarios (modified scenario: sweet pepper detection 99%, peduncle detection 84%, attachment 93%, overall harvest 76.5%).

Figure 25: (a) Sweet pepper and plant damage rates for autonomous harvesting. (b) Average time for each stage of the harvesting process (N = 68). Error bars indicate one standard deviation from the mean.

Table 3: Execution times for harvesting stages.

    Harvesting stage        Average time in seconds (std dev)
    Sweet Pepper Detection  4.3 (…)
    Grasp Selection         …
    Peduncle Detection      …
    Attachment              …
    Detachment              14.5 (…)
    Placement               9.2 (…)
    Total                   36.9 (…)

The failure conditions observed were:

a) Peduncle not detected - no peduncle was detected for the targeted sweet pepper
b) Peduncle partially cut - the blade only partially cut the peduncle
c) Peduncle moved - the peduncle moved out of the way during the cutting action
d) Difficult peduncle
e) Path planning failure
f) Obstruction of peduncle
g) Difficult sweet pepper - the sweet pepper was abnormal in size or shape, making it difficult for grasping
h) Obstruction of sweet pepper - the sweet pepper was fully occluded
i) Attachment failure - the gripper was unable to grasp the sweet pepper

Each failure was recorded and the rates at which they occurred are presented in Fig 26. The results show that the most frequent failure mode was (a) no peduncle detected, occurring 28% (30% for the unmodified scenario) of the time. The failure modes (b) and (c) also represented a significant portion of the failure cases and show that the detachment stage is challenging due to a combination of inaccuracies in the peduncle detection and dynamic movement of peduncles during cutting.

Figure 26: Harvesting failure conditions (N=127 for modified, N=99 for unmodified). The different failure cases include: a) Peduncle not detected, b) Peduncle partially cut, c) Peduncle moved, d) Difficult peduncle, e) Path planning failure, f) Obstruction of peduncle, g) Difficult sweet pepper, h) Obstruction of sweet pepper, i) Attachment failure. Failure rates (modified): 28%, 23%, 22%, 11%, 6%, 7%, 5%, 5%, 2%; (unmodified): 30%, 20%, 24%, 10%, 9%, 7%, 5%, 4%, 2%.

This paper describes an autonomous crop harvesting system that achieves impressive harvesting results in a real protected cropping environment. Automating the currently manual horticultural harvesting task has been a long sought after goal, with relatively slow development over the past few decades (Bac et al., 2014) and with little commercial impact (Shamshiri et al., 2018). However, in this paper we have demonstrated a robotic system that approaches commercial viability. We highlight two key challenges for successful autonomous harvesting: 1) perception of the crop and environment; and 2) manipulation of the crop. This paper makes three key contributions:

• A proven in-field robotic harvesting system that achieves a harvesting success rate of 76.5% in a modified scenario,
• an in-depth analysis of the perception and harvesting field trials of the robotic harvester,
• and a novel method for peduncle segmentation using an efficient deep convolutional neural network in conjunction with 3D post-filtering.

The results demonstrate that visual detection of the crop and peduncle can be achieved in very difficult situations, including challenging lighting conditions and a highly similar visual appearance between crop and background. We demonstrate that the combination of efficient deep learning models and 3D filtering reduces problems with estimating the critical cutting location within the harvesting process. Finally, the system is demonstrated using a custom end-effector that uses both a suction gripper and an oscillating blade to successfully remove sweet peppers in a protected cropping environment. The presented harvesting system achieved a 76.5% success rate (within a modified scenario), which improves upon our prior work which achieved 58% and related sweet pepper harvesting work which achieved 33% (Bac et al., 2017) (which has some differences in the cropping system). Despite these improvements over the state of the art, a number of issues became apparent throughout our experiments. These are broken down into the perception, harvesting and hardware design issues discussed below.

The perception and planning systems have been shown to perform well for our specific problem. However, further advancements are necessary for a general system.
To achieve a generic visual crop detection system suitable for a range of crops, we believe a fast learning system capable of high detection accuracy that can be trained from a small number of training images is required. This is a challenging task which remains unsolved.

The harvesting cycle time is currently slow (approximately 37 seconds), and reducing this cycle time would be a high priority for future work. One of the most time-consuming processes is the detachment step, as the end-effector is moved at a slow velocity to ensure the peduncle is fully severed. This can be addressed by improving the cutting rate of the oscillating tool, such as by increasing its power or investigating different blade shapes (which could also improve the robustness of the cutting action under position uncertainty). Improvements to the cutting rate would enable the end-effector velocity to be increased during the cutting action.

A common detachment failure case was either the blade partially cutting the peduncle or the peduncle moving out of the way during the cutting action. Improving the deep learning system by increasing the training data is one possible method to improve detachment reliability. Adding visual servoing methods could also increase the robustness to changes in the environment, such as when the peduncle shifts during the cutting action. Another challenge remaining with the methods presented in this paper is improving reliable attachment and detachment for the unmodified scenarios, which include challenging visual and physical obstructions from leaves and the surrounding plant. One potential way to improve the system would be to use active perception (moving to see) methods which decide how to move the camera to improve the view of a crop that may be highly occluded by leaves.

Protected cropping systems in tropical climates such as Northern Australia can differ from other international greenhouse systems with respect to trellising and potting methods. However, the underlying plant structure, such as leaves, stems and sweet peppers, is very similar, including its physical and visual appearance. Under these assumptions we believe the work demonstrated could easily be applied to protected cropping systems in sub-tropical climates which feature vertical trellising and pipe rail heating systems.

The novel contributions of this work have resulted in considerable and extremely encouraging improvements in sweet pepper picking success rates compared with the state of the art. We believe that continuing to build on the system presented in this paper will result in further meaningful progress towards making an impact in the horticulture industry through a commercially viable system. The methods presented in this paper may also be applied to a range of other high-value horticultural crops, and provide steps towards the ultimate goal of fully autonomous and reliable crop management systems to reduce labour costs, maximise the quality of produce, and ultimately improve the sustainability of farming enterprises.

Acknowledgments
This work was supported by the Strategic Investment in Farm Robotics (SIFR) program—an initiative co-funded by QUT and the Department of Agriculture and Fisheries of the Queensland Government, Queensland, Australia. Contributions to the work have also been supported by QUT's Institute for Future Environments.

We would like to thank Dr Elio Jovicich and Heidi Wiggenhauser of Queensland DAF for their help, support, and feedback towards organising and conducting the field trial of the robotic harvesting platform.
References
ABARE (2014). Australian vegetable growing farms: An economic survey, 2012-13 and 2013-14. Research report, Australian Bureau of Agricultural and Resource Economics (ABARE).

Bac, C. W., Hemming, J., van Tuijl, B. A., Barth, R., Wais, E., and van Henten, E. J. (2017). Performance Evaluation of a Harvesting Robot for Sweet Pepper. Journal of Field Robotics, 34(6):1123–1139.

Bac, C. W., van Henten, E. J., Hemming, J., and Edan, Y. (2014). Harvesting Robots for High-value Crops: State-of-the-art Review and Challenges Ahead. Journal of Field Robotics, 31(6):888–911.

Bachche, S. and Oka, K. (2013). Performance Testing of Thermal Cutting Systems for Sweet Pepper Harvesting Robot in Greenhouse Horticulture. Journal of System Design and Dynamics, 7(1):36–51.

Baeten, J., Donné, K., Boedrij, S., Beckers, W., and Claesen, E. (2008). Autonomous fruit picking machine: A robotic apple harvester. In Field and Service Robotics, pages 531–539. Springer.

Barth, R., Hemming, J., and van Henten, E. J. (2016). Design of an eye-in-hand sensing and servo control framework for harvesting robotics in dense vegetation. Biosystems Engineering.

Barth, R., IJsselmuiden, J., Hemming, J., and Henten, E. V. (2017). Synthetic bootstrapping of convolutional neural networks for semantic plant part segmentation. Computers and Electronics in Agriculture.

Baur, J., Schütz, C., and Pfaff, J. (2014). Path Planning for a Fruit Picking Manipulator. In International Conference of Agricultural Engineering.

Beeson, P. and Ames, B. (2015). TRAC-IK: An open-source library for improved solving of generic inverse kinematics. In Humanoid Robots (Humanoids), 2015 IEEE-RAS 15th International Conference on, pages 928–935. IEEE.

Bhattacharjee, T., Grice, P. M., Kapusta, A., Killpack, M. D., Park, D., and Kemp, C. C. (2014). A robotic system for reaching in dense clutter that integrates model predictive control, learning, haptic mapping, and planning. In IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS 2014) - 3rd Workshop on Robots in Clutter: Perception and Interaction in Clutter, pages 14–18.

Blanes, C., Mellado, M., Ortiz, C., and Valera, A. (2011). Review. Technologies for robot grippers in pick and place operations for fresh fruits and vegetables. Spanish Journal of Agricultural Research, 9(4):1130–1141.

Blasco, J., Aleixos, N., and Moltó, E. (2003). Machine vision system for automatic quality grading of fruit. Biosystems Engineering, 85(4):415–423.

Bohren, J. and Cousins, S. (2010). The SMACH high-level executive [ROS news]. IEEE Robotics & Automation Magazine, 17(4):18–20.

Bontsema, J., Hemming, J., Saeys, E. P. W., Edan, Y., Shapiro, A., Hočevar, M., Hellström, T., Oberti, R., Armada, M., Ulbrich, H., Baur, J., Debilde, B., Best, S., Evain, S., Münzenmaier, A., and Ringdahl, O. (2014). CROPS: high tech agricultural robots. In International Conference of Agricultural Engineering, pages 6–10.

Bulanon, D. and Kataoka, T. (2010a). Fruit detection system and an end effector for robotic harvesting of Fuji apples. Agricultural Engineering International: CIGR, 12(1):203–210.

Bulanon, D. M. and Kataoka, T. (2010b). A Fruit Detection System and an End Effector for Robotic Harvesting of Fuji Apples. XII:1–14.

Cubero, S., Diago, M. P., Blasco, J., Tardáguila, J., Millán, B., and Aleixos, N. (2014). A new method for pedicel/peduncle detection and size assessment of grapevine berries and other fruits by image analysis. Biosystems Engineering, 117:62–72.

De-An, Z., Jidong, L., Wei, J., Ying, Z., and Yu, C. (2011). Design and control of an apple harvesting robot. Biosystems Engineering.

Gongal, A., Amatya, S., Karkee, M., Zhang, Q., and Lewis, K. (2015). Sensors and systems for fruit detection and localization: A review. Computers and Electronics in Agriculture, 116:8–19.

Gongal, A., Silwal, A., Amatya, S., Karkee, M., Zhang, Q., and Lewis, K. (2016). Apple crop-load estimation with over-the-row machine vision system. Computers and Electronics in Agriculture, 120:26–35.

Han, K.-S., Kim, S.-C., Lee, Y.-B., Kim, S.-C., Im, D.-H., Choi, H.-K., and Hwang, H. (2012). Strawberry harvesting robot for bench-type cultivation. Journal of Biosystems Engineering, 37(1):65–74.

Hannan, M. and Burks, T. (2004). Current developments in automated citrus harvesting.

Hayashi, S., Shigematsu, K., Yamamoto, S., Kobayashi, K., Kohno, Y., Kamata, J., and Kurita, M. (2010). Evaluation of a strawberry-harvesting robot in a field test. Biosystems Engineering, 105(2):160–171.

Hemming, J., Bac, C., van Tuijl, B., Barth, R., Bontsema, J., Pekkeriet, E., and van Henten, E. (2014). A robot for harvesting sweet-pepper in greenhouses. In Proceedings of the International Conference of Agricultural Engineering.

Hemming, J., van Tuijl, B., Bac, W., and Barth, R. (2013). Test report of modules of harvester, including suggestions for revision and improvement. 33.

Hung, C., Nieto, J., Taylor, Z., Underwood, J., and Sukkarieh, S. (2013). Orchard fruit segmentation using multi-spectral feature learning. In IEEE/RSJ International Conference on Intelligent Robots and Systems, pages 5314–5320.

Ilievski, F., Mazzeo, A. D., Shepherd, R. F., Chen, X., and Whitesides, G. M. (2011). Soft robotics for chemists. Angewandte Chemie International Edition, 50(8):1890–1895.

Inc., S. R. (2016). Soft robotics enabling new markets in automation.

Jain, A., Killpack, M. D., Edsinger, A., and Kemp, C. C. (2013). Reaching in clutter with whole-arm tactile sensing. The International Journal of Robotics Research, page 0278364912471865.

Karaman, S. and Frazzoli, E. (2011). Sampling-based algorithms for optimal motion planning. The International Journal of Robotics Research, 30(7):846–894.

Killpack, M., Kapusta, A., and Kemp, C. (2015). Model predictive control for fast reaching in clutter. Autonomous Robots.

Kitamura, S. and Oka, K. (2005). Recognition and cutting system of sweet pepper for picking robot in greenhouse horticulture. In Mechatronics and Automation, 2005 IEEE International Conference, volume 4, pages 1807–1812. IEEE.

Kitamura, S., Oka, K., Ikutomo, K., Kimura, Y., and Taniguchi, Y. (2008). A distinction method for fruit of sweet pepper using reflection of LED light. 1:491–494.

Kondo, N., Monta, M., and Noguchi, N., editors (2011). Agricultural Robots: Mechanisms and Practice. Trans Pacific Press.

Kondo, N., Nishitsuji, Y., Ling, P. P., and Ting, K. C. (1996). Visual feedback guided robotic cherry tomato harvesting. Transactions of the ASAE, 39(6):2331–2338.

Kondo, N., Yata, K., Iida, M., Shiigi, T., Monta, M., Kurita, M., and Omori, H. (2010). Development of an end-effector for a tomato cluster harvesting robot. Engineering in Agriculture, Environment and Food, 3(1):20–24.

Lehnert, C., Perez, T., and McCool, C. (2015). Optimisation-based Design of a Manipulator for Harvesting Capsicum.

Lehnert, C., Sa, I., McCool, C., Upcroft, B., and Perez, T. (2016). Sweet Pepper Pose Detection and Grasping for Automated Crop Harvesting. In International Conference on Robotics and Automation.

Lehnert, C. F., English, A., McCool, C., Tow, A. W., and Perez, T. (2017). Autonomous sweet pepper harvesting for protected cropping systems. IEEE Robotics and Automation Letters, 2(2):872–879.

Ling, P. P., Ehsani, R., Ting, K., Chi, Y.-T., Ramalingam, N., Klingman, M. H., and Draper, C. (2004). Sensing and end-effector for a robotic tomato harvester. American Society of Agricultural and Biological Engineers, page 1.

McCool, C., Perez, T., and Upcroft, B. (2017). Mixture of lightweight deep convolutional neural networks: applied to agricultural robotics. IEEE Robotics and Automation Letters.

McCool, C., Sa, I., Dayoub, F., Lehnert, C., Perez, T., and Upcroft, B. (2016). Visual Detection of Occluded Crop: for automated harvesting. In The International Conference on Robotics and Automation.

Mehta, S. and Burks, T. (2014). Vision-based control of robotic manipulator for citrus harvesting. Computers and Electronics in Agriculture, 102:146–158.

Milioto, A. and Stachniss, C. (2018). Bonnet: An Open-Source Training and Deployment Framework for Semantic Segmentation in Robotics using CNNs. ArXiv e-prints.

Monkman, G., Hesse, S., Steinmann, R., and Schunk, H. (2007). Robot Grippers.

Nguyen, T. T., Vandevoorde, K., Kayacan, E., De Baerdemaeker, J., and Saeys, W. (2014). Apple detection algorithm for robotic harvesting using an RGB-D camera. In International Conference of Agricultural Engineering, Zurich, Switzerland.

Nuske, S., Wilshusen, K., Achar, S., Yoder, L., Narasimhan, S., and Singh, S. (2014). Automated visual yield estimation in vineyards. Journal of Field Robotics, 31(5):837–860.

Nuske, S. T., Achar, S., Bates, T., Narasimhan, S. G., and Singh, S. (2011). Yield estimation in vineyards by visual grape detection. In Proceedings of the 2011 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS '11).

Rahnemoonfar, M. and Sheppard, C. (2016). Deep count: Fruit counting based on deep simulated learning. Sensors.

Ruiz, L. A., Moltó, E., Juste, F., Plá, F., and Valiente, R. (1996). Location and characterization of the stem–calyx area on oranges by computer vision. Journal of Agricultural Engineering Research, 64(3):165–172.

Rusu, R. B., Blodow, N., and Beetz, M. (2009). Fast Point Feature Histograms (FPFH) for 3D registration. In IEEE International Conference on Robotics and Automation, pages 3212–3217.

Sa, I., Ge, Z., Dayoub, F., Upcroft, B., Perez, T., and McCool, C. (2016). DeepFruits: A fruit detection system using deep neural networks. Sensors.

Sa, I., Lehnert, C., English, A., McCool, C., Dayoub, F., Upcroft, B., and Perez, T. (2017). Peduncle detection of sweet pepper for autonomous crop harvesting—Combined Color and 3-D Information. IEEE Robotics and Automation Letters, 2(2):765–772.

Scarfe, A. J., Flemmer, R. C., Bakker, H., and Flemmer, C. L. (2009). Development of an autonomous kiwifruit picking robot. In Autonomous Robots and Agents, 2009. ICARA 2009. 4th International Conference on, pages 380–384. IEEE.

Schuetz, C., Baur, J., Pfaff, J., Buschmann, T., and Ulbrich, H. (2015). Evaluation of a direct optimization method for trajectory planning of a 9-dof redundant fruit-picking manipulator. In Robotics and Automation (ICRA), 2015 IEEE International Conference on, pages 2660–2666. IEEE.

Shamshiri, R. R., Weltzien, C., Hameed, I. A., Yule, I. J., Grift, T. E., Balasundram, S. K., Pitonakova, L., Ahmad, D., and Chowdhary, G. (2018). Research and development in agricultural robotics: A perspective of digital farming. International Journal of Agricultural and Biological Engineering, 11(4):1–14.

Sucan, I. A. and Chitta, S. (2016). MoveIt!

Şucan, I. A., Moll, M., and Kavraki, L. E. (2012). The Open Motion Planning Library. IEEE Robotics & Automation Magazine, 19(4):72–82. http://ompl.kavrakilab.org

van Henten, E. and Hemming, J. (2002). An autonomous robot for harvesting cucumbers in greenhouses. Autonomous Robots, pages 241–258.

Van Henten, E., Van Tuijl, B., Hemming, J., Kornet, J., Bontsema, J., and Van Os, E. (2003). Field Test of an Autonomous Cucumber Picking Robot. Biosystems Engineering, 86(3):305–313.

Van Henten, E. J., Van't Slot, D. A., Hol, C. W. J., and Van Willigenburg, L. G. (2009). Optimal manipulator design for a cucumber harvesting robot. Computers and Electronics in Agriculture, 65(2):247–257.

Wang, Q., Nuske, S. T., Bergerman, M., and Singh, S. (2012). Automated crop yield estimation for apple orchards. Number CMU-RI-TR-.

Yamamoto, K., Guo, W., Yoshioka, Y., and Ninomiya, S. (2014). On Plant Detection of Intact Tomato Fruits Using Image Analysis and Machine Learning Methods.