Joint Multi-Leaf Segmentation, Alignment and Tracking from Fluorescence Plant Videos
Xi Yin, Xiaoming Liu, Jin Chen, David M. Kramer
Abstract —This paper proposes a novel framework for fluorescence plant video processing. The plant research community is interested in leaf-level photosynthetic analysis within a plant. A prerequisite for such analysis is to segment all leaves, estimate their structures, and track them over time. We identify this as a joint multi-leaf segmentation, alignment, and tracking problem. First, leaf segmentation and alignment are applied on the last frame of a plant video to find a number of well-aligned leaf candidates. Second, leaf tracking is applied on the remaining frames with leaf candidate transformation from the previous frame. We form two optimization problems with shared terms in their objective functions for leaf alignment and tracking respectively. A quantitative evaluation framework is formulated to evaluate the performance of our algorithm with four metrics. Two models are learned to predict the alignment accuracy and detect tracking failure respectively, in order to provide guidance for subsequent plant biology analysis. The limitation of our algorithm is also studied. Experimental results show the effectiveness, efficiency, and robustness of the proposed method.
Index Terms —plant phenotyping, Arabidopsis, leaf segmentation, alignment, tracking, multi-object, Chamfer matching
1 INTRODUCTION

Plant phenotyping [1] refers to a set of methodologies and protocols used to measure plant growth [2], architecture [3], composition [4], etc. In contrast to manual observation-based methods, automatic image-based approaches for plant phenotyping have gained more attention recently [5], [6]. As shown in Figure 1, plant researchers conduct large-scale experiments in a chamber with controlled temperature and lighting conditions. Ceiling-mounted fluorescence cameras capture images of a plant during its growth period [7]. The pixel intensities of the image indicate the photosynthetic efficiency (PE) of the plant. Given such a high-throughput imaging system, the massive data calls for advanced visual analysis in order to study a wide range of plant physiological problems [8], e.g., the heterogeneity of PE among the leaves. Therefore, leaf-level visual analysis is fundamental to automatic plant phenotyping.

This paper focuses on the processing of rosette plants, where the leaves are at a similar height and form a circular arrangement. Our experiments are mainly conducted on Arabidopsis thaliana, which is the first plant to have its genome sequenced [9]. Due to its rapid life cycle, prolific seed production, and ease of cultivation in restricted space, Arabidopsis is the most popular and important model plant [10] in the plant research community. An automatic image analysis method for Arabidopsis, which is the main focus of this paper, is of essential importance for high-throughput plant phenotyping studies. Given a fluorescence plant video, our method performs multi-leaf Segmentation, Alignment, and Tracking (SAT) jointly.

Fig. 1. Given a fluorescence plant video captured during its growth period, our algorithm performs multi-leaf SAT jointly, i.e., estimating unique and consistent-over-time labels for all leaves and their individual leaf structure like leaf tips.

Specifically, leaf segmentation [11] segments each leaf from the plant. Leaf alignment [12] estimates the leaf structure. Leaf tracking [13] associates the leaves over time. This multi-leaf analysis is a challenging problem due to several factors. First, the image resolution is low, such that the small leaves are hard to recognize even for humans. Second, there are various degrees of overlap among leaves, which make it difficult to segment each leaf boundary. Third, leaves within a plant exhibit various shapes, sizes, and orientations, which also change over time. Therefore, an effective algorithm should be developed to handle all these challenges.

To the best of our knowledge, there is no previous work focusing on leaf SAT simultaneously from plant videos. To solve this new problem, we develop two optimization algorithms. Specifically, leaf segmentation and alignment are based on Chamfer Matching (CM) [14], which is a well-known algorithm to align one object in an image with a given template. However, classical CM does not work well for aligning multiple overlapping leaves. Motivated by crowd segmentation [15], where the number and locations of the pedestrians are estimated simultaneously, we propose a novel framework to jointly align multiple leaves in an image. First, we generate a large set of leaf templates with various shapes, sizes, and orientations. Applying all templates to the edge map of a plant image leads to the same amount of transformed leaf templates. We adopt the local search method for optimization to select a subset of leaf candidates that can best explain the edge map of the test image.

While leaf segmentation and alignment work well for one image, applying them to every video frame independently does not enable tracking, i.e., associating aligned leaves over time.
Therefore, we formulate leaf tracking on one frame as a problem of transforming multiple aligned leaf candidates from the previous frame. The tracking optimization, initialized with results of the previous frame, can converge very fast and thus results in enhanced leaf association and computational efficiency.

In order to estimate the alignment and tracking accuracy, two quality prediction models are learned respectively. We develop a quantitative analysis with four metrics to evaluate the multi-leaf SAT performance. Furthermore, the limitation of our algorithm is studied. In summary, we make four contributions:

• We identify a new computer vision problem of joint multi-leaf SAT from plant videos. We collect a dataset of Arabidopsis and make it publicly available.
• We propose two optimization algorithms to solve this multi-leaf SAT problem.
• We develop two quality prediction models to predict the alignment accuracy and tracking failure.
• We set up a quantitative evaluation framework to jointly evaluate the performance.

Compared to our earlier work [12], [16], we have made five main changes: One term is modified in the tracking objective function, and the proposed method is shown to be superior to [12], [16] on a larger dataset. We develop two quality prediction models. We enhance the performance evaluation procedure and add one metric to evaluate segmentation accuracy. We study the limitation of our tracking algorithm and show its robustness to leaf template transformation. We extend our method to apply to RGB images [5] and compare the segmentation results to [17].
2 PRIOR WORK
Plant image analysis has been studied in computer graphics [18]–[20] and computer vision [11], [21]. For example, a leaf shape and appearance model is proposed to render photo-realistic images of a plant [18]. A data-driven leaf synthesis approach is developed to produce realistic reconstructions of dense foliage [19]. These models may not be applicable to fluorescence images due to the lack of leaf appearance information. There is prior computer vision work on tasks such as leaf segmentation [11], [21], alignment [12], [22], tracking [13], [16], and identification [23]–[25]. However, most previous studies focus on only one or two of these tasks. In contrast, our method addresses all three tasks of leaf SAT.
Leaf Segmentation can be classified into two categories: segmentation of a detached leaf from a natural [22], [26]–[29] or clean background [24], and pixel-wise segmentation of each leaf from a plant [17], [25]. Methods in the first category are usually used as the first step for leaf classification or species identification. [24] uses pixel-based color classification for leaf segmentation from a white background. [25] proposes an active contour deformation method for compound leaf segmentation and identification.

Our work belongs to the second category, which is very challenging due to leaf variation and overlapping. Tsaftaris et al. organized a collation study of leaf segmentation on rosette plants in 2015 [30], [31], where our method is evaluated against three other methods. Two of them are based on superpixels and watershed transformation segmentation. [17] uses distance map-based leaf center detection and leaf split point detection for leaf segmentation.

Leaf Alignment aims to find the structure of a leaf, which is useful for leaf segmentation. [26] deforms a polygonal model for leaf shape fitting, where the base and tip points are used to define a leaf template. The same points are used in [22] to model leaf shapes and deform templates. Similarly, we use these two points on our leaf templates for alignment. Our novelty lies in solving leaf segmentation and alignment jointly by extending CM to align multiple potentially overlapping leaves in an image.
Leaf Tracking models leaf transformation over time. A probabilistic parametric active contour model is applied for leaf segmentation and tracking to automatically measure the temperature of leaves in [13]. However, the leaves in those images are well separated without any overlap, and the active contours are initialized via the ground truth segments, which is hard to achieve in real-world applications. [32] segments all leaves in a video separately and employs a merging procedure to group the segments by exploiting the angle properties of the leaves. [33] proposes a graph-based tracking algorithm by linking leaf detections across neighboring frames. All of them treat tracking as post-processing after leaf segmentation on individual frames. In contrast, we employ leaf template transformation to transfer the segmentation and alignment results between consecutive frames.

3 OUR METHOD
Figure 2 shows our framework. Given a plant video, we first apply leaf segmentation and alignment on the last frame to generate a number of well-aligned leaf candidates. Leaf tracking is considered as an alignment problem with the leaf candidates initialized from a previous frame. During tracking, a leaf candidate whose size is smaller than a threshold is deleted. A new candidate is detected and added for tracking when there is a certain region of the image mask that is not covered by the existing leaf candidates. Two prediction models are learned to investigate the alignment and tracking quality respectively. All notations are summarized in Table 1.

Fig. 2. Overview of the proposed joint multi-leaf SAT. Given a plant video with t frames, the proposed method outputs the SAT results on each frame, and two prediction curves on the quality of alignment and tracking for each leaf.

TABLE 1
Notations.

V, m: the edge map and mask of a test image
U, M: the edge map and mask of a leaf template
Ũ, M̃: the edge map and mask of a transformed leaf template
DT: the distance transform image of V
H, S, R: the numbers of template shapes, sizes, and orientations
N: the total number of leaf templates, N = HSR
N̄: the number of transformed leaf templates for optimization
J, G: the objective functions for alignment and tracking
a: the diagonal length of V
(c_x, c_y): the center of a plant image
(c_xn, c_yn): the center of the n-th leaf candidate
L, L̄: the sets of N and N̄ transformed templates
A: a N̄ × K matrix collecting all M̃_n from L̄
K: the number of pixels in the test image
x: a N̄-dim 0-1 indicator vector
d: a N̄-dim vector of CM distances in L̄
C: a constant value used in J
D: the maximum number of iterations in tracking
N_e, N_l: the numbers of estimated and labeled leaves in a frame
M: the collection of N_e selected leaf candidates
P: a set of transformation parameters P = {p_n}, n = 1, ..., N_e
p_n = [θ, r, t_x, t_y]^T: the transformation parameters for U_n
t̂, t: the estimated and labeled tips for one leaf
T̂, T: the estimated and labeled tips for one frame
T̂, T (bold): the collections of estimated and labeled tips for all videos
B̂, B: the collections of estimated and ground-truth segmentation masks for all videos
N_b: the total number of labeled leaves
e_la: the tip-based error normalized by the leaf length
ID: a N_e × N_l matrix of leaf correspondence
ER: a N_e × N_l matrix of tip-based errors in one frame
ER (bold): the collection of all ER for labeled frames
f: the number of leaves without correspondence
e1, e2: the tip-based errors used in Algorithm 2
τ: a threshold for comparing with tip-based errors
F, E, T: the performance metrics
Q_a, Q_t: the predicted qualities of alignment and tracking
x_a, x_t: the features to learn the quality prediction models
λ1, λ2, µ1, µ2: the weights used in J and G
α1, α2: the step sizes in the gradient descent of J and G
s: the smallest leaf size we use

Our segmentation and alignment algorithm consists of two steps. First, a pre-defined set of leaf templates is applied to the edge map of a test image to generate an over-complete set of transformed leaf templates. Second, we formulate an optimization process to select an optimal subset of leaf candidates.
Chamfer Matching (CM) [14] is a well-known method used to find the best alignment between two edge maps. Let V = {v_i} and U = {u_i} be the edge maps of a test image and a template respectively. The CM distance is computed as the average distance of each edge point in U to its nearest edge point in V:

$$d(U, V) = \frac{1}{|U|} \sum_{u_i \in U} \min_{v_j \in V} \| u_i - v_j \|, \quad (1)$$

where |U| is the number of edge points in U. The CM distance can be computed efficiently via a pre-computed distance transform image $DT(g) = \min_{v_j \in V} \| g - v_j \|$, which stores the distance of each coordinate g to its nearest edge point in V. During the CM process, an edge template U is superimposed on DT, and the average value sampled at the template edge points u_i equals the CM distance, i.e., $d(U, V) = \frac{1}{|U|} \sum_{u_i \in U} DT(u_i)$.

Given a fluorescence plant image, it is first transformed to a binary mask m by applying a threshold. The Sobel edge detector is applied to m to generate an edge map V. The goal of leaf alignment is to transform the 2D edge coordinates of a template U in the leaf template space to a new set of 2D coordinates Ũ in the test image space so that the CM distance is small, i.e., the leaf template is well aligned with V.

Image Warping:
In our framework, there are two types of transformation involved: forward and backward warping. We use an affine transformation that consists of scaling, rotation, and translation. As shown in Figure 3, let W: U ↦ Ũ be a forward warping function that transfers the 2D edge points from the template space to the test image space, parameterized by p = [θ, r, t_x, t_y]^T:

$$\tilde{U} = W(U; \mathbf{p}) = r \begin{bmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{bmatrix} (U - \bar{U}) + \begin{bmatrix} t_x \\ t_y \end{bmatrix} + \bar{U}, \quad (2)$$

where θ is the in-plane rotation angle, r is the scaling factor, and t_x and t_y are the translations along the x and y axes respectively. Ū is the center of the leaf, i.e., the average of all coordinates of U, which is used to model the leaf scaling and rotation w.r.t. the leaf center.

Fig. 3. Forward and backward warping.

Let W⁻¹: Ũ ↦ U be the backward warping from the image space to the template space. We denote X as a K × 2 matrix including all coordinates in the test image space. Thus, W⁻¹(X; p) are the corresponding coordinates of X in the template space. The purpose of this backward warping is to generate a K-dim vector M̃ = M(W⁻¹(X; p)), which is the warped version of the original template mask M.

Leaf Templates:
Since there is a large variation in leaf shapes, it is infeasible to match leaves with one template. We manually select H basic templates (the 1st row in Figure 4) with representative leaf shapes from the plant videos and compute their individual edge maps U and masks M. We synthesize an over-complete set of transformed templates by selecting a discrete set of θ and r, which are expected to cover all potential leaf configurations in V. This leads to an array of N = HSR leaf templates, where S and R are the numbers of leaf scales and orientations respectively (Figure 4). The yellow and green points in Figure 4 are the two labeled leaf tips t, which are used to find the corresponding leaf tips t̂ in V via Equation 1. Each template U_n scans through all possible locations on V, and the location with the minimum CM distance is selected, which provides the t_x and t_y optimal to U_n. Therefore, with the manually selected θ and r, and the exhaustively chosen t_x and t_y, N transformed templates are generated from the H basic templates. For each transformed template, we record the 2D edge coordinates of its basic template, the warped template mask, the transformation parameters, the CM distance, and the estimated leaf tips as L = {U_n, M̃_n, p_n, d_n, t̂_n}, n = 1, ..., N. Note that L is an over-complete set of transformed leaf templates that includes the true leaf candidates as its subset. Hence, the critical question is how to select such a subset of candidates from L.

Fig. 4. Leaf template scaling and rotation from basic template shapes (rows: shapes, sizes, orientations). The tip labels are shown in yellow and green.
The goal of leaf segmentation and alignment is to segment each leaf and estimate its structure precisely. If the leaf candidates are well selected, there should be no redundant or missing leaves, and each leaf candidate should be well aligned with the edge map of the test image. This rationale leads to a three-term objective function, which seeks the minimal number of leaf candidates (J1) with small CM distances (J2) that best cover the test image mask (J3).

Each image contains a relatively small number of leaves, while the number of potential candidates in L is far larger in our case. The selection space needs to be narrowed down substantially. To do this, we compute the CM distance and the overlap ratio of each template to the test image mask. We remove leaf templates whose CM distance is larger than the average of all templates or whose overlap ratio is smaller than a threshold. Finally, we generate a new set L̄ with N̄ (a few hundred) templates. RANSAC [34] is not applicable here for two reasons. First, it is difficult to define a model or evaluation criterion for a random subset of leaf templates. Second, we have more outliers than inliers, which makes it hard to select the correct set in consensus.

The objective function is defined on a N̄-dim indicator vector x, where x_n = 1 means that the n-th transformed template is selected and x_n = 0 otherwise. Hence x uniquely specifies a combination of transformed templates from L̄. The first term is the number of selected leaf candidates, $J_1 = \|\mathbf{x}\|_1$. We concatenate the d_n from L̄ to form a N̄-dim vector d. The second term, i.e., the average CM distance of the selected leaf candidates, is formulated as:

$$J_2 = \frac{\mathbf{d}^\top \mathbf{x}}{\|\mathbf{x}\|_1}. \quad (3)$$

The third term compares the synthesized mask with the test image mask. As shown in Figure 5, we convert the binary mask to a K-dim row vector m by raster scan.
Similarly, each warped template mask M̃_n is also a K-dim vector. The collection of M̃_n from all transformed templates is denoted as a N̄ × K matrix A.

Fig. 5. The process of generating m (1 × K) and A (N̄ × K) from the test image mask and the N̄ transformed templates.

Note that x^⊤A is indicative of the synthesized mask, except that the pixel values of overlapping leaves are larger than 1. We employ the arctan(·) function, similar to [35], [36], to convert all elements of x^⊤A to be in the range of [0, 1]:

$$f(\mathbf{x}) = \frac{1}{\pi} \arctan\left( C \left( \mathbf{x}^\top \mathbf{A} - \frac{1}{2} \right) \right) + \frac{1}{2}, \quad (4)$$

where C is a constant controlling how closely arctan(·) approximates the step function. Note that the actual step function cannot be used here since it is not differentiable and thus is difficult to optimize. The constant 1/2 within the parentheses is a flip point separating where the values of x^⊤A will be pushed toward either 0 or 1. Therefore, the third term becomes:

$$J_3 = \frac{1}{K} \left\| f(\mathbf{x}) - \mathbf{m} \right\|^2. \quad (5)$$

Finally, our objective function is:

$$J(\mathbf{x}) = J_1 + \lambda_1 J_2 + \lambda_2 J_3, \quad (6)$$

where λ1 and λ2 are the weights. These three terms jointly provide guidance on what constitutes an optimal combination of leaf candidates.

Equation 6 is a pseudo-Boolean function. The basic algorithm [37] is not applicable because our objective cannot be written in the required polynomial form. We adopt the widely used local search method to optimize Equation 6. The local search algorithm [38] for pseudo-Boolean functions iteratively searches a small neighborhood of x and updates x to the neighbor that leads to a smaller function value.

First, all elements in x are initialized as 1, i.e., all transformed templates are selected. We fix one element of x at each iteration by searching the neighborhood of x with the n-th element being 0 or 1, denoted as x_{x_n=0} and x_{x_n=1}. According to the proposition in [38], a positive gradient indicates 0 in the corresponding element of the local optimal solution. Therefore, in each iteration, we select the element with the maximum gradient to remove redundant leaf templates. The gradient of the objective w.r.t. x is:

$$\frac{dJ}{d\mathbf{x}} = \mathrm{sign}(\mathbf{x}) + \lambda_1 \left( \frac{\mathbf{d}}{\|\mathbf{x}\|_1} - \frac{\mathbf{d}^\top \mathbf{x}}{\|\mathbf{x}\|_1^2} \, \mathrm{sign}(\mathbf{x}) \right) - \lambda_2 \frac{C}{\pi K} \, \mathbf{A} \left[ (f(\mathbf{x}) - \mathbf{m}) \oslash \left( 1 + \left( C \left( \mathbf{x}^\top \mathbf{A} - \frac{1}{2} \right) \right)^2 \right) \right]^\top, \quad (7)$$

where sign(x) is a function returning the sign of each element, and ⊘ is the element-wise division of vectors. In each iteration, x is updated by x = x − α1 dJ/dx. The element x_n with the largest gradient is chosen and fixed to be 0 or 1 based on the smaller value of J(x_{x_n=0}) and J(x_{x_n=1}). Once this element is fixed, its value remains unchanged in future iterations. The total number of iterations is the number of transformed leaf templates N̄. Finally, those elements in x equal to 1 provide the combination of leaf candidates.

This joint leaf segmentation and alignment is applied on the last frame of a plant video to generate N_e leaf candidates that are used for tracking in the remaining video frames. We denote the set of leaf candidates selected from L̄ as M = {U_n, M̃_n, p_n, d_n, t̂_n}, n = 1, ..., N_e, which means the basic leaf template U_n is transformed by p_n to result in a leaf candidate that is well aligned with the edge map.

Leaf tracking aims to assign the same leaf ID to the same leaf through an entire video. One way to track all leaves over time is to apply the leaf segmentation and alignment framework on every frame of the video and then build leaf correspondence between consecutive frames. However, leaf tracking consistency is an issue due to the potentially different leaf segmentation results on different frames. Therefore, we form an optimization problem for leaf tracking based on template transformation.
Similar to Equation 6, we formulate a three-term objective function parameterized by a set of transformation parameters P = {p_n}, n = 1, ..., N_e, where p_n is the transformation parameter vector for leaf candidate U_n.

First, P is updated so that the transformed leaf candidates are well aligned with the edge map V. The first term is computed as the average CM distance of the transformed leaf candidates:

$$G_1 = \frac{1}{N_e} \sum_{n=1}^{N_e} d\left( W(U_n; \mathbf{p}_n), V \right). \quad (8)$$

The second term encourages the synthesized mask from all transformed candidates to be similar to the test mask m. Since the synthesized mask of one transformed leaf candidate is M_n(W⁻¹(X; p_n)), we formulate the second term as:

$$G_2 = \frac{1}{K} \left\| \sum_{n=1}^{N_e} M_n\left( W^{-1}(\mathbf{X}; \mathbf{p}_n) \right) - \mathbf{m} \right\|^2. \quad (9)$$

Fig. 6. The angle difference: the long axis of each leaf should point to the plant center.
One property of rosette plants such as Arabidopsis is that the long axes of most leaves point toward the center of the plant. To take advantage of this domain-specific knowledge, the third term encourages the rotation angle θ to be similar to the direction from the leaf center to the plant center. Figure 6 shows the geometric relation of the angle difference, which can be computed as $\frac{c_{xn} - c_x}{s_n} - \sin\theta_n$, where (c_x, c_y) and (c_xn, c_yn) are the geometric centers of the plant and the leaf, i.e., the average coordinates of all points in m and Ũ_n respectively, $s_n = \sqrt{(c_{xn} - c_x)^2 + (c_{yn} - c_y)^2}$ is the distance between the leaf center and the plant center, and θ_n is the rotation angle. Furthermore, since this property is more dominant for leaves far away from the plant center, we weight the above angle difference by s_n and normalize it by the image size. The third term is the average weighted angle difference:

$$G_3 = \frac{1}{N_e a} \sum_{n=1}^{N_e} \left\| (c_{xn} - c_x) - s_n \sin\theta_n \right\|^2. \quad (10)$$

Finally, the objective function is formulated as:

$$G(\mathbf{P}) = G_1 + \mu_1 G_2 + \mu_2 G_3, \quad (11)$$

where µ1 and µ2 are the weights.

Note the differences between the two objective functions J and G. Since the number of leaves is fixed during tracking, J1 is not needed in the formulation of G. The number of leaves is relatively small during tracking; therefore, arctan(·) is not needed since the synthesized mask is already comparable to the test image mask.

Given the objective function in Equation 11, our goal is to minimize it by estimating P, i.e., $\mathbf{P} = \arg\min_{\mathbf{P}} G(\mathbf{P})$. Since G(P) involves texture warping, it is a nonlinear optimization problem without a closed-form solution. We use gradient descent to solve this problem. The derivative of G1 w.r.t. p_n can be written as:

$$\frac{dG_1}{d\mathbf{p}_n} = \frac{1}{N_e |U_n|} \left( \nabla DT_x \frac{\partial W_x}{\partial \mathbf{p}_n} + \nabla DT_y \frac{\partial W_y}{\partial \mathbf{p}_n} \right), \quad (12)$$

where ∇DT_x and ∇DT_y are the gradient images of DT along the x and y axes respectively. These two gradient images only need to be computed once for each frame.
$\frac{\partial W_x}{\partial \mathbf{p}_n}$ and $\frac{\partial W_y}{\partial \mathbf{p}_n}$ can be easily computed from Equation 2 w.r.t. θ, r, t_x, and t_y separately. Similarly, the derivative of G2 w.r.t. p_n is:

$$\frac{dG_2}{d\mathbf{p}_n} = \frac{2}{K} \left[ \sum_{n=1}^{N_e} M_n\left( W^{-1}(\mathbf{X}; \mathbf{p}_n) \right) - \mathbf{m} \right] \cdot \left( \nabla M_{xn} \frac{\partial W^{-1}_x}{\partial \mathbf{p}_n} + \nabla M_{yn} \frac{\partial W^{-1}_y}{\partial \mathbf{p}_n} \right), \quad (13)$$

where ∇M_xn and ∇M_yn are the gradient images of the template mask M_n along the x and y axes respectively. $\frac{\partial W^{-1}_x}{\partial \mathbf{p}_n}$ and $\frac{\partial W^{-1}_y}{\partial \mathbf{p}_n}$ can be computed based on the inverse function of Equation 2.

The derivative of G3 w.r.t. θ is more complex than those w.r.t. the other three transformation parameters. For clarity, we only present the derivative over θ:

$$\frac{dG_3}{d\theta_n} = \frac{2}{N_e a} \left[ (c_{xn} - c_x) - s_n \sin\theta_n \right] \cdot \left[ \frac{1}{|U_n|} \sum \frac{\partial W_x}{\partial \theta_n} - s_n \cos\theta_n - \frac{\sin\theta_n}{s_n |U_n|} \left[ (c_{xn} - c_x) \sum \frac{\partial W_x}{\partial \theta_n} + (c_{yn} - c_y) \sum \frac{\partial W_y}{\partial \theta_n} \right] \right]. \quad (14)$$

During optimization, P is initialized as the transformation parameters of the leaf candidates from the previous frame, and each p_n is updated as $\mathbf{p}_n^t = \mathbf{p}_n^{t-1} - \alpha_2 \frac{dG}{d\mathbf{p}_n}$ at iteration t. Note that this is a multi-leaf joint optimization problem because the computation of dG/dp_n involves all N_e leaf candidates. The optimization stops when G does not decrease or it reaches the maximum number of iterations D.

Given a multi-day plant video, we apply the leaf segmentation and alignment algorithm on the last frame to generate M and employ leaf tracking toward the first frame. Due to plant growth and leaf occlusion, the number of leaves may vary throughout the video. If the area of any leaf candidate at one frame is less than a threshold s (defined as a number of pixels), we remove it from the leaf candidates.

On the other hand, a new leaf candidate can be detected and added to M. To do this, we compute the synthesized mask of all leaf candidates and subtract it from the test image mask m to generate a residue image for each frame.
Connected component analysis is applied to find components that are larger than s. We then apply a subset of the N leaf templates to find a leaf candidate based on the edge map of the residue image. The new candidate is assigned an existing leaf ID if its overlap with a previously disappeared leaf is larger than a threshold; otherwise it is assigned a new leaf ID. The new candidate is added into M and tracked in the remaining frames.

While many algorithms strive for perfect results, it is inevitable that unsatisfactory or failed results are obtained on challenging samples. It is critical for an algorithm to be aware of this situation so that future analysis does not rely on poor results. One approach to achieve this goal is to perform quality prediction for the task, similar to quality estimation for fingerprints [39] and faces [40]. The key tasks in our work are leaf alignment, i.e., estimating the two tips of a leaf, and leaf tracking, i.e., keeping leaf IDs consistent over time. Therefore, we learn two quality prediction models to predict the alignment accuracy and detect tracking failure respectively. The predictions can be used to select a subset of leaves with high quality for subsequent plant biology analysis [41].
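The residue-based detection of new leaves above rests on connected component analysis of a binary mask. A self-contained sketch (pure Python/NumPy, 4-connectivity; a production system could equally use an image-processing library's labeling routine):

```python
import numpy as np
from collections import deque

def large_components(residue, s):
    """Label 4-connected foreground components of the binary residue
    mask and return those with at least s pixels, i.e. the regions of
    the test mask not covered by existing leaf candidates."""
    H, W = residue.shape
    seen = np.zeros_like(residue, dtype=bool)
    comps = []
    for y in range(H):
        for x in range(W):
            if residue[y, x] and not seen[y, x]:
                q, comp = deque([(y, x)]), []   # BFS over one component
                seen[y, x] = True
                while q:
                    cy, cx = q.popleft()
                    comp.append((cy, cx))
                    for ny, nx in ((cy + 1, cx), (cy - 1, cx),
                                   (cy, cx + 1), (cy, cx - 1)):
                        if 0 <= ny < H and 0 <= nx < W \
                                and residue[ny, nx] and not seen[ny, nx]:
                            seen[ny, nx] = True
                            q.append((ny, nx))
                if len(comp) >= s:
                    comps.append(comp)
    return comps
```

Each returned component would then be handed to the template-matching step to fit a new leaf candidate on its edge map.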
Suppose Q_a is the alignment accuracy of a leaf, which indicates how well its two tips are aligned. We consider the factors that may influence the estimation of the two tips. First, the CM distance indicates how well the template and the test image are aligned. Second, a well-aligned leaf candidate should have large overlap with the test image mask and small overlap with the neighboring leaves. Third, the leaf area, angle, and distance to the plant center may influence the alignment result. Therefore, we extract a feature vector x_a including: the CM distance d(W(U_n; p_n), V), the overlap ratio with the test image mask, the overlap ratio with the other leaves, the leaf area normalized by the test image mask area, the angle difference $|\theta_n - \sin^{-1}\frac{c_{xn} - c_x}{s_n}|$, and the distance s_n to the plant center. A linear regression model is learned by optimizing the following objective on N_a training leaves with ground truth Q_a, which is proportional to the alignment error:

$$\boldsymbol{\omega} = \arg\min_{\boldsymbol{\omega}} \sum_{n=1}^{N_a} \left\| Q_a^n - \boldsymbol{\omega} \mathbf{x}_a^n \right\|^2, \quad (15)$$

where ω is a weighting vector used to predict the alignment accuracy of each leaf.

Due to the limitation of our algorithm, it is possible that one leaf might diverge to the location of an adjacent leaf, resulting in tracking inconsistency. We name this a tracking failure. One example is shown in Figure 7, where a labeled leaf has been assigned two different IDs during tracking. The goal of tracking quality prediction is to detect the moment when tracking starts to fail. We denote the tracking quality as Q_t, where Q_t = −1 means a tracking failure of one leaf and Q_t = 1 means tracking success.

Similar to Section 3.3.1, we first extract the feature vector x_a for one leaf. However, x_a alone cannot predict tracking failure because it does not include temporal information.
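The linear model of Equation 15 is an ordinary least-squares fit; in NumPy it reduces to a single solver call. A sketch (the feature matrix `Xa` stacks one x_a per row; function names are ours):

```python
import numpy as np

def fit_alignment_quality(Xa, Qa):
    """Solve w = argmin_w sum_n ||Qa[n] - w . Xa[n]||^2 (Eq. 15)
    via least squares. Xa: (N_a x d) features, Qa: (N_a,) targets."""
    w, *_ = np.linalg.lstsq(Xa, Qa, rcond=None)
    return w

def predict_alignment_quality(w, xa):
    """Predicted alignment quality Q_a for one leaf's feature vector."""
    return xa @ w
```

At test time, `predict_alignment_quality` scores each aligned leaf so that downstream biology analysis can discard low-quality tip estimates.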
To capture temporal information, we compare the features x_a of one frame with those of a reference frame, x̂_a, taken several frames earlier. Since a tracking failure may result in abnormal changes in leaf area, angle, and distance to the plant center, we compute the leaf angle difference, the leaf center distance, and the leaf overlap ratio between the current and the reference frame. Finally, we form a feature vector x_t including: x_a, x_a − x̂_a, the leaf angle difference θ_n − θ̂_n, the leaf center distance $\sqrt{(c_{xn} - \hat{c}_{xn})^2 + (c_{yn} - \hat{c}_{yn})^2}$, and the overlap ratio between the current and reference leaf masks. Given a training set Ω = {(x_t^n, Q_t^n)} with Q_t^n ∈ {−1, 1}, an SVM classifier is learned as the tracking quality model.

4 PERFORMANCE EVALUATION
Leaf segmentation is to segment each leaf from theimage. Leaf alignment is to correctly estimate twotips of each leaf. Leaf tracking is to keep the leaf IDconsistent over the video. In order to quantitativelyevaluate the performance of joint multi-leaf SAT, weneed to provide the ground truth of the pixel-levelleaf segments in each frame, the two tips of each leaf,and the leaf IDs for all leaves in the video.As shown in Figure 7, we label the two tips of eachleaf and manually assign their IDs in several framesof one video. We record the label results in one frameas a N l × matrix T , where N l is the number oflabeled leaves and T ( n, :) = [ t x , t y , t x , t y ] records tipcoordinates of n th leaf in this frame. The collectionof all labeled frames in all videos is denoted as T ,where T = T { i, j } ( i = 1 , , ..., m, j = 1 , , ...n ) , m isthe number of labeled videos and n is the numberof labeled frames in each video. The total number oflabeled leaves in T is N b .During template transformation, the correspondingpoints of the transformed template tips in V becomethe estimated leaf tips [ˆ t x , ˆ t y , ˆ t x , ˆ t y ] . The leaf ID isassigned in the last frame starting from to thetotal number of selected leaves and kept the sameduring tracking. Similar to the data structure of T , thetracking results of all videos over the labeled framesis written as ˆ T . Given ˆ T and T , Algorithm 2 providesour detailed performance evaluation, which is alsoillustrated by a synthetic example in Figure 7.There are two concepts involved: frame-to-frameand video-to-video correspondence. For each esti-mated leaf, we need to find one corresponding leafin the labeled frame. Frame-to-frame correspondenceaims to assign a unique leaf ID to each leaf in theframe so that the IDs are consistent with our labeledIDs. 
As mentioned before, the frame-to-frame correspondence may not be consistent in the whole video: due to tracking failures, more than one leaf ID can be assigned to the same leaf. Video-to-video correspondence aims to assign a consistent and unique leaf ID to the same leaf in the entire video.

We start by building frame-to-frame leaf correspondence, as in Algorithm 1 and the red dotted box in Figure 7. To build the leaf correspondence of N_e estimated leaves with N_l labeled leaves, an N_e × N_l matrix D is computed, which records the tip-based error of each estimated leaf's tips t̂ with every labeled leaf's tips t, normalized by the labeled leaf length:

e_la(t̂, t) = (‖t̂_1 − t_1‖ + ‖t̂_2 − t_2‖) / ‖t_1 − t_2‖.   (16)

Fig. 7. A toy example of executing Step 1 of Algorithm 2 on one video with three frames. In frame 1, we illustrate the process of Algorithm 1. Each table shows the corresponding computation of each leaf from the tracking results (each row) and the label results (each column).

Algorithm 1: Build leaf correspondence [f, ER, ID] = leafMatch(T̂, T).
Input: Estimated leaf tips matrix T̂ (N_e × 4) and labeled leaf tips matrix T (N_l × 4).
Output: f = |N_l − N_e|, ER, and ID.
Initialize D, ER, and ID as N_e × N_l zero matrices.
for i = 1, ..., N_e do
  for j = 1, ..., N_l do
    D(i, j) = e_la(t̂_i, t_j);
for k = 1, ..., min(N_e, N_l) do
  [e_min, i0, j0] = min(D); D(i0, :) = Inf; D(:, j0) = Inf;
  ID(i0, j0) = 1; ER(i0, j0) = e_min.

Algorithm 2: Performance evaluation process.
Input: Tracking results T̂, label results T.
Output: F, E, and T.
Initialize f = 0, e1 = e2 = [].
1. for i = 1, ..., m do
     Initialize accumulators ER_s = cell(N_e, N_l) and ID_s as an N_e × N_l zero matrix;
     for j = 1, ..., n do
       [f_0, ER, ID] = leafMatch(T̂{i, j}, T{i, j});
       f = f + f_0; e1 = [e1, ER(ER ≠ 0)];
       ER_s = ER_s + ER; ID_s = ID_s + ID;
     for k = 1, ..., min(N_e, N_l) do
       [ID_max, i0, j0] = max(ID_s); ID_s(i0, :) = −Inf; ID_s(:, j0) = −Inf;
       e2 = [e2, ER_s{i0, j0}];
2. for τ = 0 : 0.01 : 1 do
     F(τ) = (f + sum(e1 > τ)) / N_b;
     E(τ) = mean(e1(e1 ≤ τ));
     T(τ) = sum(e2 ≤ τ) / N_b.

We build the leaf correspondence by finding min(N_e, N_l) minimum errors in D that do not share rows or columns, which results in min(N_e, N_l) leaf pairs and f = |N_l − N_e| leaves without correspondence. leafMatch outputs the number of unmatched leaves f, ER recording the tip-based errors, and ID recording the leaf correspondence. This frame-to-frame correspondence is built on all frames, and the results are accumulated into ER_s and ID_s. We then build the video-to-video leaf correspondence using the accumulated ID_s. e1 and e2 are the tip-based errors of leaf pairs with frame-to-frame and video-to-video correspondence, respectively. They differ for an estimated leaf that is well aligned with a labeled leaf in an individual frame but has no leaf correspondence over all frames together.

Finally, we compute three metrics by varying a threshold τ. The unmatched leaf rate F is the percentage of unmatched leaves w.r.t. the total number of labeled leaves N_b; it has two sources, the f leaves without correspondence and the corresponding leaves with tip-based errors larger than τ. The landmark error E is the average of all tip-based errors in e1 that are smaller than τ. The tracking consistency T is the percentage of leaf pairs whose tip-based errors in e2 are smaller than τ, w.r.t. N_b. These three metrics jointly estimate the accuracy in leaf counting (F), alignment (E), and tracking (T).

In order to quantitatively evaluate the segmentation accuracy, we annotate each image to generate a leaf segmentation mask where the pixels of the same leaf are assigned the same number over the video. We add the metric "Symmetric Best Dice" (SBD) [5] to compute the similarity between the estimated and the ground truth segmentation masks, averaged across all labeled frames. These four metrics are used to evaluate the performance of our joint framework.

EXPERIMENTS AND RESULTS
Our dataset includes Arabidopsis thaliana videos taken over a multi-day period, which is sufficient to model the plant growth [42]. Each video consists of a sequence of frames whose image resolution varies across videos. For each video, we label the two tips of all visible leaves, together with the segmentation masks of selected frames, each being the middle frame of a day; in total we labeled N_b leaves. We select a subset of videos to form the training set for template generation and parameter tuning; the remaining videos are used for testing. The collections of all labeled tips and segmentation masks are denoted as T and B.

To generate leaf templates, we select leaves with representative shapes and label the two tips for each leaf, as in Figure 4. We select scales for each leaf shape to guarantee that the scaled templates cover all possible leaf sizes in the dataset. For each scaled leaf template, we rotate it every 15° over the 360° space. Finally, the total number of leaf templates is N = SHR = 2,880, with S = 10, H = 12, and R = 24. For each testing video, we apply our approach and compare it with four methods:
Baseline Chamfer Matching, Prior Work ([12], [16]), and Manual Results.

Proposed Method. N templates are applied to the edge map of the last video frame to generate the same number of transformed templates. Leaf segmentation and alignment generate N_e leaf candidates for leaf tracking, which iteratively updates P according to Equation 11 towards the first frame.

Baseline Chamfer Matching.
The basic idea of CM is to align one object in an image. To align multiple leaves in a plant image, we design the baseline CM to iteratively align one leaf at a time. In each iteration, we apply all N templates to the edge map of a test image to generate N transformed leaf templates, the same as our first step. The transformed template with the minimum CM distance is selected and denoted as a leaf candidate. We update the edge map by deleting the matched edge points of the selected leaf candidate. The iteration continues until most of the edge points are deleted. We apply this method to the labeled frames of each video and build the leaf correspondence based on leaf centers.

Multi-leaf Alignment [12]. The optimization in [12] is the same as our proposed leaf alignment on the last frame. We apply [12] on all labeled frames and build the leaf correspondence based on leaf center distances.
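The iterative baseline CM above can be sketched as follows (a hedged sketch of our reading of the baseline, not the paper's code: templates are arrays of (row, col) edge coordinates, and `stop_ratio` / `match_radius` stand in for thresholds the text does not specify):

```python
import numpy as np
from scipy.ndimage import distance_transform_edt

def chamfer_cost(dt, pts):
    """Mean distance-transform value at the template's edge points."""
    return dt[pts[:, 0], pts[:, 1]].mean()

def baseline_cm(edge_map, templates, stop_ratio=0.1, match_radius=2.0):
    """Repeatedly pick the transformed template with minimum Chamfer
    distance and delete the edge points it explains."""
    edges = edge_map.astype(bool).copy()
    total = edges.sum()
    leaves = []
    while edges.sum() > stop_ratio * total:
        dt = distance_transform_edt(~edges)  # distance to the nearest edge pixel
        best = min(templates, key=lambda pts: chamfer_cost(dt, pts))
        if chamfer_cost(dt, best) > match_radius:  # nothing fits: stop early
            break
        leaves.append(best)
        ys, xs = np.nonzero(edges)  # delete edge points matched by the candidate
        d = np.abs(np.stack([ys, xs], 1)[:, None, :] - best[None]).sum(-1).min(1)
        near = d <= match_radius
        if not near.any():
            break
        edges[ys[near], xs[near]] = False
    return leaves
```

Because each iteration re-runs the full template sweep on the shrinking edge map, this baseline is considerably slower than a joint formulation.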
Multi-leaf Tracking [16]. The differences between the proposed method and [16] include the modified G in Equation 11; in addition, [16] does not include the scheme to generate a new leaf candidate during tracking.

1. The dataset, labels, and templates are publicly available at: http://cvlab.cse.msu.edu/project-plant-vision.html.
Manual Results
In order to find the upper bound of our proposed method, we use the ground truth labels T to find the optimal set of T̂. For each labeled leaf, we find the leaf candidate with the smallest tip-based error e_la among the N transformed templates.

For all methods, we record the estimated tip coordinates of all leaf candidates in the labeled frames as T̂. The transformed template masks are used to generate an estimated segmentation mask for each frame; we record the estimated segmentation masks of all labeled frames as B̂. T̂ and T are used to evaluate F, E, and T; B̂ and B are used to evaluate SBD.

Qualitative Results
Figure 8 shows the results on the labeled frames of one video. Since the baseline CM only considers the CM distance to segment each leaf separately, leaf candidates are likely to be aligned around the edge points, which results in large landmark errors. While [16] can keep the leaf IDs consistent, it does not include the scheme to generate a new leaf candidate during tracking (e.g., a leaf in Figure 8). Our proposed method performs substantially better than the others: it has the same segmentation as the labeled results and all leaves are well tracked, with a leaf deleted when it gets too small. Due to the limitation of a finite set of templates, the manual results are not perfect. However, our tracking method allows template transformation under any parameters in P, without being limited to a finite number.

Quantitative Results

We first evaluate the SAT accuracy w.r.t. F, E, and T. We set the threshold τ to vary in [0.05 : 0.01 : 1] and generate the accuracy curves for all methods, as shown in Figure 9. When τ is small, i.e., we have very strict requirements on the accuracy of tip estimation, all methods work well for easy-to-align leaves. With the increase of τ, more and more hard-to-align leaves with relatively large tip-based errors are counted as well-aligned leaves and contribute to the landmark error E and tracking consistency T. Therefore, detecting more leaves results in higher E and T. It is noteworthy that our method achieves lower landmark error and higher tracking consistency while segmenting more leaves. The baseline CM segments fewer leaves with higher landmark error and lower tracking consistency. The manual results are the upper bound of our algorithm. Obviously, F will be 0 and T will be 1 with the increase of τ because we enforce the correspondence of all labeled leaves, but E will not be 0 due to the limitation of a finite template set. Overall, the proposed method performs much better than the baseline CM and our prior work. The improvement over [12] is mainly a higher T, and it improves [16] in all
three metrics. However, there is still a gap between the proposed method and the manual results, which calls for future research.

The SBD-based segmentation accuracy is shown in Table 2. The proposed method is again superior to the baseline algorithm and the prior work.

Fig. 8. Qualitative results: (a) ground truth labels; (b) baseline CM; (c) [16]; (d) proposed method; and (e) manual results. Each column is one frame in the video (day/frame). Yellow/green dots are the estimated outer/inner leaf tips. Red contour is W(U; p). Blue box encloses the edge points matching W(U; p). The number on a leaf is the leaf ID. Best viewed in color.

Fig. 9. Accuracy comparison of F, E, and T vs. τ for all methods based on Algorithm 2.
Efficiency Results

Table 2 shows the average execution time, calculated from a Matlab implementation on a conventional computer. Our method is faster than the baseline CM and [12]. It is a little slower than [16] because of the updated G and the scheme to add leaf candidates during tracking.

Segmentation Accuracy

While no prior work focuses on the joint multi-leaf SAT, leaf segmentation
TABLE 2
SBD and efficiency comparison (sec./image) among Baseline, [12], [16], Proposed, and Manual.

TABLE 3
Leaf segmentation SBD accuracy (± std) comparison on A1, A2, A3, and all.
has been studied, especially on RGB imagery. For example, state-of-the-art performance [17] is reported in the 2014 Leaf Segmentation Challenge (LSC) [30]. We apply our segmentation and alignment algorithm to the LSC dataset [5], which consists of three sets: Arabidopsis (A1, A2) and tobacco (A3). Two examples are shown in Figure 11. Note that pre-processing of the RGB imagery is employed in order to extend our proposed method to this LSC dataset. We compare the segmentation accuracy with [17] in Table 3. Our algorithm achieves higher SBD on two of the three sets and on average. The segmentation accuracy on the LSC dataset is much higher than that of our fluorescence dataset because the images in [5] are of higher resolution.

Fig. 10. Alignment parameter tuning. We show the accuracy in E, F, and SBD when varying the coefficients of each objective term (a) and the template set size (b, c, d).

Fig. 11.
Leaf segmentation results on LSC: (a) input image; (b) alignment result; (c) estimated segmentation mask; (d) ground truth segmentation mask.
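The SBD metric used in these comparisons can be sketched as follows (a minimal sketch of Symmetric Best Dice as defined in [5]; function names are ours, and we assume integer label masks with 0 as background):

```python
import numpy as np

def dice(a, b):
    """Dice overlap of two boolean masks."""
    return 2.0 * np.logical_and(a, b).sum() / (a.sum() + b.sum())

def best_dice(La, Lb):
    """Average, over leaves in La, of the best Dice against any leaf in Lb."""
    return float(np.mean([max(dice(La == i, Lb == j)
                              for j in np.unique(Lb) if j != 0)
                          for i in np.unique(La) if i != 0]))

def sbd(pred, gt):
    """Symmetric Best Dice: the lower of the two directional Best Dice scores."""
    return min(best_dice(pred, gt), best_dice(gt, pred))
```

Taking the minimum of the two directions penalizes both over- and under-segmentation, which is why SBD is a stricter score than a single directional Best Dice.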
We explore the sensitivity of the parameters in our method, using the training videos for parameter tuning. For alignment parameter tuning, we test on all labeled frames independently and evaluate the accuracy without using the tracking consistency T. For tracking parameter tuning, we test on the labeled frames of each video and evaluate the accuracy using all four metrics.

Figure 10 (a) shows the alignment parameter tuning results for the weights of the objective terms in Equation 6. We first search for the optimal setting: λ1 = 4 and λ2 = 300. We then fix one parameter, change the other, and evaluate the performance at a fixed τ. We observe that λ1 is relatively robust, while the performance increases tremendously as λ2 increases, indicating that J is crucial. Without either term (λ1 = 0 or λ2 = 0), the performance is not optimal.

In order to analyze the impact of the number of leaf templates, we reduce the value of one of H, S, and R at a time. As shown in Figure 10, the performance increases as the number of templates increases for all three parameters. However, orientation is the most important, as leaves with different orientations are more likely to have higher CM distances than leaves with different shapes or scales.

Fig. 12. Tracking parameter tuning. Accuracy w.r.t. the coefficients in the objective and the number of iterations.

Figure 12 (a, b) shows the parameter tuning results for the leaf tracking framework. Similarly, we first find the optimal weights: µ1 = 1 and µ2 = 15. We fix one parameter, change the other, and evaluate the performance at a fixed τ. µ1 and µ2 are relatively robust to changes; however, they are still useful, as the performance decreases without either term.

To study the impact of the number of iterations between two frames, we vary D and evaluate the performance. As shown in Figure 12 (c), the performance increases as D increases. However, it stabilizes once D is large enough, because the algorithm already converges before reaching the maximum iteration.

In summary, all parameters used in our algorithm are set as: λ1 = 4, λ2 = 300, C = 3, µ1 = 1, µ2 = 15, D = 300, and s = 40, with α1 and α2 set on the training videos.

Alignment Quality Model
Data samples for evaluating our alignment quality model are selected from e1 in Algorithm 2, which contains the tip-based errors of all leaf pairs, most of them less than 0.5. We select samples from e1 for each interval of tip-based error within [0 : 0.5]; sample duplication is employed when the number of samples in a particular interval is insufficient. All samples with tip-based errors larger than 0.5 are also selected, but without duplication. Finally, we extract features x_a for each selected sample. We assign Q_a = 2e_la to make the output in the range of [0, 1], and Q_a = 1 for all samples with e_la > 0.5. We randomly select part of the samples as the test set, and the remaining samples are used to train the model. Figure 13 (a) shows the results of the model on both training and testing samples.

We use R² to measure how well the model fits our data. It is defined as:

R² = 1 − Σ(Q_a − Q̂_a)² / Σ(Q_a − Q̄_a)²,   (17)

where Q̂_a is the predicted quality value and Q̄_a is the mean of Q_a. Both the R² of our model and the correlation coefficient over all testing samples indicate a high correlation between Q_a and Q̂_a. This quality model is used to predict the alignment accuracy and generate one predicted curve for each leaf, as shown in Figure 2.

Fig. 13. (a) Alignment quality model applied to training and testing samples; (b) tracking quality model applied to one video: the SVM classifier output (top) and the result after Gaussian filtering and thresholding (bottom); the two blue lines mark the labeled period of a tracking failure.

Tracking Quality Model
We visualize the results of our method and find the videos that contain a tracking failure of one leaf. As the goal of the tracking quality model is to detect when a tracking failure starts, we label two frames in each such video: when the failure starts and when it ends. The starting frame is when a leaf candidate starts to change its location toward its neighboring leaves. The ending frame is when a leaf candidate totally overlaps another leaf. Among all failure samples, we record the shortest and the average tracking failure lengths.

We select frames near the ending frame as negative training samples with Q_t = −1, and frames evenly distributed before the failure starts as positive training samples with Q_t = 1. The features x_t are extracted as discussed in Section 3.3.2 and used to train an SVM classifier. The learned model is applied to all frames to predict the tracking quality. Figure 13 (b) shows an example of the output. We apply a Gaussian filter to remove outliers and delete any predicted failure shorter than the shortest labeled failure.

We compare the first frame of a predicted failure with that of a labeled failure. When their distance is less than the average length of the failure samples, it is considered a true detection; otherwise it is a false detection. We count the true and false detections of the quality model over all labeled failures using the leave-one-video-out testing scheme. Similarly, this quality model is applied during tracking and outputs a prediction curve for each leaf (shown in Figure 2).

Any algorithm has its limitations; hence, it is important to explore the limitations of the proposed method. First, one interesting question is to what extent our segmentation and alignment method can correctly segment leaves in the overlapping region. We answer this question using a simple synthetic example. As shown in Figure 14, our method performs well when the overlap ratio is small; otherwise, it identifies two leaves as one leaf, which appears to be reasonable when the overlap ratio is high.

Fig. 14. Leaf alignment results on synthetic leaves with various amounts of overlap. The overlap ratio w.r.t. the smaller leaf increases from left to right.

Second, our leaf tracking starts from a good initialization of the leaf candidates from the previous frame. Another interesting question is to what extent our tracking method can succeed with bad initializations. To study this, frames with good tracking results are selected from several videos (one frame for each). We change the transformation parameters P to synthesize different amounts of distortion and apply our tracking algorithm on these frames. A leaf candidate is deleted only if it shrinks to one point, in which case its tip-based error is set to a fixed penalty value. We compute the average tip-based error of all leaf candidates.

We vary the rotation angle θ, the scaling factor r, and the translation ratio

t_xy = √(t_x² + t_y²) / √((t_x1 − t_x2)² + (t_y1 − t_y2)²),

where the translation direction is randomly selected. The average and range of the tip-based errors for all frames are shown in Figure 15.
Our tracking method reduces the initial tip-based error to a small value. It is most robust to r and most sensitive to t_xy. Figure 16 shows some examples. For moderate rotation angles, our method works well under different amounts of leaf rotation. For the scaling factor, as long as the leaf candidate is not too small, our method is very robust, even if we enlarge the original leaf candidates several times. The translation ratio is the most sensitive because the direction is randomly selected, so leaf candidates are very likely to shift to the locations of the neighboring leaves. Furthermore, changing the initialization of θ and r for separate leaves leads to better performance than for neighboring leaves (Figure 16), because neighboring leaves overlap with each other and therefore influence the tracking results. Overall, as the distortion increases, the average tip-based error increases, while some of the leaf candidates can still be well aligned.

Fig. 15. Mean tip-based error with different initializations. The axes on top of the figures show the initial tip-based errors.

Fig. 16. Example results: the first row shows the initialization, and the second row shows the tracking results.

CONCLUSIONS
In this paper, we identify a new computer vision problem of leaf segmentation, alignment, and tracking from fluorescence plant videos. Leaf alignment and tracking are formulated as two optimization problems based on Chamfer matching and leaf template transformation. Two models are learned to predict the quality of leaf alignment and tracking. A quantitative evaluation scheme is designed to evaluate the performance. The limitations of our algorithm are studied, and experimental results show the effectiveness, efficiency, and robustness of the proposed method.

With the leaf boundary and structure information over time, the photosynthetic efficiency can be computed for each leaf, which paves the way for leaf-level photosynthetic analysis and high-throughput plant phenotyping. The proposed method and the evaluation scheme are potentially applicable to other plant videos, as shown in the results on the LSC dataset.

REFERENCES

[1] Fabio Fiorani and Ulrich Schurr, "Future scenarios for plant phenotyping,"
Annual Review of Plant Biology.
[2] Marcus Jansen et al., "Simultaneous phenotyping of leaf growth and chlorophyll fluorescence via GROWSCREEN FLUORO allows detection of stress tolerance in Arabidopsis thaliana and other rosette plants," Functional Plant Biology, 2009.
[3] Samuel Trachsel, Shawn M. Kaeppler, Kathleen M. Brown, and Jonathan P. Lynch, "Shovelomics: high throughput phenotyping of maize (Zea mays L.) root architecture in the field," Plant and Soil, 2011.
[4] Larissa M. Wilson, Sherry R. Whitt, Ana M. Ibáñez, Torbert R. Rocheford, Major M. Goodman, and Edward S. Buckler, "Dissection of maize kernel composition and starch production by candidate gene association," The Plant Cell, 2004.
[5] Hanno Scharr, Massimo Minervini, Andreas Fischbach, and Sotirios A. Tsaftaris, "Annotated image datasets of rosette plants," Tech. Rep. FZJ-2014-03837, 2014.
[6] Anja Hartmann, Tobias Czauderna, Roberto Hoffmann, Nils Stein, and Falk Schreiber, "HTPheno: an image analysis pipeline for high-throughput plant phenotyping," BMC Bioinformatics, 2011.
[7] Ladislav Nedbal and John Whitmarsh, "Chlorophyll fluorescence imaging of leaves and fruits," in Chlorophyll a Fluorescence, 2004.
[8] Xu Zhang, Ronald J. Hause, and Justin O. Borevitz, "Natural genetic variation for growth and development revealed by high-throughput phenotyping in Arabidopsis thaliana," G3: Genes, Genomes, Genetics, 2012.
[9] Arabidopsis Genome Initiative et al., "Analysis of the genome sequence of the flowering plant Arabidopsis thaliana," Nature.
in Image Analysis and Recognition, 2009.
[12] Xi Yin, Xiaoming Liu, Jin Chen, and David M. Kramer, "Multi-leaf alignment from fluorescence plant images," in
WACV, 2014.
[13] Jonas Vylder, Daniel Ochoa, Wilfried Philips, Laury Chaerle, and Dominique Straeten, "Leaf segmentation and tracking using probabilistic parametric active contours," in Computer Vision/Computer Graphics Collaboration Techniques, 2011.
[14] Harry G. Barrow, Jay M. Tenenbaum, Robert C. Bolles, and Helen C. Wolf, "Parametric correspondence and Chamfer matching: Two new techniques for image matching," Tech. Rep., DTIC Document, 1977.
[15] Bastian Leibe, Edgar Seemann, and Bernt Schiele, "Pedestrian detection in crowded scenes," in CVPR, 2005.
[16] Xi Yin, Xiaoming Liu, Jin Chen, and David M. Kramer, "Multi-leaf tracking from fluorescence plant videos," in ICIP, 2014.
[17] Jean-Michel Pape and Christian Klukas, "3-D histogram-based segmentation and leaf detection for rosette plants," in ECCV Workshops, 2014.
[18] Long Quan, Ping Tan, Gang Zeng, Lu Yuan, Jingdong Wang, and Sing Bing Kang, "Image-based plant modeling," in ACM Transactions on Graphics, 2006.
[19] Derek Bradley, Derek Nowrouzezahrai, and Paul Beardsley, "Image-based reconstruction and synthesis of dense foliage,"
ACM Transactions on Graphics.
[20] Yangyan Li, Xiaochen Fan, Niloy J. Mitra, Daniel Chamovitz, Daniel Cohen-Or, and Baoquan Chen, "Analyzing growing plants from 4D point cloud data," ACM Transactions on Graphics, 2013.
[21] Yann Chéné, David Rousseau, Philippe Lucidarme, Jessica Bertheloot, Valérie Caffier, Philippe Morel, Étienne Belin, and François Chapeau-Blondeau, "On the use of depth camera for 3D phenotyping of entire plants," Computers and Electronics in Agriculture, 2012.
[22] Guillaume Cerutti, Laure Tougne, Julien Mille, Antoine Vacavant, and Didier Coquin, "Understanding leaves in natural images: a model-based approach for tree species identification," CVIU, 2013.
[23] Sofiene Mouine, Itheri Yahiaoui, and Anne Verroust-Blondet, "Advanced shape context for plant species identification using leaf image retrieval," in Proc. ACM Int. Conf. Multimedia Retrieval (ICMR), 2012.
[24] Neeraj Kumar, Peter N. Belhumeur, Arijit Biswas, David W. Jacobs, W. John Kress, Ida C. Lopez, and João V. B. Soares, "Leafsnap: A computer vision system for automatic plant species identification," in
ECCV, 2012.
[25] Guillaume Cerutti, Laure Tougne, Julien Mille, Antoine Vacavant, Didier Coquin, et al., "A model-based approach for compound leaves understanding and identification," in ICIP, 2013.
[26] Guillaume Cerutti, Laure Tougne, Antoine Vacavant, and Didier Coquin, "A parametric active polygon for leaf segmentation and shape estimation," in Advances in Visual Computing, 2011.
[27] Siqi Chen, Daniel Cremers, and Richard J. Radke, "Image segmentation with one shape prior: a template-based formulation," Image and Vision Computing, 2012.
[28] Xiao-Feng Wang, De-Shuang Huang, Ji-Xiang Du, Huan Xu, and Laurent Heutte, "Classification of plant leaf images with complicated background," Applied Mathematics and Computation, 2008.
[29] Xianghua Li, Hyo-Haeng Lee, and Kwang-Seok Hong, "Leaf contour extraction based on an intelligent scissor algorithm with complex background," in International Conference on Future Computers in Education, 2012.
[30] CVPPP 2014 Leaf Segmentation Challenge, http://plant-phenotyping.org/CVPPP2014-challenge.
[31] Hanno Scharr, Massimo Minervini, Andrew P. French, Christian Klukas, David M. Kramer, Xiaoming Liu, Imanol Luengo, Jean-Michel Pape, Gerrit Polder, Danijela Vukadinovic, et al., "Leaf segmentation in plant phenotyping: a collation study,"
Machine Vision and Applications, 2015.
[32] Eren Erdal Aksoy, Alexey Abramov, Florentin Wörgötter, Hanno Scharr, Andreas Fischbach, and Babette Dellen, "Modeling leaf growth of rosette plants using infrared stereo image sequences," Computers and Electronics in Agriculture, 2015.
[33] Babette Dellen, Hanno Scharr, and Carme Torras, "Growth signatures of rosette plants from time-lapse video," 2015.
[34] Martin A. Fischler and Robert C. Bolles, "Random sample consensus: a paradigm for model fitting with applications to image analysis and automated cartography," Communications of the ACM, 1981.
[35] Xiaoming Liu, Ting Yu, Thomas Sebastian, and Peter Tu, "Boosted deformable model for human body alignment," in CVPR, 2008.
[36] Xiaoming Liu, "Discriminative face alignment," PAMI, 2009.
[37] Yves Crama, Pierre Hansen, and Brigitte Jaumard, "The basic algorithm for pseudo-boolean programming revisited," Discrete Applied Mathematics, 1990.
[38] Endre Boros and Peter L. Hammer, "Pseudo-boolean optimization," Discrete Applied Mathematics, 2002.
[39] Eyung Lim, Xudong Jiang, and Weiyun Yau, "Fingerprint quality and validity analysis," in ICIP, 2002.
[40] Kamal Nasrollahi and Thomas B. Moeslund, "Face quality assessment system in video sequences," in Biometrics and Identity Management, 2008.
[41] Kathleen Greenham, Ping Lou, Sara E. Remsen, Hany Farid, and C. Robertson McClung, "TRiP: Tracking rhythms in plants, an automated leaf movement analysis program for circadian period estimation," Plant Methods, 2015.
[42] Oliver L. Tessmer, Yuhua Jiao, Jeffrey A. Cruz, David M. Kramer, and Jin Chen, "Functional approach to high-throughput plant growth analysis," BMC Systems Biology, 2013.
Xi Yin received the B.S. degree in Electronic and Information Science from Wuhan University, China, in 2013. Since August 2013, she has been working toward her Ph.D. degree in the Department of Computer Science and Engineering, Michigan State University, USA. Her paper on plant segmentation won the Best Student Paper Award at the Winter Conference on Applications of Computer Vision (WACV) 2014. Her research interests include computer vision and deep learning.
Xiaoming Liu is an Assistant Professor in the Department of Computer Science and Engineering at Michigan State University (MSU). He received the B.E. degree from Beijing Information Technology Institute, China, and the M.E. degree from Zhejiang University, China, both in Computer Science, and the Ph.D. degree in Electrical and Computer Engineering from Carnegie Mellon University. Before joining MSU, he was a research scientist at General Electric Global Research Center. His research areas are face recognition, biometrics, image alignment, video surveillance, computer vision, and pattern recognition. He has authored numerous scientific publications and has filed several U.S. patents. He is a member of the IEEE.
Jin Chen received the B.S. degree in Computer Science from Southeast University, China, and the Ph.D. degree in Computer Science from the National University of Singapore. He is an Assistant Professor in the Department of Energy Plant Research Laboratory and the Department of Computer Science and Engineering at Michigan State University. His general research interests are in computational biology, as well as its interface with data mining and computer vision.