Balanced Order Batching with Task-Oriented Graph Clustering
Lu Duan∗, Haoyuan Hu∗, Zili Wu, Guozheng Li, Xinhang Zhang, Yu Gong, Yinghui Xu
Zhejiang Cainiao Supply Chain Management Co. Ltd, Alibaba Group
{duanlu.dl,zili.ziliwu,guozheng.lgz,xinhang.zxh,haoyuan.huhy,renji.xyh}@cainiao.com, [email protected]
ABSTRACT
The balanced order batching problem (BOBP) arises from the process of warehouse picking in Cainiao, the largest logistics platform in China. Batching orders together in the picking process to form a single picking route reduces travel distance. The reason for its importance is that order picking is a labor-intensive process and, by using good batching methods, substantial savings can be obtained. The BOBP is an NP-hard combinatorial optimization problem, and designing a good problem-specific heuristic under the quasi-real-time system response requirement is non-trivial. In this paper, rather than designing heuristics, we propose an end-to-end learning and optimization framework named Balanced Task-oriented Graph Clustering Network (BTOGCN) to solve the BOBP by reducing it to a balanced graph clustering optimization problem. In BTOGCN, a task-oriented estimator network is introduced to guide the type-aware heterogeneous graph clustering networks to find better clustering results related to the BOBP objective. Through comprehensive experiments on single and multiple graphs, we show that: 1) our balanced task-oriented graph clustering network can directly utilize the guidance of the target signal and outperforms the two-stage deep embedding and deep clustering methods; 2) our method obtains an average 4.57m and 0.13m picking distance reduction over the expert-designed algorithm on the single-graph and multi-graph sets, respectively, and has good generalization ability for practical scenarios.

CCS CONCEPTS
• Mathematics of computing → Combinatorial optimization; • Information systems → Clustering; Hierarchical data models; • Applied computing → Multi-criterion optimization and decision-making.

KEYWORDS
Balanced Order Batching Problem; End-to-End Learning and Optimization; Type-aware Graph Clustering; Task-oriented Estimator
ACM Reference Format:
Lu Duan, Haoyuan Hu, Zili Wu, Guozheng Li, Xinhang Zhang, Yu Gong, and Yinghui Xu. 2020. Balanced Order Batching with Task-Oriented Graph Clustering. In Proceedings of the 26th ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD '20), August 23–27, 2020, Virtual Event, CA, USA. ACM, New York, NY, USA, 11 pages. https://doi.org/10.1145/3394486.3403355

"m" is the abbreviation of the meter (the SI base unit of length).
∗ Lu Duan and Haoyuan Hu contribute equally, and Haoyuan Hu is the corresponding author.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].
KDD '20, August 23–27, 2020, Virtual Event, CA, USA
© 2020 Copyright held by the owner/author(s). Publication rights licensed to ACM.
ACM ISBN 978-1-4503-7998-4/20/08...$15.00
https://doi.org/10.1145/3394486.3403355
Order picking is a critical component of warehouse operations, as shown in Figure 1, which affects customer service, logistics costs and even the efficiency of the whole supply chain. It is the most labor-cost-intensive operation, taking up 50%–70% [11, 35] of the overall warehouse operating cost. In our case, Cainiao, the largest logistics platform in China, which on average deals with a hundred million logistics orders in warehouses each day, is very concerned about improving warehouse operating efficiency, especially the order picking part, since a 1% improvement will reduce costs by millions of dollars.

Due to its higher efficiency compared to discrete order picking (pick-by-order), batch picking has become the most prevalent mode of manual order picking, where items of several orders in one batch can be collected simultaneously on a single tour. The process of grouping a set of orders into pick lists (batches) is referred to as order batching and, correspondingly, the optimization problem of how to divide orders into different batches to obtain the shortest total picking distance is referred to as the order batching problem (OBP). The OBP is known to be NP-hard in the strong sense [12]. The OBP shares some features with the well-known combinatorial optimization problem, the capacitated vehicle routing problem (CVRP) [31], but differs from it with respect to the order's impartibility, i.e., items in one order must be picked up in one batch, which makes the OBP more complex than the CVRP. Besides, in our case, the number of orders in each batch must be equal, since in practice the picking device always loads a fixed number of picking baskets, one basket per order. Therefore, we focus on the balanced order batching problem (BOBP) in this paper.

In the operations research community, the OBP has been extensively studied for years. In general, there are two main classes of approaches: exact algorithms based on mathematical modeling and human-designed heuristic methods. Although exact algorithms can guarantee optimality, they can only solve problems of very limited scale ([18] reports only 50 orders in 900 storage locations) and so lack applicability in the real world. Hence, many heuristic methods are elaborately designed by experts with in-depth domain knowledge for practical application. However, with the application of our IoT device LEMO (https://iot.cainiao.com/portal/lemo), a sort of RF scanner invented by Cainiao for picking orders without a papery pick list, heuristic methods cannot provide good enough solutions for such a vast amount of orders under the quasi-real-time system response requirement, because constructive heuristics cannot guarantee the solution quality and meta-heuristics need many complex algorithm iterations to get a competitive solution.
Figure 1: The work-flow of order processing. The generation of a pick list goes through a process from coarse to fine. Coarse: 500 orders are filtered from the order pool. Fine: the filtered orders are combined (batched) into pick lists (batches) until the capacity of the device is exhausted, and the performance indicator of a pick list is the distance of the picking route generated by a TSP algorithm. In our case, the device capacity is set as a fixed number of orders. After pick list generation, the logistics orders go through picking, packing and quality check before delivery. All these operations are included in our Cainiao warehouse management system (WMS).

To overcome the above weakness, we try to use machine learning methods to solve this combinatorial optimization problem in view of their fast online inference ability. However, pure learning methods which directly predict an optimization problem solution using supervised learning or reinforcement learning (RL) [2, 24, 38] have been proven to have difficulty in discovering the algorithmic structure in the huge solution space, even when provided with large-scale training data. Accordingly, the majority of solutions [1, 4, 5, 34, 43] adopt two-stage approaches which first train a machine learning model and then feed the model predictions to an optimization algorithm to get the final task result. However, two-stage approaches usually only reach a sub-optimal solution. Therefore, some end-to-end approaches [13, 40, 41] have been developed recently, where a differentiable optimization algorithm layer is integrated into the learning architecture so as to combine the merits of learning and optimization methods. However, most of these works focus on a universal framework with general learning methods, such as auto-encoders (AE) [13] or primal graph convolutional networks (GCN) [41]. Due to the explicit hierarchical structure and the multiple relations between items and orders in the BOBP, we need a learning method with stronger expressive ability. Furthermore, these works commonly use a continuous relaxation of the discrete problem to propagate gradients through the optimization procedure, while we prefer a more direct and exact way to handle the non-differentiable problem. Therefore, we introduce an improved learning and optimization end-to-end framework named
Balanced Task-oriented Graph Clustering Network (BTOGCN) to solve the BOBP. Since the BOBP can naturally be regarded as a variant of the clustering problem, we construct a heterogeneous order-item (order batching) graph so that the BOBP is transformed into a special graph clustering task on this graph with a cluster balance constraint and a non-differentiable task loss. To better discover the structure information hidden in the hierarchical heterogeneous graph of the BOBP, we devise type-aware heterogeneous graph convolutional networks (HetGNN). The embeddings of orders learned by the HetGNN are then clustered in a balanced way by a differentiable optimization layer. However, the task-based loss (total picking distance), which is the objective of the BOBP in our case, cannot be differentiated with respect to the cluster results (order batches), because the optimization process used to calculate the loss operates on the item sets of the clusters. As a result, we propose a task-oriented estimator network to approximate the task-based loss, which makes it possible to train our model in an end-to-end way.

In summary, we list the contributions of this work below:
• First and foremost, to the best of our knowledge, we are the first to propose a task-oriented end-to-end graph learning and optimization framework to solve the BOBP. Meanwhile, our approach has shown its potential to solve many classical combinatorial optimization problems, such as the VRP and the bin packing problem.
• Next, we use the objective of the BOBP as the task loss to guide the graph clustering. However, the task loss is non-differentiable with respect to batches because of the set union operation and the picker route optimization process. To tackle this problem, we propose a task-oriented estimator network to approximate the task-based loss, which enables our model to be trained in an unsupervised manner.
• Then, in view of the natural hierarchical structure of the BOBP, we design type-aware heterogeneous graph convolutional networks (HetGNN) on the balanced order batching graph with different types of nodes and edges. The HetGNN has shown superior representation learning ability when combined with a simple k-means layer and greedy assignment to obtain better solutions.
• As a result, we save 4.57m and 0.13m average picking distance compared to the expert-designed heuristic algorithm [20] on the single-graph and multi-graph datasets, respectively. Numerical results also demonstrate that our BTOGCN method significantly outperforms two-stage models without the task goal and has good generalization ability for practical scenarios.
Motivated by recent studies on decision-focused learning [8, 13, 40, 41], which include the optimization process in the training architecture to improve the performance of downstream decisions, we propose an end-to-end learning and optimization framework, BTOGCN, to solve a very complex combinatorial optimization problem, the BOBP. However, our task has some complex and problem-specific challenges to tackle compared to the aforementioned universal tasks.

First, our task loss cannot be directly calculated from the decision solution, because in the BOBP the computation of the picking distance for a batch needs extra algorithms: we have to union the items of the orders in one batch first, then solve a traveling salesman problem (TSP) to get the picking distance for these items. Obviously, the set union operation and the TSP algorithm make the task loss non-differentiable with respect to the discrete batches. On the contrary, in [41], which combines representation learning with an explicit soft k-means algorithm, the loss function can not only be directly computed in the forward pass, but is also naturally differentiable with respect to the k-means assignment matrix r, because the original discrete cluster is continuously relaxed in their model all the time. A similar relaxation approach is also applied in [40] to simplify the problem. Thus, we must calculate the task loss in a faster way to improve training efficiency and solve the derivative problem in another way. In fact, similar problems are often encountered in RL, where a Q-network is devised to predict the future cumulative reward in a sequential decision-making process given the current state and action [25]. With the growing interest in decision-based learning, [7] proposes an end-to-end learning scheme which integrates the non-differentiable decision evaluation process into the learning architecture via a task-goal estimator. We follow the idea of the task-based estimator in our work.

Second, in view of the similarity of the BOBP to the clustering problem, we consider using deep clustering methods to get batches. In recent years, numerous deep clustering methods [13, 14, 42, 44] have been proposed, which aim to find a "cluster-friendly" latent embedding space by deep learning to make a better cluster decision in an end-to-end way. To cluster the data, these works usually learn the cluster centroids directly, because the cluster centroids are relatively stable in the embedding space as a result of the unchanging data distribution; hence deep clustering is often applied in community detection and image classification [6, 29]. However, in our scenario, the cluster centroids might change dramatically according to the orders waiting to be picked. Therefore, an explicit and non-parametric clustering algorithm layer is what we need, such as k-means. But, as we all know, in k-means the derivative of the cluster assignment with respect to the data cannot be directly computed. Thus, [13] adapts the Gumbel-Softmax re-parameterization trick to estimate the gradient, [14] transforms the clustering problem into an optimal transport (OT) problem, and [41] transforms the cluster centroid update formula of k-means into a helper function and uses the implicit function theorem to get that derivative. Nevertheless, these methods cannot provide a balanced discrete cluster solution for us.
Therefore, we add a dynamic assignment layer behind the differentiable k-means layer to keep the balance of clusters.

Third, we need a more powerful representation learning technique than AE, which is commonly seen in various end-to-end pipelines [16, 42, 44], to mine the intricate relations among orders and items. GCNs have proved their power in many real-world applications such as recommendation systems [39] and chemical property prediction [33]. However, most successful GCN works [17, 37] focus on homogeneous graphs, while data represented as a heterogeneous graph is more natural in many cases. To the best of our knowledge, some recent works [26, 33] have applied heterogeneous graphs with success. [33] proposed a model named EAGCN which applies a "multi-attention" mechanism to process the heterogeneous graph, where each attention only aggregates neighbors connected by edges of the corresponding type. Similarly, [26] exploits the heterogeneous graph with an attention mechanism, but the graph contains multiple types of nodes rather than edges. However, in our BOBP, we need to design a graph embedding architecture that combines information from different types of nodes and edges.

As detailed above, these three key challenges brought by the complexity of the BOBP are elaborately taken care of in our BTOGCN approach; to the best of our knowledge, there is not yet any end-to-end framework with the ability to solve the BOBP on its own.

In this section, we first give an informal description of how the BOBP works in our warehouse management system (WMS). Then we give a formal definition and optimization objective for the BOBP, and discuss how to transform it into a balanced graph clustering optimization problem.
In Cainiao, an average of 100 million express orders pass through the warehouses every day. Figure 1 illustrates the work-flow of order processing. After the customer places an order on an e-commerce website, e.g. TaoBao or TMall, it is converted to a logistics order (consisting of several items) to be processed in the warehouse. For simplicity, "order" denotes a logistics order hereafter. At each decision-making time, there are tens of thousands of logistics orders waiting to be processed in the order pool. As a well-known NP-hard problem [12], it is very expensive to generate pick lists from such a huge candidate set. In order to reduce the computational complexity and meet the requirements of time efficiency, in practice we choose the coarse-to-fine framework commonly seen in traditional recommendation system scenarios. Firstly, a coarse model (deep matching model) is deployed to generate hundreds of orders (usually 500) from the current order pool; then the generated candidate set is processed by a fine model (order batching network) to produce pick lists. In this paper, we concentrate on the process of grouping the orders in the candidate set (matching pool) into batches after the matching procedure. Essentially, the order batching network consists of the following two components:
• order clustering, i.e. the transformation of orders into pick lists, which cannot exceed the capacity of the device; here the capacity is set to a fixed number of orders;
• picker routing, i.e. planning the corresponding picking route for the items in each pick list by solving a TSP.
Once clusters of orders have been formed, the calculation of the travel distance for the routes requires a number of TSP solutions (one route for one batch). The balanced order batching problem in a multiple-block warehouse that we analyze in this paper reduces to splitting the $N$ orders into $K$ batches, each with $c$ orders, so as to minimize the total picking distance. Hence, we implicitly assume that $N = c \times K$, which is referred to as a balanced order batching problem.

Given a set of $N$ candidate orders $O = \{o_1, \ldots, o_N\}$, our goal is to divide them into $K$ batches $S = \{s_k, 1 \le k \le K\} \subseteq B$ ($B$ is the set of all feasible batches), so that the total travel distance of all batches, denoted as $d_S = \sum_{k=1}^{K} d_k$, is minimized. In particular, $d_k$ is the distance calculated by solving the problem of finding a shortest route to retrieve all items of the batch $s_k$ as a traveling salesman problem. Each feasible batch $b_m \in B$ is characterized by a zero-one vector $a_m = (a_{1m}, \ldots, a_{Nm})$, where $a_{jm} = 1$ if order $o_j$ is included in batch $b_m$ ($b_m \in B$, $o_j \in O$) and $a_{jm} = 0$ otherwise. Besides, the binary decision variable $x_m$ is set to 1 if batch $b_m$ is chosen for $S$, and $x_m = 0$ otherwise. Note that, in our case, the picking device capacity is expressed as a fixed number $c$ of orders in a batch, so batch $b_m$ is feasible only if $\sum_{o_j \in O} a_{jm} = c$. Based on the above descriptions of the problem and notations, the mathematical model of the BOBP is formulated as follows:

$$\min \sum_{b_m \in B} d_m x_m$$

subject to:

$$\sum_{b_m \in B} a_{jm} x_m = 1, \quad \forall o_j \in O; \qquad (1)$$
$$\sum_{b_m \in B} x_m = K; \qquad (2)$$
$$\sum_{o_j \in O} a_{jm} = c, \quad \forall b_m \in B; \qquad (3)$$
$$x_m \in \{0, 1\}, \quad \forall b_m \in B. \qquad (4)$$

Equation (1) ensures that each order is assigned to exactly one chosen batch, which is usually referred to as the integrity condition. Equation (2) and Equation (3) ensure that $K$ batches are chosen and each of them has $c$ orders.
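To make the formulation concrete, the following minimal Python sketch evaluates the objective $d_S$ of a candidate balanced batching and checks constraint (3) for each batch. The helper `tsp_distance` stands in for any exact or heuristic TSP solver and is an assumption for illustration, not part of the paper's method.

```python
from itertools import permutations

def tsp_distance(locations, dist):
    """Hypothetical helper: brute-force closed-tour TSP, fine only for tiny item sets."""
    locations = list(locations)
    if len(locations) <= 1:
        return 0.0
    start, best = locations[0], float("inf")
    for perm in permutations(locations[1:]):
        route = [start, *perm, start]
        best = min(best, sum(dist[a][b] for a, b in zip(route, route[1:])))
    return best

def batching_objective(batches, order_items, dist, c):
    """Total picking distance d_S of a candidate partition into balanced batches."""
    total = 0.0
    for batch in batches:
        assert len(batch) == c, "constraint (3): every batch holds exactly c orders"
        items = set().union(*(order_items[o] for o in batch))  # non-differentiable set union
        total += tsp_distance(items, dist)                     # one TSP per batch gives d_k
    return total

# Toy instance: N = 4 orders, K = 2 batches of c = 2 orders, items located on a line.
order_items = {0: {"a", "b"}, 1: {"b", "c"}, 2: {"d"}, 3: {"d", "e"}}
pos = {"a": 0, "b": 1, "c": 2, "d": 5, "e": 6}
dist = {u: {v: abs(pos[u] - pos[v]) for v in pos} for u in pos}
print(batching_objective([{0, 1}, {2, 3}], order_items, dist, c=2))
```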
Figure 2: Illustration of a specific heterogeneous graph G with N = , K = and c = . The orange squares and blue circles represent orders and items, respectively. The black lines and grey dashed lines represent order-to-order and order-to-item edges. For readability, the item-to-item edges are not drawn. We show two clustering solutions on the graph (red and green) and the corresponding feasible batches (A and B). We suppose that $d_B > d_A$, which means that clustering A is better than clustering B given the candidate order set $O$ and item set $I$.

A heterogeneous graph $G(V, E)$ consists of a set of vertices $V$ and a set of edges $E$. There is a set of node types $A$, and each vertex $v \in V$ belongs to one of the node types, denoted by $\phi(v) = p \in A$, where $\phi$ is the mapping function from $V$ to $A$. We represent an edge $e \in E$ from vertex $i \in V$ to $j \in V$ with relation type $r$ as a triplet $e = (i, j, r)$, where $r \in R$ and $R$ is the set of relation types. For a vertex $j$ and edge type $r$, the set of linkages with its neighboring nodes is defined as $E_{j,r} = \{(i, j, r') \in E \mid i \in V, r' = r\}$. Correspondingly, we can construct a heterogeneous order batching graph $G$ with $V = \{O, I\}$, where $O$ is the set of candidate orders and $I$ is the set of item nodes included in the candidate orders, and we simply define three types of relations $R = \{oo, oi, ii\}$, i.e. order-to-order, order-to-item and item-to-item. The detailed edge description is as follows:
• order-to-order edge, i.e. the picking route consisting of the item sets of two different orders;
• order-to-item edge, i.e. indicating whether the order $o$ contains the item $i$;
• item-to-item edge, i.e. the distance between two different items.
Figure 2 gives an example. In conclusion, we can transform the balanced order batching problem into a balanced graph clustering optimization problem. That is to say, we aim to find the best way to divide the order nodes $O$ in $G$ into $K$ clusters (batches) so that the corresponding total picking distance, calculated from the item retrieving routes, is minimized.

In Section 3.2, we first introduce our specially constructed heterogeneous graph $G(V, E)$, which consists of 2 node types (order, item) and 3 edge types ($oo$, $oi$, $ii$), and then convert the BOBP into the problem of how to divide the $N$ nodes into $K$ clusters (each with $c$ order nodes) to obtain the best score. The evaluation criterion of a cluster in the BOBP is defined as the distance of the picking route for retrieving all of its items. Note that an order is essentially a set of items and the transformation from a batch to an item set is a set union operation, which is not differentiable. To tackle this specific problem, we propose the Balanced Task-oriented Graph Clustering Network (BTOGCN), which follows the Task-Oriented Prediction Network (TOPNet) [7] with graph clustering [41]. The overall architecture of BTOGCN is illustrated in Figure 3. BTOGCN learns a task-oriented estimator to directly estimate the downstream picking distance given graph embeddings, clustering results and labels, which guides the upstream clustering to achieve better performance in the downstream evaluation. In the graph clustering, we use a differentiable k-means layer to cluster the order embeddings generated by a specifically designed type-aware heterogeneous graph convolutional network (HetGNN); we refer to this as the type-aware heterogeneous graph clustering network. Moreover, we adopt a surrogate loss function to warm up the target estimator, making it sufficient and efficient to train the networks.
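As a concrete illustration of the heterogeneous order batching graph defined above, the sketch below builds the three edge sets ($oo$, $oi$, $ii$) from a toy instance. The attribute choices (inclusion for $oi$, normalized item distance for $ii$, a joint item sequence standing in for the picking route on $oo$) follow the description above, but the data structures and names are illustrative assumptions, not the authors' implementation.

```python
# Build a toy heterogeneous order batching graph: order and item nodes,
# plus edges of types oo (order-to-order), oi (order-to-item), ii (item-to-item).
from itertools import combinations

orders = {"o1": ["i1", "i2"], "o2": ["i2", "i3"], "o3": ["i4"]}      # order -> contained items
item_pos = {"i1": (0, 0), "i2": (1, 0), "i3": (1, 2), "i4": (4, 1)}  # warehouse coordinates

def manhattan(a, b):
    return abs(a[0] - b[0]) + abs(a[1] - b[1])

edges = {"oi": [], "ii": [], "oo": []}

# oi edges: pure inclusion, no attribute
for o, items in orders.items():
    edges["oi"] += [(o, i) for i in items]

# ii edges: attribute is the normalized distance between the two items
max_d = max(manhattan(item_pos[a], item_pos[b]) for a, b in combinations(item_pos, 2))
for a, b in combinations(item_pos, 2):
    edges["ii"].append((a, b, manhattan(item_pos[a], item_pos[b]) / max_d))

# oo edges: attribute is the joint picking route over the union of the two item sets
# (here just the sorted item sequence; a TSP solver would order it in the real pipeline)
for o1, o2 in combinations(orders, 2):
    route = sorted(set(orders[o1]) | set(orders[o2]))
    edges["oo"].append((o1, o2, route))

print(edges["oo"])
```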
The remainder of this section provides details on how the proposed method works.

Figure 3: Overview of the Balanced Task-oriented Graph Clustering Network.

In this paper, we aim to learn a low-dimensional representation $h_v \in \mathbb{R}^{n_{\phi(v)}}$ and apply it to the downstream node clustering task, where $n_{\phi(v)}$ is the dimension of the embedding space for node type $\phi(v)$. For each vertex $v$ in the heterogeneous graph, let $N(v)$ be the set of vertices adjacent to $v$ and let $E(v)$ denote the edges connected to $v$. Existing GCN-based works [17, 23, 37] focus mainly on homogeneous graphs. Note that various relation types can occur simultaneously in $E(v)$, which brings challenges for heterogeneous graph embedding. To tackle these challenges, we propose a type-aware heterogeneous graph clustering method, namely HetGNN. Furthermore, in graph clustering tasks on heterogeneous graphs [32], the graph is usually composed of millions of nodes and the objective is to detect communities by training on subgraphs. However, in the balanced order batching problem, the graph size is usually only a few thousand (order and item) nodes and it is not suitable to train on subgraphs, since the task-based criterion is related to the whole graph. In particular, we use order embeddings as the input of the clustering, and we cluster on each graph. GCN-based methods follow a layer-wise propagation manner, in which a propagation layer can be separated into two sub-layers: aggregation and combination. In the following, we concretely demonstrate how to customize a type-aware aggregation sub-layer and an attention combination sub-layer for a heterogeneous balanced order batching graph (HBOBG) with different types of node and edge attributes. The data-flow of the type-aware aggregation and attention combination sub-layers is illustrated in Figure 4.
Type-aware Aggregation Sub-layer. The type-aware aggregation sub-layer is primarily motivated as an adaptation layer of graph neural networks, which separately performs the convolution operation on different types of graph neighborhoods. Aggregation operations vary by node and edge types.

For an edge, only the order-to-order ($oo$) edge has a hidden state, while the other edge types carry no hidden state: the order-to-item ($oi$) edge represents the inclusion relationship between an order and an item, and the item-to-item ($ii$) edge denotes the normalized picking distance between two items:

$$h^l_{e_r} = \begin{cases} h^l_{e_{oo}}, & \text{if } r = oo, \\ \varnothing, & \text{if } r = oi, \\ d_{e_{ii}}, & \text{if } r = ii. \end{cases}$$

The hidden state $h^l_{e_{oo}}$ is updated from the concatenation of the previous hidden states of the edge itself and of the two nodes it links, so the aggregation of the edge hidden state is defined as:

$$h^{l+1}_{e_{oo}} = \sigma\left(W^l_{e_{oo}} \cdot \left(h^l_{e_{oo}} \,\|\, h^l_{o_1} \,\|\, h^l_{o_2}\right)\right), \quad e_{oo} = (o_1, o_2, oo)$$

where $\|$ denotes the concatenation operation.
Figure 4: The data-flow of the type-aware aggregation and attention combination sub-layers, shown for the update of an order node embedding. 1) At the type-aware aggregation sub-layer, an attention mechanism is used to calculate the neighboring information according to linkage type. 2) At the attention combination sub-layer, an attention mechanism is used to capture information among linkage types.
For an order node $o \in O$ and an item node $i \in I$, besides the information from neighboring nodes, the attributes of the edges connected to them are also collected. The aggregated neighboring embeddings $H^{l+1}_{N(o)}$ and $H^{l+1}_{N(i)}$ are calculated as

$$H^{l+1}_{N(o)} = \{ h^{l+1}_{N(o),r}, \ \forall r \in R \}, \qquad H^{l+1}_{N(i)} = \{ h^{l+1}_{N(i),r}, \ \forall r \in R \}$$

where $h^{l+1}_{N(o),r}$ and $h^{l+1}_{N(i),r}$ are the aggregated neighbor embeddings for linkage type $r$; the detailed calculation is given below.

Consider a vertex $j \in V$. Each neighboring vertex $m$ of vertex $j$ with linkage type $r$ is projected as

$$h^{l+1}_{\phi(j),m,r} = W^{l+1}_r \left( h^l_m \,\|\, \hat{h}^l_{e_r} \right), \quad \forall e = (m, j, r) \in E_{j,r}$$

with

$$\hat{h}^l_{e_r} = \begin{cases} h^l_{e_{oo}}, & \text{if } r = oo, \\ \varnothing, & \text{otherwise.} \end{cases}$$

The $|R|$ kinds of relations maintain different parameters $W_{\phi(j),r}$. To preserve the semantics of the different types of relationships between nodes, we utilize $|R|$ attention scoring functions [36] to match different relation patterns, i.e., $F^{l+1} = \{ f^{l+1}_r \mid r \in R \}$. For a vertex $j$ and linkage type $r$, an attention coefficient is computed for each edge $e = (m, j, r) \in E_{j,r}$ in the form

$$attn^{l+1}_{m,j,r} = \sigma\left( f^{l+1}_r \left( h^l_j, h^{l+1}_{\phi(j),m,r} \right) \right)$$

where $\sigma$ is an activation function implemented as ReLU [28]. The attention coefficient $attn^{l+1}_{m,j,r}$ indicates the importance of edge $e$ to the target vertex $j$ under linkage type $r$. For simplicity, we adopt the same form of attention mechanism for all relation types but with different parameters. A natural form of the attention scoring function is the dot product $f^{l+1}_r : h_{key} \times H_{val} \rightarrow h_{val}$, which maps a feature vector $h_{key}$ and a set of candidate feature vectors $H_{val}$ to a weighted sum of the elements in $H_{val}$:

$$f^{l+1}_r(j, m) = h^{l+1}_{\phi(j),m,r} \ast \left( h^l_j \right)^{T}$$

where $\ast$ denotes the dot product operation. Notably, the trainable attention parameters are shared by the same edge type $r$. Thereby, a softmax is applied over the neighboring linkages of type $r$ of vertex $j$ to normalize the attention coefficients. Moreover, for order-to-order linkages, the attention coefficient is scaled with the normalized picking distance $d_{e_{oo}}$ before normalization:

$$\alpha^{l+1}_{m,j,r} = \frac{\exp\left( \widehat{attn}^{l+1}_{m,j,r} \right)}{\sum_{e=(t,j,r) \in E_{j,r}} \exp\left( \widehat{attn}^{l+1}_{t,j,r} \right)}$$

where

$$\widehat{attn}^{l+1}_{m,j,r} = \begin{cases} attn^{l+1}_{m,j,r} \ast d_{e_{oo}}, & \text{if } r = oo, \\ attn^{l+1}_{m,j,r}, & \text{otherwise.} \end{cases}$$

Now we have the hidden states of the neighboring nodes in the same low-dimensional space as the target node $j$, and the weights of the linkages $r$ associated with vertex $j$. Then, with the in-degree distribution of the vertex, the neighborhood aggregation for order and item nodes with linkage type $r$ is performed as

$$h^{l+1}_{N(o),r} = \sigma\left( \sum_{e=(m,o,r) \in E_{o,r}} \alpha^{l+1}_{m,o,r} \, h^{l+1}_{\phi(j),m,r} \right), \qquad h^{l+1}_{N(i),r} = \sigma\left( \sum_{e=(m,i,r) \in E_{i,r}} \alpha^{l+1}_{m,i,r} \, h^{l+1}_{\phi(j),m,r} \right)$$

Attention Combination Sub-layer. After aggregating the neighbors' type-aware information, we adopt an attention combination strategy for the order and item nodes:

$$h^{l+1}_o = ATTN^{l+1}_O \left( h^l_o, \{ H^{l+1}_{N(o)}, h^l_o \} \right), \qquad h^{l+1}_i = ATTN^{l+1}_I \left( h^l_i, \{ H^{l+1}_{N(i)}, h^l_i \} \right)$$

ATTN here is the naive dot-product attention mechanism mentioned above; the parameters involved in ATTN differ across node types and layers, and we do not elaborate further here.
In short, ATTN is used to combine context information among edge types. The whole algorithm is described in Algorithm 2 in Appendix A.1. Note that this method can actually be generalized to a hierarchical attention mechanism based on meta-path schemes [46].
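The sketch below shows one way the type-aware aggregation and attention combination described above could be written for a single order node in PyTorch. Tensor shapes, layer names and the use of plain dot-product attention follow the formulas in this section, but the class is an illustrative reconstruction under those assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TypeAwareOrderLayer(nn.Module):
    """One propagation step for an order node: per-relation aggregation, then combination."""
    def __init__(self, dim):
        super().__init__()
        self.w_oo = nn.Linear(2 * dim, dim)         # projects (h_m || h_e_oo) for oo neighbors
        self.w_oi = nn.Linear(dim, dim)             # projects h_m for oi neighbors (no edge state)
        self.edge_update = nn.Linear(3 * dim, dim)  # updates oo edge state from (edge, o, neighbor)

    def forward(self, h_o, oo_nbrs, oo_edges, oo_dist, oi_nbrs):
        # h_o: (dim,) target order embedding          oo_nbrs: (n_oo, dim) neighboring orders
        # oo_edges: (n_oo, dim) oo edge hidden states  oo_dist: (n_oo,) normalized distances d_{e_oo}
        # oi_nbrs: (n_oi, dim) embeddings of the items contained in the order
        msgs = {}
        # order-to-order messages: concatenate neighbor state with edge state, attention scaled by d
        m_oo = self.w_oo(torch.cat([oo_nbrs, oo_edges], dim=-1))
        alpha_oo = torch.softmax(F.relu(m_oo @ h_o) * oo_dist, dim=0)
        msgs["oo"] = torch.relu((alpha_oo.unsqueeze(-1) * m_oo).sum(dim=0))
        # order-to-item messages: no edge attribute
        m_oi = self.w_oi(oi_nbrs)
        alpha_oi = torch.softmax(F.relu(m_oi @ h_o), dim=0)
        msgs["oi"] = torch.relu((alpha_oi.unsqueeze(-1) * m_oi).sum(dim=0))
        # attention combination over relation types plus the node itself
        cand = torch.stack([msgs["oo"], msgs["oi"], h_o])
        beta = torch.softmax(cand @ h_o, dim=0)
        h_o_new = (beta.unsqueeze(-1) * cand).sum(dim=0)
        # update the oo edge states from the previous edge state and the two endpoints
        new_edges = torch.relu(self.edge_update(
            torch.cat([oo_edges, h_o.expand_as(oo_nbrs), oo_nbrs], dim=-1)))
        return h_o_new, new_edges

layer = TypeAwareOrderLayer(dim=8)
h_new, e_new = layer(torch.randn(8), torch.randn(5, 8), torch.randn(5, 8),
                     torch.rand(5), torch.randn(3, 8))
```

Item nodes would follow the same pattern with the $ii$ and $oi$ relations, and stacking several such layers gives the multi-layer HetGNN of Algorithm 2.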
Type-aware Sampling Strategy. With the proposed type-aware aggregation sub-layer and attention combination sub-layer, a whole-graph-batch training strategy is adopted, because all the entities need to be updated in one iteration to calculate the target-based criterion. In the HBOBG, orders are connected with each other, and so are the items. Even disregarding the edges between items and orders, the total number of edges is up to 624,250 for a graph with 500 orders and 1000 items. Due to the time consumption, such a massive number of edges should be reduced by a sampling strategy; see Appendix A.2 for details and the sketch below.
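The following minimal sketch illustrates the type-aware neighbor sampling summarized in Appendix A.2: nearest-neighbor truncation for $oo$ and $ii$ edges, and subsample-or-pad for $oi$ edges. The function name, the parameters `M` and `P`, and the `PAD` token are illustrative assumptions.

```python
import random

PAD = None  # placeholder neighbor; computations involving it are masked out downstream

def sample_neighbors(edge_type, neighbors, M=10, P=10):
    """neighbors: list of (node, distance) for oo/ii edges, or list of nodes for oi edges."""
    if edge_type in ("oo", "ii"):
        # keep the M closest orders/items, which are the most likely to cluster together
        return [n for n, _ in sorted(neighbors, key=lambda x: x[1])[:M]]
    if edge_type == "oi":
        # inclusion edges are sparse: subsample when too many, pad (never resample) when too few
        if len(neighbors) >= P:
            return random.sample(neighbors, P)
        return list(neighbors) + [PAD] * (P - len(neighbors))
    raise ValueError(f"unknown edge type: {edge_type}")

print(sample_neighbors("oo", [("o2", 3.0), ("o3", 1.0), ("o4", 7.0)], M=2))
print(sample_neighbors("oi", ["i1", "i2"], P=4))
```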
Incorporating Graph Networks with a Bidirectional-LSTM Model. The order-to-order edges should be converted into embeddings before being merged with the order features. In our specific graph, the $oo$ edge is represented by the picking route between two orders. Bidirectional Long Short-Term Memory (BiLSTM) [21] is a satisfactory sequential model that balances effectiveness and efficiency. Therefore, we employ a BiLSTM model to get the route embedding and integrate it into our graph neural network model as part of the end-to-end framework. The output of the BiLSTM is used as the embedding of the picking route:

$$h_{e_{oo}} = \mathrm{BiLSTM}(p_1, p_2, p_3, \ldots, p_n)$$

where $p_j$ is the item embedding of the $j$-th item in the picking route, and $h_{e_{oo}}$ is the initial embedding of the edge $e_{oo}$ described above.

Once we have the low-dimensional, dense order embeddings learned by the HetGNN, we can directly cluster the orders; the clustering result, i.e. the $K$ batches, is then used to calculate the total picking distance through the non-differentiable set union operation and TSP algorithm, which will be discussed in the next section. Instead of using a standard clustering method such as k-means, we use the differentiable k-means version of [41], which enables the soft k-means assignment matrix $\hat{y} \in \mathbb{R}^{N \times K}$ to be differentiated with respect to the order embedding vectors $x$, i.e., $\partial \hat{y} / \partial x$ can be obtained. Furthermore, to achieve balanced clustering, we adopt a global size constraint [45] to encourage each cluster to be of approximately the same size:

$$L_G = \sum_{k \in \{1, \ldots, K\}} \left( \sum_{j=1}^{N} \hat{y}_{jk} / N - \frac{1}{K} \right)^2$$

However, $\hat{y}$ is a probability matrix and needs an assignment algorithm to obtain the discrete assignment result $y'$ corresponding to the $K$ batches. We implement a simple greedy assignment algorithm which iteratively chooses the highest probability under the cluster size constraint.

The balanced order batching problem consists of two components: order clustering and picker routing. Order clustering is responsible for grouping orders into batches, while picker routing first transforms each batch into an item set and then plans the picking route by a TSP process for all items in that set. As the transformation from batch to item set is the non-differentiable set union operation, the task-based criterion cannot be directly integrated into the end-to-end gradient-based training process to guide the clustering. To automatically integrate the real task-based loss into our end-to-end learning process, we propose a task-oriented estimator network T, which takes the extracted input feature $E(x)$, the encoding of our clustering assignment matrix $\hat{y}$ and the labels $y$, to approximate the task-based loss $L_t$ given a graph $x$. $L_t$ is the total picking distance of the clustering solution generated by $y'$ and measures the order clustering performance directly:

$$L_t(y') = \sum_{k=1}^{K} d_k, \qquad \mathrm{T}(E(x), \hat{y}, y) = (T_1, \ldots, T_K) \in \mathbb{R}^{N \times K}$$

where $d_k$ is the picking route distance of the orders in cluster $k$, calculated by solving a TSP. Furthermore, the estimation error is a standard mean squared error between the task-based loss $L_t$ and the estimated score T:

$$L_e\left( \mathrm{T}(E(x), \hat{y}, y), L_t(y') \right) = \frac{1}{K} \sum_{k=1}^{K} \left( d_k - \sum_{j=1}^{N} T_{jk} \hat{y}_{jk} \right)^2$$

Firstly, we borrow the idea of existing works which mainly focus on using a surrogate loss function $L_s$ to guide the learning process, where practitioners can either choose standard machine learning loss functions or other differentiable task-specific surrogate loss functions [3, 9, 10, 30, 40].
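Before turning to the surrogate loss, the following minimal sketch illustrates two pieces just described: the global size-constraint loss $L_G$ on the soft assignment matrix and the greedy algorithm that turns $\hat{y}$ into a hard, balanced assignment $y'$. Function and variable names are illustrative assumptions, not the authors' code.

```python
import numpy as np

def global_size_loss(y_soft):
    """y_soft: (N, K) soft assignment; penalizes deviation from equal-size clusters (L_G)."""
    n, k = y_soft.shape
    cluster_mass = y_soft.sum(axis=0) / n            # fraction of assignment mass per cluster
    return float(((cluster_mass - 1.0 / k) ** 2).sum())

def greedy_balanced_assign(y_soft, capacity):
    """Pick (order, cluster) pairs by descending probability while respecting cluster capacity."""
    n, k = y_soft.shape
    assert n == capacity * k
    assignment = -np.ones(n, dtype=int)
    load = np.zeros(k, dtype=int)
    # visit all (order, cluster) pairs in order of decreasing probability
    order_idx, cluster_idx = np.unravel_index(np.argsort(-y_soft, axis=None), y_soft.shape)
    for j, c in zip(order_idx, cluster_idx):
        if assignment[j] == -1 and load[c] < capacity:
            assignment[j] = c
            load[c] += 1
    return assignment

# Example: 6 orders into 3 clusters of 2 orders each.
rng = np.random.default_rng(0)
y_soft = rng.dirichlet(np.ones(3), size=6)
print(global_size_loss(y_soft), greedy_balanced_assign(y_soft, capacity=2))
```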
For the unsupervised clustering task, the training labels are not known in advance, whereas a supervised classification task has ground truth labels. If we have a clustering solution, we can naturally take it as ground truth: the better the performance of that solution, the more useful the training signal. Nevertheless, there is no optimal solution available for the NP-hard balanced order batching problem, so we use a well-designed heuristic algorithm to generate a good initial solution to guide the clustering towards the downstream evaluation criterion and the task-oriented goal in the warm-up stage. Therefore, the surrogate loss is a cross-entropy loss and $L_s$ can be defined as follows:

$$L_s(E(x), y, \hat{y}) = -\frac{1}{N} \sum_{j=1}^{N} \sum_{k=1}^{K} \hat{y}_{jk} \log(y_{jk})$$

Besides, as training goes on, we adopt a dynamic ground truth to make the clustering more effective and step over local minima. In order to effectively avoid non-optimal local minima and steadily increase task performance throughout training, we update the ground truth whenever we obtain a better clustering solution with a significant improvement according to a paired t-test ($\alpha = $ ). We combine the surrogate loss $L_s$ and the estimated task-based loss T as the estimated clustering loss:

$$L_c = \gamma \ast L_s + \beta \ast \mathrm{T}$$

The hyper-parameters $\gamma$ and $\beta$ depend on the estimation error $L_e$: when $L_e$ is below the threshold $\varepsilon$, let $\gamma = 0$, $\beta = 1$; otherwise let $\gamma = 1$, $\beta = 0$. This switching setting bridges the supervision from both the labels and the task-based criterion. In fact, it enables BTOGCN to "warm up" both the clustering and the task-oriented estimator T using the designed surrogate loss at the early stage. Intuitively, by utilizing the supervision from the dynamic labels mentioned above to "warm up" a reasonable clustering with the surrogate loss function, the task-oriented estimator is not used to guide the clustering towards more reasonable clusters until it estimates the task-based loss well enough, and vice versa. In sum, a well-learned task-oriented estimator also improves the clustering, which collaboratively forms a virtuous circle for the learning of both the task-oriented estimator network and the clustering network.
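The $\gamma/\beta$ switching just described can be summarized in a few lines; the threshold name `eps` and the function signature are assumptions for illustration.

```python
def clustering_loss(l_s, t_score, l_e, eps):
    """Return L_c = gamma * L_s + beta * T, switching once the estimator is accurate enough."""
    if l_e < eps:          # estimator trusted: follow the task-oriented signal
        gamma, beta = 0.0, 1.0
    else:                  # warm-up: rely on the surrogate (heuristic-label) loss
        gamma, beta = 1.0, 0.0
    return gamma * l_s + beta * t_score
```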
Algorithm 1
Balanced Task-oriented Graph Clustering Network.
Input: $x$ is a balanced order batching graph sampled from the training set; $y$ is the label generated by a heuristic method; C is the clustering network; the significance $\alpha$ is the threshold for the paired t-test.
1. Calculate the soft clustering assignment matrix $\hat{y} = \mathrm{C}(E(x))$.
2. Execute the greedy search on $\hat{y}$ to obtain the clustering result $y'$.
3. Evaluate $y'$ using the picking route distances calculated by the TSP algorithm and get the real task-based loss $L_t(y') = \sum_{k=1}^{K} d_k$.
4. If $L_t(y) > L_t(y')$ and OneSidedPairedTTest$(y, y') < \alpha$, then update the heuristic label $y = y'$.
5. Encode the input graph $x$ into $E(x)$ via the HetGNN E.
6. Obtain the estimated task-based score $\mathrm{T}(E(x), \hat{y}, y)$.
7. Let the estimated clustering loss be $L_c = \gamma \ast L_s + \beta \ast \mathrm{T}$.
8. Update the clustering network C and the HetGNN E by minimizing $L_c$.
9. Update the task-oriented estimator T and the HetGNN E by minimizing the MSE between the task-based loss and the estimated score, $L_e(\mathrm{T}(E(x), \hat{y}, y), L_t(y'))$.

In this computational experiment, we test our method and compare it with existing methods on a set of real logistics orders drawn from real-world order data in Cainiao WMS.
We collect a graph dataset for the balanced order batching problem based on real-world logistics order data in Cainiao WMS. In particular, for convenience, each graph is always constructed from 500 online orders. The task is simplified to finding the best solution that evenly assigns these 500 orders to 20 batches (i.e., each with 25 orders). A solution is evaluated by the total walking distance needed for retrieving all items of these 20 batches. We treat the problem of picking up all items in one batch as a TSP and solve it with OR-Tools [15]. The unit of walking distance is the meter.
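For reference, the sketch below shows one plausible way to score a single batch with OR-Tools, as in the evaluation above: a one-vehicle routing model over the batch's item locations whose objective value is the picking route length. The toy distance matrix and the choice of search parameters are assumptions for illustration.

```python
# Solve the picking route of one batch as a TSP with OR-Tools (pip install ortools).
from ortools.constraint_solver import pywrapcp, routing_enums_pb2

def batch_picking_distance(dist_matrix, depot=0):
    """dist_matrix: integer distances between the batch's item locations (depot included)."""
    n = len(dist_matrix)
    manager = pywrapcp.RoutingIndexManager(n, 1, depot)   # n locations, 1 picker, start at depot
    routing = pywrapcp.RoutingModel(manager)

    def transit(from_index, to_index):
        return dist_matrix[manager.IndexToNode(from_index)][manager.IndexToNode(to_index)]

    cb = routing.RegisterTransitCallback(transit)
    routing.SetArcCostEvaluatorOfAllVehicles(cb)

    params = pywrapcp.DefaultRoutingSearchParameters()
    params.first_solution_strategy = routing_enums_pb2.FirstSolutionStrategy.PATH_CHEAPEST_ARC
    solution = routing.SolveWithParameters(params)
    return solution.ObjectiveValue() if solution else None

# Toy 4-location batch (symmetric distances in meters).
d = [[0, 2, 9, 10], [2, 0, 6, 4], [9, 6, 0, 8], [10, 4, 8, 0]]
print(batch_picking_distance(d))
```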
Specifically, to demonstrate the effectiveness of the task-oriented objective, we compare BTOGCN with the model which only utilizes supervised heuristic labels to predict a solution in Section 4.2.1; we call it Supervised-BTOGCN. Note that, apart from the heuristics for the BOBP, the learning-based baselines are so-called two-stage methods, which can only produce node embeddings or clusters with no balance guarantee. Therefore, in order to get balanced clusters, we adopt a balanced k-means algorithm (BKM) [27]. After that, we calculate the total walking distance of picking up all items in these batches as the task score by calling OR-Tools to solve a TSP for each batch.

As alternative methods, we consider the following well-known approaches as our baselines, which are capable of producing clusters (batches) on our problem sets:
• BKM: run BKM on the orders' original high-dimensional features, which contain the items' warehouse coordinates.
• Heuristics: a state-of-the-art algorithm proposed by [20].
• AE+BKM: run BKM on the order embeddings produced by an auto-encoder based method [19].
• DEC+Assign: run DEC [42] to get a relatively balanced soft assignment matrix, then apply the greedy assignment algorithm to satisfy the strict cluster size constraint.
Notably, since the DEC-based and AE-based methods focus on node embeddings in which the entire dataset shares the same cluster centers, they cannot be applied to multi-graphs with different cluster centers.
We use the 2-layer HetGNN with 128 hidden units mentioned in Section 4.1 as the graph embedding extractor E. To encode the order-to-order edges, a 2-layer BiLSTM with 128 hidden units is used to get the initial route embeddings. The task-oriented estimator T is a 4-layer fully-connected neural network with 128, 128, 128 and 1 hidden units. We train our model with the Adam optimizer [22] with an initial learning rate of $10^{-}$ and decay by a factor of 0.96. More implementation details can be found in our code, which will be shared upon paper acceptance. The implementation details of the other baseline methods can be seen in Appendix B.1.
Table 1 presents the results of our proposed model compared to non-learned baselines and state-of-the-art two-stage deep learning techniques. As mentioned above, AE+BKM is a two-stage clustering method which first learns node embeddings individually and then performs clustering on them, while DEC is an unconstrained end-to-end clustering method whose clustering results can meet the balance requirements with the help of the assignment algorithm; the whole method is denoted as DEC+Assign. AE+BKM and DEC+Assign are then evaluated "off-line" by an optimization algorithm, namely TSP. In the single-graph experiments, we use 20 graphs in total with online data of different dates and train our model on each separately.

Firstly, our proposed methods Supervised-BTOGCN and BTOGCN remarkably outperform BKM, AE+BKM and DEC+Assign, which means the two-stage models suffer from the lack of guidance from the task goal. In particular, our methods even obtain better solutions than the well-designed heuristics, which demonstrates the superiority of our model in transforming a complex heuristic into a simple balanced k-means with a solution that is no worse. Secondly, although the best batch in the heuristic solution is better, the overall performance over all batches is no better than our method; in other words, BTOGCN places emphasis on global optimization rather than on only one or two batches of good quality. Moreover, thanks to the contribution of the task-oriented network, BTOGCN shows a further improvement compared to Supervised-BTOGCN. Finally, another interesting observation is that the end-to-end deep clustering method DEC+Assign has the worst performance, which supports our assumption that deep clustering methods find a latent "clustering-friendly" representation space which, however, may not be "friendly" to the cluster-based task. In addition, compared to BKM, AE+BKM gains a significant improvement owing to the representation learning technique.
Table 1: Results on the single-graph set. "Avg Batch Score", "Max Batch Score" and "Min Batch Score" represent the average of the mean/maximum/minimum picking route distance of each graph in the single-graph set, respectively.
Methods Avg Batch Score Max Batch Score Min Batch Score
Heuristics
BKM
DEC+Assign
AE+BKM
Supervised-BTOGCN
BTOGCN
Table 2: Results on the multi-graph set. "Avg Batch Score", "Max Batch Score" and "Min Batch Score" represent the average of the mean/maximum/minimum picking route distance of each graph in the multi-graph set, respectively.
Methods Avg Batch Score Max Batch Score Min Batch Score
Heuristics
BKM
Supervised-BTOGCN
BTOGCN
Then, we investigate the generalization of our model on multi-graphs, since the order pool changes ceaselessly with the influx and consumption of orders in the warehouse. Specifically, we hope our approach can work on multiple graphs drawn from some order pool distribution so that it can be applied for online inference on unseen graphs. We fix 40 graphs for training, 2 for validation, and 8 for testing. As shown in Table 2, the task-oriented method surpasses all other methods thanks to the benefit of the task-oriented estimator. We can conclude that the learned model successfully generalizes to completely unseen graphs. Although the heuristic method obtains relatively good results, it is time-consuming; its result under a limited time budget is even worse than the vanilla BKM in the quasi-real-time production system. Taking advantage of estimating the task-based loss with a task-oriented estimator network, BTOGCN boosts the distance reduction by 3.99m per graph compared with the Supervised-BTOGCN model trained with supervised labels.
The balanced order batching results of different methods are executed in a warehouse by the order picker. We randomly choose a balanced order batching graph and run a simulation in a warehouse to illustrate the higher performance of our proposed method. As shown in Figure 5, for visual clarity, we only draw the top-3 pick lists produced by BTOGCN on the map of the Cainiao warehouse in Hangzhou. Results of the other methods can be seen in Appendix B.2. The results demonstrate that BTOGCN can produce more reasonable order batches than the other methods.
This work aims to solve a practical problem existing widely in warehouses, namely the balanced order batching problem. In this paper, we propose a method combining machine learning with an optimization algorithm to solve the balanced order batching problem (BOBP), which can directly and effectively improve the efficiency of warehouse operations. We first give a formal problem definition, then reduce the BOBP to a balanced graph clustering optimization problem, whose task-based evaluation criterion is not differentiable.
Figure 5: Top-3 pick lists generated by BTOGCN.

To tackle this specific problem, we propose a novel end-to-end approach, called the Balanced Task-Oriented Graph Clustering Network (BTOGCN), which consists of an elaborately designed type-aware heterogeneous graph clustering network and a task-based estimator, and which can automatically integrate the task-based evaluation criterion into the learning process with respect to the task-based goal. In our evaluation, we perform extensive analysis to demonstrate the highly positive effect of our proposed method on the balanced order batching problem. In future work, we plan to use this framework to solve other combinatorial optimization problems, such as the CVRP and the bin packing problem.
REFERENCES
[1] Ashwin Bahulkar, Boleslaw K Szymanski, N Orkun Baycik, and Thomas C Sharkey. 2018. Community detection with edge augmentation in criminal networks. IEEE, 1168–1175.
[2] Irwan Bello, Hieu Pham, Quoc V Le, Mohammad Norouzi, and Samy Bengio. 2016. Neural combinatorial optimization with reinforcement learning. arXiv preprint arXiv:1611.09940 (2016).
[3] Yoshua Bengio. 1997. Using a financial training criterion rather than a prediction criterion. International Journal of Neural Systems 8, 04 (1997), 433–443.
[4] Giulia Berlusconi, Francesco Calderoni, Nicola Parolini, Marco Verani, and Carlo Piccardi. 2016. Link prediction in criminal networks: A tool for criminal intelligence analysis. PloS one 11, 4 (2016).
[5] Matthew Burgess, Eytan Adar, and Michael Cafarella. 2016. Link-prediction enhanced consensus clustering for complex networks. PloS one 11, 5 (2016).
[6] Sandro Cavallari, Vincent W Zheng, Hongyun Cai, Kevin Chen-Chuan Chang, and Erik Cambria. 2017. Learning community embedding with community detection and node embedding on graphs. In Proceedings of the 2017 ACM on Conference on Information and Knowledge Management. 377–386.
[7] Di Chen, Yada Zhu, Xiaodong Cui, and Carla P Gomes. 2019. Task-Based Learning via Task-Oriented Prediction Network. arXiv preprint arXiv:1910.09357 (2019).
[8] Priya Donti, Brandon Amos, and J Zico Kolter. 2017. Task-based end-to-end model learning in stochastic optimization. In Advances in Neural Information Processing Systems. 5484–5494.
[9] Adam N Elmachtoub and Paul Grigas. 2017. Smart "predict, then optimize". arXiv preprint arXiv:1710.08005 (2017).
[10] Aaron Ferber, Bryan Wilder, Bistra Dilkina, and Milind Tambe. 2019. MIPaaL: Mixed integer program as a layer. arXiv preprint arXiv:1907.05912 (2019).
[11] Edward Frazelle and Ed Frazelle. 2002. World-class Warehousing and Material Handling. Vol. 1. McGraw-Hill, New York.
[12] Noud Gademann and Steef Velde. 2005. Order batching to minimize total travel time in a parallel-aisle warehouse. IIE Transactions 37, 1 (2005), 63–75.
[13] Boyan Gao, Yongxin Yang, Henry Gouk, and Timothy M Hospedales. 2019. Deep clustering with concrete k-means. arXiv preprint arXiv:1910.08031 (2019).
[14] Aude Genevay, Gabriel Dulac-Arnold, and Jean-Philippe Vert. 2019. Differentiable deep clustering with cluster size constraints. arXiv preprint arXiv:1910.09036 (2019).
[15] Google. 2019. OR-Tools. https://developers.google.com/optimization
[16] Xifeng Guo, Long Gao, Xinwang Liu, and Jianping Yin. 2017. Improved deep embedded clustering with local structure preservation. In IJCAI. 1753–1759.
[17] Will Hamilton, Zhitao Ying, and Jure Leskovec. 2017. Inductive representation learning on large graphs. In Advances in Neural Information Processing Systems. 1024–1034.
[18] Sebastian Henn, Sören Koch, Karl F Doerner, Christine Strauss, and Gerhard Wäscher. 2010. Metaheuristics for the order batching problem in manual order picking systems. Business Research 3, 1 (2010), 82–105.
[19] Geoffrey E Hinton, Simon Osindero, and Yee-Whye Teh. 2006. A fast learning algorithm for deep belief nets. Neural Computation 18, 7 (2006), 1527–1554.
[20] Ying-Chin Ho, Teng-Sheng Su, and Zhi-Bin Shi. 2008. Order-batching methods for an order-picking warehouse with two cross aisles. Computers & Industrial Engineering 55, 2 (2008), 321–347.
[21] Zhiheng Huang, Wei Xu, and Kai Yu. 2015. Bidirectional LSTM-CRF models for sequence tagging. arXiv preprint arXiv:1508.01991 (2015).
[22] Diederik P Kingma and Jimmy Ba. 2014. Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014).
[23] Thomas N Kipf and Max Welling. 2016. Semi-supervised classification with graph convolutional networks. arXiv preprint arXiv:1609.02907 (2016).
[24] Wouter Kool, Herke Van Hoof, and Max Welling. 2018. Attention, learn to solve routing problems! arXiv preprint arXiv:1803.08475 (2018).
[25] Timothy P Lillicrap, Jonathan J Hunt, Alexander Pritzel, Nicolas Heess, Tom Erez, Yuval Tassa, David Silver, and Daan Wierstra. 2015. Continuous control with deep reinforcement learning. arXiv preprint arXiv:1509.02971 (2015).
[26] Ziqi Liu, Chaochao Chen, Xinxing Yang, Jun Zhou, Xiaolong Li, and Le Song. 2018. Heterogeneous graph neural networks for malicious account detection. In Proceedings of the 27th ACM International Conference on Information and Knowledge Management. 2077–2085.
[27] Mikko I Malinen and Pasi Fränti. 2014. Balanced k-means for clustering. In Joint IAPR International Workshops on Statistical Techniques in Pattern Recognition (SPR) and Structural and Syntactic Pattern Recognition (SSPR). Springer, 32–41.
[28] Vinod Nair and Geoffrey E Hinton. 2010. Rectified linear units improve restricted Boltzmann machines. In Proceedings of the 27th International Conference on Machine Learning (ICML-10). 807–814.
[29] Luis Perez and Jason Wang. 2017. The effectiveness of data augmentation in image classification using deep learning. arXiv preprint arXiv:1712.04621 (2017).
[30] Andrew Perrault, Bryan Wilder, Eric Ewing, Aditya Mate, Bistra Dilkina, and Milind Tambe. 2019. Decision-focused learning of adversary behavior in security games. arXiv preprint arXiv:1903.00958 (2019).
[31] Ted K Ralphs, Leonid Kopman, William R Pulleyblank, and Leslie E Trotter. 2003. On the capacitated vehicle routing problem. Mathematical Programming 94, 2-3 (2003), 343–359.
[32] Benedek Rozemberczki, Ryan Davies, Rik Sarkar, and Charles Sutton. 2019. GEMSEC: Graph embedding with self clustering. In Proceedings of the 2019 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining. 65–72.
[33] Chao Shang, Qinqing Liu, Ko-Shin Chen, Jiangwen Sun, Jin Lu, Jinfeng Yi, and Jinbo Bi. 2018. Edge attention-based multi-relational graph convolutional networks. arXiv preprint arXiv:1802.04944 (2018).
[34] Suo-Yi Tan, Jun Wu, Linyuan Lü, Meng-Jun Li, and Xin Lu. 2016. Efficient network disintegration under incomplete information: the comic effect of link prediction. Scientific Reports 6, 1 (2016), 1–9.
[35] James A Tompkins, John A White, Yavuz A Bozer, and Jose Mario Azaña Tanchoco. 2010. Facilities Planning. John Wiley & Sons.
[36] Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N Gomez, Łukasz Kaiser, and Illia Polosukhin. 2017. Attention is all you need. In Advances in Neural Information Processing Systems. 5998–6008.
[37] Petar Veličković, Guillem Cucurull, Arantxa Casanova, Adriana Romero, Pietro Lio, and Yoshua Bengio. 2017. Graph attention networks. arXiv preprint arXiv:1710.10903 (2017).
[38] Oriol Vinyals, Meire Fortunato, and Navdeep Jaitly. 2015. Pointer networks. In Advances in Neural Information Processing Systems. 2692–2700.
[39] Jizhe Wang, Pipei Huang, Huan Zhao, Zhibo Zhang, Binqiang Zhao, and Dik Lun Lee. 2018. Billion-scale commodity embedding for e-commerce recommendation in Alibaba. In Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining. 839–848.
[40] Bryan Wilder, Bistra Dilkina, and Milind Tambe. 2019. Melding the data-decisions pipeline: Decision-focused learning for combinatorial optimization. In Proceedings of the AAAI Conference on Artificial Intelligence, Vol. 33. 1658–1665.
[41] Bryan Wilder, Eric Ewing, Bistra Dilkina, and Milind Tambe. 2019. End to end learning and optimization on graphs. In Advances in Neural Information Processing Systems. 4674–4685.
[42] Junyuan Xie, Ross Girshick, and Ali Farhadi. 2016. Unsupervised deep embedding for clustering analysis. In International Conference on Machine Learning. 478–487.
[43] Bowen Yan and Steve Gregory. 2012. Detecting community structure in networks using edge prediction methods. Journal of Statistical Mechanics: Theory and Experiment (2012).
[44] Bo Yang, Xiao Fu, Nicholas D Sidiropoulos, and Mingyi Hong. 2017. Towards k-means-friendly spaces: Simultaneous deep learning and clustering. In Proceedings of the 34th International Conference on Machine Learning - Volume 70. JMLR.org, 3861–3870.
[45] Hongjing Zhang, Sugato Basu, and Ian Davidson. 2019. A framework for deep constrained clustering - Algorithms and advances. arXiv preprint arXiv:1901.10061 (2019).
[46] Shichao Zhu, Chuan Zhou, Shirui Pan, Xingquan Zhu, and Bin Wang. 2019. Relation structure-aware heterogeneous graph neural network. IEEE, 1534–1539.
A ALGORITHM
A.1 Heterogeneous Graph Convolutional Networks
Algorithm 2
Heterogeneous Graph Convolutional Networks on the balanced order batching graph.
Input:
Balanced order batching graph $G(V, E)$, $V = \{O, I\}$, where $O$ is the set of candidate orders and $I$ is the set of item nodes included in the candidate orders; edge types $R = \{oo, oi, ii\}$; number of layers $L$.
Output:
Hidden states of the $L$-th layer, including the hidden states of the order nodes: $z_o, \ \forall o \in O$.

for $l = 1$ to $L$ do
  for $e \in E$ do
    if $r == oo$ then
      $h^{l+1}_{e_{oo}} = \sigma\left(W^l_{e_{oo}} \cdot f^{l+1}_e\left(h^l_{e_{oo}}, h^l_{o_1}, h^l_{o_2}\right)\right)$, where $o_1, o_2$ are the orders linked by $e_{oo}$
    end if
  end for
  for $j \in V$ do
    for $\forall e = (m, j, r) \in E_{j,r}$ do
      if $r == oo$ then $\hat{h}^l_{e_r} = h^l_{e_{oo}}$ else $\hat{h}^l_{e_r} = \varnothing$ end if
      $h^{l+1}_{\phi(j),m,r} = W^{l+1}_r \left( h^l_m \,\|\, \hat{h}^l_{e_r} \right)$
      $attn^{l+1}_{m,j,r} = \sigma\left( f^{l+1}_r \left( h^l_j, h^{l+1}_{\phi(j),m,r} \right) \right)$
      if $r == oo$ then $attn^{l+1}_{m,j,r} = attn^{l+1}_{m,j,r} \ast d_{e_{mj}}$ end if
      $\alpha^{l+1}_{m,j,r} = \mathrm{softmax}\left( attn^{l+1}_{m,j,r} \right)$
    end for
  end for
  for $o \in O$ do
    $h^{l+1}_{N(o),r} = \sigma\left( \sum_{e=(m,o,r) \in E_{o,r}} \alpha^{l+1}_{m,o,r} h^{l+1}_{\phi(j),m,r} \right)$
    $h^{l+1}_o = ATTN^{l+1}_O\left( h^l_o, \{ H^{l+1}_{N(o),oi}, H^{l+1}_{N(o),oo}, h^l_o \} \right)$
  end for
  for $i \in I$ do
    $h^{l+1}_{N(i),r} = \sigma\left( \sum_{e=(m,i,r) \in E_{i,r}} \alpha^{l+1}_{m,i,r} h^{l+1}_{\phi(j),m,r} \right)$
    $h^{l+1}_i = ATTN^{l+1}_I\left( h^l_i, \{ H^{l+1}_{N(i),ii}, H^{l+1}_{N(i),oi}, h^l_i \} \right)$
  end for
end for
for $o \in O$ do
  $z_o = h^L_o$
end for

A.2 Type-aware Sampling Strategy
We summarize the type-aware sampling strategy as follows:
• For $oo$ and $ii$ edges, we choose the $M$ closest orders/items.
• For $oi$ edges, when the number of candidates is greater than the number of samples, i.e. $P$, we randomly sample $P$ items/orders. When the number of candidates is less than $P$, we pad them with placeholders and ignore all computations related to these placeholders.
Different from a random sampling strategy, we leverage the type information and propose type-aware sampling. Our sampling strategy is more reasonable than random sampling in two aspects. First, different types of edges should have their own sampling strategies. For $oo$ and $ii$ edges, choosing the closest orders/items is more reasonable than random sub-sampling since the closest orders/items are more likely to cluster together. Meanwhile, padding is more reasonable than re-sampling for $oi$ edges because the inclusion relation between orders and items is often sparse; padding avoids changing the neighborhood distribution compared to re-sampling. In this way, we achieve a comparable result with small $M$ and $P$, thus saving training time as well as reducing memory consumption.

B EXPERIMENTAL DETAILS
B.1 Hyper-parameters of baseline models
We use the majority of the publicly available code released by the authors to report the performance of DEC. The auto-encoder used in all experiments is a basic auto-encoder architecture. Its encoder is a fully connected multilayer perceptron with dimensions $m$ – – – 32, where $m$ is the original data space dimension. The decoder is a mirrored version of the encoder. All layers except the one preceding the output layer are followed by a ReLU activation function. We train the model with the Adam optimizer [22] with an initial learning rate of $10^{-}$ , $\lambda = $ , $\beta_1 = $ and $\beta_2 = $ .

B.2 Experimental Results
Order picking simulation with other methods is shown as follows:
Figure 6: Top-3 pick lists generated by AE+BKM