Augmented Reality Chess Analyzer (ARChessAnalyzer): In-Device Inference of Physical Chess Game Positions through Board Segmentation and Piece Recognition using Convolutional Neural Network
Anav Mehta
Cupertino High School, Cupertino, CA, USA
September 4, 2020
Abstract
Chess game position analysis is important in improving one's game. It requires entry of moves into a chess engine, which is cumbersome and error prone. We present ARChessAnalyzer, a complete pipeline from live image capture of a physical chess game, to board and piece recognition, to move analysis and finally to Augmented Reality (AR) overlay of the chess diagram position and move on the physical board. ARChessAnalyzer works like a scene analyzer: it uses an ensemble of traditional image and vision techniques to segment the scene (i.e. the chess game) and uses Convolutional Neural Networks (CNNs) to predict the segmented pieces, combining the results to analyze the game. This paper advances the state of the art in the first-of-its-kind end-to-end integration of robust detection and segmentation of the board, chess piece detection using the fine-tuned AlexNet CNN, and a chess engine analyzer in a handheld device app. The accuracy of the entire chess position prediction pipeline is 93.45%, and it takes 3-4.5 sec from live capture to AR overlay. We also validated our hypothesis that ARChessAnalyzer is faster at analysis than manual entry for all board positions with valid outcomes. Our hope is that the instantaneous feedback this app provides will help chess learners worldwide at all levels improve their game.
Previous Work
Chess position detection can be broadly separated into two areas. The first, board recognition and segmentation, refers to the detection of the chess board within the image and segmentation of the board into 64 images. This is a prerequisite to the second step, piece recognition, which is a classification of those images as empty or as any of the 12 categories of chess pieces.
Chessboard detection has traditionally been solved in the context of camera calibration using hints such as marking the corners, at the expense of versatility. There have also been dedicated hardware attempts at chessboard detection [5]. Recent computer vision advances have led to general techniques for board recognition that can be separated into corner-based and line-based approaches [6]. Corner-based approaches identify the corners of the chess board, then either perform Hough transforms to identify the lines or assign coordinates to the corners directly. These approaches either assume a plain background, so as to reduce the number of corner-generating artifacts; use a top-down overhead view; or require the absence of pieces from the board, in order to prevent the occlusion of corners or lattice points by pieces. Line-based approaches use edge detection on the input image to identify lines of the chess board. Domain knowledge, such as the fact that the board can be identified by 18 total lines and that the orientations of half those lines are orthogonal to the other half, makes line-based approaches more robust to noise, and they are therefore the more popular technique [8]. Some proposed corner-based approaches with average-quality results, while others have used exclusively line-based methods [7]. [8] introduced the classification of such methods into line-based and corner-based approaches but was limited to recognizing chessboards without chess pieces on them.
Initial attempts at piece recognition were game-tracking applications that assume the starting positions of the pieces and can therefore use differences in intensity values after each move to track the movement of pieces [5]. Techniques that do not assume a starting position focus on color segmentation to detect pieces and then use shape descriptors to identify them. However, color segmentation often relies on square and piece color combinations, so unreasonable constraints are placed on the type of chessboard, or a side view is required that conveys depth but occludes most of the pieces. This paper uses a more robust piece recognition solution based on trained object detection CNN models.
Materials and Methods
Figure 1: Position detection pipeline

We have designed an ensemble of methods and tools with engineering tradeoffs, while advancing the state of the art, to develop the entire chess position pipeline in an iOS app (Figure 1). The following are the major steps.

• Detect Chessboard: A precursor to the entire pipeline, this step determines the presence of a chessboard using a simple binary image classifier.
• Segment Chessboard: OpenCV image and vision techniques are used to determine the outer bounds of the chessboard and segment the image into 64 squares.
• Predict Chessboard: A pretrained CNN object detector model is used to predict the 64 images and form the position string.
• Analyze Chessboard: The string is fed into an open source chess engine, Stockfish, to determine the next best move.
• Augmented Reality Overlay Chessboard: The next move and position are overlayed on top of the existing chessboard for the player.

Detecting and Segmenting Chessboard
In order to begin predicting the board, it is necessary to detect a chessboard. For this, we developed a simple binary image classifier trained using 95 images with an 80/20% training/test split. We provide user feedback prompting the user to hold the camera steady while the board is detected. This took less than 1 sec from first pointing at the board.
The algorithm consists of five main steps (Figure 2).

• Detecting straight and parallel lines: Besides standard line detection, the additional objective is to merge all small segments that are nearly collinear into long straight lines and then identify the parallel lines.
  – Edge detection: Smooth the image, then use adaptive Canny edge detection with a gradient threshold and an adaptive histogram equalization mask.
  – Line detection: Use HoughLines to detect segments, removing small lines and merging gaps between adjacent collinear lines.
  – Grouping: Separate segments into groups of nearly collinear segments and merge their ends.
  – Merging: Analyze and merge the segments in each group using an M-estimator, resulting in one normalized straight line per group.
  – Filtering: Remove non-parallel lines. In this case, segments are bound within 10° of horizontal and vertical.
• Determining the bounding box: After the lines have been detected, the next step is finding their intersections. Once the intersection points have been determined, they are merged using K-means clustering. The minimum and maximum of those intersecting points give the bounding box. A projective transform then maps the bounding box to an axis-aligned square for all three color channels. AlexNet [4], the next stage of the chess piece detection pipeline, expects each image to be 227 × 227.
• Segmenting into 8x8 squares: An array of 8x8 images is created to be fed into the piece detection estimator.
Figure 3 summarizes the five steps in the development of the model. We will now cover each of these steps in detail.

Figure 2: Segmentation pipeline (Orig, Gray, Blur, Canny, Hough, Points, Bounded, Transformed, Segmented)

Figure 3: ARChessAnalyzer model generation pipeline

Table 1: Chess piece database
Label:  empty  br   bn   bb   bq   bk   bp   wr   wn   wb   wq   wk   wp   tot
Count:  123    212  205  204  215  214  201  202  207  210  212  210  207  2622
Figure 4: Model preparation and training
Since there was a lack of labeled chess pieces, a database of approximately 2,600 chess piece images was manually constructed from one tournament chess set (Table 1). They were images of individual pieces placed on the board, manually labelled with one of 13 classes – white and black pawn, knight, bishop, rook, queen and king, plus empty (Figure 9).
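As a reference point for the position string used later in the pipeline, the 13 class labels above map directly onto FEN piece letters [9]. A minimal sketch (the helper name is ours) of folding 64 per-square labels into the FEN placement field:

```python
# Map the 13 class labels (Table 1) to FEN piece letters; "empty" has no letter.
LABEL_TO_FEN = {
    "wp": "P", "wn": "N", "wb": "B", "wr": "R", "wq": "Q", "wk": "K",
    "bp": "p", "bn": "n", "bb": "b", "br": "r", "bq": "q", "bk": "k",
}

def labels_to_fen_placement(labels_64):
    """Fold 64 per-square labels (rank 8 first, a-file first) into the FEN piece field."""
    ranks = []
    for r in range(8):
        row, empties = "", 0
        for label in labels_64[r * 8:(r + 1) * 8]:
            if label == "empty":
                empties += 1           # runs of empty squares become digits
            else:
                if empties:
                    row += str(empties)
                    empties = 0
                row += LABEL_TO_FEN[label]
        if empties:
            row += str(empties)
        ranks.append(row)
    return "/".join(ranks)             # ranks are separated by "/" in FEN
```

For the starting position this yields the familiar "rnbqkbnr/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR".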
Figure 4 gives a summary of the preprocessing steps. Images are resized to 227 by 227 by 3 (color channels) to fit the input of AlexNet. To improve the performance of the CNN [13], the data set is also augmented with transformations such as cropping, flipping and blurring (Figure 5), approximately doubling its size.

AlexNet (Figure 6) is described in one of the most influential papers in deep learning, employing CNNs and GPUs to accelerate training [4]. It contains eight layers: the first five are convolutional layers, some of them followed by max-pooling layers, and the last three are fully connected layers; the output activation function is a SoftMax. While we did try other CNNs, they overfit the data almost immediately, unlike AlexNet. This is possibly due to the combination of imbalance and small size of the data set.
• Transfer Learning: Huge data sets with upwards of a million images are necessary to train a CNN from scratch with randomly initialized weights. Too little data, as in our case, would cause a model to overfit. The original AlexNet model was trained using approximately 1.3 million images (1000 object classes) from the 2012 ImageNet data set. Weights and layers from the original AlexNet were used as a starting point and fine-tuned with the pre-processed images using a batch size of 64. Transfer learning [11] leverages previously learned low-level features (such as lines, edges and curves) and requires less data to arrive at a satisfactory CNN.

Figure 6: AlexNet [4]
• Model Precision: The AlexNet model was fine-tuned at 32-bit precision (FP32), but it was reduced to 16-bit (FP16) with the Core ML tools during model conversion, to fit within the size limits of the app. State-of-the-art hardware for deep neural networks (DNNs) is moving from 32-bit computation towards 16-bit precision, due to energy efficiency and smaller associated storage. Recently, [10] showed successful training of DNNs using 8-bit floating point (FP8) while fully maintaining accuracy on a spectrum of models and datasets.
Many tools went into this research. AlexNet was trained using the Caffe framework with Nvidia Tesla K80 GPUs on the Google Colaboratory platform using Python 3.7. The OpenCV library was used for board segmentation. A standard iOS mobile development setup was used: the app was built with the Swift 5 programming language in the Xcode integrated development environment, with bridges to OpenCV and Stockfish. Figure 7 shows some screenshots.

Figure 7: Screenshots of the ARChessAnalyzer app (initial screenshot; Canny edges and Hough line segmentation; Augmented Reality overlay)

Experiments And Results
A total of 720 positions were analyzed. Each board image was tagged with an expected FEN string, such as the following starting board position:

rnbqkbnr/pppppppp/8/8/4P3/8/PPPP1PPP/RNBQKBNR b KQkq e3 0 1

and that was compared to the observed string.

Accuracy on the piece pipeline increased approximately 1.97% with augmentation (Figure 8). For the rest of the paper, the augmented model is used.

Figure 8: Increase in accuracy after data augmentation (percentage accuracy per class)

AlexNet was fine-tuned using 32-bit precision and later downsized to 16-bit (using Core ML) due to the size constraints of the iOS App Store. The degradation in accuracy is 0.78% (Table 2). For the rest of the paper, FP16 is used.

Table 2: Accuracy and model precision

AlexNet model (bit)   Size    Accuracy
FP32                  221MB   94.23%
FP16                  106MB   93.45%
FP8                   48MB    93.15%

Saliency Maps
Saliency maps help visualize the inner workings of CNNs via a heat map highlighting the image features that the model focuses on [14]. Saliency maps were generated on a few images to confirm that the classifiers identify the correct regions of interest for particular labels (Figure 9).

Figure 9: Saliency maps (black and white rook, bishop, knight, king, queen and pawn)

Classification Metrics
Generalization properties for multiclass classification derived from the confusion matrix (Table 3) of a classifier are used as a measure of its quality. The following performance metrics are evaluated: accuracy (Eq 1), sensitivity (Eq 2), specificity (Eq 3), and F1 score (Eq 4). Figure 10 displays all of the scores.

accuracy = all correct / all samples = (TP + TN) / (TP + TN + FP + FN)   (1)

sensitivity = TP / all positives = TP / (TP + FN)   (2)

specificity = TN / all negatives = TN / (TN + FP)   (3)

F1 score = 2 × sensitivity × specificity / (sensitivity + specificity)   (4)

Table 3: Confusion matrix between actual and predicted pieces
Actual (rows) vs. predicted (columns):

        empty   br    bn    bb    bq    bk    bp    wr    wn    wb    wq    wk    wp
empty   31454   5     4     5     0     1     4     4     0     0     0     0     0
br      0       1345  0     50    16    21    35    0     0     0     0     0     0
bn      0       0     1330  1     4     2     0     0     0     0     0     0     0
bb      4       50    0     1370  54    37    24    0     0     0     0     0     0
bq      0       0     0     100   760   50    0     0     0     0     0     0     0
bk      3       0     0     0     0     735   0     0     0     0     0     0     0
bp      0       0     0     51    10    5     5540  0     0     0     0     0     0
wr      3       0     0     0     0     0     0     1955  0     0     0     0     0
wn      0       0     0     0     0     0     1     0     1955  0     0     5     0
wb      0       0     0     0     0     0     0     0     0     1650  35    37    45
wq      2       0     0     0     0     0     0     0     0     0     754   0     0
wk      0       0     0     0     0     0     0     0     0     0     0     765   35
wp      0       0     0     0     0     0     0     34    27    43    34    10    5634
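The per-class metrics of Eqs 1-4 follow directly from a confusion matrix like Table 3, treating each class one-vs-rest. A minimal sketch (the function name is ours):

```python
import numpy as np

def per_class_metrics(cm):
    """Per-class one-vs-rest accuracy, sensitivity, specificity and F1 (Eqs 1-4).

    cm[i][j] is the number of samples of actual class i predicted as class j.
    """
    cm = np.asarray(cm, dtype=float)
    total = cm.sum()
    tp = np.diag(cm)
    fn = cm.sum(axis=1) - tp    # actual class i, predicted as something else
    fp = cm.sum(axis=0) - tp    # predicted class i, actually something else
    tn = total - tp - fn - fp
    accuracy = (tp + tn) / total
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    # F1 as defined in Eq 4, from sensitivity and specificity.
    f1 = 2 * sensitivity * specificity / (sensitivity + specificity)
    return accuracy, sensitivity, specificity, f1

# A perfectly diagonal confusion matrix scores 1.0 on every metric.
acc, sen, spe, f1 = per_class_metrics([[5, 0], [0, 5]])
assert np.allclose([acc, sen, spe, f1], 1.0)
```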
Figure 10: Accuracy, sensitivity, specificity and F1 score per class

Accuracy of Overall Pipeline

The overall accuracy of the position prediction string is 93.45%. A detailed analysis of the mismatched FEN strings was done considering the types of pieces on the board, the distribution of the pieces on the board, and the population of the board. First, there is a high rate of mis-classification among certain types of pieces, especially between queen, bishop, pawn and king (Table 3). Secondly, boards with a center distribution had higher accuracy than those with pieces at the edges or corners (Table 4). This is because of the bound clipping and projective transform, and fewer training images of pieces seen from the sides. The last source of error is piece population (Opening, Midgame, Endgame) (Table 5); as the game proceeds from opening to end, there are fewer mis-classification possibilities. More training images, especially of pieces with a high rate of mis-classification and taken not just from the top view but also from the sides; a slight board expansion after segmentation to incorporate the edge ring and capture all the features; and a two-step window algorithm processing the edge and corner pieces separately would possibly alleviate these issues. We will address this in future work.

Table 4: Accuracy of piece prediction based on position

Position  Accuracy
Corner    91.45%
Edge      92.87%
Center    96.73%
Total     93.45%
Corner: more than 50% of the four corner squares are occupied.
Edge: not Corner, and more than 50% of the 28 edge squares are occupied.
Center: neither Corner nor Edge.
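The Corner/Edge/Center definitions above can be made precise in code. In this sketch "edge" is read as the 28 border squares of the board (consistent with the count of 28 given above); that reading and the function name are our assumptions.

```python
def position_category(occupied):
    """Classify a board per the Corner/Edge/Center definitions above.

    `occupied` is a set of (file, rank) pairs, each coordinate in 0..7.
    """
    corners = {(0, 0), (0, 7), (7, 0), (7, 7)}
    # The 28 border squares of an 8x8 board (4*8 - 4 = 28, corners included).
    border = {(f, r) for f in range(8) for r in range(8) if f in (0, 7) or r in (0, 7)}
    if len(occupied & corners) > 2:     # more than 50% of the 4 corners occupied
        return "Corner"
    if len(occupied & border) > 14:     # more than 50% of the 28 edge squares occupied
        return "Edge"
    return "Center"
```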
Table 5: Modeled positions with expected and actual accuracy

Pos       empty br bn bb bq bk bp  wr wn wb wq wk wp   Exp %   Act %
Opening   32    2  2  2  1  1  8   2  2  2  1  1  8    89.72   91.02
          32    2  2  2  1  1  6   2  2  2  1  1  6    90.88
          39    2  2  2  1  1  5   1  2  2  1  1  5    91.12
          42    1  2  2  1  1  4   2  2  2  1  1  3    92.13
          43    0  2  2  1  1  4   2  2  2  1  1  3    92.36
Midgame   45    1  1  2  1  1  3   2  1  2  1  1  3    92.24   93.47
          48    1  1  1  1  1  3   1  1  1  1  1  3    93.02
          49    0  1  1  1  1  3   2  1  1  1  1  2    93.57
          49    1  1  1  1  1  2   2  1  1  1  1  2    93.49
          47    1  1  1  1  1  3   2  1  1  1  1  3    93.01
Endgame   56    1  0  0  1  1  1   1  0  0  1  1  1    94.71   95.24
          59    0  0  0  1  1  0   0  0  0  1  1  1    95.13
          57    0  1  0  1  1  0   0  1  1  1  1  0    95.29
          59    0  1  0  1  1  0   0  0  0  1  1  0    95.50
          60    0  1  0  0  1  0   0  0  0  1  1  0    95.83

Full board: 32 pieces.
Opening: full board minus approximately 8-10 pieces.
Middle game: full board minus approximately 15-20 pieces.
End game: 0-1 rooks, bishops, knights, 0-2 pawns, 0-1 queen.
Expected accuracies for the 5 sample positions in each category are calculated using Figure 10. Actual accuracies are the mean accuracies of those positions.

Hypothesis Testing For Speed of Analysis

Our hypothesis was that ARChessAnalyzer would be significantly faster than manual entry into a chess engine for valid outcomes. To verify our hypothesis, we set up our experiment as follows. For chess diagram entry and analysis, we chose chess.com, which uses Stockfish as its back-end engine. To pick a variety of chess diagrams, we chose five each from beginning, mid-game and end-game positions (Table 6). The results for manual entry and ARChessAnalyzer are tabulated in Table 7. Two parameters, µ and µa, corresponding to the average time taken for direct entry and via the app, are calculated, along with the standard deviations σ (Table 7). Since the sample size of each set is small (n = 5), we use the Student's t-distribution. Our Ha is one-tailed, with n − 1 = 4 degrees of freedom and a critical t value at the 99% confidence level.

Hypothesis H1, µa < µ: Using ARChessAnalyzer is faster for analyzing the chess game than manual entry, for valid outcomes.

Null Hypothesis H0, µa ≥ µ: Using ARChessAnalyzer has no improvement over direct Stockfish entry analysis, for valid outcomes.

Table 6: Game number, type and number of pieces

n  Beginning               Midgame              Endgame
1  Ruy Lopez 30(5)         Polish Immortal 31   Reti End Game 4
2  Italian 32(6)           Stamma 21            Lasker's Pin 6
3  Sicilian Defense 32(2)  Ponziani 14          Saavedra position 4
4  French Defense 30(6)    Abu Naim 10          Lucena position 5
5  Caro Kann 32(3)         Damiano 11           Philidor position 5

For beginning positions, both the number of pieces on the board and (in parentheses) the change in pieces from the starting position are shown.
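The t-statistic computation can be sketched as follows, using the endgame timing columns of Table 7 as sample data. An unpooled (Welch-style) two-sample form is assumed here, and the function name is ours.

```python
from math import sqrt
from statistics import mean, stdev

def t_statistic(slow, fast):
    """Two-sample t-statistic for H1: mean(slow) > mean(fast), unpooled variances."""
    return (mean(slow) - mean(fast)) / sqrt(
        stdev(slow) ** 2 / len(slow) + stdev(fast) ** 2 / len(fast))

# Endgame analysis times (sec) from Table 7: manual entry vs. ARChessAnalyzer.
manual = [15.90, 12.45, 32.67, 15.32, 25.67]
app = [3.40, 3.50, 3.30, 2.20, 3.10]
t = t_statistic(manual, app)
assert t > 0  # manual entry is slower on average
```

With n = 5 per sample, the statistic is compared against the one-tailed Student's t critical value for n − 1 = 4 degrees of freedom, as described above.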
Table 7: Analysis times (in sec) and hypothesis testing results

   Manual Entry to Stockfish         ARChessAnalyzer
n  Beginning  Midgame  Endgame   Beginning  Midgame  Endgame
1  15.21      347.92   15.90     3.20       3.22     3.40
2  18.42      222.79   12.45     2.60       3.12     3.50
3  16.80      107.91   32.67     4.50       4.25     3.30
4  17.24      100.87   15.32     3.50       2.78     2.20
5  15.56      96.24    25.67     2.90       3.23     3.10

Conclusions And Future Work
This paper improves the state of the art in the areas of chessboard segmentation and chess piece detection using convolutional neural networks, making trade-offs between model accuracy and memory footprint for integration on a handheld device, in a first-of-its-kind iOS app. The piece detection accuracy is greater than 99% and the accuracy of the prediction pipeline is 93.45%. The AR pipeline takes about 3-4.5 sec from pointing the live view camera at the chessboard to AR overlay. We also validated our hypothesis that ARChessAnalyzer is significantly faster at analysis than manual entry for valid outcomes. The project source code can be found at https://github.com/anavmehta/ARChessAnalyzer. The app is also available on the iOS App Store.

Our hope is that the instantaneous feedback this app provides will help chess learners at all levels all over the world.

The following are the areas of focus of future research.
• Chessboard segmentation: The algorithm can be improved to make detection more tolerant of camera roll, yaw and pitch, and to infer hidden points and lines so as to detect partially hidden pieces.
• Chess piece detection: More images of chessboards and pieces from different chess sets, with variation of top and side angles to better capture the shape of each piece, will improve the robustness of the classifiers to occlusions, artifacts and intra-class variations. AlexNet was chosen based on ease of training and accuracy; a more systematic evaluation of recent research on on-device models would help further reduce the memory footprint while retaining accuracy.
• Chess engine and game state recognition: The state of the board sometimes requires knowledge of the immediately preceding moves (e.g. en passant or castling rights). This would require deeper analysis and/or user overrides. Stockfish is the most popular engine and was easy to integrate in iOS; however, players may want to switch to other engines based on their preference or engine strengths.

Finally, for worldwide acceptance as a learning chess app, it has to be ported to Android.

References

[1] J. Deng, W. Dong, R. Socher, L. J. Li, L. Kai and L. Fei-Fei, "ImageNet: A large-scale hierarchical image database," IEEE Conference on Computer Vision and Pattern Recognition, 2009.
[2] https://stockfishchess.org/
[3] Y. LeCun, Y. Bengio and G. Hinton, "Deep learning," Nature, vol. 521, pp. 436-444, 2015.
[4] https://neurohive.io/en/popular-networks/alexnet-imagenet-classification-with-deep-convolutional-neural-networks/
[5] C. Koray and E. Sumer, "A Computer Vision System for Chess Game Tracking," presented at the 21st Computer Vision Winter Workshop, Rimske Toplice, Slovenia, 2016.
[6] Duda, R.O., Hart, P.E., 1972. Use of the Hough transformation to detect lines and curves in pictures. Commun. ACM 15, 11-15. doi:10.1145/361237.361242.
[7] Raghuveer Kanchibail, Supreeth Suryaprakash, Suhas Jagadish, Chess Board Recognition, http://vision.soic.indiana.edu/b657/sp2016/projects/rkanchib/paper.pdf
[8] Tam, K.Y., Lay, J.A., Levy, D., 2008. Automatic grid segmentation of populated chessboard taken at a lower angle view, in: Digital Image Computing: Techniques and Applications, DICTA '08, IEEE, pp. 294-299.
[9] https://en.wikipedia.org/wiki/Forsyth%E2%80%93Edwards_Notation
[10] Naigang Wang, Jungwook Choi, Daniel Brand, Chia-Yu Chen, Kailash Gopalakrishnan, "Training Deep Neural Networks with 8-bit Floating Point Numbers," Conference on Neural Information Processing Systems (NeurIPS).
[11] A. Karpathy, "Transfer Learning," Stanford University. [Online].
Available: http://cs231n.github.io/transfer-learning
[12] Alex Krizhevsky, Ilya Sutskever, Geoffrey E. Hinton, "ImageNet Classification with Deep Convolutional Neural Networks," https://papers.nips.cc/paper/4824-imagenet-classification-with-deep-convolutional-neural-networks.pdf
[13] Jason Wang, Luis Perez, "The Effectiveness of Data Augmentation in Image Classification using Deep Learning," http://cs231n.stanford.edu/reports/2017/pdfs/300.pdf