Archives of Computational Methods in Engineering | 2021

A Systematic Survey on CAPTCHA Recognition: Types, Creation and Breaking Techniques

 
 
 

Abstract


CAPTCHA stands for Completely Automated Public Turing Test to Tell Computers and Humans Apart. CAPTCHAs are used for internet security. Several kinds of CAPTCHA scheme are available today, including text-based, audio-based, video/animation-based, and puzzle-based schemes. This paper brings all of these types together in one place for analysis. The main aim of this article is to present a literature review covering CAPTCHAs, their types, and the techniques used to create and break them. It is a systematic and complete analysis of all available CAPTCHA types. In this paper, 16 text-based CAPTCHA generation methods are discussed, with usability ranging from 3 to 100% and security from 65 to 100%. The security and usability measures are not calculated/sustained using some known English schemes. Of the 16 reviewed CAPTCHAs, 12 are based on English, 1 on Arabic, 1 on Chinese, 1 on the Devanagari script, and 1 on the Gurumukhi script. The designs are made segmentation-resistant through overlapping random shapes, overlapping characters, clasping, and varied colors and shades. To make them recognition-resistant, techniques such as image masking, local and global warping, broken characters, random rotation, arcs, and jaws are used. Approximately 50 schemes, especially those based on the English language, have been broken with success rates ranging from 2 to 100%. The techniques used to break these schemes include shape context matching, distortion estimation, 2D log-Gabor filtering, and horizontal and vertical projection (to segment the letters). For recognition, CNN, KNN, DNN, and MCDNN are used. Almost 15 image-based CAPTCHAs are discussed, designed with usability ranging from 90 to 100% and security from 17 to 100%. Of these, 5 schemes have been broken with success rates between 7 and 100%. K-NN and SVM are the algorithms most often used to recognize the images.
Five audio-based CAPTCHA designs are discussed, with usability ranging from 68.5 to 100% and security of 100%. These audio schemes have been broken at rates of 45–75% using SVM and K-NN algorithms. The paper also discusses 4 popular video-based designs, with usability ranging from 75 to 100% and security from 98 to 100%. These schemes have also been compromised, with break rates of 16–10%, using SIFT, NN, and simple OCR techniques. The paper can serve as a starting point for more specific research into any one of these types.
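
As an illustration of the projection-based segmentation named in the abstract, the following is a minimal sketch of vertical projection on a binarized image. It is not taken from any of the surveyed attacks; the function name, the `min_width` parameter, and the toy image are assumptions made for this example. The idea is to count ink pixels per column and cut wherever a column is empty.

```python
def segment_by_vertical_projection(binary, min_width=1):
    """Split a binarized text image into column ranges, one per glyph.

    binary: list of equal-length rows; nonzero = ink, 0 = background.
    Returns a list of (start, end) column indices, end exclusive.
    Hypothetical helper for illustration only.
    """
    width = len(binary[0])
    # Vertical projection profile: number of ink pixels in each column.
    profile = [sum(1 for row in binary if row[x]) for x in range(width)]

    segments, start = [], None
    for x, count in enumerate(profile):
        if count > 0 and start is None:
            start = x                      # entering a glyph
        elif count == 0 and start is not None:
            if x - start >= min_width:     # leaving a glyph at an empty column
                segments.append((start, x))
            start = None
    if start is not None and width - start >= min_width:
        segments.append((start, width))    # glyph touching the right edge
    return segments


# Toy 3x7 image with two separated "characters" (an assumption, not real data).
image = [
    [0, 1, 1, 0, 0, 1, 0],
    [0, 1, 0, 0, 0, 1, 0],
    [0, 1, 1, 0, 0, 1, 0],
]
print(segment_by_vertical_projection(image))  # [(1, 3), (5, 6)]
```

A cut is only found where some column contains no ink, which is why overlapping characters and overlapping shapes, as listed above, make a design harder to segment this way. Horizontal projection applies the same counting to rows instead of columns.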

Pages 1-30
DOI 10.1007/S11831-021-09608-4
Language English
Journal Archives of Computational Methods in Engineering
