Domain-Embeddings Based DGA Detection with Incremental Training Method
1st Xin Fang, 2nd Xiaoqing Sun, 3rd Jiahai Yang
Institute for Network Sciences and Cyberspace, Tsinghua University, Beijing, China
Beijing National Research Center for Information Science and Technology
{fx18, sxq16}@mails.tsinghua.edu.cn, [email protected]

4th Xinran Liu
National Computer Network Emergency Response Technical Team / Coordination Center, Beijing, China
[email protected]
Abstract—DGA-based botnets, which use Domain Generation Algorithms (DGAs) to evade supervision, have become one of the most destructive threats to network security. Over the past decades, a wealth of defense mechanisms focusing on domain features have emerged to address the problem. Nonetheless, DGA detection remains a daunting and challenging task due to the big data nature of Internet traffic and the fact that linguistic features extracted only from domain names are insufficient, since adversaries can easily forge them to disturb detection. In this paper, we propose a novel DGA detection system which employs an incremental word-embeddings method to capture the interactions between end hosts and domains, characterize time-series patterns of DNS queries for each IP address, and thereby explore temporal similarities between domains. We carefully modify the Word2Vec algorithm and leverage it to automatically learn dynamic and discriminative feature representations for over 1.9 million domains, and develop a simple classifier for distinguishing malicious domains from benign ones. Given the ability to identify temporal patterns of domains and update models incrementally, the proposed scheme makes progress toward adapting to the changing and evolving strategies of DGA domains. Our system is evaluated and compared with the state-of-the-art system FANCI and two deep-learning methods, CNN and LSTM, on data from a large university's network named TUNET. The results suggest that our system outperforms these strong competitors by a large margin on multiple metrics and meanwhile achieves a remarkable speed-up in model updating.

Index Terms—Domain-Embeddings, DGA Detection, Word2vec, Incremental Training
I. Introduction
DGAs are commonly used by botnets to bypass security mechanisms that employ static methods such as blacklists. They can generate a vast number of pseudo-random domain names, of which the attacker selects only a small subset for registration to establish command and control (C&C) connections [1] [2]. This results in an asymmetric situation: attackers can use any one of the generated domains to control bots, but defenders must monitor all of them. A wide spectrum of methods for DGA detection has been proposed in recent years, but most of them rely on hand-crafted linguistic features and do not work well when the botmaster decides to change the domain-generating strategy. This explains why some character-based detectors that work well for traditional DGAs perform poorly when confronted with plausibly clean-looking domain names based on wordlists (also called dictionaries) [3].

In such a scenario, we are concerned with developing an algorithm that is resilient to feature change and functions well not only for character-based or wordlist-based DGAs, but also for completely new algorithms never seen before. Our intuition is that bots tend to exhibit similar behavior patterns no matter what kind of DGA algorithm is implemented. These time-relevant patterns provide more robust and stable features, which improve flexibility against changing and evolving attack strategies. For example, the bots controlled by the same entity communicate with the same C&C server, and botnet members generate a large amount of regular traffic when launching an attack. In addition, our system must adapt to the big data nature of Internet traffic and the explosive growth of malicious domains, so a well-designed incremental training strategy is indispensable to reduce the model iteration cost.

In this paper, we propose a novel DGA detection system aimed at a wide range of DGA families and the never-ending growth of Internet DNS traffic. The core of our idea is to characterize time-series patterns of DNS queries for each IP address, explore temporal similarities between domains, and apply an incremental training strategy to speed up model updating.

To sum up, the contributions of this work are threefold:

1) In order to improve flexibility against changing and evolving attack strategies, we focus on the underlying relevance among domains and utilize the latent patterns of DNS query sequences to detect DGAs. We also apply the word2vec algorithm for a mapping from DGA detection to vector arithmetic.

2) To cope with the never-ending growth of Internet DNS traffic, we utilize an incremental training strategy for the word2vec algorithm, which helps to speed up the model training process when additional training data is provided.

3) We built a practical system based on the proposed algorithm, and achieved excellent results in several empirical experiments and real-world deployments.

The rest of this paper is organized as follows. In Section 2, we introduce some background knowledge and systematically outline related works. In Section 3, we present our DGA detection system based on the incremental word2vec algorithm in detail. Then we provide our experimental methodology and results in Section 4. Finally, we summarize our primary work and discuss future directions in Section 5.

II. Related Work

A. DGA and DGA Detection
In order to detect DGA domains, Yadav et al. [4] proposed a technique based on the significant difference between traditional DGA domains and human-generated domains in terms of the distribution of alphanumeric characters. In addition, Antonakakis et al. [1], Schüppen et al. [5] and Wang et al. [6] proposed machine-learning based DGA detectors using human-engineered lexical features of DGA domain names, while Tong et al. [7], Lison et al. [8] and Tran et al. [9] came up with methods using deep learning algorithms such as CNN, LSTM, and BiLSTM. However, attackers have designed a more resilient class of malicious algorithmically generated domains (mAGDs), produced by randomly selecting and concatenating words from a dictionary in order to imitate legitimate domain names created by a human. This new kind of DGA is much harder to detect. In fact, many state-of-the-art DGA detectors which function well for traditional DGAs perform poorly when faced with wordlist-based ones.

Confronted with such a challenging situation, defenders have presented several countermeasures. Pereira et al. [3] first proposed a method for combating wordlist-based DGAs in 2018. They built a new structure named WordGraph based on the segmentation of domain names, and then employed it to further discover DGA dictionaries. Another approach, raised by J. Koh et al. [10], extracted in-depth semantic features from an unrelated corpus and used transfer learning to learn the semantic signatures of wordlist-based DGA families. These approaches perform well, but nevertheless focus only on wordlist-based DGAs.

B. Word Embeddings Algorithm
The basic idea of word embeddings was initially proposed by Hinton in 1986 [11], under the name distributed representation at that time. This method is mainly used in the area of Natural Language Processing (NLP), but we can still utilize it in the Domain Name System (DNS) analysis field by analogizing DNS query sequences to natural language sentences. This follows the core idea of W. Lopez et al. [12], which considers the DNS queries from a particular source IP address during a specific time interval as words in a single document. Nowadays the most ubiquitous word embeddings method is Word2Vec [13], and in this paper we use the Skip-Gram model with Negative Sampling (SGNS) [14], an advanced variant of Word2Vec, as the basic algorithm, owing to its popularity. Several previous works tried to apply the word2vec algorithm to the DNS-related field ([15], [16]), but their target is traffic classification instead of DGA detection.

Fig. 1. System architecture.
Although we show through experiments that SGNS can accurately classify mAGDs, existing neural word embeddings methods, including SGNS, are multi-pass algorithms and thus cannot perform incremental model updates: they have to re-train the model on the old and new training data from scratch whenever additional training data is provided [17]. To this end, some researchers have focused on exploring incremental training strategies for word embeddings methods ([17], [18]). As with the conventional word2vec algorithm, to the best of our knowledge little or no attention has been paid in the literature to the incremental word2vec method when it comes to DGA detection.

III. System Architecture

In this section, we describe our DGA detection system architecture and training mechanism. As shown in Fig. 1, our system has several components: Pre-processor, Detector, and Post-processor.
A. Pre-processor
Before the preprocessing phase, we deploy our data collectors on several core DNS servers in TUNET to collect raw DNS logs. Thereafter, we apply black/white lists from both public sources (e.g., malwaredomainlist.com) and private sources to the raw DNS corpus, which can be considered a pre-labeling process. To further calibrate the labeling results, we sample part of the raw data for manual labeling. The unlabeled leftovers are still used for training word embeddings, since the word2vec algorithm is unsupervised.

After the labeling process, we feed all of the data to a data wrangling and cleaning module, which functions as follows. First of all, the module traverses the entire dataset and removes all queries containing an invalid IP address, query type, or query name. Second, since many of the DNS queries are for nonexistent domain names, rarely duplicated, and in many cases composed of a large number of changing prefixes and a few unchanging suffixes, we merge similar domains by their common suffixes. Besides, to eliminate the impact of ccTLDs (country code Top-Level Domains [19]), we remove all ccTLDs from the tails of domain names containing them. Last but not least, we select an appropriate time window size and reorganize the data structure. More specifically, we determine a window size such as 10 minutes, which remains a hyper-parameter to be decided later, and partition the dataset accordingly. Query records in each window are organized in the format [timestamp, IP, domain, domain, ...]. Finally, all queries from a specific IP address during the pre-defined time window are grouped and hence constitute a Document, with each domain name as a Word.
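To make the grouping concrete, the following Python sketch shows one way to turn raw query records into such Documents. The record layout, the abbreviated ccTLD list, and the keep-last-two-labels suffix merge are illustrative assumptions, not the authors' exact implementation.

```python
from collections import defaultdict

WINDOW = 600                          # 10-minute window in seconds; a tunable hyper-parameter
CC_TLDS = {"cn", "uk", "de", "jp"}    # abbreviated illustrative list

def normalize(domain):
    """Drop a trailing ccTLD and merge changing prefixes into a common suffix."""
    labels = domain.rstrip(".").lower().split(".")
    if labels and labels[-1] in CC_TLDS:
        labels = labels[:-1]          # strip the ccTLD
    return ".".join(labels[-2:])      # keep only the last two labels (assumed merge rule)

def build_documents(records):
    """records: iterable of (timestamp, src_ip, qname) tuples."""
    docs = defaultdict(list)          # (window_id, ip) -> list of Words
    for ts, ip, qname in records:
        docs[(int(ts) // WINDOW, ip)].append(normalize(qname))
    return list(docs.values())        # each value is one Document

# toy usage
print(build_documents([(0, "1.2.3.4", "a1.evil.example.cn"),
                       (30, "1.2.3.4", "mail.tsinghua.edu.cn")]))
```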
B. Detector
Our detector consists of two parts: the incremental word2vec model and a subsequent simple classifier.
1) Incremental Word2Vec Model:
Based on previous research results ([17], [18]), we apply the incremental training method of SGNS to the domain-embeddings generation model in this paper. Given one document output by the Pre-processor, we assume that the words (domains) inside constitute a sequence w_1, w_2, ..., w_n. The classical SGNS model then attempts to minimize the following objective function to learn domain embeddings:

L_{SGNS} = -\frac{1}{n} \sum_{i=1}^{n} \sum_{|j| \le c,\, j \ne 0} \Big[ \log \sigma\big(t_{w_i} \cdot c_{w_{i+j}}\big) + k\, \mathbb{E}_{v \sim q(v)}\big[\log \sigma(-t_{w_i} \cdot c_v)\big] \Big]   (1)

where t_{w_i} is the target word w_i's embedding, c_{w_{i+j}} is the context word w_{i+j}'s embedding within a window of size c, σ(x) is the sigmoid function, k is a pre-fixed integer, and v is a negative sample drawn from q(v), the negative sampling distribution [18]. While Equation (1) can be optimized by Stochastic Gradient Descent (SGD) using AdaGrad [20] in an online fashion, traditional multi-pass SGNS training still needs to scan through the entire dataset first to pre-compute the negative sampling distribution q(v), which makes it difficult to perform efficient incremental model updates when additional training data arrives, especially when the amount of new data is small compared to the old.
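As a sanity check on Equation (1), here is a toy NumPy sketch of the loss contribution of a single (target, context) pair, with the expectation over q(v) replaced by a sum over k sampled negatives; the random vectors stand in for trained embeddings.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sgns_pair_loss(t_w, c_ctx, c_negs):
    """One inner term of Equation (1): positive pair plus k negative samples."""
    pos = np.log(sigmoid(t_w @ c_ctx))                         # log σ(t_wi · c_wi+j)
    neg = sum(np.log(sigmoid(-t_w @ c_v)) for c_v in c_negs)   # log σ(−t_wi · c_v)
    return -(pos + neg)

rng = np.random.default_rng(0)
t_w, c_ctx = rng.normal(size=100), rng.normal(size=100)
c_negs = rng.normal(size=(5, 100))                             # k = 5 negatives
print(sgns_pair_loss(t_w, c_ctx, c_negs))
```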
Because new domains keep showing up continuously in the real world, we need an incremental extension of SGNS. We adopt the methodology of previous works ([17], [18]), which goes through the training data in a single pass to update word embeddings incrementally. Algorithm 1 presents this incremental SGNS algorithm.

Algorithm 1 Incremental SGNS
for each new batch D of training data do
    f(d) ← 0 for each previously unseen d ∈ D
    n ← length(D)
    for i ← 1, ..., n do
        f(d_i) ← f(d_i) + 1
        q(d) ← f(d)^α / Σ_{d′∈D} f(d′)^α for all d ∈ D
        for j ← −c, ..., −1, 1, ..., c do
            draw k negative samples from q(d)
            use adaptive SGD to update t_{w_i}, c_{w_{i+j}}, and c_{v_1}, ..., c_{v_k}
        end for
    end for
end for

Algorithm 2 Draw Negative Samples
set array r with length K to empty
n ← length(W)
cnt ← 0
for i ← 1, ..., n do
    cnt ← cnt + 1
    if i ≤ K then
        r_i ← w_i
    else
        draw an integer k uniformly from 1, 2, ..., cnt
        if k ≤ K then
            r_k ← w_i
        end if
    end if
end for

In the implementation of incremental SGNS, how to efficiently produce negative samples is an important issue, since the efficiency of sampling greatly affects the overall training speed. To solve this problem, we utilize the Reservoir Sampling [21] algorithm, which helps to generate a single negative sample in only O(1) time (see Algorithm 2).
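A possible Python rendering of Algorithm 2's reservoir sampler is given below. Note that a plain reservoir tracks the raw unigram frequencies seen so far; reproducing the α-smoothed q(v) of Equation (1) requires the weighted variant discussed in [17], so this is a simplified sketch.

```python
import random

class NegativeSampler:
    """Reservoir of K tokens whose composition follows the word
    frequencies seen so far, so drawing one negative costs O(1)."""
    def __init__(self, K=100_000):
        self.K = K
        self.reservoir = []
        self.cnt = 0                        # words seen so far (cnt in Algorithm 2)

    def update(self, word):
        self.cnt += 1
        if len(self.reservoir) < self.K:
            self.reservoir.append(word)     # fill phase: r_i <- w_i
        else:
            j = random.randrange(self.cnt)  # uniform in [0, cnt)
            if j < self.K:
                self.reservoir[j] = word    # replace phase: r_k <- w_i

    def sample(self):
        return random.choice(self.reservoir)  # one negative sample in O(1)

# toy usage
sampler = NegativeSampler(K=3)
for w in ["a.com", "b.net", "a.com", "c.org", "d.io"]:
    sampler.update(w)
print(sampler.sample())
```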
2) Logistic Regression Classifier:
Without loss of generality, we use a Logistic Regression classifier as the tail classifier. It receives the word-embeddings produced by the word2vec model together with the corresponding ground-truth labels, which specify whether each domain is malicious or not. It is noteworthy that logistic regression naturally supports incremental training, since its parameters can be updated by SGD every time new training data is provided. In the testing/evaluation/deployment phase, the classifier directly scores the input domain-embeddings without extra operations.
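A minimal scikit-learn sketch of such an SGD-trained logistic regression is shown below; the random arrays stand in for domain embeddings and labels, and loss="log_loss" is the spelling in recent scikit-learn versions (older releases use "log").

```python
import numpy as np
from sklearn.linear_model import SGDClassifier

rng = np.random.default_rng(0)
X_init, y_init = rng.normal(size=(1000, 100)), rng.integers(0, 2, 1000)  # stand-in data
X_new,  y_new  = rng.normal(size=(200, 100)),  rng.integers(0, 2, 200)

clf = SGDClassifier(loss="log_loss", alpha=1e-4)  # logistic regression trained by SGD
clf.partial_fit(X_init, y_init, classes=[0, 1])   # initial labeled batch
clf.partial_fit(X_new, y_new)                     # incremental update on new labels

scores = clf.predict_proba(X_new)[:, 1]           # per-domain maliciousness score
```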
3) Workflow Description:
Before the Detector, we have already divided the datasets into a labeled part (relatively small) and an unlabeled part (relatively large), owing to the expensive cost of manual labeling. Fortunately, we do not need ground truth when training unsupervised word2vec models. Hence, our overall training strategy is to use all received valid data for training the word2vec model, while feeding only the labeled data to the classifier, making the best use of the collected data. Based on such a greedy strategy, we guarantee the quality and generalization ability of the obtained domain-embeddings, which plays a vital role in improving the performance of the Logistic Regression classifier trained with a relatively small amount of labeled data.
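One way to realize this workflow with off-the-shelf tooling is sketched below using gensim. Note that gensim's vocabulary-update path re-estimates the noise distribution on each update and is therefore only an approximation of the single-pass incremental SGNS of Algorithm 1; the toy documents are placeholders for Pre-processor output.

```python
from gensim.models import Word2Vec

# stand-ins for Documents emitted by the Pre-processor
initial_docs = [["example.com", "cdn.net", "mail.org"],
                ["example.com", "update.io"]]
new_docs = [["kjhxzq.biz", "example.com"]]

# sg=1, negative=5 selects skip-gram with negative sampling (SGNS)
model = Word2Vec(initial_docs, vector_size=100, window=5,
                 sg=1, negative=5, min_count=1)

model.build_vocab(new_docs, update=True)   # grow the vocabulary in place
model.train(new_docs, total_examples=len(new_docs), epochs=model.epochs)

emb = model.wv["example.com"]              # embedding fed to the classifier
```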
C. Post-processor
The classification results from the Detector can be used in two ways. First, we can assume that a domain name whose score is above a pre-set threshold is, with high probability, a DGA-generated domain name. With this assumption, we can construct a feedback loop to update the blacklists and whitelists used in the pre-processing stage. Second, we can use the results on the test dataset to evaluate the detector's performance and analyze hard cases, which is extremely meaningful for estimating the trends of current and incoming DGAs and further improving the performance of the DGA detection system.
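The feedback loop can be as simple as the following sketch; the threshold value and the symmetric whitelist rule are illustrative assumptions rather than the paper's chosen settings.

```python
THRESHOLD = 0.9   # illustrative value; the paper leaves this as a tunable setting

def update_lists(domains, scores, blacklist, whitelist):
    """Feed high-confidence detections back into the pre-processing lists."""
    for d, s in zip(domains, scores):
        if s >= THRESHOLD:
            blacklist.add(d)        # confident DGA detection
        elif s <= 1 - THRESHOLD:
            whitelist.add(d)        # confident benign verdict

# toy usage
blacklist, whitelist = set(), set()
update_lists(["kjhxzq.biz", "tsinghua.edu"], [0.97, 0.02], blacklist, whitelist)
print(blacklist, whitelist)
```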
IV. Datasets

We collected DNS data for two consecutive weeks from the Tsinghua campus network using Passive DNS [22] tools. A total of 162 million raw DNS query logs were obtained. The lengthy period of data recording guarantees a representative dataset which covers different times of the day, different days of the week, and both working and non-working days. More information about the datasets is shown in Table I.

Some steps had to be completed before the experiments. The critical points are first filtering the data with black/white lists and then manual labeling. Due to the huge amount of collected raw data, we cannot afford to label it all. Thus we sample and label the first 15% of the total 162 million queries and split this labeled dataset into two parts: 80% as trainset-with-gt (i.e., training set with ground truth) and the remaining 20% as testset-with-gt (i.e., test set with ground truth), which are employed to conduct the comparison experiments between our method and existing methods.

The reason why we choose the first 15% of the dataset for labeling is that, during the time this part of the data was collected, we coincidentally found a large number of domains from the query logs appearing on the malicious domain lists of public blacklists such as DGArchive [23]. Therefore, we started to collect data from that point in time and took these DNS logs containing DGA domains as the ground-truth datasets to be labeled.

Meanwhile, the last 85% of the unlabeled logs are used as the validation data for the incremental word2vec algorithm, because we intend to demonstrate that incremental word2vec functions well not only on the initial labeled datasets but also on newly added data. We randomly sample them at a scale of 1/10 in a consecutive way, then filter and manually label this new dataset. Again, the first 80% and last 20% of the dataset are put into trainset-with-gt and testset-with-gt respectively.
TABLE I: Details of TUNET DNS Datasets

Properties                         Descriptions
Duration of Data Collection        A total of 14 consecutive days
Generation Rate of DNS Queries     About 500 thousand/h
Peak Rate of DNS Queries           About 3 million/h
Occupied Space                     About 7 GB/day
Total Amount of DNS Queries        About 162 million in total
Amount of Unique Domains Queried   About 1.9 million in total

TABLE II: Analysis Results of Datasets with Ground Truth

Properties                              Values
All DNS Domains      Total Amount       38,235,023
                     Unique Amount      463,030
Benign DNS Domains   Total Amount       35,833,755
                     Unique Amount      368,793
DGA DNS Domains      Total Amount       2,401,268
                     Unique Amount      94,237
                     Character-Based    84,538 (Unique)
                     Wordlist-Based     9,699 (Unique)
V. Experiments
This section evaluates the performance of the proposed scheme on a real-world network. All operations were performed on a terminal server with an Intel i7 CPU and 32GB RAM, running Ubuntu Linux 16.04.
A. Visualization
To be intuitive, we use t-SNE [24] to visualize the domain-embeddings generated by the incremental word2vec model. As shown in Fig. 2, DGA domains belonging to different families are labeled as separate classes and drawn in different colors, while benign domains are labeled as class 0 and densely clustered on the right of the picture. Among the classes, the class drawn in cyan refers to wordlist-based DGA domains, and the other classes except class 0 represent character-based DGA domains.

Fig. 2 demonstrates that the clusters of malicious and benign domains can be divided neatly and without difficulty with the help of incremental word2vec. This suggests that wordlist-based DGA domain names, which always mislead traditional methods, can be easily identified using the incremental word2vec algorithm.
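A visualization of this kind can be produced with scikit-learn's t-SNE as sketched below; the random matrix and labels stand in for the learned domain-embeddings and family assignments.

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.manifold import TSNE

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 100))        # stand-in for learned domain-embeddings
labels = rng.integers(0, 5, 500)       # 0 = benign, >0 = DGA family (illustrative)

X2 = TSNE(n_components=2, perplexity=30).fit_transform(X)
plt.scatter(X2[:, 0], X2[:, 1], c=labels, cmap="tab10", s=4)
plt.show()
```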
B. Comparison Experiment

To further evaluate our system's performance, we conduct comparison experiments between IWM (short for Incremental Word2Vec Model), FANCI (representative of traditional machine learning methods), and CNN and LSTM (representatives of character-based deep learning methods).

It is notable that currently popular DGA detection methods such as FANCI [5] and D3N [7] usually conduct experiments solely on NXDomains, which misses many DGA domains. In fact, the number of DGA domains in NXDomains accounts for less than 36% of the total in all DNS data. Moreover, we find that the FANCI method, using Random Forests trained on NXDomains, does not perform well on our testset containing only NXDomains, especially for wordlist-based malicious domains, which are almost undetectable in this case. Furthermore, if we evaluate the model on testsets containing domains beyond the NXDomains, it can barely function normally.

Considering the situation above, we design the following sub-experiments. The results are published in Table III and Table IV.

Fig. 2. Visualization of Domain-Embeddings with All DGA Families Using t-SNE

TABLE III: Results of Exp.1 to Exp.5 with All DGA Families

Methods   Trainset   Testset   PRE     TPR     FPR      F1-score
FANCI     NXD        NXD       0.791   0.932   0.117    0.856
FANCI     NXD        all       0.001   0.176   0.603    0.001
FANCI     all        all       0.967   0.717   0.012    0.823
CNN       all        all       0.962   0.906   0.0004   0.934
LSTM      all        all       0.949   0.962   0.0003   0.947
IWM       all        all
Exp.1. RF (short for Random Forests) of FANCI, trained on NXDomains extracted from trainset-with-gt, evaluated on NXDomains extracted from testset-with-gt.

Exp.2. RF of FANCI, trained on NXDomains extracted from trainset-with-gt, evaluated on testset-with-gt.

Exp.3. RF of FANCI, trained on trainset-with-gt, evaluated on testset-with-gt.

Exp.4. CNN and LSTM, trained on trainset-with-gt, evaluated on testset-with-gt.

Exp.5. IWM, trained on trainset-with-gt, evaluated on testset-with-gt.

TABLE IV: Results of Exp.1 to Exp.5 with Only Wordlist-Based DGAs

Methods   Trainset   Testset   PRE     TPR     FPR     F1-score
FANCI     NXD        NXD       0.014   0.177   0.118   0.026
FANCI     NXD        all       0.001   0.002   0.987   0.001
FANCI     all        all       0.096   0.133   0.012   0.112
CNN       all        all       0.239   0.475   0.014   0.318
LSTM      all        all       0.345   0.489   0.010   0.403
IWM       all        all

To be clear, we describe the training strategy for IWM here. For the labeled datasets, we take the first half of trainset-with-gt as the initial train set and the second half as the new data continuously collected in the real world, named incremental-trainset. The same operations are conducted for the unlabeled datasets used to train word embeddings. For convenience, we divide the incremental train set (both labeled and unlabeled) into ten pieces. During the experiment, we first train the initial model on the initial train set and then conduct a model updating operation for each newly added piece.

The performance of the models is evaluated with the following metrics:
Precision (PRE), True Positive Rate (TPR, also called Recall), False Positive Rate (FPR), and F1-score.

From Exp.1, 2, and 5 we can conclude that the performance of FANCI can hardly meet expectations. To validate that our method is superior to traditional machine learning and deep learning algorithms based solely on domain name strings, given the same train and test sets, we conducted Exp.3, 4, and 5. We can see that IWM, whether tested on all DGA families or only on the wordlist-based family, performs clearly better than FANCI, CNN, and LSTM.

What is more, when confronted with wordlist-based DGA domain names, IWM trumps the other detectors with 100% recall and 99.6% precision, while the best of the others can hardly achieve half of these values.
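For completeness, the four metrics reduce to simple confusion-matrix arithmetic; the sketch below also shows how the reported wordlist-based precision and recall imply an F1-score of about 0.998.

```python
def metrics(tp, fp, tn, fn):
    pre = tp / (tp + fp)              # Precision
    tpr = tp / (tp + fn)              # True Positive Rate (Recall)
    fpr = fp / (fp + tn)              # False Positive Rate
    f1 = 2 * pre * tpr / (pre + tpr)  # F1-score
    return pre, tpr, fpr, f1

# F1 implied by the reported wordlist-based result (PRE 0.996, TPR 1.0)
pre, tpr = 0.996, 1.0
print(2 * pre * tpr / (pre + tpr))    # ~0.998
```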
Fig. 3. Training Time of Basic and Incremental Word2Vec Methods When New Data is Provided

TABLE V: Comparison Results of Basic and Incremental Word2Vec Algorithms

Methods   PRE   TPR   FPR   F1-score   Training Time
Basic
Incre.

C. Evaluation of Incremental Methodology
This evaluation experiment is designed to show that our polished version of the basic word2vec algorithm, i.e., incremental word2vec, can perform as well as or even better than its predecessor while gaining a tremendous acceleration in model updating. We construct two control groups: one uses the standard word2vec method, and the other the incremental word2vec method. The training data is processed in the same way as in Exp.5 of Section B, and both groups use testset-with-gt as the evaluation dataset. The number of training epochs for both groups is 200, and the evaluation results are listed in Table V and Fig. 3.

VI. Conclusion and Future Work

In this paper, we presented a novel system using an incremental word2vec algorithm, which leverages inter-domain relationships to detect DGA domains effectively and scalably. Our system performs excellently when confronted with various DGA families, even with wordlist-based DGAs, which are almost invincible against traditional detectors.

Moreover, to make model updating faster when new data is continuously provided, we explore an incremental training strategy. In our empirical experiments, we demonstrate that our incremental word2vec method not only outperforms other detectors but also gains a tremendous acceleration in model re-training.

Since the datasets for training and evaluation are collected continuously from real-world networks, our system works as an online system which deals with tens of thousands of DNS query streams with high accuracy and efficiency.

The limitation of this paper is that the vocabulary of the incremental word2vec model can become very large when unlimited data pours in, even though we already take measures such as merging common suffixes of domains to shorten the vocabulary. The problem is difficult because we cannot merely cap the size of the vocabulary: the absence of some domains' embeddings may leave the classifier unable to find a suitable vector representation. Besides, labeled datasets of domains are hard to obtain. In future work, we will pursue solutions to these problems and build a better detection model.

Acknowledgment

We thank Mingkai Tong and other classmates and teachers for their valuable help. Additionally, we thank the Information Technology Center of Tsinghua University for authorizing the use of their data in our experiments. This work is supported by the National Science and Technology Major Project under Grant No. 2017YFB0803004.

References

[1] M. Antonakakis, R. Perdisci, Y. Nadji, N. Vasiloglou, S. Abu-Nimeh, W. Lee, and D. Dagon, "From throw-away traffic to bots: Detecting the rise of DGA-based malware," in 21st USENIX Security Symposium (USENIX Security 12). Bellevue, WA: USENIX Association, 2012, pp. 491–506. [Online]. Available: https://www.usenix.org/conference/usenixsecurity12/technical-sessions/presentation/antonakakis

[2] D. Plohmann, K. Yakdan, M. Klatt, J. Bader, and E. Gerhards-Padilla, "A comprehensive measurement study of domain generating malware," in 25th USENIX Security Symposium (USENIX Security 16). Austin, TX: USENIX Association, Aug. 2016, pp. 263–278. [Online]. Available: https://www.usenix.org/conference/usenixsecurity16/technical-sessions/presentation/plohmann

[3] M. Pereira, S. Coleman, B. Yu, M. DeCock, and A. Nascimento, "Dictionary extraction and detection of algorithmically generated domain names in passive DNS traffic," in 21st International Symposium on Research in Attacks, Intrusions, and Defenses (RAID 2018), Heraklion, Crete, Greece, Sep. 2018, pp. 295–314.

[4] S. Yadav, A. K. K. Reddy, A. L. N. Reddy, and S. Ranjan, "Detecting algorithmically generated malicious domain names," in ACM SIGCOMM Conference on Internet Measurement, 2010.

[5] S. Schüppen, D. Teubert, P. Herrmann, and U. Meyer, "FANCI: Feature-based automated NXDomain classification and intelligence," in 27th USENIX Security Symposium (USENIX Security 18), 2018, pp. 1165–1181.

[6] Z. Wang, Z. Jia, and B. Zhang, "A detection scheme for DGA domain names based on SVM," in Proceedings of MMSA 2018. Atlantis Press, Mar. 2018. [Online]. Available: https://doi.org/10.2991/mmsa-18.2018.58

[7] M. Tong, X. Sun, J. Yang, H. Zhang, S. Zhu, X. Liu, and H. Liu, "D3N: DGA detection with deep-learning through NXDomain," Aug. 2019, pp. 464–471.

[8] P. Lison and V. Mavroeidis, "Automatic detection of malware-generated domains with recurrent neural models," CoRR, 2017.

[9] D. Tran, H. Mac, V. Tong, H. A. Tran, and L. G. Nguyen, "A LSTM based framework for handling multiclass imbalance in DGA botnet detection," Neurocomputing, vol. 275, Nov. 2017.

[10] J. J. Koh and B. Rhodes, "Inline detection of domain generation algorithms with context-sensitive word embeddings," CoRR, 2018.

[11] G. E. Hinton et al., "Learning distributed representations of concepts," in Proceedings of the Eighth Annual Conference of the Cognitive Science Society, vol. 1, Amherst, MA, 1986, p. 12.

[12] W. Lopez, J. Merlino, and P. Rodríguez-Bocca, "Vector representation of internet domain names using a word embedding technique," Sep. 2017, pp. 1–8.

[13] T. Mikolov, K. Chen, G. S. Corrado, and J. Dean, "Efficient estimation of word representations in vector space," CoRR, vol. abs/1301.3781, 2013. [Online]. Available: https://arxiv.org/abs/1301.3781

[14] T. Mikolov, I. Sutskever, K. Chen, G. S. Corrado, and J. Dean, "Distributed representations of words and phrases and their compositionality," in Advances in Neural Information Processing Systems, 2013.

[15] (entry illegible in the source).

[16] (entry illegible in the source), pp. 232–236, 2018.

[17] N. Kaji and H. Kobayashi, "Incremental skip-gram model with negative sampling," CoRR, 2017.

[18] (entry illegible in the source).

[19] Wikipedia, "Country code top-level domain," https://en.wikipedia.org/wiki/Country_code_top-level_domain, 2019.

[20] J. Duchi, E. Hazan, and Y. Singer, "Adaptive subgradient methods for online learning and stochastic optimization," Journal of Machine Learning Research, vol. 12, pp. 2121–2159, Jul. 2011.

[21] J. S. Vitter, "Random sampling with a reservoir," ACM Transactions on Mathematical Software, vol. 11, no. 1, pp. 37–57, 1985.