Computation

## Large-data determinantal clustering

Determinantal consensus clustering is a promising alternative to partitioning around medoids and k-means for ensemble clustering. Based on determinantal point process (DPP) sampling, it makes subsets of similar points less likely to be selected as centroids, and so favors more diverse subsets of points. The DPP sampling algorithm requires the eigendecomposition of a Gram matrix, which becomes computationally intensive when the data set is very large. This is a particular issue in consensus clustering, where a given clustering algorithm is run several times in order to produce a final consolidated clustering. We propose two efficient alternatives for carrying out determinantal consensus clustering on large data sets. They consist of DPP sampling based on sparse and small kernel matrices whose eigenvalue distributions are close to that of the original Gram matrix.
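To make the cost of the eigendecomposition step concrete, the following is a minimal sketch of the standard spectral DPP sampler applied to a full Gram matrix. It is not the authors' implementation and does not include the sparse or small-kernel alternatives they propose. The function names `rbf_gram` and `sample_dpp`, the Gaussian (RBF) kernel, and the `bandwidth` parameter are assumptions chosen for illustration.

```python
import numpy as np


def rbf_gram(X, bandwidth=1.0):
    """Gram matrix of a Gaussian kernel (the kernel choice is an assumption)."""
    sq = np.sum(X ** 2, axis=1)
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * X @ X.T, 0.0)
    return np.exp(-d2 / (2.0 * bandwidth ** 2))


def sample_dpp(L, rng):
    """Draw one sample from the DPP with L-ensemble kernel L (spectral method)."""
    # Step 1: full eigendecomposition, the O(n^3) bottleneck for large n.
    eigvals, eigvecs = np.linalg.eigh(L)
    eigvals = np.clip(eigvals, 0.0, None)  # guard against tiny negative values

    # Step 2: keep eigenvector k with probability lambda_k / (1 + lambda_k).
    keep = rng.random(eigvals.shape[0]) < eigvals / (1.0 + eigvals)
    V = eigvecs[:, keep]

    selected = []
    while V.shape[1] > 0:
        # Step 3: pick item i with probability proportional to the squared
        # norm of row i of V.
        probs = np.sum(V ** 2, axis=1)
        probs = probs / probs.sum()
        i = int(rng.choice(L.shape[0], p=probs))
        selected.append(i)

        # Step 4: restrict V to the subspace orthogonal to e_i, then
        # re-orthonormalize. This drops one column per iteration.
        j = int(np.argmax(np.abs(V[i, :])))
        Vj = V[:, j].copy()
        V = np.delete(V, j, axis=1)
        V = V - np.outer(Vj, V[i, :] / Vj[i])
        if V.shape[1] > 0:
            V, _ = np.linalg.qr(V)
    return sorted(selected)


# Toy usage on synthetic points (not data from the paper).
rng = np.random.default_rng(0)
X = rng.normal(size=(30, 2))
L = rbf_gram(X)
centroids = sample_dpp(L, rng)
print(centroids)
```

The sampled indices would serve as candidate centroids for one clustering run. Because consensus clustering repeats such runs many times, the call to `np.linalg.eigh` on an n-by-n matrix is the step that motivates cheaper kernel approximations.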

Other Statistics

## A Text Mining Discovery of Similarities and Dissimilarities Among Sacred Scriptures

The careful examination of sacred texts gives valuable insights into human psychology, into different ideas regarding the organization of societies, and into terms like truth and God. To improve and deepen our understanding of sacred texts, comparing and separating them is crucial. For this purpose, we use a data set of nine sacred scriptures. This work deals with the separation of the Quran, the Asian scriptures Tao-Te-Ching, Buddhism, the Yogasutras, and the Upanishads, as well as four books from the Bible, namely the Book of Proverbs, the Book of Ecclesiastes, the Book of Ecclesiasticus, and the Book of Wisdom. These scriptures are analyzed using natural language processing (NLP), which creates a mathematical representation of the corpus in terms of term frequencies, called the document-term matrix (DTM). After this analysis, supervised and unsupervised machine learning methods are applied to perform classification. Here we use Multinomial Naive Bayes (MNB), the Support Vector Machine (SVM), Random Forest (RF), and K-Nearest Neighbors (KNN). We find that, among these methods, MNB is able to predict the class of a sacred text with an accuracy of about 85.84%.
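The pipeline described above (term counts in a document-term matrix, followed by Multinomial Naive Bayes) can be sketched in a few lines of plain Python. This is an illustrative sketch only: the helper names `vectorize`, `build_dtm`, `train_mnb`, and `predict_mnb`, the whitespace tokenizer, the Laplace smoothing parameter `alpha`, and the toy sentences are all assumptions, and none of the text comes from the nine scriptures.

```python
import math
from collections import Counter


def vectorize(doc, vocab_index):
    """Turn one document into a row of term counts (unknown terms are ignored)."""
    row = [0] * len(vocab_index)
    for tok, count in Counter(doc.lower().split()).items():
        if tok in vocab_index:
            row[vocab_index[tok]] = count
    return row


def build_dtm(docs):
    """Return (vocab_index, rows): the vocabulary and the document-term matrix."""
    vocab = sorted({tok for doc in docs for tok in doc.lower().split()})
    vocab_index = {tok: j for j, tok in enumerate(vocab)}
    return vocab_index, [vectorize(doc, vocab_index) for doc in docs]


def train_mnb(rows, labels, alpha=1.0):
    """Fit Multinomial Naive Bayes with additive (Laplace) smoothing."""
    n_terms = len(rows[0])
    log_prior, log_lik = {}, {}
    for c in sorted(set(labels)):
        class_rows = [r for r, y in zip(rows, labels) if y == c]
        log_prior[c] = math.log(len(class_rows) / len(rows))
        totals = [sum(col) for col in zip(*class_rows)]  # per-term counts in class c
        denom = sum(totals) + alpha * n_terms
        log_lik[c] = [math.log((t + alpha) / denom) for t in totals]
    return log_prior, log_lik


def predict_mnb(model, row):
    """Return the class with the highest posterior log-score for one DTM row."""
    log_prior, log_lik = model
    scores = {
        c: log_prior[c] + sum(n * w for n, w in zip(row, log_lik[c]))
        for c in log_prior
    }
    return max(scores, key=scores.get)


# Toy corpus with made-up sentences and hypothetical class labels "A" and "B".
docs = [
    "water flows soft water",
    "soft water yields",
    "wisdom guards the wise",
    "the wise seek wisdom",
]
labels = ["A", "A", "B", "B"]

vocab_index, dtm = build_dtm(docs)
model = train_mnb(dtm, labels)
pred_a = predict_mnb(model, vectorize("soft water", vocab_index))
pred_b = predict_mnb(model, vectorize("the wise", vocab_index))
print(pred_a, pred_b)
```

In practice a library implementation with proper tokenization, stop-word handling, and held-out evaluation would replace these helpers. The sketch only shows how the DTM rows feed the classifier.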

Applications

## The future of forecasting competitions: Design attributes and principles

Forecasting competitions are the equivalent of the laboratory experimentation widely used in the physical and life sciences. They provide useful, objective information to improve the theory and practice of forecasting, advancing the field, expanding its usage, and enhancing its value to decision makers and policymakers. We describe ten design attributes to be considered when organizing forecasting competitions, taking into account trade-offs between optimal choices and practical concerns like costs, as well as the time and effort required to participate in them. We then map all major past competitions with respect to their design attributes, identifying similarities and differences between them, as well as design gaps, and make suggestions about the principles to be included in future competitions, putting particular emphasis on learning as much as possible from their implementation in order to help improve forecasting accuracy and uncertainty. We argue that the task of forecasting often presents a multitude of challenges that can be difficult to capture in a single forecasting contest. To assess the caliber of a forecaster, we therefore propose that organizers of future competitions consider a multi-contest approach. We suggest the idea of a forecasting "athlon", where different challenges of varying characteristics take place.