Quantitative text analysis
Abstract
Quantitative text analysis is the analysis of text data using statistical procedures. Researchers conducting such analyses use automated, systematic methods to process large amounts of text. For example, a number of large policy documents can be processed using data science methods to identify the different types of words and topics embedded in them. Quantitative text analysis can be conducted on a single document or on a number of different documents. It can also be conducted on social media posts, such as Twitter feeds, Facebook posts, and blogs, and on transcribed data from video-sharing sites such as YouTube. Quantitative text analysis is also referred to as computational text analysis.

Steps of conducting a quantitative text analysis

Quantitative text analyses consist of five related steps:

1. Step 1: Read the text into the computer programme used for computational text analysis.
2. Step 2: Tokenize the text, that is, convert it to its constituent words or features. This is done by identifying character-, word-, or sentence-level boundaries and then extracting individual characters, words, combinations of words, sentences, or other syntactic units. In this way, the individual words or units of analysis are extracted from the corpus and stored in an object, which is then analysed in different ways.
3. Step 3: Extract the document feature matrix (or document term matrix). Once the main corpus of the text is broken down into its constituent words, other properties of the document or documents, and the metadata, are also extracted. The process of extracting words from the document is referred to as n-gram analysis. Features may be single words or n-grams, that is, combinations of n consecutive words; depending on the value of n, such combinations are referred to as bigrams (n = 2), trigrams (n = 3), and correspondingly higher-order n-grams.
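The tokenization and feature-extraction steps can be illustrated with a minimal Python sketch. It is not the procedure of any particular text-analysis package: the function names (`tokenize`, `ngrams`, `document_feature_matrix`), the simple lowercase word-matching rule, and the two example sentences are all assumptions made for illustration, and the texts are held in memory rather than read from files.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into word tokens (illustrative rule)."""
    return re.findall(r"[a-z0-9']+", text.lower())


def ngrams(tokens, n):
    """Return every run of n consecutive tokens, joined by a space."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def document_feature_matrix(docs, n=1):
    """Return (features, matrix); matrix[d][f] counts feature f in document d."""
    counts = [Counter(ngrams(tokenize(doc), n)) for doc in docs]
    features = sorted(set().union(*counts))
    matrix = [[c[f] for f in features] for c in counts]
    return features, matrix


# Hypothetical two-document corpus, used only for illustration.
docs = [
    "Health policy shapes health outcomes.",
    "Policy documents describe policy goals.",
]

# Single-word features (n = 1) and bigram features (n = 2).
features, matrix = document_feature_matrix(docs)
bigram_features, bigram_matrix = document_feature_matrix(docs, n=2)

print(features)
for row in matrix:
    print(row)
```

Each row of `matrix` corresponds to one document and each column to one feature, so summing a column gives the total count of that feature across the corpus.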
Once the documents are converted to a format in which their constituent words and associated metadata are available, the individual words are referred to as features. At that point, counting and graphing the features provides insight into the syntactic organisation of the text.

Qeios, CC-BY 4.0 · Definition, September 5, 2019