CoPaSul Manual -- Contour-based parametric and superpositional intonation stylization
Uwe D. Reichel
Research Institute for Linguistics, Hungarian Academy of Sciences
Version 0.8.x, October 28th, 2018

Contents
Voice quality
10 Feature sets
11 Configurations
12 Output
13 Plotting
14 Known bugs
15 History
References

Introduction
The purposes of the CoPaSul toolkit are (1) automatic prosodic annotation and (2) prosodic feature extraction from syllable to utterance level.

CoPaSul stands for contour-based, parametric, superpositional intonation stylization. The core model is introduced, amongst others, in [14]. In this framework intonation is represented as a superposition of global and local contours that are described parametrically in terms of polynomial coefficients. On the global level (usually associated with, but not necessarily restricted to, intonation phrases) the stylization serves to represent register in terms of time-varying f0 level and range. On the local level (e.g. accent groups), local contour shapes are described. From this parameterization several features related to prosodic boundaries and prominence can be derived. Furthermore, by coefficient clustering, prosodic contour classes can be derived in a bottom-up way. Next to the stylization-based feature extraction, standard f0 and energy measures (e.g. mean and variance) as well as rhythmic aspects can also be calculated.

At the current state automatic annotation comprises:
• segmentation into interpausal chunks
• syllable nucleus extraction
• unsupervised localization of prosodic phrase boundaries and prominent syllables

F0 and partly also energy feature sets can be extracted for:
• standard measurements (such as median and IQR)
• register in terms of f0 level and range
• prosodic boundaries
• local contour shapes
• bottom-up derived contour classes
• Gestalt of accent groups in terms of their deviation from higher level prosodic units
• rhythmic aspects quantifying the relation between f0 and energy contours and prosodic event rates

Please see section 10 for a list of application examples.

The CoPaSul command-line toolkit can be downloaded from this location: http://clara.nytud.hu/~reichelu/copasul.zip
The toolkit is written in Python 3 and depends on the following Python packages (with the specified version or higher):
• matplotlib 1.3.1
• numpy 1.8.2
• pandas 0.13.1
• scipy 0.15.1
• scikit-learn 0.17.1

So far the software has been tested only on Linux. The installation steps are:
1. Unzip copasul.zip in your target folder DIR.
2. Change to DIR.
3. Open make.py in a text editor and set the string variable python_path to the Python 3 call related to your platform. See:
   https://docs.python.org/3/using/mac.html
   https://docs.python.org/3/using/unix.html
   https://docs.python.org/3/using/windows.html
4. Call
   > python3 make.py
   This script adjusts the Python path according to your changes in make.py and inserts DIR into the Python scripts, so that all copasul modules are found.
5. For a command-line test, change to DIR and type
   > copasul.py -c config/test.json
6. The result should be found in the test/res/ subfolder and should contain:
   • csv files with analysis results and corresponding R code template files to read the csv files for further statistical analyses

The main script copasul.py can be used in a shell or within the Python 3 environment. After having changed to the copasul directory from the shell, it is called as follows:
> copasul.py -c myConfigFile.json
e.g.
> copasul.py -c config/test.json
The content of myConfigFile.json is explained in section 11.

Within the Python environment the tool is used this way:
>>> import copasul as copa
>>> myCopa = copa.copasul({'config': myConfig})
>>> myCopa = copa.copasul({'config': myConfig, 'copa': myCopa})
The input argument is a nested dictionary with at least one sub-dictionary config, which contains the configurations (see section 11). copasul() returns the output dictionary myCopa with the extracted feature sets (see section 12.3). If feature extraction should not start from scratch, but an already existing dictionary should be corrected or expanded, this dictionary is passed to the function via the key copa, as shown in the second call.

For shell calls as well as for calls within the Python environment, the stylization output is written to a Python pickle file and to csv table files as specified in the configurations. See section 12.5.
Input
For automatic annotation CoPaSul needs audio and f0 table files. For feature extraction it additionally needs annotation files. For the voice feature set, pulse table files are needed as well. Corresponding files do not necessarily need to have the same name stem, but it is assumed that all audio, f0, and annotation files are sorted the same way. An example can be found in the input subdirectory.

Additionally a configuration file in JSON format is needed, as further specified in section 11.
Audio files: currently only wav files are supported. The files can be mono or stereo. For conversion to wav, software such as Praat, Audacity, or SoX can be used.
F0 files are plain text tables with whitespace as column separator. The first column contains the time stamps; each further column contains the f0 values of one channel. F0 tables thus consist of 2 columns for mono files, 3 columns for stereo files, etc. All columns need to have the same length. Undefined f0 values are to be replaced by 0. Internally only a sample rate of 100 Hz is supported; input with any other rate is resampled to 100 Hz. The Praat scripts extract_f0.praat and extract_f0_stereo.praat contained in this package provide the required input format.
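The table layout described above can be checked with a few lines of Python. The following is only a minimal sketch using the standard library; the function name read_f0_table is hypothetical and not part of the CoPaSul package.

```python
def read_f0_table(text):
    """Parse a whitespace-separated f0 table (hypothetical helper).

    Returns (times, channels); channels[k] holds the f0 values of
    channel k+1. Undefined f0 is expected to be encoded as 0.
    """
    times, channels = [], []
    for line_no, line in enumerate(text.splitlines(), start=1):
        fields = line.split()
        if not fields:
            continue  # skip empty lines
        values = [float(x) for x in fields]
        if len(values) < 2:
            raise ValueError(f"line {line_no}: need a time and at least one f0 column")
        if not channels:
            channels = [[] for _ in values[1:]]
        elif len(values) - 1 != len(channels):
            raise ValueError(f"line {line_no}: inconsistent number of columns")
        times.append(values[0])
        for channel, f0 in zip(channels, values[1:]):
            channel.append(f0)
    return times, channels


# Stereo example at 100 Hz; 0 marks undefined (e.g. voiceless) frames.
example = """0.00 0 182.1
0.01 121.5 183.0
0.02 122.3 0
"""
times, channels = read_f0_table(example)
print(len(channels), channels[0])
```

A table with rows of differing column counts raises a ValueError here, which mirrors the requirement that all columns have the same length.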
Pulse files are plain text tables with whitespace as column separator. They are only needed for the voice feature extraction. Each column contains the pulse time stamps of one channel in seconds. All columns must contain the same number of rows, so that for files with more than one channel the shorter columns have to be padded with -1. The Praat scripts extract_pulse.praat and extract_pulse_stereo.praat contained in this package provide the required input format.
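The padding rule can be illustrated as follows. This is a sketch only; pad_pulse_columns is a hypothetical helper name, and the Praat scripts shipped with the package already produce the padded format.

```python
def pad_pulse_columns(pulses_per_channel, pad_value=-1):
    """Return table rows in which shorter channels are padded with pad_value.

    pulses_per_channel: one list of pulse time stamps (seconds) per channel.
    """
    n_rows = max(len(p) for p in pulses_per_channel)
    padded = [list(p) + [pad_value] * (n_rows - len(p)) for p in pulses_per_channel]
    # One output row per pulse index, one whitespace-separated column per channel.
    return [" ".join(str(col[i]) for col in padded) for i in range(n_rows)]


rows = pad_pulse_columns([[0.105, 0.113, 0.121], [0.250, 0.259]])
print("\n".join(rows))
```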
Annotation files: the Praat TextGrid format (long and short) and an XML format with the following structure are supported.

The tiers need to be stored in the tiers subtree right below the root element. Each tier must have a name assigned by the element name. The items of each tier are collected in the items subtree, in which each item is stored in an item subtree. Segment tiers (see next section) must contain the elements label, t_start, t_end. Event tiers must contain the elements label, t. The XML annotation file can be extended by the user as long as it fulfills the specified requirements in the tiers subtree.
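A file following these requirements could be generated as sketched below with Python's xml.etree.ElementTree. The element names tiers, name, items, item, label, t_start, t_end, and t are taken from the description above. The root element name (annotation) and the name of the element wrapping a single tier (tier) are not given in the text and are pure assumptions made for this illustration.

```python
import xml.etree.ElementTree as ET


def add_tier(tiers, name, items):
    """Append one tier with its items to the <tiers> subtree."""
    tier = ET.SubElement(tiers, "tier")  # element name "tier" is an assumption
    ET.SubElement(tier, "name").text = name
    items_el = ET.SubElement(tier, "items")
    for item in items:
        item_el = ET.SubElement(items_el, "item")
        for key, value in item.items():
            ET.SubElement(item_el, key).text = str(value)
    return tier


root = ET.Element("annotation")  # root element name is an assumption
tiers = ET.SubElement(root, "tiers")
# Segment tier: items need label, t_start, t_end.
add_tier(tiers, "words", [{"label": "hello", "t_start": 0.10, "t_end": 0.45}])
# Event tier: items need label, t.
add_tier(tiers, "accents", [{"label": "x", "t": 0.27}])

xml_text = ET.tostring(root, encoding="unicode")
print(xml_text)
```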
In the following the notation a:b:... refers to branches through the configuration dictionary, which is introduced in section 11. The annotation files can contain tiers of the following types:

Segment tiers contain items defined by a label, a start point, and an end point. They correspond to Praat IntervalTiers.

Event tiers contain items without temporal extension. They are defined by a label and a time stamp and correspond to Praat TextTiers.

Both segment tiers and event tiers are supported for most of the analyses. Wherever needed, an event is converted to a segment by centering a window of length preproc:point_win on the event, as explained in more detail in section 8.3. Pause information can only be extracted from segment tiers. In TextGrids, pauses are items with empty labels or items labeled as fsys:label:pau. Both event and segment tiers can serve as:
Analysis tiers
In the context of automatic annotation these tiers contain or limit the candidate locations for prosodic events. They can be segment or event tiers.
fsys:augment:glob:tier
fsys:augment:loc:tier_acc
fsys:augment:loc:tier_ag

For feature extraction these segment or event tiers define the units of analysis.
fsys:chunk:tier
fsys:glob:tier
fsys:loc:tier_acc
fsys:loc:tier_ag
fsys:bnd:tier
fsys:gnl_f0:tier
fsys:gnl_en:tier
fsys:rhy_f0:tier
fsys:rhy_en:tier
Parent tiers
Parent tiers (1) limit the analysis and normalization windows by their segment boundaries. As an example, normalization across chunk boundaries can be suppressed. (2) They limit the domain of global trends against which local deviation is measured. It is strongly recommended to use segment tiers for this purpose. If not specified, the whole file is treated as a single parenting segment. For automatic annotation parent tiers are to be defined by:
fsys:augment:syl:tier_parent
fsys:augment:glob:tier_parent
fsys:augment:loc:tier_parent

For glob, bnd, gnl_en, gnl_f0, rhy_en, rhy_f0 feature extraction (see section 10) only speech chunks can serve as parent domains:
fsys:chunk:tier

Fallback is again the entire file. For loc feature extraction only the segments of the glob analysis tier can form the parent domain, which follows from the superpositional framework.

Superpositional framework
Within the CoPaSul approach (see section 8.4) the intonation contour is considered as a superposition of a global and local components. Their domains are defined by the glob and loc option branches, respectively:
fsys:glob:tier
fsys:loc:tier_acc
fsys:loc:tier_ag

This has two implications for the annotation tier definitions:
• for each channel only one tier each is supported for the global and for the local domain
• the global domain tier is treated as the parent tier for the local domain tier

Output tiers
For automatic annotation these tiers are defined by a stem which is always expanded by the recording channel index.
fsys:augment:chunk:tier_out_stm
fsys:augment:syl:tier_out_stm
fsys:augment:glob:tier_out_stm
fsys:augment:loc:tier_out_stm

As an example, given a stereo file and the chunk output tier name CHUNK, the tiers CHUNK_1 and CHUNK_2 will be added to the annotation file. For the sake of a uniform treatment, the channel index is also added for mono files.
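The stem expansion can be pictured as follows. This is a sketch under the assumption that stem and channel index are joined by an underscore; expand_tier_names is a hypothetical helper, not a CoPaSul function.

```python
def expand_tier_names(stem, n_channels):
    """Expand an output tier stem by the 1-based channel indices.

    Assumption: stem and index are joined by an underscore.
    The index is added for mono files as well.
    """
    return [f"{stem}_{i}" for i in range(1, n_channels + 1)]


stereo_tiers = expand_tier_names("CHUNK", 2)
mono_tiers = expand_tier_names("CHUNK", 1)
print(stereo_tiers, mono_tiers)
```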
Tier specification
For all tiers that were not automatically generated, the user needs to specify the recording channel index they refer to (also for mono files!), e.g.:
fsys:channel:'tierA'=1
fsys:channel:'tierB'=2
tierA thus refers to channel 1, and tierB to channel 2. Tier names can be specified as strings or as lists of strings.
fsys:bnd:tier='tierA'
means that the bnd feature extraction is to be carried out for units defined by the content of tierA.
fsys:bnd:tier=['tierA','tierB']
triggers a bnd feature extraction for the content of two tiers. The channels the specified tiers refer to are looked up in fsys:channel:*.

The name stem of a tier resulting from automatic annotation (e.g. CHUNK) will be expanded automatically, thus for a chunked stereo file these two specifications are equivalent:
fsys:bnd:tier='CHUNK'
fsys:bnd:tier=['CHUNK_1','CHUNK_2']

For the feature sets bnd, gnl_en, gnl_f0, rhy_en, rhy_f0 (see section 10) an arbitrary number of tiers can be specified for each channel. For chunk, glob, loc only one tier per channel is supported.

For f0 extraction in mono or stereo wav files the two Praat scripts contained in this package can be used. They can be called this way:
> praat extract_f0.praat myStepsize myMinFreq myMaxFreq \
  myAudioInputDir myF0OutputDir myAudioExt myF0Ext
The usage of extract_f0_stereo.praat is the same. Note that subsequent stylization in any case initiates a resampling to 100 Hz, so that myStepsize can be directly set to 0.01 here. myMinFreq and myMaxFreq refer to the minimum and maximum of allowed f0 values in Hz. Values below or above are considered as measurement errors and are set to 0. The choice of the f0 range depends on the recorded speakers. As a rule of thumb the parameters can be set to 50 and 400 Hz, respectively. The sound files with the extension myAudioExt are collected in myAudioInputDir, and the corresponding f0 plain text table files with the audio file's name stem and the extension myF0Ext are written to the directory myF0OutputDir.

Pulse extraction is needed for the voice feature set only. For pulse extraction in mono or stereo wav files the two Praat scripts contained in this package can be used. They can be called this way:
> praat extract_pulse.praat myMinFreq myMaxFreq \
  myAudioInputDir myPulseOutputDir myAudioExt myPulseExt
The usage of extract_pulse_stereo.praat is the same. The scripts make use of Praat's To PointProcess (cc) routine operating on sound and pitch objects. For pitch object creation the minimum and maximum of allowed f0 values, myMinFreq and myMaxFreq, need to be specified in Hz. The sound files with the extension myAudioExt are collected in myAudioInputDir, and the corresponding pulse plain text table files with the audio file's name stem and the extension myPulseExt are written to the directory myPulseOutputDir.

Automatic annotation
Automatic unsupervised prosodic annotation comprises chunking, syllable nucleus and boundary extraction, prosodic phrase extraction, and pitch accent localization. Details of the algorithms will be given in [16]. At the beginning of each introductory paragraph it is specified:
navigation: which navigation option to set to True in the configuration file (see section 11)
feature sets: which feature sets result from the annotation (see section 10)
option sub-dictionary: which configuration sub-dictionaries serve to customize the respective processing (see section 11)
output sub-dictionary: which sub-dictionary of the resulting Python nested dictionary contains the extracted feature set (see section 12.3).
Paths through the configuration dictionary are referred to by my:path:to:option.

navigation: do_augment_*
feature sets: –
option sub-dictionary: fsys:augment:*:*; augment:*:*
output sub-dictionary: (augmented annotation file)

navigation: do_augment_chunk
feature sets: –
option sub-dictionary: fsys:augment:chunk:*; augment:chunk:*
output sub-dictionary: (augmented annotation file)

Chunking serves to segment the utterance into interpausal units. It is based on a pause detector that works the following way: an analysis window $w_a$ with length augment:chunk:l is moved over the lowpass-filtered signal together with a longer reference window $w_r$ of length augment:chunk:l_ref with the same midpoint. A pause is set where the mean energy in $w_a$ is below a threshold defined relative to the energy in $w_r$, i.e. if $e(w_a) < e(w_r) \cdot$ augment:chunk:e_rel. Chunks are then trivially assigned to interpausal intervals. Silence margins can be set at chunk starts and ends by augment:chunk:margin.

If $w_r$ itself is identified as a pause by $e(w_r) < e(s) \cdot$ augment:chunk:e_rel, it is replaced by $s$, where $s$ consists of selected parts of the acoustic signal in the analysed channel with absolute amplitude values above the median. This lower threshold increases the robustness against a high occurrence of speech pauses.

The filtering of the signal can be customized by the sub-dictionary augment:chunk:flt. In there btype gives the Butterworth filter type (high, low, band, or none), f the cutoff frequency or frequencies, and ord the order. For pauses as well as for inter-pause intervals minimum lengths can be defined by augment:min_pau_l and min_chunk_l, respectively. Pauses are then merged across too short chunks, and chunks are merged across too short pauses. The segment tier output will be added to the annotation file. The tier name is specified by fsys:augment:chunk:tier_out_stm concatenated with the respective channel index.
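The core window comparison of the pause detector can be sketched as follows. This is a simplified illustration, not the toolkit's implementation: it operates on a precomputed energy contour, measures window lengths in frames rather than seconds, and omits the lowpass filtering, the fallback signal $s$, the silence margins, and the merging of too short pauses and chunks. All function names are hypothetical.

```python
from statistics import mean


def window(values, mid, length):
    """Return the part of `values` covered by a window of about `length` frames centred on `mid`."""
    half = length // 2
    return values[max(0, mid - half): mid + half + 1]


def detect_pauses(energy, l=3, l_ref=11, e_rel=0.1):
    """Mark frame i as pause if e(w_a) < e(w_r) * e_rel."""
    return [mean(window(energy, i, l)) < mean(window(energy, i, l_ref)) * e_rel
            for i in range(len(energy))]


def chunks_from_pauses(is_pause):
    """Return (start, end) frame index pairs (end exclusive) of the interpausal runs."""
    chunks, start = [], None
    for i, pause in enumerate(is_pause):
        if not pause and start is None:
            start = i
        elif pause and start is not None:
            chunks.append((start, i))
            start = None
    if start is not None:
        chunks.append((start, len(is_pause)))
    return chunks


# Toy energy contour: speech, a short silence, speech.
energy = [1.0] * 10 + [0.001] * 5 + [1.0] * 10
pauses = detect_pauses(energy)
chunks = chunks_from_pauses(pauses)
print(chunks)
```

Because the analysis window smears energy across the silence edges, the detected pause in this toy example is slightly shorter than the silent stretch itself.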
Standard labels 'x' are assigned to chunk segments, and fsys:label:pau to the pauses in between.

navigation: do_augment_syl
feature sets: –
option sub-dictionary: fsys:augment:syl:*; augment:syl:*
output sub-dictionary: (augmented annotation file)

For syllable nucleus detection the method proposed by [13] is adopted. Again an analysis window $w_a$ with length augment:syl:l and a longer reference window $w_r$ with length augment:syl:l_ref and the same midpoint are moved along the signal, which this time is band-pass filtered to focus on the frequency band related to vocalic nuclei. The filter specification in augment:syl:flt works as described for chunking. From this energy contour the local maxima are extracted. A syllable nucleus is set if for a local maximum the mean energy in $w_a$ exceeds the mean energy in $w_r$ by a defined factor, i.e. if $e(w_a) > e(w_r) \cdot$ augment:syl:e_rel, and if $e(w_a)$ is not below a defined fraction of the energy in the current chunk $w_c$ (fallback: whole file), i.e. $e(w_a) \geq e(w_c) \cdot$ augment:syl:e_min. The tier from which to get the current chunk is to be defined by augment:syl:tier_parent. It can for example be the output tier of a preceding chunking step. A further constraint augment:syl:d_min specifies the minimum distance between subsequent syllable nuclei. If two nuclei are too close, they are merged to a single syllable, and the point of energy maximum in this interval is assigned to be the nucleus.

Subsequently syllable boundaries are assigned to the energy minimum between adjacent syllable nuclei. They just serve as fallback prosodic boundary candidates.

The output consists of two event tiers for syllable nuclei and boundaries and will be added to the annotation file. The tier name is specified by fsys:augment:syl:tier_out_stm. For the nuclei it is concatenated with the respective channel index. For the boundaries it is concatenated with a 'bnd' infix and the channel index. Standard labels 'x' are assigned for both tiers.
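The nucleus and boundary decisions described above can be sketched as follows. This is a simplified illustration that works on a precomputed (already band-pass filtered) energy contour, uses frame units for all lengths, and takes the whole contour as the parent chunk. The function names and default values are hypothetical and not taken from the toolkit.

```python
from statistics import mean


def window(values, mid, length):
    """Return the part of `values` covered by a window of about `length` frames centred on `mid`."""
    half = length // 2
    return values[max(0, mid - half): mid + half + 1]


def detect_nuclei(energy, l=3, l_ref=9, e_rel=1.05, e_min=0.5, d_min=3):
    """Return frame indices of syllable nuclei."""
    chunk_energy = mean(energy)  # fallback: the whole file acts as the chunk w_c
    nuclei = []
    for i in range(1, len(energy) - 1):
        if not (energy[i - 1] < energy[i] >= energy[i + 1]):
            continue  # not a local maximum
        e_a = mean(window(energy, i, l))
        e_r = mean(window(energy, i, l_ref))
        if e_a > e_r * e_rel and e_a >= chunk_energy * e_min:
            if nuclei and i - nuclei[-1] < d_min:
                # Too close to the previous nucleus: keep the energy maximum only.
                if energy[i] > energy[nuclei[-1]]:
                    nuclei[-1] = i
            else:
                nuclei.append(i)
    return nuclei


def syllable_boundaries(energy, nuclei):
    """Place a boundary at the energy minimum between adjacent nuclei."""
    bounds = []
    for a, b in zip(nuclei, nuclei[1:]):
        segment = energy[a:b + 1]
        bounds.append(a + segment.index(min(segment)))
    return bounds


energy = [0.1, 0.2, 0.9, 0.3, 0.1, 0.1, 0.2, 0.8, 1.0, 0.4, 0.1, 0.1]
nuclei = detect_nuclei(energy)
bounds = syllable_boundaries(energy, nuclei)
print(nuclei, bounds)
```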
7.3 Prosodic phrase boundary location

navigation: do_augment_glob
feature sets: –
option sub-dictionary: fsys:augment:glob:*; augment:glob:*
output sub-dictionary: (augmented annotation file)

Prosodic phrase boundary decisions are based on nearest centroid classification. The user needs to specify the tier that contains boundary candidates in fsys:augment:glob:tier. For segment tiers these candidates are the segment boundaries; for event tiers the candidates are the time stamps. If no tier is specified, syllable boundaries derived by step 7.2 will be selected as candidates. At each boundary candidate a feature set is extracted that has been shown to be related to prosodic boundaries in former studies [20, 21]. This feature set is introduced in section 8.9. The user needs to specify which of these features should be selected by augment:glob:wgt:myBndFeatset+:myRegister+:myFeat+.

If a phone segment tier is available and if centroids are derived from the entire data set and not separately for each file (see below), z-scored vowel length can be used as an additional feature. The length of the vowel associated with the prosodic event candidate is divided by its mean length derived from the entire data set. The associated vowel is the last vowel segment with an onset before the boundary candidate time stamp. The length feature can be added by:
augment:glob:wgt:pho=1

The phonetic segment tiers (one for each channel) are to be specified in
fsys:pho:tier

Vowels are identified in these tiers by a regular expression stored in
fsys:pho:vow
This feature will be beneficial for languages in which phrase boundaries and/or accents are marked by phone segment lengthening.

Furthermore the user can select whether the current feature values at time $i$, $v_i$, or the delta values (i.e. the differences to the preceding values, $v_i - v_{i-1}$), or both should be taken:
augment:glob:measure

Some features require units from a parent tier which is to be specified by augment:glob:tier_parent, e.g. to measure local f0 trend discontinuities within a superordinate unit and to limit analysis and normalization windows. Such units are e.g. chunks derived from preceding chunking. Fallback is the entire file.

From the features, for each of the two classes boundary B and no boundary NB a centroid can be bootstrapped in several ways given the specification in augment:glob:cntr_mtd, as described in the following sections. Centroids can be calculated separately for each file or over the entire data set by setting the value of augment:glob:unit to file or batch, respectively. The latter is strongly recommended for corpora containing lots of short recordings.

augment:glob:cntr_mtd=split
augment:glob:prct=mySplitPoint

Since for all extracted pause length and pitch discontinuity boundary features a positive correlation with perceived boundary strength has been found [20, 21], B and NB centroids can be straightforwardly derived from high and low feature values, respectively. Centroids are thus derived by splitting each column in the feature matrix at the percentile augment:glob:prct. The B centroid is defined by the median of the values above the split point, the NB centroid by the median of the values below. All feature vectors are then assigned to the nearest centroid in a single pass. Boundaries are subsequently inserted at all candidate time points classified as B. This method works for both segment and event tier input.

augment:glob:cntr_mtd=seed_kmeans
augment:glob:min_l=myMinPhraseLength

This procedure works for segment tier input only, since it makes use of pauses between adjacent segments. As visualized in Figure 1, B and NB centroids are bootstrapped based on two assumptions: (1) each pause indicates a prosodic boundary, and (2) prosodic phrases have a minimum length, thus in the vicinity of pauses there are no further boundaries. KMeans clustering is then initialized by these two centroids and subdivides all candidates into the B and NB cluster. Boundaries are inserted at all candidate time points belonging to the B cluster.

Figure 1: Bootstrapping seed centroids for the classes 1 (boundary) and 0 (no boundary). Word boundaries are indicated by the short vertical lines. Assumptions: each pause indicates a prosodic boundary (green), and prosodic phrases have a minimum length (red window), thus in the vicinity of pauses there are no further boundaries (blue).

augment:glob:cntr_mtd=seed_prct
augment:glob:prct=mySplitPoint
augment:glob:min_l=myMinPhraseLength

The seed centroid bootstrapping works as for the preceding method. Instead of KMeans clustering, for the remaining feature vectors the Euclidean distance to the NB seed centroid is calculated. Vectors with a distance above the mySplitPoint-th percentile of all measured distances are assigned to the B class, the others to the NB class.
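The percentile split method (cntr_mtd=split) can be sketched with the standard library as follows. The percentile definition (linear interpolation between order statistics), the treatment of values equal to the split point, and all function names are assumptions made for this illustration and may differ from the toolkit's implementation.

```python
import math
from statistics import median


def percentile(values, prct):
    """Percentile with linear interpolation between order statistics (an assumption)."""
    s = sorted(values)
    pos = (len(s) - 1) * prct / 100
    lo, hi = math.floor(pos), math.ceil(pos)
    return s[lo] + (s[hi] - s[lo]) * (pos - lo)


def split_centroids(features, prct):
    """Derive the B and NB centroids column-wise from a list of feature vectors."""
    cen_b, cen_nb = [], []
    for column in zip(*features):
        split = percentile(column, prct)
        high = [v for v in column if v > split]
        low = [v for v in column if v <= split]
        # B: median of the values above the split point; NB: median of those below.
        cen_b.append(median(high) if high else split)
        cen_nb.append(median(low) if low else split)
    return cen_b, cen_nb


def classify(features, cen_b, cen_nb):
    """Assign each feature vector to the nearest centroid in a single pass."""
    return ["B" if math.dist(v, cen_b) < math.dist(v, cen_nb) else "NB"
            for v in features]


# Toy candidates described by (pause length, f0 discontinuity).
features = [[0.0, 0.5], [0.05, 1.0], [0.0, 0.2], [0.6, 6.0], [0.1, 0.8], [0.8, 7.5]]
cen_b, cen_nb = split_centroids(features, prct=60)
labels = classify(features, cen_b, cen_nb)
print(labels)
```

In practice the features would be brought to a comparable scale before the Euclidean distances are computed; this step is left out here for brevity.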
The percentile split method works for both segment and event tiers, whereas the two centroid bootstrapping methods need segment tier input to infer pause locations. For the two percentile split approaches, the parameter augment:glob:prct serves to control the number of inserted boundaries. The higher its value, the smaller the B class, and thus the fewer boundaries will be assigned.

If a text transcription is at hand, the user can ensure that prosodic boundaries only occur at word boundaries by preceding signal-text alignment, e.g. by WebMAUS [25, 9].
Heuristics
augment:glob:heuristics=ORT

If set by the user, this heuristic assumes a word segment tier as input and rejects boundaries after words that are too short and thus probably function words (< . s).

augment:glob:wgt:myBndFeatset+:myRegister+:myFeat+=myWeight
augment:glob:wgt_mtd=myWeightingMethod

By the augment:glob:wgt:myBndFeatset+:myRegister+:myFeat+ branches the user selects and weights features at the same time. As an example,
augment:glob:wgt:win:ml:rms=1
selects the feature rms derived from the register representation ml within the boundary feature set win (see sections 8.9 and 10 for explanations). If the weighting method in augment:glob:wgt_mtd is set to 'user', the weight of this feature becomes 1. If no weighting is intended, the same weight should be assigned to all selected features. As an alternative to the definition by the user, weights can also be derived by correlation to the median or by the cluster silhouette measure.

Correlation
Each feature is correlated with the medians of the feature vectors. Since, as mentioned, all boundary features are expected to be positively correlated with boundary strength, and since the median is expected to be more robustly related to boundary strength than single features, the correlation between a feature and the medians to some extent reflects how well this feature predicts boundary strength. Features with a negative correlation to the median are removed from the pool. All remaining correlations are transformed to weights summing up to 1 by dividing them by the sum of correlations.
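The correlation-based weighting can be sketched as follows. The sketch assumes that the features have already been brought to a comparable scale, so that the median across a feature vector is meaningful, and it uses the Pearson correlation, which the text does not specify. The function names are hypothetical.

```python
import math
from statistics import mean, median


def pearson(x, y):
    """Pearson correlation; returns 0.0 if one of the inputs is constant."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx and sy else 0.0


def correlation_weights(features):
    """Map feature index -> weight; features without positive correlation are dropped."""
    medians = [median(row) for row in features]
    corr = [pearson(col, medians) for col in zip(*features)]
    kept = {j: r for j, r in enumerate(corr) if r > 0}
    if not kept:
        return {}
    total = sum(kept.values())
    return {j: r / total for j, r in kept.items()}


# Rows: boundary candidates; columns: three features, the last one anti-correlated.
features = [[1, 2, 9], [2, 3, 7], [3, 5, 5], [4, 6, 3], [5, 9, 1]]
weights = correlation_weights(features)
print(weights)
```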
Silhouette
The mean silhouette over all clustered data points measures how well clusters can be separated. Here it is measured separately for each feature within the clearly assignable feature vectors from which the B and NB seed centroids were derived. It is minmax-normalized to the range [0, 1].

7.3.6 Output
The output consists of a segment tier for each channel with the name fsys:glob:tier_out_stm + channelIndex. Each segment spans the interval between two subsequent B events. If fsys:glob:tier is a segment tier, then pauses are taken over from this tier. Standard labels 'x' are assigned to the prosodic phrase segments.
Pitch accents are derived in a bootstrap fashion analogous to prosodic boundaries. The user needs to specify an event tier (default: syllable nuclei) for localization of the pitch accent candidates. Furthermore the user can specify a segment tier (e.g. words) to restrict the maximum number of detected pitch accents within each segment to 1.
fsys:augment:loc:tier_acc
fsys:augment:loc:tier_ag

Given a segment tier, the user can furthermore specify (1) whether each segment should get an accent or only the prominent ones
augment:loc:ag_select
and (2) where within a segment an accent should be placed: left- or rightmost, e.g. for prosodically left- or right-headed languages, or on the most prominent candidate.
augment:loc:acc_select

Prominence can be parameterized by several feature sets measuring standard f0 and energy features, contour shapes within local segments, and their deviation from a global declination trend.

The user can select whether the current feature values at time $i$, $v_i$, or the delta values (i.e. the differences to the preceding values, $v_i - v_{i-1}$), or both should be taken:
augment:loc:measure

Some features require units from a parent tier which is to be specified by augment:loc:tier_parent, e.g. to measure local f0 deviations relative to some superordinate unit and to limit analysis and normalization windows. Such units are e.g. prosodic phrases derived from preceding phrase extraction. Fallback is the entire file.

From these features, for each of the two classes accented A and not accented NA a centroid can be bootstrapped in several ways analogously to the prosodic boundary extraction, this time given the specification in augment:loc:cntr_mtd. Centroids can be calculated separately for each file or over the entire data set by setting the value of augment:loc:unit to file or batch, respectively. The latter is strongly recommended for corpora containing lots of short recordings.

augment:loc:cntr_mtd=split
augment:loc:prct=mySplitPoint

Given a user-defined feature set where for each feature high values indicate prominence, A and NA centroids can be straightforwardly derived from high and low feature values, respectively. Centroids are thus derived by splitting each column in the feature matrix at the percentile augment:loc:prct. The A centroid is defined by the median of the values above the split point, the NA centroid by the median of the values below. All feature vectors are then assigned to the nearest centroid in a single pass. Accents are then assigned to all candidates classified as A. This method works for both segment and event tier input.
augment:loc:cntr_mtd=seed_kmeans
augment:loc:max_l_na=myMaxLengthNA
augment:loc:min_l_a=myMinLengthA
augment:loc:min_l=myMinLengthAG

This procedure works only if a segment tier is provided next to the event tier, and if this segment tier contains word-like units. As for the phrase boundary detection described above, there are two (this time even more strongly) simplifying assumptions to derive seed centroids for cluster initialization (cf. Figure 2): (1) each word longer than augment:loc:min_l_a contains an accent due to its expected high information content; (2) each word shorter than augment:loc:max_l_na does not contain an accent due to its expected low information content. Depending on augment:loc:acc_select the A centroid is then calculated from all leftmost, rightmost, or most prominent tier_acc candidates in the tier_ag segments fulfilling criterion (1). The NA centroid is calculated from all tier_acc candidates in the tier_ag segments fulfilling criterion (2). KMeans clustering is then initialized by these two centroids and subdivides all candidates into the A and NA cluster. Multiple A cases within the same segment are reduced by augment:loc:acc_select. Furthermore, among A cases closer than augment:loc:min_l only the more prominent ones are kept.

Figure 2: Bootstrapping seed centroids for the classes 1 (accent) and 0 (no accent). Word boundaries are indicated by long vertical lines, and syllable nuclei by short vertical lines. Prominence is encoded by the size of the triangles. Assumptions: each word longer than some threshold contains an accent (green); each word shorter than some threshold does not contain an accent (blue). Within the accented word the accent is placed on the most prominent syllable (as in this example), or on the left- or rightmost syllable.

augment:loc:cntr_mtd=seed_prct
augment:loc:prct=mySplitPoint
augment:loc:max_l_na=myMaxLengthNA
augment:loc:min_l_a=myMinLengthA

The seed centroid bootstrapping works as for the preceding method. Instead of KMeans clustering, for the remaining feature vectors the Euclidean distance to the NA seed centroid is calculated. Vectors with a distance above the mySplitPoint-th percentile of all measured distances are assigned to the A class, the others to the NA class.
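The distance-based assignment of the seed_prct method can be sketched as follows. The sketch takes an already bootstrapped NA seed centroid as given, classifies all feature vectors instead of only the remaining ones, and uses a percentile definition with linear interpolation. These simplifications and the function names are assumptions made for this illustration.

```python
import math


def percentile(values, prct):
    """Percentile with linear interpolation between order statistics (an assumption)."""
    s = sorted(values)
    pos = (len(s) - 1) * prct / 100
    lo, hi = math.floor(pos), math.ceil(pos)
    return s[lo] + (s[hi] - s[lo]) * (pos - lo)


def classify_by_distance(features, seed_na, prct):
    """Label vectors far from the NA seed centroid as accented (A)."""
    distances = [math.dist(v, seed_na) for v in features]
    threshold = percentile(distances, prct)
    return ["A" if d > threshold else "NA" for d in distances]


# One-dimensional toy prominence feature for five accent candidates.
features = [[0.1], [0.2], [0.15], [1.0], [0.9]]
seed_na = [0.1]
labels_60 = classify_by_distance(features, seed_na, prct=60)
labels_90 = classify_by_distance(features, seed_na, prct=90)
print(labels_60, labels_90)
```

Raising the percentile from 60 to 90 shrinks the A class in this example, in line with the role of augment:loc:prct as a control for the number of assigned accents.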
The percentile split method works with and without segment tiers, whereas the two centroid bootstrapping methods need segment tier input next to the event tier to infer word length. As with boundary detection, the parameter augment:loc:prct serves to control the number of assigned accents. The higher its value, the smaller the A class, and thus the fewer accents will be assigned.

As mentioned for prosodic boundary detection, a supporting word segmentation can be derived by preceding signal-text alignment, e.g. by WebMAUS [25, 9].

augment:loc:wgt:myFeatset+:...

The same selection and weighting mechanisms apply as described in section 7.3.5. The following feature sets can be used: acc, gst, gnl_f0, gnl_en (see section 10). Section 11 gives examples of how to expand the corresponding configuration branches.

As for boundary detection, z-scored vowel length can also be added to the feature set for pitch accent detection. The vowel interval associated with a pitch accent candidate includes the candidate's time stamp. See section 7.3 for further details. The length feature can be added by:
augment:loc:wgt:pho=1

The output consists of an event tier for each channel with the name fsys:loc:tier_out_stm + channelIndex. Standard labels 'x' are assigned to each accent.
In the following the f0 preprocessing and the f0 and energy stylization steps are introduced. For each stylization step it is specified:
navigation: which navigation option to set to True in the configuration file (see section 11)
feature sets: which feature sets result from the stylization (see section 10)
option sub-dictionary: which configuration parts serve to customize the respective processing (see section 11)
output sub-dictionary: which part of the resulting Python nested dictionary variable contains the extracted feature set (see section 12.3).

Branches through the configuration as well as through the result dictionary are referred to by my:branch:to:value.

8.1 F0 preprocessing
F0 preprocessing comprises resampling to 100 Hz, outlier detection, interpolation over outliers and voiceless utteranceparts, smoothing, and semitone conversion including speaker normalization.
Outliers
Outliers are identified separetely for each channel in a file. They are defined in terms of deviation from amean value or from the 1st and 3rd quartile. The deviation factor is controlled by preproc:out:f , and the referencepoint by preproc:out:m . For m=mean outliers lie outside the interval [ m − f · sd , m + f · sd]. For m=median outliers lieoutside of [ m − f · iqr , m + f · iqr]. For m=fence outliers lie outside of [ Q − f · iqr , Q + f · iqr] (sd: standard deviation;iqr: interquartile range; Q1, Q3: 1st and 3rd quartile). Interpolation
Only linear interpolation is supported. Horizontal extrapolation is carried out at file boundaries.
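The outlier criteria and the interpolation step can be illustrated by the following minimal sketch. It is not the toolkit's implementation: the function names are hypothetical, voiceless frames are assumed to be coded as 0, the population standard deviation is used for m=mean, and quartiles are taken from Python's statistics.quantiles with its default method.

```python
import statistics

def outlier_bounds(values, f=2.0, m="mean"):
    """Return (lower, upper); values outside count as outliers."""
    if m == "mean":
        c = statistics.fmean(values)
        sd = statistics.pstdev(values)  # assumption: population sd
        return c - f * sd, c + f * sd
    q1, q2, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    if m == "median":
        return q2 - f * iqr, q2 + f * iqr
    if m == "fence":
        return q1 - f * iqr, q3 + f * iqr
    raise ValueError("unknown reference point: " + m)

def interpolate(y, bad):
    """Linear interpolation over flagged indices,
    horizontal extrapolation at the sequence edges."""
    good = [i for i in range(len(y)) if not bad[i]]
    out = list(y)
    if not good:
        return out
    for i in range(len(y)):
        if not bad[i]:
            continue
        left = max((g for g in good if g < i), default=None)
        right = min((g for g in good if g > i), default=None)
        if left is None:
            out[i] = y[right]
        elif right is None:
            out[i] = y[left]
        else:
            w = (i - left) / (right - left)
            out[i] = y[left] + w * (y[right] - y[left])
    return out

def clean_f0(f0, f=2.0, m="mean"):
    """Flag voiceless frames (<= 0) and outliers, then interpolate."""
    voiced = [v for v in f0 if v > 0]
    lo, hi = outlier_bounds(voiced, f, m)
    bad = [v <= 0 or v < lo or v > hi for v in f0]
    return interpolate(f0, bad)

f0 = [0, 100, 101, 102, 103, 104, 105, 300, 106, 107, 0]
cleaned = clean_f0(f0, f=1.5, m="fence")
```

In this toy contour the value 300 is replaced by the midpoint of its neighbours, and the voiceless frames at both edges take over the nearest valid value.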
Smoothing
The smoothing method is chosen by preproc:smooth:mtd. Median and Savitzky-Golay filtering are supported. Median filtering yields smoother contours, while Savitzky-Golay filtering better preserves local f0 maxima and minima. The higher the window length preproc:smooth:win, the smoother the contours. For Savitzky-Golay filtering the polynomial order needs to be specified by preproc:smooth:ord. The lower the order, the more the result is smoothed away from the input data.
navigation: do_preproc
feature sets: –
option sub-dictionary: preproc:*
output sub-dictionary: data:myFileIdx:myChannelIdx:f0:*
Semitone conversion
If preproc:st=1, Hertz (Hz) values are transformed to semitones (st) as follows: $F_{st} = 12 \cdot \log_2(F_{Hz} / b)$. b is a base value which is calculated separately for each channel in each f0 file. It is defined as the median of the values below the percentile preproc:base_prct and can be used for f0 normalization by file and channel.
Alternatively, a grouping variable can be specified, so that for each of its levels a separate f0 base value is calculated. This is done by preproc:base_prct_grp. There it can be specified which grouping variable is to be assigned to each channel. The grouping variable must be encoded in the filename and must be extractable from fsys:grp:lab. An example: you have stereo f0 files with the name pattern speakerChannel1_speakerChannel2, and you want to calculate separately for each speaker an f0 base value which is the median of the values below the 5th percentile over all of this speaker's utterances in the corpus. This is to be configured as follows:
fsys:grp:src='f0'
fsys:grp:sep='_'
fsys:grp:lab=['speakerChannel1','speakerChannel2']
preproc:base_prct=5
preproc:base_prct_grp:'1'='speakerChannel1'
preproc:base_prct_grp:'2'='speakerChannel2'
This assigns to each channel the grouping variable to be read from the f0 file names.
Note that (1) channel indices need to be written in quotation marks, and (2) a shared semantics across the grouping variables is assumed. E.g. just one base value will be calculated for speaker x, regardless of whether she was recorded in channel 1 or 2.
Base value subtraction
If preproc:st is 0, the base value introduced in the preceding paragraph will be subtracted from the f0 contour without semitone conversion. If you don't want to use any base value, neither for subtraction nor as conversion reference, set preproc:base_prct=0.
The energy contour is simply represented in terms of the root mean squared deviation (RMSD) within the windowed signal. The relevant parameters can be found below styl:gnl_en and styl:rhy_en:sig. win defines the window length and sts the step size. The energy value sample rate is thus 1/sts. wintyp and winparam give the window type and an additional parameter passed on to the get_window() function of the scipy.signal module. For customizing energy extraction with other than default values, please consult the scipy.signal documentation for get_window(). wintyp and winparam can contain any value specified in this documentation.
Windows serve (1) to transform time stamps from an event tier to segments, and (2) to locally normalize feature values.
Time stamps to segments
Most feature sets are calculated for segments, not for time stamps. Thus event tier input is converted to segments by centering a symmetric analysis window with the length preproc:point_win on each time stamp as shown in Figure 3. Features are then extracted within this window. The window can also be separately specified for each feature set by preproc:myFeatureSet:point_win. For local contour stylization a segment and an event tier can be processed in parallel as explained in section 8.7.
Figure 3: Segment and event tier input. A symmetric analysis window is centered on events. For local contour stylization, segment and event tiers can be integrated for time normalization: the event is set to 0, the pre-event part of the segment to [−1 0], and the post-event part to [0 1].
Normalization
For the feature sets loc, gnl_f0 and gnl_en several feature values are additionally locally normalized to capture their relative amount compared to the local environment. This environment length is defined by preproc:nrm_win. For event tier input the normalization window is centered on each time stamp. For segment tier input, it is centered on the midpoint of each segment. For parallel segment and event tier input, which can be provided for loc feature extraction, the window is centered on the event's time stamp within the segment. The window can also be separately specified for each feature set by preproc:myFeatureSet:nrm_win.
Figure 4: Analysis and longer normalization window. The values derived in the analysis window are divided by the corresponding values in the normalization window.
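The interplay of analysis and normalization window can be sketched as follows for event tier input and the mean as example feature. The function names are hypothetical, and the clipping limits lo and hi stand for the boundaries of the file or of the surrounding parent segment; this is an illustration under these assumptions, not the toolkit's code.

```python
def centered_window(t, length, lo, hi):
    """Symmetric window of the given length centered on time stamp t,
    clipped to the interval [lo, hi]."""
    half = length / 2.0
    return max(lo, t - half), min(hi, t + half)

def mean_in(times, values, start, end):
    """Mean of all contour values whose time lies in [start, end]."""
    sel = [v for t, v in zip(times, values) if start <= t <= end]
    return sum(sel) / len(sel)

def normalized_mean(times, values, t_event, point_win, nrm_win, lo, hi):
    """Mean in the analysis window divided by the mean in the
    (longer) normalization window, both centered on the event."""
    a = centered_window(t_event, point_win, lo, hi)
    n = centered_window(t_event, nrm_win, lo, hi)
    return mean_in(times, values, *a) / mean_in(times, values, *n)

times = [float(i) for i in range(11)]
values = [1, 1, 1, 1, 2, 4, 2, 1, 1, 1, 1]
ratio = normalized_mean(times, values, 5.0, 2.0, 6.0, 0.0, 10.0)
```

A ratio above 1 indicates that the feature value around the event exceeds its local environment; here the local peak yields a ratio of about 1.56.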
Window constraints
Analysis and normalization windows are limited to the corresponding segment in the parent tier domain. For loc features this domain is given by the global segment tier. For the other features it is given by the speech chunk tier if this tier is defined in fsys:chunk:tier. This means that analysis and normalization are not carried out across global segments or chunks, respectively. An exception can be made for the bnd feature set, which might be meaningful for chunk boundaries, too. If so, styl:bnd:cross_chunk is to be set to 1. For segment tier input the minimum length of the normalization window is set to the length of the respective segment. This implies that for segments longer than the defined normalization window, normalized feature values are the same as the non-normalized ones.
navigation: do_preproc
feature sets: –
option sub-dictionary: preproc:*
output sub-dictionary: data:myFileIdx:myChannelIdx:...:{t|to|tn}
8.4 Superposition
The core concept of CoPaSul is to represent an f0 contour as a superposition of a linear global component and polynomial local components as shown in Figure 5.
Figure 5: Superposition of one global and four local contours.
Stylization is carried out as follows: Within each global segment of the tier fsys:glob:tier (e.g. an intonation phrase) a linear register level and range representation is fitted. After subtraction of this global component, an n-th order polynomial is fitted to the f0 residual within each local segment. As an alternative to register level subtraction, the f0 residual can also be derived by normalization of the contour to the register range.
In the annotation files global segments can be defined in 2 ways:
1. by start and end point (segment tier input specified in fsys:glob:tier)
2. by the segments' right end points (event tier input specified in fsys:glob:tier that contains e.g. break index labels)
In the second case the events are expanded to segments between the annotated boundary time stamps. Pauses marked by an empty label or a pause label (fsys:label:pau) are skipped and the onset of the subsequent segment is set to the end of the pause. Therefore, in point tiers pauses should be marked at their right end. Furthermore, if chunks are provided by fsys:tier:chunk, then the expanded segments do not cross chunk boundaries but end and start with the boundaries of the respective chunk they are part of.
Global segments are represented in terms of a time-varying f0 register. Register aspects are level (midline) and range (topline − baseline).
Figure 6: Register (level and range) stylization in global contour segments.
navigation: do_styl_glob
feature sets: glob
option sub-dictionary: styl:glob:*
output sub-dictionary: data:myFileIdx:myChannelIdx:glob:*
The register fitting procedure consists of the following steps:
• A window of length styl:glob:decl_win is shifted along the f0 contour with a step size of 10 ms.
• Within each window the f0 median is calculated
  – of the values below the styl:glob:prct:bl percentile for the baseline,
  – of the values above the styl:glob:prct:tl percentile for the topline, and
  – of all values for the midline.
  This gives 3 sequences of medians, one for the base-, the mid-, and the topline, respectively.
• To each of the three median sequences a linear regression line is fitted. To be able to compare contours across global segments of different lengths, time is normalized as specified by styl:glob:nrm:mtd to the range styl:glob:nrm:rng.
The motivation for using f0 medians relative to respective percentiles instead of local peaks and valleys is twofold. First, the stylization is less affected by prominent pitch accents and boundary tones. Second, errors resulting from incorrect local peak detection are circumvented. Both enhance stylization robustness, as is shown in [21].
The following configuration parameters serve to customize how closely the base- and topline should follow local minima and maxima:
styl:glob:prct:bl
styl:glob:prct:tl
styl:glob:decl_win
A closer fit to local peaks and valleys is achieved by lowering styl:glob:prct:bl and styl:glob:decl_win, and by raising styl:glob:prct:tl. Note, however, that a closer fit will result in a higher percentage of base- and topline crossings. In the resulting Python dictionary such error cases are marked as described in section 12.3.2.
From this stylization, regression line slope and intercept features are collected for the base-, mid-, and topline, as well as for the range. For the latter these features are simply derived by fitting a linear regression line through the point-wise distances between the base- and the topline. A negative slope means that base- and topline converge, whereas a positive slope signals line divergence.
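The register fitting steps of this section can be sketched as follows. This is a minimal illustration, not the toolkit's implementation: all function names are hypothetical, window length and step are given in samples instead of seconds, the percentile helper is a simplified linear-index version, and values at (not only below or above) the percentile are included so that no selection is empty.

```python
import statistics

def linfit(x, y):
    """Least-squares regression line; returns (slope, intercept)."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    return slope, my - slope * mx

def percentile(sorted_vals, p):
    """Percentile by linear interpolation between sorted values."""
    k = (len(sorted_vals) - 1) * p / 100.0
    lo = int(k)
    hi = min(lo + 1, len(sorted_vals) - 1)
    return sorted_vals[lo] + (k - lo) * (sorted_vals[hi] - sorted_vals[lo])

def register(f0, win=10, step=1, prct_bl=10, prct_tl=90):
    """Base-, mid-, topline and range regression lines of a contour."""
    bl, ml, tl = [], [], []
    half = win // 2
    for c in range(0, len(f0), step):
        w = sorted(f0[max(0, c - half): c + half + 1])
        p_lo, p_hi = percentile(w, prct_bl), percentile(w, prct_tl)
        bl.append(statistics.median([v for v in w if v <= p_lo]))
        tl.append(statistics.median([v for v in w if v >= p_hi]))
        ml.append(statistics.median(w))
    n = len(bl)
    t = [i / (n - 1) for i in range(n)]  # minmax time normalization to [0, 1]
    lines = {"bl": linfit(t, bl), "ml": linfit(t, ml), "tl": linfit(t, tl)}
    # range: line through the point-wise topline-baseline distances
    dist = [(lines["tl"][0] * x + lines["tl"][1])
            - (lines["bl"][0] * x + lines["bl"][1]) for x in t]
    lines["rng"] = linfit(t, dist)
    return lines

# declining toy contour with alternating local deviations of +/- 5
f0 = [200 - i + (5 if i % 2 else -5) for i in range(100)]
reg = register(f0)
```

Each entry of the returned dictionary is a (slope, intercept) pair over normalized time; a negative range slope would indicate converging base- and toplines.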
Global contour classes for analyses on the categorical level are derived by slope clustering. The cluster method can be chosen by clst:glob:mtd. If the user expects a certain number of classes, this number can be specified by clst:glob:kMeans:n_cluster. Otherwise, meanShift clustering should be chosen, either as the cluster method, or in combination with kMeans for the sake of centroid initialization. For customizing the clustering settings by non-default values several parameters are provided whose values are passed on to the respective Python sklearn functions. These parameters are named as in sklearn. If needed, please consult the descriptions of the sklearn functions KMeans, MeanShift, and estimate_bandwidth. Figure 7 gives an example for global and local contour classes.
navigation: do_clst_glob
feature sets: glob
option sub-dictionary: clst:glob:*
output sub-dictionary: data:myFileIdx:myChannelIdx:glob:class
Depending on styl:register the influence of the global component is removed from the f0 contour in order to derive the f0 residual for subsequent local contour stylization. If styl:register is set to bl, ml, or tl, then the base-, mid-, or topline is subtracted. If the parameter is set to rng, each f0 point is normalized to the local f0 range: the corresponding points on the base- and topline are set to 0 and 1, respectively. Thus f0 values between base- and topline are within the range [0 1], f0 values below the baseline are < 0, and values above the topline are > 1. For styl:register=none no global component influence is removed.
Figure 7: Global and local intonation contour classes derived by clustering.
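The removal of the global component controlled by styl:register can be illustrated by the following sketch. The function names and the representation of each register line as a (slope, intercept) pair over normalized time are assumptions made for this example only.

```python
def line(coef, t):
    """Value of a regression line given as (slope, intercept) at time t."""
    slope, intercept = coef
    return slope * t + intercept

def residual(f0, t, reg, register="ml"):
    """f0 residual after removing the global register component."""
    if register == "none":
        return list(f0)
    if register in ("bl", "ml", "tl"):
        # subtraction of base-, mid-, or topline
        return [y - line(reg[register], x) for x, y in zip(t, f0)]
    if register == "rng":
        # range normalization: baseline -> 0, topline -> 1
        out = []
        for x, y in zip(t, f0):
            b, top = line(reg["bl"], x), line(reg["tl"], x)
            out.append((y - b) / (top - b))
        return out
    raise ValueError("unknown register option: " + register)

reg = {"bl": (-10.0, 100.0), "ml": (-10.0, 110.0), "tl": (-10.0, 120.0)}
t = [0.0, 0.5, 1.0]
f0 = [100.0, 105.0, 125.0]
res_rng = residual(f0, t, reg, "rng")
res_ml = residual(f0, t, reg, "ml")
```

In this toy example the first f0 value lies on the baseline (0), the second on the midline (0.5), and the third above the topline (1.75).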
In the annotation files local segments can be defined in 3 ways:
1. by start and end (segment tier input specified in fsys:loc:tier_ag)
2. by a center (event tier input specified in fsys:loc:tier_acc)
3. by both (segment + event tier input)
For case (2) time stamps are transformed to segments by placing a symmetric window of length preproc:point_win on each time stamp. In order to be able to compare contours across different segment lengths, for (1) and (2) time is normalized as specified in styl:loc:nrm. styl:loc:nrm:mtd=minmax yields a minmax time normalization to the range styl:loc:nrm:rng.
For (3) the time stamp within the segment is treated as the zero-center, that is, time is normalized to [−1 0] before and to [0 1] after the time stamp. Only those segments in tier fsys:loc:tier_ag are considered for feature extraction to which at least one center is assigned in tier fsys:loc:tier_acc. preproc:loc_align serves for a robust treatment of multiple center assignments. If this option is set to skip, segments with more than one center are skipped. By left the first center is kept, by right the last one.
The f0 residual contour (see section 8.6) in each local segment is stylized by n-th order polynomials. The order is given by styl:loc:ord.
Figure 8: Local contour stylization by means of a 3rd order polynomial. Time is normalized to the range [0 1].
As can be seen in Figure 9 the polynomial coefficients are related to several aspects of local f0 shapes. Given the polynomial $\sum_{i=0}^{3} s_i \cdot t^i$, $s_0$ is related to the local f0 level relative to the register level. $s_1$ and $s_3$ are related to the local f0 trend (rising or falling) and, for annotation cases (2) and (3), to peak alignment, that is, negative values indicate early, and positive values late peaks. $s_2$ determines the peak shape (convex or concave) and its acuity: positive $s_2$ values indicate convex (falling-rising) shapes, negative values concave (rising-falling) shapes, and high absolute values indicate stronger acuity.
Figure 9: Influence of each coefficient of the third order polynomial $\sum_{i=0}^{3} s_i \cdot t^i$ on the local contour shape. All other coefficients are set to 0. For compactness, both function and coefficient values are shown on the y-axis if they differ.
navigation: do_styl_loc
feature sets: loc
option sub-dictionary: styl:loc:*; styl:register
output sub-dictionary: data:myFileIdx:myChannelIdx:loc:acc:*
Local contour classes for analyses on the categorical level are derived by polynomial coefficient clustering. The cluster method can be chosen by clst:loc:mtd. If the user expects a certain number of classes, this number can be specified by clst:loc:kMeans:n_cluster. Otherwise, meanShift clustering should be chosen, either as cluster method, or in combination with kMeans for the sake of centroid initialization. For customizing the clustering settings by non-default values several parameters are provided whose values are passed on to the respective Python sklearn functions. These parameters are named as in sklearn. If needed, please consult the descriptions of the sklearn functions KMeans, MeanShift, and estimate_bandwidth. Figure 7 gives an example for global and local contour classes.
navigation: do_clst_loc
feature sets: loc
option sub-dictionary: clst:loc:*
output sub-dictionary: data:myFileIdx:myChannelIdx:loc:class
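The local contour stylization described in this section, i.e. fitting a polynomial to the f0 residual over normalized time, can be sketched as follows. The helper names are hypothetical, the fit is solved via the normal equations for the sake of a dependency-free example, and the zero-centered variant maps the parts before and after the center index to [−1 0] and [0 1]. It is an illustration under these assumptions, not the toolkit's routine.

```python
def polyfit(t, y, order):
    """Least-squares coefficients s_0..s_order of sum_i s_i * t**i."""
    n = order + 1
    # normal equations A s = b
    A = [[sum(x ** (i + j) for x in t) for j in range(n)] for i in range(n)]
    b = [sum(v * x ** i for x, v in zip(t, y)) for i in range(n)]
    # Gaussian elimination with partial pivoting
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(A[r][c]))
        A[c], A[p] = A[p], A[c]
        b[c], b[p] = b[p], b[c]
        for r in range(c + 1, n):
            f = A[r][c] / A[c][c]
            for k in range(c, n):
                A[r][k] -= f * A[c][k]
            b[r] -= f * b[c]
    # back substitution
    s = [0.0] * n
    for r in reversed(range(n)):
        rest = sum(A[r][k] * s[k] for k in range(r + 1, n))
        s[r] = (b[r] - rest) / A[r][r]
    return s

def normalize_time(n, center=None):
    """minmax normalization to [0, 1]; with a center index,
    zero-centered normalization to [-1, 1]."""
    if center is None:
        return [i / (n - 1) for i in range(n)]
    out = []
    for i in range(n):
        if i < center:
            out.append((i - center) / center)
        elif i > center:
            out.append((i - center) / (n - 1 - center))
        else:
            out.append(0.0)
    return out

# toy residual: rising-falling (concave) accent shape around the center
t = normalize_time(21, center=10)
y = [1 + 2 * x - 3 * x ** 2 + 0.5 * x ** 3 for x in t]
s = polyfit(t, y, 3)
```

The recovered coefficient vector approximates [1, 2, −3, 0.5]; the negative second-order coefficient reflects the concave (rising-falling) shape.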
Standard f0 and energy features are e.g. mean, standard deviation, median, interquartile range, and maximum. They will be calculated for the f0 contours for local contour segments. Additionally, the feature values are locally normalized within a window of length preproc:nrm_win. See section 8.3 for window length specifications in dependence of the annotation tier type.
navigation: do_styl_loc_ext
feature sets: loc
option sub-dictionary: styl:gnl_*:*
output sub-dictionary: data:myFileIdx:myChannelIdx:loc:gnl:*
As with global segments, register features can also be extracted for local segments in exactly the same way as introduced in section 8.5.2.
navigation: do_styl_loc_ext
feature sets: loc
option sub-dictionary: styl:glob:*
output sub-dictionary: data:myFileIdx:myChannelIdx:loc:decl:*
8.7.6 Gestalt features
Gestalt features quantify the deviation of the local contour register from the global contour register as shown in Figure 10. For this purpose the register properties of the local segment are compared with the properties of the dominating global segment in terms of root mean squared deviations and slope differences. For each register representation (base-, mid-, topline, and range regression line), the RMSD between the local and global declination line is calculated. The higher these values, the more the local contour sticks out from the global contour, which is of relevance for studies on prominence, accent group patterns [2], and prosodic headedness [22, 23].
Figure 10: Gestalt stylization: Deviation of the local contour register aspects (base-, mid-, topline, range) from the global contour register.
The inherent Gestalt properties of the local contours are represented again in terms of polynomial coefficients. For this purpose polynomials of n-th order specified by styl:loc:ord are fitted to all supported kinds of f0 residuals: subtraction of base-, mid-, and topline, and range normalization. This yields 4 coefficient vectors, one for each residual.
navigation: do_styl_loc_ext
feature sets: loc
option sub-dictionary: styl:loc:*
output sub-dictionary: data:myFileIdx:myChannelIdx:loc:gst:*
Standard features are e.g. mean, standard deviation, median, interquartile range, and maximum. They will be calculated for f0 and energy contours over the entire file and for segments in an arbitrary number of annotation tiers specified in fsys:gnl_f0:tier and fsys:gnl_en:tier, respectively. For event tiers, the segments are given by centering an analysis window of length preproc:point_win on the time stamps. Additionally, the feature values are locally normalized within a window of length preproc:nrm_win. See section 8.3 for window length specifications in dependence of the annotation tier type. Furthermore, f0 and energy quotients are calculated between the mean values derived in contour initial and final windows and in the respective remainder part of the contour. The length of this window is specified by styl:gnl:win. Finally, a second order polynomial is fitted through the f0 or energy contour, for which time is normalized to the range [0 1].
navigation: do_styl_gnl_f0
feature sets: gnl_f0, gnl_f0_file
option sub-dictionary: styl:gnl_f0:*
output sub-dictionary: data:myFileIdx:myChannelIdx:gnl_f0:*, data:myFileIdx:myChannelIdx:gnl_f0_file:*
An additional standard feature for energy only is spectral balance. It is realized as the SPLH–SPL measure, i.e. the signal's sound pressure level subtracted from the level after pre-emphasis. Pre-emphasis can be carried out in the time or frequency domain (styl:gnl_en:sb:domain). The latter is implemented as proposed by [5]. In the time domain pre-emphasis is calculated as follows: $s'[i] = s[i] - \alpha \cdot s[i-1]$. α is set by styl:gnl_en:sb:alpha and determines the lower frequency boundary for pre-emphasis by 6 dB per octave. A value of 0.95 roughly corresponds to 150 Hz; the smaller the value for α, the higher the lower boundary. Alternatively, α can be set directly to the lower frequency boundary F and will be internally transformed to $\alpha = e^{-2 \pi F \Delta t}$. Note that pre-emphasis in the time domain usually leads to an overall lower energy so that SPLH–SPL will be negative.
In the frequency domain pre-emphasis is carried out according to [5] by adding $10 \cdot \log_{10}\big((1 + f^2/200^2) / (1 + f^2/5000^2)\big)$ to the logarithmic spectrum.
The spectral balance calculation can be restricted to a specified time and/or frequency window. The time window length is specified by styl:gnl_en:sb:win to cut out the center of that length of the segment to be analysed. It serves to reduce the influence of coarticulation on the results. High-, low- or band-pass cutoff frequencies (styl:gnl_en:sb:f; filter type: styl:gnl_en:sb:btype) might be used to limit the analysis to a specified frequency band (e.g. an upper cutoff frequency of 5000 Hz for vowels).
navigation: do_styl_gnl_en
feature sets: gnl_en, gnl_en_file
option sub-dictionary: styl:gnl_en:*
output sub-dictionary: data:myFileIdx:myChannelIdx:gnl_en:*, data:myFileIdx:myChannelIdx:gnl_en_file:*
Boundaries are parameterized in terms of discontinuity features of several register representations.
Details and an application for perceived prosodic boundary strength prediction can be found in [21].
Boundary features can be extracted for any number of segment or event tiers specified by fsys:bnd:tier. Features can be extracted for:
1. navigate:do_styl_bnd: each adjacent segment pair. For event tiers, segments are defined as the intervals between two time stamps. Note that this implies that pause length is only available for segment tier input, where it is defined as the gap between the second segment's starting point and the first segment's endpoint.
2. navigate:do_styl_win: fixed time windows. For segment tiers, the pre- and post-boundary units are not given by the adjacent segments, but by windows of fixed length. For event tiers the window halves of preproc:point_win centered on a time stamp are considered as pre- and post-boundary units.
3. navigate:do_styl_trend: pre- and post-boundary units that range from the current chunk start to the boundary, and from the boundary to the chunk end. If no chunking is available, the file start and endpoint are taken.
For cases (2) and (3) the following holds: if styl:bnd:cross_chunk is set to 0, and if a chunk tier is given by fsys:tier:chunk, the analysis windows are limited by the start and endpoint of the current chunk.
A boundary is parameterized in terms of pause length (for segment tier input only) and pitch discontinuities. For the latter, register features (as described in section 8.5.2) are extracted three times: for the pre-boundary segment, for the post-boundary segment, and for the concatenation of both segments. Figure 11 illustrates the threefold register stylization for the pre- and post-boundary as well as for the concatenated segment. Figure 12 shows how discontinuity for each of the register lines is expressed. Let seg1, seg2 be the pre- and post-boundary segments, and seg12 their concatenation. Then discontinuity is given by:
• the RMSD between the four register representations of seg1 and the corresponding part of seg12. The register representations are base-, mid-, topline, and range regression line.
• the RMSD between the register representations of seg2 and the corresponding part of seg12
• the RMSD between the register representations of seg1 and seg2 opposed to seg12
• the reset d, i.e. the difference between the initial value of the regression line in seg2 and the final value of the regression line in seg1
• the onset difference of the regression lines d_o, i.e. the initial value of the seg2 regression line subtracted from the initial value of the seg1 line
• the difference of the regression line mean values d_m, the seg2 mean being subtracted from the seg1 mean. Both d_o and d_m could be used to measure downstep.
• the pairwise slope differences s_* between the 3 regression lines: for s_1_2 the seg2 slope is subtracted from the seg1 slope. For s_12_1 and s_12_2 the slopes of seg1 and seg2 are subtracted from the seg12 slope.
• the correlation-based distances between the fitted lines calculated for the same combinations as the RMSD values above. Pearson r correlations are turned into distance values d ranging from 0 to 1 by $d = (1 - r)/2$.
• the quotient of RMS errors between stylization input (the respective sequence of medians) and output (the fitted lines). The error of the joint stylization is divided by the error from the single pre- and post-boundary fits. The quotient is reported separately for the entire, the pre-boundary, and the post-boundary segment.
• the increase of the Akaike information criterion (AIC) resulting from one joint vs. two separate fits. The AIC does not only account for the fitting error but also for the number of model parameters. The lower its value, the better the model. For least squares fit comparisons the AIC can be calculated as $2 \cdot k + n \cdot \ln \mathrm{RSS}$. k denotes the number of model parameters, n the number of stylization input values, and RSS the residual sum of squares. To each fitted line 3 parameters are assigned: intercept, slope, and Gaussian noise variation. The AIC increase is measured by subtracting the single line fit AIC from the joint fit AIC. It is reported separately for the entire, the pre-boundary, and the post-boundary segment.
All features are calculated 4 times: for the base-, mid- and toplines, as well as for the range regression lines.
All but the reset and the slope difference variables are positively related to discontinuity. The user might want to replace the reset and slope differences by their absolute values.
In the styl:bnd option sub-dictionary nrm, decl_win, and prct have the same purpose as in the styl:glob context, see section 8.5.2.
styl:bnd:win specifies the window length for window case (2).
Figure 11: Prosodic boundaries: threefold base-, mid-, and topline register stylization for the pre-boundary, post-boundary, and the concatenated segment.
Figure 12: Boundary features describing reset and deviation from a common trend. In this case features are extracted at a word boundary wrd-bnd. The 3 regression lines can refer to f0 baselines, midlines, toplines, and to range. The same features are output for these 4 register aspects.
The boundary feature extraction can be carried out on the (preprocessed) f0 contour or on the f0 residual by setting styl:bnd:residual to 0 or 1, respectively. The former should be used if boundaries between global segments such as intonation phrases are examined. The residual might be used if the user is interested in boundaries between e.g. accent groups within the same global segment. Note that for residuals the boundary examination across global segments might not be meaningful, since at these boundaries the residuals are derived from different register regression lines. These cases can be identified in the output by means of the is_fin column (see section 12.1). The residual calculation is described in section 8.6. Running boundary stylization on residuals requires a previous global contour stylization, i.e. styl:navigate:do_styl_glob needs to be set to 1.
The subsequent paragraphs name the configuration branches associated with the stylization cases (1)–(3), respectively.
navigation: do_styl_bnd
feature sets: bnd
option sub-dictionary: styl:bnd:*
output sub-dictionary: data:myFileIdx:myChannelIdx:bnd:std:*
navigation: do_styl_bnd_win
feature sets: bnd
option sub-dictionary: styl:bnd:*
output sub-dictionary: data:myFileIdx:myChannelIdx:bnd:win:*
navigation: do_styl_trend
feature sets: bnd
option sub-dictionary: styl:bnd:*
output sub-dictionary: data:myFileIdx:myChannelIdx:bnd:trend:*
Rhythm features can be extracted for any number of segment or event tiers specified by fsys:rhy_*:tier, * representing f0 and en for the f0 and the energy contour, respectively. Time stamps of event tiers are transformed to segments as introduced in section 8.3.
Rhythm measures consist of:
• spectral moments of a DCT analysis of the contour
• the number of peaks in the absolute-value DCT spectrum
• the frequency associated with the highest peak
• event rates within the analyzed segment
• the influence of these events on the f0 or energy contour within the analyzed segment
To extract the relative weight of the low- and high-frequency components of a contour, a discrete cosine transform (DCT) is applied on the contour as in [7]. For the absolute DCT coefficient values the first n spectral moments (n being set by rhy_*:rhy:nsm) are calculated that (up to the fourth moment) give the mean, variance, skew, and kurtosis of the DCT coefficient weight distribution, respectively.
Before applying the DCT the contour is weighted by a window given by the two parameters rhy_*:rhy:wintyp and rhy_*:rhy:winparam as introduced in section 8.2.
The events (time stamps or segments) for which rate and influence are to be calculated are read from one or more tier names in fsys:rhy_*:tier_rate. Thereby within each recording channel each analysis tier in fsys:rhy_*:tier is combined with each rate tier in fsys:rhy_*:tier_rate. Rate is simply measured by counting the events that fall within the segment of analysis, and dividing this count by the length of the analyzed segment. For segment tiers in fsys:rhy_*:tier_rate only proportions included in the segment of analysis are added to the count.
The influence s of events on the f0 or energy contour is quantified as the relative weight of the DCT coefficients around the event rate r (+/− rhy_*:rhy:wgt:rb Hz) within all coefficients between rhy_*:rhy:lb and rhy_*:rhy:ub Hz as follows:
$s = \frac{\sum_{c:\, r-1 \le f(c) \le r+1\,\mathrm{Hz}} |c|}{\sum_{c:\, lb \le f(c) \le ub\,\mathrm{Hz}} |c|}$
The higher s, the higher the influence of the event rate on the f0 or energy contour. Figure 13 compares a low event rate with a high impact on the energy contour to a high event rate with low impact (high vs. low absolute coefficient values).
The relative weight is output to the feature table's columns myRateTier_prop (see sections 10 and 12.1). myRateTier refers to each entry in fsys:rhy_*:tier_rate. The respective analysis tiers from fsys:rhy_*:tier are displayed in the tier column. The proportion is output for each segment in the analysis tiers. Additionally, the rate of rate tier events in each analysis tier segment is provided by myRateTier_rate. Finally, myRateTier_mae gives the mean absolute error between the original contour and the inverse cosine transform output that is based on the coefficients with frequencies around the event rates. The following paragraphs name the configuration branches responsible for the rhythmic analyses of the f0 and energy contour, respectively.
myRateTier_* parameters are not calculated for analysis/rate tier combinations across recording channels. That is: Given are analysis tier TA1 and rate tier RT2 referring to channels 1 and 2, respectively. Then cells in the RT2_* columns are set to NA in all TA1 rows, which are identified by the tier column.
The number of peaks n_peak in the DCT spectrum is derived by counting the local amplitude maxima in this spectrum among the values greater than or equal to the amplitude related to the center of gravity.
Figure 13: Influence of events on a contour in terms of the relative weight of the DCT coefficients around the event frequency.
navigation: do_styl_rhy_f0
feature sets: rhy_f0, rhy_f0_file
option sub-dictionary: styl:rhy_f0:*
output sub-dictionary: data:myFileIdx:myChannelIdx:rhy_f0:*; data:myFileIdx:myChannelIdx:rhy_f0_file:*
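The event rate and the influence measure s of this section can be illustrated by the following sketch. It rests on several assumptions that are not taken from the toolkit: an unnormalized DCT-II is used, the frequency assigned to the k-th coefficient is k · fs / (2N) for a contour of N samples at sample rate fs, and all function names and default parameter values are hypothetical.

```python
import math

def dct2(x):
    """Unnormalized DCT-II of a sequence (direct O(N^2) computation)."""
    n = len(x)
    return [sum(v * math.cos(math.pi / n * (i + 0.5) * k)
                for i, v in enumerate(x))
            for k in range(n)]

def event_rate(times, start, end):
    """Number of events within [start, end] divided by segment length."""
    return sum(start <= t <= end for t in times) / (end - start)

def event_influence(contour, fs, rate, rb=1.0, lb=0.0, ub=10.0):
    """Relative weight of the DCT coefficients within +/- rb Hz of the
    event rate among all coefficients between lb and ub Hz."""
    c = dct2(contour)
    n = len(contour)
    freq = [k * fs / (2.0 * n) for k in range(n)]  # assumed mapping
    num = sum(abs(a) for a, f in zip(c, freq) if rate - rb <= f <= rate + rb)
    den = sum(abs(a) for a, f in zip(c, freq) if lb <= f <= ub)
    return num / den if den > 0 else 0.0

# toy energy contour: 2 s at 100 Hz sample rate, oscillating at 4 Hz
fs = 100
contour = [math.cos(2 * math.pi * 4 * i / fs) for i in range(200)]
s4 = event_influence(contour, fs, rate=4.0)
s8 = event_influence(contour, fs, rate=8.0)
rate = event_rate([0.2, 0.7, 1.1, 1.9, 2.5], 0.0, 2.0)
```

For this contour an event rate of 4 per second obtains a much higher weight than a rate of 8 per second, since the contour oscillates at 4 Hz.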
The energy contour extraction in the analyzed segment is controlled by the styl:rhy_en:sig:* sub-dictionary in the same way as explained in section 8.2.
navigation: do_styl_rhy_en
feature sets: rhy_en, rhy_en_file
option sub-dictionary: styl:rhy_en:*
output sub-dictionary: data:myFileIdx:myChannelIdx:rhy_en:*; data:myFileIdx:myChannelIdx:rhy_en_file:*
Voice quality features can be extracted for any number of segment or event tiers specified by fsys:voice:tier. Time stamps of event tiers are transformed to segments as introduced in section 8.3. At the current state voice measures consist of:
• jitter,
• shimmer.
Jitter is calculated as relative local jitter, i.e. the mean absolute difference between adjacent periods divided by the overall mean period. As for Praat the following parameters can be specified in styl:voice:jit: t_min and t_max refer to the minimum and maximum allowed period durations, and fac_max to the maximally allowed quotient of adjacent periods. Periods not fulfilling these constraints are discarded from calculation.
Shimmer again is calculated the same way as Praat does for the Shimmer (local) parameter, i.e. it is the mean absolute difference between the amplitudes of adjacent periods, divided by the average amplitude.
For both jitter and shimmer a 3rd order polynomial is fitted through the obtained sequence of distance values of adjacent periods, each distance divided by the average period or amplitude, respectively. Time is normalized to the interval −1 to 1. The purpose of these polynomials is to represent the changes of jitter and shimmer over time. As an example, a negative 1st order coefficient for the jitter sequence indicates a decrease in jitter over time (see Figure 9 for the interpretation of the coefficients).
The configuration branches related to the voice feature set are:
navigation: do_styl_voice
feature sets: voice, voice_file
option sub-dictionary: styl:voice:*
output sub-dictionary: data:myFileIdx:myChannelIdx:voice:*; data:myFileIdx:myChannelIdx:voice_file:*
All features are subdivided into the following sets which can be extracted independently of each other. In the subsequent listing *_file indicates that there is an additional feature extraction on the entire file level with minor deviations from the extraction on smaller domains (e.g. missing normalization).
• gnl_f0, gnl_f0_file: general standard f0 features as mean, median, standard deviation, interquartile range; for any number of tiers
• gnl_en, gnl_en_file: general standard energy features as mean, median, standard deviation, interquartile range; for any number of tiers
• glob: register (level and range) features in larger domains (e.g. intonation phrases); for one tier per channel
• loc: shape features in smaller domains (e.g. accent groups) of f0 residuals (after removal of global f0 aspects). Gestalt features, i.e. deviation of accent groups from intonation phrases. This feature set requires the preceding extraction of the glob set; for one tier per channel
• bnd, bnd_win, bnd_trend: boundary features between adjacent segments in the same domain. For bnd the features are derived from the stylization of adjacent segments. In bnd_win the stylization is carried out in uniform time windows centered on the segment boundaries irrespective of the segment lengths. In bnd_trend the stylization is carried out from the beginning of a speech chunk to the boundary in question, and from this boundary to the end of the chunk; for any number of tiers
• rhy_f0, rhy_f0_file: DCT-based rhythm features; rates of prosodic events (e.g. syllable nuclei, pitch accents) and their influence on the f0 contour; for any number of tiers
• rhy_en, rhy_en_file: DCT-based rhythm features; rates of prosodic events and their influence on the energy contour; for any number of tiers
• voice, voice_file: voice quality features as jitter and shimmer: mean values and polynomial stylization of their change over time

Application examples for these feature sets are:

application | feature sets
pitch accent prototypes for information status and discourse segmentation [14] | glob, loc
prosodic boundary strength prediction [21] | bnd
prosodic typology [22, 23] | loc
empirical evidence for prosodic constituents (accentual phrases) [2, 23] | loc
interplay of phrasing and prominence [24] | loc, bnd, gnl_en, gnl_f0
dialog act prediction [12] | glob, loc, gnl_f0
personality trait prediction [15] | glob, loc, gnl_f0
infant-directed speech [11] | glob, loc, gnl_f0, gnl_en
entrainment [18, 17] | glob, loc
offtalk detection [10] | glob, loc, gnl_en, gnl_f0
speech disfluencies [1] | loc
pitch accent inventory for low-resource languages [8] | loc
Lombard speech characteristics [3] | bnd
social media analyses [19] | bnd
hand-stroke–speech coordination [6] | rhy_en, rhy_f0

The following tables list all currently available features in alphabetical order, give short descriptions and link them to the respective feature set. In these tables loc and glob within the superpositional setting refer to local (e.g. accent groups) and global segments (e.g. intonation phrases), respectively. For boundary parameterization pre, post, joint refer to the pre- and post-boundary segments, and to their concatenation, respectively. For boundary features std, win, and trend refer to the underlying windowing of neighboring segments, cf. section 8.9. The number of coefficient and spectral moment variables c* and sm* depends on the polynomial order and spectral moment number specified by the user. For the rhy_* feature sets myAnalysisTier stands for the analysis tier, and myRateTier for the rate tier, i.e.
the rate and influence of events in myRateTier within segments of myAnalysisTier is measured, and all possible combinations of analysis and rate tiers are output.

name | description | feature set
bl_c0 | baseline intercept | glob, loc
bl_c1 | baseline slope | glob, loc
bl_d_fin | final baseline value diff loc-glob | loc
bl_d_init | initial baseline value diff loc-glob | loc
bl_m | baseline mean value | glob, loc
bl_r | baseline reset | glob
bl_rate | baseline declination rate | glob, loc
bl_rms | baseline RMSD loc-glob | loc
bl_sd | baseline slope diff loc-glob | loc
bv | file-domain f0 base value (Hz) | glob, gnl_f0_file
c* | polynomial loc contour coef * | loc, gnl_f0/en(_file)
ci | channel index (starting with 0) | (all sets)
class | contour class | glob, loc
dur | segment duration | glob, loc, gnl_f0/en(_file), rhy_f0/en(_file)
dur_nrm | normalized duration | loc, gnl_f0/en
f_max | freq of coef with max ampl. in DCT spectrum | rhy_f0/en(_file)
fi | file index (starting with 0) | (all sets)
gi | si value of corresponding row in glob | loc
iqr | f0 interquartile range | glob, loc, gnl_f0/en(_file)
iqr_nrm | nrm'd f0 interquartile range | loc, gnl_f0/en
is_fin | item in global segment's final position? | (all sets w/o *_file)
is_fin_chunk | item in chunk final position? | (all sets w/o *_file)
is_init | item in global segment's initial position? | (all sets w/o *_file)
is_init_chunk | item in chunk initial position? | (all sets w/o *_file)
jit | jitter | voice(_file)
jit_c* | polynomial coefs for jitter time course | voice
shim | shimmer | voice(_file)
shim_c* | polynomial coefs for shimmer time course | voice(_file)
lab | label | glob, bnd, gnl_f0/en, rhy_f0/en
lab_acc | ACC tier label | loc
lab_ag | AG tier label | loc
lab_next | next segment's label | bnd
m | f0, energy arit. mean | glob, loc, gnl_f0/en(_file)
m_nrm | f0, energy arit. nrm'd mean | loc, gnl_f0/en
max | f0, energy max | glob, loc, gnl_f0/en(_file)
max_nrm | f0, energy nrm'd max | loc, gnl_f0/en
med | f0, energy median | glob, loc, gnl_f0/en(_file)
med_nrm | f0, energy nrm'd median | loc, gnl_f0/en
ml_c0 | midline intercept | glob, loc
ml_c1 | midline slope | glob, loc
ml_d_fin | final midline value diff loc-glob | loc
ml_d_init | initial midline value diff loc-glob | loc
ml_m | midline mean value | glob, loc
ml_r | midline reset | glob
ml_rate | midline declination rate | glob, loc
ml_rms | midlines RMSD loc-glob | loc
ml_sd | midline slope diff loc-glob | loc
n_peak | number of peaks in absolute DCT spectrum | rhy_f0/en(_file)
p | pause length (sec) | bnd
qb | quotient of means of init and fin part | gnl_f0/en(_file)
qf | quotient of means of final and non-fin part | gnl_f0/en(_file)
qi | quotient of means of initial and non-init | gnl_f0/en(_file)
qm | quotient of means max(init, fin) part and remainder | gnl_f0/en(_file)
res_bl_c* | baseline residual poly coef * | loc
res_ml_c* | midline residual poly coef * | loc
res_rng_c* | range line residual poly coef * | loc
res_tl_c* | topline residual poly coef * | loc
rms | overall RMSD | gnl_en
rms_nrm | nrm'd overall RMSD | gnl_en
rng_c0 | range line intercept | glob, loc
rng_c1 | range line slope | glob, loc
rng_d_fin | final range line value diff loc-glob | loc
rng_d_init | initial range line value diff loc-glob | loc
rng_m | range mean value | glob, loc
rng_r | range line reset | glob
rng_rate | range declination rate | glob, loc
rng_rms | range lines RMSD loc-glob | loc
rng_sd | range line slope diff loc-glob | loc
sb | spectral balance | gnl_en
sd | f0, energy standard deviation | glob, loc, gnl_f0/en(_file)
sd_nrm | nrm'd f0, energy standard deviation | loc, gnl_f0, gnl_en
si | segment index (starting with 0) | glob, loc, gnl_f0/en, rhy_f0/en
sm* | *th spectral moment of DCT | rhy_f0/en(_file)
std|trend|win_bl_aicI | baseline fitting AIC increase joint vs pre+post | bnd
std|trend|win_bl_aicI_post | baseline fitting AIC increase joint vs post | bnd
std|trend|win_bl_aicI_pre | baseline fitting AIC increase joint vs pre | bnd
std|trend|win_bl_corrD | pre/post-joint baseline corr-based distance | bnd
std|trend|win_bl_corrD_post | post-joint baseline corr-based distance | bnd
std|trend|win_bl_corrD_pre | pre-joint baseline corr-based distance | bnd
std|trend|win_bl_d_m | difference of baseline means pre-post | bnd
std|trend|win_bl_d_o | difference of baseline onsets pre-post | bnd
std|trend|win_bl_r | pre-post baseline reset | bnd
std|trend|win_bl_rms | pre/post-joint baseline RMSD | bnd
std|trend|win_bl_rms_post | post-joint baseline RMSD | bnd
std|trend|win_bl_rms_pre | pre-joint baseline RMSD | bnd
std|trend|win_bl_rmsR | baseline fitting error ratio joint vs pre+post | bnd
std|trend|win_bl_rmsR_post | baseline fitting error ratio joint vs post | bnd
std|trend|win_bl_rmsR_pre | baseline fitting error ratio joint vs pre | bnd
std|trend|win_bl_sd_post | baseline slope diff post-joint | bnd
std|trend|win_bl_sd_pre | baseline slope diff pre-joint | bnd
std|trend|win_bl_sd_prepost | baseline slope diff pre-post | bnd
std|trend|win_ml_aicI | midline fitting AIC increase joint vs pre+post | bnd
std|trend|win_ml_aicI_post | midline fitting AIC increase joint vs post | bnd
std|trend|win_ml_aicI_pre | midline fitting AIC increase joint vs pre | bnd
std|trend|win_ml_corrD | pre/post-joint midline corr-based distance | bnd
std|trend|win_ml_corrD_post | post-joint midline corr-based distance | bnd
std|trend|win_ml_corrD_pre | pre-joint midline corr-based distance | bnd
std|trend|win_ml_d_m | difference of midline means pre-post | bnd
std|trend|win_ml_d_o | difference of midline onsets pre-post | bnd
std|trend|win_ml_r | pre-post midline reset | bnd
std|trend|win_ml_rms | pre/post-joint midline RMSD | bnd
std|trend|win_ml_rms_post | post-joint midline RMSD | bnd
std|trend|win_ml_rms_pre | pre-joint midline RMSD | bnd
std|trend|win_ml_rmsR | midline fitting error ratio joint vs pre+post | bnd
std|trend|win_ml_rmsR_post | midline fitting error ratio joint vs post | bnd
std|trend|win_ml_rmsR_pre | midline fitting error ratio joint vs pre | bnd
std|trend|win_ml_sd_post | midline slope diff post-joint | bnd
std|trend|win_ml_sd_pre | midline slope diff pre-joint | bnd
std|trend|win_ml_sd_prepost | midline slope diff pre-post | bnd
std|trend|win_rng_aicI | range fitting AIC increase joint vs pre+post | bnd
std|trend|win_rng_aicI_post | range fitting AIC increase joint vs post | bnd
std|trend|win_rng_aicI_pre | range fitting AIC increase joint vs pre | bnd
std|trend|win_rng_corrD | pre/post-joint range line corr-based distance | bnd
std|trend|win_rng_corrD_post | post-joint range line corr-based distance | bnd
std|trend|win_rng_corrD_pre | pre-joint range line corr-based distance | bnd
std|trend|win_rng_d_m | difference of range line means pre-post | bnd
std|trend|win_rng_d_o | difference of range line onsets pre-post | bnd
std|trend|win_rng_r | pre-post range line reset | bnd
std|trend|win_rng_rms | std pre/post-joint range line RMSD | bnd
std|trend|win_rng_rms_post | post-joint range line RMSD | bnd
std|trend|win_rng_rms_pre | pre-joint range line RMSD | bnd
std|trend|win_rng_rmsR | range fitting error ratio joint vs pre+post | bnd
std|trend|win_rng_rmsR_post | range fitting error ratio joint vs post | bnd
std|trend|win_rng_rmsR_pre | range fitting error ratio joint vs pre | bnd
std|trend|win_rng_sd_post | range line slope diff post-joint | bnd
std|trend|win_rng_sd_pre | range line slope diff pre-joint | bnd
std|trend|win_rng_sd_prepost | range line slope diff pre-post | bnd
std|trend|win_tl_aicI | topline fitting AIC increase joint vs pre+post | bnd
std|trend|win_tl_aicI_post | topline fitting AIC increase joint vs post | bnd
std|trend|win_tl_aicI_pre | topline fitting AIC increase joint vs pre | bnd
std|trend|win_tl_corrD | pre/post-joint topline corr-based distance | bnd
std|trend|win_tl_corrD_post | post-joint topline corr-based distance | bnd
std|trend|win_tl_corrD_pre | pre-joint topline corr-based distance | bnd
std|trend|win_tl_d_m | difference of topline means pre-post | bnd
std|trend|win_tl_d_o | difference of topline onsets pre-post | bnd
std|trend|win_tl_r | std pre-post topline reset | bnd
std|trend|win_tl_rms | pre/post-joint topline RMSD | bnd
std|trend|win_tl_rms_post | post-joint topline RMSD | bnd
std|trend|win_tl_rms_pre | pre-joint topline RMSD | bnd
std|trend|win_tl_rmsR | topline fitting error ratio joint vs pre+post | bnd
std|trend|win_tl_rmsR_post | topline fitting error ratio joint vs post | bnd
std|trend|win_tl_rmsR_pre | topline fitting error ratio joint vs pre | bnd
std|trend|win_tl_sd_post | topline slope diff post-joint | bnd
std|trend|win_tl_sd_pre | topline slope diff pre-joint | bnd
std|trend|win_tl_sd_prepost | topline slope diff pre-post | bnd
stm | f0 file name stem | glob, loc
t_off | time offset (sec; bnd: of pre-boundary segment) | glob, loc, gnl_f0/en, rhy_f0/en, bnd
t_on | time onset (sec; bnd: of post-boundary segment) | glob, loc, gnl_f0/en, rhy_f0/en, bnd
tier | tier name | bnd, gnl_f0/en, rhy_f0/en
tl_c0 | topline intercept | glob
tl_c1 | topline slope | glob
tl_d_fin | final topline value diff loc-glob | loc
tl_d_init | initial topline value diff loc-glob | loc
tl_m | topline mean value | glob, loc
tl_r | initial topline reset | glob
tl_rate | topline declination rate | glob, loc
tl_rms | topline RMSD loc-glob | loc
tl_sd | topline slope diff loc-glob | loc
myRateTier_dgm | difference between rate and frequency of max amplitude coef | rhy_f0/en(_file)
myRateTier_dlm | difference between rate and frequency of nearest peak coef | rhy_f0/en(_file)
myRateTier_mae | meanAbsErr(IDCT(myRateTier),contourOfFile) | rhy_f0/en(_file)
myRateTier_prop | influence of myRateTier on DCT coefs | rhy_f0/en(_file)
myRateTier_rate | event rate of myRateTier | rhy_f0/en(_file)
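To make the jitter entries of the voice feature set more concrete, the following is a minimal sketch of relative local jitter and of the 3rd order polynomial over its time course, as described in the voice quality section. It assumes NumPy and an already extracted list of period durations in seconds. The function name, the default thresholds (Praat's usual jitter settings, not necessarily the CoPaSul defaults), and the choice to take the mean over all in-range periods are illustrative assumptions, not the toolkit's implementation.

```python
import numpy as np
from numpy.polynomial import polynomial as P


def jitter_sketch(periods, t_min=1e-4, t_max=0.02, fac_max=1.3, order=3):
    """Return (relative local jitter, polynomial coefs in ascending order)."""
    p = np.asarray(periods, dtype=float)
    # periods outside [t_min, t_max] are discarded
    in_range = (p >= t_min) & (p <= t_max)
    a, b = p[:-1], p[1:]
    pair_ok = in_range[:-1] & in_range[1:]
    # quotient of adjacent periods, evaluated for in-range pairs only
    ratio = np.ones_like(a)
    ratio[pair_ok] = np.maximum(a, b)[pair_ok] / np.minimum(a, b)[pair_ok]
    pair_ok &= ratio <= fac_max
    if not pair_ok.any():
        return float("nan"), np.array([])
    # distances of adjacent periods, each divided by the mean period
    dist = np.abs(a - b)[pair_ok] / p[in_range].mean()
    jit = float(dist.mean())
    if dist.size <= order:
        return jit, np.array([])
    # time normalized to [-1, 1]; coefs[1] is the 1st order coefficient
    t = np.linspace(-1, 1, dist.size)
    coefs = P.polyfit(t, dist, order)
    return jit, coefs


if __name__ == "__main__":
    jit, coefs = jitter_sketch([0.010, 0.011] * 3)
    print(jit, coefs)
```

Shimmer could be sketched the same way by passing period amplitudes instead of durations and dropping the duration constraints.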
11 Configurations
The configuration file format is JSON. Examples can be found in the config subfolder of the code distribution. copasul_default_config.json contains all default values. In the doc subfolder you find the file copasul_commented_config.json.txt where all options are commented for a quick overview. In the following detailed introduction of all configuration parameters, the levels of the JSON dictionary are separated by a colon.
For numeric and boolean parameters the "values, default" field contains the default value. For string parameters, the default value is indicated in bold face. If a configuration field is named my*, the name is user-defined. + indicates "one or more" configuration branches of this kind. Example: fsys:channel:myTiername+ indicates that the user needs to specify, for all tiers in the annotation files, to which audio channel they belong. Assume there are two tiers spk1 and spk2, the first belonging to channel 1 and the second to channel 2; then fsys:channel:spk1=1 and fsys:channel:spk2=2.

fs
description: f0 sample frequency
type: integer
values, default:
remarks: currently only fs=100 supported. All f0 input will be resampled to this sample rate.

Automatic annotation steps can be carried out independently of each other as long as they don't depend on the output of preceding annotation steps, e.g. if fallback events as syllable boundaries and nuclei are required for phrase boundary and accent detection, or if parent segments are defined to be the result of preceding automatic clustering or prosodic phrasing. Figure 14 displays the possible augmentation pipelines.
[Figure 14: Automatic annotation (do_augment_*) workflow; nodes: chunk, syl, glob, loc]
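The colon-separated notation used throughout this section can be read as a path into the nested JSON dictionary. A minimal sketch of that mapping, using the fsys:channel example and the fs parameter from the text; the helper names set_by_path and get_by_path are hypothetical and not part of the toolkit:

```python
import json


def set_by_path(cfg, path, value, sep=":"):
    """Store value under a colon-separated path, creating sub-dictionaries."""
    keys = path.split(sep)
    node = cfg
    for key in keys[:-1]:
        node = node.setdefault(key, {})
    node[keys[-1]] = value
    return cfg


def get_by_path(cfg, path, sep=":"):
    """Look up the value stored under a colon-separated path."""
    node = cfg
    for key in path.split(sep):
        node = node[key]
    return node


cfg = {}
set_by_path(cfg, "fsys:channel:spk1", 1)
set_by_path(cfg, "fsys:channel:spk2", 2)
set_by_path(cfg, "fs", 100)

# the nested structure as it would appear in a JSON configuration file
print(json.dumps(cfg, indent=2))
```

Only the branches that differ from the defaults would need to be written this way; the remaining fields are documented entry by entry below.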
Pipelines are defined in the navigate configurations. Processing step dependencies are shown in Figure 15.
[Figure 15: Stylization (do_styl_*) and clustering (do_clst_*) workflow; nodes: preproc, glob, loc, clst_loc, clst_glob, gnl_f0, gnl_en, bnd, bnd_win, bnd_trend, rhy_f0, rhy_en, voice, export]
Processing does not always need to start from scratch. Intermediate feature extraction results are stored in Python pickle format and can be reloaded for further processing in a later session. The name of the pickle file to be loaded is given in fsys:export:dir + fsys:export:stm. In order to continue an analysis of a previous session, the user thus needs to make sure that output directory and file name stem do not change across sessions. The content of the file can be deleted by setting navigate:from_scratch to 1. This and all other navigate configuration elements are introduced in the following:

navigate:do_augment_chunk
description: apply automatic chunking into interpausal units
type: boolean
values, default:
remarks: If 1, a chunk segment tier is generated for each channel and added to the annotation files.

navigate:do_augment_glob
description: apply unsupervised prosodic phrase extraction
type: boolean
values, default:
remarks: If 1, for each channel a segment tier with automatically extracted prosodic phrases is generated and added to the annotation files. If no input tier for prosodic boundary candidates is specified, this step requires preceding syllable extraction, since syllable boundaries will then be taken as candidates.

navigate:do_augment_loc
description: apply unsupervised pitch accent detection
type: boolean
values, default:
remarks: If 1, for each channel an event tier with automatically extracted pitch accent locations is generated and added to the annotation file. If no user-defined pitch accent candidates can be provided, this step requires preceding syllable nucleus extraction; the nuclei will then be taken as candidates.
navigate:do_augment_syl
description: apply automatic syllable nucleus and boundary detection
type: boolean
values, default:
remarks: If 1, for each channel two event tiers (a syllable nucleus tier and a syllable boundary tier) are generated and added to the annotation files.

navigate:do_clst_glob
description: apply global contour clustering
type: boolean
values, default:
remarks: cluster global contour line slope coefficients to derive global intonation contour classes.

navigate:do_clst_loc
description: apply local contour clustering
type: boolean
values, default:
remarks: cluster local contour polynomial coefficients to derive local intonation contour classes.

navigate:do_export
description: export the results
type: boolean
values, default:
remarks: generate csv feature table files and f0 table files.

navigate:do_plot
description: plot
type: boolean
values, default:
remarks: online or post-analysis plotting of stylization results. Online plotting serves to check the parameter settings before processing large data.

navigate:do_preproc
description: apply preprocessing
type: boolean
values, default:
remarks: F0 preprocessing as well as analysis and normalization windowing. If set to 1 at non-initial application to a data set, all information previously gathered from subsequent stylization steps is deleted.

navigate:do_styl_bnd_trend
description: extract boundary features
type: boolean
values, default:
remarks: Extract f0 discontinuity features at each segment boundary or time stamp. This time the pre- and post-boundary units range from file start to the boundary, and from the boundary to the file end. If styl:bnd:cross_chunk is set to 0, and if a chunk tier is given in fsys:chunk:tier, the analysis windows are limited by the start and endpoint of the current chunk.

navigate:do_styl_bnd_win
description: extract boundary features in fixed time windows
type: boolean
values, default:
remarks: Extract f0 discontinuity features. For segment tiers, the pre- and post-boundary units are not given by the adjacent segments as for navigate:do_styl_bnd, but by windows of fixed length. For event tiers the window halves of preproc:point_win centered on a time stamp are considered as pre- and post-boundary units. If styl:bnd:cross_chunk is set to 0, and if a chunk tier is given in fsys:chunk:tier, the analysis windows are limited by the start and endpoint of the current chunk.

navigate:do_styl_bnd
description: extract boundary features
type: boolean
values, default:
remarks: Extract f0 discontinuity features across segments (segment tier input) or at time stamps (event tier input). Only for the former is the extracted pause length meaningful. Discontinuity is, among other things, expressed in the deviation of the pre- and post-boundary part from a common declination trend. For segment tiers, this common trend is calculated over both segments. For event tiers, the intervals between time stamps are considered as segments.

navigate:do_styl_glob
description: apply global contour stylization
type: boolean
values, default:
remarks: Apply f0 register (level and range) stylizations within global segments as e.g. IPs.

navigate:do_styl_gnl_en
description: extract standard energy features
type: boolean
values, default:
remarks: Extract energy mean, variance and the like.

navigate:do_styl_gnl_f0
description: extract standard f0 features
type: boolean
values, default:
remarks: Extract f0 mean, variance and the like.

navigate:do_styl_loc_ext
description: extract extended feature set for local f0 contours
type: boolean
values, default:
remarks: Extract local register and Gestalt features, i.e. deviation of the local contour from the global register trend.

navigate:do_styl_loc
description: apply local contour stylization
type: boolean
values, default:
remarks: Apply polynomial f0 contour stylization in local segments as e.g. AGs.
navigate:do_styl_rhy_en
description: extract energy rhythm features
type: boolean
values, default:
remarks: apply DCT analyses on the energy contour within user-defined segments and calculate the influence of events on the contour, in terms of the relative weight of DCT coefficients.

navigate:do_styl_rhy_f0
description: extract f0 rhythm features
type: boolean
values, default:
remarks: apply DCT analyses on the f0 contour within user-defined segments and calculate the influence of events on the contour, in terms of the relative weight of DCT coefficients.

navigate:do_styl_voice
description: extract voice quality features
type: boolean
values, default:
remarks: extract jitter and shimmer.

navigate:from_scratch
description: start from scratch
type: boolean
values, default:
remarks: If 1, all configurations and analysis results in the pickle file are overwritten.

navigate:overwrite_config
description: overwrite stored configurations
type: boolean
values, default:
remarks: If 1, the configuration stored in the pickle file is overwritten by the current user-defined setting. Useful if, e.g., selected analysis steps should be repeated with different preprocessing settings.
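The navigate switches are not fully independent: stylization steps presuppose preprocessing, local stylization presupposes global stylization, and each clustering step presupposes the stylization step of the same type. A minimal sketch of a consistency check over these rules, assuming the navigate branch is available as a flat Python dictionary with underscore-separated keys; the function name and the done argument (steps already stored from an earlier session) are hypothetical and not part of the toolkit:

```python
def unmet_dependencies(navigate, done=()):
    """Return (step, required_step) pairs whose prerequisite is not available.

    navigate: dict mapping step names to 0/1
    done: steps already carried out in a previous session (assumption:
          their results are still stored and can be reused)
    """
    active = sorted(k for k, v in navigate.items() if v)
    available = set(active) | set(done)
    missing = []
    for step in active:
        required = []
        if step.startswith("do_styl"):
            required.append("do_preproc")
        if step == "do_styl_loc":
            required.append("do_styl_glob")
        if step.startswith("do_clst_"):
            # do_clst_loc needs do_styl_loc, do_clst_glob needs do_styl_glob
            required.append("do_styl_" + step[len("do_clst_"):])
        # the conditional rule for do_styl_bnd (only when boundary features
        # are taken from f0 residuals) is left out of this sketch
        missing += [(step, r) for r in required if r not in available]
    return missing


if __name__ == "__main__":
    nav = {"do_preproc": 1, "do_styl_glob": 0, "do_styl_loc": 1, "do_clst_loc": 1}
    print(unmet_dependencies(nav))
    print(unmet_dependencies(nav, done={"do_styl_glob"}))
```

The check only reports problems; whether a missing prerequisite is fatal depends on what is already stored from earlier sessions.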
There are the following dependencies among the processing steps:
• all do_styl_* steps require preceding do_preproc
• do_styl_loc requires preceding do_styl_glob
• do_styl_bnd requires preceding do_styl_glob if the boundary features are to be extracted from the f0 residuals
• all do_clst_* steps require a preceding do_styl_* step of the same type (loc or glob)
If the preprocessing step navigate:do_preproc is repeated, all already extracted features are deleted, since the updated preprocessing configuration might lead to different stylization results. Thus, after repeating this step the user needs to redo all subsequent stylizations.

fsys:annot:dir
description: annotation file directory
type: string
values, default:
remarks: Can be nested. Depending on the task, audio, f0, and annotation files are obligatory or not. All obligatory directories must contain the same number of files in the same order. Optimally, same order is guaranteed using the same file name stem for corresponding audio, f0, and annotation files. However, this is not required.

fsys:annot:ext
description: annotation file extension
type: string
values, default: TextGrid, xml
remarks: no default

fsys:annot:typ
description: annotation file type
type: string
values, default: TextGrid, xml
remarks: Currently, only TextGrid and xml (see section 4.4) are supported. No default.

fsys:aud:dir
description: audio file directory
type: string
values, default:
remarks: Can be nested. Depending on the task, audio, f0, and annotation files are obligatory or not. All obligatory directories must contain the same number of files in the same order. Optimally, same order is guaranteed using the same file name stem for corresponding audio, f0, and annotation files. However, this is not required.

fsys:aud:ext
description: audio file extension
type: string
values, default:
remarks: Only files with this extension are collected from the directory.

fsys:aud:typ
description: audio file mimetype
type: string
values, default: wav
remarks: currently only wav supported

fsys:augment:chunk:tier_out_stm
description: tier name stem of chunking output
type: string
values, default: chunk
remarks: To the name stem the channel index will be added (also for mono files!). E.g. given a stereo file and fsys:augment:chunk:tier_out_stm=CHUNK, the two segment tiers CHUNK_1 and CHUNK_2 will be generated for channel 1 and 2, respectively.

fsys:augment:glob:tier_out_stm
description: phrasing output tier
type: string
values, default: glob
remarks: tier name stem of phrasing output. To the name stem the channel index will be added (also for mono files!). E.g. given a stereo file and fsys:augment:glob:tier_out_stm="IP", the two segment tiers IP_1 and IP_2 will be generated for channel 1 and 2, respectively.

fsys:augment:glob:tier_parent
description: parent tier for prosodic phrase extraction
type: string or list of strings
values, default: fsys:augment:chunk:tier_out_stm
remarks:
Segment tiers defining the superordinate domain for overall trend measurement from which the pre- and post-candidate-boundary segments deviate. This field can contain a single string (a single tier for mono files or any fsys:augment:*:tier_out_stm value which will be expanded by the channel index). The user can also explicitly specify multiple tier names in a list, if several channels are to be processed and the tier names cannot be derived from any fsys:augment:*:tier_out_stm. For segment tiers only.

fsys:augment:glob:tier
description: The tier in which to look for the prosodic boundary candidates.
type: string or list of strings
values, default: fsys:augment:syl:tier_out_stm + '_bnd'
remarks: This field can contain a single string (a single tier for mono files or any fsys:augment:*:tier_out_stm value which will be expanded by the channel index and the syllable boundary infix). The user can also explicitly specify multiple tier names in a list, if several channels are to be processed and the tier names cannot be derived from any fsys:augment:*:tier_out_stm. Tiers can be of segment or event type. Default is the bnd output of fsys:augment:syl:tier_out_stm. Note that treating all syllable boundaries as phrase boundary candidates may result in prosodic boundaries within words. Thus a word segmentation tier is strongly recommended.

fsys:augment:loc:tier_acc
description: Pitch accent extraction event tier
type: string or list of strings
values, default: [ ]
remarks: Pitch accent candidate time stamps, e.g. syllable nucleus midpoints. This field can contain a single string (a single tier for mono files or fsys:augment:syl:tier_out_stm which will be expanded by the channel index). The user can also explicitly specify multiple tier names in a list, if several channels are to be processed and the tier names cannot be derived from fsys:augment:syl:tier_out_stm. For event tiers only. Field can be empty, but at least one of fsys:augment:loc:tier_ag and fsys:augment:loc:tier_acc needs to be specified. If only fsys:augment:loc:tier_ag is given: analysis within segment; if only fsys:augment:loc:tier_acc is given: analysis within symmetric window of length preproc:point_win centered on the time stamp; if both: analysis within ag segment, time normalization so that 0 position is at acc time stamp within ag.

fsys:augment:loc:tier_ag
description: pitch accent extraction segment tier
type: string or list of strings
values, default: [ ]
remarks: Tier with segments that are potential accent groups. This field can contain a single string for mono files or a list of strings for more channels. Tiers can be of segment type only. Field can be empty, but at least one of fsys:augment:loc:tier_ag and fsys:augment:loc:tier_acc needs to be specified. If only fsys:augment:loc:tier_ag is given: analysis within segment; if only fsys:augment:loc:tier_acc is given: analysis within symmetric window of length preproc:point_win centered on the time stamp; if both: analysis within ag segment, time normalization so that 0 position is at acc time stamp within ag.

fsys:augment:loc:tier_out_stm
description: accent output tier name stem
type: string
values, default: acc
remarks: To the name stem the channel index will be added (also for mono files!). E.g. given a stereo file and fsys:augment:loc:tier_out_stm="ACC", the two event tiers ACC_1 and ACC_2 will be generated for channel 1 and 2, respectively.

fsys:augment:loc:tier_parent
description: name of parent tier for pitch accent candidates
type: string or list of strings
values, default: [ ]
remarks: This parent tier contains segments of a superordinate domain with respect to which the deviation of the accent candidate segments or time stamps is calculated. This might be global segments or chunks. Fallback is file level. This field can contain a single string (a single tier for mono files or any fsys:augment:*:tier_out_stm value which will be expanded by the channel index). The user can also explicitly specify multiple tier names in a list, if several channels are to be processed and the tier names cannot be derived from any fsys:augment:*:tier_out_stm. Tiers can be segment tiers only.

fsys:augment:syl:tier_out_stm
description: tier name stem of syllable nucleus and boundary output
type: string
values, default: syl
remarks: To the name stem the channel index will be added (also for mono files!). Syllable boundary tiers are further marked by the infix bnd. E.g. given a stereo file and fsys:augment:syl:tier_out_stm="SYL", the four event tiers SYL_1, SYL_bnd_1 and SYL_2, SYL_bnd_2 will be generated for syllable nuclei and boundaries and for channel 1 and 2, respectively.

fsys:augment:syl:tier_parent
description: parent tier for syllable nucleus extraction
type: string or list of strings
values, default: chunk
remarks:
The parent tier defines the boundaries over which the reference window for relative energy calculation must notcross. Fallback is file level. This field can contain a single string (a single tier for mono files or any fsys:augment:*:tier out stm value which will be expanded by the channel index). The user can also explicitly specify multiple tier names in a list, ifseveral channels are to be processed and the tier names cannot be derived from any fsys:augment:*:tier out stm . fsys:bnd:tierdescription: boundary tier names type: string or list of strings values, default: [ ] remarks: each channel can contain several tiers to be analyzed. Segment or event tiers. For segment tiers the boundarybetween adjacent segments is parameterized, and for point tiers, the boundary at time stamps. fsys:channel:myTiername+description: channel index for each relevant tier name in the annotation file type: int values, default: myChannelIdx remarks: For augmentation output tiers this configuration branch is generated automatically. fsys:chunk:tierdescription: chunk tier names type: string or list of strings values, default: [ ] remarks: one item for each channel. In case of multiple channels and single string, this string (e.g. “chunk”) is expanded to“chunk 1”, “chunk 2” . . . for each available channel index. If chunk tiers specified, their segments’ boundaries are not crossedby analysis and normalization windows for most feature sets. For the bnd trend feature set pre- and post-boundary segmentsare limited by the start and endpoint of the superordinate chunk if styl:bnd:cross chunk set to 1. fsys:export:csvdescription: output csv tables type: boolean values, default: remarks: If 1, for each extracted feature set a csv file is outputted together with a code template file to read the table inR. The file names are concatenated by fsys:export:stm and the name of the feature set. fsys:export:dirdescription: output directory type: string values, default:remarks:
Directory in which all csv tables, the log file, and the pickle file are stored. fsys:export:f0 preprocdescription: output preprocessed f0 contours type: boolean values, default: remarks: If 1, preprocessed f0 values are outputted for each input f0 file. The output format is as specified in section 4.2.The output is stored in the subdirectory f0 preproc below the directory fsys:export:dir . fsys:export:f0 residualdescription: output residual f0 contours type: boolean values, default: remarks: If 1, residual f0 contours after register removal are outputted for each input f0 file. The output format is asspecified in section 4.2. The output is stored in the subdirectory f0 residual below the directory fsys:export:dir . fsys:export:f0 resyn escription: output resynthesized f0 contours type: boolean values, default: remarks: If 1, the resynthesized f0 contours as a superposition of global and local contour shapes are outputted for eachinput f0 file. The output format is as specified in section 4.2. The output is stored in the subdirectory f0 resyn below thedirectory fsys:export:dir . fsys:export:fullpathdescription: whether or not to write the full path to the csv tables into the R code template files type: boolean values, default: remarks: If 1, the full path to the csv tables is written into the R code. 0 is recommended in case the data is shared andfurther processed at different locations. fsys:export:sepdescription: table column separator type: string values, default: ,remarks: column separator for csv output tables. fsys:export:stmdescription: output file name stem type: string values, default: copasulremarks:
Same file name stem for all csv files, the log file, and the pickle file.

fsys:export:summary
description: output file/channel summary statistics
type: boolean
values, default:
remarks: If 1, mean and variance values are calculated per file and analysis tier for all continuous-valued features written to the feature-set related csv files. For categorical features unigram entropies are calculated. A file fsys:export:stm.summary.csv is written together with an R code template file to read the table in R.

fsys:f0:dir
description: f0 file directory
type: string
values, default:
remarks:
Can be nested. Depending on the task, audio, f0, and annotation files are obligatory or not. All obligatory directories must contain the same number of files in the same order. Optimally, the same order is guaranteed by using the same file name stem for corresponding audio, f0, and annotation files. However, this is not required.

fsys:f0:ext
description: F0 file extension
type: string
values, default:
remarks: Only files with this extension are collected from the directory.

fsys:f0:typ
description:
type: string
values, default: tab
remarks: Currently only tab supported.

fsys:glob:tier
description: global segment tier names
type: string or list of strings
values, default: [ ]
remarks: Analysis tiers for global segments. Only one tier per channel is supported, so that global and local segments can be assigned to each other. If taken over from fsys:augment:*:tier_out_stm, the names must be extended by the corresponding channel index, e.g. IP_1 etc., see fsys:augment:*:tier_out_stm. Segment or event tier. Events are considered to be right boundaries of segments and are expanded to segments accordingly.

fsys:gnl_en:tier
description:
Tiers for standard energy variable extraction.
type: string or list of strings
values, default: [ ]
remarks: More than one tier per channel supported. Segment or event tiers. Events are expanded to segments by preproc:point_win.

fsys:gnl_f0:tier
description: Tiers for standard f0 variable extraction
type: string or list of strings
values, default: [ ]
remarks: More than one tier per channel supported. Segment or event tiers. Events are expanded to segments by preproc:point_win.

fsys:grp:lab
description: grouping labels with values derived from file names
type: list of strings
values, default: [ ]
remarks: Labels of the file-name-derived grouping. Non-relevant file name parts are indicated by empty strings ''. Example: given the f0 file name stem a_b_2, suppose “a” represents the speaker ID, “b” is not relevant for the current analysis, and “2” represents the stimulus ID. Then set fsys:grp:src=f0, fsys:grp:sep=_, and fsys:grp:lab=['spk','','stim']. The output csv tables then contain two additional grouping columns grp_spk and grp_stim with values derived from the file names (in this case “a” and “2”). Note that all grouping values are treated as strings.

fsys:grp:sep
description: file name split pattern
type: string
values, default:
remarks: How to split the file name to access the grouping values. The string is interpreted as a regular expression, so characters with a predefined meaning, such as the dot, need to be protected. If file name parts are separated by a dot, set this option to “\\.”. If file name parts are separated by more than one symbol, e.g. dot and underscore, use “(_|\.)”.

fsys:grp:src
description: grouping source
type: string
values, default: f0, annot, aud
remarks: From which file type to derive the file-name-based grouping.

fsys:label:chunk
description: chunk label
type: string
values, default: x
remarks: Will be used by automatic chunking.

fsys:label:pau
description: pause label
type: string
values, default: <P>
remarks: In annotation files, segments labeled by this symbol are treated as pauses and are not analyzed. For boundary feature extraction these segments define the pause length feature between the preceding and the following segment. Note that this symbol as a pause identifier must be uniform over all analyzed tiers. In Praat TextGrids, unlabeled segments are also considered as pauses.
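The file-name grouping described for fsys:grp:lab and fsys:grp:sep can be illustrated with a minimal Python sketch. The function name grouping_from_stem is hypothetical and not part of the toolkit; the sketch only assumes that the stem is split by the regular expression and that empty labels are skipped.

```python
import re

def grouping_from_stem(stem, sep, lab):
    """Illustrative helper (not a CoPaSul function): split a file name stem
    by the regular expression `sep` and map the parts onto the labels in
    `lab`; parts with an empty label are ignored."""
    parts = re.split(sep, stem)
    # all grouping values stay strings, as stated in the manual
    return {"grp_" + name: part for name, part in zip(lab, parts) if name}

# stem a_b_2 with underscore separator
print(grouping_from_stem("a_b_2", "_", ["spk", "", "stim"]))
# stem a.b.2: the dot must be protected in the regular expression
print(grouping_from_stem("a.b.2", r"\.", ["spk", "", "stim"]))
```

Both calls yield the grouping columns grp_spk and grp_stim with the string values “a” and “2”.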
fsys:label:syl
description: syllable label
type: string
values, default: x
remarks: Will be used by automatic syllable extraction.

fsys:loc:tier_acc
description: local event tier names
type: string or list of strings
values, default: [ ]
remarks: Tier (one for each channel) defining pitch accent time stamps. Event tiers only. Field can be empty, but at least one of fsys:loc:tier_ag and fsys:loc:tier_acc needs to be specified. If only fsys:loc:tier_ag: analysis within segment; if only fsys:loc:tier_acc: analysis within a symmetric window of length preproc:point_win centered on the time stamp; if both: analysis within the ag segment, with time normalization so that the 0 position is at the acc time stamp within ag.

fsys:loc:tier_ag
description: local segment tier names
type: string or list of strings
values, default: [ ]
remarks: Tier (one for each channel) defining accent group-like units. Segment tiers only. Field can be empty, but at least one of fsys:loc:tier_ag and fsys:loc:tier_acc needs to be specified. If only fsys:loc:tier_ag: analysis within segment; if only fsys:loc:tier_acc: analysis within a symmetric window of length preproc:point_win centered on the time stamp; if both: analysis within the ag segment, with time normalization so that the 0 position is at the acc time stamp within ag.

fsys:pho:tier
description: name of tier with phonetic segments
type: string or list of strings
values, default: [ ]
remarks: One tier per channel. Used for feature extraction in prosodic boundary and accent localization.

fsys:pho:vow
description: vowel pattern
type: string
values, default: [AEIOUYaeiouy29{]
remarks: To identify vowel segments in fsys:pho:tier. Is interpreted as a regular expression.

fsys:pic:dir
description: directory for plotting output
type: string
values, default:
remarks: Directory for the png files generated by plotting.
fsys:pic:stm
description: file name stem of the plot files
type: string
values, default: copasul
remarks:

fsys:pulse:dir
description: Pulse file directory
type: string
values, default:
remarks: Can be nested. Pulse files are obligatory only for extracting voice quality features. All obligatory directories must contain the same number of files in the same order. Optimally, the same order is guaranteed by using the same file name stem for corresponding audio, f0, pulse, and annotation files. However, this is not required.

fsys:pulse:ext
description: Pulse file extension
type: string
values, default:
remarks: Only files with this extension are collected from the directory.

fsys:pulse:typ
description:
type: string
values, default: tab
remarks: Currently only tab supported.

fsys:rhy_en:tier
description: Tiers for energy rhythm extraction
type: string or list of strings
values, default: [ ]
remarks: More than one tier per channel supported. Segment or event tiers. Events are expanded to segments by preproc:point_win.

fsys:rhy_en:tier_rate
description: Tiers containing units whose rate is to be calculated within each segment of the fsys:rhy_en:tier tiers
type: string or list of strings
values, default: [ ]
remarks: More than one tier per channel supported. Segment or event tiers.

fsys:rhy_f0:tier
description: Tiers for f0 rhythm extraction
type: string or list of strings
values, default: [ ]
remarks: More than one tier per channel supported. Segment or event tiers. Events are expanded to segments by preproc:point_win.

fsys:rhy_f0:tier_rate
description: Tiers containing units whose rate is to be calculated within each segment of the fsys:rhy_f0:tier tiers
type: string or list of strings
values, default: [ ]
remarks: More than one tier per channel supported. Segment or event tiers.

fsys:voice:tier
description: Tiers for voice quality extraction
type: string or list of strings
values, default: [ ]
remarks:
More than one tier per channel supported. Segment or event tiers. Events are expanded to segments by preproc:point_win.

preproc:base_prct
description: Percentile below which the base value for the semitone transform is calculated
type: float ]0 100[
values, default:
remarks: The base value for the semitone transform is defined as the median of the values below the specified percentile. If set to 0, the base value will be set to 1, i.e. the semitone transform is carried out without normalization.

preproc:base_prct_grp:myChannelIndex
description: Grouping variable for each of whose levels a base value for the f0 semitone transform is calculated
type: string
values, default: ''
remarks: Indicates for each channel index which grouping variable is relevant. The grouping variable must be extractable from the file name as specified in fsys:grp. E.g. preproc:base_prct_grp:1=spkId requires a spkId element in the list of fsys:grp:lab. Channel indices must be written in quotation marks as strings.

preproc:loc_align
description: Robust treatment of local segments to which more than one center is assigned in the annotation.
type: string
values, default: skip, left, right
remarks: skip: such local segments are skipped; left: the first center is kept; right: the last center is kept.

preproc:loc_sync
description:
Extract gnl_* and rhy_* features only at locations where loc features can be obtained.
type: boolean
values, default:
remarks: Due to the strict hierarchy principle and to window length constraints it is not always possible to extract loc features at every location where gnl and rhy features can be obtained. If the user is interested only in locations where all these feature sets are available, so that the corresponding feature matrices can be concatenated, this option should be set to 1.

preproc:nrm_win
description: normalization window length (in sec)
type: float
values, default:
remarks: Length of the normalization window. For the feature sets gnl_*, all mean, max, and std values derived in the analysis window are normalized within a longer time window whose length is defined by this parameter. If segments to be analyzed are longer than the normalization window, this window is set equal to the analyzed segment. nrm_win can also be set individually for each of the feature sets loc, gnl_f0, gnl_en, rhy_f0, rhy_en (see section 10) by specifying preproc:myFeatureSet:nrm_win.

preproc:out:f
description: outlier definition factor
type: float
values, default:
remarks: Identifies as outliers those non-zero f0 values that deviate from the mean value by more than this factor times the dispersion. If preproc:out:m=mean, the mean value is given by the arithmetic mean and the dispersion by the standard deviation. If preproc:out:m=median, the mean value is given by the median and the dispersion by the interquartile range. If preproc:out:m=fence, the first and third quartiles are used as references instead of the mean value, and the dispersion is given by the interquartile range (Tukey’s fences).

preproc:out:m
description: reference value definition for outlier identification
type: string
values, default: mean, median, fence
remarks: Specifies the definition of mean/fence and dispersion, see preproc:out:f for details.
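The three outlier definitions selected by preproc:out:m can be sketched as follows. This is a minimal illustration of the rules stated in the remarks, not the toolkit's own code; the function name f0_outlier_mask and the example factor values are assumptions.

```python
import numpy as np

def f0_outlier_mask(f0, f=2.0, m="mean"):
    """Illustrative sketch: boolean mask of non-zero f0 values that count as
    outliers for factor f and reference definition m (mean, median, fence)."""
    f0 = np.asarray(f0, dtype=float)
    voiced = f0 > 0                      # zero values are never flagged
    mask = np.zeros(f0.shape, dtype=bool)
    v = f0[voiced]
    if v.size == 0:
        return mask
    q1, q2, q3 = np.percentile(v, [25, 50, 75])
    iqr = q3 - q1
    if m == "mean":                      # arithmetic mean +/- f * standard deviation
        lower, upper = v.mean() - f * v.std(), v.mean() + f * v.std()
    elif m == "median":                  # median +/- f * interquartile range
        lower, upper = q2 - f * iqr, q2 + f * iqr
    elif m == "fence":                   # Tukey's fences around the quartiles
        lower, upper = q1 - f * iqr, q3 + f * iqr
    else:
        raise ValueError("m must be 'mean', 'median', or 'fence'")
    mask[voiced] = (v < lower) | (v > upper)
    return mask

f0 = [0, 100, 102, 98, 101, 99, 400, 0]
print(f0_outlier_mask(f0, f=1.5, m="fence"))
```

In this toy contour only the value 400 is flagged; the two zeros are left untouched because only non-zero f0 values are considered.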
preproc:point_win
description: window length to transform events to segments (in sec)
type: float
values, default:
remarks: The extraction of the feature sets gnl_*, rhy_*, glob, loc is based on segments. For event tier input, segments are obtained by centering a window of this length on the time stamps. point_win can also be set individually for each of the feature sets loc, gnl_f0, gnl_en, rhy_f0, rhy_en (see section 10) by specifying preproc:myFeatureSet:point_win.

preproc:smooth:mtd
description: F0 smoothing method
type: string
values, default: sgolay, med
remarks: Savitzky-Golay or median filtering of the f0 contour. Median filtering yields stronger smoothing; Savitzky-Golay filtering is better at keeping local minima and maxima in place.

preproc:smooth:ord
description: polynomial order of smoothing method
type: integer
values, default:
remarks: Relevant for preproc:smooth:mtd=sgolay only.

preproc:smooth:win
description: smoothing window length (in f0 sample indices)
type: int
values, default:
remarks: The longer the smoothing window, the smoother the f0 contours.

preproc:st
description:
Hertz to semitone conversion
type: boolean
values, default:
remarks: If 1, f0 is transformed to semitones.

augment:chunk:e_rel
description: proportion of reference energy below which a pause is assumed
type: float
values, default:
remarks: A pause is indicated if the energy in the analysis window is below this factor times the energy in the longer reference window.

augment:chunk:fbnd
description: assume pause at beginning and end of file
type: boolean
values, default:
remarks: If set to 1, forced pause detection at file start and end. These pauses are subtracted from augment:chunk:n if set.

augment:chunk:flt:btype
description: filter type
type: string
values, default: low, high, band
remarks:
Butterworth filter type to filter the signal for pause detection. Recommended: low.

augment:chunk:flt:f
description: filter cutoff frequencies (in Hz)
type: float or list of floats
values, default: 8000
remarks: For augment:chunk:flt:btype=low or high a single cutoff frequency is expected; for band a 2-element list of lower and upper cutoff frequency.

augment:chunk:flt:ord
description: filter order
type: int
values, default:
remarks: Butterworth filter order.

augment:chunk:l_ref
description: reference window length for pause detection (in sec)
type: int
values, default:
remarks: The energy in the analysis window of length augment:chunk:l is compared against the energy within the reference window. Same midpoint as the analysis window.

augment:chunk:l
description: length of the analysis window (in sec)
type: float
values, default:
remarks: Analysis window for which it is to be decided whether or not it is (part of) a pause.

augment:chunk:margin
description: silence margin at chunk start and end (in sec)
type: float
values, default:
remarks: Chunks are extended by this amount on both sides.

augment:chunk:min_chunk_l
description: minimum chunk length (in sec)
type: float
values, default:
remarks: Shorter chunks are merged.

augment:chunk:min_pau_l
description: minimum pause length (in sec)
type: float
values, default:
remarks: Shorter pauses are ignored.

augment:chunk:n
description: pre-specified number of pauses [sic!]
type: int
values, default: -1
remarks: In this implementation chunks are defined as interpausal units and thus depend on pause detection. If set to -1, no pause number is pre-specified.

augment:syl:d_min
description: minimum distance between subsequent syllable nuclei (in sec)
type: float
values, default:
remarks:
If 2 detected nuclei are closer than this distance, the weaker candidate is discarded.

augment:syl:e_min
description: minimum energy factor relative to entire file
type: float
values, default:
remarks: For a syllable nucleus the RMS energy in the analysis window must be above this factor times the energy in the entire file.

augment:syl:e_rel
description: minimum energy factor relative to reference window
type: float
values, default:
remarks: For a syllable nucleus the RMS energy in the analysis window must be above this factor times the energy in the reference window.

augment:syl:flt:btype
description: filter type
type: string
values, default: low, high, band
remarks: Butterworth filter type to filter the signal for syllable nucleus detection. Recommended: band.

augment:syl:flt:f
description: filter cutoff frequencies (in Hz)
type: float or list of floats
values, default: [200, 4000]
remarks: For augment:syl:flt:btype=low or high a single cutoff frequency is expected; for band a 2-element list of lower and upper cutoff frequency.

augment:syl:flt:ord
description: filter order
type: int
values, default:
remarks: Butterworth filter order.

augment:syl:l_ref
description: reference window length for syllable detection (in sec)
type: float
values, default:
remarks:
The energy in the analysis window with the same midpoint is compared against the energy within the reference window.

augment:syl:l
description: analysis window length (in sec)
type: float
values, default:
remarks: Length of the window within which energy is calculated. Same midpoint as the reference window.

augment:glob:cntr_mtd
description: how to define cluster centroids
type: string
values, default: seed_prct, seed_kmeans, split
remarks: seed_*: initialize clustering by bootstrapped seed centroids. seed_prct: single-pass clustering of the boundary candidates by their distance to these centroids. Distance values to the no-boundary seed above a specified percentile augment:glob:prct indicate boundaries. seed_kmeans: kmeans clustering initialized by the seed centroids (gives a more balanced amount of boundary/no-boundary cases than seed_prct). split: centroids are derived by splitting each column in the feature matrix at the percentile augment:glob:prct; the boundary centroid is defined by the median of the values above the splitpoint, the no-boundary centroid by the median of the values below; items are then assigned to the nearest centroid in a single pass. Depending on augment:glob:unit, clustering is carried out either separately within each file and each channel, or over the entire dataset. Fallback: if cluster centroids cannot be bootstrapped, this parameter’s value is changed to split.

augment:glob:heuristics
description: heuristic macro settings
type: string
values, default: ORT
remarks:
Only ORT supported. ORT assumes a word segmentation tier for prosodic boundary prediction and rejects boundaries after words that are too short and thus probably function words ( < . s ). Not necessarily meaningful for every language.

augment:glob:measure
description: feature values, or deltas
type: string
values, default: abs, delta, abs+delta
remarks: Which values v to put in the feature matrix (i = time index): abs: feature values v[i]; delta: feature deltas v[i] − v[i−1]; abs+delta: both.

augment:glob:min_l
description: minimum inter-boundary distance (in sec)
type: float
values, default:
remarks: If 2 detected boundaries are closer than this value, only the stronger one will be kept. This distance is also used in bootstrapping boundary and no-boundary centroids as described in section 7.3.

augment:glob:prct
description: percentile of cluster splitpoint
type: float ]0 100[
values, default:
remarks: Splitpoint definition for clustering in terms of a percentile value. The higher the value, the fewer boundaries will be detected. For augment:glob:cntr_mtd=split the percentile refers to the feature values; for augment:glob:cntr_mtd=seed_prct it refers to the distance to the no-boundary seed centroid.

augment:glob:unit
description: derive centroids separately for each file or over entire data set
type: string
values, default: batch, file
remarks: batch mode is recommended for corpora containing lots of short recordings, within which centroids cannot reliably be extracted.

augment:glob:wgt:myBndFeatset+:myRegister+:myFeat+
description: user defined feature weights
type: float
values, default:
remarks: Create one config branch for each selected boundary feature and assign a weight. Only boundary features are supported. The weight becomes a dummy if augment:glob:wgt_mtd is not user. However, the branches must be specified in order to mark which features are to be used for boundary prediction. myBndFeatset ∈ {std, win, trend}, myRegister ∈ {bl, ml, tl, rng}, myFeat ∈ {r, rms, rms_pre, ...}.
The branches must correspond to branches in the sub-dictionary copa:data:myFileIdx:myChannelIdx:bnd:myTierNameIndex:myBoundaryIndex (see section 12.3). E.g. copa:data:myFileIdx:myChannelIdx:bnd:myTierNameIndex:myBoundaryIndex:win:bl:r is addressed by augment:glob:wgt:win:bl:r.

augment:glob:wgt:pho
description: use/weight normalized vowel length as feature
type: float
values, default:
remarks: Only compliant with augment:glob:unit=batch.

augment:glob:wgt_mtd
description: feature weighting method
type: string
values, default: silhouette, correlation, user
remarks: For silhouette an initial clustering is carried out, and the weight of each feature is then defined by its cluster-separating power. For correlation the weight of each feature is defined by its correlation to the feature vector medians. For user, the weights specified in the augment:glob:wgt:myBndFeatset+:myRegister+:myFeat+ branches are taken.

augment:loc:acc_select
description: which syllable within a segment to select
type: string
values, default: max, left, right
remarks:
Choose the accent position among all time stamps in augment:loc:tier_acc that are in the same segment of fsys:augment:loc:tier_ag. max: the most prominent one; left, right: accent the first/last syllable, which might be useful if fsys:augment:loc:tier_ag contains word segments and word stress is fixed.

augment:loc:ag_select
description: which segments to select for accentuation
type: string
values, default: max, all
remarks: all: assign an accent to each segment in fsys:augment:loc:tier_ag; max: assign accents to the most prominent segments only.

augment:loc:cntr_mtd
description: how to define cluster centroids
type: string
values, default: seed_prct, seed_kmeans, split
remarks: seed_*: initialize clustering by bootstrapped seed centroids. seed_prct: single-pass clustering of the accent candidates by their distance to these centroids. Distance values to the no-accent seed above a specified percentile augment:loc:prct indicate accents. seed_kmeans: kmeans clustering initialized by the seed centroids (gives a more balanced amount of accent/no-accent cases than seed_prct). split: centroids are derived by splitting each column in the feature matrix at the percentile augment:loc:prct; the accent centroid is defined by the median of the values above the splitpoint, the no-accent centroid by the median of the values below; items are then assigned to the nearest centroid in a single pass. Depending on augment:loc:unit, clustering is carried out either separately within each file and each channel, or over the entire dataset. Fallback: if cluster centroids cannot be bootstrapped, this parameter’s value is changed to split.

augment:loc:heuristics
description: heuristic macro settings
type: string
values, default: ORT
remarks: Only ORT supported. ORT assumes a word segmentation tier for accent extraction. Short words (see augment:loc:max_l_na) will be treated as non-accent seeds, long words (see augment:loc:min_l_a) as accent seeds.

augment:loc:max_l_na
description: maximum length of definitely non-accented words (in sec)
type: float
values, default:
remarks: The non-accented seed centroid is derived from words below that length.

augment:loc:measure
description: feature values, or deltas
type: string
values, default: abs, delta, abs+delta
remarks:
Which values v to put in the feature matrix (i = time index): abs: feature values v[i]; delta: feature deltas v[i] − v[i−1]; abs+delta: both.

augment:loc:min_l_a
description: minimum length of definitely accented words (in sec)
type: float
values, default:
remarks: The accented seed centroid is derived from words above that length.

augment:loc:min_l
description: minimum inter-accent distance (in sec)
type: float
values, default:
remarks: If 2 detected accents are closer than this value, only the more prominent one will be kept.

augment:loc:prct
description: percentile of cluster splitpoint
type: float ]0 100[
values, default:
remarks: Splitpoint definition for clustering in terms of a percentile value. The higher the value, the fewer accents will be detected. For augment:loc:cntr_mtd=split the percentile refers to the feature values; for augment:loc:cntr_mtd=seed_prct it refers to the distance to the no-accent seed centroid.

augment:loc:unit
description: derive centroids separately for each file or over entire data set
type: string
values, default: batch, file
remarks: batch mode is recommended for corpora containing lots of short recordings, within which centroids cannot reliably be extracted.

augment:loc:wgt:myFeatset+:...
description: user defined feature weights
type: float
values, default:
remarks: Create one config branch for each selected prominence feature and assign a weight. myFeatset ∈ {acc, gst, gnl_f0, gnl_en}. The weight becomes a dummy if augment:loc:wgt_mtd is not user. However, the branches must be specified in order to mark which features are to be used for accent prediction. The branches must correspond to branches in the sub-dictionary copa:data:myFileIdx:myChannelIdx:loc (see section 12.3). E.g. copa:data:myFileIdx:myChannelIdx:loc:gst:bl:rms is addressed by augment:loc:wgt:gst:bl:rms. If the value at this branch is a list (e.g.
the polynomial coefficients in ...augment:loc:wgt:acc:c), the weight can either be a scalar to weight all list elements equally or a list of the same length as the value list to weight each element individually. (Only) for polynomial coefficients absolute values are taken.

augment:loc:wgt:pho
description: use/weight normalized vowel length as feature
type: float
values, default:
remarks: Only compliant with augment:loc:unit=batch.

augment:loc:wgt_mtd
description: feature weighting method
type: string
values, default: silhouette, correlation, user
remarks: For silhouette an initial clustering is carried out, and the weight of each feature is then defined by its cluster-separating power. For correlation the weight of each feature is defined by its correlation to the feature vector medians. For user, the weights specified in the augment:loc:wgt:...+ branches are taken.

styl:glob:decl_win
description: window length for median calculation (in sec)
type: float
values, default:
remarks: Within each window one median each for the base-, mid-, and topline is derived.

styl:glob:nrm:mtd
description: time normalization method
type: string
values, default: minmax
remarks: For time normalization in the global segment. Currently only minmax supported.

styl:glob:nrm:rng
description: normalized time range
type: list of floats
values, default: [0 ,
remarks: Normalized time of segment start and endpoint.

styl:glob:prct:bl
description: percentile below which the baseline input medians are calculated
type: float ]0 100[
values, default:
remarks: A sequence of lower range medians is calculated along the f0 contour. The baseline is given by linear regression through this sequence.

styl:glob:prct:tl
description: percentile above which the topline input medians are calculated
type: float ]0 100[
values, default:
remarks: A sequence of upper range medians is calculated along the f0 contour. The topline is given by linear regression through this sequence.
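The interplay of styl:glob:decl_win, styl:glob:prct:bl, styl:glob:prct:tl, and styl:glob:nrm:rng can be sketched in Python as below. This is a rough illustration under several assumptions that the manual does not state: windows are non-overlapping, percentiles are taken within each window, the midline median uses all values of a window, and the parameter values in the example are arbitrary. The function name register_lines is hypothetical.

```python
import numpy as np

def register_lines(t, f0, decl_win=0.1, prct_bl=10, prct_tl=90, rng=(0.0, 1.0)):
    """Illustrative sketch: fit base-, mid-, and topline by linear regression
    through window-wise medians. Returns {line: [slope, intercept]}."""
    t = np.asarray(t, dtype=float)
    f0 = np.asarray(f0, dtype=float)
    # minmax time normalization to the range rng
    tn = rng[0] + (t - t[0]) * (rng[1] - rng[0]) / (t[-1] - t[0])
    # window length in samples (assumes equidistant time stamps)
    step = max(1, int(round(decl_win / (t[1] - t[0]))))
    x, med = [], {"bl": [], "ml": [], "tl": []}
    for i in range(0, len(f0), step):
        w = f0[i:i + step]
        x.append(tn[i:i + step].mean())
        # median of the values below/above the given percentile
        med["bl"].append(np.median(w[w <= np.percentile(w, prct_bl)]))
        med["ml"].append(np.median(w))
        med["tl"].append(np.median(w[w >= np.percentile(w, prct_tl)]))
    return {k: np.polyfit(x, v, 1) for k, v in med.items()}

t = np.arange(200) * 0.01
f0 = 200.0 - 20.0 * t / t[-1]          # toy contour with linear declination
lines = register_lines(t, f0)
print({k: np.round(v, 2) for k, v in lines.items()})
```

For the toy contour all three lines recover the declination slope, and the intercepts are ordered baseline < midline < topline, which is the time-varying level and range information the register stylization is meant to capture.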
styl:loc:nrm:mtd
description: time normalization method
type: string
values, default: minmax
remarks: For time normalization in the local segment. Currently only minmax supported.

styl:loc:nrm:rng
description: normalized time range
type: list of floats
values, default: [ − ,
remarks: Normalized time of segment start and endpoint. [−1, 1] is recommended to center the polynomial around 0.

styl:loc:ord
description: polynomial order
type: int
values, default:
remarks: Each coefficient will get its own output column in the exported tables, thus the table size depends on this order.

styl:register
description: register definition for residual calculation
type: string
values, default: ml, bl, tl, rng, none
remarks: How to remove the global component from the f0 contour to get the residual on which the local contour is calculated. bl, ml, tl: base-, mid-, or topline subtraction; rng: pointwise [0 1] normalization of the f0 contour with respect to the base- and topline. Recommended: ml, rng. rng normalizes for range declination (lower f0 amplitudes at the end of prosodic phrases).

styl:bnd:cross_chunk
description: stylization windows across chunks
type: boolean
values, default:
remarks: If set to 1, the windows defined by styl:bnd:win can cross chunks, else they are limited by the current chunk’s boundaries. If set to 1 for do_bnd_trend, lines are fitted from file start and until file end. Else, they are limited by the current chunk’s boundaries.

styl:bnd:decl_win
description: window length for median calculation (in sec)
type: float
values, default:
remarks: Within each window one median each for the base-, mid-, and topline is derived.

styl:bnd:nrm:mtd
description: time normalization method
type: string
values, default: minmax
remarks:
Only minmax supported.

styl:bnd:nrm:rng
description: normalized time range
type: list of floats
values, default: [0 ,
remarks: To allow for comparisons independent of segment length, time is normalized to this range.

styl:bnd:prct:bl
description: percentile below which the baseline input medians are calculated
type: float ]0 100[
values, default:
remarks: A sequence of lower range medians is calculated along the f0 contour. The baseline is given by linear regression through this sequence.

styl:bnd:prct:tl
description: percentile above which the topline input medians are calculated
type: float ]0 100[
values, default:
remarks: A sequence of upper range medians is calculated along the f0 contour. The topline is given by linear regression through this sequence.

styl:bnd:residual
description: use f0 residual
type: boolean
values, default:
remarks: Measure discontinuity on the (preprocessed) f0 contour or on its residual after register subtraction.

styl:bnd:win
description: window length (in sec)
type: float
values, default:
remarks: Stylization window length for navigate:do_styl_bnd_win.

styl:gnl:sb:alpha
description: pre-emphasis factor or lower boundary frequency
type: float
values, default:
remarks: For pre-emphasis in the time domain for spectral balance calculation. 0 ≤ α ≤ 1: factor in s′[i] = s[i] − α · s[i−1]; α > 1: lower boundary frequency from which pre-emphasis should start. Will be internally converted to the factor in the above formula.

styl:gnl:sb:btype
description: filter type to restrict frequency window
type: string
values, default: none, band, high, low
remarks: Restrict the frequency window for spectral balance calculation.

styl:gnl:sb:domain
description: domain for spectral balance calculation
type: string
values, default: time, freq
remarks:
Specifies whether spectral balance should be calculated in the time (’time’) or frequency (’freq’) domain.

styl:gnl:sb:f
description: filter cutoff frequencies (in Hz) for spectral balance calculation
type: float or list of floats
values, default: -1
remarks: Specifies the upper cutoff frequency for a low-pass filter, the lower cutoff frequency for a high-pass filter, or both for a bandpass filter. See styl:gnl:sb:btype.

styl:gnl:sb:win
description: length (in sec) of central analysis window in analysed segment
type: float
values, default: -1
remarks: To be set if coarticulatory influence on spectral balance calculation should be removed. If -1, the entire segment is used.

styl:gnl:win
description: window length to determine initial and final part of contour
type: float
values, default:
remarks: Length of window (in sec) for the initial and final part of the f0 or energy contour, used to calculate mean f0 or energy quotients of these parts and the entire contour.

styl:gnl_en:alpha
description: pre-emphasis factor
type: float
values, default:
remarks: Pre-emphasis is carried out in the time domain as follows: s′[i] = s[i] − α · s[i−1]. DEPRECATED! Now specified by styl:gnl_en:sb:alpha.

styl:gnl_en:sts
description: step size (in sec)
type: float
values, default:
remarks: Step size by which the energy window is shifted.

styl:gnl_en:winparam
description: window parameter
type: string or int
values, default: –
remarks: Depends on styl:gnl_en:wintyp; as required by scipy.signal.get_window().

styl:gnl_en:wintyp
description: window type
type: string
values, default: hamming, kaiser, ...
remarks: All window types that are supported by scipy.signal.get_window() can be used.

styl:gnl_en:win
description: window length (in sec)
type: float
values, default:
remarks:
Energy is calculated in terms of RMSD within windows of this length.

styl:rhy_f0:rhy:lb
description: lower frequency boundary of DCT coefficients (in Hz)
type: float
values, default:
remarks: Can be raised if low-frequency events should be ignored.

styl:rhy_f0:rhy:nsm
description: number of spectral moments
type: int
values, default:
remarks: How many spectral moments to calculate from the DCT analysis of the f0 contour.

styl:rhy_f0:rhy:rmo
description: remove DCT offset
type: boolean
values, default:
remarks: Remove first DCT coefficient.

styl:rhy_f0:rhy:ub
description: upper frequency boundary of DCT coefficients (in Hz)
type: float
values, default:
remarks: Upper boundary of the analyzed DCT spectrum (higher-frequency events are assumed not to be influential for prosody).

styl:rhy_f0:rhy:wgt:rb
description: rate band (in Hz)
type: float
values, default:
remarks: Frequency band around the event frequency, within which the influence of the event in terms of absolute DCT coefficient values is integrated. E.g. for an event rate of 4 Hz and a rate band of 1 Hz the absolute values of the DCT coefficients between 3 and 5 Hz are summed up.

styl:rhy_f0:rhy:winparam
description: window parameter
type: string or int
values, default:
remarks: Depends on styl:rhy_f0:rhy:wintyp; as required by scipy.signal.get_window().

styl:rhy_f0:rhy:wintyp
description: window type for DCT analysis
type: string
values, default: hamming, kaiser, ...
remarks: All window types that are supported by scipy.signal.get_window() can be used.

styl:rhy_en:rhy:lb
description: lower frequency boundary of DCT coefficients (in Hz)
type: float
values, default:
remarks: Can be raised if low-frequency events should be ignored.

styl:rhy_en:rhy:nsm
description: number of spectral moments
type: int
values, default:
remarks: How many spectral moments to calculate from the DCT analysis of the energy contour.

styl:rhy_en:rhy:rmo
description: remove DCT offset
type: boolean
values, default:
remarks: Remove first DCT coefficient.
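The rate band integration described for the wgt:rb option can be sketched as follows. This is a minimal illustration, assuming a type-II DCT whose k-th coefficient of an N-sample contour sampled at fs Hz corresponds to the frequency k · fs / (2N); windowing (wintyp, winparam) and the lb/ub limits are left out. The function name rate_influence is hypothetical.

```python
import numpy as np
from scipy.fft import dct

def rate_influence(y, fs, rate, rb=1.0, rmo=True):
    """Illustrative sketch: sum of absolute DCT coefficients of contour y
    (sampled at fs Hz) within rate +/- rb Hz."""
    y = np.asarray(y, dtype=float)
    c = dct(y, type=2, norm="ortho")
    if rmo:
        c[0] = 0.0                                   # remove the DCT offset
    freq = np.arange(len(y)) * fs / (2.0 * len(y))   # frequency of each coefficient
    band = (freq >= rate - rb) & (freq <= rate + rb)
    return np.abs(c[band]).sum()

fs = 100                                  # contour sample rate in Hz (example value)
t = np.arange(0, 2, 1 / fs)
y = np.cos(2 * np.pi * 4 * t)             # toy contour oscillating at 4 Hz
print(rate_influence(y, fs, rate=4), rate_influence(y, fs, rate=10))
```

For the toy contour, the value for an event rate of 4 Hz (coefficients between 3 and 5 Hz) is much larger than the value for an event rate of 10 Hz, reflecting that the contour oscillates at the 4 Hz rate.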
styl:rhy_en:rhy:ub
  description: upper frequency boundary of DCT coefficients (in Hz)
  type: float
  values, default:
  remarks: Upper boundary of the analyzed DCT spectrum (higher-frequency events are assumed not to be influential for prosody).

styl:rhy_en:rhy:wgt:rb
  description: rate band (in Hz)
  type: float
  values, default:
  remarks: Frequency band around the event frequency within which the influence of the event, in terms of absolute DCT coefficient values, is integrated. E.g. for an event rate of 4 Hz and a rate band of 1 Hz, the absolute values of the DCT coefficients between 3 and 5 Hz are summed up.

styl:rhy_en:rhy:winparam
  description: DCT window parameter
  type: string or int
  values, default:
  remarks: Depends on styl:rhy_en:rhy:wintyp; as required by scipy.signal.get_window().

styl:rhy_en:rhy:wintyp
  description: window type for DCT
  type: string
  values, default: hamming, kaiser, ...
  remarks: All window types that are supported by scipy.signal.get_window() can be used.

styl:rhy_en:sig:scale
  description: scale signal to maximum amplitude 1
  type: boolean
  values, default:
  remarks: If set to 1, the signal is scaled to its maximum amplitude. This is suggested especially if signals from different recording conditions are to be compared.

styl:rhy_en:sig:sts
  description: step size (in sec)
  type: float
  values, default:
  remarks: Step size by which the energy window is shifted.

styl:rhy_en:sig:winparam
  description: window parameter
  type: string or int
  values, default: –
  remarks: Depends on styl:rhy_en:sig:wintyp; as required by scipy.signal.get_window().

styl:rhy_en:sig:wintyp
  description: window type of energy calculation
  type: string
  values, default: hamming, kaiser, ...
  remarks: All window types that are supported by scipy.signal.get_window() can be used.

styl:rhy_en:sig:win
  description: window length (in sec)
  type: float
  values, default:
  remarks: Energy is calculated in terms of RMSD within windows of this length.

styl:voice:jit:fac_max
  description: maximally allowed quotient of adjacent periods
  type: float
  values, default:
  remarks: Corresponds to the Praat parameter Maximum period factor.

styl:voice:jit:t_max
  description: maximum period length in sec
  type: float
  values, default:
  remarks: Corresponds to the Praat parameter Period ceiling.

styl:voice:jit:t_min
  description: minimum period length in sec
  type: float
  values, default:
  remarks: Corresponds to the Praat parameter Period floor.

clst:glob:estimate_bandwidth:n_samples
  description: number of samples to estimate bandwidth
  type: integer
  values, default:
  remarks: Computationally expensive; high numbers will require long processing time.
clst:glob:estimate_bandwidth:quantile
  description: estimate bandwidth quantile parameter
  type: float
  values, default:
  remarks: Lower values result in higher cluster numbers.

clst:glob:kMeans:init
  description: initialization method of kmeans
  type: string
  values, default: meanShift
  remarks: All methods that are supported by kMeans() can be used. For meanShift the number of clusters does not need to be specified.

clst:glob:kMeans:max_iter
  description: kMeans: maximum number of iterations
  type: int
  values, default:
  remarks: When to stop cluster re-adjustment, if not yet converged.

clst:glob:kMeans:n_cluster
  description: kMeans: predefined number of contour classes
  type: int
  values, default:
  remarks: Irrelevant if kmeans centroids are initialized by clst:glob:kMeans:init=meanShift.

clst:glob:kMeans:n_init
  description: number of initialization trials
  type: int
  values, default:
  remarks: kMeans is repeated with different cluster initializations, from which the best clustering result is kept.

clst:glob:meanShift:bandwidth
  description: bandwidth parameter for meanShift cluster center initialization
  type: float
  values, default:
  remarks:

clst:glob:meanShift:bin_seeding
  description: bin seeding
  type: boolean
  values, default:
  remarks: Parameter for meanShift clustering.

clst:glob:meanShift:min_bin_freq
  description: minimum number of items in each bin
  type: int
  values, default:
  remarks: Parameter for meanShift clustering.

clst:glob:mtd
  description: clustering method
  type: string
  values, default: meanShift, kmeans
  remarks: No initial cluster number specification needed for meanShift.

clst:loc:estimate_bandwidth:n_samples
  description: number of samples to estimate bandwidth
  type: int
  values, default:
  remarks: Computationally expensive; high numbers will require long processing time.

clst:loc:estimate_bandwidth:quantile
  description: estimate bandwidth quantile parameter
  type: float
  values, default:
  remarks: Lower values result in higher cluster numbers.

clst:loc:kMeans:init
  description: initialization method of kmeans
  type: string
  values, default: meanShift
  remarks: All methods that are supported by kMeans() can be used. For meanShift the number of clusters does not need to be specified.

clst:loc:kMeans:max_iter
  description: kMeans: maximum number of iterations
  type: int
  values, default:
  remarks: When to stop cluster re-adjustment, if not yet converged.

clst:loc:kMeans:n_cluster
  description: kMeans: predefined number of contour classes
  type: int
  values, default:
  remarks: Irrelevant if kmeans centroids are initialized by clst:loc:kMeans:init=meanShift.

clst:loc:kMeans:n_init
  description: number of initialization trials
  type: int
  values, default:
  remarks: kMeans is repeated with different cluster initializations, from which the best clustering result is kept.

clst:loc:meanShift:bandwidth
  description: bandwidth parameter for meanShift cluster center initialization
  type: float
  values, default:
  remarks:

clst:loc:meanShift:bin_seeding
  description: bin seeding
  type: boolean
  values, default:
  remarks: Parameter for meanShift clustering.

clst:loc:meanShift:min_bin_freq
  description: minimum number of items in each bin
  type: int
  values, default:
  remarks: Parameter for meanShift clustering.

clst:loc:mtd
  description: clustering method
  type: string
  values, default: meanShift, kmeans
  remarks: No initial cluster number specification needed for meanShift.

plot:browse:save
  description: save plots according to fsys:pic
  type: boolean
  values, default:
  remarks: Store png files in fsys:pic:dir with file name stem fsys:pic:stm.

plot:browse:single_plot:active
  description: switch on single plot mode
  type: boolean
  values, default:
  remarks: Switch on single plot mode if only one segment, specified by file index, channel index, and segment index, is to be plotted.

plot:browse:single_plot:channel_i
  description: channel index of selected segment
  type: integer
  values, default:
  remarks: Channel index of the selected segment to be plotted.

plot:browse:single_plot:file_i
  description: file index of selected segment
  type: integer
  values, default:
  remarks: File index of the selected segment to be plotted.

plot:browse:single_plot:segment_i
  description: segment index of selected segment
  type: integer
  values, default:
  remarks: Segment index of the selected segment to be plotted.

plot:browse:time
  description: when to do plotting
  type: string
  values, default: online, final
  remarks: online: plot at the stylization stage for an immediate check of the appropriateness of the configurations. final: plot segment-wise from the finally stored results. Click on plot: next; press return: quit.
plot:browse:type:clst:contours
  description: plot global and local intonation class centroids
  type: boolean
  values, default:
  remarks:

plot:browse:type:complex:gestalt
  description: plot local contour Gestalt stylization
  type: boolean
  values, default:
  remarks:

plot:browse:type:complex:superpos
  description: plot global and local contour superposition
  type: boolean
  values, default:
  remarks:

plot:browse:type:glob:decl
  description: plot global contour register stylization
  type: boolean
  values, default:
  remarks:

plot:browse:type:loc:acc
  description: plot local contour polynomial stylization
  type: boolean
  values, default:
  remarks:

plot:browse:type:loc:decl
  description: plot local contour register stylization
  type: boolean
  values, default:
  remarks:

plot:browse:type:complex:bnd
  description: plot boundary stylization
  type: boolean
  values, default:
  remarks:

plot:browse:type:complex:bnd_win
  description: plot boundary stylization (fixed window)
  type: boolean
  values, default:
  remarks:

plot:browse:type:complex:bnd_trend
  description: plot boundary stylization (trend)
  type: boolean
  values, default:
  remarks:

plot:browse:type:rhy_en:rhy
  description: plot influence of rate tier events on DCT of energy contour in analysis tier
  type: boolean
  values, default:
  remarks:

plot:browse:type:rhy_f0:rhy
  description: plot influence of rate tier events on DCT of f0 contour in analysis tier
  type: boolean
  values, default:
  remarks:

plot:browse:verbose
  description: display file, channel and segment index for each plot
  type: boolean
  values, default:
  remarks: Written to STDOUT.

plot:color
  description: plot in color (1) or black-white (0)
  type: boolean
  values, default:
  remarks:

plot:grp:grouping
  description: list of selected grouping variables from fsys:grp:lab
  type: list of strings
  values, default: [ ]
  remarks: For each combination of grouping factor levels, the stylization plot based on the respective parameter mean vector is stored as a png file in fsys:pic:dir with file name stem fsys:pic:stm and an infix expressing the respective factor level combination.

plot:grp:save
  description: save plots according to fsys:pic
  type: boolean
  values, default:
  remarks: Store png files in fsys:pic:dir with file name stem fsys:pic:stm. One file per group.

plot:grp:type:glob:decl
  description: plot global contour declination centroid for each group
  type: boolean
  values, default:
  remarks: Plots are not displayed but saved as png files to fsys:pic.

plot:grp:type:loc:acc
  description: plot local contour polynomial shape centroid for each group
  type: boolean
  values, default:
  remarks: Plots are not displayed but saved as png files to fsys:pic.

plot:grp:type:loc:decl
  description: plot local contour declination centroid for each group
  type: boolean
  values, default:
  remarks: Plots are not displayed but saved as png files to fsys:pic.
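The rate band remark given for the wgt:rb options (summing absolute DCT coefficients around an event rate) can be illustrated with a small self-contained sketch. This is not the toolkit's implementation: the function names dct2 and band_weight are hypothetical, an unnormalized DCT-II is assumed, and coefficient k of an N-sample contour sampled at fs Hz is taken to correspond to the frequency k*fs/(2N).

```python
import math


def dct2(y):
    """Unnormalized DCT-II of a sequence (plain Python, O(N^2))."""
    n = len(y)
    return [
        sum(y[i] * math.cos(math.pi * (i + 0.5) * k / n) for i in range(n))
        for k in range(n)
    ]


def band_weight(y, fs, rate, rb):
    """Sum of absolute DCT coefficients with frequencies in [rate-rb, rate+rb].

    Assumption: coefficient k corresponds to k * fs / (2 * len(y)) Hz.
    """
    n = len(y)
    total = 0.0
    for k, coef in enumerate(dct2(y)):
        freq = k * fs / (2.0 * n)
        if rate - rb <= freq <= rate + rb:
            total += abs(coef)
    return total


fs = 100   # contour sample rate in Hz
n = 200    # 2 seconds of contour
# Toy contour: exactly the DCT basis function at 4 Hz (k = 4 * 2n / fs = 16)
contour = [math.cos(math.pi * (i + 0.5) * 16 / n) for i in range(n)]

w_event = band_weight(contour, fs, rate=4.0, rb=1.0)   # band 3 to 5 Hz
w_other = band_weight(contour, fs, rate=10.0, rb=1.0)  # band 9 to 11 Hz
print(w_event, w_other)
```

In this toy case nearly all spectral weight falls into the 3 to 5 Hz band, so an event rate of 4 Hz would be judged influential for the contour, whereas a rate of 10 Hz would not.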
12 Output

If fsys:export:csv is set to 1, a csv table file is generated in config:fsys:export:dir for each feature set selected by the navigate:* options. The file name is the underscore-concatenation of config:fsys:export:stm and the feature set name; the extension is csv. Columns are separated by commas. The column titles correspond to the feature names given in the tables in section 10, and each row corresponds to one segment or event for which the features were extracted. These feature vectors are additionally linked to the data origin by the following columns:

name    description
ci      channel index (starting with 0)
fi      file index (starting with 0)
ii      item (segment or event) index (starting with 0)
stm     annotation file name stem
t_on    time onset
t_off   time offset (same as t_on for events)
tier    tier name

Inter-tier relations are provided by the following columns:

name            description
is_init         initial position in a global segment
is_fin          final position in a global segment
is_init_chunk   initial position in a chunk
is_fin_chunk    final position in a chunk

All of these columns contain the values yes and no. Medial position is simply indicated by is_init=no and is_fin=no. These columns can be used for data subsetting. As an example, assume that boundary features were extracted between accent groups and that the global segments correspond to intonation phrases. Then is_fin serves to distinguish IP-final from non-final boundaries. Equivalently, phrase-final and non-final accents can be distinguished. is_init_chunk and is_fin_chunk work the same way on the chunk level. If no chunk tier is specified, the entire channel is considered to be a single chunk. If no global segment tier is specified, all is_init and is_fin values are set to no.

Finally, if specified by the user, an arbitrary number of grouping columns derived from the file names is added to the tables. Their names are prefixed by grp_. See the grouping options fsys:grp:* in section 11.3 for details.
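The data subsetting by position columns described above can be sketched in Python as follows. The table content is a toy stand-in for an exported csv file, the feature column myFeature and the helper subset are hypothetical, and the column names are assumed to be spelled with underscores (is_init, is_fin, t_on, t_off).

```python
import csv
import io

# Toy stand-in for an exported feature table; "myFeature" is a hypothetical column.
table = io.StringIO(
    "ci,fi,ii,stm,t_on,t_off,tier,is_init,is_fin,myFeature\n"
    "0,0,0,file1,0.10,0.45,ag,yes,no,1.5\n"
    "0,0,1,file1,0.45,0.90,ag,no,no,0.7\n"
    "0,0,2,file1,0.90,1.30,ag,no,yes,2.1\n"
)

rows = list(csv.DictReader(table))


def subset(rows, **conditions):
    """Return the rows whose columns match all given column=value conditions."""
    return [r for r in rows if all(r[k] == v for k, v in conditions.items())]


# Items in final position of a global segment (e.g. IP-final boundaries)
final_rows = subset(rows, is_fin="yes")
# Items in medial position: neither initial nor final
medial_rows = subset(rows, is_init="no", is_fin="no")

print([r["ii"] for r in final_rows], [r["ii"] for r in medial_rows])
```

For a real export, the io.StringIO object would be replaced by an opened csv file from the export directory.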
Each table file comes with an R code template file with the same name and the extension .R for reading the table in R.

By setting fsys:export:summary to 1, the table output described in section 12.1 can be summarized per file and analysis tier. Continuous-valued features are summarized in terms of their mean, median, standard deviation, and inter-quartile range. For categorical features, such as intonation contour classes, the unigram entropy is calculated. The resulting table is written to the directory fsys:export:dir with the file stem fsys:export:stm plus the suffix summary and the extension csv. Columns are separated by commas. There is one row of statistic values per analysed tier in a file. Each continuous-valued feature within each analysis tier is represented by four columns. For features of the sets glob and loc, for which there is only one analysis tier, the column names follow the pattern featureSet_featureName_statisticMeasure. The suffixes representing the statistic measures are listed in the table below. For features of all other sets, with potentially more than one analysis tier, the column names are built like this: featureSet_analysisTierName_featureName_statisticMeasure. Categorical features are represented by one column each, following the same naming scheme.

suffix  meaning               feature type
m       arit. mean            continuous
med     median                continuous
sd      standard deviation    continuous
iqr     inter-quartile range  continuous
h       unigram entropy       categorical

File-level groupings, i.e. the grp_* columns of the csv tables described in section 12.1, are copied to the summary table. File and channel index are given in the columns fi and ci, respectively; the file stem is written to the column stm.

Next to the csv file, an R code template file is generated with the same name and the extension .R for reading the summary table in R.
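The summary measures and the column naming pattern can be sketched with the Python standard library. This is only an illustration under stated assumptions: the helper names are hypothetical, the sample standard deviation, the inclusive quartile method, and log base 2 for the entropy are choices made here and not confirmed for the toolkit, and the column name parts are assumed to be joined by underscores.

```python
import math
import statistics
from collections import Counter


def summarize_continuous(values):
    """Mean, median, standard deviation and inter-quartile range of a feature."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return {
        "m": statistics.mean(values),
        "med": statistics.median(values),
        "sd": statistics.stdev(values),   # sample sd (assumption)
        "iqr": q3 - q1,
    }


def unigram_entropy(labels):
    """Unigram entropy in bits (log base 2 is an assumption)."""
    counts = Counter(labels)
    n = len(labels)
    return -sum(c / n * math.log2(c / n) for c in counts.values())


def column_name(feature_set, feature, stat, tier=None):
    """Build a summary column name; the tier name is only inserted if given."""
    parts = [feature_set] + ([tier] if tier else []) + [feature, stat]
    return "_".join(parts)


stats = summarize_continuous([1.0, 2.0, 3.0, 4.0, 5.0])
entropy = unigram_entropy(["a", "a", "b", "b"])
name_glob = column_name("glob", "myFeature", "m")
name_bnd = column_name("bnd", "myFeature", "iqr", tier="myTier")
print(stats, entropy, name_glob, name_bnd)
```

Here myFeature and myTier are placeholders for a feature name from section 10 and an analysis tier name from the annotation.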
The pickle file that is written to config:fsys:export:dir contains a nested dictionary copa for further processing within other Python projects. On the top level, copa is subdivided into the sub-dictionaries

• config: configurations underlying the current analysis
• data: extracted features, structured as described below
• clst: contour clustering results
• val: validation metrics for stylization and clustering

In the subsequent paragraphs all branches through the copasul nested output dictionary are described. The following index key conventions are used:

fi   file index
ci   channel index
ti   tier index
ii   item (segment or event) index

All indices start with 0, thus channel 1 is represented by index 0, etc. Levels in the dictionary are separated by colons. To give an example for the data sub-dictionary of how to translate this notation into Python code: the branch data:fi:ci:bnd:ti:ii:lab with the index values data:0:0:bnd:0:0:lab refers to file 1 : channel 1 : boundary feature set : first tier for which this set was extracted : segment 1 in this tier : label of this segment. In Python this label can be accessed by:

    copa['data'][0][0]['bnd'][0][0]['lab']

Variables to be replaced by annotation-dependent tier names etc. are marked by my*. As an example, data:0:0:rhy_f0:0:0:rate:myTierName* is expanded to one branch for each tier in fsys:rhy_f0:tier_rate referring to channel 1 in fsys:channel (see section 11). Let fsys:rhy_f0:tier_rate=["syl_1", "syl_2"], of which only the former refers to channel 1, i.e. fsys:channel:syl_1=1. Then the corresponding rate value of items in tier syl_1 within file 1, channel 1, segment 1, and analysis tier 1 (fi=ci=ti=ii=0) is addressed in Python by:

    copa['data'][0][0]['rhy_f0'][0][0]['rate']['syl_1']
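The translation from the colon notation to Python indexing can also be wrapped in a small helper. The following is a minimal sketch: the function get_branch and the file name in the comment are hypothetical, and the toy dictionary only mimics the nesting described above.

```python
def get_branch(copa, path):
    """Follow a colon-separated path through a nested dictionary.

    Purely numeric path parts (fi, ci, ti, ii) are used as integer keys.
    """
    node = copa
    for part in path.split(":"):
        key = int(part) if part.isdigit() else part
        node = node[key]
    return node


# For real output the dictionary would be loaded from the pickle file, e.g.:
# import pickle
# with open("myExport.pickle", "rb") as f:   # hypothetical file name
#     copa = pickle.load(f)

# Toy dictionary mimicking the branch data:fi:ci:bnd:ti:ii:lab
copa = {"data": {0: {0: {"bnd": {0: {0: {"lab": "x"}}}}}}}

label = get_branch(copa, "data:0:0:bnd:0:0:lab")
print(label)
```

The call is equivalent to copa['data'][0][0]['bnd'][0][0]['lab']. A tier whose name consists of digits only would need special handling, since the helper would convert it to an integer key.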
The config sub-dictionary is accessed by copa['config'] and simply contains a copy of the user-defined and default configurations introduced in section 11.
The data sub-dictionary is accessed by copa['data'] and can be further subdivided into dictionaries for file information, f0 preprocessing output, chunk segmentation, and feature sets. Time information is always given in seconds and can be accessed by the keys t, tn, to, tt. to always contains the original time values derived from the annotations, while the t, tn, and tt values are rounded to the second decimal place to be in sync with the f0 values, which are sampled at 100 Hz. The semantics of t, tn, to, tt depend on the respective sub-dictionary. In the following, all copa['data'] branches are described in alphabetical order. If a feature variable at the end of a branch is listed in one of the tables in section 10, only the feature name is given here, which can be looked up in these tables.

Boundary features

Boundary features can be extracted within an arbitrary number of tiers. These tiers are indexed by the variable ti. For segment tiers the index ii refers to the segment preceding the boundary.

data:fi:ci:bnd:ti:ii:decl:bl:c
  description: F0 baseline coefficients (descending order)
  type:
data:fi:ci:bnd:ti:ii:decl:bl:x
  description: baseline stylization input
  type: list of floats
data:fi:ci:bnd:ti:ii:decl:bl:y
  description: stylized baseline values
  type: list of floats
data:fi:ci:bnd:ti:ii:decl:err
  description: True if top- and baseline cross
  type: boolean
data:fi:ci:bnd:ti:ii:decl:ml:c
  description: F0 midline coefficients (descending order)
  type:
data:fi:ci:bnd:ti:ii:decl:ml:x
  description: midline stylization input
  type: list of floats
data:fi:ci:bnd:ti:ii:decl:ml:y
  description: stylized midline values
  type: list of floats
data:fi:ci:bnd:ti:ii:decl:rng:c
  description: F0 range coefficients (descending order)
  type:
data:fi:ci:bnd:ti:ii:decl:rng:x
  description: range stylization input
  type: list of floats
data:fi:ci:bnd:ti:ii:decl:rng:y
  description: stylized range values
  type: list of floats
data:fi:ci:bnd:ti:ii:decl:tl:c
  description: F0 topline coefficients (descending order)
  type:
data:fi:ci:bnd:ti:ii:decl:tl:x
  description: topline stylization input
  type: list of floats
data:fi:ci:bnd:ti:ii:decl:tl:y
  description: stylized topline values
  type: list of floats
data:fi:ci:bnd:ti:ii:decl:tn
  description: normalized time values (same length as bl|ml|rng|tl:y)
  type: list of floats
data:fi:ci:bnd:ti:ii:lab
  description: lab
  type: string
data:fi:ci:bnd:ti:ii:std:bl:aicI
  description: std_bl_aicI
  type: float
data:fi:ci:bnd:ti:ii:std:bl:aicI_post
  description: std_bl_aicI_post
  type: float
data:fi:ci:bnd:ti:ii:std:bl:aicI_pre
  description: std_bl_aicI_pre
  type: float
data:fi:ci:bnd:ti:ii:std:bl:corrD
  description: std_bl_corrD
  type: float
data:fi:ci:bnd:ti:ii:std:bl:corrD_post
  description: std_bl_corrD_post
  type: float
data:fi:ci:bnd:ti:ii:std:bl:corrD_pre
  description: std_bl_corrD_pre
  type: float
data:fi:ci:bnd:ti:ii:std:bl:d_m
  description: std_bl_d_m
  type: float
data:fi:ci:bnd:ti:ii:std:bl:d_o
  description: std_bl_d_o
  type: float
data:fi:ci:bnd:ti:ii:std:bl:r
  description: std_bl_r
  type: float
data:fi:ci:bnd:ti:ii:std:bl:rms
  description: std_bl_rms
  type: float
data:fi:ci:bnd:ti:ii:std:bl:rms_post
  description: std_bl_rms_post
  type: float
data:fi:ci:bnd:ti:ii:std:bl:rms_pre
  description: std_bl_rms_pre
  type: float
data:fi:ci:bnd:ti:ii:std:bl:rmsR
  description: std_bl_rmsR
  type: float
data:fi:ci:bnd:ti:ii:std:bl:rmsR_post
  description: std_bl_rmsR_post
  type: float
data:fi:ci:bnd:ti:ii:std:bl:rmsR_pre
  description: std_bl_rmsR_pre
  type: float
data:fi:ci:bnd:ti:ii:std:bl:sd_post
  description: std_bl_sd_post
  type: float
data:fi:ci:bnd:ti:ii:std:bl:sd_pre
  description: std_bl_sd_pre
  type: float
data:fi:ci:bnd:ti:ii:std:bl:sd_prepost
  description: std_bl_sd_prepost
  type: float
data:fi:ci:bnd:ti:ii:std:ml:aicI
  description: std_ml_aicI
  type: float
data:fi:ci:bnd:ti:ii:std:ml:aicI_post
  description: std_ml_aicI_post
  type: float
data:fi:ci:bnd:ti:ii:std:ml:aicI_pre
  description: std_ml_aicI_pre
  type: float
data:fi:ci:bnd:ti:ii:std:ml:corrD
  description: std_ml_corrD
  type: float
data:fi:ci:bnd:ti:ii:std:ml:corrD_post
  description: std_ml_corrD_post
  type: float
data:fi:ci:bnd:ti:ii:std:ml:corrD_pre
  description: std_ml_corrD_pre
  type: float
data:fi:ci:bnd:ti:ii:std:ml:d_m
  description: std_ml_d_m
  type: float
data:fi:ci:bnd:ti:ii:std:ml:d_o
  description: std_ml_d_o
  type: float
data:fi:ci:bnd:ti:ii:std:ml:r
  description: std_ml_r
  type: float
data:fi:ci:bnd:ti:ii:std:ml:rms
  description: std_ml_rms
  type: float
data:fi:ci:bnd:ti:ii:std:ml:rms_post
  description: std_ml_rms_post
  type: float
data:fi:ci:bnd:ti:ii:std:ml:rms_pre
  description: std_ml_rms_pre
  type: float
data:fi:ci:bnd:ti:ii:std:ml:rmsR
  description: std_ml_rmsR
  type: float
data:fi:ci:bnd:ti:ii:std:ml:rmsR_post
  description: std_ml_rmsR_post
  type: float
data:fi:ci:bnd:ti:ii:std:ml:rmsR_pre
  description: std_ml_rmsR_pre
  type: float
data:fi:ci:bnd:ti:ii:std:ml:sd_post
  description: std_ml_sd_post
  type: float
data:fi:ci:bnd:ti:ii:std:ml:sd_pre
  description: std_ml_sd_pre
  type: float
data:fi:ci:bnd:ti:ii:std:ml:sd_prepost
  description: std_ml_sd_prepost
  type: float
data:fi:ci:bnd:ti:ii:std:p
  description: p
  type: float
data:fi:ci:bnd:ti:ii:std:rng:aicI
  description: std_rng_aicI
  type: float
data:fi:ci:bnd:ti:ii:std:rng:aicI_post
  description: std_rng_aicI_post
  type: float
data:fi:ci:bnd:ti:ii:std:rng:aicI_pre
  description: std_rng_aicI_pre
  type: float
data:fi:ci:bnd:ti:ii:std:rng:corrD
  description: std_rng_corrD
  type: float
data:fi:ci:bnd:ti:ii:std:rng:corrD_post
  description: std_rng_corrD_post
  type: float
data:fi:ci:bnd:ti:ii:std:rng:corrD_pre
  description: std_rng_corrD_pre
  type: float
data:fi:ci:bnd:ti:ii:std:rng:d_m
  description: std_rng_d_m
  type: float
data:fi:ci:bnd:ti:ii:std:rng:d_o
  description: std_rng_d_o
  type: float
data:fi:ci:bnd:ti:ii:std:rng:r
  description: std_rng_r
  type: float
data:fi:ci:bnd:ti:ii:std:rng:rms
  description: std_rng_rms
  type: float
data:fi:ci:bnd:ti:ii:std:rng:rms_post
  description: std_rng_rms_post
  type: float
data:fi:ci:bnd:ti:ii:std:rng:rms_pre
  description: std_rng_rms_pre
  type: float
data:fi:ci:bnd:ti:ii:std:rng:rmsR
  description: std_rng_rmsR
  type: float
data:fi:ci:bnd:ti:ii:std:rng:rmsR_post
  description: std_rng_rmsR_post
  type: float
data:fi:ci:bnd:ti:ii:std:rng:rmsR_pre
  description: std_rng_rmsR_pre
  type: float
data:fi:ci:bnd:ti:ii:std:rng:sd_post
  description: std_rng_sd_post
  type: float
data:fi:ci:bnd:ti:ii:std:rng:sd_pre
  description: std_rng_sd_pre
  type: float
data:fi:ci:bnd:ti:ii:std:rng:sd_prepost
  description: std_rng_sd_prepost
  type: float
data:fi:ci:bnd:ti:ii:std:tl:aicI
  description: std_tl_aicI
  type: float
data:fi:ci:bnd:ti:ii:std:tl:aicI_post
  description: std_tl_aicI_post
  type: float
data:fi:ci:bnd:ti:ii:std:tl:aicI_pre
  description: std_tl_aicI_pre
  type: float
data:fi:ci:bnd:ti:ii:std:tl:corrD
  description: std_tl_corrD
  type: float
data:fi:ci:bnd:ti:ii:std:tl:corrD_post
  description: std_tl_corrD_post
  type: float
data:fi:ci:bnd:ti:ii:std:tl:corrD_pre
  description: std_tl_corrD_pre
  type: float
data:fi:ci:bnd:ti:ii:std:tl:d_m
  description: std_tl_d_m
  type: float
data:fi:ci:bnd:ti:ii:std:tl:d_o
  description: std_tl_d_o
  type: float
data:fi:ci:bnd:ti:ii:std:tl:r
  description: std_tl_r
  type: float
data:fi:ci:bnd:ti:ii:std:tl:rms
  description: std_tl_rms
  type: float
data:fi:ci:bnd:ti:ii:std:tl:rms_post
  description: std_tl_rms_post
  type: float
data:fi:ci:bnd:ti:ii:std:tl:rms_pre
  description: std_tl_rms_pre
  type: float
data:fi:ci:bnd:ti:ii:std:tl:rmsR
  description: std_tl_rmsR
  type: float
data:fi:ci:bnd:ti:ii:std:tl:rmsR_post
  description: std_tl_rmsR_post
  type: float
data:fi:ci:bnd:ti:ii:std:tl:rmsR_pre
  description: std_tl_rmsR_pre
  type: float
data:fi:ci:bnd:ti:ii:std:tl:sd_post
  description: std_tl_sd_post
  type: float
data:fi:ci:bnd:ti:ii:std:tl:sd_pre
  description: std_tl_sd_pre
  type: float
data:fi:ci:bnd:ti:ii:std:tl:sd_prepost
  description: std_tl_sd_prepost
  type: float
data:fi:ci:bnd:ti:ii:t
  description: segment tier: time start and end of current segment; event tier: interval from the preceding to the current time stamp (for bnd:...:std features)
  type:
data:fi:ci:bnd:ti:ii:tier
  description: tier name
  type: string
data:fi:ci:bnd:ti:ii:tn
  description: start and end of pre-boundary analysis window, start and end of post-boundary analysis window (for bnd:...:win features)
  type:
data:fi:ci:bnd:ti:ii:to
  description: t non-rounded
  type:
data:fi:ci:bnd:ti:ii:trend:bl:aicI
  description: trend_bl_aicI
  type: float
data:fi:ci:bnd:ti:ii:trend:bl:aicI_post
  description: trend_bl_aicI_post
  type: float
data:fi:ci:bnd:ti:ii:trend:bl:aicI_pre
  description: trend_bl_aicI_pre
  type: float
data:fi:ci:bnd:ti:ii:trend:bl:corrD
  description: trend_bl_corrD
  type: float
data:fi:ci:bnd:ti:ii:trend:bl:corrD_post
  description: trend_bl_corrD_post
  type: float
data:fi:ci:bnd:ti:ii:trend:bl:corrD_pre
  description: trend_bl_corrD_pre
  type: float
data:fi:ci:bnd:ti:ii:trend:bl:d_m
  description: trend_bl_d_m
  type: float
data:fi:ci:bnd:ti:ii:trend:bl:d_o
  description: trend_bl_d_o
  type: float
data:fi:ci:bnd:ti:ii:trend:bl:r
  description: trend_bl_r
  type: float
data:fi:ci:bnd:ti:ii:trend:bl:rms
  description: trend_bl_rms
  type: float
data:fi:ci:bnd:ti:ii:trend:bl:rms_post
  description: trend_bl_rms_post
  type: float
data:fi:ci:bnd:ti:ii:trend:bl:rms_pre
  description: trend_bl_rms_pre
  type: float
data:fi:ci:bnd:ti:ii:trend:bl:rmsR
  description: trend_bl_rmsR
  type: float
data:fi:ci:bnd:ti:ii:trend:bl:rmsR_post
  description: trend_bl_rmsR_post
  type: float
data:fi:ci:bnd:ti:ii:trend:bl:rmsR_pre
  description: trend_bl_rmsR_pre
  type: float
data:fi:ci:bnd:ti:ii:trend:bl:sd_post
  description: trend_bl_sd_post
  type: float
data:fi:ci:bnd:ti:ii:trend:bl:sd_pre
  description: trend_bl_sd_pre
  type: float
data:fi:ci:bnd:ti:ii:trend:bl:sd_prepost
  description: trend_bl_sd_prepost
  type: float
data:fi:ci:bnd:ti:ii:trend:ml:aicI
  description: trend_ml_aicI
  type: float
data:fi:ci:bnd:ti:ii:trend:ml:aicI_post
  description: trend_ml_aicI_post
  type: float
data:fi:ci:bnd:ti:ii:trend:ml:aicI_pre
  description: trend_ml_aicI_pre
  type: float
data:fi:ci:bnd:ti:ii:trend:ml:corrD
  description: trend_ml_corrD
  type: float
data:fi:ci:bnd:ti:ii:trend:ml:corrD_post
  description: trend_ml_corrD_post
  type: float
data:fi:ci:bnd:ti:ii:trend:ml:corrD_pre
  description: trend_ml_corrD_pre
  type: float
data:fi:ci:bnd:ti:ii:trend:ml:d_m
  description: trend_ml_d_m
  type: float
data:fi:ci:bnd:ti:ii:trend:ml:d_o
  description: trend_ml_d_o
  type: float
data:fi:ci:bnd:ti:ii:trend:ml:r
  description: trend_ml_r
  type: float
data:fi:ci:bnd:ti:ii:trend:ml:rms
  description: trend_ml_rms
  type: float
data:fi:ci:bnd:ti:ii:trend:ml:rms_post
  description: trend_ml_rms_post
  type: float
data:fi:ci:bnd:ti:ii:trend:ml:rms_pre
  description: trend_ml_rms_pre
  type: float
data:fi:ci:bnd:ti:ii:trend:ml:rmsR
  description: trend_ml_rmsR
  type: float
data:fi:ci:bnd:ti:ii:trend:ml:rmsR_post
  description: trend_ml_rmsR_post
  type: float
data:fi:ci:bnd:ti:ii:trend:ml:rmsR_pre
  description: trend_ml_rmsR_pre
  type: float
data:fi:ci:bnd:ti:ii:trend:ml:sd_post
  description: trend_ml_sd_post
  type: float
data:fi:ci:bnd:ti:ii:trend:ml:sd_pre
  description: trend_ml_sd_pre
  type: float
data:fi:ci:bnd:ti:ii:trend:ml:sd_prepost
  description: trend_ml_sd_prepost
  type: float
data:fi:ci:bnd:ti:ii:trend:p
  description: p
  type: float
data:fi:ci:bnd:ti:ii:trend:rng:aicI
  description: trend_rng_aicI
  type: float
data:fi:ci:bnd:ti:ii:trend:rng:aicI_post
  description: trend_rng_aicI_post
  type: float
data:fi:ci:bnd:ti:ii:trend:rng:aicI_pre
  description: trend_rng_aicI_pre
  type: float
data:fi:ci:bnd:ti:ii:trend:rng:corrD
  description: trend_rng_corrD
  type: float
data:fi:ci:bnd:ti:ii:trend:rng:corrD_post
  description: trend_rng_corrD_post
  type: float
data:fi:ci:bnd:ti:ii:trend:rng:corrD_pre
  description: trend_rng_corrD_pre
  type: float
data:fi:ci:bnd:ti:ii:trend:rng:d_m
  description: trend_rng_d_m
  type: float
data:fi:ci:bnd:ti:ii:trend:rng:d_o
  description: trend_rng_d_o
  type: float
data:fi:ci:bnd:ti:ii:trend:rng:r
  description: trend_rng_r
  type: float
data:fi:ci:bnd:ti:ii:trend:rng:rms
  description: trend_rng_rms
  type: float
data:fi:ci:bnd:ti:ii:trend:rng:rms_post
  description: trend_rng_rms_post
  type: float
data:fi:ci:bnd:ti:ii:trend:rng:rms_pre
  description: trend_rng_rms_pre
  type: float
data:fi:ci:bnd:ti:ii:trend:rng:rmsR
  description: trend_rng_rmsR
  type: float
data:fi:ci:bnd:ti:ii:trend:rng:rmsR_post
  description: trend_rng_rmsR_post
  type: float
data:fi:ci:bnd:ti:ii:trend:rng:rmsR_pre
  description: trend_rng_rmsR_pre
  type: float
data:fi:ci:bnd:ti:ii:trend:rng:sd_post
  description: trend_rng_sd_post
  type: float
data:fi:ci:bnd:ti:ii:trend:rng:sd_pre
  description: trend_rng_sd_pre
  type: float
data:fi:ci:bnd:ti:ii:trend:rng:sd_prepost
  description: trend_rng_sd_prepost
  type: float
data:fi:ci:bnd:ti:ii:trend:tl:aicI
  description: trend_tl_aicI
  type: float
data:fi:ci:bnd:ti:ii:trend:tl:aicI_post
  description: trend_tl_aicI_post
  type: float
data:fi:ci:bnd:ti:ii:trend:tl:aicI_pre
  description: trend_tl_aicI_pre
  type: float
data:fi:ci:bnd:ti:ii:trend:tl:corrD
  description: trend_tl_corrD
  type: float
data:fi:ci:bnd:ti:ii:trend:tl:corrD_post
  description: trend_tl_corrD_post
  type: float
data:fi:ci:bnd:ti:ii:trend:tl:corrD_pre
  description: trend_tl_corrD_pre
  type: float
data:fi:ci:bnd:ti:ii:trend:tl:d_m
  description: trend_tl_d_m
  type: float
data:fi:ci:bnd:ti:ii:trend:tl:d_o
  description: trend_tl_d_o
  type: float
data:fi:ci:bnd:ti:ii:trend:tl:r
  description: trend_tl_r
  type: float
data:fi:ci:bnd:ti:ii:trend:tl:rms
  description: trend_tl_rms
  type: float
data:fi:ci:bnd:ti:ii:trend:tl:rms_post
  description: trend_tl_rms_post
  type: float
data:fi:ci:bnd:ti:ii:trend:tl:rms_pre
  description: trend_tl_rms_pre
  type: float
data:fi:ci:bnd:ti:ii:trend:tl:rmsR
  description: trend_tl_rmsR
  type: float
data:fi:ci:bnd:ti:ii:trend:tl:rmsR_post
  description: trend_tl_rmsR_post
  type: float
data:fi:ci:bnd:ti:ii:trend:tl:rmsR_pre
  description: trend_tl_rmsR_pre
  type: float
data:fi:ci:bnd:ti:ii:trend:tl:sd_post
  description: trend_tl_sd_post
  type: float
data:fi:ci:bnd:ti:ii:trend:tl:sd_pre
  description: trend_tl_sd_pre
  type: float
data:fi:ci:bnd:ti:ii:trend:tl:sd_prepost
  description: trend_tl_sd_prepost
  type: float
data:fi:ci:bnd:ti:ii:tt
  description: start and endpoint for 2 trend windows: from file or chunk start (depending on the styl:bnd:cross_chunk value in the configurations, see section 11) till the end of the pre-boundary segment; from the start of the post-boundary segment till file/chunk end (for bnd:...:trend features).
  type:
data:fi:ci:bnd:ti:ii:win:bl:aicI
  description: win_bl_aicI
  type: float
data:fi:ci:bnd:ti:ii:win:bl:aicI_post
  description: win_bl_aicI_post
  type: float
data:fi:ci:bnd:ti:ii:win:bl:aicI_pre
  description: win_bl_aicI_pre
  type: float
data:fi:ci:bnd:ti:ii:win:bl:corrD
  description: win_bl_corrD
  type: float
data:fi:ci:bnd:ti:ii:win:bl:corrD_post
  description: win_bl_corrD_post
  type: float
data:fi:ci:bnd:ti:ii:win:bl:corrD_pre
  description: win_bl_corrD_pre
  type: float
data:fi:ci:bnd:ti:ii:win:bl:d_m
  description: win_bl_d_m
  type: float
data:fi:ci:bnd:ti:ii:win:bl:d_o
  description: win_bl_d_o
  type: float
data:fi:ci:bnd:ti:ii:win:bl:r
  description: win_bl_r
  type: float
data:fi:ci:bnd:ti:ii:win:bl:rms
  description: win_bl_rms
  type: float
data:fi:ci:bnd:ti:ii:win:bl:rms_post
  description: win_bl_rms_post
  type: float
data:fi:ci:bnd:ti:ii:win:bl:rms_pre
  description: win_bl_rms_pre
  type: float
data:fi:ci:bnd:ti:ii:win:bl:rmsR
  description: win_bl_rmsR
  type: float
data:fi:ci:bnd:ti:ii:win:bl:rmsR_post
  description: win_bl_rmsR_post
  type: float
data:fi:ci:bnd:ti:ii:win:bl:rmsR_pre
  description: win_bl_rmsR_pre
  type: float
data:fi:ci:bnd:ti:ii:win:bl:sd_post
  description: win_bl_sd_post
  type: float
data:fi:ci:bnd:ti:ii:win:bl:sd_pre
  description: win_bl_sd_pre
  type: float
data:fi:ci:bnd:ti:ii:win:bl:sd_prepost
  description: win_bl_sd_prepost
  type: float
data:fi:ci:bnd:ti:ii:win:ml:aicI
  description: win_ml_aicI
  type: float
data:fi:ci:bnd:ti:ii:win:ml:aicI_post
  description: win_ml_aicI_post
  type: float
data:fi:ci:bnd:ti:ii:win:ml:aicI_pre
  description: win_ml_aicI_pre
  type: float
data:fi:ci:bnd:ti:ii:win:ml:corrD
  description: win_ml_corrD
  type: float
data:fi:ci:bnd:ti:ii:win:ml:corrD_post
  description: win_ml_corrD_post
  type: float
data:fi:ci:bnd:ti:ii:win:ml:corrD_pre
  description: win_ml_corrD_pre
  type: float
data:fi:ci:bnd:ti:ii:win:ml:d_m
  description: win_ml_d_m
  type: float
data:fi:ci:bnd:ti:ii:win:ml:d_o
  description: win_ml_d_o
  type: float
data:fi:ci:bnd:ti:ii:win:ml:r
  description: win_ml_r
  type: float
data:fi:ci:bnd:ti:ii:win:ml:rms
  description: win_ml_rms
  type: float
data:fi:ci:bnd:ti:ii:win:ml:rms_post
  description: win_ml_rms_post
  type: float
data:fi:ci:bnd:ti:ii:win:ml:rms_pre
  description: win_ml_rms_pre
  type: float
data:fi:ci:bnd:ti:ii:win:ml:rmsR
  description: win_ml_rmsR
  type: float
data:fi:ci:bnd:ti:ii:win:ml:rmsR_post
  description: win_ml_rmsR_post
  type: float
data:fi:ci:bnd:ti:ii:win:ml:rmsR_pre
  description: win_ml_rmsR_pre
  type: float
data:fi:ci:bnd:ti:ii:win:ml:sd_post
  description: win_ml_sd_post
  type: float
data:fi:ci:bnd:ti:ii:win:ml:sd_pre
  description: win_ml_sd_pre
  type: float
data:fi:ci:bnd:ti:ii:win:ml:sd_prepost
  description: win_ml_sd_prepost
  type: float
data:fi:ci:bnd:ti:ii:win:p
  description: p
  type: float
data:fi:ci:bnd:ti:ii:win:rng:aicI
  description: win_rng_aicI
  type: float
data:fi:ci:bnd:ti:ii:win:rng:aicI_post
  description: win_rng_aicI_post
  type: float
data:fi:ci:bnd:ti:ii:win:rng:aicI_pre
  description: win_rng_aicI_pre
  type: float
data:fi:ci:bnd:ti:ii:win:rng:corrD
  description: win_rng_corrD
  type: float
data:fi:ci:bnd:ti:ii:win:rng:corrD_post
  description: win_rng_corrD_post
  type: float
data:fi:ci:bnd:ti:ii:win:rng:corrD_pre
  description: win_rng_corrD_pre
  type: float
data:fi:ci:bnd:ti:ii:win:rng:d_m
  description: win_rng_d_m
  type: float
data:fi:ci:bnd:ti:ii:win:rng:d_o
  description: win_rng_d_o
  type: float
data:fi:ci:bnd:ti:ii:win:rng:r
  description: win_rng_r
  type: float
data:fi:ci:bnd:ti:ii:win:rng:rms
  description: win_rng_rms
  type: float
data:fi:ci:bnd:ti:ii:win:rng:rms_post
  description: win_rng_rms_post
  type: float
data:fi:ci:bnd:ti:ii:win:rng:rms pre
description: win rng rms pre
type: float

data:fi:ci:bnd:ti:ii:win:rng:rmsR
description: win rng rmsR
type: float

data:fi:ci:bnd:ti:ii:win:rng:rmsR post
description: win rng rmsR post
type: float

data:fi:ci:bnd:ti:ii:win:rng:rmsR pre
description: win rng rmsR pre
type: float

data:fi:ci:bnd:ti:ii:win:rng:sd post
description: win rng sd post
type: float

data:fi:ci:bnd:ti:ii:win:rng:sd pre
description: win rng sd pre
type: float

data:fi:ci:bnd:ti:ii:win:rng:sd prepost
description: win rng sd prepost
type: float

data:fi:ci:bnd:ti:ii:win:tl:aicI
description: win tl aicI
type: float

data:fi:ci:bnd:ti:ii:win:tl:aicI post
description: win tl aicI post
type: float

data:fi:ci:bnd:ti:ii:win:tl:aicI pre
description: win tl aicI pre
type: float

data:fi:ci:bnd:ti:ii:win:tl:corrD
description: win tl corrD
type: float

data:fi:ci:bnd:ti:ii:win:tl:corrD post
description: win tl corrD post
type: float

data:fi:ci:bnd:ti:ii:win:tl:corrD pre
description: win tl corrD pre
type: float

data:fi:ci:bnd:ti:ii:win:tl:d m
description: win tl d m
type: float

data:fi:ci:bnd:ti:ii:win:tl:d o
description: win tl d o
type: float

data:fi:ci:bnd:ti:ii:win:tl:r
description: win tl r
type: float

data:fi:ci:bnd:ti:ii:win:tl:rms
description: win tl rms
type: float

data:fi:ci:bnd:ti:ii:win:tl:rms post
description: win tl rms post
type: float

data:fi:ci:bnd:ti:ii:win:tl:rms pre
description: win tl rms pre
type: float

data:fi:ci:bnd:ti:ii:win:tl:rmsR
description: win tl rmsR
type: float

data:fi:ci:bnd:ti:ii:win:tl:rmsR post
description: win tl rmsR post
type: float

data:fi:ci:bnd:ti:ii:win:tl:rmsR pre
description: win tl rmsR pre
type: float

data:fi:ci:bnd:ti:ii:win:tl:sd post
description: win tl sd post
type: float

data:fi:ci:bnd:ti:ii:win:tl:sd pre
description: win tl sd pre
type: float

data:fi:ci:bnd:ti:ii:win:tl:sd prepost
description: win tl sd prepost
type: float

Chunks

data:fi:ci:chunk:ii:lab
description: label
type: string

data:fi:ci:chunk:ii:t
description: time start and end
type:
data:fi:ci:chunk:ii:to
description: t non-rounded
type:

F0

data:fi:ci:f0:bv
description: file/channel related f0 base value
type: float

data:fi:ci:f0:r
description: f0 residual after removal of the global f0 component
type: list of floats

data:fi:ci:f0:t
description: time stamps
type: list of floats

data:fi:ci:f0:y
description: f0 values after preprocessing
type: list of floats, same length as t

File information

data:fi:ci:fsys:annot:dir
description: directory of annotation file
type: string

data:fi:ci:fsys:annot:ext
description: extension of annotation file
type: string

data:fi:ci:fsys:annot:lab chunk
description: general chunk label
type: string

data:fi:ci:fsys:annot:lab pau
description: general pause label
type: string

data:fi:ci:fsys:annot:lab syl
description: general syllable nucleus label
type: string

data:fi:ci:fsys:annot:stm
description: annotation file name stem
type: string

data:fi:ci:fsys:annot:typ
description: annotation file type (xml or TextGrid)
type: string

data:fi:ci:fsys:aud:dir
description: directory of audio file
type: string

data:fi:ci:fsys:aud:ext
description: extension of audio file
type: string

data:fi:ci:fsys:aud:stm
description: audio file name stem
type: string

data:fi:ci:fsys:aud:typ
description: audio file type
type: string

data:fi:ci:fsys:augment:channel:myTierName
description: channel number of each relevant tier myTierName in the annotation. Names of tiers derived by automatic chunking, phrasing, etc. will be added automatically.
type: int

data:fi:ci:fsys:augment:chunk:tier out stm
description: chunk tier output stem. In the augmented annotation file, the stem is concatenated with the respective channel number
type: string

data:fi:ci:fsys:augment:glob:tier
description: analysis tier name for prosodic boundaries, i.e. tier with prosodic boundary candidates. Max. 1 for each channel!
type: string

data:fi:ci:fsys:augment:glob:tier out stm
description: prosodic phrase tier output stem. In the augmented annotation file, the stem is concatenated with the respective channel number
type: string

data:fi:ci:fsys:augment:glob:tier parent
description: name of the parent tier (e.g. chunks), whose boundaries limit the analysis and normalization window boundaries for prosodic phrase extraction
type: string

data:fi:ci:fsys:augment:lab chunk
description: uniform chunk label
type: string

data:fi:ci:fsys:augment:lab pau
description: uniform pause label
type: string

data:fi:ci:fsys:augment:lab syl
description: uniform syllable nucleus label. Syllable boundaries are derived from this string by concatenating bnd
type: string

data:fi:ci:fsys:augment:loc:tier acc
description: analysis event tier name for pitch accent detection, i.e. the tier containing the time stamps of pitch accent candidates (e.g. syllable nuclei). Max. 1 for each channel.
type: list of strings

data:fi:ci:fsys:augment:loc:tier ag
description: analysis segment tier name for pitch accent detection, i.e. the tier containing segments within which maximally one pitch accent can be realized (e.g. words)
type: string

data:fi:ci:fsys:augment:loc:tier out stm
description: pitch accent tier output stem. In the augmented annotation file, the stem is concatenated with the respective channel number
type: string

data:fi:ci:fsys:augment:loc:tier parent
description: name of the parent tier (e.g. prosodic phrases), relative to which the accent Gestalt is measured, and whose boundaries limit the analysis and normalization window boundaries for pitch accent extraction
type: string

data:fi:ci:fsys:augment:nc
description: number of channels
type: int

data:fi:ci:fsys:augment:stm
description: annotation file name stem
type: string

data:fi:ci:fsys:augment:syl:tier out stm
description: syllable nucleus and boundary tier output stem. In the augmented annotation file, for the syllable boundary tier bnd is added to the stem, and for both nuclei and boundaries, the stem is concatenated with the respective channel number
type: string

data:fi:ci:fsys:augment:syl:tier parent
description: name of the parent tier (e.g. chunks), within which reference values are calculated, and whose boundaries limit the analysis and normalization window boundaries for syllable nucleus detection
type: string

data:fi:ci:fsys:bnd:tier
description: analysis tiers for boundary parameterization. Arbitrary number for each channel.
type: list of strings

data:fi:ci:fsys:chunk:tier
description: names of tiers that contain a chunk segmentation (only 1 tier for each channel). Names of automatically generated tiers are expanded by channel index ci.
type: list of strings

data:fi:ci:fsys:f0:dir
description: f0 file directory
type: string

data:fi:ci:fsys:f0:ext
description: f0 file extension
type: string

data:fi:ci:fsys:f0:stm
description: f0 file name stem
type: string

data:fi:ci:fsys:f0:typ
description: f0 file type
type: string

data:fi:ci:fsys:glob:tier
description: analysis tiers for global contour stylization. Max. 1 tier per channel.
type: list of strings

data:fi:ci:fsys:gnl en:tier
description: names of analysis tiers for standard energy feature extraction. Any number of tiers per channel supported.
type: list of strings

data:fi:ci:fsys:gnl f0:tier
description: names of analysis tiers for standard f0 feature extraction. Any number of tiers per channel supported.
type: list of strings

data:fi:ci:fsys:loc:tier acc
description: time stamp analysis tiers for the 0-center of normalized time within a local contour segment. Max. 1 tier per channel.
type: list of strings

data:fi:ci:fsys:loc:tier ag
description: segment tiers for local contours. Max. 1 tier per channel.
type: list of strings

data:fi:ci:fsys:rhy en:tier
description: names of analysis tiers for energy rhythm feature extraction. Any number per channel.
type: list of strings

data:fi:ci:fsys:rhy en:tier rate
description: names of rate tiers for energy rhythm feature extraction. Any number per channel.
type: list of strings

data:fi:ci:fsys:rhy f0:tier
description: names of analysis tiers for f0 rhythm feature extraction. Any number per channel.
type: list of strings

data:fi:ci:fsys:rhy f0:tier rate
description: names of rate tiers for f0 rhythm feature extraction. Any number per channel.
type: list of strings

Global segment features

data:fi:ci:glob:ii:class
description: class ; global contour class index derived by clustering
type: int

data:fi:ci:glob:ii:decl:bl:c
description: bl c1, bl c0
type:

data:fi:ci:glob:ii:decl:bl:r
description: bl r
type: float

data:fi:ci:glob:ii:decl:bl:rate
description: bl rate
type: float

data:fi:ci:glob:ii:decl:bl:y
description: stylized f0 baseline values
type: list of floats

data:fi:ci:glob:ii:decl:err
description: True if base- and topline cross
type: boolean

data:fi:ci:glob:ii:decl:ml:c
description: ml c1, ml c0
type:

data:fi:ci:glob:ii:decl:ml:r
description: ml r
type: float

data:fi:ci:glob:ii:decl:ml:rate
description: ml rate
type: float

data:fi:ci:glob:ii:decl:ml:y
description: stylized f0 midline values
type: list of floats

data:fi:ci:glob:ii:decl:rng:c
description: rng c1, rng c0
type:

data:fi:ci:glob:ii:decl:rng:r
description: rng r
type: float

data:fi:ci:glob:ii:decl:rng:rate
description: rng rate
type: float

data:fi:ci:glob:ii:decl:rng:y
description: stylized f0 range values
type: list of floats

data:fi:ci:glob:ii:decl:tl:c
description: tl c1, tl c0
type:

data:fi:ci:glob:ii:decl:tl:r
description: tl r
type: float

data:fi:ci:glob:ii:decl:tl:rate
description: tl rate
type: float

data:fi:ci:glob:ii:decl:tl:y
description: stylized f0 topline values
type: list of floats

data:fi:ci:glob:ii:decl:tn
description: normalized time values (same length as all bl | ml | tl | rng:y)
type: list of floats

data:fi:ci:glob:ii:gnl:dur
description: dur
type: float

data:fi:ci:glob:ii:gnl:iqr
description: iqr
type: float

data:fi:ci:glob:ii:gnl:m
description: m
type: float

data:fi:ci:glob:ii:gnl:max
description: max
type: float

data:fi:ci:glob:ii:gnl:med
description: med
type: float

data:fi:ci:glob:ii:gnl:min
description: min
type: float

data:fi:ci:glob:ii:gnl:sd
description: sd
type: float

data:fi:ci:glob:ii:lab
description: lab
type: string

data:fi:ci:glob:ii:ri
description: indices of local segments contained in global segment ii
type: list of int

data:fi:ci:glob:ii:t
description: global phrase time start and end
type:

data:fi:ci:glob:ii:to
description: t non-rounded
type:

Standard energy features

data:fi:ci:gnl en:ti:ii:lab
description: lab
type: string

data:fi:ci:gnl en:ti:ii:std:dur
description: dur
type: float

data:fi:ci:gnl en:ti:ii:std:dur nrm
description: normalized duration
type: float

data:fi:ci:gnl en:ti:ii:std:iqr
description: iqr
type: float

data:fi:ci:gnl en:ti:ii:std:iqr nrm
description: iqr nrm
type: float

data:fi:ci:gnl en:ti:ii:std:m
description: m
type: float

data:fi:ci:gnl en:ti:ii:std:m nrm
description: m nrm
type: float

data:fi:ci:gnl en:ti:ii:std:max
description: max
type: float

data:fi:ci:gnl en:ti:ii:std:max nrm
description: max nrm
type: float

data:fi:ci:gnl en:ti:ii:std:med
description: med
type: float

data:fi:ci:gnl en:ti:ii:std:med nrm
description: med nrm
type: float

data:fi:ci:gnl en:ti:ii:std:min
description: min
type: float

data:fi:ci:gnl en:ti:ii:std:min nrm
description: min nrm
type: float

data:fi:ci:gnl en:ti:ii:std:rms
description: rms
type: float

data:fi:ci:gnl en:ti:ii:std:rms nrm
description: rms nrm
type: float

data:fi:ci:gnl en:ti:ii:std:sb
description: sb
type: float

data:fi:ci:gnl en:ti:ii:std:sd
description: sd
type: float

data:fi:ci:gnl en:ti:ii:std:sd nrm
description: sd nrm
type: float

data:fi:ci:gnl en:ti:ii:t
description: analysis window start and endpoint
type:

data:fi:ci:gnl en:ti:ii:tier
description: tier name related to index ti
type: string

data:fi:ci:gnl en:ti:ii:tn
description: normalization window start and endpoint
type:

data:fi:ci:gnl en:ti:ii:to
description: t non-rounded
type: list of floats

data:fi:ci:gnl en:ti:ii:tt
description: trend window (not used)
type: list of floats

data:fi:ci:gnl en file:dur
description: dur
type: float

data:fi:ci:gnl en file:iqr
description: iqr
type: float

data:fi:ci:gnl en file:m
description: m
type: float

data:fi:ci:gnl en file:max
description: max
type: float

data:fi:ci:gnl en file:med
description: med
type: float

data:fi:ci:gnl en file:min
description: min
type: float

data:fi:ci:gnl en file:sd
description: sd
type: float

Standard f0 features

data:fi:ci:gnl f0:ti:ii:lab
description: lab
type: string

data:fi:ci:gnl f0:ti:ii:std:dur
description: dur
type: float

data:fi:ci:gnl f0:ti:ii:std:dur nrm
description: normalized duration
type: float

data:fi:ci:gnl f0:ti:ii:std:iqr
description: iqr
type: float

data:fi:ci:gnl f0:ti:ii:std:iqr nrm
description: iqr nrm
type: float

data:fi:ci:gnl f0:ti:ii:std:m
description: m
type: float

data:fi:ci:gnl f0:ti:ii:std:m nrm
description: m nrm
type: float

data:fi:ci:gnl f0:ti:ii:std:max
description: max
type: float

data:fi:ci:gnl f0:ti:ii:std:max nrm
description: max nrm
type: float

data:fi:ci:gnl f0:ti:ii:std:med
description: med
type: float

data:fi:ci:gnl f0:ti:ii:std:med nrm
description: med nrm
type: float

data:fi:ci:gnl f0:ti:ii:std:min
description: min
type: float

data:fi:ci:gnl f0:ti:ii:std:min nrm
description: min nrm
type: float

data:fi:ci:gnl f0:ti:ii:std:sd
description: sd
type: float

data:fi:ci:gnl f0:ti:ii:std:sd nrm
description: sd nrm
type: float

data:fi:ci:gnl f0:ti:ii:t
description: analysis window start and endpoint
type:

data:fi:ci:gnl f0:ti:ii:tier
description: tier name related to index ti
type: string

data:fi:ci:gnl f0:ti:ii:tn
description: normalization window start and endpoint
type:

data:fi:ci:gnl f0:ti:ii:to
description: t non-rounded
type: list of floats

data:fi:ci:gnl f0:ti:ii:tt
description: trend window (not used)
type: list of floats

data:fi:ci:gnl f0 file:dur
description: dur
type: float

data:fi:ci:gnl f0 file:iqr
description: iqr
type: float

data:fi:ci:gnl f0 file:m
description: m
type: float

data:fi:ci:gnl f0 file:max
description: max
type: float

data:fi:ci:gnl f0 file:med
description: med
type: float

data:fi:ci:gnl f0 file:min
description: min
type: float

data:fi:ci:gnl f0 file:sd
description: sd
type: float

Grouping

data:fi:ci:grp:myGroupVar*
description: myGroupVar refers to the grouping variable names specified in the configuration sub-dictionary fsys:grp for file name based grouping. The values are extracted from the file name and are always strings
type: string
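The file name based grouping described above can be pictured with a small sketch. It is not the toolkit's own code: the function name, the separator "_", and the label list are assumptions made for illustration; the actual options are those of the fsys:grp configuration branch (see section 11).

```python
import os

def grouping_from_filename(path, labels, sep="_"):
    """Hypothetical helper: map separator-delimited parts of a file name
    stem to grouping variable names. All values stay strings."""
    stem = os.path.splitext(os.path.basename(path))[0]
    parts = stem.split(sep)
    # Pair each label with the part at the same position;
    # an empty label means that this part is not used for grouping.
    return {lab: part for lab, part in zip(labels, parts) if lab}

# Assumed file naming scheme: speaker_task_session
grp = grouping_from_filename("/data/f0/spk01_dialog_03.f0",
                             ["speaker", "task", "session"])
```

In this sketch grp plays the role of the data:fi:ci:grp branch, with one string value per grouping variable.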
Local segment features

data:fi:ci:loc:ii:acc:c
description: c ∗ ; polynomial coefficients (highest order first)
type: list of floats

data:fi:ci:loc:ii:acc:tn
description: normalized time values
type: list of floats

data:fi:ci:loc:ii:acc:y
description: polynomial stylization values (same length as tn)
type: list of floats

data:fi:ci:loc:ii:class
description: class ; local contour class index derived by clustering
type: int

data:fi:ci:loc:ii:decl:bl:c
description: bl c1, bl c0
type:

data:fi:ci:loc:ii:decl:bl:rate
description: bl rate
type: float

data:fi:ci:loc:ii:decl:bl:y
description: stylized f0 baseline values
type: list of floats

data:fi:ci:loc:ii:decl:err
description: True if base- and topline cross
type: boolean

data:fi:ci:loc:ii:decl:ml:c
description: ml c1, ml c0
type:

data:fi:ci:loc:ii:decl:ml:rate
description: ml rate
type: float

data:fi:ci:loc:ii:decl:ml:y
description: stylized f0 midline values
type: list of floats

data:fi:ci:loc:ii:decl:rng:c
description: rng c1, rng c0
type:

data:fi:ci:loc:ii:decl:rng:rate
description: rng rate
type: float

data:fi:ci:loc:ii:decl:rng:y
description: stylized f0 range values
type: list of floats

data:fi:ci:loc:ii:decl:tl:c
description: tl c1, tl c0
type:

data:fi:ci:loc:ii:decl:tl:rate
description: tl rate
type: float

data:fi:ci:loc:ii:decl:tl:y
description: stylized f0 topline values
type: list of floats

data:fi:ci:loc:ii:decl:tn
description: normalized time values
type: list of floats

data:fi:ci:loc:ii:gnl:dur
description: dur
type: float

data:fi:ci:loc:ii:gnl:dur nrm
description: dur nrm
type: float

data:fi:ci:loc:ii:gnl:iqr
description: iqr
type: float

data:fi:ci:loc:ii:gnl:iqr nrm
description: iqr nrm
type: float

data:fi:ci:loc:ii:gnl:m
description: m
type: float

data:fi:ci:loc:ii:gnl:m nrm
description: m nrm
type: float

data:fi:ci:loc:ii:gnl:max
description: max
type: float

data:fi:ci:loc:ii:gnl:max nrm
description: max nrm
type: float

data:fi:ci:loc:ii:gnl:med
description: med
type: float

data:fi:ci:loc:ii:gnl:med nrm
description: med nrm
type: float

data:fi:ci:loc:ii:gnl:min
description: min
type: float

data:fi:ci:loc:ii:gnl:min nrm
description: min nrm
type: float

data:fi:ci:loc:ii:gnl:sd
description: sd
type: float

data:fi:ci:loc:ii:gnl:sd nrm
description: sd nrm
type: float

data:fi:ci:loc:ii:gst:bl:d fin
description: bl d fin
type: float

data:fi:ci:loc:ii:gst:bl:d init
description: bl d init
type: float

data:fi:ci:loc:ii:gst:bl:rms
description: bl rms
type: float

data:fi:ci:loc:ii:gst:bl:sd
description: bl sd
type: float

data:fi:ci:loc:ii:gst:ml:d fin
description: ml d fin
type: float

data:fi:ci:loc:ii:gst:ml:d init
description: ml d init
type: float

data:fi:ci:loc:ii:gst:ml:rms
description: ml rms
type: float

data:fi:ci:loc:ii:gst:ml:sd
description: ml sd
type: float

data:fi:ci:loc:ii:gst:residual:bl:c
description: c ∗ ; local contour coefs in descending order. Polynomial fitted on residual after baseline subtraction
type: list of floats

data:fi:ci:loc:ii:gst:residual:ml:c
description: c ∗ ; local contour coefs in descending order. Polynomial fitted on residual after midline subtraction
type: list of floats

data:fi:ci:loc:ii:gst:residual:rng:c
description: c ∗ ; local contour coefs in descending order. Polynomial fitted on residual after range normalization
type: list of floats

data:fi:ci:loc:ii:gst:residual:tl:c
description: c ∗ ; local contour coefs in descending order. Polynomial fitted on residual after topline subtraction
type: list of floats

data:fi:ci:loc:ii:gst:rng:d fin
description: rng d fin
type: float

data:fi:ci:loc:ii:gst:rng:d init
description: rng d init
type: float

data:fi:ci:loc:ii:gst:rng:rms
description: rng rms
type: float

data:fi:ci:loc:ii:gst:rng:sd
description: rng sd
type: float

data:fi:ci:loc:ii:gst:tl:d fin
description: tl d fin
type: float

data:fi:ci:loc:ii:gst:tl:d init
description: tl d init
type: float

data:fi:ci:loc:ii:gst:tl:rms
description: tl rms
type: float

data:fi:ci:loc:ii:gst:tl:sd
description: tl sd
type: float

data:fi:ci:loc:ii:is fin
description: is fin (yes, no)
type: string

data:fi:ci:loc:ii:is fin chunk
description: is fin chunk (yes, no)
type: string

data:fi:ci:loc:ii:is init
description: is init (yes, no)
type: string

data:fi:ci:loc:ii:is init chunk
description: is init chunk (yes, no)
type: string

data:fi:ci:loc:ii:lab acc
description: lab pnt ; label from local event tier
type: string

data:fi:ci:loc:ii:lab ag
description: lab ; label from local segment tier
type: string

data:fi:ci:loc:ii:ri
description: index of global parent segment
type: int

data:fi:ci:loc:ii:t
description: analysis window starting point, endpoint, and center. For only event tier input, these values are given by start and end of a symmetric window around each time stamp, and the time stamp itself. For only segment tier input, start- and endpoint are given by the segment’s on- and offset, and the center corresponds to the segment’s midpoint. For both event and segment tier input, start- and endpoint are given by the segment’s on- and offset, and the center by the event’s time stamp.
type:

data:fi:ci:loc:ii:tn
description: Normalization window start and endpoint to normalize f0 standard features.
type:

data:fi:ci:loc:ii:to
description: original input time values (1 for events, 2 for segments, 3 for events+segments)
type: 1, 2 or 3 element list of floats
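The three cases given for data:fi:ci:loc:ii:t (event tier only, segment tier only, both) can be summarized in a short sketch. The function name and the half window length are assumptions for illustration only; in the toolkit the window length is set in the configuration.

```python
def local_window(event=None, segment=None, half_win=0.25):
    """Return [start, end, center] of a local analysis window.

    event:    time stamp (float) from an event tier, or None
    segment:  (onset, offset) from a segment tier, or None
    half_win: assumed half length (s) of the symmetric window
              used when only an event is given
    """
    if segment is not None:
        start, end = segment
        # With an event, the center is the event's time stamp;
        # without one, it is the segment midpoint.
        center = event if event is not None else (start + end) / 2.0
    elif event is not None:
        start, end, center = event - half_win, event + half_win, event
    else:
        raise ValueError("need an event, a segment, or both")
    return [start, end, center]

only_event = local_window(event=1.0)
only_segment = local_window(segment=(2.0, 3.0))
both = local_window(event=2.25, segment=(2.0, 3.0))
```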
Event rates

data:fi:ci:rate:myTierName*
description: for the events or segments of all tier names specified in the configuration by rhy f0:tier rate, their overall rate is measured within the file.
type: float
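As a rough illustration of such a file-level rate, the sketch below divides the number of items in a tier by the file duration. This is an assumed reading of "overall rate"; the function name and the choice of the file duration as denominator are not taken from the toolkit.

```python
def event_rate(items, duration):
    """Items per second: number of events (or segments) in a tier
    divided by the duration (s) of the file. Assumed definition."""
    if duration <= 0:
        raise ValueError("duration must be positive")
    return len(items) / duration

# Five syllable nucleus time stamps in a file of 2 s length
syl_rate = event_rate([0.2, 0.5, 0.9, 1.4, 1.8], 2.0)
```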
Energy rhythm features

data:fi:ci:rhy en:ti:ii:lab
description: lab
type: string

data:fi:ci:rhy en:ti:ii:rate:myTierName*
description: myRateTier rate ; rate of items in myTierName within the current interval ii of tier with index ti
type: float

data:fi:ci:rhy en:ti:ii:rhy:c
description: DCT coefficients in user defined frequency band
type: list of floats

data:fi:ci:rhy en:ti:ii:rhy:c orig
description: all DCT coefficients
type: list of floats

data:fi:ci:rhy en:ti:ii:rhy:cbin
description: summed DCT coefs within frequency bins
type: list of floats

data:fi:ci:rhy en:ti:ii:rhy:dur
description: dur
type: float

data:fi:ci:rhy en:ti:ii:rhy:f
description: frequencies of DCT coefs in c (same length as c)
type: list of floats

data:fi:ci:rhy en:ti:ii:rhy:f orig
description: frequencies of DCT coefs in c orig (same length as c orig)
type: list of floats

data:fi:ci:rhy en:ti:ii:rhy:fbin
description: lower boundaries of frequency bins (same length as cbin)
type: list of floats

data:fi:ci:rhy en:ti:ii:rhy:m
description: weighted coefficient mean
type: float

data:fi:ci:rhy en:ti:ii:rhy:mae
description: mean absolute error between IDCT and original contour
type: float

data:fi:ci:rhy en:ti:ii:rhy:sd
description: weighted coefficient standard deviation
type: float

data:fi:ci:rhy en:ti:ii:rhy:sm
description: sm ∗ ; spectral moments of DCT coefs
type: list of floats; length depends on config branch styl:rhy en:rhy:nsm

data:fi:ci:rhy en:ti:ii:rhy:wgt:myTierName*:mae
description: myAnalysisTier myRateTier mae ; mean absolute error between original contour and IDCT of coefficients around the rate of the items in tier myTierName
type: float

data:fi:ci:rhy en:ti:ii:rhy:wgt:myTierName*:prop
description: myAnalysisTier myAnalysisTier prop ; proportion of the coefficient weights around the rate of the items in tier myTierName relative to coefficients’ overall sum
type: float

data:fi:ci:rhy en:ti:ii:rhy:wgt:myTierName*:rate
description: myAnalysisTier myAnalysisTier rate ; rate of the items in tier myTierName
type: float

data:fi:ci:rhy en:ti:ii:ri:myTierName*
description: item indices in tier myTierName that fall within the segment ii of tier with index ti
type: list of int

data:fi:ci:rhy en:ti:ii:t
description: segment ii start and endpoint
type:

data:fi:ci:rhy en:ti:ii:tier
description: tier name related to index ti
type: string

data:fi:ci:rhy en:ti:ii:tn
description: as t, irrelevant
type:

data:fi:ci:rhy en:ti:ii:to
description: t non-rounded
type:

data:fi:ci:rhy en:ti:ii:tt
description: segment ii start, mid, and endpoint; irrelevant
type:

data:fi:ci:rhy en file:c
description: DCT coefficients in user defined frequency band
type: list of floats

data:fi:ci:rhy en file:c orig
description: all DCT coefficients
type: list of floats

data:fi:ci:rhy en file:cbin
description: summed DCT coefs within frequency bins
type: list of floats

data:fi:ci:rhy en file:dur
description: dur
type: float

data:fi:ci:rhy en file:f
description: frequencies of DCT coefs in c (same length as c)
type: list of floats

data:fi:ci:rhy en file:f orig
description: frequencies of DCT coefs in c orig (same length as c orig)
type: list of floats

data:fi:ci:rhy en file:fbin
description: lower boundaries of frequency bins
type: list of floats

data:fi:ci:rhy en file:m
description: weighted coefficient mean
type: float

data:fi:ci:rhy en file:mae
description: mean absolute error between IDCT and original contour
type: float

data:fi:ci:rhy en file:sd
description: weighted coefficient standard deviation
type: float

data:fi:ci:rhy en file:sm
description: sm ∗ ; spectral moments of DCT coefs
type: list of floats

data:fi:ci:rhy en file:wgt:myTierName*:mae
description: myAnalysisTier mae ; mean absolute error between original contour and IDCT of coefficients around the rate of the items in tier myTierName
type: float

data:fi:ci:rhy en file:wgt:myTierName*:prop
description: myAnalysisTier prop ; proportion of the coefficient weights around the rate of the items in tier myTierName relative to coefficients’ overall sum
type: float

data:fi:ci:rhy en file:wgt:myTierName*:rate
description: myAnalysisTier rate ; rate of the items in tier myTierName
type: float

data:fi:ci:rhy f0:ti:ii:lab
description: lab
type: string

data:fi:ci:rhy f0:ti:ii:rate:myTierName*
description: myAnalysisTier rate ; rate of items in myTierName within the current interval ii of tier with index ti
type: float

data:fi:ci:rhy f0:ti:ii:rhy:c
description: DCT coefficients in user defined frequency band
type: list of floats

data:fi:ci:rhy f0:ti:ii:rhy:c orig
description: all DCT coefficients
type: list of floats

data:fi:ci:rhy f0:ti:ii:rhy:cbin
description: summed DCT coefs within frequency bins
type: list of floats

data:fi:ci:rhy f0:ti:ii:rhy:dur
description: dur
type: float

data:fi:ci:rhy f0:ti:ii:rhy:f
description: frequencies of DCT coefs in c (same length as c)
type: list of floats

data:fi:ci:rhy f0:ti:ii:rhy:f orig
description: frequencies of DCT coefs in c orig (same length as c orig)
type: list of floats

data:fi:ci:rhy f0:ti:ii:rhy:fbin
description: lower boundaries of frequency bins
type: list of floats

data:fi:ci:rhy f0:ti:ii:rhy:m
description: weighted coefficient mean
type: float

data:fi:ci:rhy f0:ti:ii:rhy:mae
description: mean absolute error between IDCT and original contour
type: float

data:fi:ci:rhy f0:ti:ii:rhy:sd
description: weighted coefficient standard deviation
type: float

data:fi:ci:rhy f0:ti:ii:rhy:sm
description: sm ∗ ; spectral moments of DCT coefs
type: list of floats

data:fi:ci:rhy f0:ti:ii:rhy:wgt:myTierName*:mae
description: myAnalysisTier myAnalysisTier mae ; mean absolute error between original contour and IDCT of coefficients around the rate of the items in tier myTierName
type: float

data:fi:ci:rhy f0:ti:ii:rhy:wgt:myTierName*:prop
description: myAnalysisTier myAnalysisTier prop ; proportion of the coefficient weights around the rate of the items in tier myTierName relative to coefficients’ overall sum
type: float

data:fi:ci:rhy f0:ti:ii:rhy:wgt:myTierName*:rate
description: myAnalysisTier myAnalysisTier rate ; rate of the items in tier myTierName
type: float

data:fi:ci:rhy f0:ti:ii:ri:myTierName
description: item indices in tier myTierName that fall within the segment ii of tier with index ti
type: list of int

data:fi:ci:rhy f0:ti:ii:t
description: segment ii start and endpoint
type:

data:fi:ci:rhy f0:ti:ii:tier
description: tier name related to index ti
type: string

data:fi:ci:rhy f0:ti:ii:tn
description: as t, irrelevant
type:

data:fi:ci:rhy f0:ti:ii:to
description: t non-rounded
type: list of floats

data:fi:ci:rhy f0:ti:ii:tt
description: segment ii start, mid, and endpoint; irrelevant
type: list of floats

data:fi:ci:rhy f0 file:c
description: DCT coefficients in user defined frequency band
type: list of floats

data:fi:ci:rhy f0 file:c orig
description: all DCT coefficients
type: list of floats

data:fi:ci:rhy f0 file:cbin
description: summed DCT coefs within frequency bins
type: list of floats

data:fi:ci:rhy f0 file:dur
description: dur
type: float

data:fi:ci:rhy f0 file:f
description: frequencies of DCT coefs in c (same length as c)
type: list of floats

data:fi:ci:rhy f0 file:f orig
description: frequencies of DCT coefs in c orig (same length as c orig)
type: list of floats

data:fi:ci:rhy f0 file:fbin
description: lower boundaries of frequency bins
type: list of floats

data:fi:ci:rhy f0 file:m
description: weighted coefficient mean
type: float

data:fi:ci:rhy f0 file:mae
description: mean absolute error between IDCT and original contour
type: float

data:fi:ci:rhy f0 file:sd
description: weighted coefficient standard deviation
type: float

data:fi:ci:rhy f0 file:sm
description: sm ∗ ; spectral moments of DCT coefs
type: list of floats

data:fi:ci:rhy f0 file:wgt:myTierName*:mae
description: myAnalysisTier mae ; mean absolute error between original contour and IDCT of coefficients around the rate of the items in tier myTierName
type: float

data:fi:ci:rhy f0 file:wgt:myTierName*:prop
description: myAnalysisTier prop ; proportion of the coefficient weights around the rate of the items in tier myTierName relative to coefficients’ overall sum
type: float

data:fi:ci:rhy f0 file:wgt:myTierName*:rate
description: myAnalysisTier rate ; rate of the items in tier myTierName
type: float

This sub-dictionary is accessed by copa['clst'] and contains the outcome of the clustering of global and local contours (cf. sections 8.5.3 and 8.7.3).
Global contour classes

clst:glob:c
description: slope coef matrix to be clustered
type: 2d array of floats

clst:glob:cntr
description: centroid matrix, one row per contour class
type: 2d array of floats

clst:glob:ij
description: location of each feature vector in copa:data:fi:ci:glob:ii : each row contains 3 indices for the file fi, the channel ci, and the global segment ii, respectively
type: 2d array of int

clst:glob:obj
description: clustering object which was used for global contour clustering
type: object

clst:glob:val
description: mean silhouette
type: float
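The role of the index matrix clst:glob:ij can be illustrated with a toy structure. The nested dictionary below only mimics the copa:data:fi:ci:glob:ii layout with made-up values, and plain lists stand in for the arrays; the helper name is hypothetical.

```python
def segments_by_class(copa, level="glob"):
    """Collect (fi, ci, ii) index triples per contour class by
    following the rows of clst:<level>:ij into the data branch."""
    out = {}
    for fi, ci, ii in copa["clst"][level]["ij"]:
        seg = copa["data"][fi][ci][level][ii]
        out.setdefault(seg["class"], []).append((fi, ci, ii))
    return out

# Toy stand-in for a copa dictionary: 2 files, 1 channel each
copa = {
    "data": {
        0: {0: {"glob": {0: {"class": 1}, 1: {"class": 0}}}},
        1: {0: {"glob": {0: {"class": 1}}}},
    },
    "clst": {"glob": {"ij": [[0, 0, 0], [0, 0, 1], [1, 0, 0]]}},
}
members = segments_by_class(copa)
```

Each row of ij thus points back to the segment from which the corresponding row of the coefficient matrix was taken.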
Local contour classes

clst:loc:c
description: polynomial coef matrix to be clustered
type: 2d array of floats

clst:loc:cntr
description: centroid matrix, one row per contour class
type: 2d array of floats

clst:loc:ij
description: location of each feature vector in copa:data:fi:ci:loc:ii : each row contains 3 indices for the file, the channel, and the local segment, respectively
type: 2d array of int

clst:loc:obj
description: clustering object which was used for local contour clustering
type: object

clst:loc:val
description: mean silhouette
type: float

This sub-dictionary is accessed by copa['val'] and contains validation measures for stylization and clustering.
Clustering

val:clst:glob:sil mean
description: mean silhouette of all data points for global contour clustering
type: float

val:clst:loc:sil mean
description: mean silhouette of all data points for local contour clustering
type: float

Stylization

val:styl:glob:err prop
description: proportion of base/topline crossings over all global segments
type: float

val:styl:loc:rms mean
description: mean RMSD between original and stylized local contour
type: float
Three types of f0 tables can be exported:
• preprocessed f0
• residual f0 (after removal of the global register component)
• resynthesized f0 (superposition of global and local stylized component)
As in the f0 table input format, in each output table the first column gives the time stamps, and the second to last columns contain the f0 values (in Hz) for the recording channels. The tables are stored below fsys:export:dir in sub-directories named after the type of f0 output (f0 preproc, f0 residual, f0 resyn). For each input f0 file an output file with the same name is generated.
The log file in fsys:export:dir + fsys:export:stm + log.txt contains warnings, information about segments that are too short and are therefore skipped, and some validations below the line ’
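The column layout of the exported f0 tables (time stamps first, then one f0 column per channel) can be read back with a few lines of Python. This is a minimal sketch that assumes whitespace-separated columns; the function name and the sample values are made up for illustration.

```python
def read_f0_table(text):
    """Split an f0 table into time stamps and one f0 list per channel.
    Assumes whitespace-separated numeric columns."""
    t, channels = [], []
    for line in text.splitlines():
        if not line.strip():
            continue  # skip empty lines
        fields = [float(x) for x in line.split()]
        t.append(fields[0])
        for ci, val in enumerate(fields[1:]):
            if ci == len(channels):
                channels.append([])  # first row: open a list per channel
            channels[ci].append(val)
    return t, channels

# Made-up two-channel example (time in s, f0 in Hz)
sample = "0.00 120.5 0.0\n0.01 121.0 98.2\n0.02 0.0 99.1\n"
t, f0 = read_f0_table(sample)
```

Here f0[0] holds the values of the first channel and f0[1] those of the second; both have the same length as t.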
13 Plotting
To activate plotting, set navigate:do plot=1
Browsing
Browsing through stylizations can be carried out online (in order to check for appropriate stylization parameter settings) or after feature extraction, which is controlled by plot:browse:time
To select the stylization to be plotted, the corresponding branches in plot:browse:typ:*:* need to be set to 1. E.g. plot:browse:typ:complex:superpos=1 produces plots as in Figure 5.

Grouping
One can also plot stylizations based on parameter centroids for a specified grouping. By plot:grp:typ:*:*=1 the user selects the stylization to be plotted. The grouping is defined by plot:grp:grouping
The entries in this list can be lab for item labels or the grouping factor names specified in fsys:grp:lab. Centroids will be plotted for each factor level combination.
Browsing and grouping plots can be saved as .png files by
plot:browse:save=1
plot:grp:save=1
The browse mode output file names are the concatenation of fsys:pic:dir + fsys:pic:stm + final | online + typ + set + fileIndex + channelIndex + tierName + itemIndex. typ and set refer to the *-keys in plot:browse:typ:*:* set to 1.
The grouping mode output file names are concatenated from fsys:pic:dir + fsys:pic:stm + factorLevelCombination. One file is generated for each factor level combination.
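The colon notation used for the plotting branches maps onto nested dictionaries in the JSON configuration. The sketch below builds such a nested structure from colon-separated paths. The helper set_branch is hypothetical, and writing do_plot with an underscore assumes that the blank in "do plot" stands for an underscore in the actual key name.

```python
import json

def set_branch(cfg, path, value):
    """Set cfg[k1][k2]...[kn] = value for a path 'k1:k2:...:kn'."""
    keys = path.split(":")
    node = cfg
    for key in keys[:-1]:
        node = node.setdefault(key, {})  # create missing sub-dictionaries
    node[keys[-1]] = value
    return cfg

cfg = {}
set_branch(cfg, "navigate:do_plot", 1)                  # activate plotting
set_branch(cfg, "plot:browse:typ:complex:superpos", 1)  # select stylization
set_branch(cfg, "plot:browse:save", 1)                  # save browse plots
set_branch(cfg, "plot:grp:grouping", ["lab"])           # group by item label
set_branch(cfg, "plot:grp:save", 1)                     # save grouping plots
config_json = json.dumps(cfg, indent=2)
```

The resulting string is only a fragment; a complete configuration contains further branches as listed in section 11.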
14 Known bugs
Not all missing or wrong configurations are reported in the log file yet; some might instead result in a Python error message. The same holds for unexpected annotations. If you cannot locate the error, you can send the configuration file, the log file, and the error message to: [email protected]
An error message that includes the line return json.load(h) results from a wrongly formatted JSON configuration file. The last line of the error message points you to the erroneous line in the JSON file.
Currently (September 24th, 2018), depending on your scipy and numpy versions, scipy.signal may cause a FutureWarning: Using a non-tuple sequence for multidimensional indexing is deprecated . This warning can be ignored.
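A wrongly formatted configuration file can also be located before the toolkit is run. The following sketch uses only the Python standard library; the function names check_json_text and check_json_file are hypothetical helpers, not part of CoPaSul.

```python
import json


def check_json_text(text):
    """Return None if text is valid JSON, else a message with the error position."""
    try:
        json.loads(text)
    except json.JSONDecodeError as err:
        return f"line {err.lineno}, column {err.colno}: {err.msg}"
    return None


def check_json_file(path):
    """Apply check_json_text to the content of a configuration file."""
    with open(path, encoding="utf-8") as h:
        return check_json_text(h.read())


# Example: the colon after "fs" is missing on line 2.
broken = '{\n  "fs" 16000\n}'
print(check_json_text(broken))
```

Typical causes of such errors are missing colons or commas, trailing commas, and single instead of double quotes.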
15 History
In the following only those updates directly relevant for the user are documented, that is, configuration and/or feature set updates. For a documentation of all remaining updates please see the history.txt file which is part of the code distribution.
Version 0.2.1, December 30th, 2016
Batch clustering for phrase boundary and pitch accent extraction.
In addition to clustering of boundary/pitch accent candidates on the file level, clustering is now also supported on the entire dataset level. This is expected to improve the prosodic structure extraction in short files (e.g. backchannel turns in dialog data) that do not contain enough material for clustering. Dataset vs. file-level clustering is selected by the configuration branches:
augment:glob:unit
augment:loc:unit
See sections 7.3 and 7.4 for further details.
Phone duration feature for phrase boundary and pitch accent extraction.
In case a phone segment tier is available, z-scored vowel length can be used as a feature for phrase boundary and pitch accent detection. The length of the vowel associated with the prosodic event candidate is divided by its mean length derived from the entire dataset. For boundary candidates the associated vowel is the last vowel segment with an onset before the boundary candidate time stamp. For accent candidates the associated vowel interval includes the candidate time stamp. This feature will be beneficial for languages in which phrase boundaries and/or accents are marked by phone segment lengthening. See sections 7.3 and 7.4 for details. The related new configuration branches are:
augment:glob:wgt:pho : add vowel length to boundary feature set
augment:loc:wgt:pho : add vowel length to accent feature set
fsys:pho:tier : name(s) of phone segment tier(s)
fsys:pho:vow : vowel pattern
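The vowel selection and length normalization described above can be sketched as follows. This is an illustration under stated assumptions, not the toolkit's implementation: segments are given as (onset, offset, label) tuples, the vowel pattern is treated as a regular expression, the normalization is the plain ratio of a vowel's length to its mean length, and the mean is taken per vowel label (the text does not state whether it is per label or over all vowels). All function names are hypothetical.

```python
import re
from collections import defaultdict


def mean_vowel_durations(segments, vowel_pattern):
    """Mean duration per vowel label over all segments of the dataset."""
    sums, counts = defaultdict(float), defaultdict(int)
    for onset, offset, label in segments:
        if re.search(vowel_pattern, label):
            sums[label] += offset - onset
            counts[label] += 1
    return {label: sums[label] / counts[label] for label in sums}


def vowel_for_boundary(segments, vowel_pattern, t):
    """Last vowel segment whose onset lies before the boundary time stamp t."""
    cands = [s for s in segments if s[0] < t and re.search(vowel_pattern, s[2])]
    return max(cands, key=lambda s: s[0]) if cands else None


def vowel_for_accent(segments, vowel_pattern, t):
    """Vowel segment whose interval includes the accent time stamp t."""
    for s in segments:
        if s[0] <= t <= s[1] and re.search(vowel_pattern, s[2]):
            return s
    return None


def relative_length(segment, means):
    """Length of a vowel segment divided by the mean length of its label."""
    onset, offset, label = segment
    return (offset - onset) / means[label]


# Toy data: two tokens of "a" with durations 0.1 s and 0.3 s (mean 0.2 s).
segments = [(0.0, 0.1, "a"), (0.1, 0.2, "t"), (0.2, 0.5, "a"), (0.5, 0.6, "k")]
pattern = r"^[aeiou]$"  # hypothetical vowel pattern
means = mean_vowel_durations(segments, pattern)
bnd_vowel = vowel_for_boundary(segments, pattern, 0.55)
acc_vowel = vowel_for_accent(segments, pattern, 0.05)
print(relative_length(bnd_vowel, means), relative_length(acc_vowel, means))
```

In the toy data the pre-boundary vowel is longer than its label mean and the accented vowel is shorter, which is the kind of contrast a lengthening-based feature is meant to capture.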
Version 0.3.1, January 31st, 2017
Synchronize locations where to extract loc, loc_ext, gnl_f0|en, and rhy_f0|en feature sets
Due to the strict hierarchy principle and to window length constraints it is not always possible to extract loc features at any location where gnl and rhy features can be obtained. If the user is interested only in locations where all these feature sets are available, so that the corresponding feature matrices can be concatenated, then the option preproc:loc_sync is to be set to 1.
Transforming events to segments separately for each feature set
Next to the global window length setting in
opt:preproc:point_win
opt:preproc:nrm_win
window lengths can be set individually for each of the feature sets loc(_ext), gnl_f0, gnl_en, rhy_f0, rhy_en by specifying:
preproc:myFeatureSet:point_win
preproc:myFeatureSet:nrm_win
New design of rhy_f0 and rhy_en output tables
All myAnalysisTier_myRateTier_myParameter column names are renamed to myRateTier_myParameter . The analysis tier name can be read from the tier column. By this the analysis tier can be used as a grouping factor. Analysis/rate tier combinations across recording channels are not considered. Thus, cells in myRateTier_myParameter columns are set to NA if myRateTier and the analysis tier of the respective row are not derived from the same channel.
New fullpath switch for R code files
If set to 0, only the file stem is output. For 1, the full path is written.
fsys:export:fullpath
Version 0.4.1, May 23rd, 2017
New global segment register features
• {bl,ml,tl,rng}_rate : base-/mid-/topline/range rate (ST or Hz per sec)
New local segment register features (extended feature set)
• {bl,ml,tl,rng}_{c0,c1,rate} : base-/mid-/topline/range intercept, slope, and rate (ST or Hz per sec)
New option for rhythm analyses of energy contours
• styl:rhy_en:sig:scale – if set to 1, the signal is scaled to its maximum amplitude. This is suggested especially if signals of different recording conditions are to be compared.
Version 0.4.2, July 18th, 2017
New options for selected segment plotting and index printing
• plot:browse:single_plot:active
• plot:browse:single_plot:file_i
• plot:browse:single_plot:channel_i
• plot:browse:single_plot:segment_i
• plot:browse:verbose
• plot:color
Version 0.4.3, September 5th, 2017
Time stamps added to boundary feature output
Columns t_off and t_on added to boundary features output. t_off gives the end time of the pre-boundary segment, t_on the start time of the post-boundary segment.
Version 0.5.1, October 12th, 2017
Summary table output
Output of a csv summary file that contains mean and variance values for each feature per file and channel.
fsys:export:summary
Version 0.6.1, November 20th, 2017
F0 reference value by any grouping variable
So far the f0 reference value for semitone conversion was calculated separately for each file and each channel. Now it can be calculated for each level of a grouping variable (most relevant: speaker ID) which can be read from the file name as specified in fsys:grp .
preproc:base_prct_grp
Version 0.6.2, November 23rd, 2017
New features for sets glob and loc
Mean values for base-, mid-, topline, and range ( bl_m, ml_m, tl_m, rng_m ).
Version 0.6.3, December 11th, 2017
New features for sets rhy_en and rhy_f0
Number of peaks in absolute DCT spectrum ( n_peak ), corresponding frequency for coefficient with amplitude maximum ( f_max ), and frequency difference for each selected prosodic event rate to f_max ( dgm ) and to the nearest peak ( dlm ).
Version 0.7.1, January 10th, 2018
New features for sets gnl_en and gnl_f0
F0 and energy quotients: mean(initPart)/mean(nonInit) ( qi ), mean(finalPart)/mean(nonFinal) ( qf ), mean(initialPart)/mean(finalPart) ( qb ), mean(maxPart)/mean(nonMax) ( qm ). 2nd order polynomial fit through contour ( c0, c1, c2 ). Option: styl:gnl:win to determine length of initial and final part (in seconds).
Version 0.7.3, January 18th, 2018
New outlier definition and default
Tukey’s fences outlier definition is now also supported. The default reference is now set to mean instead of median .
Version 0.7.4, January 19th, 2018
Event tier support for global segments
In event tiers the time points are treated as right boundaries of global segments. See section 8.5.1.
Version 0.7.5, January 23rd, 2018
Chunking update
• default for augment:chunk:e_rel changed to 0.1
• fallback RMS over entire channel content now calculated on absolute amplitude values greater than the median. This prevents too many extracted chunks in signals that consist mainly of speech pauses.
Version 0.7.6, January 25th, 2018
Chunking update
Silence margins can be set at chunk starts and ends.
augment:chunk:margin
Version 0.7.7, January 30th, 2018
Output tables column separator now can freely be chosen.
fsys:export:sep
Version 0.7.9, April 5th, 2018
Spectral balance calculation update
• now can be carried out in spectral and time domain
• time window in the center of the analysed segment and frequency window can be specified
• for time domain analysis α can be specified as factor or as lower boundary frequency for pre-emphasis.
• styl:gnl_en:alpha is deprecated and replaced by styl:gnl_en:sb:alpha
styl:gnl_en:sb
Version 0.7.12, June 26th, 2018
New position features of local segments in global ones
• loc feature table now contains two more columns is_init and is_fin both describing the position of the local segment in the global one.
Version 0.8.1, July 3rd, 2018
“voice” feature set for voice quality added
• for shimmer and jitter features: mean values and 3rd order polynomial stylization to capture changes of these variables over time. New features jit, jit c [0 − , shim, shim c [0 −
• to extract these features pulse files need to be extracted beforehand, e.g. by using the added Praat scripts extract_pulse.praat and extract_pulse_stereo.praat
navigate:do_styl_voice
styl:voice
fsys:pulse
fsys:voice:tier
Version 0.8.5, August 2nd, 2018
Boundary features now can also be calculated on f0 residual contour
• useful e.g. to normalize subsequent accent groups by their respective register in an IP
• Beware: not meaningful across IP boundaries. To be filtered by column is_fin (see next paragraph)
styl:register
styl:bnd:residual
All feature sets: new columns marking initial and/or final position of items within global segments and chunks
• extension of update to version 0.7.12.
• all feature tables now contain the columns is_init , is_fin , is_init_chunk , and is_fin_chunk . The former two describe the position of the current item (e.g. local segment, segment boundary etc.) in the global segment. The latter two locate the current item within the underlying chunk. If no chunk tier is specified, is_fin_chunk and is_init_chunk give the item’s position in the entire channel. If no global segment tier is specified, is_init and is_fin both are set to no .
Version 0.8.8, September 7th, 2018
• Scipy version ≥
Version 0.8.9, September 10th, 2018
New boundary discontinuity features
• for fitted line slope differences separately for each boundary window definition std, trend, win and register representation bl, ml, rng, tl :
• post : f0 slope of post-boundary segment subtracted from slope of joint segment
• pre : f0 slope of pre-boundary segment subtracted from slope of joint segment
• prepost : f0 slope of post-boundary segment subtracted from slope of pre-boundary segment
• Features: myWindow_myRegister_sd_post | sd_pre | sd_prepost
Version 0.8.10, September 14th, 2018
New boundary discontinuity features
• for fitting error (RMSE) ratios and the Akaike information criterion increase of joint vs. single fits.
• separately for each boundary window definition std, trend, win and register representation bl, ml, rng, tl :
• single pre- and post-boundary vs. joint segment stylization
• pre-boundary vs. first half of joint window ( pre )
• post-boundary vs. second half of joint segment ( post )
• Features: myWindow_myRegister_rmsR | rmsR_post | rmsR_pre , myWindow_myRegister_aicI | aicI_post | aicI_pre
Version 0.8.11, September 24th, 2018
More robust treatment of multiple accent per accent group cases
• relevant if both fsys:loc:tier_ag and fsys:loc:tier_acc are specified
• new option preproc:loc_align with values skip (skipping AGs with more than 1 ACC), left (keeping first ACC), and right (keeping last ACC)
Commented configuration file added
• In the doc/ directory a new file copasul_commented_config.json.txt was added in which all options are commented for a quick overview.
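The slope difference features introduced in version 0.8.9 can be illustrated with a short numpy sketch. It is a simplification under stated assumptions: lines are fitted directly to an f0 contour rather than to the base-, mid-, topline or range representation, a single window around the boundary is used, and the function names are hypothetical. Only the direction of each subtraction is taken from the feature descriptions above.

```python
import numpy as np


def slope(t, y):
    """Slope of a first-order least-squares fit through (t, y)."""
    return float(np.polyfit(t, y, 1)[0])


def slope_differences(t, f0, t_bnd):
    """Slope differences between joint, pre-, and post-boundary line fits."""
    t = np.asarray(t, dtype=float)
    f0 = np.asarray(f0, dtype=float)
    pre = t <= t_bnd   # samples of the pre-boundary segment
    post = t > t_bnd   # samples of the post-boundary segment
    s_joint = slope(t, f0)
    s_pre = slope(t[pre], f0[pre])
    s_post = slope(t[post], f0[post])
    return {
        "sd_pre": s_joint - s_pre,      # pre subtracted from joint
        "sd_post": s_joint - s_post,    # post subtracted from joint
        "sd_prepost": s_pre - s_post,   # post subtracted from pre
    }


# Toy contour: same declination before and after the boundary, with a reset.
t = np.linspace(0.0, 1.9, 20)
f0 = np.where(t < 0.95, 10.0 - 2.0 * t, 12.0 - 2.0 * t)
sd = slope_differences(t, f0, t_bnd=0.95)
print(sd)
```

In this toy case the pre- and post-boundary slopes are equal, so sd_prepost is close to zero, whereas the joint fit is flattened by the reset and thus deviates from both single fits.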
Version 0.8.12, October 19th, 2018
bnd feature set extended
• for each register representation 2 more features were added that could be of use e.g. for measuring downstep.
• d_o: onset difference
• d_m: difference of means